Taking a Closer Look at Warnings Generated by PMD and SonarQube, their Rules and Compliance to Established Coding Standards

Lakmal Deshapriya; Sherlock A Licorish; Brendon J Woodford

doi:10.48550/arxiv.2603.00821

Preprint

Open access

ArXiv.org

Cornell University

28/02/2026

DOI: https://doi.org/10.48550/arxiv.2603.00821

Handle: https://hdl.handle.net/10523/49895

Abstract

Computer Science - Software Engineering

Code Violation Dataset

PMD Rules

SonarQube Rules

False Positives

Vulnerability Detection

Static Code Analyser

Software Code Quality

Context: Static code analysis (SCA) tools play a vital role in software development, reducing the cost and time required for code reviews. However, high false-positive and false-negative rates are reported even for the best tools in the community. Accordingly, studies often aim to develop datasets for learning SCA warning patterns in order to reduce false results. Such datasets should be high in quality and volume, cover the full range of faults/rules that typically result in false warnings, and comply with established coding standards. However, existing studies have not utilised such datasets, nor have they identified the breadth of rules that are prone to false positives or the compliance of those rules with coding standards.

Objectives: Addressing this gap, we analysed code from Stack Overflow and Apache Tomcat to capture variations in code length and style when detecting false-positive warnings from the best-performing tools, PMD and SonarQube.

Method: To derive false-positive warnings, outcomes from the tools were labelled using established coding standards. Deeper analyses were then conducted to identify the rules that are prone to false positives, the reasons for these, and the agreement and gaps between SCA rules and established standards.

Results: Among our main outcomes, we observe that only a few SCA rules generate false positives, ranging from 4.64% to 18.45% across four datasets. Eliminating rules that contradict established standards significantly reduces the false-positive rate. Our findings also reveal discrepancies between the tools and established standards.

Conclusion: Given the evidence established in this study, we recommend further investigations into gaps between tools and established standards, including the use of machine learning approaches to annotate larger datasets.
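
As an illustration of the kind of calculation the Method and Results describe, the following is a minimal sketch of computing a false-positive rate overall, per rule, and after excluding selected rules. It is not the authors' implementation: the rule names (`RuleA` to `RuleD`), the labels, and the helper functions are all hypothetical and chosen only for illustration, and the numbers are not data from the study.

```python
from collections import Counter

# Hypothetical labelled warnings: (rule name, True if labelled a false positive).
# Rule names and labels are illustrative assumptions, not data from the study.
warnings = [
    ("RuleA", True),
    ("RuleA", True),
    ("RuleA", False),
    ("RuleB", False),
    ("RuleB", False),
    ("RuleC", True),
    ("RuleC", False),
    ("RuleD", False),
]


def false_positive_rate(labelled):
    """Return the share of warnings labelled as false positives (0.0 if empty)."""
    if not labelled:
        return 0.0
    return sum(1 for _, is_fp in labelled if is_fp) / len(labelled)


def per_rule_rates(labelled):
    """Return a dict mapping each rule to its own false-positive rate."""
    totals = Counter(rule for rule, _ in labelled)
    fps = Counter(rule for rule, is_fp in labelled if is_fp)
    return {rule: fps[rule] / totals[rule] for rule in totals}


def without_rules(labelled, excluded):
    """Drop all warnings raised by the rules in `excluded`."""
    return [(rule, is_fp) for rule, is_fp in labelled if rule not in excluded]


overall = false_positive_rate(warnings)
by_rule = per_rule_rates(warnings)
# Assume, for illustration only, that RuleA contradicts an established standard.
filtered = false_positive_rate(without_rules(warnings, {"RuleA"}))

print(f"overall: {overall:.2%}, after excluding RuleA: {filtered:.2%}")
```

In this toy data, most false positives come from a single rule, so removing that rule lowers the overall rate; whether a real rule should be excluded depends on the manual labelling against coding standards that the study describes.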

Files and links (2)

pdf: 2603.00821v1 (1.50 MB)

Preprint (Author's original), v1, CC BY-NC-ND V4.0, Open Access

url: https://doi.org/10.48550/arXiv.2603.00821

Preprint (Author's original), CC BY-NC-ND V4.0, Open

Details

Record Identifier: 9926849499201891
Title: Taking a Closer Look at Warnings Generated by PMD and SonarQube, their Rules and Compliance to Established Coding Standards
Creators: Lakmal Deshapriya; Sherlock A Licorish; Brendon J Woodford
Publication Details: ArXiv.org
Academic Unit: School of Computing
Publisher: Cornell University
Copyright: Copyright © 2026 The Author(s). This preprint was first posted on arXiv (arXiv.org). This version in OUR Archive is an open access original manuscript before peer review distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial use, distribution and reproduction in any medium, provided the original work is properly attributed to the creator(s) and the source, is not altered, transformed, or built upon in any way, and a link to the Creative Commons license is provided.
Language: English
Resource Type ; Subtype: Preprint
