Abstract
Static code analysis (SCA) tools are vital in software development because they reduce the cost and time required for manual code review. Our previous study established that even the best tools reported in the software engineering community produce high false positive and false negative rates when used for code analysis. Existing studies have taken several approaches to identifying the false positive alarms generated by SCA tools, with the aim of enhancing the utility of these tools. However, these studies have several limitations. Chief among them are a tendency to focus on a limited number of alarm types and a failure to comprehensively evaluate a broad range of machine learning (ML) techniques. This study addresses that gap. We identified the best Code Representation Learning (CRL) techniques and ML algorithms for identifying false positives. The results show that performance varies with the characteristics of the training datasets and the goal of model building. Based on the evidence established in this study, we recommend further investigation of specific configurations of advanced CRLs and ML algorithms, including simple ML models.