Abstract
Software developers publish mobile apps at a continuous and growing pace due to the widespread adoption of mobile devices. To stay competitive in today's market, vendors are expected to release high-quality apps. Previous studies have shown that users frequently uninstall apps and switch to alternatives when they encounter quality issues during operation, such as crashes or unresponsiveness (reliability issues). Identifying and mitigating these issues early, before apps reach the operational phase, is therefore crucial for producing high-quality apps and surviving in the app market.
An analysis of existing secondary reviews of the relevant research literature reveals a lack of focus on reliability, a specific quality attribute of mobile apps, particularly operational reliability. This thesis addresses this gap by focusing on the operational reliability of mobile apps and on ways to minimize the occurrence and impact of reliability issues. To enhance operational reliability, this research proceeds in five phases: (a) a systematic mapping study classifying the existing literature on reliability, (b) prediction of app crashes during development, (c) investigation of new metrics to improve prediction performance, (d) evaluation of the predictive models, and (e) analysis of practitioners' perspectives on both app crash prediction and the findings of this research. The thesis employs a combination of quantitative and qualitative approaches.
First, a systematic mapping study surveys and categorizes existing research in the field. Following a pre-tested protocol and specific criteria, it assesses and classifies 87 relevant works by study focus, type, method, settings, contributions, quality attributes, metrics, and datasets. The study also reveals research gaps related to reliability, including the need for approaches that minimize operational issues such as app crashes.
Second, building on the findings of the mapping study, the thesis predicts app crashes from existing metrics by developing machine learning models and comparing various machine learning techniques. The analyses also address data imbalance, the time dependency of the data, and hyperparameter tuning. Notably, some metrics originally developed for traditional software defect prediction prove important in the mobile domain, while others are less relevant. Overall, the research demonstrates that these existing metrics can be adapted to predict crashes in mobile apps, albeit with high false positive rates.
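To make the time-dependency and imbalance concerns concrete, the following is a minimal sketch, not the thesis's actual pipeline. It assumes each commit is a hypothetical `Commit` record holding a timestamp, a feature vector, and a crash-inducing label; `time_wise_split` and `undersample` are illustrative names. The split trains only on commits older than every test commit, and random undersampling of the majority class is shown as one common imbalance strategy, applied to the training portion only.

```python
import random
from dataclasses import dataclass


@dataclass
class Commit:
    # Hypothetical commit record; the feature layout is an assumption.
    timestamp: int
    features: list
    crash_inducing: bool


def time_wise_split(commits, train_fraction=0.8):
    """Order commits by time so the model never trains on 'future' commits."""
    ordered = sorted(commits, key=lambda c: c.timestamp)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def undersample(train, seed=0):
    """Randomly drop majority-class commits until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [c for c in train if c.crash_inducing]
    neg = [c for c in train if not c.crash_inducing]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)  # avoid class-ordered training data
    return balanced


if __name__ == "__main__":
    # Toy data: every fifth commit is labelled crash-inducing.
    commits = [Commit(t, [float(t)], t % 5 == 0) for t in range(100)]
    train, test = time_wise_split(commits)
    balanced = undersample(train)
    print(len(train), len(test), len(balanced))  # 80 20 32
```

Balancing only the training split keeps the test set at its natural class ratio, so reported scores reflect the imbalance a deployed model would face.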
Third, a manual analysis of code commits identifies the common types of changes that app developers make in commits that potentially lead to crashes. These findings offer valuable insights into practitioners' coding practices and underline the need for caution when implementing such changes in order to reduce crashes. Fourth, the thesis uses these change types as metrics to train prediction models and evaluates their impact on model performance. The research shows that combining these new domain-specific metrics with traditional ones significantly reduces false positives and improves model performance. Furthermore, a comparison of different machine learning models reveals that the random forest model outperforms the others in precision, recall, F-measure, AUC, and MCC under both 10-fold cross-validation and time-wise validation. An ablation analysis confirms the importance of each type of metric and the contribution of the new domain-specific metrics to model performance.
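The evaluation measures named above can be computed from a binary confusion matrix and from per-commit prediction scores. The sketch below is illustrative only: `confusion_scores` and `auc` are hypothetical helper names, the AUC uses the pairwise (Mann-Whitney) formulation rather than any particular library, and the counts are invented to show how cutting false positives at equal recall raises precision and MCC.

```python
import math


def confusion_scores(tp, fp, fn, tn):
    """Precision, recall, F-measure and MCC from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "mcc": mcc}


def auc(pos_scores, neg_scores):
    """ROC AUC as the probability a crash commit outranks a clean one (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Invented counts: 50 crash and 50 clean commits, same recall (0.8),
    # but the second model raises 30 fewer false alarms.
    noisy = confusion_scores(tp=40, fp=40, fn=10, tn=10)
    cleaner = confusion_scores(tp=40, fp=10, fn=10, tn=40)
    print(round(noisy["mcc"], 3), round(cleaner["mcc"], 3))  # 0.0 0.6
    print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside F-measure on imbalanced crash data.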
Finally, the thesis evaluates the random forest model on unseen mobile apps to assess its generalizability, and validates the findings from practitioners' perspective through a survey. The research demonstrates that the model can be applied across diverse domains, including finance, communication, gaming, personalization, productivity, education, social, shopping, and entertainment. Practitioners' feedback aligns with the study's findings, confirming the utility of such models for predicting crashes at commit time. Practitioners also suggested ways to improve the outcomes of this project for industry adoption.
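The abstract does not state how the unseen apps were chosen; one common way to test generalization to apps a model has never seen is a leave-one-app-out split, sketched below as an assumption. Commits are hypothetical `(app_name, features, label)` tuples, and `split_by_app` and `leave_one_app_out` are illustrative names, not the thesis's code.

```python
def split_by_app(commits, held_out_apps):
    """Train on some apps and test on apps the model has never seen.

    `commits` is a list of (app_name, features, label) tuples; this layout is hypothetical.
    """
    held_out = set(held_out_apps)
    train = [c for c in commits if c[0] not in held_out]
    test = [c for c in commits if c[0] in held_out]
    return train, test


def leave_one_app_out(commits):
    """Yield (app, train, test) with each app held out exactly once."""
    for app in sorted({c[0] for c in commits}):
        train, test = split_by_app(commits, [app])
        yield app, train, test


if __name__ == "__main__":
    commits = [("finance_app", [1.0], 1), ("finance_app", [0.2], 0),
               ("game_app", [0.7], 1), ("chat_app", [0.1], 0)]
    for app, train, test in leave_one_app_out(commits):
        print(app, len(train), len(test))
```

Because no commit from the held-out app appears in training, a model that scores well across folds is less likely to be exploiting app-specific quirks.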
Taken together, the insights gained from this study contribute to both academia and industry by providing comprehensive knowledge, datasets, strategies, prediction models, and practical recommendations. Additionally, this research lays the foundation for future investigations into app crash prediction with fewer false positives, and addresses industry-specific concerns about transferring these models to industry. It also highlights the need for mobile-specific metrics to predict crashes effectively.