Abstract
For some years, software engineers have been attempting to develop useful prediction systems to estimate such attributes as the effort to develop a piece of software and the likely number of defects. Typically, prediction systems are proposed and then subjected to empirical evaluation. Claims are then made with regard to the quality of the prediction systems. A wide variety of prediction quality indicators have been suggested in the literature. Unfortunately, we believe that a somewhat confusing state of affairs prevails and that this impedes research progress. This paper aims to provide the research community with a better understanding of the meaning of, and relationship between, these indicators. We critically review twelve different approaches by considering them as descriptors of the residual variable. We demonstrate that the two most popular indicators, MMRE and pred(25), are in fact indicators of the spread and shape, respectively, of prediction accuracy, where prediction accuracy is the ratio of estimate to actual (or actual to estimate). Next, we highlight the impact of the choice of indicator by comparing three prediction systems derived using four different simulated datasets. We demonstrate that the results of such a comparison depend upon the choice of indicator, the analysis technique, and the nature of the dataset used to derive the predictive model. We conclude that prediction systems cannot be characterised by a single summary statistic. We suggest that we need indicators of the central tendency and spread of accuracy as well as indicators of shape and bias. For this reason, boxplots of relative error or residuals are useful alternatives to simple summary metrics.
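For readers unfamiliar with the two indicators named above, the following is a minimal sketch of how they are commonly computed. It assumes the conventional definitions, which the abstract itself does not spell out: the magnitude of relative error is MRE = |actual - estimate| / actual, MMRE is the mean MRE, and pred(25) is the fraction of predictions whose MRE is at most 0.25. The function names and the sample data are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def mre(actual, estimate):
    """Magnitude of relative error for one prediction (assumed definition)."""
    return abs(actual - estimate) / actual


def mmre(actuals, estimates):
    """Mean magnitude of relative error over all predictions."""
    return mean([mre(a, e) for a, e in zip(actuals, estimates)])


def pred(actuals, estimates, level=0.25):
    """Fraction of predictions whose MRE is at most `level`; pred(25) uses 0.25."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(err <= level for err in errors) / len(errors)


def accuracy_ratios(actuals, estimates):
    """Prediction accuracy as the ratio of estimate to actual."""
    return [e / a for a, e in zip(actuals, estimates)]


# Illustrative (made-up) effort values, not data from the paper.
actuals = [100.0, 200.0, 400.0, 50.0]
estimates = [110.0, 150.0, 400.0, 100.0]

print("MMRE:", mmre(actuals, estimates))
print("pred(25):", pred(actuals, estimates))
print("estimate/actual:", accuracy_ratios(actuals, estimates))
```

In this made-up sample, a single large overestimate raises MMRE considerably while pred(25) still reports three of four predictions within tolerance, which illustrates why one summary statistic alone can give an incomplete picture.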