Bayesian statistical models for predicting software development effort
van Koten, Chikako
Constructing an accurate effort prediction model is a challenge in Software Engineering. This paper presents new Bayesian statistical models, in order to predict development effort of software systems in the International Software Benchmarking Standards Group (ISBSG) dataset. The first model is a Bayesian linear regression (BR) model and the second model is a Bayesian multivariate normal distribution (BMVN) model. Both models are calibrated using subsets randomly sampled from the dataset. The models’ predictive accuracy is evaluated using other subsets, which consist of only the cases unknown to the models. The predictive accuracy is measured in terms of the absolute residuals and magnitude of relative error. They are compared with the corresponding linear regression models. The results show that the Bayesian models have predictive accuracy equivalent to the linear regression models, in general. However, the advantage of the Bayesian statistical models is that they do not require a calibration subset as large as the regression counterpart. In the case of the ISBSG dataset it is confirmed that the predictive accuracy of the Bayesian statistical models, in particular the BMVN model is significantly better than the linear regression model, when the calibration subset consists of only five or smaller number of software systems. This finding justifies the use of Bayesian statistical models in software effort prediction, in particular, when the system of interest has only a very small amount of historical data.
Publisher: University of Otago
Series number: 2005/08
Keywords: effort prediction; Bayesian statistics; regression; software metrics
Research Type: Discussion Paper