A comparison of software effort prediction models using small datasets
van Koten, Chikako
Constructing an accurate effort prediction model is a challenge in Software Engineering. One difficulty practitioners often face is that they have only a very small amount of local data from which to construct a model. A small dataset limits the predictive accuracy of the model, since accuracy deteriorates as the size of the dataset decreases. This paper compares three types of software development effort prediction model that are applicable to such small datasets: (1) Bayesian statistical models, (2) multiple linear regression models, and (3) case-based reasoning/analogy-based models. The predictive accuracy of these models is evaluated using two different software datasets. The results show that the accuracy of the Bayesian statistical models is higher than or competitive with that of the others when the models are calibrated using data collected from fewer than 10 systems. These results suggest that a Bayesian statistical model would be a better choice for effort prediction when practitioners have only a very small dataset, consisting of fewer than 10 systems similar to their system of interest.
Publisher: University of Otago
Keywords: multivariate statistics; modeling methodologies; management techniques; statistical methods; cost estimation; time estimation
Research Type: Other
Submitted to IEEE Transactions on Software Engineering. If published, this version will be replaced by the final version.