LSEbA: least squares regression and estimation by analogy in a semi-parametric model for software cost estimation

2010, Vol 15 (5), pp. 523-555
Author(s):  
Nikolaos Mittas ◽  
Lefteris Angelis
Author(s):  
Panagiota Chatzipetrou

Software cost estimation (SCE) is a critical phase in software development projects. A common problem in building software cost models is that the available datasets contain projects with many missing categorical values. Several techniques exist for handling missing data in the context of SCE. The purpose of this article is to present a state-of-the-art statistical and visualization approach to evaluating and comparing the effect of missing data on the accuracy of cost estimation models. Five missing data techniques were used: multinomial logistic regression, listwise deletion, mean imputation, expectation maximization, and regression imputation. They were compared with respect to their effect on the prediction accuracy of a least squares regression cost model. The evaluation is based on various expressions of the prediction error. The comparisons are conducted using statistical tests, resampling techniques, and visualization tools such as regression error characteristic curves.
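
A regression error characteristic (REC) curve plots, for each error tolerance, the fraction of projects whose prediction error falls within that tolerance. The following is a minimal sketch of that computation, assuming absolute error as the error measure. The function name `rec_curve` and the effort values are hypothetical and chosen only for illustration; they are not taken from the article.

```python
from bisect import bisect_right


def rec_curve(actual, predicted, tolerances):
    """Return (tolerance, accuracy) pairs for an REC curve.

    Accuracy at tolerance t is the fraction of projects whose
    absolute prediction error is less than or equal to t.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    # Sort the absolute errors once so each tolerance is a binary search.
    errors = sorted(abs(a - p) for a, p in zip(actual, predicted))
    n = len(errors)
    return [(t, bisect_right(errors, t) / n) for t in tolerances]


# Hypothetical effort values (e.g. person-hours), purely illustrative.
actual = [100.0, 250.0, 400.0, 800.0]
predicted = [120.0, 240.0, 300.0, 1000.0]
curve = rec_curve(actual, predicted, tolerances=[0, 10, 20, 100, 200])
for tolerance, accuracy in curve:
    print(f"tolerance={tolerance:>4}  accuracy={accuracy:.2f}")
```

When two models, for instance ones built after different missing data techniques, are evaluated on the same projects, the model whose curve rises faster toward 1.0 predicts more projects within small error tolerances.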

