Software Cost Estimation

Software cost estimation is a critical phase in the development of a software project and has, over the years, become an active research area. A common problem in building software cost models is that the available datasets contain projects with many missing categorical values. The purpose of this chapter is to show how a combination of modern statistical and computational techniques can be used to compare the effect of missing data techniques on the accuracy of cost estimation. Specifically, a recently proposed missing data technique, multinomial logistic regression, is evaluated and compared with four older methods (listwise deletion, mean imputation, expectation maximization and regression imputation) with respect to their effect on the prediction accuracy of a least squares regression cost model. The evaluation is based on several measures of prediction error, and the comparisons are conducted using statistical tests, resampling techniques and a visualization tool, the regression error characteristic curve.
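To make the contrast between two of the older techniques concrete, here is a minimal Python sketch of listwise deletion next to a simple fill-in strategy. It is an illustration under stated assumptions, not the chapter's method: the project records, column names and helper functions are hypothetical, and because the missing values are categorical the sketch fills in the most frequent category (the mode) as a stand-in for mean imputation.

```python
from collections import Counter

# Hypothetical toy project records; None marks a missing categorical value.
projects = [
    {"language": "Java", "team": "small", "effort": 120.0},
    {"language": None, "team": "large", "effort": 300.0},
    {"language": "Java", "team": None, "effort": 150.0},
    {"language": "C", "team": "large", "effort": 280.0},
    {"language": "Java", "team": "large", "effort": 110.0},
]


def listwise_deletion(rows):
    """Keep only the rows that have no missing values at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mode_imputation(rows, columns):
    """Fill missing values in each column with its most frequent observed value.

    Assumption: the mode is used as the categorical analogue of mean imputation.
    """
    filled = [dict(r) for r in rows]  # copy so the input is not modified
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        mode = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[col] is None:
                r[col] = mode
    return filled


complete = listwise_deletion(projects)
imputed = mode_imputation(projects, ["language", "team"])
```

Listwise deletion shrinks the dataset to the fully observed projects, while imputation keeps every project at the cost of introducing guessed values. That trade-off is what a comparison of prediction accuracy is meant to quantify.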


Categorical missing data imputation for software cost estimation by multinomial logistic regression

Journal of Systems and Software; 2006; Vol 79 (3); pp. 404-414

DOI: 10.1016/j.jss.2005.02.026

Cited By ~ 39

Author(s): Panagiotis Sentas, Lefteris Angelis

Keyword(s): Logistic Regression, Missing Data, Cost Estimation, Multinomial Logistic Regression, Data Imputation, Software Cost Estimation, Software Cost, Missing Data Imputation


Visual comparison of software cost estimation models by regression error characteristic analysis

Journal of Systems and Software; 2010; Vol 83 (4); pp. 621-637

DOI: 10.1016/j.jss.2009.10.044

Cited By ~ 17

Author(s): Nikolaos Mittas, Lefteris Angelis

Keyword(s): Cost Estimation, Error Characteristic, Software Cost Estimation, Characteristic Analysis, Software Cost, Visual Comparison, Regression Error, Estimation Models, Cost Estimation Models


A Framework of Statistical and Visualization Techniques for Missing Data Analysis in Software Cost Estimation

Intelligent Systems; 2018; pp. 345-372

DOI: 10.4018/978-1-5225-5643-5.ch014

Author(s): Lefteris Angelis, Nikolaos Mittas, Panagiota Chatzipetrou

Keyword(s): Missing Data, Cost Estimation, Prediction Models, Software Cost Estimation, Software Cost, Regression Imputation, Mean Imputation, Regression Error, Visualization Techniques, Missing Data Techniques

Software Cost Estimation (SCE) is a critical phase in software development projects. However, owing to the growing complexity of software itself, a common problem in building software cost models is that the available datasets contain many missing categorical values. The purpose of this chapter is to show how a framework of statistical, computational, and visualization techniques can be used to evaluate and compare the effect of missing data techniques on the accuracy of cost estimation models. To that end, the authors apply five missing data techniques: Multinomial Logistic Regression, Listwise Deletion, Mean Imputation, Expectation Maximization, and Regression Imputation. The evaluation and the comparisons are conducted using Regression Error Characteristic curves, which provide a visual comparison of different prediction models, and Regression Error Operating Curves, which examine the predictive power of models with respect to under- or over-estimation.
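As a rough illustration of how a Regression Error Characteristic curve supports visual comparison, the Python sketch below computes, for each error tolerance, the fraction of projects whose prediction error falls within that tolerance. It assumes absolute error as the error measure; the function names, effort values and the two models are hypothetical and are not taken from the chapter.

```python
def rec_accuracy(errors, tolerance):
    """Fraction of projects whose error does not exceed the tolerance."""
    return sum(e <= tolerance for e in errors) / len(errors)


def rec_curve(actual, predicted):
    """Return the (tolerance, accuracy) points of an REC curve.

    Assumption: absolute error is used; other error measures are possible.
    """
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    tolerances = sorted(set(errors))
    return [(t, rec_accuracy(errors, t)) for t in tolerances]


# Hypothetical actual efforts and the predictions of two competing models.
actual = [100, 200, 300, 400]
model_a = [110, 190, 330, 400]
model_b = [150, 260, 300, 480]

curve_a = rec_curve(actual, model_a)
curve_b = rec_curve(actual, model_b)
```

A curve that climbs to an accuracy of 1.0 at a smaller tolerance indicates the more accurate model. In this toy example every prediction of `model_a` is within 30 units of the actual effort, whereas `model_b` needs a tolerance of 80.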


A Framework of Statistical and Visualization Techniques for Missing Data Analysis in Software Cost Estimation

Handbook of Research on Innovations in Systems and Software Engineering - Advances in Systems Analysis, Software Engineering, and High Performance Computing; 2015; pp. 71-97

DOI: 10.4018/978-1-4666-6359-6.ch003

Author(s): Lefteris Angelis, Nikolaos Mittas, Panagiota Chatzipetrou

Keyword(s): Missing Data, Cost Estimation, Prediction Models, Software Cost Estimation, Software Cost, Regression Imputation, Mean Imputation, Regression Error, Visualization Techniques, Missing Data Techniques



A permutation test based on regression error characteristic curves for software cost estimation models

Empirical Software Engineering; 2011; Vol 17 (1-2); pp. 34-61

DOI: 10.1007/s10664-011-9177-5

Cited By ~ 13

Author(s): Nikolaos Mittas, Lefteris Angelis

Keyword(s): Cost Estimation, Permutation Test, Error Characteristic, Software Cost Estimation, Characteristic Curves, Software Cost, Regression Error, Estimation Models, Cost Estimation Models


LSEbA: least squares regression and estimation by analogy in a semi-parametric model for software cost estimation

Empirical Software Engineering; 2010; Vol 15 (5); pp. 523-555

DOI: 10.1007/s10664-010-9128-6

Cited By ~ 30

Author(s): Nikolaos Mittas, Lefteris Angelis

Keyword(s): Least Squares, Cost Estimation, Parametric Model, Least Squares Regression, Software Cost Estimation, Software Cost, Estimation By Analogy


A Framework of Statistical and Visualization Techniques for Missing Data Analysis in Software Cost Estimation

Computer Systems and Software Engineering; 2017; pp. 433-460

DOI: 10.4018/978-1-5225-3923-0.ch017

Author(s): Lefteris Angelis, Nikolaos Mittas, Panagiota Chatzipetrou

Keyword(s): Missing Data, Cost Estimation, Prediction Models, Software Cost Estimation, Software Cost, Regression Imputation, Mean Imputation, Regression Error, Visualization Techniques, Missing Data Techniques

