Methods for Statistical and Visual Comparison of Imputation Methods for Missing Data in Software Cost Estimation

Author(s):  
Lefteris Angelis ◽  
Panagiotis Sentas ◽  
Nikolaos Mittas ◽  
Panagiota Chatzipetrou

Software Cost Estimation is a critical phase in the development of a software project, and over the years has become an emerging research area. A common problem in building software cost models is that the available datasets contain projects with lots of missing categorical data. The purpose of this chapter is to show how a combination of modern statistical and computational techniques can be used to compare the effect of missing data techniques on the accuracy of cost estimation. Specifically, a recently proposed missing data technique, the multinomial logistic regression, is evaluated and compared with four older methods: listwise deletion, mean imputation, expectation maximization and regression imputation with respect to their effect on the prediction accuracy of a least squares regression cost model. The evaluation is based on various expressions of the prediction error and the comparisons are conducted using statistical tests, resampling techniques and a visualization tool, the regression error characteristic curves.
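
Regression error characteristic (REC) curves plot, for each error tolerance, the proportion of projects whose prediction error falls within that tolerance. A curve that lies higher and further to the left indicates a more accurate model. The following minimal sketch uses Python and NumPy and makes two assumptions: absolute error is the error expression, and the efforts and predictions are hypothetical rather than taken from the chapter. The function names `rec_curve` and `area_over_curve` are illustrative.

```python
import numpy as np

def rec_curve(actual, predicted, n_points=50):
    """Return (tolerances, accuracy) points of a REC curve.

    accuracy[i] is the fraction of projects whose absolute prediction
    error is <= tolerances[i]. Using absolute error is an assumption of
    this sketch; another error expression could be substituted.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = np.abs(actual - predicted)
    tolerances = np.linspace(0.0, errors.max(), n_points)
    accuracy = np.array([(errors <= t).mean() for t in tolerances])
    return tolerances, accuracy

def area_over_curve(tolerances, accuracy):
    """Trapezoidal area above a REC curve; a smaller area means a better model."""
    gaps = 1.0 - accuracy
    widths = np.diff(tolerances)
    return float(np.sum(widths * (gaps[:-1] + gaps[1:]) / 2.0))

# Hypothetical efforts and the predictions of two cost models, each fitted
# on a dataset completed by a different missing data technique.
actual = [120, 340, 95, 610, 230]
model_a = [130, 300, 90, 650, 220]
model_b = [160, 250, 60, 500, 300]
for name, pred in [("A", model_a), ("B", model_b)]:
    tol, acc = rec_curve(actual, pred)
    print(name, round(area_over_curve(tol, acc), 2))
```

To get the visual comparison, plot `acc` against `tol` for each model on shared axes. The area over the curve condenses each curve into a single number that can be used to rank the models.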

2018 ◽  
pp. 345-372
Author(s):  
Lefteris Angelis ◽  
Nikolaos Mittas ◽  
Panagiota Chatzipetrou

Software Cost Estimation (SCE) is a critical phase in software development projects. However, due to the growing complexity of the software itself, a common problem in building software cost models is that the available datasets contain lots of missing categorical data. The purpose of this chapter is to show how a framework of statistical, computational, and visualization techniques can be used to evaluate and compare the effect of missing data techniques on the accuracy of cost estimation models. To this end, the authors use five missing data techniques: Multinomial Logistic Regression, Listwise Deletion, Mean Imputation, Expectation Maximization, and Regression Imputation. The evaluation and the comparisons are conducted using Regression Error Characteristic curves, which provide a visual comparison of different prediction models, and Regression Error Operating Curves, which examine the predictive power of models with respect to under- or over-estimation.
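
Listwise deletion and mean imputation are the two simplest techniques in that list, and both can be sketched with pandas. The column names and values are hypothetical. The sketch also assumes that the categorical attributes are already coded as numbers. Multinomial logistic regression, expectation maximization, and regression imputation are not shown.

```python
import numpy as np
import pandas as pd

def listwise_deletion(df):
    """Keep only the projects (rows) that have no missing values at all."""
    return df.dropna(axis=0, how="any")

def mean_imputation(df):
    """Fill each missing value with the mean of its (numerically coded) column."""
    return df.fillna(df.mean(numeric_only=True))

# Hypothetical project data; dev_type and team_exp stand in for
# numerically coded categorical attributes.
projects = pd.DataFrame({
    "size_fp":   [120, 340, 95, 610, 230],
    "dev_type":  [1, np.nan, 2, 1, np.nan],
    "team_exp":  [3, 2, np.nan, 4, 3],
    "effort_ph": [1500, 4200, 1100, 8300, 2700],
})
complete_cases = listwise_deletion(projects)  # 2 of 5 projects survive
imputed = mean_imputation(projects)           # all 5 projects are kept
```

In this example, listwise deletion discards three of the five projects. Mean imputation keeps all five, but it fills `dev_type` for projects 1 and 4 with 4/3, which is not a valid category code. This illustrates why model-based techniques such as multinomial logistic regression can be preferable for categorical attributes.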


Author(s):  
Panagiota Chatzipetrou

Software cost estimation (SCE) is a critical phase in software development projects. A common problem in building software cost models is that the available datasets contain projects with lots of missing categorical data. There are several techniques for handling missing data in the context of SCE. The purpose of this article is to present a state-of-the-art statistical and visualization approach for evaluating and comparing the effect of missing data on the accuracy of cost estimation models. Five missing data techniques (multinomial logistic regression, listwise deletion, mean imputation, expectation maximization, and regression imputation) were used and compared with respect to their effect on the prediction accuracy of a least squares regression cost model. The evaluation is based on various expressions of the prediction error, and the comparisons are conducted using statistical tests, resampling techniques, and visualization tools such as regression error characteristic curves.
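
The abstract mentions resampling techniques but does not describe them. One example of such a technique is a paired percentile bootstrap, which compares the errors of two cost models evaluated on the same projects. This is not necessarily the procedure used in the article. The function name `paired_bootstrap_ci` and the error values are hypothetical.

```python
import numpy as np

def paired_bootstrap_ci(errors_a, errors_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference errors_a - errors_b.

    Both arrays must hold the errors of two models on the same projects,
    in the same order. If the interval excludes 0, the models' mean errors
    differ at roughly the alpha level.
    """
    a = np.asarray(errors_a, dtype=float)
    b = np.asarray(errors_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("errors must be paired: same projects, same order")
    diff = a - b
    rng = np.random.default_rng(seed)
    # Resample the projects with replacement, n_boot times
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), (float(lower), float(upper))

# Hypothetical absolute errors of two models (e.g. built after two
# different imputation methods) on the same five projects
abs_err_a = [10, 40, 5, 40, 10]
abs_err_b = [40, 90, 35, 110, 70]
mean_diff, (ci_low, ci_high) = paired_bootstrap_ci(abs_err_a, abs_err_b)
```

Because the resampling is done on projects, each model's errors on a given project stay paired. This pairing is what makes the comparison fair when both models are evaluated on the same data.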


2018 ◽  
Vol 7 (2.32) ◽  
pp. 377
Author(s):  
Dr T. Vijaya Saradhi ◽  
A Lakshmi Pravallika ◽  
M Manoj

Accurately estimating software cost is one of the most important tasks in a software project. However, due to the varying nature and complexity of software, accurate cost estimation has become difficult. Ascertaining the cost of the software at an early stage helps in planning the other activities of software development. Early estimation of the effort needed to develop software has benefited from the application of meta-heuristic optimization algorithms. These algorithms are feasible and can be applied as practical tools for software cost estimation. In recent times, meta-heuristic algorithms with high accuracy have brought great improvements to the field of software engineering. In this paper, we discuss one such algorithm that helps in software cost estimation: Harmony Search.
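
The abstract does not say how Harmony Search was applied. The sketch below is therefore a basic textbook form of the algorithm, used to calibrate a hypothetical effort model, effort = a · size^b, by minimizing MMRE. The parameter names `hms`, `hmcr`, `par`, and `bandwidth` follow common Harmony Search terminology. The model form, bounds, default values, and data are all assumptions of this sketch, not settings from the paper.

```python
import numpy as np

def harmony_search(objective, lower, upper, hms=20, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=5000, seed=0):
    """Minimise `objective` over the box [lower, upper] with basic Harmony Search.

    hms: harmony memory size; hmcr: harmony memory considering rate;
    par: pitch adjusting rate; bandwidth: pitch step as a fraction of each range.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim, span = lower.size, upper - lower
    # Harmony memory: hms random candidate solutions inside the bounds
    memory = rng.uniform(lower, upper, size=(hms, dim))
    scores = np.array([objective(h) for h in memory])
    for _ in range(iterations):
        new = np.empty(dim)
        for j in range(dim):
            if rng.random() < hmcr:
                # Memory consideration: reuse component j of a stored harmony
                new[j] = memory[rng.integers(hms), j]
                if rng.random() < par:
                    # Pitch adjustment: small random move around that value
                    new[j] += rng.uniform(-1.0, 1.0) * bandwidth * span[j]
            else:
                # Random selection from the whole allowed range
                new[j] = rng.uniform(lower[j], upper[j])
        new = np.clip(new, lower, upper)
        score = objective(new)
        worst = int(np.argmax(scores))
        if score < scores[worst]:
            # Replace the worst harmony in memory with the better new one
            memory[worst], scores[worst] = new, score
    best = int(np.argmin(scores))
    return memory[best], float(scores[best])

# Hypothetical calibration data: size (KLOC) and effort (person-months),
# generated from a = 3.0, b = 1.12 so that the optimum is known
size = np.array([10.0, 20.0, 50.0, 100.0, 200.0])
effort = 3.0 * size ** 1.12

def mmre_objective(params):
    """MMRE of the hypothetical model effort = a * size**b."""
    a, b = params
    predicted = a * size ** b
    return float(np.mean(np.abs(effort - predicted) / effort))

best_params, best_mmre = harmony_search(mmre_objective, lower=[0.5, 0.8], upper=[5.0, 1.5])
```

The algorithm builds each new harmony component by component. It either reuses a component from memory, possibly pitch-adjusted, or draws a fresh random value, and it keeps the new harmony only if it beats the current worst one in memory.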


2014 ◽  
Vol 989-994 ◽  
pp. 1497-1500 ◽  
Author(s):  
Hai Yang

Software cost estimation is the key step in software development management. To make the COCOMO model applicable to Chinese enterprises, this paper proposes an improved software cost estimation method based on the COCOMO model and linear regression. A replication experiment was then conducted using historical software project data from the given enterprises, and the forecasting accuracy of the improved method was compared with that of experience-based estimation. The results verified that the improved cost estimation method has greater practical value for software development.
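
The abstract does not say how COCOMO and linear regression were combined. One common approach keeps COCOMO's basic form, effort = A · Size^B, and re-estimates A and B from an enterprise's own projects by ordinary least squares on the log scale: log(effort) = log(A) + B · log(Size). The following is a sketch of that approach only. It omits COCOMO's effort multipliers, and the project data and function names are hypothetical.

```python
import numpy as np

def calibrate_effort_model(size_kloc, effort_pm):
    """Fit effort = A * size**B by ordinary least squares on log-log data."""
    log_size = np.log(np.asarray(size_kloc, dtype=float))
    log_effort = np.log(np.asarray(effort_pm, dtype=float))
    # A degree-1 polyfit returns [slope, intercept]
    B, log_A = np.polyfit(log_size, log_effort, 1)
    return float(np.exp(log_A)), float(B)

def predict_effort(A, B, size_kloc):
    """Effort predicted by the locally calibrated model."""
    return A * np.asarray(size_kloc, dtype=float) ** B

# Hypothetical historical projects from one enterprise
sizes = [5, 12, 30, 45, 80]          # KLOC
efforts = [14, 40, 95, 160, 300]     # person-months
A, B = calibrate_effort_model(sizes, efforts)
new_project_effort = predict_effort(A, B, 25)
```

Fitting on the log scale turns the multiplicative power-law model into a straight line, so ordinary linear regression can estimate both coefficients directly.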


2020 ◽  
pp. 1-8
Author(s):  
Aman Ullah ◽  
Bin Wang ◽  
Jinfang Sheng ◽  
Jun Long ◽  
Muhammad Asim ◽  
...  

Estimating of software cost (ESC) is considered a crucial task in the software management life cycle, as are time and quality. Prior to the development of a software project, precise estimations are required in the form of person-months and time. In the last few decades, various parametric and non-algorithmic (non-parametric) methods for estimating software costs have been developed. Among them, the constructive cost model (COCOMO-II) is a commonly used method for estimating software cost. To further improve the accuracy of this model, researchers and practitioners have applied numerous computational intelligence algorithms to optimize its parameters. However, accuracy remains a major problem to be addressed in this model. In this paper, we propose a biogeography-based optimization (BBO) method to optimize the current coefficients of COCOMO-II for better estimation of software project cost or effort. The experiments are conducted on two standard datasets: NASA-93 and Turkish Industry software projects. The performance of the proposed algorithm, called BBO-COCOMO-II, is evaluated using performance indicators including the Manhattan distance (MD) and the mean magnitude of relative error (MMRE). Simulation results reveal that the proposed algorithm achieved high accuracy and significant error minimization compared to the original COCOMO-II, particle swarm optimization, the genetic algorithm, the flower pollination algorithm, and various other baseline cost estimation models.
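
The coefficients that such an optimizer tunes enter the COCOMO II effort equation, PM = A · Size^E · ∏EM, with E = B + 0.01 · ΣSF. The candidate coefficients are then scored with MMRE and MD. The sketch below implements the equation and both indicators. MD is taken here as the sum of absolute differences between actual and predicted effort, which is an assumption because the abstract does not define it. BBO itself is not shown, and the project data, scale factor ratings, and function names are hypothetical.

```python
import math
import numpy as np

def cocomo2_effort(size_ksloc, A, B, scale_factors, effort_multipliers):
    """COCOMO II effort (person-months): PM = A * Size**E * prod(EM), E = B + 0.01*sum(SF)."""
    E = B + 0.01 * sum(scale_factors)
    return A * size_ksloc ** E * math.prod(effort_multipliers)

def mmre(actual, predicted):
    """Mean magnitude of relative error: mean(|actual - predicted| / actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / actual))

def manhattan_distance(actual, predicted):
    """Sum of absolute differences between actual and predicted effort."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sum(np.abs(actual - predicted)))

# Hypothetical projects scored with candidate coefficients A = 2.94, B = 0.91;
# the scale factor ratings (3.0 each) are illustrative, and nominal effort
# multipliers equal 1.0
sizes = [10.0, 25.0, 60.0]        # KSLOC
actual = [40.0, 110.0, 300.0]     # person-months
predicted = [cocomo2_effort(s, 2.94, 0.91, [3.0] * 5, [1.0] * 17) for s in sizes]
print(mmre(actual, predicted), manhattan_distance(actual, predicted))
```

An optimizer such as BBO would repeatedly propose new values of A and B and keep those that lower MMRE, or MD, on the training projects.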


2014 ◽  
Vol 989-994 ◽  
pp. 1501-1504
Author(s):  
Hai Yang

The accuracy of software cost estimation is essential for software development management. After a systematic introduction and analysis of software cost estimation methods, the paper discusses the necessity of considering the software maintenance stage and of estimating software cost by dividing the software development process into several smaller stages. A staged software cost estimation method based on the COCOMO model is then proposed. The proposed method not only contributes to the cost control of software projects but also effectively avoids the bias caused by relying on a single cost estimation method, so that the accuracy of cost estimation can be improved.

