Methods for Statistical and Visual Comparison of Imputation Methods for Missing Data in Software Cost Estimation

Software Cost Estimation (SCE) is a critical phase in software development projects. However, due to the growing complexity of the software itself, a common problem in building software cost models is that the available datasets contain a large amount of missing categorical data. The purpose of this chapter is to show how a framework of statistical, computational, and visualization techniques can be used to evaluate and compare the effect of missing data techniques on the accuracy of cost estimation models. The authors apply five missing data techniques: Multinomial Logistic Regression, Listwise Deletion, Mean Imputation, Expectation Maximization, and Regression Imputation. The evaluation and the comparisons are conducted using Regression Error Characteristic curves, which provide a visual comparison of different prediction models, and Regression Error Operating Curves, which examine the predictive power of models with respect to under- or over-estimation.
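As a rough illustration of the Regression Error Characteristic (REC) idea mentioned in the abstract, the sketch below plots nothing but computes the points of such a curve: for each error tolerance, the fraction of projects whose absolute prediction error falls within it. The function name `rec_curve`, the tolerance grid, and the effort values are hypothetical and are not taken from the chapter; the chapter may use a different error measure.

```python
def rec_curve(actual, predicted, tolerances):
    """Fraction of predictions with absolute error <= each tolerance."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    n = len(errors)
    return [sum(e <= t for e in errors) / n for t in tolerances]


# Hypothetical effort values (for example, person-hours) for four projects.
actual = [100.0, 250.0, 400.0, 80.0]
model_a = [110.0, 240.0, 380.0, 95.0]   # absolute errors: 10, 10, 20, 15
model_b = [150.0, 200.0, 300.0, 85.0]   # absolute errors: 50, 50, 100, 5

tolerances = [5, 10, 20, 50, 100]
curve_a = rec_curve(actual, model_a, tolerances)
curve_b = rec_curve(actual, model_b, tolerances)

for t, ya, yb in zip(tolerances, curve_a, curve_b):
    print(f"tolerance {t:>3}: model A {ya:.2f}, model B {yb:.2f}")
```

A curve that rises faster toward 1.0 indicates a model whose errors are concentrated at small values, which is how two models can be compared at a glance.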

A Framework of Statistical and Visualization Techniques for Missing Data Analysis in Software Cost Estimation

Handbook of Research on Innovations in Systems and Software Engineering - Advances in Systems Analysis, Software Engineering, and High Performance Computing ◽ 10.4018/978-1-4666-6359-6.ch003 ◽ 2015 ◽ pp. 71-97

Author(s): Lefteris Angelis ◽ Nikolaos Mittas ◽ Panagiota Chatzipetrou

Keyword(s): Missing Data ◽ Cost Estimation ◽ Prediction Models ◽ Software Cost Estimation ◽ Software Cost ◽ Regression Imputation ◽ Mean Imputation ◽ Regression Error ◽ Visualization Techniques ◽ Missing Data Techniques

A Framework of Statistical and Visualization Techniques for Missing Data Analysis in Software Cost Estimation

Computer Systems and Software Engineering ◽ 10.4018/978-1-5225-3923-0.ch017 ◽ 2017 ◽ pp. 433-460

Author(s): Lefteris Angelis ◽ Nikolaos Mittas ◽ Panagiota Chatzipetrou

Keyword(s): Missing Data ◽ Cost Estimation ◽ Prediction Models ◽ Software Cost Estimation ◽ Software Cost ◽ Regression Imputation ◽ Mean Imputation ◽ Regression Error ◽ Visualization Techniques ◽ Missing Data Techniques

Software Cost Estimation

International Journal of Service Science Management Engineering and Technology ◽ 10.4018/ijssmet.2019070102 ◽ 2019 ◽ Vol 10 (3) ◽ pp. 14-31

Author(s): Panagiota Chatzipetrou

Keyword(s): Missing Data ◽ Cost Estimation ◽ Multinomial Logistic Regression ◽ Cost Model ◽ Statistical Tests ◽ Error Characteristic ◽ Cost Models ◽ Least Squares Regression ◽ Software Cost Estimation ◽ Software Cost

Software cost estimation (SCE) is a critical phase in software development projects. A common problem in building software cost models is that the available datasets contain projects with many missing categorical values. There are several techniques for handling missing data in the context of SCE. The purpose of this article is to present a state-of-the-art statistical and visualization approach to evaluating and comparing the effect of missing data on the accuracy of cost estimation models. Five missing data techniques were used (multinomial logistic regression, listwise deletion, mean imputation, expectation maximization, and regression imputation) and compared with respect to their effect on the prediction accuracy of a least squares regression cost model. The evaluation is based on various expressions of the prediction error. The comparisons are conducted using statistical tests, resampling techniques, and visualization tools such as regression error characteristic curves.
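To make two of the five techniques named above concrete, here is a minimal sketch of listwise deletion and mean imputation on a toy project table. The field names (`team_size`, `effort`) and the values are invented for illustration. The article deals with categorical attributes, so treating the incomplete field as numeric here is a simplifying assumption, not the article's procedure.

```python
def listwise_deletion(rows):
    """Keep only the rows that have no missing (None) values."""
    return [r for r in rows if None not in r.values()]


def mean_imputation(rows, column):
    """Replace missing values in `column` with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    mean = sum(observed) / len(observed)
    return [
        {**r, column: mean if r[column] is None else r[column]}
        for r in rows
    ]


# Hypothetical project records; None marks a missing value.
projects = [
    {"team_size": 4, "effort": 120},
    {"team_size": None, "effort": 300},
    {"team_size": 8, "effort": 410},
]

complete_cases = listwise_deletion(projects)
imputed = mean_imputation(projects, "team_size")

print(len(complete_cases))        # rows surviving deletion
print(imputed[1]["team_size"])    # value filled in for the incomplete row
```

Listwise deletion shrinks the dataset, while mean imputation keeps every project but pulls the filled values toward the center of the distribution; this trade-off is what the compared error measures are meant to expose.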

Categorical missing data imputation for software cost estimation by multinomial logistic regression

Journal of Systems and Software ◽ 10.1016/j.jss.2005.02.026 ◽ 2006 ◽ Vol 79 (3) ◽ pp. 404-414 ◽ Cited By ~ 39

Author(s): Panagiotis Sentas ◽ Lefteris Angelis

Keyword(s): Logistic Regression ◽ Missing Data ◽ Cost Estimation ◽ Multinomial Logistic Regression ◽ Data Imputation ◽ Software Cost Estimation ◽ Software Cost ◽ Missing Data Imputation

Cost Estimation of the Models Using Harmony Search

International Journal of Engineering & Technology ◽ 10.14419/ijet.v7i2.32.15718 ◽ 2018 ◽ Vol 7 (2.32) ◽ pp. 377

Author(s): Dr T. Vijaya Saradhi ◽ A Lakshmi Pravallika ◽ M Manoj

Keyword(s): Cost Estimation ◽ Heuristic Algorithms ◽ Harmony Search ◽ High Accuracy ◽ The Other ◽ Software Project ◽ Software Cost Estimation ◽ Software Cost ◽ The One ◽ The Cost

Accurately estimating the cost of a software system is one of the most important tasks in a software project. However, because of the varying nature and complexity of software, accurate cost estimation has become difficult. Ascertaining the cost of the software at an early stage helps in planning the other activities of software development. Early estimation of the effort required to develop software has benefited from the application of meta-heuristic optimization algorithms. These algorithms are promising and can be applied as practical tools for software cost estimation. In recent times, meta-heuristic algorithms with high accuracy have brought great improvement to the field of software engineering. This paper discusses one such algorithm that supports software cost estimation: Harmony Search.

Improved Software Cost Estimation Method Based on COCOMO Model and Linear Regression

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.989-994.1497 ◽ 2014 ◽ Vol 989-994 ◽ pp. 1497-1500 ◽ Cited By ~ 1

Author(s): Hai Yang

Keyword(s): Linear Regression ◽ Software Development ◽ Cost Estimation ◽ Estimation Method ◽ Software Project ◽ Software Cost Estimation ◽ Software Cost ◽ Replication Experiment ◽ COCOMO Model ◽ Project Data

Software cost estimation is a key step in software development management. To make the COCOMO model applicable to Chinese enterprises, this paper proposes an improved software cost estimation method based on the COCOMO model and linear regression. A replication experiment was then carried out using the historical software project data of given enterprises, and the forecasting accuracy of experience-based estimation was compared with that of the proposed method. The results verified that the improved cost estimation method has more practical value for software development.
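The abstract does not describe how COCOMO and linear regression are combined. One common way to calibrate a COCOMO-style model to local data, shown here only as an assumed illustration, is to fit effort = a * size^b by ordinary least squares on the log-transformed values. The function name `fit_power_law` and the project sizes are hypothetical; the synthetic efforts are generated from the Basic COCOMO organic-mode coefficients (a = 2.4, b = 1.05) so the fit has a known answer.

```python
import math


def fit_power_law(sizes, efforts):
    """Fit effort = a * size**b via least squares on log(size), log(effort)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the log-log regression line is the exponent b.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # The intercept is log(a), so exponentiate to recover a.
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical historical projects: size in KLOC, effort in person-months.
sizes = [5.0, 12.0, 30.0, 75.0]
efforts = [2.4 * s ** 1.05 for s in sizes]  # synthetic, noise-free data

a, b = fit_power_law(sizes, efforts)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real enterprise data the points would not lie exactly on a power curve, and the fitted coefficients would replace the published defaults.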

Optimization of software cost estimation model based on biogeography-based optimization algorithm

Intelligent Decision Technologies ◽ 10.3233/idt-200103 ◽ 2020 ◽ pp. 1-8

Author(s): Aman Ullah ◽ Bin Wang ◽ Jinfang Sheng ◽ Jun Long ◽ Muhammad Asim ◽ ...

Keyword(s): Cost Estimation ◽ Cost Model ◽ Flower Pollination Algorithm ◽ Manhattan Distance ◽ Software Project ◽ Estimation Model ◽ Software Cost ◽ Standard Data ◽ Cost Estimation Model ◽ COCOMO II

Estimating of software cost (ESC) is considered a crucial task in the software management life cycle, alongside time and quality. Prior to the development of a software project, precise estimates are required in terms of person-months and time. In the last few decades, various parametric and non-parametric (non-algorithmic) methods for estimating software costs have been developed. Among them, the constructive cost model (COCOMO-II) is a commonly used method for estimating software cost. To further improve the accuracy of this model, researchers and practitioners have applied numerous computational intelligence algorithms to optimize its parameters. However, accuracy remains a significant open problem for this model. In this paper, we propose a biogeography-based optimization (BBO) method to optimize the current coefficients of COCOMO-II for better estimation of software project cost or effort. The experiments are conducted on two standard data sets: NASA-93 and Turkish Industry software projects. The performance of the proposed algorithm, called BBO-COCOMO-II, is evaluated using performance indicators including the Manhattan distance (MD) and the mean magnitude of relative error (MMRE). Simulation results reveal that the proposed algorithm achieves high accuracy and significant error reduction compared to the original COCOMO-II, particle swarm optimization, the genetic algorithm, the flower pollination algorithm, and various other baseline cost estimation models.
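The two performance indicators named in the abstract can be sketched as follows, using their usual definitions: MMRE is the mean of |actual - predicted| / actual over all projects, and the Manhattan distance is the sum of the absolute differences. The effort values are invented for illustration, and the paper may normalize MD differently.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |a - p| / a."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def manhattan_distance(actual, predicted):
    """Sum of absolute differences between actual and predicted effort."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical effort values in person-months.
actual = [100.0, 200.0, 400.0]
predicted = [90.0, 220.0, 400.0]

mmre_value = mmre(actual, predicted)
md_value = manhattan_distance(actual, predicted)

print(f"MMRE = {mmre_value:.4f}")
print(f"MD = {md_value:.1f}")
```

Lower values of both indicators mean predictions closer to the recorded effort, which is the sense in which an optimizer for the COCOMO-II coefficients would be judged.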

Research on Improved Staged Software Cost Estimation Method Based on COCOMO Model

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.989-994.1501 ◽ 2014 ◽ Vol 989-994 ◽ pp. 1501-1504

Author(s): Hai Yang

Keyword(s): Software Development ◽ Cost Estimation ◽ Software Maintenance ◽ Estimation Method ◽ Estimation Methods ◽ Software Project ◽ Software Cost Estimation ◽ Software Cost ◽ COCOMO Model ◽ The Cost

Accurate software cost estimation is essential for software development management. After systematically introducing and analyzing existing software cost estimation methods, the paper discusses the necessity of considering the software maintenance stage and of estimating software cost by dividing the software development process into several smaller stages. A staged software cost estimation method based on the COCOMO model is then proposed. Using the proposed method not only contributes to the cost control of software projects, but also effectively avoids the bias that arises from relying on a single cost estimation method, so that the accuracy of cost estimation can be improved.
