Empirical Evaluation of Mimic Software Project Data Sets for Software Effort Estimation

Software project estimation is a challenging and important activity in software development. It includes software time estimation, software resource estimation, software cost estimation, and software effort estimation. Software effort estimation focuses on predicting the amount of work (effort in person-hours or person-months) required to develop or maintain a software application. Effort is difficult to forecast during the initial stages of software development. Various machine learning and deep learning models have been developed to predict effort. In this paper, single-model approaches and ensemble approaches were considered for estimation. Ensemble techniques combine several single models. The ensemble techniques considered were averaging, weighted averaging, bagging, boosting, and stacking. The stacking models evaluated were stacking using a generalized linear model, a decision tree, a support vector machine, and a random forest. The datasets considered were Albrecht, China, Desharnais, Kemerer, Kitchenham, Maxwell, and Cocomo81. The evaluation measures were mean absolute error, root mean squared error, and R-squared. The results showed that the proposed stacking using random forest provides the best results compared with single-model approaches using machine learning or deep learning algorithms and with the other ensemble techniques.
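As a rough illustration of the evaluation measures and the two simplest ensemble techniques named in this abstract, the sketch below computes mean absolute error, root mean squared error, and R-squared for an averaged and a weighted-averaged prediction. The function names, the effort values, and the weights are hypothetical and are not taken from the paper; bagging, boosting, and stacking are not shown.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def weighted_average(predictions, weights):
    """Combine per-model prediction lists; weights are normalized by their sum."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, row)) / total
            for row in zip(*predictions)]

# Hypothetical effort values in person-hours (illustration only).
actual = [100.0, 200.0, 300.0, 400.0]
model_a = [110.0, 190.0, 320.0, 380.0]
model_b = [90.0, 220.0, 280.0, 410.0]

# Plain averaging is weighted averaging with equal weights.
averaged = weighted_average([model_a, model_b], [1.0, 1.0])
# Assumed weights that trust model_a three times as much as model_b.
weighted = weighted_average([model_a, model_b], [3.0, 1.0])

print(mae(actual, averaged), rmse(actual, averaged), r_squared(actual, averaged))
```

In practice the weights would be chosen from validation performance rather than fixed by hand.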

Using Genetic Programming to Improve Software Effort Estimation Based on General Data Sets

Genetic and Evolutionary Computation - GECCO 2003 - Lecture Notes in Computer Science ◽ 10.1007/3-540-45110-2_151 ◽ 2003 ◽ pp. 2477-2487 ◽ Cited By ~ 23

Author(s): Martin Lefley ◽ Martin J. Shepperd

Keyword(s): Genetic Programming ◽ Data Sets ◽ Effort Estimation ◽ Software Effort Estimation ◽ General Data

Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation

Journal of Systems and Software ◽ 10.1016/j.jss.2008.05.008 ◽ 2008 ◽ Vol 81 (12) ◽ pp. 2361-2370 ◽ Cited By ~ 35

Author(s): Qinbao Song ◽ Martin Shepperd ◽ Xiangru Chen ◽ Jun Liu

Keyword(s): Comparative Evaluation ◽ Software Project ◽ Data Sets ◽ Project Data

A new imputation method for small software project data sets

Journal of Systems and Software ◽ 10.1016/j.jss.2006.05.003 ◽ 2007 ◽ Vol 80 (1) ◽ pp. 51-62 ◽ Cited By ~ 45

Author(s): Qinbao Song ◽ Martin Shepperd

Keyword(s): Imputation Method ◽ Software Project ◽ Data Sets ◽ Project Data

An Accurate FFPA-PSR Estimator Algorithm and Tool for Software Effort Estimation

The Scientific World Journal ◽ 10.1155/2015/919825 ◽ 2015 ◽ Vol 2015 ◽ pp. 1-5

Author(s): Senthil Kumar Murugesan ◽ Chidhambara Rajan Balasubramanian

Keyword(s): Performance Metrics ◽ Development Stage ◽ Effort Estimation ◽ Software Effort Estimation ◽ Reliability Factor ◽ Function Point Analysis ◽ Software Companies ◽ Point Analysis ◽ Function Point ◽ Project Data

Software companies are now keen to provide secure software while ensuring the accuracy and reliability of their products, especially in relation to software effort estimation. Therefore, there is a need for a hybrid tool that provides all the necessary features. This paper proposes a hybrid estimator algorithm and model that incorporates quality metrics, a reliability factor, and a security factor into a fuzzy-based function point analysis. First, the method uses a fuzzy-based estimate, built on a triangular fuzzy set, to control the uncertainty in software size at the early development stage. Second, the function point analysis is extended with the security and reliability factors in the calculation. Finally, performance metrics are added to the effort estimation for accuracy. Experiments are run with different project data sets on the hybrid tool, and the results are compared with existing models. The results show that the proposed method improves not only the accuracy but also the reliability and security of the product.
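To make the idea of a triangular fuzzy set for size uncertainty concrete, here is a minimal sketch of a triangular membership function and a centroid defuzzification step. It is not the FFPA-PSR algorithm: the function names, the size bounds, and the `hours_per_function_point` productivity figure are assumptions for illustration, and the security and reliability factors are left out.

```python
def triangular_membership(x, low, mode, high):
    """Degree (0..1) to which size x belongs to the triangular fuzzy set.

    Assumes low < mode < high.
    """
    if x <= low or x >= high:
        return 0.0
    if x <= mode:
        return (x - low) / (mode - low)
    return (high - x) / (high - mode)

def centroid(low, mode, high):
    """Centroid defuzzification of a triangular fuzzy number."""
    return (low + mode + high) / 3.0

# Hypothetical early-stage size estimate in function points:
# optimistic 200, most likely 300, pessimistic 500.
low, mode, high = 200.0, 300.0, 500.0
crisp_size = centroid(low, mode, high)

# Assumed productivity figure, used only to turn size into effort.
hours_per_function_point = 8.0
effort_hours = crisp_size * hours_per_function_point

print(triangular_membership(250.0, low, mode, high), crisp_size, effort_hours)
```

The centroid is one common way to collapse a fuzzy size into a single number; other defuzzification rules would give a different crisp value.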

Regression Analysis Based Software Effort Estimation Method

International Journal of Software Engineering and Knowledge Engineering ◽ 10.1142/s0218194016500261 ◽ 2016 ◽ Vol 26 (05) ◽ pp. 807-826 ◽ Cited By ~ 4

Author(s): Fatih Yücalar ◽ Deniz Kilinc ◽ Emin Borandag ◽ Akin Ozcift

Keyword(s): Regression Analysis ◽ Linear Regression ◽ Linear Regression Analysis ◽ Estimation Method ◽ Estimation Methods ◽ Estimation Accuracy ◽ Development Effort ◽ Software Project ◽ Effort Estimation ◽ Software Effort Estimation

Estimating the development effort of a software project in the early stages of the software life cycle is a significant task. Accurate estimates help project managers overcome budget and time overruns. This paper proposes a new effort estimation method based on multiple linear regression analysis, which brings a different perspective to software effort estimation methods and increases the success of software effort estimation processes. The proposed method is compared with the standard Use Case Point (UCP) method, which is well known in this area, and with the simple linear regression based effort estimation method developed by Nassif et al. To evaluate and compare the proposed method, data from 10 software projects developed by four well-established software companies in Turkey were collected and datasets were created. When the effort estimates obtained from the datasets are compared with the actual effort spent to complete the projects, the proposed method is observed to have higher estimation accuracy than the other methods.
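For readers unfamiliar with multiple linear regression, the following sketch fits effort against two predictors by ordinary least squares using the normal equations. It is a generic illustration, not the method proposed in the paper: the two features, the project values, and the function names are hypothetical.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, ..., coef_n].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [rv - factor * cv for rv, cv in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

def predict(coeffs, row):
    """Apply fitted coefficients to one project's predictor values."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))

# Hypothetical projects: (size measure, complexity rating) -> effort in hours.
# The efforts are generated exactly as 50 + 20 * size + 5 * complexity.
rows = [(10.0, 2.0), (20.0, 1.0), (30.0, 4.0), (40.0, 3.0), (50.0, 6.0)]
efforts = [260.0, 455.0, 670.0, 865.0, 1080.0]

coeffs = fit_linear_regression(rows, efforts)
print(coeffs, predict(coeffs, (25.0, 3.0)))
```

With real project data the fit would not be exact, and a library routine would normally replace the hand-written solver.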

Research and Application of Software Defect Prediction based on BP-Migration learning

MATEC Web of Conferences ◽ 10.1051/matecconf/201823203017 ◽ 2018 ◽ Vol 232 ◽ pp. 03017

Author(s): Jie Zhang ◽ Gang Wang ◽ Haobo Jiang ◽ Fangzheng Zhao ◽ Guilin Tian

Keyword(s): Prediction Model ◽ Historical Data ◽ Defect Prediction ◽ Software Project ◽ Data Sets ◽ Software Defect Prediction ◽ Software Module ◽ Data Set ◽ Software Defect ◽ Project Data

Software defect prediction has been an important part of software engineering research since the 1970s. The technique analyzes the metrics and defect information of historical software modules in order to predict defects in new software modules. Currently, most software defect prediction models are built on a data set from a single software project: the training data sets used to construct the model and the test data sets used to validate it come from the same project. In practice, however, for new projects or projects with little historical data, traditional prediction methods show lower predictive performance. When the historical data is insufficient, the model cannot be fully trained, and high prediction accuracy is difficult to achieve. In cross-project prediction, the main problem is the difference in data distributions between projects. To address these problems, this paper presents a software defect prediction model that combines migration learning with a traditional software defect prediction model. The model uses existing project data sets to predict software defects across projects. The main work of this article includes: 1) Data preprocessing. This step includes data feature correlation analysis and noise reduction, which limits the effect of over-fitting and noisy data on the prediction results. 2) Migration learning. This step analyzes two different but related project data sets and reduces the impact of data distribution differences. 3) Artificial neural networks. To handle class imbalance in the data set, an artificial neural network with dynamic selection of training samples is used to reduce the influence of the imbalance between positive and negative samples on the prediction results. The model is evaluated on the Relink and AEEEM data sets using the F-measure, the ROC curve, and AUC. Experiments show that the model has high predictive performance.
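The evaluation measures mentioned in this abstract can be sketched in a few lines. The code below computes the F-measure from thresholded predictions and the AUC as the probability that a defective module is scored above a clean one. The labels, scores, 0.5 threshold, and function names are hypothetical and unrelated to the Relink or AEEEM results.

```python
def f_measure(labels, predictions):
    """Harmonic mean of precision and recall for the defective class (label 1)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Hypothetical modules: 1 = defective, 0 = clean, with model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.3]
# Assumed decision threshold of 0.5.
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(f_measure(labels, predictions), auc(labels, scores))
```

The pairwise form of AUC is quadratic in the number of modules, which is fine for small examples; a sort-based computation is the usual choice for larger data sets.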

PERFORMANCE EVALUATION OF IMPUTATION METHODS FOR INCOMPLETE DATASETS

International Journal of Software Engineering and Knowledge Engineering ◽ 10.1142/s0218194007003173 ◽ 2007 ◽ Vol 17 (01) ◽ pp. 127-152 ◽ Cited By ~ 10

Author(s): SUMANTH YENDURI ◽ S. S. IYENGAR

Keyword(s): Performance Evaluation ◽ Stepwise Regression ◽ Prediction Models ◽ Software Project ◽ Data Sets ◽ Imputation Methods ◽ Listwise Deletion ◽ Project Data ◽ Incomplete Datasets ◽ The Impact

In this study, we compare the performance of four different imputation strategies, ranging from the commonly used Listwise Deletion to model-based approaches such as Maximum Likelihood, in enhancing completeness in incomplete software project data sets. We evaluate the impact of each of these methods by applying them to six different real-time software project data sets, which are classified into different categories based on their inherent properties. The reliability of the data sets constructed using these techniques is further tested by building prediction models using stepwise regression. The experimental results are noted and the findings are discussed.
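The contrast between discarding and filling incomplete records can be shown with a small sketch. Listwise deletion is named in the abstract; mean imputation is used here only as a simple stand-in for a fill-in strategy and is not claimed to be one of the four methods the study compares. The data values and function names are hypothetical.

```python
def listwise_deletion(rows):
    """Keep only the records with no missing values (None marks missing)."""
    return [list(r) for r in rows if all(v is not None for v in r)]

def mean_imputation(rows):
    """Replace each missing value with the mean of its column's observed values."""
    column_means = []
    for c in range(len(rows[0])):
        observed = [r[c] for r in rows if r[c] is not None]
        column_means.append(sum(observed) / len(observed))
    return [[column_means[c] if v is None else v for c, v in enumerate(r)]
            for r in rows]

# Hypothetical project records: (team size, duration in months).
projects = [
    [10.0, 2.0],
    [20.0, None],
    [None, 4.0],
    [30.0, 6.0],
]

complete_only = listwise_deletion(projects)
filled = mean_imputation(projects)

print(len(complete_only), filled)
```

Listwise deletion halves this toy data set, which is why it is costly for the small project data sets typical of this field.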

SOFTWARE DEVELOPMENT EFFORT ESTIMATION USING CLASSICAL AND FUZZY ANALOGY: A CROSS-VALIDATION COMPARATIVE STUDY

International Journal of Computational Intelligence and Applications ◽ 10.1142/s1469026814500138 ◽ 2014 ◽ Vol 13 (03) ◽ pp. 1450013 ◽ Cited By ~ 20

Author(s): FATIMA AZZAHRA AMAZAL ◽ ALI IDRI ◽ ALAIN ABRAN

Keyword(s): Software Development ◽ Case Based Reasoning ◽ Development Effort ◽ Software Project ◽ Effort Estimation ◽ Software Effort Estimation ◽ Software Projects ◽ Software Development Effort ◽ Research Questions ◽ Estimation Models

Software effort estimation is one of the most important tasks in software project management. Of several techniques suggested for estimating software development effort, the analogy-based reasoning, or Case-Based Reasoning (CBR), approaches stand out as promising techniques. In this paper, the benefits of using linguistic rather than numerical values in the analogy process for software effort estimation are investigated. The performance, in terms of accuracy and tolerance of imprecision, of two analogy-based software effort estimation models (Classical Analogy and Fuzzy Analogy, which use numerical and linguistic values respectively to describe software projects) is compared. Three research questions related to the performance of these two models are discussed and answered. This study uses the International Software Benchmarking Standards Group (ISBSG) dataset and confirms the usefulness of using linguistic instead of numerical values in analogy-based software effort estimation models.
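A minimal sketch of classical (numerical) analogy-based estimation may help here: a new project is compared with historical projects by distance over min-max scaled features, and the efforts of the closest analogs are averaged. This is a generic illustration under assumed features and values, not the models evaluated in the study, and the fuzzy variant using linguistic values is not shown.

```python
import math

def estimate_by_analogy(history, efforts, target, k=2):
    """Average the efforts of the k historical projects closest to the target.

    Features are min-max scaled using the historical ranges so that no
    single feature dominates the Euclidean distance.
    """
    columns = list(zip(*history))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # avoid division by zero

    def scale(row):
        return [(v - lo) / span for v, lo, span in zip(row, lows, spans)]

    scaled_target = scale(target)
    ranked = sorted(range(len(history)),
                    key=lambda i: math.dist(scale(history[i]), scaled_target))
    nearest = ranked[:k]
    return sum(efforts[i] for i in nearest) / k

# Hypothetical historical projects: (size in function points, team size).
history = [(100.0, 3.0), (200.0, 5.0), (300.0, 4.0), (400.0, 8.0)]
efforts = [1000.0, 2100.0, 2900.0, 4200.0]  # person-hours, illustration only

new_project = (220.0, 5.0)
estimate = estimate_by_analogy(history, efforts, new_project, k=2)

print(estimate)
```

The choice of k, the distance measure, and the adaptation rule (here a plain mean) are all design decisions that analogy-based studies typically vary.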

Filtering of Inconsistent Software Project Data for Analogy-Based Effort Estimation

2010 IEEE 34th Annual Computer Software and Applications Conference ◽ 10.1109/compsac.2010.56 ◽ 2010 ◽ Cited By ~ 7

Author(s): Tuan Khanh Le-Do ◽ Kyung-A Yoon ◽ Yeong-Seok Seo ◽ Doo-Hwan Bae

Keyword(s): Software Project ◽ Effort Estimation ◽ Project Data
