COMPARATIVE ANALYSIS OF SOFTWARE EFFORT ESTIMATION USING DATA MINING TECHNIQUE AND FEATURE SELECTION

Abdul Latif; Lady Agustin Fitriana; Muhammad Rifqi Firdaus

doi:10.33480/jitk.v6i2.1968

JITK (Jurnal Ilmu Pengetahuan dan Teknologi Komputer) ◽

10.33480/jitk.v6i2.1968 ◽

2021 ◽

Vol 6 (2) ◽

pp. 167-174

Author(s):

Abdul Latif ◽

Lady Agustin Fitriana ◽

Muhammad Rifqi Firdaus

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Linear Regression ◽

Software Development ◽

Data Mining Algorithm ◽

Effort Estimation ◽

Software Effort Estimation ◽

Data Mining Technique ◽

Software Business

Software development involves several interrelated factors that influence development effort and productivity. Improving the estimation techniques available to project managers facilitates more effective time and budget control in software development. Software effort estimation (also called software cost/effort estimation) can help a software development company overcome the difficulties it faces in estimating development effort. This study compares the machine learning methods Linear Regression (LR), Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Decision Tree Random Forest (DTRF) for estimating software cost/effort. The approaches are tested on 10 datasets of software development projects, in order to determine which machine learning and non-machine learning methods are most accurate for software effort estimation. The study also compares attribute selection using Particle Swarm Optimization (PSO) against no attribute selection, to determine which gives the more accurate estimates. The most accurate data mining algorithm is Linear Regression, with an average RMSE of 1603.024 over the 10 datasets tested. Applying PSO feature selection reduces the average RMSE to 1552.999. Compared with the original linear regression model, PSO feature selection therefore reduces the error of software effort estimation by 3.12%.
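
The 3.12% figure in the abstract above can be checked from the two reported averages. The sketch below is only an illustration, not the authors' code: the helper names `rmse` and `relative_reduction` are hypothetical, and the two RMSE values are the ones quoted in the abstract.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Average RMSE values quoted in the abstract (10 datasets).
rmse_lr = 1603.024       # Linear Regression without feature selection
rmse_lr_pso = 1552.999   # Linear Regression with PSO feature selection

reduction = relative_reduction(rmse_lr, rmse_lr_pso)
print(f"RMSE reduction: {reduction:.2f}%")  # prints "RMSE reduction: 3.12%"
```

Because RMSE is an error measure, the improvement is a reduction in error rather than an increase in an accuracy score.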

Software Effort Estimation

Organizational Efficiency through Intelligent Information Technologies ◽

10.4018/978-1-4666-2047-6.ch012 ◽

2012 ◽

pp. 186-198

Author(s):

Jeremiah D. Deng ◽

Martin Purvis ◽

Maryam Purvis

Keyword(s):

Machine Learning ◽

Data Mining ◽

Software Development ◽

Domain Knowledge ◽

Modeling Processes ◽

Machine Learning Algorithms ◽

Development Effort ◽

Effort Estimation ◽

Software Effort Estimation ◽

Software Development Effort

Software development effort estimation is important for quality management in the software development industry, yet its automation still remains a challenging issue. Applying machine learning algorithms alone often cannot achieve satisfactory results. This paper presents an integrated data mining framework that incorporates domain knowledge into a series of data analysis and modeling processes, including visualization, feature selection, and model validation. An empirical study on the software effort estimation problem using a benchmark dataset shows the necessity and effectiveness of the proposed approach.

Software Development Effort Duration and Cost Estimation using Linear Regression and K-Nearest Neighbors Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2306.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 1043-1047

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Software Development ◽

Cost Estimation ◽

Performance Metrics ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Development Effort ◽

Effort Estimation ◽

K Nearest Neighbors

Effort estimation is a crucial step that leads to duration estimation and cost estimation in software development. Estimates made in the initial stage of a project are based on requirements, and they may lead to the success or failure of the project: accurate estimates lead to success, and inaccurate estimates lead to failure. No single method can reliably produce accurate estimates. In this work, we propose using the machine learning techniques linear regression and K-nearest neighbors to predict software effort on the COCOMO81, COCOMONasa, and COCOMONasa2 datasets, and we compare the results obtained from the two methods. In each dataset, 80% of the data is used for training and the remainder is used as the test set. The correlation coefficient, mean squared error (MSE), and mean magnitude of relative error (MMRE) are used as performance metrics. The experimental results show that these models forecast software effort accurately.
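
The evaluation protocol described above (an 80/20 split scored with MSE, MMRE, and the correlation coefficient) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the fixed shuffle seed, and the toy effort values are all hypothetical, and the standard definitions of the three metrics are assumed.

```python
import math
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of `rows` and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pearson(xs, ys):
    """Pearson correlation coefficient between two sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy effort values (hypothetical, in person-months).
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
print(mse(actual, predicted), mmre(actual, predicted), pearson(actual, predicted))

train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # 8 2
```

MMRE divides by the actual effort, so it assumes every actual value is non-zero.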

Software Effort Estimation

International Journal of Intelligent Information Technologies ◽

10.4018/jiit.2011070104 ◽

2011 ◽

Vol 7 (3) ◽

pp. 41-53 ◽

Cited By ~ 4

Author(s):

Jeremiah D. Deng ◽

Martin Purvis ◽

Maryam Purvis

Keyword(s):

Machine Learning ◽

Software Development ◽

Domain Knowledge ◽

Modeling Processes ◽

Machine Learning Algorithms ◽

Development Effort ◽

Effort Estimation ◽

Software Effort Estimation ◽

Software Development Effort ◽

Software Development Effort Estimation

A Review Article on Software Effort Estimation in Agile Methodology

Pertanika Journal of Science and Technology ◽

10.47836/pjst.29.2.08 ◽

2021 ◽

Vol 29 (2) ◽

Author(s):

Pantjawati Sudarmaningtyas ◽

Rozlina Mohamed

Keyword(s):

Machine Learning ◽

Software Development ◽

Hybrid Approach ◽

Estimation Method ◽

Expert Judgement ◽

Effort Estimation ◽

Software Effort Estimation ◽

Agile Methodology ◽

Implementation Approach ◽

Agile Software

Currently, the Agile software development method is commonly used in software development projects, and its success rate is higher than that of waterfall projects. Effort estimation in Agile is still a challenge because most existing techniques were developed for the conventional method. Therefore, this study aimed to ascertain the software effort estimation methods applied in Agile, their implementation approach, and the attributes that affect effort estimation. The results showed that the top three estimation methods applied in Agile are machine learning (37%), expert judgement (26%), and algorithmic (21%). All machine learning methods were implemented using a hybrid approach, which is a combination of machine learning and expert judgement, or a mix of two or more machine learning methods. Meanwhile, effort estimation through a hybrid approach was used in only 47% of relevant articles. In addition, effort estimation in Agile involved twenty-four attributes, of which Complexity, Experience, Size, and Time are the most commonly used.

Optimizing SVR using Local Best PSO for Software Effort Estimation

Journal of Information Technology and Computer Science ◽

10.25126/jitecs.2016117 ◽

2016 ◽

Vol 1 (1) ◽

pp. 28 ◽

Cited By ~ 1

Author(s):

Dinda Novitasari ◽

Imam Cholissodin ◽

Wayan Firdaus Mahmudy

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Software Industry ◽

Optimal Parameter ◽

Effort Estimation ◽

Software Effort Estimation ◽

Learning Methods ◽

Machine Learning Methods ◽

Proposed Model

The software industry must meet tremendous demand. Accurate effort estimation is therefore needed, because estimates that rely on the personal analysis of experts tend to be less objective. Support vector regression (SVR) is a machine learning method that can be used for this purpose. Applying it raises two problems: selecting features and finding optimal parameter values. This paper proposes local best PSO-SVR to solve both problems. The experimental results show that the proposed model outperforms PSO-SVR and T-SVR in accuracy. Keywords: Optimization, SVR, Optimal Parameter, Feature Selection, Local Best PSO, Software Effort Estimation
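
The abstract does not give the details of the proposed model, so the following is only a generic sketch of local best PSO, in which each particle follows the best position found in its ring neighbourhood rather than a single swarm-wide best. Everything here is an assumption for illustration: the function name `lbest_pso`, the coefficient values, and the toy `sphere` objective. In a PSO-SVR setting the objective would instead be the validation error of an SVR model whose features and parameters are encoded in the particle.

```python
import random


def lbest_pso(objective, dim, bounds, n_particles=12, iterations=200,
              w=0.72, c1=1.49, c2=1.49, seed=0):
    """Minimise `objective` with a ring-topology (local best) PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]      # personal best values

    for _ in range(iterations):
        for i in range(n_particles):
            # Ring neighbourhood: the particle and its two adjacent neighbours.
            ring = [(i - 1) % n_particles, i, (i + 1) % n_particles]
            local = min(ring, key=lambda j: pbest_val[j])
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (pbest[local][d] - pos[i][d]))
                # Move and clamp the position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            value = objective(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value

    best = min(range(n_particles), key=lambda j: pbest_val[j])
    return pbest[best], pbest_val[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = lbest_pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_pos, best_val)
```

The ring neighbourhood slows the spread of information through the swarm compared with a global best, which is the usual motivation for local best variants when premature convergence is a concern.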

GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation

Information and Software Technology ◽

10.1016/j.infsof.2010.05.009 ◽

2010 ◽

Vol 52 (11) ◽

pp. 1155-1166 ◽

Cited By ~ 90

Author(s):

Adriano L.I. Oliveira ◽

Petronio L. Braga ◽

Ricardo M.F. Lima ◽

Márcio L. Cornélio

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Parameters Optimization ◽

Effort Estimation ◽

Software Effort Estimation

Estimating Software Development Efforts Using a Random Forest-Based Stacked Ensemble Approach

Electronics ◽

10.3390/electronics10101195 ◽

2021 ◽

Vol 10 (10) ◽

pp. 1195

Author(s):

Priya Varshini A G ◽

Anitha Kumari K ◽

Vijayakumar Varadarajan

Keyword(s):

Deep Learning ◽

Random Forest ◽

Software Development ◽

Weighted Averaging ◽

Software Project ◽

Effort Estimation ◽

Software Effort Estimation ◽

Single Model ◽

Ensemble Techniques ◽

Project Estimation

Software Project Estimation is a challenging and important activity in developing software projects. Software Project Estimation includes Software Time Estimation, Software Resource Estimation, Software Cost Estimation, and Software Effort Estimation. Software Effort Estimation focuses on predicting the number of hours of work (effort in terms of person-hours or person-months) required to develop or maintain a software application. It is difficult to forecast effort during the initial stages of software development. Various machine learning and deep learning models have been developed to predict the effort estimation. In this paper, single model approaches and ensemble approaches were considered for estimation. Ensemble techniques are the combination of several single models. Ensemble techniques considered for estimation were averaging, weighted averaging, bagging, boosting, and stacking. Various stacking models considered and evaluated were stacking using a generalized linear model, stacking using decision tree, stacking using a support vector machine, and stacking using random forest. Datasets considered for estimation were Albrecht, China, Desharnais, Kemerer, Kitchenham, Maxwell, and Cocomo81. Evaluation measures used were mean absolute error, root mean squared error, and R-squared. The results proved that the proposed stacking using random forest provides the best results compared with single model approaches using the machine or deep learning algorithms and other ensemble techniques.
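
Two of the simpler ensemble techniques named above, averaging and weighted averaging, and the three evaluation measures can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: the function names, the model weights, and the toy predictions are hypothetical, and stacking (the paper's proposed approach) is not shown.

```python
import math


def weighted_average(predictions, weights):
    """Combine per-model prediction lists using the given model weights."""
    total = sum(weights)
    n = len(predictions[0])
    return [sum(w * p[i] for w, p in zip(weights, predictions)) / total
            for i in range(n)]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy effort values and two hypothetical single-model predictions.
actual = [100.0, 200.0, 300.0]
model_a = [110.0, 190.0, 310.0]
model_b = [90.0, 220.0, 280.0]

simple = weighted_average([model_a, model_b], [1.0, 1.0])    # plain averaging
weighted = weighted_average([model_a, model_b], [3.0, 1.0])  # favour model_a

for name, pred in [("average", simple), ("weighted", weighted)]:
    print(name, mae(actual, pred), rmse(actual, pred), r_squared(actual, pred))
```

Plain averaging is the special case of weighted averaging with equal weights, which is why a single helper covers both.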

Toward a Progress Indicator for Machine Learning Model Building and Data Mining Algorithm Execution

ACM SIGKDD Explorations Newsletter ◽

10.1145/3166054.3166057 ◽

2017 ◽

Vol 19 (2) ◽

pp. 13-24 ◽

Cited By ~ 3

Author(s):

Gang Luo

Keyword(s):

Machine Learning ◽

Data Mining ◽

Model Building ◽

Learning Model ◽

Data Mining Algorithm ◽

Mining Algorithm ◽

Machine Learning Model

Predicting Software Effort Estimation Using Machine Learning Techniques

2018 8th International Conference on Computer Science and Information Technology (CSIT) ◽

10.1109/csit.2018.8486222 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ahmed BaniMustafa

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Effort Estimation ◽

Software Effort Estimation ◽

Learning Techniques

Dimension Reduction for Objects Composed of Vector Sets

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2017-0012 ◽

2017 ◽

Vol 27 (1) ◽

pp. 169-180 ◽

Cited By ~ 1

Author(s):

Marton Szemenyei ◽

Ferenc Vajda

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Discriminant Analysis ◽

Probability Distribution ◽

Dimension Reduction ◽

Pose Estimation ◽

Real World ◽

Single Object ◽

Real World Datasets

Dimension reduction and feature selection are fundamental tools for machine learning and data mining. Most existing methods, however, assume that objects are represented by a single vectorial descriptor. In reality, some description methods assign unordered sets or graphs of vectors to a single object, where each vector is assumed to have the same number of dimensions, but is drawn from a different probability distribution. Moreover, some applications (such as pose estimation) may require the recognition of individual vectors (nodes) of an object. In such cases it is essential that the nodes within a single object remain distinguishable after dimension reduction. In this paper we propose new discriminant analysis methods that are able to satisfy two criteria at the same time: separating between classes and between the nodes of an object instance. We analyze and evaluate our methods on several different synthetic and real-world datasets.
