Investigation of Software Defect Prediction Using Data Mining Framework

Class imbalance is a common problem in real-world datasets, where one class (the minority class) contains a small number of data points and the other (the majority class) contains a large number. It is very difficult to develop an effective model with data mining and machine learning algorithms without preprocessing the data to balance an imbalanced dataset. Random undersampling and oversampling have been used in many studies to ensure that the different classes contain the same number of data points. In this study, we propose a combination of two-step clustering-based random undersampling and a bagging technique to improve the accuracy of software defect prediction (SDP). The proposed method is evaluated on five datasets from the NASA Metrics Data Program repository, with area under the curve (AUC) as the primary evaluation measure. The results show that the proposed method performs very well on all datasets (AUC > 0.9). In terms of SN, the second experiment outperformed the first on most datasets (3 of 5). Meanwhile, in terms of SP, the first experiment did not outperform the second experiment on all datasets. Overall, the second experiment outperformed the first, because the primary evaluation measure for imbalanced-class classification such as SDP is AUC. It can therefore be concluded that the proposed method yields optimal performance on both small-scale and large-scale datasets.
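As a rough illustration of the ideas in the abstract above, the following sketch shows plain random undersampling and the three evaluation measures. It assumes SN and SP denote sensitivity and specificity, which the abstract does not spell out. It is not the authors' method: the two-step clustering and bagging steps are omitted, and all function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        # sample() draws without replacement, so no row is duplicated
        for row in rng.sample(members, target):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return balanced


def sensitivity_specificity(y_true, y_pred):
    """Return (SN, SP) for binary labels where 1 marks a defective module."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration: 2 defective and 4 clean modules.
toy_labels = [1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print(random_undersample(list(range(6)), toy_labels))
print(sensitivity_specificity(toy_labels, toy_preds))
print(auc(toy_labels, toy_scores))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a sketch but slow on large datasets. Unlike SN and SP, it does not depend on a single decision threshold.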

Improved Random Forest Algorithm for Software Defect Prediction through Data Mining Techniques

International Journal of Computer Applications ◽

10.5120/20693-3582 ◽

2015 ◽

Vol 117 (23) ◽

pp. 18-22 ◽

Cited By ~ 11

Author(s):

Kalai Magal.R ◽

Shomona Gracia Jacob

Keyword(s):

Data Mining ◽

Random Forest ◽

Defect Prediction ◽

Software Defect Prediction ◽

Random Forest Algorithm ◽

Data Mining Techniques ◽

Software Defect

Software-defect prediction within and across projects based on improved self-organizing data mining

The Journal of Supercomputing ◽

10.1007/s11227-021-04113-8 ◽

2021 ◽

Author(s):

Qing Zhang ◽

Junhua Ren

Keyword(s):

Data Mining ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Self Organizing

Data Mining Techniques in Software Defect Prediction

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i3/0173 ◽

2017 ◽

Vol 7 (3) ◽

pp. 301-303

Author(s):

A. R. Pon Periasamy ◽

A. Mishbahulhuda

Keyword(s):

Data Mining ◽

Defect Prediction ◽

Software Defect Prediction ◽

Data Mining Techniques ◽

Software Defect

Research on software defect prediction based on data mining

2010 The 2nd International Conference on Computer and Automation Engineering (ICCAE) ◽

10.1109/iccae.2010.5451355 ◽

2010 ◽

Cited By ~ 1

Author(s):

Yuan Chen ◽

Xiang-heng Shen ◽

Peng Du ◽

Bing Ge

Keyword(s):

Data Mining ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect

A STUDY ON SOFTWARE DEFECT PREDICTION SYSTEM USING DATA MINING TECHNIQUES

INFORMATION TECHNOLOGY IN INDUSTRY ◽

10.17762/itii.v9i2.435 ◽

2021 ◽

Vol 9 (2) ◽

pp. 950-955

Author(s):

J. Mary Catherine et al.

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Prediction Models ◽

Model Development ◽

Defect Prediction ◽

Software Defect Prediction ◽

K Nearest Neighbor ◽

Software Cost ◽

Record Keeping ◽

Software Defect

Defects in software modules are a source of significant concern. Software reliability and software quality assurance ensure the high quality of applications. A software defect triggers a malfunction in an executable product. A number of methods for forecasting faults have been suggested, but none has proven sufficiently accurate. In designing software error prediction models, the aim is to use metrics that can be obtained comparatively early in the software production life cycle to provide fair initial quality estimates of an evolving software framework. Various data mining classification and forecasting techniques exist. Here, Artificial Neural Network (ANN) and K-Nearest Neighbor (KNN) are analyzed and compared for developing software defect prediction models. The DATATRIEVE™ project developed by Digital Engineering, Italy was used to validate the algorithm. The findings showed that the model built with the NN classification methodology was an exceptional statistical model. The main challenges in the secure software development process are quality and reliability. Major software cost overruns occur when a software product with errors in its various components is used on the customer's side. The software warehouse is commonly used as a record-keeping repository, which is often needed when adding new features or fixing bugs. Software errors can lead to erroneous and differing results. As a result, software programs run late, are canceled, or become unreliable in use. Various social and technical issues are associated with software failure, and software defects are the main reason for deteriorating product quality. In software engineering, defect prediction is the most active research area. This study discusses bug-fix time forecast models, pre-release and post-release errors, and different measurements for predicting failures. Predicted results help developers identify and fix potential vulnerabilities, thereby improving software stability and reliability.
