Software Defect Fault Intelligent Location and Identification Method Based on Data Mining

2022 ◽  
Vol 2146 (1) ◽  
pp. 012012
Author(s):  
Fang Wang

Abstract: As computer technology advances, users demand more of software, and software functions grow increasingly complex. Because developers face technical limitations and teams do not always coordinate well, errors made during development inevitably lead to software defects. This paper studies methods for the intelligent location and identification of software defects based on data mining. It first surveys domestic and international work on intelligent software defect and fault location and analyzes the shortcomings of traditional software defect and fault detection, then introduces data mining technology in detail, and finally examines software defect prediction technology in depth. This research is intended to reduce accidents involving software equipment and to extend its service life. In the experiments, the proposed software defect location approach is evaluated by comparing two methods. The first error set is used as the unit for measuring the error location cost of subsequent error sets. The first error set, 1F, contains 19 programs with manually injected errors, and the average location cost obtained is 3.75%.
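The abstract does not define how location cost is computed. A minimal sketch, assuming the common definition used in fault localization (the percentage of ranked statements a developer inspects before reaching the faulty one), might look like the following. The function names and the example rankings are hypothetical and are not taken from the paper.

```python
def location_cost(ranked_statements, faulty_statement):
    """Percentage of statements inspected up to and including the faulty one.

    Assumes `ranked_statements` is ordered from most to least suspicious.
    """
    position = ranked_statements.index(faulty_statement) + 1
    return 100.0 * position / len(ranked_statements)


def average_location_cost(cases):
    """Mean location cost over (ranking, faulty_statement) pairs."""
    costs = [location_cost(ranking, fault) for ranking, fault in cases]
    return sum(costs) / len(costs)


# Hypothetical example: two faulty program versions with 20 statements each.
ranking_a = [f"s{i}" for i in range(1, 21)]  # fault ranked first
ranking_b = [f"s{i}" for i in range(1, 21)]  # fault ranked third
cases = [(ranking_a, "s1"), (ranking_b, "s3")]
print(average_location_cost(cases))
```

Under this definition a lower percentage means less code has to be examined, so a lower average cost indicates better localization.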

2014 ◽  
Vol 701-702 ◽  
pp. 67-70
Author(s):  
Wan Jiang Han ◽  
He Yang Jiang ◽  
Yi Sun ◽  
Tian Bo Lu

Effective detection of software defects is an important activity in the software development process. In this paper, we propose an approach to predicting residual defects for the BOSS project that applies a defect distribution model. Experimental results show that this approach can effectively improve the accuracy of defect prediction.


Author(s):  
J. L. ÁLVAREZ-MACÍAS ◽  
J. MATA-VÁZQUEZ ◽  
J. C. RIQUELME-SANTOS

In this paper we present a new method for applying data mining tools to the management phase of the software development process. Specifically, we describe two tools, the first based on supervised learning and the second on unsupervised learning. The goal of this method is to induce a set of management rules that make the development process easier for managers. Depending on how and to what the method is applied, it permits an a priori analysis, monitoring of the project, or a post-mortem analysis.


Author(s):  
Meenakshi Kathayat

Continuous integration is a software development practice in which members of a team frequently integrate their work. Generally each person integrates at least daily, leading to multiple integrations per day. Each integration is verified by an automated build (including tests) to detect integration errors as quickly as possible. Many teams find that this approach reduces integration problems and allows a team to develop cohesive software rapidly. Continuous integration doesn't remove bugs, but it does make them dramatically easier to find and remove. This paper provides an overview of various issues regarding continuous integration and of how data mining techniques can be applied to continuous integration data to extract useful knowledge and solve continuous integration problems.


2019 ◽  
Vol 6 (1) ◽  
pp. 107-113
Author(s):  
Muhammad Faittullah Akbar ◽  
Ilham Kurniawan ◽  
Ahmad Fauzi

Class imbalance is often a problem in real-world datasets, where one class (the minority class) contains a small number of data points and the other (the majority class) contains a large number of data points. It is very difficult to develop an effective model using data mining and machine learning algorithms without considering data preprocessing to balance the imbalanced dataset. Random undersampling and oversampling have been used in many studies to ensure that different classes contain the same number of data points. In this study, we propose a combination of two-step clustering-based random undersampling and a bagging technique to improve the accuracy of software defect prediction. The proposed method is evaluated using five datasets from the NASA Metrics Data Program repository, with area under the curve (AUC) as the main evaluation measure. The results show that the proposed method yields very good performance on all datasets (AUC > 0.9). In terms of SN, the second experiment outperformed the first on almost all datasets (3 of 5 datasets). Meanwhile, in terms of SP, the first experiment did not outperform the second across the datasets. Overall, the second experiment outperformed the first, because the main evaluation measure in imbalanced-class classification such as SDP is AUC. Therefore, it can be concluded that the proposed method yields optimal performance for both small-scale and large-scale datasets.
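As a rough illustration of two ingredients named in this abstract, the sketch below shows plain random undersampling of the majority class and a rank-based AUC computation. It is not the authors' two-step clustering-based variant and it omits the bagging step. All function names and the toy data are hypothetical.

```python
import random


def random_undersample(samples, labels, seed=0):
    """Randomly drop points from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        kept = xs if len(xs) == n_min else rng.sample(xs, n_min)
        balanced.extend((x, y) for x in kept)
    rng.shuffle(balanced)
    return balanced


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 8 non-defective modules (label 0) and 2 defective ones (label 1).
balanced = random_undersample(list(range(10)), [0] * 8 + [1] * 2)
print(len(balanced))
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

In a bagging setup, one plausible design would draw a fresh balanced sample like this for each base classifier and average their scores, but the abstract does not specify how the two steps are combined.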


Author(s):  
HONGHUA DAI ◽  
WEI DAI ◽  
GANG LI

An effective and efficient mechanism for storing, managing, and utilizing software resources is essential to the automation of software engineering. The paper presents an innovative approach to managing software resources using a software warehouse, where software assets are systematically accumulated, deposited, retrieved, packaged, managed, and utilized, driven by data mining and OLAP technologies. The results lead to a streamlined, highly efficient software development process and enhance productivity in response to the modern challenges of designing and developing software applications.


Author(s):  
Joko Suntoro ◽  
Febrian Wahyu Christanto ◽  
Henny Indriyawati

The most important part of software engineering is software defect prediction. Software defect prediction is defined as the process of predicting errors, failures, and system errors in software. Researchers use machine learning methods to predict software defects, including estimation, association, classification, clustering, and dataset analysis. The NASA Metrics Data Program (NASA MDP) datasets are among the software metrics datasets that researchers use to predict software defects. NASA MDP datasets contain unbalanced classes and high-dimensional data, which lower the classification evaluation results. In this research, unbalanced classes are handled with the AdaCost method and high-dimensional data with the Average Weight Information Gain (AWEIG) method, while the Naïve Bayes algorithm is used for classification. The proposed method is named AWEIG + AdaCost Bayesian. In the experiment, the AWEIG + AdaCost Bayesian algorithm is compared with the Naïve Bayes algorithm. The results show that AWEIG + AdaCost Bayesian yields a better mean Area Under the Curve (AUC) than Naïve Bayes alone, with mean AUC values of 0.752 and 0.696, respectively.
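The abstract does not spell out how AWEIG weights features. A minimal sketch of the standard information gain calculation it builds on, together with one plausible selection rule (keep features whose gain exceeds the average gain, which is an assumption here and not confirmed by the source), could look like this. The feature names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_above_average(features, labels):
    """Keep features whose gain exceeds the mean gain (assumed rule, not the paper's)."""
    gains = {name: information_gain(vals, labels) for name, vals in features.items()}
    mean_gain = sum(gains.values()) / len(gains)
    return [name for name, gain in gains.items() if gain > mean_gain]


# Hypothetical discretized metrics for four modules; label 1 means defective.
labels = [1, 1, 0, 0]
features = {
    "complexity": ["high", "high", "low", "low"],  # separates the classes
    "comments": ["few", "many", "few", "many"],    # carries no information
}
print(select_above_average(features, labels))
```

Reducing the feature set this way addresses the high-dimensionality problem the abstract mentions. The cost-sensitive AdaCost step for class imbalance is a separate component and is not shown.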

