Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code

Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, while automatically locating the potential buggy source files according to a bug report remains a great challenge in software maintenance. Many previous approaches represent bug reports and source code using lexical and structural information and correlate their relevance by measuring their similarity. Recently, a CNN-based model was proposed to learn unified features for bug localization, which overcomes the difficulty of modeling natural and programming languages with different structural semantics. However, previous studies fail to capture the sequential nature of source code, which carries additional semantics beyond the lexical and structural terms; such information is vital in modeling program functionalities and behaviors. In this paper, we propose a novel model, LS-CNN, which enhances the unified features by exploiting the sequential nature of source code. LS-CNN combines CNN and LSTM to extract semantic features for automatically identifying potential buggy source code according to a bug report. Experimental results on widely-used software projects indicate that LS-CNN significantly outperforms the state-of-the-art methods in locating buggy files.

Control Flow Graph Embedding Based on Multi-Instance Decomposition for Bug Localization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5844 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4223-4230

Author(s):

Xuan Huo ◽

Ming Li ◽

Zhi-Hua Zhou

Keyword(s):

Programming Languages ◽

Software Maintenance ◽

Source Code ◽

Control Flow ◽

Control Flow Graph ◽

Bug Localization ◽

Flow Graph ◽

Bug Reports ◽

Bug Report ◽

Sequential Nature

During software maintenance, bug reports are an effective way to identify potential bugs hidden in a software system. It is a great challenge to automatically locate the potential buggy source code according to a bug report. Traditional approaches usually represent bug reports and source code from a lexical perspective to measure their similarities. Recently, some deep learning models have been proposed to learn unified features by exploiting the local and sequential nature of source code, which overcomes the difficulty of modeling the difference between natural and programming languages. However, considering local and sequential information from only one dimension is not enough to represent the semantics: multi-dimensional information, such as the structural and functional nature of the code, carries additional semantics that have not been well captured. Such information beyond the lexical and structural terms is vital in modeling program functionalities and behaviors, and it leads to a better representation for identifying buggy source code. In this paper, we propose a novel model named CG-CNN, a multi-instance learning framework that enhances the unified features for bug localization by exploiting the structural and sequential nature of the control flow graph. Experimental results on widely-used software projects demonstrate the effectiveness of our proposed CG-CNN model.

Adaptive Ranking Relevant Source Files for Bug Reports Using Genetic Algorithm

10.3233/faia210042 ◽

2021 ◽

Author(s):

Thi Mai Anh Bui ◽

Nhat Hai Nguyen

Keyword(s):

Genetic Algorithm ◽

Software Maintenance ◽

Large Scale ◽

Maintenance Phase ◽

Bug Localization ◽

Software Projects ◽

Source File ◽

Bug Reports ◽

Localization Model ◽

Bug Report

Precisely locating buggy files for a given bug report is a cumbersome and time-consuming task, particularly in a large-scale project with thousands of source files and bug reports. An efficient bug localization module is desirable to improve the productivity of the software maintenance phase. Many previous approaches rank source files according to their relevance to a given bug report based on simple lexical matching scores. However, lexical mismatches between the natural language expressions used in bug reports and the technical terms of software source code might reduce the bug localization system’s accuracy. Incorporating domain knowledge through features such as semantic similarity, the fixing frequency of a source file, the code change history and similar bug reports is crucial to efficiently locating buggy files. In this paper, we propose a bug localization model, BugLocGA, that leverages both lexical and semantic information and explores the relation between a bug report and a source file through some domain features. Given a bug report, we calculate a ranking score for every source file as a weighted sum of all features, where the weights are trained by a genetic algorithm with the aim of maximizing the performance of the bug localization model under two evaluation metrics: mean reciprocal rank (MRR) and mean average precision (MAP). Empirical results on some widely-used open source software projects show that our model outperforms some state-of-the-art approaches by effectively recommending relevant files where the bug should be fixed.
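The scoring and evaluation steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors, file names, and the function names `rank_files`, `reciprocal_rank`, `average_precision` and `evaluate` are hypothetical, and the genetic algorithm that searches for the weights is left out. The sketch only shows how a candidate weight vector could be turned into a ranking and scored with MRR and MAP, which is the kind of fitness signal a weight search would need.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical layout: file path -> one score per feature, for a single bug report.
Features = Dict[str, Sequence[float]]


def rank_files(features: Features, weights: Sequence[float]) -> List[str]:
    """Rank files by the weighted sum of their feature scores, best first."""
    scores = {
        path: sum(w * f for w, f in zip(weights, feats))
        for path, feats in features.items()
    }
    # Ties are broken by path so the ranking is deterministic.
    return sorted(scores, key=lambda path: (-scores[path], path))


def reciprocal_rank(ranking: Sequence[str], buggy: Set[str]) -> float:
    """1 / position of the first truly buggy file, or 0 if none is ranked."""
    for position, path in enumerate(ranking, start=1):
        if path in buggy:
            return 1.0 / position
    return 0.0


def average_precision(ranking: Sequence[str], buggy: Set[str]) -> float:
    """Mean of precision@k taken at each position k that holds a buggy file."""
    hits = 0
    total = 0.0
    for position, path in enumerate(ranking, start=1):
        if path in buggy:
            hits += 1
            total += hits / position
    return total / len(buggy) if buggy else 0.0


def evaluate(
    reports: Sequence[Tuple[Features, Set[str]]], weights: Sequence[float]
) -> Tuple[float, float]:
    """Return (MRR, MAP) of one weight vector over a set of bug reports."""
    if not reports:
        return 0.0, 0.0
    ranked = [(rank_files(feats, weights), buggy) for feats, buggy in reports]
    mrr = sum(reciprocal_rank(r, b) for r, b in ranked) / len(ranked)
    mean_ap = sum(average_precision(r, b) for r, b in ranked) / len(ranked)
    return mrr, mean_ap
```

A weight search would call `evaluate` once per candidate weight vector. How the two metrics are combined into a single fitness value is not stated in the abstract, so the sketch returns them separately.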

Guiding Bug Triage through Developer Analysis in Bug Reports

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194016500170 ◽

2016 ◽

Vol 26 (03) ◽

pp. 405-431 ◽

Cited By ~ 1

Author(s):

Tao Zhang ◽

Geunseok Yang ◽

Byungjeong Lee ◽

Alvin T. S. Chan

Keyword(s):

Empirical Study ◽

Software Maintenance ◽

Large Scale ◽

Software Projects ◽

Bug Reports ◽

Bug Fixing ◽

Bug Report ◽

Report Analysis ◽

Mozilla Firefox ◽

Triage Algorithm

An important part of software maintenance is bug report analysis during bug-fixing, especially for large-scale software projects. Since bugs reported to the bug repository need to be fixed, triagers are responsible for identifying appropriate developers to carry out the fix. Previous research focused on optimizing this process, for example through duplicate detection and developer recommendation, to reduce the workload of triagers. However, few studies have analyzed developer roles (e.g. reporter and assignee) in the bug-fixing process. Therefore, in this paper, we perform an in-depth empirical study of the different roles that developers perform in bug resolution. By extracting the factors that affect bug resolution from the analysis results, we propose a novel bug triage algorithm to recommend the appropriate developers to fix a given bug. We implement the proposed recommendations on the Eclipse and Mozilla Firefox projects, with the results showing that the new bug triage algorithm can effectively recommend which experts should fix given bugs.

Utilizing Topic-Based Similar Commit Information and CNN-LSTM Algorithm for Bug Localization

Symmetry ◽

10.3390/sym13030406 ◽

2021 ◽

Vol 13 (3) ◽

pp. 406

Author(s):

Geunseok Yang ◽

Byungjeong Lee

Keyword(s):

Software Maintenance ◽

Short Term Memory ◽

Source Code ◽

Performance Comparison ◽

Bug Localization ◽

Source Codes ◽

Software Bugs ◽

Bug Reports ◽

Bug Report ◽

Model Training

With the use of increasingly complex software, software bugs are inevitable. Software developers rely on bug reports to identify and fix these issues. In this process, developers inspect suspected buggy source code files, relying heavily on a bug report. This process is often time-consuming and increases the cost of software maintenance. To resolve this problem, we propose a novel bug localization method using topic-based similar commit information. First, the method determines similar topics for a given bug report. Then, it extracts similar bug reports and similar commit information for these topics. To extract similar bug reports on a topic, a similarity measure is calculated for a given bug report. In the process, for a given bug report and source code, features shared by similar source code are classified and extracted; combining these features improves the method’s performance. The extracted features are fed to a combined convolutional neural network and long short-term memory (CNN-LSTM) algorithm for model training. Finally, when a bug report is submitted to the model, a suspected buggy source code file is detected and recommended. To evaluate the performance of our method, a baseline performance comparison was conducted using code from open-source projects. Our method exhibits good performance.

Augmenting Bug Localization with Part-of-Speech and Invocation

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194017500346 ◽

2017 ◽

Vol 27 (06) ◽

pp. 925-949 ◽

Cited By ~ 5

Author(s):

Yu Zhou ◽

Yanxiang Tong ◽

Taolue Chen ◽

Jin Han

Keyword(s):

Software Maintenance ◽

Large Scale ◽

Bug Localization ◽

Bug Reports ◽

Part Of Speech ◽

Adaptive Technique ◽

Bug Report ◽

Software Maintenance And Evolution ◽

Speech Features ◽

Localization Approach

Bug localization represents one of the most expensive, as well as time-consuming, activities during software maintenance and evolution. To alleviate the workload of developers, numerous methods have been proposed to automate this process and narrow down the scope of reviewing buggy files. In this paper, we present a novel buggy source-file localization approach, using the information from both the bug reports and the source files. We leverage the part-of-speech features of bug reports and the invocation relationship among source files. We also integrate an adaptive technique to further optimize the performance of the approach. The adaptive technique discriminates between Top 1 and Top N recommendations for a given bug report and consists of two modules. One module maximizes the accuracy of the first recommended file, and the other aims at improving the accuracy of the fixed defect file list. We evaluate our approach on six large-scale open source projects, i.e. AspectJ, Eclipse, SWT, ZXing, Birt and Tomcat. Compared to the previous work, empirical results show that our approach can improve the overall prediction performance in all of these cases. Particularly, in terms of the Top 1 recommendation accuracy, our approach achieves an enhancement from 22.73% to 39.86% for AspectJ, from 24.36% to 30.76% for Eclipse, from 31.63% to 46.94% for SWT, from 40% to 55% for ZXing, from 7.97% to 21.99% for Birt, and from 33.37% to 38.90% for Tomcat.
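The Top 1 and Top N figures quoted above are accuracy measures over ranked recommendation lists. The abstract does not define them, so the following sketch assumes the definition commonly used in bug localization studies: a bug report counts as a hit at N if at least one of its truly buggy files appears among the first N recommended files. The function name `top_n_accuracy` and the sample data are hypothetical.

```python
from typing import Sequence, Set


def top_n_accuracy(
    rankings: Sequence[Sequence[str]], buggy_sets: Sequence[Set[str]], n: int
) -> float:
    """Fraction of bug reports with at least one buggy file in the top n.

    rankings[i] is the recommended file list for report i (best first) and
    buggy_sets[i] is the set of files actually changed to fix that report.
    """
    if not rankings:
        return 0.0
    hits = sum(
        1
        for ranking, buggy in zip(rankings, buggy_sets)
        if any(path in buggy for path in ranking[:n])
    )
    return hits / len(rankings)


# Three hypothetical reports whose fix touched the same file.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
buggy_sets = [{"a"}, {"a"}, {"a"}]
top1 = top_n_accuracy(rankings, buggy_sets, 1)
top2 = top_n_accuracy(rankings, buggy_sets, 2)
```

Under this definition Top N accuracy can never decrease as N grows, which is why approaches are often compared at Top 1 separately from larger cut-offs.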

Predicting the Severity of Bug Reports Based on Feature Selection

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194018500158 ◽

2018 ◽

Vol 28 (04) ◽

pp. 537-558 ◽

Cited By ~ 4

Author(s):

Wenjie Liu ◽

Shanshan Wang ◽

Xin Chen ◽

He Jiang

Keyword(s):

Feature Selection ◽

Software Maintenance ◽

Feature Selection Method ◽

Selection Methods ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Bug Reports ◽

Single Feature ◽

Bug Report ◽

Severity Prediction

In the software maintenance process, predicting the severity of bug reports is an important activity. However, manually identifying the severity of bug reports is a tedious and time-consuming task, so developing automatic methods for predicting the severity of bug reports has become an urgent demand. In general, a bug report contains a large amount of descriptive natural language text, resulting in a high-dimensional feature set that poses serious challenges to traditional automatic methods. Therefore, we attempt to use automatic feature selection methods to improve the performance of the severity prediction of bug reports. In this paper, we introduce a ranking-based strategy to improve existing feature selection algorithms and propose an ensemble feature selection algorithm by combining existing ones. In order to verify the performance of our method, we run experiments over the bug reports of Eclipse and Mozilla and conduct comparisons with eight commonly used feature selection methods. The experimental results show that the ranking-based strategy can effectively improve the performance of the severity prediction of bug reports by up to 54.76% on average in terms of F-measure, and it can also significantly reduce the dimension of the feature set. Meanwhile, the ensemble feature selection method obtains better results than a single feature selection algorithm.

Surprise Bug Report Prediction Utilizing Optimized Integration with Imbalanced Learning Strategy

Complexity ◽

10.1155/2020/8509821 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14

Author(s):

Hui Li ◽

Yang Qu ◽

Shikai Guo ◽

Guofeng Gao ◽

Rong Chen ◽

...

Keyword(s):

Learning Strategy ◽

Work Force ◽

Ensemble Method ◽

Software Systems ◽

Imbalanced Learning ◽

Software Projects ◽

Bug Reports ◽

Bug Report ◽

Bug Repositories ◽

Better Than

In software projects, a large number of bugs are usually reported to bug repositories. Due to the limited budget and work force, developers often do not have enough time and capacity to inspect all the reported bugs, and thus they often focus on inspecting and repairing the high-impact bugs. Among the high-impact bugs, surprise bugs are reported to be a fatal threat to software systems, though they account for only a small proportion. Therefore, the identification of surprise bugs is an important task in practice. In recent years, some methods have been proposed by researchers to identify surprise bugs. Unfortunately, the performance of these methods in identifying surprise bugs is still not satisfactory for software projects. The main reason is that surprise bugs make up only a small percentage of all the bugs, and it is difficult to identify them from the imbalanced distribution. In order to overcome the imbalanced category distribution of the bugs, a method based on machine learning to predict surprise bugs is presented in this paper. This method takes into account the textual features of the bug reports and employs an imbalanced learning strategy to balance the datasets of the bug reports. The balanced datasets are then used to train three selected classifiers, built with three different classification algorithms, which predict the datasets of unknown type. In particular, an ensemble method named optimization integration is proposed to generate a unique and best result from the results produced by the three classifiers. This ensemble method is able to adjust the ability of the classifier to detect different categories based on the characteristics of different projects and to integrate the advantages of the three classifiers. The experiments performed on the datasets from 4 software projects show that this method performs better than the previous methods in terms of detecting surprise bugs.

A Weighted PageRank-Based Bug Report Summarization Method Using Bug Report Relationships

Applied Sciences ◽

10.3390/app9245427 ◽

2019 ◽

Vol 9 (24) ◽

pp. 5427 ◽

Cited By ~ 1

Author(s):

Beomjun Kim ◽

Sungwon Kang ◽

Seonah Lee

Keyword(s):

Software Maintenance ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Pagerank Algorithm ◽

Bug Reports ◽

Summarization Method ◽

Bug Report

For software maintenance, bug reports provide useful information to developers because they can be used for various tasks such as debugging and understanding previous changes. However, as they are typically written in the form of conversations among developers, bug reports tend to be unnecessarily long and verbose, with the consequence that developers often have difficulties reading or understanding bug reports. To mitigate this problem, methods that automatically generate a summary of bug reports have been proposed, and various related studies have been conducted. However, existing bug report summarization methods have not fully exploited the inherent characteristics of bug reports. In this paper, we propose a bug report summarization method that uses the weighted-PageRank algorithm and exploits the ‘duplicates’, ‘blocks’, and ‘depends-on’ relationships between bug reports. The experimental results show that our method outperforms the state-of-the-art method in terms of both the quality of the summary and the number of applicable bug reports.
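For readers unfamiliar with weighted PageRank, the following is a minimal, generic sketch of the algorithm by power iteration. It is not the summarization method of the paper: what the nodes represent, how edge weights are derived from the ‘duplicates’, ‘blocks’ and ‘depends-on’ relationships, and how scores are turned into a summary are not described in the abstract. The function name `weighted_pagerank`, the damping factor of 0.85 and the sample graph are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source, target, positive weight)


def weighted_pagerank(
    edges: List[Edge], damping: float = 0.85, iterations: int = 100
) -> Dict[str, float]:
    """Score nodes of a weighted directed graph; the scores sum to 1."""
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    if not nodes:
        return {}
    edges = [e for e in edges if e[2] > 0]  # ignore non-positive weights
    out_weight = {n: 0.0 for n in nodes}
    for src, _, w in edges:
        out_weight[src] += w

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        # Each node passes its rank along its edges in proportion to weight.
        for src, dst, w in edges:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical graph: "b" is pointed to by both "a" and "c".
scores = weighted_pagerank([("a", "b", 1.0), ("c", "b", 3.0), ("b", "a", 1.0)])
```

The only difference from unweighted PageRank is the factor `w / out_weight[src]`: a node splits its rank among its successors in proportion to edge weight instead of evenly.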

Improving Detection Performance of Duplicate Bug Reports Using Extended Class Centroid Information

Scientific Journal of Tra Vinh University ◽

10.35382/18594816.1.26.2017.107 ◽

2019 ◽

Vol 1 (26) ◽

pp. 71-79

Author(s):

Phuc Minh Nhan

Keyword(s):

Software Maintenance ◽

Software Projects ◽

Centroid Method ◽

Detection Scheme ◽

Bug Reports ◽

Software Packages ◽

Extended Class ◽

Detection Schemes ◽

Class Centroid ◽

Duplicate Bug Reports

In software maintenance, bug reports play an important role in the correctness of software packages. Unfortunately, the duplicate bug report problem arises because there are too many duplicate bug reports in various software projects. Handling duplicate bug reports is thus time-consuming and raises the cost of software maintenance. Therefore, this research introduces a detection scheme based on the extended class centroid information (ECCI) to enhance the detection performance. This method extends the previous one, which used only the centroid method without considering the effects of both inner-class and inter-class information. In addition, this method replaces the normalized cosine previously used to measure the similarity between two bug reports with a denormalized cosine. The effectiveness of ECCI is demonstrated through an empirical study with three open-source projects: SVN, Argo UML and Apache. The experimental results show that ECCI outperforms other detection schemes by about 10% in all cases.
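As background for the centroid method that ECCI extends, here is a minimal sketch of a plain class-centroid baseline with normalized cosine similarity. It does not implement ECCI or its denormalized cosine, neither of which is specified in the abstract. The whitespace tokenizer, raw term counts, the function names and the sample reports are all simplifying assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(u: Mapping[str, float], v: Mapping[str, float]) -> float:
    """Normalized cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def centroid(vectors: List[Counter]) -> Dict[str, float]:
    """Average term vector of one class (a group of duplicate reports)."""
    total: Counter = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def nearest_class(report: str, classes: Mapping[str, List[str]]) -> str:
    """Label of the class whose centroid is most similar to the report."""
    query = vectorize(report)
    centroids = {
        label: centroid([vectorize(r) for r in reports])
        for label, reports in classes.items()
    }
    # Sorting the labels first makes tie-breaking deterministic.
    return max(sorted(centroids), key=lambda label: cosine(query, centroids[label]))


# Hypothetical groups of already known duplicate reports.
classes = {
    "crash": ["app crashes on startup", "crash when app starts"],
    "login": ["cannot login with password", "login fails wrong password"],
}
label = nearest_class("app crash on startup", classes)
```

A new report is compared against one centroid per class rather than against every stored report, which keeps the number of similarity computations proportional to the number of classes.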

The Influence Ranking for Testers in Bug Tracking Systems

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194019500050 ◽

2019 ◽

Vol 29 (01) ◽

pp. 93-113 ◽

Cited By ~ 41

Author(s):

Hui Li ◽

Guofeng Gao ◽

Rong Chen ◽

Xin Ge ◽

Shikai Guo ◽

...

Keyword(s):

Tracking System ◽

Tracking Systems ◽

Software Projects ◽

Actual Performance ◽

Bug Reports ◽

Existence Time ◽

Bug Report ◽

Bug Tracking ◽

Node Removal ◽

Bug Fixes

At present, bug tracking systems are used to collect and manage bug reports in many software projects. As participants, testers not only submit bug reports to the system but also comment on bug reports in the system. A tester’s submitting and commenting behaviors reflect his/her influence in bug tracking systems. However, with the rapid increase of bug reports in software projects, accurately evaluating the testers’ influence in the projects becomes more and more difficult. To solve this problem, the submission of and comments on bug reports are regarded as social behaviors of the testers, and the method of Influence Ranking for Testers (IRfT) in bug tracking systems is presented in this paper and used for measuring the influence of the testers. The case study of the Eclipse project in Bugzilla shows that the result produced by IRfT is consistent with the actual performance of the testers in this project. The ranking results remain stable when links are added or removed and when testers are removed from tester networks, and the results are also shown to be valid in the future. Further investigation of the speed of network break-down by node removal demonstrates that the top-ranking testers are important in the organization of tester networks. Additionally, the results show that the ranking of the testers is related to their existence time in the bug tracking system. Therefore, IRfT is shown to be an effective measurement for evaluating the influence of the testers in the bug tracking system, and it can further demonstrate the testers’ contributions in software testing, such as bug validations, bug fixes, etc.
