Adaptive Ranking Relevant Source Files for Bug Reports Using Genetic Algorithm

Mapping Intimacies

10.3233/faia210042

2021

Author(s): Thi Mai Anh Bui, Nhat Hai Nguyen

Keyword(s): Genetic Algorithm, Software Maintenance, Large Scale, Maintenance Phase, Bug Localization, Software Projects, Source File, Bug Reports, Localization Model, Bug Report

Precisely locating buggy files for a given bug report is a cumbersome and time-consuming task, particularly in a large-scale project with thousands of source files and bug reports. An efficient bug localization module is therefore desirable to improve productivity in the software maintenance phase. Many previous approaches rank source files according to their relevance to a given bug report based on simple lexical matching scores. However, lexical mismatches between the natural-language expressions used to describe bugs and the technical terms in source code can reduce the accuracy of a bug localization system. Incorporating domain knowledge through features such as semantic similarity, the fixing frequency of a source file, the code change history, and similar bug reports is crucial for locating buggy files efficiently. In this paper, we propose a bug localization model, BugLocGA, that leverages both lexical and semantic information and explores the relation between a bug report and a source file through several domain features. Given a bug report, we calculate a ranking score for every source file as a weighted sum of all features. The weights are trained by a genetic algorithm to maximize the performance of the bug localization model on two evaluation metrics: mean reciprocal rank (MRR) and mean average precision (MAP). Empirical results on several widely used open-source software projects show that our model outperforms several state-of-the-art approaches by effectively recommending the relevant files where the bug should be fixed.
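To make the ranking score and training objective concrete, the Python sketch below does three things. It scores each file as a weighted sum of its feature values, computes MRR and MAP from the resulting rankings, and tunes the weights with a toy genetic algorithm. It illustrates the general idea under stated assumptions and is not BugLocGA itself. The input layout (`features[report][file]` as a list of feature scores), the fitness function (MRR + MAP), and the GA operators (truncation selection, uniform crossover, clipped Gaussian mutation) are all hypothetical choices.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical layout: features[report][file] = per-feature scores
# (e.g. lexical similarity, semantic similarity, fixing frequency).
Features = Dict[str, Dict[str, List[float]]]
Truth = Dict[str, Set[str]]  # report -> files actually fixed for it


def rank_files(file_feats: Dict[str, List[float]], weights: Sequence[float]) -> List[str]:
    """Sort files by the weighted sum of their feature scores, best first."""
    def score(f: str) -> float:
        return sum(w * x for w, x in zip(weights, file_feats[f]))
    return sorted(file_feats, key=score, reverse=True)


def reciprocal_rank(ranking: List[str], buggy: Set[str]) -> float:
    """1 / position of the first buggy file, or 0 if none is ranked."""
    for pos, f in enumerate(ranking, start=1):
        if f in buggy:
            return 1.0 / pos
    return 0.0


def average_precision(ranking: List[str], buggy: Set[str]) -> float:
    """Mean of precision@k taken at each position k that holds a buggy file."""
    hits, total = 0, 0.0
    for pos, f in enumerate(ranking, start=1):
        if f in buggy:
            hits += 1
            total += hits / pos
    return total / len(buggy) if buggy else 0.0


def evaluate(weights: Sequence[float], features: Features, truth: Truth) -> Tuple[float, float]:
    """Return (MRR, MAP) of the weighted-sum ranking over all bug reports."""
    rankings = {r: rank_files(files, weights) for r, files in features.items()}
    mrr = sum(reciprocal_rank(rankings[r], truth[r]) for r in rankings) / len(rankings)
    mean_ap = sum(average_precision(rankings[r], truth[r]) for r in rankings) / len(rankings)
    return mrr, mean_ap


def genetic_search(features: Features, truth: Truth, n_feats: int, pop_size: int = 20,
                   generations: int = 30, mutation_rate: float = 0.2, seed: int = 0) -> List[float]:
    """Toy GA over non-negative weight vectors; fitness = MRR + MAP (an assumption)."""
    rng = random.Random(seed)

    def fitness(w: List[float]) -> float:
        return sum(evaluate(w, features, truth))

    pop = [[rng.random() for _ in range(n_feats)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
            child = [max(0.0, g + rng.gauss(0.0, 0.1)) if rng.random() < mutation_rate else g
                     for g in child]  # Gaussian mutation, clipped at 0
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    toy = {"bug1": {"A.java": [0.9, 0.1], "B.java": [0.2, 0.8]},
           "bug2": {"A.java": [0.3, 0.2], "B.java": [0.1, 0.9]}}
    gold = {"bug1": {"B.java"}, "bug2": {"B.java"}}
    print(evaluate([1.0, 0.0], toy, gold))  # (0.5, 0.5): feature 0 alone ranks B.java second
    best = genetic_search(toy, gold, n_feats=2)
    print(best, evaluate(best, toy, gold))
```

Because parents survive each generation unchanged, the best fitness found never decreases. Any real system would need a proper train/test split, since tuning weights on the evaluation reports overfits.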

Augmenting Bug Localization with Part-of-Speech and Invocation

International Journal of Software Engineering and Knowledge Engineering

10.1142/s0218194017500346

2017

Vol 27 (06)

pp. 925-949

Cited By ~ 5

Author(s): Yu Zhou, Yanxiang Tong, Taolue Chen, Jin Han

Keyword(s): Software Maintenance, Large Scale, Bug Localization, Bug Reports, Part Of Speech, Adaptive Technique, Bug Report, Software Maintenance And Evolution, Speech Features, Localization Approach

Bug localization is one of the most expensive and time-consuming activities in software maintenance and evolution. To reduce developers' workload, numerous methods have been proposed to automate this process and narrow down the set of files to review. In this paper, we present a novel approach to localizing buggy source files that uses information from both bug reports and source files. We leverage the part-of-speech features of bug reports and the invocation relationships among source files. We also integrate an adaptive technique to further optimize the performance of the approach. The adaptive technique distinguishes between Top 1 and Top N recommendations for a given bug report and consists of two modules. One module maximizes the accuracy of the first recommended file, and the other improves the accuracy of the recommended list of files that fix the defect. We evaluate our approach on six large-scale open-source projects: AspectJ, Eclipse, SWT, ZXing, Birt, and Tomcat. Compared with previous work, the empirical results show that our approach improves overall prediction performance in all of these cases. In particular, our approach raises Top 1 recommendation accuracy from 22.73% to 39.86% for AspectJ, from 24.36% to 30.76% for Eclipse, from 31.63% to 46.94% for SWT, from 40% to 55% for ZXing, from 7.97% to 21.99% for Birt, and from 33.37% to 38.90% for Tomcat.
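Top 1 and Top N figures like those above are usually computed as the fraction of bug reports for which at least one actually fixed file appears among the first N recommendations. The Python sketch below assumes that common definition, since the abstract does not spell it out. The file names in the example are hypothetical.

```python
from typing import Dict, List, Set


def top_n_accuracy(rankings: Dict[str, List[str]], truth: Dict[str, Set[str]], n: int) -> float:
    """Share of bug reports whose top-n recommended files include at least one fixed file."""
    if not rankings:
        return 0.0
    hits = sum(1 for report, ranked in rankings.items()
               if truth.get(report, set()) & set(ranked[:n]))
    return hits / len(rankings)


if __name__ == "__main__":
    # Hypothetical recommendations for two bug reports.
    rankings = {"bug1": ["X.java", "Y.java", "Z.java"],
                "bug2": ["Y.java", "Z.java", "X.java"]}
    truth = {"bug1": {"Y.java"}, "bug2": {"Y.java"}}
    print(top_n_accuracy(rankings, truth, 1))  # 0.5
    print(top_n_accuracy(rankings, truth, 2))  # 1.0
```

Top 1 accuracy is the n = 1 case and only rewards a correct first recommendation. Larger n also credits hits further down the list, which is why separate modules can target the two cases.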

Guiding Bug Triage through Developer Analysis in Bug Reports

International Journal of Software Engineering and Knowledge Engineering

10.1142/s0218194016500170

2016

Vol 26 (03)

pp. 405-431

Cited By ~ 1

Author(s): Tao Zhang, Geunseok Yang, Byungjeong Lee, Alvin T. S. Chan

Keyword(s): Empirical Study, Software Maintenance, Large Scale, Software Projects, Bug Reports, Bug Fixing, Bug Report, Report Analysis, Mozilla Firefox, Triage Algorithm

Bug report analysis during bug fixing is an important part of software maintenance, especially for large-scale software projects. Because bugs reported to the bug repository need to be fixed, triagers are responsible for identifying appropriate developers to fix them. Previous research has focused on optimizing this process, for example through duplicate detection and developer recommendation, to reduce triagers' workload. However, few studies have analyzed the roles developers play (e.g., reporter and assignee) in the bug-fixing process. In this paper, we therefore perform an in-depth empirical study of the different roles that developers play in bug resolution. By extracting the factors that affect bug resolution from the analysis results, we propose a novel bug triage algorithm that recommends appropriate developers to fix a given bug. We apply the proposed algorithm to the Eclipse and Mozilla Firefox projects, and the results show that it can effectively recommend which experts should fix given bugs.

Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2017/265

2017

Cited By ~ 15

Author(s): Xuan Huo, Ming Li

Keyword(s): Programming Languages, Software Maintenance, Structural Information, Source Code, Semantic Features, Software Projects, Bug Reports, Bug Report, Structural Semantics, Sequential Nature

Bug reports give end users an effective way to disclose potential bugs hidden in a software system, but automatically locating the potentially buggy source files for a bug report remains a great challenge in software maintenance. Many previous approaches represent bug reports and source code using lexical and structural information and correlate them by measuring their similarity. More recently, a CNN-based model was proposed to learn unified features for bug localization, overcoming the difficulty of modeling natural and programming languages, which have different structural semantics. However, previous studies fail to capture the sequential nature of source code. This sequential information carries semantics beyond lexical and structural terms and is vital for modeling program functionality and behavior. In this paper, we propose a novel model, LS-CNN, which enhances the unified features by exploiting the sequential nature of source code. LS-CNN combines a CNN and an LSTM to extract semantic features for automatically identifying potentially buggy source code for a bug report. Experimental results on widely used software projects indicate that LS-CNN significantly outperforms state-of-the-art methods in locating buggy files.

Control Flow Graph Embedding Based on Multi-Instance Decomposition for Bug Localization

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v34i04.5844

2020

Vol 34 (04)

pp. 4223-4230

Author(s): Xuan Huo, Ming Li, Zhi-Hua Zhou

Keyword(s): Programming Languages, Software Maintenance, Source Code, Control Flow, Control Flow Graph, Bug Localization, Flow Graph, Bug Reports, Bug Report, Sequential Nature

During software maintenance, bug reports are an effective way to identify potential bugs hidden in a software system, but automatically locating the potentially buggy source code for a bug report is a great challenge. Traditional approaches usually represent bug reports and source code from a lexical perspective to measure their similarity. Recently, several deep learning models have been proposed to learn unified features by exploiting the local and sequential nature of source code, which overcomes the difficulty of modeling the differences between natural and programming languages. However, considering only local and sequential information along one dimension is not enough to represent program semantics. Multi-dimensional information, such as the structural and functional nature of code, carries additional semantics that has not been well captured. Such information goes beyond lexical and structural terms, is vital for modeling program functionality and behavior, and leads to better representations for identifying buggy source code. In this paper, we propose a novel model named CG-CNN. CG-CNN is a multi-instance learning framework that enhances the unified features for bug localization by exploiting the structural and sequential nature of the control flow graph. Experimental results on widely used software projects demonstrate the effectiveness of CG-CNN.

CooBa: Cross-project Bug Localization via Adversarial Transfer Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2020/493

2020

Author(s): Ziye Zhu, Yun Li, Hanghang Tong, Yu Wang

Keyword(s): Transfer Learning, Private Information, Large Scale, Public Information, Supervised Machine Learning, Bug Localization, Software Projects, Real World Data, Bug Reports, Cross Project

Bug localization plays an important role in software quality control. Many supervised machine learning models have been developed based on historical bug-fix information. Despite their success, these methods often require sufficient historical data (i.e., labels), which is not always available, especially for newly developed software projects. In response, cross-project bug localization techniques have recently emerged. Their key idea is to transfer knowledge from a label-rich source project to locate bugs in the target project. However, a major limitation of these existing techniques is that they fail to capture the specificity of each individual project and are thus prone to negative transfer. To address this issue, we propose an adversarial transfer learning approach to bug localization that transfers only the characteristics common across projects (i.e., public information). Specifically, our approach, CooBa, learns indicative public information from cross-project bug reports through a shared encoder and extracts private information from code files with an individual feature extractor for each project. CooBa further incorporates an adversarial learning mechanism to ensure that the public information shared between multiple projects is effectively extracted. Extensive experiments on four large-scale real-world data sets demonstrate that CooBa significantly outperforms state-of-the-art techniques.

Utilizing Topic-Based Similar Commit Information and CNN-LSTM Algorithm for Bug Localization

Symmetry

10.3390/sym13030406

2021

Vol 13 (3)

pp. 406

Author(s): Geunseok Yang, Byungjeong Lee

Keyword(s): Software Maintenance, Short Term Memory, Source Code, Performance Comparison, Bug Localization, Source Codes, Software Bugs, Bug Reports, Bug Report, Model Training

As software becomes increasingly complex, software bugs are inevitable. Software developers rely on bug reports to identify and fix these issues, inspecting suspected buggy source code files with heavy reliance on the bug report. This process is often time-consuming and increases the cost of software maintenance. To address this problem, we propose a novel bug localization method that uses topic-based similar commit information. First, the method determines similar topics for a given bug report. Then, it extracts similar bug reports and similar commit information for these topics. To extract the similar bug reports on a topic, a similarity measure is calculated with respect to the given bug report. In the process, for a given bug report and source code, the method classifies and extracts features shared by similar source code, and combining these features improves its performance. The extracted features are fed to a CNN-LSTM (convolutional neural network with long short-term memory) model for training. Finally, when a bug report is submitted to the model, suspected buggy source code files are detected and recommended. To evaluate the method, we compared its performance with baselines on code from open-source projects, and our method exhibits good performance.

Predicting the Severity of Bug Reports Based on Feature Selection

International Journal of Software Engineering and Knowledge Engineering

10.1142/s0218194018500158

2018

Vol 28 (04)

pp. 537-558

Cited By ~ 4

Author(s): Wenjie Liu, Shanshan Wang, Xin Chen, He Jiang

Keyword(s): Feature Selection, Software Maintenance, Feature Selection Method, Selection Methods, Selection Algorithm, Feature Selection Algorithm, Bug Reports, Single Feature, Bug Report, Severity Prediction

Predicting the severity of bug reports is an important activity in the software maintenance process. However, manually identifying the severity of bug reports is tedious and time-consuming, so automatic methods for predicting it are urgently needed. A bug report typically contains a large amount of descriptive natural-language text. This results in a high-dimensional feature set, which poses serious challenges to traditional automatic methods. We therefore use automatic feature selection methods to improve the performance of bug report severity prediction. In this paper, we introduce a ranking-based strategy to improve existing feature selection algorithms and propose an ensemble feature selection algorithm that combines existing ones. To verify the performance of our method, we run experiments on the bug reports of Eclipse and Mozilla and compare it with eight commonly used feature selection methods. The experimental results show that the ranking-based strategy can effectively improve the performance of bug report severity prediction, by up to 54.76% on average in terms of F-measure, and can also significantly reduce the dimensionality of the feature set. Meanwhile, the ensemble feature selection method achieves better results than individual feature selection algorithms.
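One simple, generic way to combine several feature selectors is rank aggregation. Each selector ranks the terms, and the ensemble keeps the terms with the best mean rank. The Python sketch below shows that idea only. It is not the authors' ranking-based strategy or ensemble algorithm, whose details the abstract does not give. The two selector score tables in the example are made-up toy values.

```python
from typing import Dict, List


def to_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Turn one selector's scores into ranks (1 = most informative), breaking ties by name."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: pos for pos, feature in enumerate(ordered, start=1)}


def ensemble_select(selector_scores: List[Dict[str, float]], k: int) -> List[str]:
    """Keep the k features with the lowest mean rank across all selectors."""
    rank_maps = [to_ranks(s) for s in selector_scores]
    features = set().union(*rank_maps)
    # A feature that a selector did not score gets that selector's worst rank + 1.
    mean_rank = {f: sum(r.get(f, len(r) + 1) for r in rank_maps) / len(rank_maps)
                 for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


if __name__ == "__main__":
    # Toy scores from two hypothetical selectors (e.g. chi-square and information gain).
    chi2 = {"crash": 9.0, "typo": 1.0, "null": 5.0}
    info_gain = {"crash": 0.5, "typo": 0.1, "null": 0.7}
    print(ensemble_select([chi2, info_gain], k=2))  # ['crash', 'null']
```

Working with ranks rather than raw scores sidesteps the fact that different selectors produce scores on incompatible scales.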

A Commit Messages-Based Bug Localization for Android Applications

International Journal of Software Engineering and Knowledge Engineering

10.1142/s0218194019500207

2019

Vol 29 (04)

pp. 457-487

Cited By ~ 1

Author(s): Tao Zhang, Wenjun Hu, Xiapu Luo, Xiaobo Ma

Keyword(s): Open Source, Software Maintenance, State Of The Art, Bug Localization, Two Phase, Android Apps, Bug Reports, Android Applications, The Given, General Method

The number of Android applications (apps) has grown steadily in recent years, making software maintenance for Android apps an essential task. The core of software maintenance is locating bugs in source files. Previous bug localization approaches mainly focus on open-source desktop software (e.g., Eclipse, Mozilla, GCC). Although a few studies locate bugs in Android apps, they are dedicated to a single app, ZXing. None of them develops a general method that takes into account the unique characteristics of Android apps' bug reports. These characteristics include fewer historical bug reports and insufficiently detailed descriptions, among others. They prevent existing localization approaches from being applied directly to Android apps, because the lack of information degrades the performance of approaches that rely on historical bug reports. Commit messages contain more informative data that can provide details about reported bugs. Therefore, in this paper, we propose a novel information retrieval-based approach that uses commit messages to locate new bugs in Android apps. This approach considers the structured textual similarity between the given bug and the candidate source files. It also computes the unstructured textual similarity between the new bug and the commit messages linked to the corresponding source files. Experimental results on 10 popular open-source Android apps hosted on GitHub show that our approach outperforms state-of-the-art bug localization methods, including BugLocator, BLUiR, and the two-phase model.
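The core scoring idea can be sketched in Python as follows. Each candidate file is scored by its similarity to the bug report and by the similarity of the bug report to commit messages linked to that file. This is a simplified illustration under several assumptions. It uses plain whitespace tokenization and bag-of-words TF-IDF cosine similarity for both parts, rather than the structured similarity the abstract mentions. It keeps only the best-matching commit message per file and mixes the two scores with a hypothetical weight `alpha`. The file names and texts in the example are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization (a real system would also split identifiers and stem)."""
    return text.lower().split()


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors with idf = log(N / df) + 1, one dict per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    return [{t: c * (math.log(n / df[t]) + 1.0) for t, c in Counter(toks).items()}
            for toks in tokenized]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_files(bug: str, files: Dict[str, str], commits: Dict[str, List[str]],
                alpha: float = 0.5) -> List[str]:
    """Rank files by alpha * sim(bug, file) + (1 - alpha) * max sim(bug, linked commit message)."""
    names = list(files)
    messages = [(name, msg) for name in names for msg in commits.get(name, [])]
    # Fit IDF on one shared corpus: the bug report, all file texts, all commit messages.
    vecs = tfidf_vectors([bug] + [files[n] for n in names] + [m for _, m in messages])
    bug_vec = vecs[0]
    file_sim = {n: cosine(bug_vec, vecs[1 + i]) for i, n in enumerate(names)}
    commit_sim = {n: 0.0 for n in names}
    offset = 1 + len(names)
    for j, (name, _) in enumerate(messages):
        commit_sim[name] = max(commit_sim[name], cosine(bug_vec, vecs[offset + j]))
    score = {n: alpha * file_sim[n] + (1 - alpha) * commit_sim[n] for n in names}
    return sorted(names, key=lambda n: score[n], reverse=True)


if __name__ == "__main__":
    bug = "crash when scanning qr code"
    files = {"Scanner.java": "camera preview decode frame",
             "Settings.java": "preferences screen layout"}
    commits = {"Scanner.java": ["fix crash when scanning qr code"],
               "Settings.java": ["update preferences layout"]}
    print(score_files(bug, files, commits))  # ['Scanner.java', 'Settings.java']
```

In the toy example neither file's text shares a word with the bug report. The commit message linked to `Scanner.java` does share words, and that alone moves the file to the top. This mirrors the motivation for using commit messages when bug reports and source text match poorly.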

Surprise Bug Report Prediction Utilizing Optimized Integration with Imbalanced Learning Strategy

Complexity

10.1155/2020/8509821

2020

Vol 2020

pp. 1-14

Author(s): Hui Li, Yang Qu, Shikai Guo, Guofeng Gao, Rong Chen, ...

Keyword(s): Learning Strategy, Work Force, Ensemble Method, Software Systems, Imbalanced Learning, Software Projects, Bug Reports, Bug Report, Bug Repositories, Better Than

In software projects, a large number of bugs are usually reported to bug repositories. Because of limited budgets and workforce, developers often lack the time and ability to inspect all reported bugs, so they tend to focus on inspecting and repairing high-impact bugs. Among high-impact bugs, surprise bugs are reported to be a fatal threat to software systems, even though they account for only a small proportion of bugs. Identifying surprise bugs is therefore an important task in practice. In recent years, researchers have proposed several methods to identify surprise bugs. Unfortunately, the performance of these methods is still not satisfactory for software projects. The main reason is that surprise bugs make up only a small percentage of all bugs, which makes them difficult to identify in such an imbalanced distribution. To overcome the imbalanced category distribution, this paper presents a machine learning-based method for predicting surprise bugs. The method takes into account the textual features of bug reports and employs an imbalanced learning strategy to balance the bug report datasets. The balanced datasets are then used to train three classifiers, built with three different classification algorithms, which predict the type of unlabeled bug reports. In particular, we propose an ensemble method named optimization integration that generates a single best result from the outputs of the three classifiers. This ensemble method can adjust each classifier's ability to detect different categories according to the characteristics of different projects, and it integrates the advantages of all three classifiers. Experiments on datasets from 4 software projects show that this method outperforms previous methods in detecting surprise bugs.
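Two generic building blocks of this kind of pipeline are an imbalanced learning step and a weighted combination of classifier outputs. The Python sketch below uses random oversampling of the minority class and a simple weighted vote over three classifiers' 0/1 predictions. Both are stand-ins chosen for illustration. They are not the paper's specific imbalanced learning strategy or its optimization integration method, and the report IDs, labels, predictions, and per-classifier weights in the example are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(samples: List[T], labels: List[int],
                      seed: int = 0) -> Tuple[List[T], List[int]]:
    """Duplicate randomly chosen samples of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, cnt in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - cnt):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def weighted_vote(predictions: Sequence[Sequence[int]], weights: Sequence[float]) -> List[int]:
    """Combine per-classifier 0/1 predictions (1 = surprise bug), one weight per classifier."""
    threshold = sum(weights) / 2
    combined = []
    for votes in zip(*predictions):  # votes = every classifier's prediction for one report
        score = sum(w * v for w, v in zip(weights, votes))
        combined.append(1 if score >= threshold else 0)
    return combined


if __name__ == "__main__":
    reports = ["r1", "r2", "r3", "r4", "r5"]
    labels = [0, 0, 0, 0, 1]  # one surprise bug among five reports
    xs, ys = random_oversample(reports, labels)
    print(sorted(Counter(ys).items()))  # [(0, 4), (1, 4)]
    # Rows: three hypothetical classifiers; columns: three unlabeled reports.
    print(weighted_vote([[1, 0, 1], [0, 0, 1], [1, 1, 0]], [0.5, 0.2, 0.3]))  # [1, 0, 1]
```

Oversampling should be applied only to the training split, so that duplicated minority samples do not leak into evaluation. An adaptive ensemble would tune the weights per project instead of fixing them by hand.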

A Weighted PageRank-Based Bug Report Summarization Method Using Bug Report Relationships

Applied Sciences

10.3390/app9245427

2019

Vol 9 (24)

pp. 5427

Cited By ~ 1

Author(s): Beomjun Kim, Sungwon Kang, Seonah Lee

Keyword(s): Software Maintenance, State Of The Art, The State, Experimental Results, Pagerank Algorithm, Bug Reports, Summarization Method, Bug Report

For software maintenance, bug reports provide useful information to developers because they can be used for various tasks such as debugging and understanding previous changes. However, because they are typically written as conversations among developers, bug reports tend to be unnecessarily long and verbose, so developers often have difficulty reading or understanding them. To mitigate this problem, methods that automatically generate summaries of bug reports have been proposed, and various related studies have been conducted. However, existing bug report summarization methods have not fully exploited the inherent characteristics of bug reports. In this paper, we propose a bug report summarization method that uses the weighted PageRank algorithm and exploits the ‘duplicates’, ‘blocks’, and ‘depends-on’ relationships between bug reports. The experimental results show that our method outperforms the state-of-the-art method in terms of both the quality of the summary and the number of applicable bug reports.
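The Python sketch below runs weighted PageRank as a standard power iteration. It is written for a graph whose nodes would be sentences (or bug reports) and whose edge weights express how strongly one node endorses another. Each node distributes its score in proportion to its outgoing edge weights. This sketch does not reproduce how the paper derives the weights from sentence content or from the ‘duplicates’, ‘blocks’, and ‘depends-on’ relationships, so the graph in the example is hypothetical. A summary would then consist of the highest-scoring sentences.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # graph[u][v] = weight of the edge u -> v


def weighted_pagerank(graph: Graph, damping: float = 0.85, iters: int = 100,
                      tol: float = 1e-10) -> Dict[str, float]:
    """Power iteration where each node splits its score in proportion to outgoing edge weights."""
    nodes = set(graph) | {v for edges in graph.values() for v in edges}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    out_weight = {u: sum(graph.get(u, {}).values()) for u in nodes}
    for _ in range(iters):
        # Score held by dangling nodes (no outgoing weight) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0)
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u, edges in graph.items():
            if out_weight[u] == 0:
                continue
            for v, w in edges.items():
                new[v] += damping * rank[u] * w / out_weight[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical sentence graph: s1 is endorsed by both s2 and s3.
    g = {"s1": {"s2": 1.0, "s3": 1.0}, "s2": {"s1": 1.0}, "s3": {"s1": 1.0}}
    scores = weighted_pagerank(g)
    print(max(scores, key=scores.get))  # s1
```

Scores always sum to 1, because the teleport term and the redistributed dangling mass together conserve the total. Raising an edge weight, for example for a sentence in a duplicate report, therefore shifts score toward its target rather than inflating all scores.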
