Classifying Bug Reports into Bugs and Non-bugs Using LSTM

Author(s):  
Hanmin Qin ◽  
Xin Sun
Author(s):  
Yu Zhou ◽  
Yanxiang Tong ◽  
Taolue Chen ◽  
Jin Han

Bug localization is one of the most expensive and time-consuming activities during software maintenance and evolution. To alleviate the workload of developers, numerous methods have been proposed to automate this process and narrow down the set of files that must be reviewed. In this paper, we present a novel buggy source-file localization approach that uses information from both the bug reports and the source files. We leverage the part-of-speech features of bug reports and the invocation relationships among source files. We also integrate an adaptive technique to further optimize the performance of the approach. The adaptive technique treats Top 1 and Top N recommendations for a given bug report differently and consists of two modules. One module maximizes the accuracy of the first recommended file, and the other aims at improving the accuracy of the fixed defect file list. We evaluate our approach on six large-scale open source projects, namely AspectJ, Eclipse, SWT, ZXing, Birt and Tomcat. Compared to previous work, empirical results show that our approach improves the overall prediction performance in all of these cases. In particular, in terms of Top 1 recommendation accuracy, our approach achieves an improvement from 22.73% to 39.86% for AspectJ, from 24.36% to 30.76% for Eclipse, from 31.63% to 46.94% for SWT, from 40% to 55% for ZXing, from 7.97% to 21.99% for Birt, and from 33.37% to 38.90% for Tomcat.
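
The Top 1 and Top N accuracy figures quoted above are usually computed as the share of bug reports for which at least one truly fixed file appears among the first N recommended files. The following is a minimal sketch of that metric only, not of the authors' localization approach; the function name `top_n_accuracy` and the toy bug IDs and file names are hypothetical.

```python
from typing import Dict, List, Set


def top_n_accuracy(
    rankings: Dict[str, List[str]],
    fixed_files: Dict[str, Set[str]],
    n: int,
) -> float:
    """Fraction of bug reports with at least one fixed file in the top n."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for bug_id, ranked in rankings.items()
        # A report counts as a hit if any of its first n files was really fixed.
        if set(ranked[:n]) & fixed_files.get(bug_id, set())
    )
    return hits / len(rankings)


# Hypothetical toy data: ranked recommendations and ground-truth fixed files.
rankings = {
    "bug-1": ["A.java", "B.java", "C.java"],
    "bug-2": ["D.java", "E.java", "F.java"],
    "bug-3": ["G.java", "H.java", "I.java"],
    "bug-4": ["J.java", "K.java", "L.java"],
}
fixed_files = {
    "bug-1": {"A.java"},
    "bug-2": {"E.java"},
    "bug-3": {"Z.java"},
    "bug-4": {"J.java", "L.java"},
}

print(top_n_accuracy(rankings, fixed_files, 1))  # Top 1 accuracy
print(top_n_accuracy(rankings, fixed_files, 2))  # Top 2 accuracy
```

On this toy data two of the four reports are hits at N = 1 and three are hits at N = 2, which illustrates why optimizing the first recommendation and optimizing the whole list can be treated as separate goals.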


Author(s):  
Bancha Luaphol ◽  
Jantima Polpinij ◽  
Manasawee Kaenampornpan

Most studies relating to bug reports aim to automatically identify the information in bug reports that is needed for software bug fixing. Unfortunately, such studies typically focus on only one issue, whereas more complete and comprehensive software bug fixing would be facilitated by assessing multiple issues concurrently. This is the challenge addressed in this study, which aims to present a method of identifying severe bug reports in a bug report repository, together with assembling their related bug reports to visualize the overall picture of a software problem domain. The proposed method is called “mining bug report repositories”. Two text mining techniques are applied as the main mechanisms in this method. First, classification is applied to identify severe bug reports, called “bug severity classification”; then “threshold-based similarity analysis” is applied to assemble bug reports that are related to a severe bug report. Our datasets are from three open-source projects, namely SeaMonkey, Firefox, and Core:Layout, downloaded from Bugzilla. Finally, the best models from the proposed method are selected and compared with two baseline methods. For identifying severe bug reports using the classification technique, the results show that our method improved accuracy, F1, and AUC scores over the baseline by 11.39, 11.63, and 19% respectively. Meanwhile, for assembling related bug reports using the threshold-based similarity technique, the results show that our method improved precision and likelihood scores over the other baseline by 15.76 and 9.14% respectively. This demonstrates that our proposed method may help increase the chance of fixing bugs completely.
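
The idea of threshold-based similarity analysis can be sketched as follows: score each candidate report against a severe report and keep those whose similarity meets a cutoff. The abstract does not say which similarity measure or threshold is used, so this sketch assumes cosine similarity over raw term counts; the helper names, the sample reports, and the threshold values are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def related_reports(
    severe_text: str, candidates: Dict[str, str], threshold: float
) -> List[str]:
    """IDs of candidates whose similarity to the severe report is >= threshold,
    ordered from most to least similar."""
    query = Counter(tokenize(severe_text))
    scored = [
        (cosine(query, Counter(tokenize(text))), report_id)
        for report_id, text in candidates.items()
    ]
    return [rid for score, rid in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical reports, for illustration only.
severe = "browser crashes when loading page"
candidates = {
    "r1": "crashes when loading page in browser",
    "r2": "toolbar icon has wrong color",
    "r3": "page loading crashes",
}

print(related_reports(severe, candidates, threshold=0.5))
```

Raising the threshold trades recall for precision: with a cutoff of 0.5 both `r1` and `r3` are grouped with the severe report, while a cutoff of 0.8 keeps only `r1`.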


Author(s):  
Wenjie Liu ◽  
Shanshan Wang ◽  
Xin Chen ◽  
He Jiang

In the software maintenance process, predicting the severity of bug reports is an important activity. However, manually identifying the severity of bug reports is a tedious and time-consuming task, so developing automatic methods for predicting the severity of bug reports has become an urgent demand. In general, a bug report contains a large amount of descriptive natural language text, which results in a high-dimensional feature set that poses serious challenges to traditional automatic methods. Therefore, we attempt to use automatic feature selection methods to improve the performance of severity prediction for bug reports. In this paper, we introduce a ranking-based strategy to improve existing feature selection algorithms and propose an ensemble feature selection algorithm that combines existing ones. To verify the performance of our method, we run experiments over the bug reports of Eclipse and Mozilla and conduct comparisons with eight commonly used feature selection methods. The experimental results show that the ranking-based strategy can effectively improve the performance of severity prediction for bug reports by up to 54.76% on average in terms of [Formula: see text]-measure, and that it can also significantly reduce the dimension of the feature set. Meanwhile, the ensemble feature selection method achieves better results than a single feature selection algorithm.
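
One common way to combine several feature selection methods is to turn each method's scores into ranks and keep the features with the best average rank. The abstract does not describe how its ensemble is built, so the sketch below is an assumption that only illustrates the general idea of rank aggregation; the function names and the scores in the two tables are hypothetical.

```python
from typing import Dict, List


def to_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank features from 1 (best) downward; higher score means better.
    Ties are broken alphabetically so the result is deterministic."""
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: i + 1 for i, feature in enumerate(ordered)}


def ensemble_select(score_tables: List[Dict[str, float]], k: int) -> List[str]:
    """Keep the k features with the lowest mean rank across all selectors.
    Assumes every table scores the same set of features."""
    if not score_tables:
        return []
    per_selector = [to_ranks(table) for table in score_tables]
    mean_rank = {
        feature: sum(r[feature] for r in per_selector) / len(per_selector)
        for feature in score_tables[0]
    }
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical scores from two selectors over four terms.
selector_a = {"crash": 9.0, "freeze": 4.0, "typo": 1.0, "color": 0.5}
selector_b = {"crash": 0.30, "freeze": 0.40, "typo": 0.05, "color": 0.10}

print(ensemble_select([selector_a, selector_b], k=2))
```

Working on ranks instead of raw scores avoids having to normalize selectors whose scores live on different scales, and keeping only the top k features is what reduces the dimension of the feature set.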


Author(s):  
Carolina Sokolowicz ◽  
Marcus Guidoti ◽  
Donat Agosti

Plazi is a non-profit organization focused on the liberation of data from taxonomic publications. As part of Plazi's goal of promoting the accessibility of taxonomic data, our team has developed different ways of getting the outside community involved. The Plazi community on GitHub encourages the scientific community and other contributors to post GGI-related (Golden Gate Imagine document editor) questions, requirements, ideas, and/or suggestions, including bug reports and feature requests. One can contact us via this GitHub community by creating either an Issue (to report problems with our data or related systems) or a Discussion (to post questions, ideas, or suggestions). We use GitHub's built-in label system to actively curate the content posted in this repository in order to facilitate further interaction, including filtering and searching before creating new entries. In the plazi/community repository, there is a Q&A (question & answer) section with selected questions and answers that might help solve the problems encountered. Aiming to increase external participation in the task of liberating taxonomic data, we are developing training courses with independent learning modules that can be combined in different ways to target different audiences (e.g., undergraduates, researchers, developers) in various formats. This material will include text, screenshots, slides, screencasts, and, eventually and to a lesser extent, online teaching. Each topic within a module will have one or more 'inline tests', which will be HTML form-based with hard-coded answers to directly assess progress on the subject covered in that particular topic. At the end of each module, we will have a capstone (a form-based test asking questions about the topics covered in the respective module) which the user can access whenever needed. As examples of our independent learning modules we can cite Modules I, II and III and their respective topics.
Module I (Biodiversity Taxonomy Basis) includes introductory topics (e.g., Topic I: Why do we classify living things; Topic II: Linnaean binomial; Topic III: How is taxonomic information displayed in the literature) aimed at those who don't have a biology/taxonomy background. Module II (The Plazi way) topics (Topic I: Plazi mission; Topic II: Taxonomic treatments; Topic III: FAIR taxonomic treatments) are designed so that course takers can learn about Plazi processes. Module III (The Golden Gate Imagine) includes topics (Topic I: Introduction to GGI; Topic II: Other User Interface-based alternatives to annotate documents) about the document editor for marking up documents in XML. Other modules include subjects such as individual extractions, material and treatment citations, data quality control, and others. On completion of a module, the user will be awarded a certificate. The combination of these certificates will grant badges that will translate into server permissions allowing the user, for instance, to upload new liberated taxonomic treatments and edit treatments already in the system. Taxonomic treatments are any piece of information about a given taxon concept that involves, includes, or results from an interpretation of the concept of that given taxon. Additionally, Plazi TreatmentBank APIs (Application Programming Interfaces) are currently being expanded and redesigned, and the documentation for these long-awaited endpoints will be displayed, for the first time, in this talk.

