scholarly journals Automated Classification of Unstructured Bilingual Software Bug Reports: An Industrial Case Study Research

2021 ◽  
Vol 12 (1) ◽  
pp. 338
Author(s):  
Ömer Köksal ◽  
Bedir Tekinerdogan

Software bug report classification is a critical process to understand the nature, implications, and causes of software failures. Furthermore, classification enables a fast and appropriate reaction to software bugs. However, for large-scale projects, one must deal with a broad set of bugs from multiple types. In this context, manually classifying bugs becomes cumbersome and time-consuming. Although several studies have addressed automated bug classification using machine learning techniques, they have mainly focused on academic case studies, open-source software, and unilingual text input. This paper presents our automated bug classification approach applied and validated in an industrial case study. In contrast to earlier studies, our study is applied to a commercial software system based on unstructured bilingual bug reports written in English and Turkish. The presented approach adopts and integrates machine learning (ML), text mining, and natural language processing (NLP) techniques to support the classification of software bugs. The approach has been applied within an industrial case study. Compared to manual classification, our results show that bug classification can be automated and even performs better than manual bug classification. Our study shows that the presented approach and the corresponding tools effectively reduce the manual classification time and effort.

2021 ◽  
Vol 17 (12) ◽  
pp. e1009613
Author(s):  
Kaitlin E. Frasier

Machine learning algorithms, including recent advances in deep learning, are promising for tools for detection and classification of broadband high frequency signals in passive acoustic recordings. However, these methods are generally data-hungry and progress has been limited by challenges related to the lack of labeled datasets adequate for training and testing. Large quantities of known and as yet unidentified broadband signal types mingle in marine recordings, with variability introduced by acoustic propagation, source depths and orientations, and interacting signals. Manual classification of these datasets is unmanageable without an in-depth knowledge of the acoustic context of each recording location. A signal classification pipeline is presented which combines unsupervised and supervised learning phases with opportunities for expert oversight to label signals of interest. The method is illustrated with a case study using unsupervised clustering to identify five toothed whale echolocation click types and two anthropogenic signal categories. These categories are used to train a deep network to classify detected signals in either averaged time bins or as individual detections, in two independent datasets. Bin-level classification achieved higher overall precision (>99%) than click-level classification. However, click-level classification had the advantage of providing a label for every signal, and achieved higher overall recall, with overall precision from 92 to 94%. The results suggest that unsupervised learning is a viable solution for efficiently generating the large, representative training sets needed for applications of deep learning in passive acoustics.


Author(s):  
Bancha Luaphol ◽  
Jantima Polpinij ◽  
Manasawee Kaenampornpan

Most studies relating to bug reports aims to automatically identify necessary information from bug reports for software bug fixing. Unfortunately, the study of bug reports focuses only on one issue, but more complete and comprehensive software bug fixing would be facilitated by assessing multiple issues concurrently. This becomes a challenge in this study, where it aims to present a method of identifying bug reports at severe level from a bug report repository, together with assembling their related bug reports to visualize the overall picture of a software problem domain. The proposed method is called “mining bug report repositories”. Two techniques of text mining are applied as the main mechanisms in this method. First, classification is applied for identifying severe bug reports, called “bug severity classification”, while “threshold-based similarity analysis” is then applied to assemble bug reports that are related to a bug report at severe level. Our datasets are from three opensource namely SeaMonkey, Firefox, and Core:Layout downloaded from the Bugzilla. Finally, the best models from the proposed method are selected and compared with two baseline methods. For identifying severe bug reports using classification technique, the results show that our method improved accuracy, F1, and AUC scores over the baseline by 11.39, 11.63, and 19% respectively. Meanwhile, for assembling related bug reports using threshold-based similarity technique, the results show that our method improved precision, and likelihood scores over the other baseline by 15.76, and 9.14% respectively. This demonstrate that our proposed method may help increasing chance to fix bugs completely.


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Suzanna Schmeelk

This research examines industry-based dissertation research in a doctoral computing program through the lens of machine learning algorithms to understand topics explored by senior and experienced full-time working professionals (EFWPs).  Our research categorizes dissertation by both their abstracts and by their full-text using the Graplab Create library from Apple’s Turi. We also compare the dissertation categorizations using IBM’s Watson Discovery deep machine learning tool.  Our research provides perspectives on the practicality of the manual classification of technical documents; and, it provides insights into the: (1) categories of academic work created by EFWPs in a Computing doctoral program, (2) viability of automated categorization versus human abstraction, and (3) differences in categorization algorithms.


Author(s):  
Liguo Yu

Android is an operating system for mobile devices. Its development is led by Google and some other companies. Because of the open-source property of Android, anyone can report a bug through its online bug tracking system. In this paper, we analyze the bug reports of Android operating systems. Specifically, through this study, we would like to answer the following questions regarding Android development and its project management: (1) Could Android bug reports be handled on time? (2) What is the distribution of different maintenance activities initiated by Android bug reports? (3) How long does it take to handle an Android bug report? (4) Are the number of followers and the number of following messages of an Android bug report related to the effort spent on handling this bug report? Through answering these questions, this paper presents a comprehensive study of Android bug reporting and handling process. The information and knowledge obtained through this case study could help us better understand open-source software project, such as its development process and project management.


2012 ◽  
Vol 4 (2) ◽  
pp. 32-59 ◽  
Author(s):  
K. K. Chaturvedi ◽  
V.B. Singh

Bug severity is the degree of impact that a defect has on the development or operation of a component or system, and can be classified into different levels based on their impact on the system. Identification of severity level can be useful for bug triager in allocating the bug to the concerned bug fixer. Various researchers have attempted text mining techniques in predicting the severity of bugs, detection of duplicate bug reports and assignment of bugs to suitable fixer for its fix. In this paper, an attempt has been made to compare the performance of different machine learning techniques namely Support vector machine (SVM), probability based Naïve Bayes (NB), Decision Tree based J48 (A Java implementation of C4.5), rule based Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and Random Forests (RF) learners in predicting the severity level (1 to 5) of a reported bug by analyzing the summary or short description of the bug reports. The bug report data has been taken from NASA’s PITS (Projects and Issue Tracking System) datasets as closed source and components of Eclipse, Mozilla & GNOME datasets as open source projects. The analysis has been carried out in RapidMiner and STATISTICA data mining tools. The authors measured the performance of different machine learning techniques by considering (i) the value of accuracy and F-Measure for all severity level and (ii) number of best cases at different threshold level of accuracy and F-Measure.


PLoS ONE ◽  
2020 ◽  
Vol 15 (4) ◽  
pp. e0232391 ◽  
Author(s):  
Gurjit S. Randhawa ◽  
Maximillian P. M. Soltysiak ◽  
Hadi El Roz ◽  
Camila P. E. de Souza ◽  
Kathleen A. Hill ◽  
...  

2019 ◽  
Vol 1 (1) ◽  
Author(s):  
Ha Manh Tran ◽  
Son Thanh Le ◽  
Sinh Van Nguyen ◽  
Phong Thanh Ho

Sign in / Sign up

Export Citation Format

Share Document