Automated Classification of Unstructured Bilingual Software Bug Reports: An Industrial Case Study Research

Ömer Köksal; Bedir Tekinerdogan

doi:10.3390/app12010338

Applied Sciences ◽

10.3390/app12010338 ◽

2021 ◽

Vol 12 (1) ◽

pp. 338

Author(s):

Ömer Köksal ◽

Bedir Tekinerdogan

Keyword(s):

Machine Learning ◽

Industrial Case Study ◽

Software Bugs ◽

Text Input ◽

Bug Reports ◽

Bug Report ◽

Software Bug ◽

Manual Classification

Software bug report classification is critical for understanding the nature, implications, and causes of software failures, and it enables a fast and appropriate reaction to software bugs. However, large-scale projects must deal with a broad set of bugs of multiple types, and manually classifying them becomes cumbersome and time-consuming. Although several studies have addressed automated bug classification using machine learning techniques, they have mainly focused on academic case studies, open-source software, and unilingual text input. This paper presents our automated bug classification approach, applied and validated in an industrial case study. In contrast to earlier studies, our study is applied to a commercial software system and is based on unstructured bilingual bug reports written in English and Turkish. The presented approach adopts and integrates machine learning (ML), text mining, and natural language processing (NLP) techniques to support the classification of software bugs. Compared to manual classification, our results show that bug classification can be automated and can even perform better than manual classification. Our study shows that the presented approach and the corresponding tools effectively reduce the manual classification time and effort.
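
As a rough illustration of what text-based classification of mixed English and Turkish bug reports can look like, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing. It is not the authors' pipeline: the abstract does not name the algorithm or the features used, and the class name `NaiveBayesBugClassifier`, the labels `crash` and `ui`, and the four training reports are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of word characters; \w is Unicode-aware in
    # Python 3, so Turkish letters such as ç, ö, ş stay inside tokens.
    # Note: str.lower() does not apply Turkish-specific casing rules.
    return re.findall(r"\w+", text.lower())


class NaiveBayesBugClassifier:
    """Hypothetical multinomial Naive Bayes over word counts."""

    def __init__(self):
        self.class_counts = Counter()            # reports seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, reports, labels):
        for text, label in zip(reports, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus log likelihood with add-one (Laplace) smoothing.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up bilingual training data, for illustration only.
reports = [
    "Application crashes on startup with null pointer exception",
    "Uygulama açılırken çöküyor",
    "Button label is misaligned on the settings screen",
    "Ayarlar ekranında buton yazısı kaymış",
]
labels = ["crash", "crash", "ui", "ui"]

clf = NaiveBayesBugClassifier().fit(reports, labels)
print(clf.predict("Uygulama çöküyor"))
print(clf.predict("settings screen button"))
```

Because both languages share one vocabulary here, no language detection step is needed; a real system would likely add stemming or other language-specific preprocessing.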

A machine learning pipeline for classification of cetacean echolocation clicks in large underwater acoustic datasets

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009613 ◽

2021 ◽

Vol 17 (12) ◽

pp. e1009613

Author(s):

Kaitlin E. Frasier

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Machine Learning Algorithms ◽

Viable Solution ◽

Broadband Signal ◽

Toothed Whale ◽

Passive Acoustic ◽

Manual Classification

Machine learning algorithms, including recent advances in deep learning, are promising tools for detection and classification of broadband high-frequency signals in passive acoustic recordings. However, these methods are generally data-hungry and progress has been limited by challenges related to the lack of labeled datasets adequate for training and testing. Large quantities of known and as yet unidentified broadband signal types mingle in marine recordings, with variability introduced by acoustic propagation, source depths and orientations, and interacting signals. Manual classification of these datasets is unmanageable without an in-depth knowledge of the acoustic context of each recording location. A signal classification pipeline is presented which combines unsupervised and supervised learning phases with opportunities for expert oversight to label signals of interest. The method is illustrated with a case study using unsupervised clustering to identify five toothed whale echolocation click types and two anthropogenic signal categories. These categories are used to train a deep network to classify detected signals in either averaged time bins or as individual detections, in two independent datasets. Bin-level classification achieved higher overall precision (>99%) than click-level classification. However, click-level classification had the advantage of providing a label for every signal, and achieved higher overall recall, with overall precision from 92 to 94%. The results suggest that unsupervised learning is a viable solution for efficiently generating the large, representative training sets needed for applications of deep learning in passive acoustics.

Assisting Bug Report Assignment Using Automated Fault Localisation: An Industrial Case Study

2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST) ◽

10.1109/icst49551.2021.00041 ◽

2021 ◽

Author(s):

Jeongju Sohn ◽

Gabin An ◽

Jingun Hong ◽

Dongwon Hwang ◽

Shin Yoo

Keyword(s):

Industrial Case Study ◽

Bug Report

Using methods from the data-mining and machine-learning literature for disease classification and prediction: a case study examining classification of heart failure subtypes

Journal of Clinical Epidemiology ◽

10.1016/j.jclinepi.2012.11.008 ◽

2013 ◽

Vol 66 (4) ◽

pp. 398-407 ◽

Cited By ~ 120

Author(s):

Peter C. Austin ◽

Jack V. Tu ◽

Jennifer E. Ho ◽

Daniel Levy ◽

Douglas S. Lee

Keyword(s):

Machine Learning ◽

Heart Failure ◽

Data Mining ◽

Disease Classification

Mining Bug Report Repositories to Identify Significant Information for Software Bug Fixing

Applied Science and Engineering Progress ◽

10.14416/j.asep.2021.03.005 ◽

2021 ◽

Author(s):

Bancha Luaphol ◽

Jantima Polpinij ◽

Manasawee Kaenampornpan

Keyword(s):

The Other ◽

Problem Domain ◽

Significant Information ◽

Bug Reports ◽

Bug Fixing ◽

Classification Technique ◽

Bug Report ◽

Multiple Issues ◽

Improved Accuracy ◽

Software Bug

Most studies relating to bug reports aim to automatically identify the information in bug reports that is necessary for software bug fixing. Unfortunately, such studies typically focus on only one issue, whereas more complete and comprehensive software bug fixing would be facilitated by assessing multiple issues concurrently. This study takes up that challenge: it presents a method for identifying severe bug reports in a bug report repository and assembling their related bug reports to visualize the overall picture of a software problem domain. The proposed method is called “mining bug report repositories”. Two text mining techniques serve as its main mechanisms. First, classification is applied to identify severe bug reports (“bug severity classification”); then “threshold-based similarity analysis” is applied to assemble the bug reports related to each severe bug report. Our datasets come from three open-source projects, namely SeaMonkey, Firefox, and Core:Layout, downloaded from Bugzilla. Finally, the best models from the proposed method are selected and compared with two baseline methods. For identifying severe bug reports using the classification technique, the results show that our method improved accuracy, F1, and AUC scores over the baseline by 11.39, 11.63, and 19%, respectively. Meanwhile, for assembling related bug reports using the threshold-based similarity technique, the results show that our method improved precision and likelihood scores over the other baseline by 15.76 and 9.14%, respectively. This demonstrates that our proposed method may help increase the chance of fixing bugs completely.
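
One way to picture the threshold-based similarity step is the sketch below, which keeps every candidate report whose similarity to a severe report reaches a cut-off. The abstract does not state which similarity measure or threshold the authors used, so cosine similarity over raw word counts, the default threshold of 0.3, and the function names `bag_of_words`, `cosine_similarity`, and `related_reports` are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Word-count vector over lowercase alphanumeric tokens.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    # Dot product over shared tokens, divided by the two vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_reports(severe_report, candidates, threshold=0.3):
    # Keep candidates at or above the threshold, most similar first.
    query = bag_of_words(severe_report)
    scored = [(cosine_similarity(query, bag_of_words(c)), c) for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score >= threshold]


# Made-up report summaries, for illustration only.
severe = "browser crashes when loading page layout"
candidates = [
    "crash while loading page",
    "page layout crashes the browser",
    "update translation strings",
]

print(related_reports(severe, candidates))
print(related_reports(severe, candidates, threshold=0.5))
```

Raising the threshold trades coverage for precision: at 0.5 only the closest report survives, while at 0.3 the partially overlapping report is also grouped with the severe one.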

Applying machine learning techniques to implement the technical requirements of energy management systems in accordance with ISO 50001:2018, an industrial case study

Energy Sources Part A Recovery Utilization and Environmental Effects ◽

10.1080/15567036.2021.2011989 ◽

2021 ◽

pp. 1-18

Author(s):

Meisam Moghadasi ◽

Nima Izadyar ◽

Amirali Moghadasi ◽

Hossein Ghadamian

Keyword(s):

Machine Learning ◽

Energy Management ◽

Machine Learning Techniques ◽

Management Systems ◽

Industrial Case Study ◽

Energy Management Systems ◽

Technical Requirements ◽

Learning Techniques

Researching the Research: Applying Machine Learning Techniques to Dissertation Classification

Journal of Computer Science Research ◽

10.30564/jcsr.v2i4.2230 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Suzanna Schmeelk

Keyword(s):

Machine Learning ◽

Full Text ◽

Doctoral Program ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Full Time ◽

Learning Techniques ◽

Manual Classification ◽

Machine Learning Tool

This research examines industry-based dissertation research in a doctoral computing program through the lens of machine learning algorithms to understand the topics explored by senior and experienced full-time working professionals (EFWPs). Our research categorizes dissertations by both their abstracts and their full text using the GraphLab Create library from Apple’s Turi. We also compare the dissertation categorizations using IBM’s Watson Discovery deep machine learning tool. Our research provides perspectives on the practicality of the manual classification of technical documents, and it provides insights into (1) the categories of academic work created by EFWPs in a computing doctoral program, (2) the viability of automated categorization versus human abstraction, and (3) the differences in categorization algorithms.

From Android Bug Reports to Android Bug Handling Process

International Journal of Open Source Software and Processes ◽

10.4018/ijossp.2016100101 ◽

2016 ◽

Vol 7 (4) ◽

pp. 1-18

Author(s):

Liguo Yu

Keyword(s):

Project Management ◽

Open Source ◽

Tracking System ◽

Software Project ◽

Bug Reports ◽

Bug Report ◽

Bug Tracking ◽

Android Development ◽

Comprehensive Study

Android is an operating system for mobile devices. Its development is led by Google and some other companies. Because Android is open source, anyone can report a bug through its online bug tracking system. In this paper, we analyze the bug reports of the Android operating system. Specifically, through this study, we would like to answer the following questions regarding Android development and its project management: (1) Could Android bug reports be handled on time? (2) What is the distribution of different maintenance activities initiated by Android bug reports? (3) How long does it take to handle an Android bug report? (4) Are the number of followers and the number of following messages of an Android bug report related to the effort spent on handling this bug report? By answering these questions, this paper presents a comprehensive study of the Android bug reporting and handling process. The information and knowledge obtained through this case study could help us better understand open-source software projects, such as their development process and project management.

An Empirical Comparison of Machine Learning Techniques in Predicting the Bug Severity of Open and Closed Source Projects

International Journal of Open Source Software and Processes ◽

10.4018/jossp.2012040103 ◽

2012 ◽

Vol 4 (2) ◽

pp. 32-59 ◽

Cited By ~ 17

Author(s):

K. K. Chaturvedi ◽

V.B. Singh

Keyword(s):

Machine Learning ◽

Tracking System ◽

Machine Learning Techniques ◽

Support Vector ◽

Severity Level ◽

Bug Reports ◽

Learning Techniques ◽

Bug Report ◽

Closed Source ◽

F Measure

Bug severity is the degree of impact that a defect has on the development or operation of a component or system, and it can be classified into different levels based on that impact. Identifying the severity level can help a bug triager allocate the bug to the appropriate bug fixer. Various researchers have applied text mining techniques to predict the severity of bugs, detect duplicate bug reports, and assign bugs to suitable fixers. In this paper, an attempt has been made to compare the performance of different machine learning techniques, namely Support Vector Machine (SVM), probability-based Naïve Bayes (NB), decision-tree-based J48 (a Java implementation of C4.5), rule-based Repeated Incremental Pruning to Produce Error Reduction (RIPPER), and Random Forests (RF) learners, in predicting the severity level (1 to 5) of a reported bug by analyzing the summary or short description of the bug report. The bug report data has been taken from NASA’s PITS (Projects and Issue Tracking System) datasets as closed source and from components of the Eclipse, Mozilla, and GNOME datasets as open-source projects. The analysis has been carried out in the RapidMiner and STATISTICA data mining tools. The authors measured the performance of the different machine learning techniques by considering (i) the accuracy and F-Measure values for all severity levels and (ii) the number of best cases at different threshold levels of accuracy and F-Measure.
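
For readers unfamiliar with the two metrics named above, the sketch below computes overall accuracy and a per-level F-Measure (the harmonic mean of precision and recall) for predicted severity levels 1 to 5. The standard definitions are used; the function names and the eight true and predicted labels are invented for illustration and are not data from the study.

```python
def accuracy(y_true, y_pred):
    # Fraction of reports whose predicted severity matches the true one.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, level):
    # One-vs-rest counts for a single severity level.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == level and p == level for t, p in pairs)
    fp = sum(t != level and p == level for t, p in pairs)
    fn = sum(t == level and p != level for t, p in pairs)
    if tp == 0:
        # No correct prediction for this level: F-Measure is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up severity labels (1 to 5), for illustration only.
y_true = [1, 2, 2, 3, 3, 3, 4, 5]
y_pred = [1, 2, 3, 3, 3, 2, 4, 4]

print("accuracy:", accuracy(y_true, y_pred))
for level in range(1, 6):
    print("level", level, "F-Measure:", round(f_measure(y_true, y_pred, level), 3))
```

Reporting the F-Measure per level matters because overall accuracy can look acceptable while a rare level, such as level 5 in this toy data, is never predicted correctly.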

Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study

PLoS ONE ◽

10.1371/journal.pone.0232391 ◽

2020 ◽

Vol 15 (4) ◽

pp. e0232391 ◽

Cited By ~ 102

Author(s):

Gurjit S. Randhawa ◽

Maximillian P. M. Soltysiak ◽

Hadi El Roz ◽

Camila P. E. de Souza ◽

Kathleen A. Hill ◽

...

Keyword(s):

Machine Learning ◽

Genomic Signatures

An Analysis of Software Bug Reports Using Machine Learning Techniques

SN Computer Science ◽

10.1007/s42979-019-0004-1 ◽

2019 ◽

Vol 1 (1) ◽

Cited By ~ 2

Author(s):

Ha Manh Tran ◽

Son Thanh Le ◽

Sinh Van Nguyen ◽

Phong Thanh Ho

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Bug Reports ◽

Learning Techniques ◽

Software Bug
