Mining Stack Overflow: a Recommender Systems-Based Model

10.20944/preprints202008.0265.v1 ◽

2020 ◽

Author(s):

Fouzi Harrag ◽

Mokdad Khamliche

Keyword(s):

Deep Learning ◽

Domain Knowledge ◽

Learning To Rank ◽

Automated System ◽

Abnormal Behavior ◽

Large Domain ◽

Stack Overflow ◽

Bug Reports ◽

Bug Report ◽

Software Bug

In software development, developers receive bug reports that describe software bugs. They find the cause of a bug by reviewing the code and reproducing the abnormal behavior, which is tedious and time-consuming. Developers need an automated system that incorporates broad domain knowledge and recommends solutions for these bugs, so that they spend less manual effort on fixing them or waiting for replies on Q&A websites. Stack Overflow is a popular question-and-answer site focused on programming issues, so we can benefit from the knowledge available on this rich platform. This paper presents a survey of methods in the field of mining software repositories. We propose an architecture for building a recommender system using the learning-to-rank approach. Deep learning is used to construct a model that solves the learning-to-rank problem using Stack Overflow data. Text mining techniques are used to extract, evaluate, and recommend the answers most relevant to the solution of a given bug report.
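
As a rough illustration of the learning-to-rank idea mentioned in this abstract, the sketch below trains a linear scorer from pairwise preferences. It is not the authors' model: the abstract describes a deep learning model, whereas this swaps in a simple perceptron-style pairwise update, and the feature names, values, and the functions `score`, `train_pairwise`, and `rank` are all hypothetical.

```python
def score(weights, features):
    """Linear relevance score for one candidate answer."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(pairs, n_features, epochs=10, lr=0.1):
    """Perceptron-style pairwise training on (better, worse) feature pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            # Update only when the preferred answer is not ranked strictly higher.
            if score(weights, better) <= score(weights, worse):
                weights = [w + lr * (b - c) for w, b, c in zip(weights, better, worse)]
    return weights


def rank(weights, candidates):
    """Return candidate names sorted by descending score."""
    return sorted(candidates, key=lambda name: score(weights, candidates[name]), reverse=True)


# Hypothetical features: [text overlap with the bug report, normalized vote count]
training_pairs = [
    ([0.9, 0.2], [0.1, 0.8]),
    ([0.7, 0.5], [0.3, 0.4]),
]
weights = train_pairwise(training_pairs, n_features=2)
candidates = {"answer_a": [0.8, 0.1], "answer_b": [0.2, 0.9]}
print(rank(weights, candidates))
```

The pairwise formulation only needs to know which of two answers is preferable, which is why it is a common starting point for ranking problems; a neural scorer could replace the linear `score` function without changing the training loop's structure.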

Mining Bug Report Repositories to Identify Significant Information for Software Bug Fixing

Applied Science and Engineering Progress ◽

10.14416/j.asep.2021.03.005 ◽

2021 ◽

Author(s):

Bancha Luaphol ◽

Jantima Polpinij ◽

Manasawee Kaenampornpan

Keyword(s):

The Other ◽

Problem Domain ◽

Significant Information ◽

Bug Reports ◽

Bug Fixing ◽

Classification Technique ◽

Bug Report ◽

Multiple Issues ◽

Improved Accuracy ◽

Software Bug

Most studies of bug reports aim to automatically identify the information needed for software bug fixing. Unfortunately, these studies typically focus on a single issue, whereas more complete and comprehensive bug fixing would be facilitated by assessing multiple issues concurrently. This study takes up that challenge: it presents a method for identifying severe bug reports in a bug report repository and assembling their related bug reports to visualize the overall picture of a software problem domain. The proposed method is called “mining bug report repositories”. Two text mining techniques serve as its main mechanisms. First, classification is applied to identify severe bug reports (“bug severity classification”); then “threshold-based similarity analysis” is applied to assemble the bug reports related to each severe bug report. Our datasets come from three open-source projects, namely SeaMonkey, Firefox, and Core:Layout, downloaded from Bugzilla. Finally, the best models from the proposed method are selected and compared with two baseline methods. For identifying severe bug reports using classification, the results show that our method improved accuracy, F1, and AUC scores over the baseline by 11.39, 11.63, and 19%, respectively. Meanwhile, for assembling related bug reports using the threshold-based similarity technique, the results show that our method improved precision and likelihood scores over the other baseline by 15.76 and 9.14%, respectively. This demonstrates that the proposed method may help increase the chance of fixing bugs completely.
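
The threshold-based similarity step described in this abstract can be sketched roughly as follows. This is a minimal illustration, assuming plain term-frequency vectors and cosine similarity; the abstract does not state which similarity measure or threshold the authors used, so the `threshold=0.3` default, the sample reports, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a report and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_reports(severe_report, candidates, threshold=0.3):
    """Return (similarity, report) pairs at or above the threshold, best first."""
    query = Counter(tokenize(severe_report))
    scored = [(cosine_similarity(query, Counter(tokenize(c))), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Hypothetical bug report summaries, not taken from the paper's datasets.
reports = [
    "Browser crashes when opening a large PDF attachment",
    "Crash on opening PDF attachment in mail window",
    "Toolbar icons are misaligned after resizing",
]
severe = "Application crashes while opening PDF attachment"
for similarity, text in related_reports(severe, reports):
    print(f"{similarity:.2f}  {text}")
```

Raising the threshold trades recall for precision: fewer reports are grouped with the severe one, but those that remain share more vocabulary with it.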

Learning to rank relevant files for bug reports using domain knowledge

Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2014 ◽

10.1145/2635868.2635874 ◽

2014 ◽

Cited By ~ 92

Author(s):

Xin Ye ◽

Razvan Bunescu ◽

Chang Liu

Keyword(s):

Domain Knowledge ◽

Learning To Rank ◽

Bug Reports

Bug Reports and Deep Learning Models

International Journal of Computer Science and Mobile Computing ◽

10.47760/ijcsmc.2021.v10i12.003 ◽

2021 ◽

Vol 10 (12) ◽

pp. 21-26

Author(s):

Som Gupta ◽

Sanjai Kumar Gupta

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Research Area ◽

Learning Approaches ◽

Bug Localization ◽

Future Directions ◽

Bug Reports ◽

Bug Report ◽

The Future

Deep learning is one of the emerging and trending research areas of machine learning across various domains. This paper describes the deep learning approaches applied to bug reports. It classifies the tasks performed when mining bug reports into Bug Report Classification, Bug Localization, Bug Report Summarization, and Duplicate Bug Report Detection. It then systematically discusses the deep learning approaches used for these tasks and the future directions of this research field.

UNDERSTANDING THE DEVELOPER PARTICIPATION IN BUG FIX PROCESS

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v12i8.3000 ◽

2014 ◽

Vol 12 (8) ◽

pp. 3823-3828

Author(s):

Madhu Kumari ◽

Meera Sharma ◽

Nikita Yadav

Keyword(s):

Linear Relationship ◽

Open Source ◽

Significant Variation ◽

Open Source Project ◽

Bug Reports ◽

Bug Fixing ◽

Bug Report ◽

Severity Class ◽

Software Bug

Predicting bug fix time in open source software is a challenging job. A software bug has many attributes that define its characteristics. Some of the attributes are filled in at the time of reporting and some at the time of bug fixing. In this paper, 836 bug reports from two products of the Mozilla open source project, namely Thunderbird and Webtools, have been considered. In the bug reports, we see that there is no linear relationship among the bug attributes, namely bug fix time, developers, cc count, and severity. This paper analyzes the interdependence among these attributes through graphical representation. The results are as follows. Case 1: 73% of the bugs reported for Webtools are fixed by 17% of developers, and 61% of the bugs for Thunderbird are fixed by 14% of developers. Case 2: We tried to find a relationship between the time taken to fix a bug and the corresponding developer. We also observed a significant variation in the bug fixing process: bugs may take from 1 day to 4 years to fix. Case 3: There is no linear relationship between cc count (i.e., the manpower involved in the bug fixing process) and bug fix time. Case 4: The maximum number of developers is involved in fixing bugs of the major severity class.

Severity Prediction for Bug Reports Using Multi-Aspect Features: A Deep Learning Approach

Mathematics ◽

10.3390/math9141644 ◽

2021 ◽

Vol 9 (14) ◽

pp. 1644

Author(s):

Anh-Hien Dao ◽

Cheng-Zen Yang

Keyword(s):

Deep Learning ◽

Matthews Correlation Coefficient ◽

State Of The Art ◽

Textual Information ◽

Learning Framework ◽

Bug Reports ◽

Average Accuracy ◽

Severity Prediction ◽

Quality Aspect ◽

Software Bug

The severity of software bug reports plays an important role in maintaining software quality. Many approaches have been proposed to predict the severity of bug reports using textual information. In this research, we propose a deep learning framework called MASP that uses convolutional neural networks (CNN) and the content-aspect, sentiment-aspect, quality-aspect, and reporter-aspect features of bug reports to improve prediction performance. We have performed experiments on datasets collected from Eclipse and Mozilla. The results show that the MASP model outperforms the state-of-the-art CNN model in terms of average Accuracy, Precision, Recall, F1-measure, and the Matthews Correlation Coefficient (MCC) by 1.83%, 0.46%, 3.23%, 1.72%, and 6.61%, respectively.
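
For readers unfamiliar with the Matthews Correlation Coefficient cited among the metrics above, the sketch below computes it for the binary case from confusion-matrix counts. The binary setting and the counts are assumptions made for illustration; the abstract does not say how MCC was aggregated over severity classes, and no figure here comes from the paper.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC from binary confusion-matrix counts; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts for a severe / non-severe classifier.
tp, fp, fn, tn = 40, 10, 5, 45
accuracy = (tp + tn) / (tp + fp + fn + tn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
print(f"accuracy={accuracy:.2f}  mcc={mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when severity classes are imbalanced, which is a plausible reason for reporting it alongside accuracy and F1.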

A Novel Deep-Learning-Based Bug Severity Classification Technique Using Convolutional Neural Networks and Random Forest with Boosting

Sensors ◽

10.3390/s19132964 ◽

2019 ◽

Vol 19 (13) ◽

pp. 2964 ◽

Cited By ~ 10

Author(s):

Ashima Kukkar ◽

Rajni Mohana ◽

Anand Nayyar ◽

Jeamin Kim ◽

Byeong-Gwon Kang ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Random Forest ◽

Convolutional Neural Network ◽

High Speed ◽

Tracking System ◽

Bug Reports ◽

Severity Classification ◽

Bug Report ◽

N Gram

The accurate severity classification of a bug report is an important aspect of bug fixing. The bug reports are submitted into the bug tracking system with high speed, and owing to this, bug repository size has been increasing at an enormous rate. This increased bug repository size introduces biases in the bug triage process. Therefore, it is necessary to classify the severity of a bug report to balance the bug triaging process. Previously, many machine learning models were proposed for automation of bug severity classification. The accuracy of these models is not up to the mark because they do not extract the important feature patterns for learning the classifier. This paper proposes a novel deep learning model for multiclass severity classification called Bug Severity classification to address these challenges by using a Convolutional Neural Network and Random forest with Boosting (BCR). This model directly learns the latent and highly representative features. Initially, the natural language techniques preprocess the bug report text, and then n-gram is used to extract the features. Further, the Convolutional Neural Network extracts the important feature patterns of respective severity classes. Lastly, the random forest with boosting classifies the multiple bug severity classes. The average accuracy of the proposed model is 96.34% on multiclass severity of five open source projects. The average F-measures of the proposed BCR and the existing approach were 96.43% and 84.24%, respectively, on binary class severity classification. The results prove that the proposed BCR approach enhances the performance of bug severity classification over the state-of-the-art techniques.
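
The n-gram feature extraction step mentioned in this abstract might look roughly like the sketch below. It assumes word-level n-grams over whitespace-separated tokens with unigram and bigram counts; the abstract does not specify the tokenization or the n-gram order, and the CNN and random forest stages are not shown. The function names and the sample summary are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of a text, in order of appearance."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=2):
    """Count all n-grams from unigrams up to max_n-grams."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


# Hypothetical bug report summary.
summary = "App crashes on startup"
print(word_ngrams(summary, 2))
print(ngram_features(summary))
```

Counts like these would typically be turned into a fixed-length vector over a shared vocabulary before being passed to a classifier.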

Automated Classification of Unstructured Bilingual Software Bug Reports: An Industrial Case Study Research

Applied Sciences ◽

10.3390/app12010338 ◽

2021 ◽

Vol 12 (1) ◽

pp. 338

Author(s):

Ömer Köksal ◽

Bedir Tekinerdogan

Keyword(s):

Machine Learning ◽

Industrial Case Study ◽

Software Bugs ◽

Text Input ◽

Bug Reports ◽

Bug Report ◽

Software Bug ◽

Manual Classification

Software bug report classification is a critical process to understand the nature, implications, and causes of software failures. Furthermore, classification enables a fast and appropriate reaction to software bugs. However, for large-scale projects, one must deal with a broad set of bugs of multiple types. In this context, manually classifying bugs becomes cumbersome and time-consuming. Although several studies have addressed automated bug classification using machine learning techniques, they have mainly focused on academic case studies, open-source software, and unilingual text input. This paper presents our automated bug classification approach applied and validated in an industrial case study. In contrast to earlier studies, our study is applied to a commercial software system based on unstructured bilingual bug reports written in English and Turkish. The presented approach adopts and integrates machine learning (ML), text mining, and natural language processing (NLP) techniques to support the classification of software bugs. The approach has been applied within an industrial case study. Compared to manual classification, our results show that bug classification can be automated and even performs better than manual bug classification. Our study shows that the presented approach and the corresponding tools effectively reduce the manual classification time and effort.

Assessment of Software Bug Complexity and Severity using Evolutionary SOM Scheme

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f9257.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 3152-3158

Keyword(s):

Software Quality ◽

Software Maintenance ◽

Negative Impact ◽

Distance Functions ◽

Tracking Systems ◽

Software Defects ◽

Bug Reports ◽

Bug Report ◽

Severity Prediction ◽

Software Bug

Software defect prediction and assessment play a significant role in the software development process. Predicting software defects in the earlier stages increases software quality, reliability, and efficiency. Detecting and eliminating software defects has been the most expensive task during both development and maintenance, and as software demands increase and delivery time spans decrease, ensuring software quality becomes a challenge. Moreover, due to inadequate testing, no software can claim to be free from errors. Bug repositories are used for storing and managing bugs in software projects. A bug in a repository is recorded as a bug report. When a tester finds a bug, the available information is entered in a defect tracking system. During its resolution, a bug passes through various bug states. Defect tracking systems enable users to report information about bugs while running the software. Severity prediction has recently gained a lot of attention in software maintenance: bugs with greater severity should be resolved before bugs with lower severity. In this paper, an evolutionary interactive scheme to evaluate bug reports and assess their severity is proposed. The paper presents a Software Bug Complexity Cluster (SBCC) using Self-Organizing Maps (SOM). In the SBCC, a feature matrix is built from bug durations, and the complexities of software bugs are categorized into distinct clusters, including Blocker, Critical, Major, Trivial, and Minor, by specifying the negative impact of the defect using two different techniques, namely k-means and SOM. Bug duration, proximity error, and pre-defined distance functions are used to estimate the accuracy of different bug complexities. Our systematic study found that SOM performs better than k-means in terms of proximity error and fitness. The collected results showed better performance for the SBCC with respect to fitness and cluster proximity error.
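
Of the two clustering techniques named in this abstract, k-means is the simpler to sketch. The example below runs Lloyd's algorithm on one-dimensional bug durations. It is only an illustration under stated assumptions: the durations, the three starting centroids, and the function name `kmeans_1d` are hypothetical, the paper's feature matrix and distance functions are not reproduced, and the SOM variant is not shown.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Lloyd's algorithm on scalar values, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical bug durations in days.
durations_days = [1, 2, 3, 30, 35, 40, 400, 420]
centroids, clusters = kmeans_1d(durations_days, centroids=[1, 30, 400])
print(centroids)
print(clusters)
```

Fixed starting centroids keep the run deterministic; in practice k-means is sensitive to initialization, which is one reason a study might compare it against SOM.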

Resource and dependency based test case generation for RESTful Web services

Empirical Software Engineering ◽

10.1007/s10664-020-09937-1 ◽

2021 ◽

Vol 26 (4) ◽

Author(s):

Man Zhang ◽

Bogdan Marculescu ◽

Andrea Arcuri

Keyword(s):

Web Services ◽

Domain Knowledge ◽

Automated System ◽

Test Case ◽

Test Cases ◽

Evolutionary Search ◽

Sampling Strategies ◽

Mutation Operators ◽

On Line ◽

Restful Web Services

Nowadays, RESTful web services are widely used for building enterprise applications. REST is not a protocol, but rather it defines a set of guidelines on how to design APIs to access and manipulate resources using HTTP over a network. In this paper, we propose an enhanced search-based method for automated system test generation for RESTful web services, by exploiting domain knowledge on the handling of HTTP resources. The proposed techniques use domain knowledge specific to RESTful web services and a set of effective templates to structure test actions (i.e., ordered sequences of HTTP calls) within an individual in the evolutionary search. The action templates are developed based on the semantics of HTTP methods and are used to manipulate the web services’ resources. In addition, we propose five novel sampling strategies with four sampling methods (i.e., resource-based sampling) for the test cases that can use one or more of these templates. The strategies are further supported with a set of new, specialized mutation operators (i.e., resource-based mutation) in the evolutionary search that take into account the use of these resources in the generated test cases. Moreover, we propose a novel dependency handling to detect possible dependencies among the resources in the tested applications. The resource-based sampling and mutations are then enhanced by exploiting the information of these detected dependencies. To evaluate our approach, we implemented it as an extension to the EvoMaster tool, and conducted an empirical study with two selected baselines on 7 open-source and 12 synthetic RESTful web services. Results show that our novel resource-based approach with dependency handling obtains a significant improvement in performance over the baselines, e.g., up to +130.7% relative improvement (growing from +27.9% to +64.3%) on line coverage.
