A Unified Framework for Bug Report Assignment

Author(s):  
Yuan Zhao ◽  
Tieke He ◽  
Zhenyu Chen

Assigning bug reports to individual developers is typically a manual, time-consuming, and tedious task. Although machine learning techniques have been adopted to alleviate this burden, they mainly focus on open-source projects that use traditional repositories such as Bugzilla to manage their bug reports. With the boom of the mobile Internet, new requirements and methods of software testing are emerging, especially crowdsourced testing. Bug reports from traditional channels are often heavyweight, meaning they are standardized with detailed attribute localization, whereas bug reports in crowdsourced testing tend to be lightweight. To investigate the differences in bug report assignment in this new setting, a unified bug report assignment framework is proposed in this paper. The framework handles both traditional heavyweight bug reports and lightweight ones by (i) preprocessing the bug reports and performing feature selection, (ii) tuning the parameters that indicate the ratios of different methods for vectorizing bug reports, and (iii) applying classification algorithms to assign the bug reports. Extensive experiments are conducted on three datasets to evaluate the proposed framework. The results indicate the applicability of the framework and also reveal the differences in bug report assignment between traditional repositories and crowdsourced ones.
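A minimal sketch of the kind of pipeline the abstract describes, vectorizing report text and training a classifier that maps reports to developers. This is an illustration in Python with scikit-learn, not the authors' implementation; the toy reports and developer names are invented.

# Sketch: preprocess/vectorize bug report text, then classify to a developer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: report summaries and the developers who fixed them.
reports = [
    "app crashes when rotating screen",
    "login button unresponsive on slow networks",
    "memory leak in image cache",
]
developers = ["alice", "bob", "alice"]

pipeline = Pipeline([
    ("vectorize", TfidfVectorizer(stop_words="english")),  # one way to vectorize reports
    ("classify", MultinomialNB()),                         # any classifier could be swapped in
])
pipeline.fit(reports, developers)
print(pipeline.predict(["screen goes black after rotation"]))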

2012 ◽  
Vol 4 (2) ◽  
pp. 32-59 ◽  
Author(s):  
K. K. Chaturvedi ◽  
V.B. Singh

Bug severity is the degree of impact that a defect has on the development or operation of a component or system, and can be classified into different levels based on that impact. Identifying the severity level helps the bug triager allocate the bug to the appropriate fixer. Various researchers have applied text mining techniques to predicting the severity of bugs, detecting duplicate bug reports, and assigning bugs to a suitable fixer. In this paper, an attempt has been made to compare the performance of different machine learning techniques, namely Support Vector Machine (SVM), probability-based Naïve Bayes (NB), decision-tree-based J48 (a Java implementation of C4.5), rule-based Repeated Incremental Pruning to Produce Error Reduction (RIPPER), and Random Forests (RF), in predicting the severity level (1 to 5) of a reported bug by analyzing the summary, or short description, of the bug report. The bug report data has been taken from NASA's PITS (Projects and Issue Tracking System) datasets as closed source, and from components of the Eclipse, Mozilla, and GNOME datasets as open-source projects. The analysis has been carried out in the RapidMiner and STATISTICA data mining tools. The authors measured the performance of the different machine learning techniques by considering (i) the accuracy and F-measure for all severity levels and (ii) the number of best cases at different threshold levels of accuracy and F-measure.
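A rough Python analogue of this comparison (the paper itself used RapidMiner and STATISTICA): each classifier is trained on the bug summary text and scored with accuracy and macro F-measure. The summaries and severity labels below are invented, and RIPPER is omitted for lack of a scikit-learn counterpart.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier      # stands in for J48/C4.5
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Invented summaries and severity levels; real data would come from PITS,
# Eclipse, Mozilla, or GNOME bug dumps.
summaries = ["crash on exit", "data loss on save", "null pointer on startup",
             "slow rendering of charts", "typo in tooltip", "misaligned toolbar icon"]
severities = [1, 1, 2, 2, 5, 5]

X = TfidfVectorizer().fit_transform(summaries)
for name, clf in [("SVM", LinearSVC()), ("NB", MultinomialNB()),
                  ("J48-like tree", DecisionTreeClassifier()),
                  ("RF", RandomForestClassifier())]:
    scores = cross_validate(clf, X, severities, cv=2,
                            scoring=["accuracy", "f1_macro"])
    print(name, scores["test_accuracy"].mean(), scores["test_f1_macro"].mean())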


Author(s):  
Sundos Abdulameer Alazawi ◽  
Mohammed Najim Al-Salam

For the assessment of system dependability, fault injection techniques are used to expedite the occurrence of an error or failure in the system, which helps evaluate fault tolerance and system failure prediction. Defect classification and prediction is the principal step in the trustworthiness evaluation of complex software systems such as open-source software, since it directly affects the reliability of those systems, improves performance, and lessens product cost. In this context, a new prototype of a fault injection model is presented: FIBR-OSS (Fault Injection for Bug Reports in Open-Source Software). FIBR-OSS supports developers in evaluating system performance during the development phases with respect to dependability attributes such as reliability and dependability means such as fault prediction or forecasting. FIBR-OSS is used to accelerate faults in order to test the system's failure prediction performance. Machine learning techniques are applied to the bug reports produced by the bug tracking system, which serve as datasets for the failure prediction techniques; some of those machine learning techniques are used in our approach.
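A loose illustration of the fault-acceleration idea, not FIBR-OSS itself: synthetic fault records are injected into a bug-report feature set, and a trained failure predictor is checked against them. The feature names and the toy data-generating rule are assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical features per report: [comment_count, reopen_count, files_touched].
X = rng.poisson(lam=[3, 0.2, 2], size=(200, 3)).astype(float)
y = (X[:, 1] > 0).astype(int)  # toy rule: reopened reports precede failures

model = RandomForestClassifier().fit(X, y)

# "Inject" faults by forcing the failure-correlated attribute upward,
# simulating the accelerated presence of errors the abstract describes.
injected = X[:10].copy()
injected[:, 1] += 3
print(model.predict(injected))  # expect the predictor to flag these records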


2021 ◽  
Vol 1804 (1) ◽  
pp. 012133
Author(s):  
Mahmood Shakir Hammoodi ◽  
Hasanain Ali Al Essa ◽  
Wial Abbas Hanon

Software engineering is an important area that deals with the development and maintenance of software. After developing a software system, it is important to track its performance and verify that it functions according to customer requirements. To ensure this, faulty and non-faulty modules must be identified, for which a binary classification model of faults can be used. The outputs of different techniques vary with respect to the fault dataset used, the complexity, the classification algorithm implemented, and so on. Various machine learning techniques can be used for this purpose; this paper focuses on four widely used classification algorithms: decision tree, random forest, naïve Bayes, and logistic regression (tree-based and Bayes-based techniques). The motive behind this work is to identify the faulty modules within a software system before the actual software testing takes place, so that the time consumed by the testers, and hence their workload, can be reduced to an extent. This work is useful to those working in the software industry and to researchers in software engineering studying the software development lifecycle.
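A minimal sketch of the binary fault classification described above, assuming static code metrics as features and a faulty/non-faulty label; synthetic data stands in for a real fault dataset.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stand-in for module metrics (LOC, complexity, ...) with fault labels.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in [("decision tree", DecisionTreeClassifier()),
                  ("random forest", RandomForestClassifier()),
                  ("naive Bayes", GaussianNB()),
                  ("logistic regression", LogisticRegression(max_iter=1000))]:
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))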


2021 ◽  
Author(s):  
Cao Truong Tran

Classification is a major task in machine learning and data mining. Many real-world datasets suffer from the unavoidable issue of missing values. Classification with incomplete data has to be handled carefully, because inadequate treatment of missing values causes large classification errors.

Most existing research on classification with incomplete data has focused on improving effectiveness, but has not adequately addressed the efficiency of applying classifiers to classify unseen instances, which matters far more in practice than the act of creating the classifiers. A common approach to classification with incomplete data is to use imputation methods to replace missing values with plausible values before building classifiers and classifying unseen instances. This approach yields complete data that any classification algorithm can then use, but sophisticated imputation methods are usually computationally intensive, especially in the application phase of classification. Another approach is to build a classifier that can work directly with missing values. This approach requires no time for estimating missing values, but it often generates inaccurate and complex classifiers when faced with numerous missing values. A more recent approach, which also avoids estimating missing values, is to build a set of classifiers from which applicable classifiers are then selected to classify unseen instances. However, this approach is also often inaccurate and takes a long time to find applicable classifiers when faced with numerous missing values.

The overall goal of the thesis is to simultaneously improve the effectiveness and efficiency of classification with incomplete data by using evolutionary machine learning techniques for feature selection, clustering, ensemble learning, feature construction, and constructing classifiers.

The thesis develops approaches for improving imputation for classification with incomplete data by integrating clustering and feature selection with imputation. These approaches improve both the effectiveness and the efficiency of using imputation for classification with incomplete data.

The thesis develops wrapper-based feature selection methods to improve the input space for classification algorithms that can work directly with incomplete data. The methods not only improve classification accuracy, but also reduce the complexity of the resulting classifiers.

The thesis develops a feature construction method to improve the input space for classification with incomplete data by proposing interval genetic programming: genetic programming with a set of interval functions. The method improves classification accuracy and reduces classifier complexity.

The thesis develops an ensemble approach to classification with incomplete data by integrating imputation, feature selection, and ensemble learning. The results show that this approach is more accurate and faster than previous common methods for classification with incomplete data.

The thesis develops interval genetic programming to directly evolve classifiers for incomplete data. The results show that classifiers generated by interval genetic programming can be more effective and efficient than classifiers generated by the combination of imputation and traditional genetic programming. Interval genetic programming is also more effective than common classification algorithms that can work directly with incomplete data.

In summary, the thesis develops a range of approaches for simultaneously improving the effectiveness and efficiency of classification with incomplete data by using a range of evolutionary machine learning techniques.
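A compact sketch of the imputation-then-classify baseline the thesis builds on, here with a KNN imputer feeding a random forest. The thesis's own evolutionary variants are not reproduced; the tiny dataset is purely illustrative.

import numpy as np
from sklearn.impute import KNNImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Toy incomplete data: np.nan marks the missing values.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0]])
y = [0, 0, 1, 1]

# Impute first, then classify; unseen instances pass through the same imputer.
model = make_pipeline(KNNImputer(n_neighbors=1), RandomForestClassifier())
model.fit(X, y)
print(model.predict([[np.nan, 2.5]]))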


Metagenomics ◽  
2017 ◽  
Vol 1 (1) ◽  
Author(s):  
Hayssam Soueidan ◽  
Macha Nikolski

Abstract: Owing to the complexity and variability of metagenomic studies, modern machine learning approaches have seen increased usage in answering a variety of questions encompassing the full range of metagenomic NGS data analysis. We review here the contribution of machine learning techniques to the field of metagenomics by presenting known successful approaches in a unified framework. This review focuses on five important metagenomic problems: OTU clustering, binning, taxonomic profiling and assignment, comparative metagenomics, and gene prediction. For each of these problems, we identify the most prominent methods, summarize the machine learning approaches used, and put them into the perspective of similar methods. We conclude our review by looking further ahead at the challenge posed by the analysis of interactions within microbial communities and across different environments, in a field one could call "integrative metagenomics".


2019 ◽  
Vol 1 (1) ◽  
Author(s):  
Ha Manh Tran ◽  
Son Thanh Le ◽  
Sinh Van Nguyen ◽  
Phong Thanh Ho

Software maintainability is a vital quality aspect as per ISO standards. It has been a concern for decades and remains a top priority today. At present, the majority of software applications, particularly open-source software, are developed using object-oriented methodologies. Researchers have in the past used statistical techniques on metric data extracted from software to evaluate maintainability. More recently, machine learning models and algorithms have been used in a majority of research works to predict maintainability. In this research, we performed an empirical case study on the open-source software jfreechart by applying machine learning algorithms. The objective was to study the relationships between certain metrics and maintainability.
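A small sketch of the metric-to-maintainability modelling described, assuming class-level object-oriented metrics such as WMC, CBO, and LCOM as inputs; the numbers are invented, not jfreechart measurements.

from sklearn.ensemble import RandomForestRegressor

# Hypothetical rows: [WMC, CBO, LCOM] per class, with a maintainability score.
metrics = [[12, 4, 0.3], [40, 15, 0.8], [7, 2, 0.1], [25, 9, 0.6]]
maintainability = [78.0, 42.0, 85.0, 60.0]

# A regressor learns the metric-maintainability relationship.
model = RandomForestRegressor(random_state=0).fit(metrics, maintainability)
print(model.predict([[18, 6, 0.4]]))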


2020 ◽  
Vol 8 (5) ◽  
pp. 4624-4627

In recent years, a lot of data has been generated about students, which can be utilized to decide a student's career path. This paper discusses some of the machine learning techniques that can be used to predict a student's performance and help decide his or her career path. The key Machine Learning (ML) algorithms applied in our research work are Linear Regression, Logistic Regression, Support Vector Machine, Naïve Bayes Classifier, and K-means Clustering. The aim of this paper is to predict the student career path using Machine Learning algorithms. We compare the efficiencies of different ML classification algorithms on a real dataset obtained from university students.
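A hedged sketch of such a comparison over a subset of the listed classifiers, with made-up student features (grade average, activity score) and career labels standing in for the real university dataset.

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Hypothetical rows: [grade_average, extracurricular_score] per student.
students = [[8.1, 3], [6.2, 7], [9.0, 2], [5.5, 8], [7.8, 4], [6.0, 9]]
career = ["research", "industry", "research", "industry", "research", "industry"]

for name, clf in [("logistic regression", LogisticRegression()),
                  ("SVM", SVC()), ("naive Bayes", GaussianNB())]:
    print(name, cross_val_score(clf, students, career, cv=3).mean())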


Advancement in medical science has always been one of the most vital concerns of the human race. With the progress of technology, modern techniques and equipment are increasingly applied for treatment purposes. Nowadays, machine learning techniques are widely used in medical science to ensure accuracy. In this work, we construct computational models for accurate liver disease prediction. We used several efficient classification algorithms, Random Forest, Perceptron, Decision Tree, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM), to predict liver diseases. Our work covers the construction of hybrid models and a comparative analysis aimed at improving prediction performance. First, the classification algorithms are applied to the original liver patient datasets collected from the UCI repository. We then analyzed and tweaked the features to improve the performance of our predictor and made a comparative analysis among the classifiers. We found that the KNN algorithm outperformed all other techniques when combined with feature selection.
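A minimal sketch of the reported best configuration, feature selection followed by KNN, on synthetic data shaped roughly like a clinical dataset; the real work used the UCI liver patient data.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Stand-in for ten clinical attributes with a disease/no-disease label.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# Keep the 5 most informative features, then classify with KNN.
model = make_pipeline(SelectKBest(f_classif, k=5), KNeighborsClassifier())
print(cross_val_score(model, X, y, cv=5).mean())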

