Predicting Bug Priority Using Topic Modelling in Imbalanced Learning Environments
Manually classifying bug reports is time-consuming because the reports arrive in large volumes. As an alternative, this project proposes automatic prediction models that classify bug reports by priority. Topics, or candidate keywords, are mined from the developer descriptions in bug reports using the RAKE (Rapid Automatic Keyword Extraction) algorithm and converted into attributes. These attributes, together with the target attribute (priority level), form the training datasets. Naïve Bayes, logistic regression, and decision tree learners are trained, and prediction quality is measured using the area under the receiver operating characteristic curve (AUC), because AUC is insensitive to the class imbalance in the datasets. The logistic regression model outperforms the other two, achieving an AUC of 0.86, whereas the naïve Bayes and decision tree learners record AUCs of 0.79 and 0.81, respectively. The results suggest that bugs can be classified without developer involvement and that logistic regression is a viable alternative to naïve Bayes for bug classification.
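As a rough illustration of two steps named in the abstract, the following Python sketch shows a simplified RAKE-style keyword extraction and a rank-based AUC computation. It is not the project's implementation: the function names `rake_keywords` and `auc`, the tiny stop-word list, and the sample bug report are all hypothetical, and priority is reduced to a binary label (high priority versus the rest) purely for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real RAKE uses a much larger one).
STOP_WORDS = {"a", "an", "the", "is", "are", "when", "on", "in",
              "of", "to", "and", "after", "with", "it"}

def rake_keywords(text, top_n=3):
    """Simplified RAKE: split on stop words and punctuation, score by degree/frequency."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOP_WORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrence count, including the word itself
    # A phrase's score is the sum of its words' degree/frequency ratios.
    scores = {p: sum(degree[w] / freq[w] for w in p) for p in set(phrases)}
    ranked = sorted(scores, key=lambda p: (-scores[p], p))  # ties broken alphabetically
    return [" ".join(p) for p in ranked[:top_n]]

def auc(labels, scores):
    """AUC as the probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical bug report description.
report = ("Application crashes when saving a large file. "
          "Crash happens after the login page is loaded.")
print(rake_keywords(report))  # ['application crashes', 'crash happens', 'large file']

# Imbalanced toy data: 9 low-priority reports (0) and 1 high-priority report (1).
labels = [0] * 9 + [1]
constant_scores = [0.0] * 10  # a model that always predicts "low priority"
accuracy = sum((s >= 0.5) == (y == 1) for y, s in zip(labels, constant_scores)) / len(labels)
print(accuracy)                      # 0.9
print(auc(labels, constant_scores))  # 0.5
```

The second part shows why AUC suits imbalanced data: a model that always predicts the majority class reaches 0.9 accuracy here yet scores only 0.5 AUC, the same as random ranking. In a full pipeline, the extracted keywords would become the attribute columns on which the three classifiers are trained.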