bug reports
Recently Published Documents

TOTAL DOCUMENTS: 277 (FIVE YEARS: 132)
H-INDEX: 27 (FIVE YEARS: 7)
2022, Vol 31 (1), pp. 1-25
Author(s): Hui Xu, Zhuangbin Chen, Mingshen Sun, Yangfan Zhou, Michael R. Lyu

Rust is an emerging programming language that aims to prevent memory-safety bugs without sacrificing much efficiency. This claimed property is very attractive to developers, and many projects have started using the language. However, can Rust achieve the memory-safety promise? This article studies the question by surveying 186 real-world bug reports collected from several sources, including all existing Rust Common Vulnerabilities and Exposures (CVEs) for memory-safety issues as of 2020-12-31. We manually analyze each bug and extract its culprit pattern. Our analysis shows that Rust keeps its promise, in that all memory-safety bugs require unsafe code, and that many memory-safety bugs in our dataset are mild soundness issues that merely open the possibility of writing memory-safety bugs without unsafe code. Furthermore, we summarize three typical categories of memory-safety bugs: automatic memory reclaim, unsound function, and unsound generic or trait. While automatic memory reclaim bugs are a side effect of Rust's newly adopted ownership-based resource management scheme, unsound functions reveal the essential challenge of avoiding unsound code in Rust development, and unsound generics or traits intensify the risk of introducing unsoundness. Based on these findings, we propose two promising directions toward improving the security of Rust development: best practices for using specific APIs, and methods for detecting particular bugs involving unsafe code. Our work intends to raise more discussion of the memory-safety issues of Rust and to facilitate the maturity of the language.


Author(s): Som Gupta, Sanjai Kumar Gupta

Deep learning is one of the emerging and trending research areas of machine learning across various domains. This paper describes deep learning approaches applied to the domain of bug reports. It classifies the tasks performed in mining bug reports into bug report classification, bug localization, bug report summarization, and duplicate bug report detection. The paper systematically discusses the deep learning approaches used for each of these tasks and the future directions in this field of research.
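Of the tasks listed above, duplicate bug report detection is the easiest to illustrate concretely. Deep approaches learn dense embeddings of report text; the sketch below instead uses a simple lexical baseline (Jaccard token overlap) purely to show the task framing. The reports and the 0.5 threshold are illustrative, not taken from any cited study.

```python
def tokens(text):
    """Lowercase word tokens of a bug report title/description."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def find_duplicates(new_report, known_reports, threshold=0.5):
    """Return known reports whose lexical similarity to new_report
    meets the threshold; deep models replace jaccard() with a learned
    embedding similarity."""
    new_toks = tokens(new_report)
    return [r for r in known_reports if jaccard(new_toks, tokens(r)) >= threshold]

known = [
    "app crashes when opening settings page",
    "login button unresponsive on mobile",
]
dups = find_duplicates("crashes when opening the settings page", known)
```

A learned model would score the second report pair by meaning rather than shared words, which is exactly where lexical baselines like this one fail.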


2021, Vol 12 (1), pp. 338
Author(s): Ömer Köksal, Bedir Tekinerdogan

Software bug report classification is a critical process for understanding the nature, implications, and causes of software failures. Furthermore, classification enables a fast and appropriate reaction to software bugs. However, for large-scale projects, one must deal with a broad set of bugs of multiple types. In this context, manually classifying bugs becomes cumbersome and time-consuming. Although several studies have addressed automated bug classification using machine learning techniques, they have mainly focused on academic case studies, open-source software, and unilingual text input. This paper presents our automated bug classification approach, applied and validated in an industrial case study. In contrast to earlier studies, our study targets a commercial software system based on unstructured bilingual bug reports written in English and Turkish. The presented approach adopts and integrates machine learning (ML), text mining, and natural language processing (NLP) techniques to support the classification of software bugs. Our results show that bug classification can be automated and even performs better than manual classification. Our study shows that the presented approach and the corresponding tools effectively reduce manual classification time and effort.
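The core of such a text-classification pipeline can be sketched in a few lines. The abstract does not name a specific model, so the following uses a minimal bag-of-words nearest-centroid classifier as a stand-in; the category names and training reports are hypothetical, and a production system would use a proper ML model and language-specific tokenization for the Turkish reports.

```python
from collections import Counter

def bow(text):
    """Bag-of-words term counts for one bug report."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical labelled training reports (a real system trains on a
# far larger bilingual corpus).
training = {
    "ui": ["button overlaps label on settings screen", "menu icon misaligned"],
    "crash": ["application crashes on startup", "segfault when saving file"],
}

def centroid(texts):
    """Sum of bag-of-words vectors for a category's training reports."""
    c = Counter()
    for t in texts:
        c.update(bow(t))
    return c

def classify(report):
    """Assign the category whose training centroid is most similar."""
    vec = bow(report)
    return max(training, key=lambda cat: cosine(vec, centroid(training[cat])))
```

For example, `classify("app crashes when saving a file")` picks the "crash" category because its centroid shares the terms "crashes", "when", "saving", and "file" with the report.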


2021
Author(s): Aladdin Baarah, Ahmad Aloqaily, Hala Zyod, Nasser Mustafa

2021
Author(s): Gelareh Meidanipour Lahijany, Manuel Ohrndorf, Johannes Zenkert, Madjid Fathi, Udo Kelter

Author(s): Carolina Sokolowicz, Marcus Guidoti, Donat Agosti

Plazi is a non-profit organization focused on the liberation of data from taxonomic publications. In line with Plazi's goal of promoting the accessibility of taxonomic data, our team has developed different ways of getting the outside community involved. The Plazi community on GitHub encourages the scientific community and other contributors to post GGI-related (Golden Gate Imagine document editor) questions, requirements, ideas, and suggestions, including bug reports and feature requests. One can contact us via this GitHub community by creating either an Issue (to report problems with our data or related systems) or a Discussion (to post questions, ideas, or suggestions). We use GitHub's built-in label system to actively curate the content posted in this repository in order to facilitate further interaction, including filtering and searching before creating new entries. In the plazi/community repository, there is a Q&A (question and answer) section with selected questions and answers that may help solve encountered problems. Aiming to increase external participation in the task of liberating taxonomic data, we are developing training courses with independent learning modules that can be combined in different ways to target different audiences (e.g., undergraduates, researchers, developers) in various formats. This material will include text, screenshots, slides, screencasts, and, to a minor extent, online teaching. Each topic within a module will have one or more "inline tests", which will be HTML form-based with hard-coded answers to directly assess progress on the subject covered in that particular topic. At the end of each module, we will have a capstone (a form-based test asking questions about the topics covered in the respective module) which the user can access whenever needed. As examples of our independent learning modules, we can cite Modules I, II, and III and their respective topics.
Module I (Biodiversity Taxonomy Basis) includes introductory topics (e.g., Topic I — Why do we classify living things; Topic II — Linnaean binomial; Topic III — How is taxonomic information displayed in the literature) aimed at those who don't have a biology/taxonomy background. Module II (The Plazi way) topics (Topic I — Plazi mission; Topic II — Taxonomic treatments; Topic III — FAIR taxonomic treatments) are designed so that course takers can learn about Plazi processes. Module III (The Golden Gate Imagine) includes topics (Topic I — Introduction to GGI; Topic II — Other user-interface-based alternatives to annotate documents) about the document editor for marking up documents in XML. Other modules cover subjects such as individual extractions, material and treatment citations, data quality control, and more. On completion of a module, the user will be awarded a certificate. The combination of these certificates will grant badges that will translate into server permissions, allowing the user, for instance, to upload newly liberated taxonomic treatments and edit treatments already in the system. A taxonomic treatment is any piece of information about a given taxon concept that involves, includes, or results from an interpretation of that taxon concept. Additionally, the Plazi TreatmentBank APIs (Application Programming Interfaces) are currently being expanded and redesigned, and the documentation for these long-awaited endpoints will be presented, for the first time, in this talk.


2021
Author(s): Thi Mai Anh Bui, Nhat Hai Nguyen

Precisely locating buggy files for a given bug report is a cumbersome and time-consuming task, particularly in a large-scale project with thousands of source files and bug reports. An efficient bug localization module is desirable to improve the productivity of the software maintenance phase. Many previous approaches rank source files according to their relevance to a given bug report based on simple lexical matching scores. However, the lexical mismatch between the natural language expressions used to describe bug reports and the technical terms of software source code can reduce a bug localization system's accuracy. Incorporating domain knowledge through features such as semantic similarity, the fixing frequency of a source file, the code change history, and similar bug reports is crucial to efficiently locating buggy files. In this paper, we propose a bug localization model, BugLocGA, that leverages both lexical and semantic information and explores the relation between a bug report and a source file through several domain features. Given a bug report, we calculate a ranking score for every source file as a weighted sum of all features, where the weights are trained by a genetic algorithm to maximize the performance of the bug localization model on two evaluation metrics: mean reciprocal rank (MRR) and mean average precision (MAP). Empirical results on several widely used open-source software projects show that our model outperforms state-of-the-art approaches by effectively recommending the relevant files in which a bug should be fixed.
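The weighted-sum ranking and the MRR metric described above are easy to make concrete. In the sketch below the feature names, file names, and weight values are illustrative only; in BugLocGA the weights would be the output of the genetic algorithm rather than fixed constants.

```python
def score(features, weights):
    """Weighted sum of per-file relevance features for one bug report."""
    return sum(weights[k] * features[k] for k in weights)

def rank_files(bug_features, weights):
    """Rank candidate source files by descending relevance score."""
    return sorted(bug_features, key=lambda f: score(bug_features[f], weights),
                  reverse=True)

def mean_reciprocal_rank(rankings, buggy):
    """MRR over queries: rankings[i] is a ranked file list for bug i,
    buggy[i] is that bug's true buggy file."""
    total = 0.0
    for ranked, target in zip(rankings, buggy):
        total += 1.0 / (ranked.index(target) + 1)
    return total / len(rankings)

# Illustrative weights; a genetic algorithm would tune these to maximize MRR/MAP.
weights = {"lexical": 0.4, "semantic": 0.3, "fix_frequency": 0.3}
bug_features = {
    "Parser.java": {"lexical": 0.9, "semantic": 0.8, "fix_frequency": 0.5},
    "Utils.java":  {"lexical": 0.2, "semantic": 0.1, "fix_frequency": 0.9},
}
ranked = rank_files(bug_features, weights)
```

Here `Parser.java` scores 0.75 against 0.38 for `Utils.java`, so it is ranked first; if it is indeed the buggy file, this query contributes a reciprocal rank of 1.0 to the MRR.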


2021, Vol 26 (6)
Author(s): Camila Costa Silva, Matthias Galster, Fabian Gilson

Topic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique to extract human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques to support software engineering tasks (e.g., to support source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and modeling parameters). Our study aims at describing how topic modeling has been applied in software engineering research with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics (i.e., word clusters) were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modelled most, (3) data pre-processing and modeling parameters vary quite a bit and are often vaguely reported, and (4) manual topic naming (such as deducting names based on frequent words in a topic) is common.
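The pre-processing step that the survey found to be vaguely reported typically looks like the sketch below: lowercasing, punctuation stripping, stopword removal, and a minimum token length. The stopword list and `min_len` value here are arbitrary examples; exactly these choices (plus stemming, n-grams, and vocabulary cutoffs) are the parameters that vary between studies and affect the resulting topics.

```python
import re

# Illustrative stopword list; real pipelines use a standard list (e.g., NLTK's).
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "on", "when"}

def preprocess(document, min_len=3):
    """Prepare one document for topic modeling: lowercase, keep only
    alphabetic tokens, drop stopwords and very short tokens."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_len]

docs = ["The parser crashes when reading large files.",
        "Developers discuss the new parser design."]
corpus = [preprocess(d) for d in docs]
```

The token lists in `corpus` would then be fed to an LDA implementation (e.g., gensim's), which infers word clusters such as one grouping "parser", "crashes", and "files".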


2021
Author(s): Pablo Restrepo Henao, Jannik Fischbach, Dominik Spies, Julian Frattini, Andreas Vogelsang
