Quick remedy commits and their impact on mining software repositories

Most changes during software maintenance and evolution are not atomic changes, but rather the result of several related changes affecting different parts of the code. It may happen that developers omit needed changes, thus leaving a task partially unfinished, introducing technical debt or injecting bugs. We present a study investigating “quick remedy commits” performed by developers to implement changes omitted in previous commits. With quick remedy commits we refer to commits that (i) quickly follow a commit performed by the same developer, and (ii) aim at remedying issues introduced as the result of code changes omitted in the previous commit (e.g., fix references to code components that have been broken as a consequence of a rename refactoring) or simply improve the previously committed change (e.g., improve the name of a newly introduced variable). Through a manual analysis of 500 quick remedy commits, we define a taxonomy categorizing the types of changes that developers tend to omit. The taxonomy can (i) guide the development of tools aimed at detecting omitted changes and (ii) help researchers in identifying corner cases that must be properly handled. For example, one of the categories in our taxonomy groups the reverted commits, meaning changes that are undone in a subsequent commit. We show that not accounting for such commits when mining software repositories can undermine one’s findings. In particular, our results show that considering completely reverted commits when mining software repositories accounts, on average, for 0.07 and 0.27 noisy data points when dealing with two typical MSR data collection tasks (i.e., bug-fixing commits identification and refactoring operations mining, respectively).
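
The two-part definition above (same developer, short delay) suggests a simple first-pass filter for candidate quick remedy commits. The following is a minimal sketch, not the study's actual procedure: the `Commit` record, the `find_quick_remedy_candidates` helper, and the five-minute window are all assumptions made for illustration. Criterion (ii), whether the follow-up really remedies an omission, still requires inspecting the change itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record; field names are illustrative."""
    sha: str
    author: str
    timestamp: datetime


def find_quick_remedy_candidates(commits, max_gap=timedelta(minutes=5)):
    """Return (previous, follow_up) pairs of commits that are adjacent in
    time order, share an author, and are at most `max_gap` apart.

    The default window is an assumption, not a value taken from the study.
    """
    ordered = sorted(commits, key=lambda c: c.timestamp)
    pairs = []
    for prev, curr in zip(ordered, ordered[1:]):
        same_author = prev.author == curr.author
        close_in_time = curr.timestamp - prev.timestamp <= max_gap
        if same_author and close_in_time:
            pairs.append((prev, curr))
    return pairs


history = [
    Commit("a1", "alice", datetime(2020, 1, 1, 10, 0)),
    Commit("a2", "alice", datetime(2020, 1, 1, 10, 3)),
    Commit("b1", "bob", datetime(2020, 1, 1, 10, 4)),
    Commit("a3", "alice", datetime(2020, 1, 1, 12, 0)),
]
candidates = find_quick_remedy_candidates(history)
print([(p.sha, f.sha) for p, f in candidates])  # [('a1', 'a2')]
```

Only the pair `a1`/`a2` passes the filter: the other adjacent pairs involve different authors. Each surviving pair is a candidate only and would still need manual or automated inspection of the diff.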

Automatically Labelled Software Topic Model

International Journal of Open Source Software and Processes ◽

10.4018/ijossp.2020010104 ◽

2020 ◽

Vol 11 (1) ◽

pp. 57-78

Author(s):

Youcef Bouziane ◽

Mustapha Kamel Abdi ◽

Salah Sadou

Keyword(s):

Information Retrieval ◽

Software Engineering ◽

Topic Model ◽

Mining Software Repositories ◽

Valuable Data ◽

Empirical Results ◽

Software Repositories ◽

Manual Analysis ◽

Novel Approach ◽

Support Software

Public software repositories (SR) maintain a massive amount of valuable data, offering opportunities to support software engineering (SE) tasks. Researchers have applied information retrieval techniques to mining software repositories, and topic models are one of these techniques. However, topic models provide neither an interpretation nor labels for the extracted topics, so manual analysis is required to identify them. Some approaches have been proposed to label the topics automatically using tags in SR, but they do not consider the existence of spam tags and have difficulty scaling to a large tag space. This article introduces a novel approach called automatically labelled software topic model (AL-STM) that labels the topics based on observed tags in SR. It mitigates the shortcomings of manual and automatic labelling of topics in SE. AL-STM is implemented using 22K GitHub projects and evaluated in an SE task (tag recommendation) against the currently used techniques. The empirical results suggest that AL-STM is more robust in terms of MAP and nDCG, and more scalable to a large tag space.
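
For readers unfamiliar with the two ranking metrics named above, the sketch below shows one common way to compute average precision (MAP is its mean over all projects) and nDCG for a single tag recommendation list. It assumes binary relevance, where a recommended tag is either correct or not; the function names and the toy tags are illustrative and are not taken from the article, whose exact metric variants are not stated in the abstract.

```python
import math


def average_precision(ranked, relevant):
    """Average precision of a ranked list against a set of relevant items."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit's rank
    return precision_sum / len(relevant) if relevant else 0.0


def ndcg(ranked, relevant):
    """nDCG with binary relevance: DCG of the ranking over the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked, start=1)
        if item in relevant
    )
    # The ideal ranking places all relevant items at the top.
    ideal_hits = min(len(relevant), len(ranked))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


recommended = ["python", "web", "cli", "ml"]  # toy ranked tag list
true_tags = {"python", "ml"}                  # toy ground truth
print(average_precision(recommended, true_tags))  # 0.75
print(round(ndcg(recommended, true_tags), 3))
```

In the toy example the correct tags appear at ranks 1 and 4, so the average precision is (1/1 + 2/4) / 2 = 0.75. Both metrics reward placing correct tags near the top of the list.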

SamikshaUmbra: Contribution and Performance Assessment of Software Maintenance Professionals by Mining Software Repositories

2013 20th Asia-Pacific Software Engineering Conference (APSEC) ◽

10.1109/apsec.2013.134 ◽

2013 ◽

Author(s):

Ayushi Rastogi ◽

Ashish Sureka

Keyword(s):

Performance Assessment ◽

Software Maintenance ◽

Mining Software Repositories ◽

Software Repositories ◽

And Performance

Improving enterprise software maintenance efficiency through mining software repositories in an industry context

Companion Proceedings of the 36th International Conference on Software Engineering - ICSE Companion 2014 ◽

10.1145/2591062.2591085 ◽

2014 ◽

Author(s):

Senthil Mani

Keyword(s):

Software Maintenance ◽

Mining Software Repositories ◽

Enterprise Software ◽

Software Repositories

MSR4SM: Using topic models to effectively mining software repositories for software maintenance tasks

Information and Software Technology ◽

10.1016/j.infsof.2015.05.003 ◽

2015 ◽

Vol 66 ◽

pp. 1-12 ◽

Cited By ~ 22

Author(s):

Xiaobing Sun ◽

Bixin Li ◽

Hareton Leung ◽

Bin Li ◽

Yun Li

Keyword(s):

Software Maintenance ◽

Topic Models ◽

Mining Software Repositories ◽

Software Repositories

Augmenting Bug Localization with Part-of-Speech and Invocation

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194017500346 ◽

2017 ◽

Vol 27 (06) ◽

pp. 925-949 ◽

Cited By ~ 5

Author(s):

Yu Zhou ◽

Yanxiang Tong ◽

Taolue Chen ◽

Jin Han

Keyword(s):

Software Maintenance ◽

Large Scale ◽

Bug Localization ◽

Bug Reports ◽

Part Of Speech ◽

Adaptive Technique ◽

Bug Report ◽

Software Maintenance And Evolution ◽

Speech Features ◽

Localization Approach

Bug localization is one of the most expensive and time-consuming activities during software maintenance and evolution. To alleviate the workload of developers, numerous methods have been proposed to automate this process and narrow down the set of files that must be reviewed. In this paper, we present a novel buggy source-file localization approach that uses information from both the bug reports and the source files. We leverage the part-of-speech features of bug reports and the invocation relationship among source files. We also integrate an adaptive technique to further optimize the performance of the approach. The adaptive technique discriminates between Top 1 and Top N recommendations for a given bug report and consists of two modules. One module maximizes the accuracy of the first recommended file, and the other aims at improving the accuracy of the fixed defect file list. We evaluate our approach on six large-scale open source projects, namely AspectJ, Eclipse, SWT, ZXing, Birt and Tomcat. Compared with previous work, empirical results show that our approach improves the overall prediction performance in all of these cases. In particular, the Top 1 recommendation accuracy improves from 22.73% to 39.86% for AspectJ, from 24.36% to 30.76% for Eclipse, from 31.63% to 46.94% for SWT, from 40% to 55% for ZXing, from 7.97% to 21.99% for Birt, and from 33.37% to 38.90% for Tomcat.

BIAS-VARIANCE CONTROL VIA HARD POINTS SHAVING

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001404003460 ◽

2004 ◽

Vol 18 (05) ◽

pp. 891-903 ◽

Cited By ~ 12

Author(s):

STEFANO MERLER ◽

BRUNO CAPRILE ◽

CESARE FURLANELLO

Keyword(s):

Control Strategy ◽

Noisy Data ◽

Real Data ◽

Training Data ◽

Regularization Technique ◽

Data Points ◽

Classification Tasks ◽

Bias Variance

In this paper, we propose a regularization technique for AdaBoost. The method implements a bias-variance control strategy in order to avoid overfitting in classification tasks on noisy data. The method is based on a notion of easy and hard training patterns as emerging from analysis of the dynamical evolutions of AdaBoost weights. The procedure consists in sorting the training data points by a hardness measure, and in progressively eliminating the hardest, stopping at an automatically selected threshold. Effectiveness of the method is tested and discussed on synthetic as well as real data.
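
The sort-and-eliminate procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the hardness scores are taken as given (the paper derives them from the evolution of AdaBoost weights, which is not reproduced here), and the `evaluate` callback is a hypothetical stand-in for the automatic threshold selection, for example a validation error. The names `shave_hardest`, `toy_error`, and `step` are invented for this sketch.

```python
def shave_hardest(points, hardness, evaluate, step=1):
    """Progressively drop the hardest points and keep the best subset.

    points   : list of training examples
    hardness : list of floats, higher means harder (assumed precomputed)
    evaluate : callable(subset) -> error to minimise (hypothetical stand-in
               for the automatic stopping rule)
    step     : number of points removed per iteration
    """
    # Indices sorted from easiest to hardest.
    order = sorted(range(len(points)), key=lambda i: hardness[i])
    best_subset = [points[i] for i in order]
    best_error = evaluate(best_subset)
    keep = len(order) - step
    while keep > 0:
        subset = [points[i] for i in order[:keep]]  # hardest ones shaved off
        error = evaluate(subset)
        if error < best_error:
            best_subset, best_error = subset, error
        keep -= step
    return best_subset, best_error


# Toy data: each point is (name, is_noisy); noisy points get high hardness.
points = [("x1", False), ("x2", False), ("x3", True), ("x4", False), ("x5", True)]
hardness = [0.1, 0.2, 0.9, 0.3, 0.8]


def toy_error(subset):
    # Penalise noisy points kept and, more mildly, clean data thrown away.
    return sum(noisy for _, noisy in subset) + 0.1 * (len(points) - len(subset))


kept, error = shave_hardest(points, hardness, toy_error)
print([name for name, _ in kept])  # ['x1', 'x2', 'x4']
```

With this toy error, shaving stops paying off once both noisy points are gone, so the three clean points are kept. In a real setting `evaluate` would train and validate a classifier on each shaved subset.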

Using Pig as a data preparation language for large-scale mining software repositories studies: An experience report

Journal of Systems and Software ◽

10.1016/j.jss.2011.07.034 ◽

2012 ◽

Vol 85 (10) ◽

pp. 2195-2204 ◽

Cited By ~ 6

Author(s):

Weiyi Shang ◽

Bram Adams ◽

Ahmed E. Hassan

Keyword(s):

Large Scale ◽

Mining Software Repositories ◽

Experience Report ◽

Data Preparation ◽

Software Repositories

Mining Software Repositories for Social Norms

2015 IEEE/ACM 37th IEEE International Conference on Software Engineering ◽

10.1109/icse.2015.209 ◽

2015 ◽

Cited By ~ 3

Author(s):

Hoa Khanh Dam ◽

Bastin Tony Roy Savarimuthu ◽

Daniel Avery ◽

Aditya Ghose

Keyword(s):

Social Norms ◽

Mining Software Repositories ◽

Software Repositories

GrimoireLab: A toolset for software development analytics

PeerJ Computer Science ◽

10.7717/peerj-cs.601 ◽

2021 ◽

Vol 7 ◽

pp. e601

Author(s):

Santiago Dueñas ◽

Valerio Cosentino ◽

Jesus M. Gonzalez-Barahona ◽

Alvaro del Castillo San Felix ◽

Daniel Izquierdo-Cortazar ◽

...

Keyword(s):

Software Development ◽

Preliminary Analysis ◽

Data Retrieval ◽

Mining Software Repositories ◽

Data Sources ◽

Data Sets ◽

Community Based ◽

Academic Environments ◽

Software Repositories ◽

Main Components

Background: After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge. Goal: To produce a reusable toolset supporting the most common tasks when retrieving, curating and visualizing data from software repositories, allowing for the easy reproduction of data sets ready for more complex analytics, and sparing the researcher or the analyst most of the tasks that can be automated. Method: Use our experience in building tools in this domain to identify a collection of scenarios where a reusable toolset would be convenient, and the main components of such a toolset. Then build those components, and refine them incrementally using the feedback from their use in commercial, community-based, and academic environments. Results: GrimoireLab, an efficient toolset composed of five main components, supporting about 30 different kinds of data sources related to software development. It has been tested in many environments, for performing different kinds of studies, and providing different kinds of services. It features a common API for accessing the retrieved data, facilities for relating items from different data sources, semi-structured storage for easing later analysis and reproduction, and basic facilities for visualization, preliminary analysis and drill-down in the data. It is also modular, making it easy to support new kinds of data sources and analysis. Conclusions: We present a mature toolset, widely tested in the field, that can help to improve the situation in the area of reusable tools for mining software repositories. We show some scenarios where it has already been used. We expect it will help to reduce the effort for doing studies or providing services in this area, leading to advances in reproducibility and comparison of results.

Software Maintenance and Evolution

Encyclopedia of Physical Science and Technology ◽

10.1016/b0-12-227410-5/00857-7 ◽

2003 ◽

pp. 15-24

Author(s):

Elizabeth Burd ◽

Malcolm Munro

Keyword(s):

Software Maintenance ◽

Software Maintenance And Evolution
