mining software repositories
Recently Published Documents


TOTAL DOCUMENTS

108
(FIVE YEARS 22)

H-INDEX

17
(FIVE YEARS 1)

2021 ◽  
Vol 27 (1) ◽  
Author(s):  
Nicolas E. Gold ◽  
Jens Krinke

AbstractResearch in Mining Software Repositories (MSR) is research involving human subjects, as the repositories usually contain data about developers’ and users’ interactions with the repositories and with each other. The ethics issues raised by such research therefore need to be considered before beginning. This paper presents a discussion of ethics issues that can arise in MSR research, using the mining challenges from the years 2006 to 2021 as a case study to identify the kinds of data used. On the basis of contemporary research ethics frameworks we discuss ethics challenges that may be encountered in creating and using repositories and associated datasets. We also report some results from a small community survey of approaches to ethics in MSR research. In addition, we present four case studies illustrating typical ethics issues one encounters in projects and how ethics considerations can shape projects before they commence. Based on our experience, we present some guidelines and practices that can help in considering potential ethics issues and reducing risks.


2021 ◽  
Vol 27 (1) ◽  
Author(s):  
Fengcai Wen ◽  
Csaba Nagy ◽  
Michele Lanza ◽  
Gabriele Bavota

AbstractMost changes during software maintenance and evolution are not atomic changes, but rather the result of several related changes affecting different parts of the code. It may happen that developers omit needed changes, thus leaving a task partially unfinished, introducing technical debt or injecting bugs. We present a study investigating “quick remedy commits” performed by developers to implement changes omitted in previous commits. With quick remedy commits we refer to commits that (i) quickly follow a commit performed by the same developer, and (ii) aim at remedying issues introduced as the result of code changes omitted in the previous commit (e.g., fix references to code components that have been broken as a consequence of a rename refactoring) or simply improve the previously committed change (e.g., improve the name of a newly introduced variable). Through a manual analysis of 500 quick remedy commits, we define a taxonomy categorizing the types of changes that developers tend to omit. The taxonomy can (i) guide the development of tools aimed at detecting omitted changes and (ii) help researchers in identifying corner cases that must be properly handled. For example, one of the categories in our taxonomy groups the reverted commits, meaning changes that are undone in a subsequent commit. We show that not accounting for such commits when mining software repositories can undermine one’s findings. In particular, our results show that considering completely reverted commits when mining software repositories accounts, on average, for 0.07 and 0.27 noisy data points when dealing with two typical MSR data collection tasks (i.e., bug-fixing commits identification and refactoring operations mining, respectively).


2021 ◽  
Vol 46 (3) ◽  
pp. 26-29
Author(s):  
Dennis Pagano ◽  
Walid Maalej

A decade ago, the rise of GitHub and StackOverflow as social version control and knowledge sharing environments was about to start. Social media like Twitter were mocked by some software engineering researchers and practitioners as "tools for kids not professionals". At that time, we published one of the first papers [12] on social media in software engineering at MSR 2011, the Mining Software Repositories Conference.


2021 ◽  
Vol 7 ◽  
pp. e601
Author(s):  
Santiago Dueñas ◽  
Valerio Cosentino ◽  
Jesus M. Gonzalez-Barahona ◽  
Alvaro del Castillo San Felix ◽  
Daniel Izquierdo-Cortazar ◽  
...  

Background After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room to improvement in the area of reusable tools implementing this knowledge. Goal To produce a reusable toolset supporting the most common tasks when retrieving, curating and visualizing data from software repositories, allowing for the easy reproduction of data sets ready for more complex analytics, and sparing the researcher or the analyst of most of the tasks that can be automated. Method Use our experience in building tools in this domain to identify a collection of scenarios where a reusable toolset would be convenient, and the main components of such a toolset. Then build those components, and refine them incrementally using the feedback from their use in both commercial, community-based, and academic environments. Results GrimoireLab, an efficient toolset composed of five main components, supporting about 30 different kinds of data sources related to software development. It has been tested in many environments, for performing different kinds of studies, and providing different kinds of services. It features a common API for accessing the retrieved data, facilities for relating items from different data sources, semi-structured storage for easing later analysis and reproduction, and basic facilities for visualization, preliminary analysis and drill-down in the data. It is also modular, making it easy to support new kinds of data sources and analysis. Conclusions We present a mature toolset, widely tested in the field, that can help to improve the situation in the area of reusable tools for mining software repositories. We show some scenarios where it has already been used. We expect it will help to reduce the effort for doing studies or providing services in this area, leading to advances in reproducibility and comparison of results.


Sign in / Sign up

Export Citation Format

Share Document