From Aristotle to Ringelmann: a large-scale analysis of team productivity and coordination in Open Source Software projects

2015 ◽  
Vol 21 (2) ◽  
pp. 642-683 ◽  
Author(s):  
Ingo Scholtes ◽  
Pavlin Mavrodiev ◽  
Frank Schweitzer
PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257192
Author(s):  
Tadeusz Chełkowski ◽  
Dariusz Jemielniak ◽  
Kacper Macikowski

As Free and Open Source Software (FOSS) grows in importance and in use by global corporations, understanding the dynamics of its communities becomes critical. This paper measures up to 21 years of activity across 1,314 individual projects comprising 1.4 billion lines of code. After analyzing FOSS activity at the project and organization levels, using metrics such as commit frequency, source code lines, and code comments, we find that there is less activity now than there was a decade ago. Moreover, our results suggest a greater decrease in activity in large, well-established FOSS organizations. Our findings indicate that as technologies and business strategies related to FOSS mature, the role of large formal FOSS organizations serving as intermediaries between developers diminishes.
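The declining-activity finding rests on trends in per-year activity counts. A minimal sketch of such a trend check is below; the metric (least-squares slope of yearly commits) and the commit counts are illustrative assumptions, not the paper's data or method:

```python
from statistics import mean

def activity_trend(yearly_commits):
    """Least-squares slope of commits over years; a negative slope
    indicates declining activity."""
    years = list(range(len(yearly_commits)))
    x_bar, y_bar = mean(years), mean(yearly_commits)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, yearly_commits))
    den = sum((x - x_bar) ** 2 for x in years)
    return num / den

# Hypothetical yearly commit counts for one FOSS organization
counts = [1200, 1150, 1000, 900, 850, 800]
slope = activity_trend(counts)  # negative here: activity is declining
```

On real data one would compute this per project or per organization and compare slopes across organization sizes, as the paper's project-versus-organization comparison suggests.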


Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Log statements in source code provide important information to software developers because they support various software development activities such as debugging, anomaly detection, and remote issue resolution. Most previous studies on logging analysis and prediction draw insights from only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and the catch-block level. They answer several research questions through statistical and content analysis, which reveals differentiating properties between logged and non-logged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-block logging prediction and find it to be effective.
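The chapter's model and feature set are not specified in this abstract. As a hedged illustration of the general idea, content features extracted from a catch block feeding a classifier, the sketch below uses toy features and a nearest-centroid classifier; every feature choice and example catch block here is an assumption:

```python
from math import dist

def features(catch_block):
    """Toy content features for a catch block (illustrative only,
    not the chapter's actual feature set)."""
    text = catch_block.lower()
    return (1.0 if "throw" in text else 0.0,   # rethrows are often left unlogged
            1.0 if "return" in text else 0.0,
            float(len(text.splitlines())))     # larger blocks tend to be logged

def centroid(points):
    return tuple(sum(col) / len(col) for col in zip(*points))

def predict_logged(block, logged_centroid, unlogged_centroid):
    x = features(block)
    return dist(x, logged_centroid) < dist(x, unlogged_centroid)

# Tiny hand-labeled training set (hypothetical Java catch blocks)
logged = [
    "catch (IOException e) {\n  cleanup();\n  retry();\n  notify();\n  log.error(e);\n}",
    "catch (SQLException e) {\n  rollback();\n  log.warn(e);\n  return null;\n}",
]
unlogged = [
    "catch (Exception e) {\n  throw new RuntimeException(e);\n}",
    "catch (Exception e) { throw e; }",
]
c_logged = centroid([features(b) for b in logged])
c_unlogged = centroid([features(b) for b in unlogged])
```

A new catch block is then classified by whichever class centroid its feature vector is nearer to; the chapter's actual model may differ substantially.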


2019 ◽  
Vol 180 ◽  
pp. 1-15 ◽  
Author(s):  
Carmine Vassallo ◽  
Giovanni Grano ◽  
Fabio Palomba ◽  
Harald C. Gall ◽  
Alberto Bacchelli


2018 ◽  
Vol 138 (8) ◽  
pp. 1011-1019 ◽  
Author(s):  
Ayako Masuda ◽  
Chikako Morimoto ◽  
Tohru Matsuodani ◽  
Kazuhiko Tsuda

2021 ◽  
Vol 10 (1) ◽  
pp. 34
Author(s):  
Shinji Akatsu ◽  
Ayako Masuda ◽  
Tsuyoshi Shida ◽  
Kazuhiko Tsuda

Open source software (OSS) has seen remarkable progress in recent years, and OSS usage in corporate information systems has been increasing steadily; consequently, the overall impact of OSS on society is growing as well. While the product quality of enterprise software is assured by its provider, OSS deliverables are developed by the OSS developer community, so their quality is not guaranteed. The objective of this study is therefore to build an artificial-intelligence-based quality prediction model that corporate businesses could use when deciding whether to adopt a given OSS. We define the quality of an OSS as "the resolution rate of issues processed by OSS developers, as well as the promptness and continuity of doing so." We selected 44 large-scale OSS projects from GitHub for our quality analysis. First, we investigated the monthly changes in the status of issue creation and resolution for each project. We found three distinct patterns in the growth of issue creation and three patterns in the relationship between the growth of issue creation and that of resolution, and confirmed multiple cases of each pattern affecting the final resolution rate. Next, we investigated the correlation between the final resolution rate and the resolution rate a given number of months after issue creation. Even the correlation coefficient between the first-month resolution rate and the final rate exceeded 0.5. Based on these results, we conclude that the issue resolution rate in the first month after an issue is created can serve as knowledge for knowledge-based AI systems that assist decision-making regarding OSS adoption in business projects.
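The correlation check described above can be sketched with a plain Pearson coefficient. The per-project rates below are hypothetical stand-ins, not the study's data:

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-project rates: fraction of issues resolved within the
# first month after creation vs. the final resolution rate
first_month = [0.20, 0.35, 0.50, 0.10, 0.40]
final_rate  = [0.55, 0.70, 0.85, 0.40, 0.75]
r = pearson(first_month, final_rate)  # strongly positive on this toy data
```

A coefficient above 0.5, as the study reports for the first month, is what makes the early rate usable as a proxy for the final rate.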


2021 ◽  
Author(s):  
Mefta Sadat

The same defect may be rediscovered by multiple clients, causing unplanned outages and reducing customer satisfaction. One solution is to force clients to install a fix for every defect. However, this approach is economically infeasible: it requires extra resources and increases downtime, and new fixes may regress existing functionality. Our goal is to proactively predict the defects that a client may rediscover in the future. We build a predictive model by leveraging recommender algorithms and evaluate our approach on rediscovery data extracted from four groups of large-scale open source software projects (namely, Eclipse, Gentoo, KDE, and Libre) and one enterprise software product. The datasets contain information about approximately 1.33 million unique defect reports over a period of 18 years (1999-2017). Our proposed approach may help in understanding the defect rediscovery phenomenon, leading to improved software quality and customer satisfaction.
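The abstract does not spell out which recommender algorithm is used. As a hedged sketch of the general family, the code below scores unseen defects for a client by cosine similarity between defect columns of a client-defect rediscovery matrix; the item-based formulation and the matrix itself are assumptions for illustration:

```python
from math import sqrt

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

# Rows are clients, columns are defects; 1 means the client has already
# rediscovered that defect (hypothetical data)
matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]

def recommend(client, matrix):
    """Item-based scoring: rank unseen defects by their similarity to the
    defects this client has already rediscovered; return the top one."""
    cols = list(zip(*matrix))
    seen = matrix[client]
    scores = {d: sum(cosine(col, cols[j]) for j, s in enumerate(seen) if s)
              for d, col in enumerate(cols) if not seen[d]}
    return max(scores, key=scores.get)
```

For client 0, defect 2 scores highest because clients with a similar rediscovery history have hit it; proactively shipping that fix is the kind of decision the thesis's model is meant to support.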


2021 ◽  
Author(s):  
Vinuri Bandara ◽  
Thisura Rathnayake ◽  
Nipuna Weerasekara ◽  
Charitha Elvitigala ◽  
Kenneth Thilakarathna ◽  
...  

