Free and Open Source Software organizations: A large-scale analysis of code, comments, and commits frequency

PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257192
Author(s):  
Tadeusz Chełkowski ◽  
Dariusz Jemielniak ◽  
Kacper Macikowski

As Free and Open Source Software (FOSS) grows in importance and in use by global corporations, understanding the dynamics of its communities becomes critical. This paper measures up to 21 years of activity in 1,314 individual projects comprising 1.4 billion managed lines of code. After analyzing FOSS activity at the project and organization levels, including commit frequency, source code lines, and code comments, we find that there is less activity now than there was a decade ago. Moreover, our results suggest a greater decrease in activity in large, well-established FOSS organizations. Our findings indicate that as technologies and business strategies related to FOSS mature, the role of large formal FOSS organizations serving as intermediaries between developers diminishes.
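The abstract does not say how commit frequency was computed. As an illustration only, the following minimal sketch assumes commit timestamps exported with `git log --format=%ct` (one Unix timestamp per line) and counts commits per calendar year in UTC. The function names are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def parse_commit_timestamps(git_log_output: str) -> List[int]:
    """Parse `git log --format=%ct` output: one Unix timestamp per line."""
    return [int(token) for token in git_log_output.split()]


def commits_per_year(timestamps: Iterable[int]) -> Dict[int, int]:
    """Count commits per calendar year (UTC), sorted by year."""
    years = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).year for ts in timestamps
    )
    return dict(sorted(years.items()))


# Usage with a hard-coded sample instead of a real repository.
sample_output = "0\n100\n31536000\n"
print(commits_per_year(parse_commit_timestamps(sample_output)))  # {1970: 2, 1971: 1}
```

Comparing such per-year counts across projects is one simple way to see whether activity is declining over time; the study's actual aggregation may differ.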

Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Log statements in source code provide important information to software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most previous studies on logging analysis and prediction derive their insights from only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and the catch-block level. They answer several research questions related to statistical and content analysis. Statistical and content analyses reveal differentiating properties between logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-block logging prediction, which is found to be effective.
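The abstract does not describe the features or classifier used. As a hedged illustration of the preliminary step only, separating logged from nonlogged catch blocks in Java source, here is a minimal sketch. The regular expressions, the set of logger names, and the function names are assumptions for illustration, not the authors' method.

```python
import re
from typing import Iterator, List, Tuple

# Hypothetical heuristic: treat calls like logger.error(...) or LOG.warn(...) as log statements.
LOG_CALL = re.compile(
    r"\b(?:log|logger|LOG|LOGGER)\s*\.\s*(?:trace|debug|info|warn|warning|error|fatal)\s*\("
)
# Matches the head of a Java catch clause, e.g. "catch (IOException e) {".
CATCH_HEAD = re.compile(r"\bcatch\s*\(([^)]*)\)\s*\{")


def catch_blocks(java_source: str) -> Iterator[Tuple[str, str]]:
    """Yield (exception_type, body) for each catch block.

    Braces are matched naively, so braces inside strings or comments can
    confuse it; blocks left unbalanced at end of input are skipped. For a
    multi-catch clause only the first exception type is reported.
    """
    for match in CATCH_HEAD.finditer(java_source):
        depth, i = 1, match.end()
        while i < len(java_source) and depth:
            if java_source[i] == "{":
                depth += 1
            elif java_source[i] == "}":
                depth -= 1
            i += 1
        if depth:  # reached end of input without closing the block
            continue
        parts = [p for p in match.group(1).split() if p != "final"]
        exception_type = parts[0] if parts else ""
        yield exception_type, java_source[match.end():i - 1]


def label_catch_blocks(java_source: str) -> List[Tuple[str, bool]]:
    """Label each catch block as logged (True) or nonlogged (False)."""
    return [(exc, bool(LOG_CALL.search(body))) for exc, body in catch_blocks(java_source)]


sample = """
try { read(); } catch (IOException e) { logger.error("read failed", e); }
try { parse(); } catch (NumberFormatException e) { return 0; }
"""
print(label_catch_blocks(sample))  # [('IOException', True), ('NumberFormatException', False)]
```

Labels like these could serve as training targets for a prediction model; a parser-based extractor would be more robust than regular expressions for real codebases.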


2021 ◽  
Vol 61 ◽  
pp. 102533
Author(s):  
Martin Hirche ◽  
Luke Greenacre ◽  
Magda Nenycz-Thiel ◽  
Simone Loose ◽  
Larry Lockshin

2016 ◽  
Author(s):  
Amin Mahpour

PyMAP is a native Python module for analyzing data from the 450k methylation platform and is freely available for public use. The package can be easily deployed to cloud platforms that support the Python scripting language for large-scale methylation studies. Because it implements fast parsing functionality, the module can be used to analyze large-scale methylation datasets. Additionally, command-line executables shipped with the module can be used to perform common analysis tasks on personal computers.

Availability and implementation: PyMAP is implemented in Python, and the source code is available under the GPL v2 license from http://aminmahpour.github.io/PyMAP/.
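The abstract does not document PyMAP's API or input format. As a hedged sketch of the general idea behind streaming parsing of large methylation tables, the following code assumes a hypothetical tab-separated matrix with a header row (`ID_REF` followed by sample names) and one row per probe holding beta values in [0, 1]. It reads one row at a time, so memory use does not grow with the number of probes. None of these names come from PyMAP.

```python
import csv
import math
from typing import Dict, Iterable

# Hypothetical markers for missing values in the assumed file layout.
MISSING = {"", "NA", "NaN", "null"}


def mean_beta_per_sample(lines: Iterable[str]) -> Dict[str, float]:
    """Stream a tab-separated beta-value matrix and return the mean beta per sample.

    Assumed layout: header "ID_REF<TAB>sample_1<TAB>...", then one row per probe.
    Missing values are skipped; a sample with no values gets NaN.
    """
    reader = csv.reader(lines, delimiter="\t")
    samples = next(reader)[1:]
    totals = [0.0] * len(samples)
    counts = [0] * len(samples)
    for row in reader:
        for j, value in enumerate(row[1:len(samples) + 1]):
            if value.strip() in MISSING:
                continue
            totals[j] += float(value)
            counts[j] += 1
    return {
        name: totals[j] / counts[j] if counts[j] else math.nan
        for j, name in enumerate(samples)
    }


rows = ["ID_REF\tS1\tS2\n", "cg00000001\t0.2\t0.8\n", "cg00000002\t0.4\tNA\n"]
print(mean_beta_per_sample(rows))
```

For a real file, pass an open handle, e.g. `open(path, newline="")`, instead of the in-memory list; `csv.reader` accepts any iterable of lines.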


2018 ◽  
Author(s):  
Pamela H Russell ◽  
Rachel L Johnson ◽  
Shreyas Ananthan ◽  
Benjamin Harnke ◽  
Nichole E Carlson

In recent years, the explosion of genomic data and bioinformatic tools has been accompanied by a growing conversation around reproducibility of results and usability of software. However, the actual state of the body of bioinformatics software remains largely unknown. The purpose of this paper is to investigate the state of source code in the bioinformatics community, specifically looking at relationships between code properties, development activity, developer communities, and software impact. To investigate these issues, we curated a list of 1,720 bioinformatics repositories on GitHub identified through their mentions in peer-reviewed bioinformatics articles. Additionally, we included 23 high-profile repositories identified by their popularity in an online bioinformatics forum. We analyzed repository metadata, source code, development activity, and team dynamics using data made publicly available through the GitHub API, as well as article metadata. We found key relationships within our dataset, including: certain scientific topics are associated with more active code development and higher community interest in the repository; most of the code in the main dataset is written in dynamically typed languages, while most of the code in the high-profile set is statically typed; developer team size is associated with community engagement, and high-profile repositories have larger teams; the proportion of female contributors decreases for high-profile repositories and with seniority level in author lists; and multiple measures of project impact are associated with the simple variable of whether the code was modified at all after paper publication. In addition to providing, to our knowledge, the first large-scale analysis of bioinformatics code, our work will enable future analysis through publicly available data, code, and methods. Code to generate the dataset and reproduce the analysis is provided under the MIT license at https://github.com/pamelarussell/githubbioinformatics. Data are available at https://doi.org/10.17605/OSF.IO/UWHX8.

Author summary

We present, to our knowledge, the first large-scale analysis of bioinformatics source code. The purpose of our work is to contribute data to the growing conversation in the bioinformatics community around reproducibility, code quality, and software usability. We analyze a large collection of bioinformatics software projects, identifying relationships between code properties, development activity, developer communities, and software impact. Throughout the work, we compare the large set of projects to a small set of highly popular bioinformatics tools, highlighting features associated with high-profile projects. We make our data and code publicly available to enable others to build upon our analysis or generate new datasets. The significance of our work is to (1) contribute a large base of knowledge to the bioinformatics community about the state of their software, (2) contribute tools and resources enabling the community to conduct their own analyses, and (3) demonstrate that it is possible to systematically analyze large volumes of bioinformatics code. This work and the provided resources will enable a more effective, data-driven conversation around software practices in the bioinformatics community.
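The abstract highlights the simple variable of whether code was modified at all after paper publication. A minimal sketch of how such a flag might be derived, assuming each repository comes with a list of ISO 8601 commit timestamps (GitHub-style, ending in "Z") and a publication date, is shown below. The function names are hypothetical, and counting only commits dated strictly after the publication day is an assumption; the authors' released code may define it differently.

```python
from datetime import date, datetime
from typing import Dict, Iterable, List, Tuple


def commit_date(iso_timestamp: str) -> date:
    """Parse a timestamp such as "2017-02-01T09:30:00Z" into a calendar date."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00")).date()


def modified_after_publication(commit_timestamps: Iterable[str], published: date) -> bool:
    """True if any commit is dated strictly after the publication date."""
    return any(commit_date(ts) > published for ts in commit_timestamps)


def share_modified(repos: Dict[str, Tuple[List[str], date]]) -> float:
    """Fraction of repositories with at least one post-publication commit."""
    if not repos:
        return 0.0
    flags = [modified_after_publication(ts, pub) for ts, pub in repos.values()]
    return sum(flags) / len(flags)


repos = {
    "tool-a": (["2016-05-01T10:00:00Z", "2017-02-01T09:30:00Z"], date(2016, 12, 1)),
    "tool-b": (["2015-01-10T08:00:00Z"], date(2015, 6, 1)),
}
print(share_modified(repos))  # 0.5
```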


2015 ◽  
Vol 6 (1) ◽  
pp. 49-73 ◽  
Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Log statements in source code provide important information to software developers because they are useful in various software development activities. Most previous studies on logging analysis and prediction derive their insights from only a few code constructs. In this paper, the authors perform an in-depth and large-scale analysis of logging code constructs at two levels: the file level and the catch-block level. They answer nine research questions related to statistical and content analysis. Statistical analysis at the file level reveals that relatively few files contain log statements, but logged files are more complex than non-logged files. Results show a positive correlation between the size of logged files and their logging count. Statistical analysis of catch-blocks shows that try-blocks associated with logged catch-blocks are more complex than those associated with non-logged catch-blocks, and that the logging ratio of an exception type is project-specific. Content-based analysis of catch-blocks reveals different topics in the try-blocks associated with logged and non-logged catch-blocks.
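The finding that the logging ratio of an exception type is project-specific can be made concrete. Taking the logging ratio of an exception type as the number of logged catch blocks for that type divided by all catch blocks for that type (an assumed definition), a minimal sketch over hypothetical (project, exception type, is-logged) records computes it separately per project:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CatchRecord = Tuple[str, str, bool]  # (project, exception_type, is_logged); hypothetical schema


def logging_ratios(records: Iterable[CatchRecord]) -> Dict[Tuple[str, str], float]:
    """Logging ratio per (project, exception type): logged catch blocks / all catch blocks."""
    logged: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for project, exception_type, is_logged in records:
        key = (project, exception_type)
        total[key] += 1
        logged[key] += int(is_logged)
    return {key: logged[key] / total[key] for key in total}


# Made-up records: the same exception type is logged at different rates in two projects.
records = [
    ("project-a", "IOException", True),
    ("project-a", "IOException", True),
    ("project-a", "IOException", False),
    ("project-b", "IOException", True),
    ("project-b", "IOException", False),
]
for (project, exception_type), ratio in sorted(logging_ratios(records).items()):
    print(f"{project} {exception_type}: {ratio:.2f}")
# project-a IOException: 0.67
# project-b IOException: 0.50
```

Keying by (project, exception type) rather than by exception type alone is what exposes per-project differences; pooling all projects would hide them.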


2021 ◽  
Author(s):  
Vinuri Bandara ◽  
Thisura Rathnayake ◽  
Nipuna Weerasekara ◽  
Charitha Elvitigala ◽  
Kenneth Thilakarathna ◽  
...  

Author(s):  
C. A. Ardagna

Nowadays, a global information infrastructure connects remote parties through large-scale networks, and many companies focus on developing e-services based on remote resources and on interactions between remote parties. In this context, e-government (e-gov) systems have become of paramount importance for public administration, and many ongoing development projects target their implementation, security, and release (Bettini, Jajodia, Sean Wang, & Wijesekera, 2002). For open-source software to play an important role in this scenario, three main technological requirements must be fulfilled: (1) the identification and optimization of de facto standards for building e-gov open-source software components, (2) the adoption of open-source techniques to secure e-gov services, and (3) the standard integration of these components into an open-source middleware layer capable of delivering a completely open-source e-gov solution. This article argues that e-gov systems should be built on an open-source middleware layer, with full public responsibility for its development. The role of open-source middleware in deploying secure e-gov services is discussed, focusing on implementing a security environment without custom programming. An alternative solution is proposed: the adoption of a stand-alone architecture that fulfils all security requirements.

