Free and Open Source Software organizations: A large-scale analysis of code, comments, and commits frequency

From Aristotle to Ringelmann: a large-scale analysis of team productivity and coordination in Open Source Software projects

Empirical Software Engineering ◽

10.1007/s10664-015-9406-4 ◽

2015 ◽

Vol 21 (2) ◽

pp. 642-683 ◽

Cited By ~ 19

Author(s):

Ingo Scholtes ◽

Pavlin Mavrodiev ◽

Frank Schweitzer

Keyword(s):

Open Source ◽

Open Source Software ◽

Large Scale ◽

Scale Analysis ◽

Software Projects ◽

Large Scale Analysis ◽

Team Productivity

Logging Analysis and Prediction in Open Source Java Project

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Optimizing Contemporary Application and Processes in Open Source Software ◽

10.4018/978-1-5225-5314-4.ch003 ◽

2018 ◽

pp. 57-85

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Software Development ◽

Anomaly Detection ◽

Open Source ◽

Large Scale ◽

Source Code ◽

Scale Analysis ◽

Large Scale Analysis ◽

Research Questions

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.

SGL: A Domain-Specific Language for Large-Scale Analysis of Open-Source Code

2018 IEEE Cybersecurity Development (SecDev) ◽

10.1109/secdev.2018.00016 ◽

2018 ◽

Author(s):

Darius Foo ◽

Ming Yi Ang ◽

Jason Yeo ◽

Asankhaya Sharma

Keyword(s):

Open Source ◽

Large Scale ◽

Source Code ◽

Domain Specific Language ◽

Scale Analysis ◽

Specific Language ◽

Open Source Code ◽

Domain Specific ◽

Large Scale Analysis

SKU performance and distribution: A large-scale analysis of the role of product characteristics with store scanner data

Journal of Retailing and Consumer Services ◽

10.1016/j.jretconser.2021.102533 ◽

2021 ◽

Vol 61 ◽

pp. 102533

Author(s):

Martin Hirche ◽

Luke Greenacre ◽

Magda Nenycz-Thiel ◽

Simone Loose ◽

Larry Lockshin

Keyword(s):

Large Scale ◽

Scale Analysis ◽

Scanner Data ◽

Product Characteristics ◽

Large Scale Analysis

pyMAP: a Python package for small and large scale analysis of Illumina 450k methylation platform

10.1101/078048 ◽

2016 ◽

Cited By ~ 1

Author(s):

Amin Mahpour

Keyword(s):

Large Scale ◽

Source Code ◽

Command Line ◽

Scale Analysis ◽

Illumina 450K ◽

Large Scale Analysis ◽

Python Scripting ◽

Scripting Language ◽

Python Package ◽

450K Methylation

Abstract: PyMAP is a native Python module for analysis of the 450k methylation platform and is freely available for public use. The package can be easily deployed to cloud platforms that support the Python scripting language for large-scale methylation studies. By implementing fast parsing functionality, this module can be used to analyze large-scale methylation datasets. Additionally, command-line executables shipped with the module can be used to perform common analysis tasks on personal computers.

Availability and implementation: PyMAP is implemented in Python and the source code is available under the GPL v2 license from http://aminmahpour.github.io/PyMAP/.

A large-scale analysis of bioinformatics code on GitHub

10.1101/321919 ◽

2018 ◽

Author(s):

Pamela H Russell ◽

Rachel L Johnson ◽

Shreyas Ananthan ◽

Benjamin Harnke ◽

Nichole E Carlson

Keyword(s):

Large Scale ◽

Source Code ◽

The State ◽

Scale Analysis ◽

Development Activity ◽

High Profile ◽

Link Type ◽

Large Scale Analysis ◽

Bioinformatics Software ◽

Bioinformatics Community

Abstract: In recent years, the explosion of genomic data and bioinformatic tools has been accompanied by a growing conversation around reproducibility of results and usability of software. However, the actual state of the body of bioinformatics software remains largely unknown. The purpose of this paper is to investigate the state of source code in the bioinformatics community, specifically looking at relationships between code properties, development activity, developer communities, and software impact. To investigate these issues, we curated a list of 1,720 bioinformatics repositories on GitHub through their mention in peer-reviewed bioinformatics articles. Additionally, we included 23 high-profile repositories identified by their popularity in an online bioinformatics forum. We analyzed repository metadata, source code, development activity, and team dynamics using data made available publicly through the GitHub API, as well as article metadata. We found key relationships within our dataset, including: certain scientific topics are associated with more active code development and higher community interest in the repository; most of the code in the main dataset is written in dynamically typed languages, while most of the code in the high-profile set is statically typed; developer team size is associated with community engagement and high-profile repositories have larger teams; the proportion of female contributors decreases for high-profile repositories and with seniority level in author lists; and, multiple measures of project impact are associated with the simple variable of whether the code was modified at all after paper publication. In addition to providing the first large-scale analysis of bioinformatics code to our knowledge, our work will enable future analysis through publicly available data, code, and methods. Code to generate the dataset and reproduce the analysis is provided under the MIT license at https://github.com/pamelarussell/githubbioinformatics. Data are available at https://doi.org/10.17605/OSF.IO/UWHX8.

Author summary: We present, to our knowledge, the first large-scale analysis of bioinformatics source code. The purpose of our work is to contribute data to the growing conversation in the bioinformatics community around reproducibility, code quality, and software usability. We analyze a large collection of bioinformatics software projects, identifying relationships between code properties, development activity, developer communities, and software impact. Throughout the work, we compare the large set of projects to a small set of highly popular bioinformatics tools, highlighting features associated with high-profile projects. We make our data and code publicly available to enable others to build upon our analysis or generate new datasets. The significance of our work is to (1) contribute a large base of knowledge to the bioinformatics community about the state of their software, (2) contribute tools and resources enabling the community to conduct their own analyses, and (3) demonstrate that it is possible to systematically analyze large volumes of bioinformatics code. This work and the provided resources will enable a more effective, data-driven conversation around software practices in the bioinformatics community.

Two Level Empirical Study of Logging Statements in Open Source Java Projects

International Journal of Open Source Software and Processes ◽

10.4018/ijossp.2015010104 ◽

2015 ◽

Vol 6 (1) ◽

pp. 49-73 ◽

Cited By ~ 3

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Content Analysis ◽

Statistical Analysis ◽

Empirical Study ◽

Large Scale ◽

Source Code ◽

Scale Analysis ◽

Software Developers ◽

Specific Content ◽

Large Scale Analysis ◽

Research Questions

Log statements present in source code provide important information to software developers because they are useful in various software development activities. Most previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this paper, the authors perform an in-depth and large-scale analysis of logging code constructs at two levels. They answer nine research questions related to statistical and content analysis. Statistical analysis at the file level reveals that fewer files contain log statements, but logged files have greater complexity than non-logged files. Results show that a positive correlation exists between the size and the logging count of the logged files. Statistical analysis of catch-blocks shows that try-blocks associated with logged catch-blocks have greater complexity than those associated with non-logged catch-blocks, and that the logging ratio of an exception type is project specific. Content-based analysis of catch-blocks reveals the presence of different topics in try-blocks associated with logged and non-logged catch-blocks.

Demo: Large Scale Analysis on Vulnerability Remediation in Open-source JavaScript Projects

10.1145/3460120.3485357 ◽

2021 ◽

Author(s):

Vinuri Bandara ◽

Thisura Rathnayake ◽

Nipuna Weerasekara ◽

Charitha Elvitigala ◽

Kenneth Thilakarathna ◽

...

Keyword(s):

Open Source ◽

Large Scale ◽

Scale Analysis ◽

Large Scale Analysis

Open-Source Solution to Secure E-Government Services

Encyclopedia of Digital Government ◽

10.4018/978-1-59140-789-8.ch197 ◽

2011 ◽

pp. 1300-1305

Author(s):

C. A. Ardagna

Keyword(s):

Open Source ◽

Open Source Software ◽

Large Scale ◽

Security Requirements ◽

The Public ◽

Security Environment ◽

Public Responsibility ◽

Large Scale Networks ◽

Standard Integration

Nowadays, a global information infrastructure connects remote parties through the use of large-scale networks, and many companies focus on developing e-services based on remote resources and on interactions between remote parties. In such a context, e-government (e-gov) systems have become of paramount importance for the public administration, and many ongoing development projects are targeted at their implementation, security, and release (Bettini, Jajodia, Sean Wang, & Wijesekera, 2002). For open-source software to play an important role in this scenario, three main technological requirements must be fulfilled: (1) the identification and optimization of de facto standards for building e-gov open-source software components, (2) the adoption of open-source techniques to secure e-gov services, and (3) the standard integration of these components into an open-source middleware layer, capable of conveying a completely open-source e-gov solution. This article highlights that e-gov systems should be constructed on an open-source middleware layer, providing full public responsibility in its development. The role of open-source middleware for secure e-gov services deployment is discussed, focusing on implementing a security environment without custom programming. An alternative solution is given and consists of the adoption of a stand-alone architecture that fulfils all security requirements.
