Contextual Similarity Among Identifier Names: An Empirical Study

2021 ◽

pp. 733-761

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Software Development ◽

Anomaly Detection ◽

Open Source ◽

Large Scale ◽

Source Code ◽

Scale Analysis ◽

Large Scale Analysis ◽

Research Questions

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.

Logging Analysis and Prediction in Open Source Java Project

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Optimizing Contemporary Application and Processes in Open Source Software ◽

10.4018/978-1-5225-5314-4.ch003 ◽

2018 ◽

pp. 57-85

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Software Development ◽

Anomaly Detection ◽

Open Source ◽

Large Scale ◽

Source Code ◽

Scale Analysis ◽

Large Scale Analysis ◽

Research Questions

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.

Associating Natural Language Comment and Source Code Entities

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6382 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8592-8599

Author(s):

Sheena Panthaplackel ◽

Milos Gligoric ◽

Raymond J. Mooney ◽

Junyi Jessy Li

Keyword(s):

Software Development ◽

Natural Language ◽

Open Source ◽

Source Code ◽

Initial Step ◽

Binary Classifier ◽

Sequence Labeling ◽

Evaluation Dataset ◽

Revision Histories

Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.

Embedding Metadata and Other Semantics in Word Processing Documents

International Journal of Digital Curation ◽

10.2218/ijdc.v4i2.96 ◽

2009 ◽

Vol 4 (2) ◽

pp. 93-106 ◽

Cited By ~ 1

Author(s):

Peter Sefton ◽

Ian Barnes ◽

Ron Ward ◽

Jim Downing

Keyword(s):

Semantic Web ◽

Software Development ◽

Open Source ◽

Academic Writing ◽

Word Processing ◽

Source Code ◽

Data Curation ◽

User Testing ◽

Microsoft Word ◽

Computing Platforms

This paper describes a technique for embedding document metadata, and potentially other semantic references, inline in word processing documents, which the authors have implemented with the help of a software development team. Several assumptions underlie the approach: it must be available across computing platforms and work with both Microsoft Word (because of its user base) and OpenOffice.org (because of its free availability). Further, the application needs to be acceptable to and usable by users, so the initial implementation covers only a small number of features, which will be extended only after user testing. Within these constraints the system provides a mechanism not only for encoding simple metadata, but also for inferring hierarchical relationships between metadata elements from a ‘flat’ word processing file. The paper includes links to open source code implementing the techniques as part of a broader suite of tools for academic writing. This addresses tools and software, semantic web and data curation, and integrating curation into research workflows, and will provide a platform for integrating work on ontologies, vocabularies and folksonomies into word processing tools.

Large-scale refactoring challenges and coordination in open source software development

International Journal of Information Systems and Management ◽

10.1504/ijisam.2020.10032700 ◽

2020 ◽

Vol 2 (2) ◽

pp. 150

Author(s):

James Howison ◽

Eunyoung Moon

Keyword(s):

Software Development ◽

Open Source ◽

Open Source Software ◽

Large Scale ◽

Open Source Software Development

Key Concepts and Definitions of Open Source Communities

Encyclopedia of Networked and Virtual Organizations ◽

10.4018/978-1-59904-885-7.ch099 ◽

2010 ◽

pp. 753-760

Author(s):

Ruben van Wendel de Joode ◽

Sebastian Spaeth

Keyword(s):

Software Development ◽

Open Source ◽

Open Source Software ◽

Online Communities ◽

Source Code ◽

Professional Organizations ◽

Large Numbers ◽

Key Concepts ◽

Open Source Communities ◽

Do So

Most open source software is developed in online communities. These communities are typically referred to as “open source software communities” or “OSS communities.” In OSS communities, the source code, which is the human-readable part of software, is treated as something that is open and that should be downloadable and modifiable by anyone who wishes to do so. The availability of the source code has enabled a practice of decentralized software development in which large numbers of people contribute time and effort. Communities like Linux and Apache, for instance, have been able to connect thousands of individual programmers and professional organizations (although most project communities remain relatively small). These people and organizations are not confined to certain geographical places; on the contrary, they come from literally all continents and they interact and collaborate virtually.

Integrating Projects from Multiple Open Source Code Forges

Database Technologies ◽

10.4018/978-1-60566-058-5.ch141 ◽

2009 ◽

pp. 2301-2312

Author(s):

Megan Squire

Keyword(s):

Software Development ◽

Open Source ◽

Relevant Literature ◽

Source Code ◽

Scoring Systems ◽

Open Source Code ◽

Multiple Code ◽

Future Work

Much of the data about free, libre, and open source (FLOSS) software development comes from studies of code forges or code repositories used for managing projects. This paper presents a method for integrating data about open source projects by way of matching projects (entities) across multiple code forges. After a review of the relevant literature, a few of the methods are chosen and applied to the FLOSS domain, including a comparison of some simple scoring systems for pairwise project matches. Finally, the paper describes limitations of this approach and recommendations for future work.

Understanding the Causes of Architecture Changes Using OSS Mailing Lists

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194015400367 ◽

2015 ◽

Vol 25 (09n10) ◽

pp. 1633-1651 ◽

Cited By ~ 2

Author(s):

Wei Ding ◽

Peng Liang ◽

Antony Tang ◽

Hans van Vliet

Keyword(s):

Grounded Theory ◽

Empirical Study ◽

Open Source ◽

Open Source Software ◽

Source Code ◽

Internal Quality ◽

Functional Requirement ◽

Quality Requirement ◽

External Quality ◽

Mailing Lists

The causes of architecture changes explain why an architecture changes, and this knowledge can be captured to prevent architecture knowledge vaporization and architecture degeneration. But the causes are not always known, especially in open source software (OSS) development. This makes it very hard to understand the underlying reasons for the architecture changes and to design appropriate modifications. Architecture information is communicated in development mailing lists of OSS projects. To explore the possibility of identifying and understanding the causes of architecture changes, we conducted an empirical study to analyze architecture information (i.e. architectural threads) communicated in the development mailing lists of two popular OSS projects: Hibernate and ArgoUML, verified architecture changes with source code, and identified the causes of architecture changes from the communicated architecture information. The main findings of this study are: (1) architecture information communicated in OSS mailing lists does lead to architecture changes in code; (2) the major cause for architecture changes in both Hibernate and ArgoUML is preventative changes, and the causes of architecture changes are further classified into functional requirement, external quality requirement, and internal quality requirement using the coding techniques of grounded theory; (3) more than 45% of architecture changes in both projects happened before the first stable version was released.

Empirical study of the effects of open source adoption on software development economics

Journal of Systems and Software ◽

10.1016/j.jss.2007.01.011 ◽

2007 ◽

Vol 80 (9) ◽

pp. 1517-1529 ◽

Cited By ~ 58

Author(s):

Samuel A. Ajila ◽

Di Wu

Keyword(s):

Empirical Study ◽

Software Development ◽

Open Source ◽

Development Economics

Use of Free and Open-Source Software (FOSS) in the U.S. Department of Defense

Terry's Archive Online ◽

10.48034/20030102 ◽

2003 ◽

Vol 2003 (01) ◽

pp. 0102

Author(s):

Terry Bollinger

Keyword(s):

Software Development ◽

Open Source ◽

Open Source Software ◽

Department Of Defense ◽

Low Cost ◽

Source Code ◽

Leading Edge ◽

Cyber Attacks ◽

Software Analysis ◽

The U.S

This report documents the results of a study by The MITRE Corporation on the use of free and open-source software (FOSS) in the U.S. Department of Defense (DoD). FOSS gives users the right to run, copy, distribute, study, change, and improve it as they see fit, without asking permission or making fiscal payments to any external group or person. The study showed that FOSS provides substantial benefits to DoD security, infrastructure support, software development, and research. Given the openness of its source code, the finding that FOSS profoundly benefits security was both counterintuitive and instructive. Banning FOSS in DoD would remove access to exceptionally well-verified infrastructure components such as OpenBSD and robust network and software analysis tools needed to detect and respond to cyberattacks. In addition, losing hands-on access to FOSS source code would reduce DoD’s ability to respond rapidly to cyberattacks. In short, banning FOSS would have immediate, broad, and strongly negative impacts on the DoD’s ability to defend the U.S. against cyberattacks. For infrastructure support, the deep historical ties between FOSS and the emergence of the Internet mean that removing FOSS applications would strongly negatively impact the DoD’s ability to support web and Internet-based applications. Software development would be hit especially hard due to many leading-edge and broadly used tools being FOSS. Finally, the loss of access to low-cost data processing tools and the inability to share results in the more potent form of executable FOSS software would seriously and negatively impact nearly all forms of scientific and data-driven research.
