Contextual Similarity Among Identifier Names: An Empirical Study

2020 ◽  
Author(s):  
Remo De Oliveira Gresta ◽  
Elder Cirilo

Identifiers are one of the most important sources of domain information in software development. Therefore, it is recognized that the proper use of names directly impacts the code's comprehensibility, maintainability, and quality. Our goal in this work is to expand the current knowledge about names by considering not only their quality but also their contextual similarity. To achieve that, we extracted names from four large-scale open-source projects written in Java. Then, we computed the semantic similarity between classes and their attributes/variables using Fasttext, a word embedding algorithm. As a result, we could observe that source code, in general, preserves an acceptable level of contextual similarity, developers avoid using names outside the default dictionary (e.g., domain), and files with more changes and maintained by distinct contributors tend to have a better contextual similarity.

Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.


2020 ◽  
Vol 34 (05) ◽  
pp. 8592-8599
Author(s):  
Sheena Panthaplackel ◽  
Milos Gligoric ◽  
Raymond J. Mooney ◽  
Junyi Jessy Li

Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.


2009 ◽  
Vol 4 (2) ◽  
pp. 93-106 ◽  
Author(s):  
Peter Sefton ◽  
Ian Barnes ◽  
Ron Ward ◽  
Jim Downing

This paper describes a technique for embedding document metadata, and potentially other semantic references, inline in word processing documents, which the authors have implemented with the help of a software development team. Several assumptions underlie the approach: it must be available across computing platforms and work with both Microsoft Word (because of its user base) and OpenOffice.org (because of its free availability). Further, the application needs to be acceptable to and usable by users, so the initial implementation covers only a small number of features, which will only be extended after user testing. Within these constraints the system provides a mechanism not only for encoding simple metadata, but also for inferring hierarchical relationships between metadata elements from a ‘flat’ word processing file. The paper includes links to open source code implementing the techniques as part of a broader suite of tools for academic writing. This addresses tools and software, semantic web and data curation, integrating curation into research workflows and will provide a platform for integrating work on ontologies, vocabularies and folksonomies into word processing tools.


Author(s):  
Ruben van Wendel de Joode ◽  
Sebastian Spaeth

Most open source software is developed in online communities. These communities are typically referred to as “open source software communities” or “OSS communities.” In OSS communities, the source code, which is the human-readable part of software, is treated as something that is open and that should be downloadable and modifiable by anyone who wishes to do so. The availability of the source code has enabled a practice of decentralized software development in which large numbers of people contribute time and effort. Communities like Linux and Apache, for instance, have been able to connect thousands of individual programmers and professional organizations (although most project communities remain relatively small). These people and organizations are not confined to certain geographical places; on the contrary, they come from literally all continents and they interact and collaborate virtually.


2009 ◽  
pp. 2301-2312
Author(s):  
Megan Squire

Much of the data about free, libre, and open source (FLOSS) software development comes from studies of code forges or code repositories used for managing projects. This paper presents a method for integrating data about open source projects by way of matching projects (entities) across multiple code forges. After a review of the relevant literature, a few of the methods are chosen and applied to the FLOSS domain, including a comparison of some simple scoring systems for pairwise project matches. Finally, the paper describes limitations of this approach and recommendations for future work.


2015 ◽  
Vol 25 (09n10) ◽  
pp. 1633-1651 ◽  
Author(s):  
Wei Ding ◽  
Peng Liang ◽  
Antony Tang ◽  
Hans van Vliet

The causes of architecture changes can explain why an architecture changes, and this knowledge can be captured to prevent architecture knowledge vaporization and architecture degeneration. But the causes are not always known, especially in open source software (OSS) development. This makes it very hard to understand the underlying reasons for the architecture changes and to design appropriate modifications. Architecture information is communicated in development mailing lists of OSS projects. To explore the possibility of identifying and understanding the causes of architecture changes, we conducted an empirical study to analyze architecture information (i.e. architectural threads) communicated in the development mailing lists of two popular OSS projects: Hibernate and ArgoUML, verified architecture changes with source code, and identified the causes of architecture changes from the communicated architecture information. The main findings of this study are: (1) architecture information communicated in OSS mailing lists does lead to architecture changes in code; (2) the major cause for architecture changes in both Hibernate and ArgoUML is preventative changes, and the causes of architecture changes are further classified into functional requirement, external quality requirement, and internal quality requirement using the coding techniques of grounded theory; (3) more than 45% of architecture changes in both projects happened before the first stable version was released.


2003 ◽  
Vol 2003 (01) ◽  
pp. 0102
Author(s):  
Terry Bollinger

This report documents the results of a study by The MITRE Corporation on the use of free and open-source software (FOSS) in the U.S. Department of Defense (DoD). FOSS gives users the right to run, copy, distribute, study, change, and improve it as they see fit, without asking permission or making fiscal payments to any external group or person. The study showed that FOSS provides substantial benefits to DoD security, infrastructure support, software development, and research. Given the openness of its source code, the finding that FOSS profoundly benefits security was both counterintuitive and instructive. Banning FOSS in DoD would remove access to exceptionally well-verified infrastructure components such as OpenBSD and robust network and software analysis tools needed to detect and respond to cyberattacks. Finally, losing hands-on access to FOSS source code would reduce DoD’s ability to respond rapidly to cyberattacks. In short, banning FOSS would have immediate, broad, and strongly negative impacts on the DoD’s ability to defend the U.S. against cyberattacks. For infrastructure support, the deep historical ties between FOSS and the emergence of the Internet mean that removing FOSS applications would strongly negatively impact the DoD’s ability to support web and Internet-based applications. Software development would be hit especially hard due to many leading-edge and broadly used tools being FOSS. Finally, the loss of access to low-cost data processing tools and the inability to share results in the more potent form of executable FOSS software would seriously and negatively impact nearly all forms of scientific and data-driven research.

