Archetypal source code searches: a survey of software developers and maintainers

The business models of software/platform as a service have contributed to developers' dependence on the Internet. Developers can rapidly point each other and consumers to the newest software changes with the power of the hyperlink. However, developers are not limited to referencing software changes through the web; other shared hypermedia include links to Stack Overflow, Twitter, and issue trackers. This work explores the software traceability of Uniform Resource Locators (URLs) that software developers leave in commit messages and software repositories. URLs are easily extracted from commit messages and source code, so it would be useful to researchers if URLs provided additional insight into project development. To assess traceability, manual topic labelling is evaluated against automated topic labelling on URL data sets. This work also shows differences between URL data collected from commit messages and URL data collected from source code. In addition, this work explores outlying software projects with many URLs, in case these projects do not provide meaningful software relationship information. Results from manual topic labelling show promise under evaluation, while automated topic labelling did not yield precise topics. Further investigation of manual and automated topic analysis would be useful.
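The claim that URLs are easily extracted from commit messages can be illustrated with a minimal sketch. The regular expression, the helper names `extract_urls` and `count_domains`, and the sample commit messages are all assumptions made for illustration; the abstract does not state which extraction rule the authors used.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: a simple pattern for http(s) URLs. The abstract does not
# specify the authors' extraction rule, so this is only one plausible choice.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return all http(s) URLs found in text, with trailing punctuation stripped."""
    return [url.rstrip(".,;:") for url in URL_PATTERN.findall(text)]


def count_domains(messages):
    """Count how often each host name appears across a list of commit messages."""
    counts = Counter()
    for message in messages:
        for url in extract_urls(message):
            counts[urlparse(url).netloc.lower()] += 1
    return counts


# Hypothetical commit messages, for illustration only.
messages = [
    "Fix crash, see https://stackoverflow.com/q/12345.",
    "Closes https://github.com/example/repo/issues/7 (thanks!)",
    "Refactor parser",
]
print(count_domains(messages))
```

Grouping by host name is only a first step; the topic labelling that the abstract evaluates would operate on the extracted URLs afterwards.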

Motivation of Open Source Developers

Multi-Disciplinary Advancement in Open Source Software and Processes ◽

10.4018/978-1-60960-513-1.ch014 ◽

2011 ◽

pp. 233-249

Author(s):

Mark R. Allyn ◽

Ram B. Misra

Keyword(s):

Open Source ◽

Open Source Software ◽

Organizational Design ◽

Source Code ◽

Software Developers ◽

Open Source Code ◽

And Performance ◽

The Many ◽

And Control ◽

Work Done

The motivational drivers of open source software developers have been researched by various investigators since about 2000. This work shows that developers are motivated by different extrinsic and intrinsic drivers, among them community aspirations, reciprocity and fairness, creative impulses, and monetary and career ambitions. Some work has studied whether the profile of developer motivations is constant across open source projects or is sensitive to project organizational design. Among the many factors that could influence the mix of motives of OS developers is the license under which the work is performed. Licenses range in openness from those such as the GNU GPL, which severely restrict the freedom of developers to mingle their OS code with proprietary code, to those such as BSD licenses, which allow programmers much greater latitude in integrating open source code with proprietary code. In addition to formal rules, meritocracies emerge to reward effort and performance, and also to direct, coordinate, and control other participants. The authors discuss these variables and how they may be related to motivations.

Evaluation indicators for open-source software: a review

Cybersecurity ◽

10.1186/s42400-021-00084-8 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Yuhang Zhao ◽

Ruigang Liang ◽

Xiang Chen ◽

Jing Zou

Keyword(s):

Supply Chain ◽

Comparative Analysis ◽

Open Source ◽

Open Source Software ◽

Source Code ◽

Comprehensive Understanding ◽

Security Risks ◽

Software Developers ◽

Security Vulnerabilities ◽

The Past

Abstract: In recent years, the widespread application of open-source software (OSS) has brought great convenience to software developers. However, OSS always faces unavoidable security risks, such as open-source code defects and security vulnerabilities. To find OSS risks in time, we carry out an empirical study to identify the indicators for evaluating OSS. To achieve a comprehensive understanding of OSS assessment, we collect 56 papers from prestigious academic venues (such as IEEE Xplore, ACM Digital Library, DBLP, and Google Scholar) in the past 21 years. During the investigation, we first identify the main concerns for selecting OSS and distill five types of commonly used indicators to assess OSS. We then conduct a comparative analysis to discuss how these indicators are used in each surveyed study and how they differ. Moreover, we undertake a correlation analysis between these indicators and uncover 13 confirmed conclusions and four cases of controversy in these studies. Finally, we discuss several possible applications of these conclusions, which are insightful for research on OSS and the software supply chain.

Two Level Empirical Study of Logging Statements in Open Source Java Projects

International Journal of Open Source Software and Processes ◽

10.4018/ijossp.2015010104 ◽

2015 ◽

Vol 6 (1) ◽

pp. 49-73 ◽

Cited By ~ 3

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Content Analysis ◽

Statistical Analysis ◽

Empirical Study ◽

Large Scale ◽

Source Code ◽

Scale Analysis ◽

Software Developers ◽

Specific Content ◽

Large Scale Analysis ◽

Research Questions

Log statements present in source code provide important information to software developers because they are useful in various software development activities. Most previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this paper, the authors perform an in-depth and large-scale analysis of logging code constructs at two levels. They answer nine research questions related to statistical and content analysis. Statistical analysis at file level reveals that relatively few files contain log statements, but logged files have greater complexity than non-logged files. Results show that a positive correlation exists between the size and the logging count of logged files. Statistical analysis of catch-blocks shows that try-blocks associated with logged catch-blocks have greater complexity than those associated with non-logged catch-blocks, and that the logging ratio of an exception type is project specific. Content-based analysis of catch-blocks reveals the presence of different topics in try-blocks associated with logged and non-logged catch-blocks.
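The distinction between logged and non-logged catch-blocks can be sketched in a few lines. The study analyzes Java projects; the sketch below is a Python analogue that counts `except` handlers containing a logging call, using the standard `ast` module. The helper names, the set of method names treated as logging calls, and the sample source are assumptions for illustration, not the authors' tooling.

```python
import ast

# Assumption: any attribute call with one of these names counts as logging.
# This is a heuristic; it does not check that the receiver is a real logger.
LOG_METHODS = {"debug", "info", "warning", "error", "exception", "critical"}


def is_logged(handler):
    """Return True if the except handler contains a call such as logger.error(...)."""
    for node in ast.walk(handler):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in LOG_METHODS):
            return True
    return False


def logging_ratio(source):
    """Return (logged, total) counts of except handlers in Python source text."""
    tree = ast.parse(source)
    handlers = [n for n in ast.walk(tree) if isinstance(n, ast.ExceptHandler)]
    logged = sum(1 for handler in handlers if is_logged(handler))
    return logged, len(handlers)


# Hypothetical source file with one logged and one non-logged handler.
sample = '''
import logging
logger = logging.getLogger(__name__)

def load(path):
    try:
        return open(path).read()
    except OSError as exc:
        logger.error("cannot read %s: %s", path, exc)
        return None

def parse(text):
    try:
        return int(text)
    except ValueError:
        return 0
'''
print(logging_ratio(sample))
```

Computing the same pair per exception type, rather than per file, would give the per-exception logging ratio that the abstract describes as project specific.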

To Fork or Not to Fork

International Journal of Open Source Software and Processes ◽

10.4018/jossp.2011070101 ◽

2011 ◽

Vol 3 (3) ◽

pp. 1-9 ◽

Cited By ~ 8

Author(s):

Linus Nyman ◽

Tommi Mikkonen

Keyword(s):

Open Source ◽

Open Source Software ◽

Software Package ◽

Source Code ◽

Original Version ◽

Development Work ◽

Software Developers

A project fork occurs when software developers take a copy of source code from one software package and use it to begin independent development work that is maintained separately. Although forking in open source software does not require the permission of the original authors, the new version competes for the attention of the same developers who have worked on the original version. Developers have many motivations for performing forks, but in general these have received little attention. The authors present the results of a study of forks performed in SourceForge (http://sourceforge.net/) and list the developers’ motivations for their actions.

JOVA: Java Object Viewing Aid

Part A: Combustion and Alternative Energy Technology; Computers in Engineering; Drilling Technology; Environmental Engineering Technology; Composite Materials Design and Analysis; Manufacturing and Services ◽

10.1115/etce2001-17074 ◽

2001 ◽

Author(s):

Doğan Kartaltepe ◽

Venkat Subramaniam

Keyword(s):

Large Scale ◽

Source Code ◽

Unified Modeling Language ◽

Modeling Language ◽

Recent Past ◽

Software Developers ◽

Large Scale Systems ◽

Unified Modeling ◽

Java Programming ◽

Software Libraries

Abstract: Both the object paradigm and the Java programming language have gained immense popularity over the years. In the recent past the Unified Modeling Language has come to be the standard for the representation of classes and their relationships. Exceptional mechanisms are available to develop HTML-based documentation from Java source code. However, this documentation is largely text-based. As more developers get involved in developing large-scale systems using Java, it would be of significant benefit to represent classes and relationships between classes using UML. Of course, asking all vendors of Java software libraries to provide UML diagrams for their products/libraries is impractical. Java Object Viewing Aid uses reflection to extract the details of classes and relationships from the existing binaries. It then presents this information in a form that is intuitive to software developers. In this paper we present the details and features of JOVA.

Using Machine Learning Image Recognition for Code Reviews

10.5121/csit.2020.101514 ◽

2020 ◽

Author(s):

Michael Dorin ◽

Trang Le ◽

Rajkumar Kolakaluri ◽

Sergio Montenegro

Keyword(s):

Machine Learning ◽

Image Recognition ◽

Source Code ◽

Cost Effective ◽

Software Developers ◽

Development Cycle ◽

Software Release ◽

Software Engineers

It is commonly understood that code reviews are a cost-effective way of finding faults early in the development cycle. However, many modern software developers are too busy to do them. Skipping code reviews means a loss of opportunity to detect expensive faults prior to software release. Software engineers can be pushed in many directions and reviewing code is very often considered an undesirable task, especially when time is wasted reviewing programs that are not ready. In this study, we wish to ascertain the potential for using machine learning and image recognition to detect immature software source code prior to a review. We show that it is possible to use machine learning to detect software problems visually and allow code reviews to focus on application details. The results are promising and are an indication that further research could be valuable.

What Software Architecture Styles are Popular?

Proceedings of the Institute for System Programming of RAS ◽

10.15514/ispras-2021-33(3)-1 ◽

2021 ◽

Vol 33 (3) ◽

pp. 7-26

Author(s):

Alexey Alexandrovich Mitsyuk ◽

Nikolay Arsenovich Jamgaryan

Keyword(s):

Software Engineering ◽

Software Architecture ◽

Open Source ◽

Software Industry ◽

Source Code ◽

Automatic Recognition ◽

Software Developers ◽

Architecture Style ◽

Repository Mining ◽

Industrial Software

The notion of software architecture style appears throughout the software engineering literature. It is considered important in books on software architecture and in university sources. However, many software developers are not so optimistic about it. It is not clear whether this notion is just an academic concept or is actually used in the software industry. In this paper, we measured industrial software developers' attitudes towards the concept of software architecture style. We also investigated the popularity of eleven concrete architecture styles. We applied two methods. A developers' survey was used to estimate developers' overall attitude and to determine what the community thinks about the automatic recognition of software architecture styles. Automatic crawlers were used to mine open-source code from the GitHub platform. These crawlers identified style smells in repositories using the features we proposed for the architecture styles. We found that the notion of software architecture style is not just a concept of academics in universities: many software developers apply it in their work. We formulated features for the eleven concrete software architecture styles and developed crawlers based on these features. The results of repository mining using the features showed which styles are popular among developers of open-source projects from commercial companies and non-commercial communities. The automatic mining results were additionally validated by the survey of GitHub developers.

On the Investigation of Domain-Sensitive Bad Smells in Information Systems

10.5753/sbsi.2017.6067 ◽

2017 ◽

Cited By ~ 1

Author(s):

Markos Viggiato ◽

Johnatan Oliveira ◽

Eduardo Figueiredo

Keyword(s):

Information Systems ◽

Source Code ◽

The Other ◽

Specific Information ◽

Code Smells ◽

Software Developers ◽

Accounting Systems ◽

Detection Rates ◽

Information System Design ◽

Domain Independent

Bad smells are symptoms that something may be wrong in the information system design or source code. Although bad smells have been widely studied, we still lack an in-depth analysis of how frequently they appear in specific information systems domains. The frequency of bad smells in a domain of information systems can be useful, for instance, to allow software developers to focus on the more relevant bad smells of a certain domain. Moreover, developers of new bad smell detection tools could take information about domains into consideration to improve their tools' detection rates. In this paper, we investigate code smells more likely to appear in four specific information systems domains: accounting, e-commerce, health, and restaurant. Our analysis relies on 52 information systems mined from GitHub. We identified bad smells with two detection tools, PMD and JDeodorant. Our findings suggest that Comments is a domain-independent bad smell, since it appears uniformly in all investigated domains. On the other hand, Large Class and Long Method can be considered domain-sensitive bad smells, since they appear more frequently in accounting systems. Although less frequent in general, Long Parameter List and Switch Statements also appear more in health and e-commerce systems, respectively, than in other domains.

µ -ANT: semantic microaggregation-based anonymization tool

Bioinformatics ◽

10.1093/bioinformatics/btz792 ◽

2019 ◽

Author(s):

David Sánchez ◽

Sergio Martínez ◽

Josep Domingo-Ferrer ◽

Jordi Soria-Comas ◽

Montserrat Batet

Keyword(s):

Medical Research ◽

State Of The Art ◽

Source Code ◽

Use Case ◽

Software Developers ◽

Case Examples ◽

Data Anonymization ◽

Healthcare Data ◽

Secondary Use ◽

Anonymized Data

Abstract. Motivation: Detailed patient data are crucial for medical research. Yet, these healthcare data can only be released for secondary use if they have undergone anonymization. Results: We present and describe µ-ANT, a practical and easily configurable anonymization tool for (healthcare) data. It implements several state-of-the-art methods to offer robust privacy guarantees and preserve the utility of the anonymized data as much as possible. µ-ANT also supports the heterogeneous attribute types commonly found in electronic healthcare records and targets both practitioners and software developers interested in data anonymization. Availability and implementation: Source code, documentation, executable, sample datasets and use case examples are available at https://github.com/CrisesUrv/microaggregation-based_anonymization_tool.
