Automatic Calculation of Process Metrics and their Bug Prediction Capabilities

2017 ◽  
Vol 23 (2) ◽  
pp. 537-559
Author(s):  
Péter Gyimesi

Identifying fault-prone code parts helps developers reduce the time required to locate bugs. It is usually done by characterizing the already known bugs with certain kinds of metrics and building a predictive model from the data. For the characterization of bugs, software product and process metrics are the most popular ones. The calculation of product metrics is supported by many free and commercial software products. However, tools that are capable of computing process metrics are quite rare. In this study, we present a method of computing software process metrics in a graph database. We describe the schema of the database we created and present a way to readily retrieve the process metrics from it. With this technique, process metrics can be calculated at the file, class, and method levels. We used GitHub as the source of the change history and selected 5 open-source Java projects for processing. To retrieve positional information about the classes and methods, we used SourceMeter, a static source code analyzer tool. We used Neo4j as the graph database engine and its query language, Cypher, to obtain the process metrics. We published the tools we created as open-source projects on GitHub. To demonstrate the utility of our tools, we selected 25 release versions of the 5 Java projects and calculated the process metrics for all of the source code elements (files, classes, and methods) in these versions. Using our previously published bug database, we built bug databases for the selected projects that contain the computed process metrics and the corresponding bug numbers for files and classes. (We published these databases as an online appendix.) We then applied 13 machine learning algorithms to the database we created to determine whether it is suitable for bug prediction. We achieved average F-measure values of around 0.7 at the class level and slightly better values, between 0.7 and 0.75, at the file level. RandomForest was the best performing algorithm in both cases.
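To make the query step concrete, here is a minimal Python sketch of how per-file process metrics could be pulled from a Neo4j change-history graph via Cypher; the node labels, relationship types, and property names are illustrative assumptions, not the authors' published schema.

```python
# A minimal sketch of querying per-file process metrics from a Neo4j change-history
# graph. The labels (File, Commit, Contributor), the MODIFIES/AUTHORED relationships,
# and the property names are assumptions; the actual published schema may differ.
from neo4j import GraphDatabase

QUERY = """
MATCH (c:Commit)-[:MODIFIES]->(f:File {path: $path})
OPTIONAL MATCH (d:Contributor)-[:AUTHORED]->(c)
RETURN f.path            AS file,
       count(DISTINCT c) AS commit_count,     // number of modifications
       count(DISTINCT d) AS committer_count   // number of distinct authors
"""

def file_process_metrics(uri, user, password, path):
    driver = GraphDatabase.driver(uri, auth=(user, password))
    with driver.session() as session:
        record = session.run(QUERY, path=path).single()
        return dict(record) if record else None

if __name__ == "__main__":
    # Hypothetical connection details and file path for illustration only.
    print(file_process_metrics("bolt://localhost:7687", "neo4j", "secret",
                               "src/main/java/Example.java"))
```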

2013 ◽  
Vol 07 (03) ◽  
pp. 215-236 ◽  
Author(s):  
ANDREW CRAPO ◽  
ABHA MOITRA

The Semantic Application Design Language (SADL) combines advances in standardized declarative modeling languages based on formal logic with advances in domain-specific language (DSL) development environments to create a controlled-English language that translates directly into the Web Ontology Language (OWL), the SPARQL graph query language, and a compatible if/then rule language. Models in the SADL language can be authored, tested, and maintained in an Eclipse-based integrated development environment (IDE). This environment offers semantic highlighting, statement completion, expression templates, hyperlinking of concepts to their definitions, model validation, automatic error correction, and other advanced authoring features to enhance the ease and productivity of the modeling environment. In addition, the SADL language offers the ability to build in validation tests and test suites that can be used for regression testing. Through common Eclipse functionality, the models can be easily placed under source code control, versioned, and managed throughout the life of the model. Differences between versions can be compared side by side. Finally, the SADL-IDE offers an explanation capability that is useful in understanding what was inferred by the reasoner/rule engine and why those conclusions were reached. Perhaps more importantly, it can also explain why an expected inference failed to occur. The objective of the language and the IDE is to enable domain experts to play a more active and productive role in capturing their knowledge and making it available as computable artifacts useful for automation where appropriate and for decision support systems in applications that benefit from a collaborative human-computer approach. SADL is built entirely on open source code, and most of SADL is itself released as open source. This paper explores the concepts behind the language and provides details and examples of the authoring and model lifecycle support facilities.
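As a rough illustration of the OWL translation, not SADL's actual translator output, the Python sketch below uses rdflib to build the kind of triples that a controlled-English statement such as "A Pump is a type of Device" might map to; the namespace and class names are assumptions.

```python
# Illustrative only: hand-built OWL triples approximating what a controlled-English
# statement like "A Pump is a type of Device" could translate to. The namespace,
# class names, and use of rdflib are assumptions, not part of SADL.
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

EX = Namespace("http://example.org/plant#")  # hypothetical model namespace

g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

g.add((EX.Device, RDF.type, OWL.Class))       # "Device is a class."
g.add((EX.Pump, RDF.type, OWL.Class))         # "Pump is a class ..."
g.add((EX.Pump, RDFS.subClassOf, EX.Device))  # "... and a type of Device."

print(g.serialize(format="turtle"))
```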


2009 ◽  
pp. 3008-3036 ◽  
Author(s):  
Stefan Koch ◽  
Christian Neumann

There has been considerable discussion on the possible impacts of open source software development practices, especially in regard to the quality of the resulting software product. Recent studies have shown that analyzing data from source code repositories is an efficient way to gather information about project characteristics and programmers, showing that OSS projects are very heterogeneous in their team structures and software processes. However, one problem is that the resulting process metrics measuring attributes of the development process and of the development environment do not give any hints about the quality, complexity, or structure of the resulting software. Therefore, we expanded the analysis by calculating several product metrics, most of them specifically tailored to object-oriented software. We then analyzed the relationship between these product metrics and process metrics derived from a CVS repository. The aim was to establish whether different variants of open source development processes have a significant impact on the resulting software products. In particular, we analyzed the impact on quality and design associated with the numbers of contributors and the amount of their work, using the Gini coefficient as a measure of inequality within the developer group.
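For concreteness, here is a minimal Python sketch (not the authors' code) of the Gini coefficient computed over hypothetical per-developer commit counts mined from a repository.

```python
# A minimal sketch (not the authors' implementation) of the Gini coefficient over
# per-developer contribution counts; values near 0 mean work is spread evenly,
# values near 1 mean a few developers do almost everything.
def gini(counts):
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula over the ordered values:
    # G = (2 * sum(i * x_i)) / (n * sum(x)) - (n + 1) / n, with i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

# Hypothetical commits-per-developer counts for illustration.
commits = [120, 45, 30, 8, 3, 1]
print(round(gini(commits), 3))
```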


Author(s):  
A.A. Shinkarev ◽  

There are many open source software products and packages today, and their number grows every day, so publishing source code is clearly becoming more and more popular in the world of software development. When publishing the source code of a software solution or package for use by the developer community, special attention should be given to the license type, since it determines the scenarios in which the published package or solution can be used. It is also necessary to draw up complete and detailed documentation and to decide how to promote the published package among developers. The purpose of the study was to justify the feasibility and necessity of publishing software products, packages, and libraries for use by other developers in building their own systems and services. The author set out to describe the major open source licenses, identify their features and differences and the situations for which each type of license is suitable, demonstrate the need for documentation, and describe ways to promote and popularize published software products, packages, and libraries in the developer community. Materials and methods. The paper considers official license documents describing conditions of use, reproduction, and distribution. The author analyzes the main ways and means of promoting open source software products. Results. The article substantiates the relevance of publishing and using the source code of a software product, package, or library. The author describes the main provisions of the most common licenses and gives advice on choosing the type of license when publishing source code for free use. The necessity of writing documentation for the published software product is substantiated. The article also describes some of the ways to promote published packages, such as the choice of name, speaking at conferences, and publishing articles with case studies.


Author(s):  
Shuhan Yan ◽  
Tianjiao Du ◽  
Beijun Shen ◽  
Yuting Chen ◽  
Zhilei Ren

Users frequently raise feedback when using software products. Feedback from users about their experiences, expectations, and the software defects they have found adds value to software maintenance and evolution: software managers collect user feedback and then dispatch feedback issues that developers (and/or maintainers) need to track and process. Feedback tracking is often supported by open source platforms and collaborative software systems. Meanwhile, there still exists a gap between feedback issues and source code: since user feedback is usually informal and arbitrary, engineers have to spend much effort on comprehending issues and identifying which source code files need to be improved or fixed. This paper introduces a deep learning approach, Feedback2Code, which facilitates the identification of user-feedback-related source code files. The core idea is to (1) explore the latent semantics of user feedback and source code using several deep learning techniques such as Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), and skip-gram, and (2) establish a multi-correlation model to explore linkages between feedback issues and source code files. Given a feedback issue, the linkages then allow engineers to identify source code files that are highly relevant to the issue. We have implemented Feedback2Code and evaluated it against ChangeAdvisor (a state-of-the-art approach) on 24 open source projects. The evaluation results clearly show the strength of Feedback2Code: for 103793 feedback issues, Feedback2Code successfully established 101190 feedback-code linkages and achieved a precision that is [Formula: see text] higher than that of ChangeAdvisor. Feedback2Code also achieved an MRR and an MAP that are [Formula: see text] and [Formula: see text] higher than those of ChangeAdvisor, respectively. Furthermore, we found that a Feedback2Code-trained model can be easily transferred, allowing feedback-code linkages to be established in new projects with little historical data.
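As a simplified sketch of the linkage idea, not the authors' architecture, the Python snippet below averages toy word vectors for a feedback issue and for each file's identifier terms and then ranks files by cosine similarity; the vocabulary, vectors, and tokenization are assumptions.

```python
# A simplified sketch of feedback-to-code linking by embedding similarity.
# Feedback2Code combines MLP/CNN/skip-gram components; here we only average
# pre-supplied word vectors and rank files by cosine similarity.
# The vocabulary and vectors below are toy assumptions.
import numpy as np

def embed(tokens, vectors, dim=4):
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Toy skip-gram-style vectors (in practice these would be trained).
vectors = {
    "login":  np.array([0.9, 0.1, 0.0, 0.2]),
    "crash":  np.array([0.1, 0.8, 0.3, 0.0]),
    "auth":   np.array([0.8, 0.2, 0.1, 0.1]),
    "render": np.array([0.0, 0.1, 0.9, 0.4]),
}

feedback = ["login", "crash"]                  # tokenized user feedback
files = {
    "AuthController.java": ["auth", "login"],  # identifier terms per file
    "Renderer.java":       ["render"],
}

issue_vec = embed(feedback, vectors)
ranking = sorted(files,
                 key=lambda f: cosine(issue_vec, embed(files[f], vectors)),
                 reverse=True)
print(ranking)  # files most relevant to the feedback come first
```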


2020 ◽  
Vol 10 (4) ◽  
pp. 1270
Author(s):  
Razvan Raducu ◽  
Gonzalo Esteban ◽  
Francisco J. Rodríguez Lera ◽  
Camino Fernández

Different Machine Learning techniques to detect software vulnerabilities have emerged in scientific and industrial scenarios. Different actors in these scenarios aim to develop algorithms for predicting security threats without requiring human intervention. However, these algorithms require data-driven engines based on the processing of huge amounts of data, known as datasets. This paper introduces the SonarCloud Vulnerable Code Prospector for C (SVCP4C). This tool aims to collect vulnerable source code from open source repositories linked to SonarCloud, an online tool that performs static analysis and tags potentially vulnerable code. The tool provides a set of tagged files suitable for extracting features and creating training datasets for Machine Learning algorithms. This study presents a descriptive analysis of these files and overviews the current status of C vulnerabilities, specifically buffer overflow, in the reviewed public repositories.
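A rough sketch, assuming SonarCloud's public issues-search web API, of how vulnerability-tagged files might be retrieved for a project; the endpoint, parameters, and project key are assumptions rather than the actual SVCP4C implementation.

```python
# A rough sketch (not the SVCP4C tool itself) of pulling vulnerability-tagged issues
# for a project from SonarCloud's web API. Endpoint and parameter names follow the
# documented issues-search API but should be treated as assumptions; a real crawler
# would also handle paging over large result sets and authentication.
import requests

def fetch_vulnerabilities(project_key, page=1):
    resp = requests.get(
        "https://sonarcloud.io/api/issues/search",
        params={"projects": project_key, "types": "VULNERABILITY",
                "p": page, "ps": 100},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("issues", [])

for issue in fetch_vulnerabilities("some_org_some_project"):  # hypothetical key
    # 'component' identifies the flagged file; 'line' the tagged location
    print(issue.get("component"), issue.get("line"), issue.get("rule"))
```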



Author(s):  
Himanshi Vashisht ◽  
Sanjay Bharadwaj ◽  
Sushma Sharma

Code refactoring is the process of restructuring existing source code. It changes the source code in a way that does not alter its external behaviour yet improves its internal structure, and it is a way to clean up code that minimizes the chances of introducing bugs. Refactoring is a change made to the internal structure of a software component to make it easier to understand and cheaper to modify, without changing the observable behaviour of that software component. Bad smells indicate that there is something wrong in the code that has to be refactored. Different tools are available to identify and remove these bad smells. Software has two types of quality attributes: internal and external. In this paper we study the effect of clone refactoring on software quality attributes.
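To illustrate what clone refactoring means in practice, the toy Python example below (ours, not from the paper) extracts a duplicated computation into a shared helper so the clone disappears while external behaviour is preserved.

```python
# Toy illustration (not from the paper) of clone refactoring: two functions containing
# the same duplicated discount logic are refactored so both call one shared helper,
# leaving external behaviour unchanged.

# Before: the discount calculation is cloned in both functions.
def invoice_total_before(prices, vip):
    total = sum(prices)
    if vip:
        total = total - total * 0.10   # clone #1
    return round(total, 2)

def quote_total_before(prices, vip):
    total = sum(prices)
    if vip:
        total = total - total * 0.10   # clone #2
    return round(total, 2)

# After: the clone is extracted into a single helper.
def _discounted_total(prices, vip):
    total = sum(prices)
    return total * 0.90 if vip else total

def invoice_total(prices, vip):
    return round(_discounted_total(prices, vip), 2)

def quote_total(prices, vip):
    return round(_discounted_total(prices, vip), 2)

# External behaviour is unchanged by the refactoring.
assert invoice_total([10.0, 5.0], True) == invoice_total_before([10.0, 5.0], True)
```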


2017 ◽  
Author(s):  
George H. Shaw ◽  
◽  
Howard D. Mooers ◽  
Josef Smrz ◽  
Zdenek Papez ◽  
...  
