Weighted software metrics aggregation and its application to defect prediction

2021, Vol 26 (5)
Author(s): Maria Ulan, Welf Löwe, Morgan Ericsson, Anna Wingkvist

It is a well-known practice in software engineering to aggregate software metrics to assess software artifacts for various purposes, such as their maintainability or their proneness to contain bugs. For different purposes, different metrics might be relevant. However, weighting these software metrics according to their contribution to the respective purpose is a challenging task. Manual approaches based on experts do not scale with the number of metrics, and experts get confused if the metrics are not independent, which is rarely the case. Automated approaches based on supervised learning require reliable and generalizable training data, a ground truth, which is rarely available. We propose an automated approach to weighted metrics aggregation based on unsupervised learning: it sets metrics scores and their weights based on probability theory and aggregates them. To evaluate its effectiveness, we conducted two empirical studies on defect prediction, one on ca. 200,000 code changes and the other on ca. 5,000 software classes. The results show that our approach can be used as an agnostic unsupervised predictor in the absence of a ground truth.
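As a rough sketch of the scoring idea, the snippet below maps each metric to a (0, 1] score via its empirical CDF and aggregates the scores with a weight vector. Equal weights stand in for the automatically derived weights of the paper, whose exact derivation is not reproduced here; all names are illustrative only.

```python
import numpy as np

def cdf_scores(values: np.ndarray) -> np.ndarray:
    """Map raw metric values to (0, 1] via the empirical CDF.

    Assumes higher raw values indicate worse quality: an artifact's score
    is the fraction of artifacts it is at least as bad as.
    """
    ranks = values.argsort().argsort()        # 0 = best, n-1 = worst
    return (ranks + 1) / len(values)

def aggregate(metrics: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted aggregation of per-metric scores (rows = artifacts)."""
    scores = np.column_stack([cdf_scores(col) for col in metrics.T])
    return scores @ (weights / weights.sum())

# Toy data: 5 artifacts, 3 metrics; equal weights as a placeholder
# for the paper's automatically derived weights.
m = np.array([[10, 1, 3], [2, 0, 1], [7, 5, 2], [1, 1, 0], [4, 2, 2]], float)
print(aggregate(m, np.ones(3)))
```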

2021, Vol 14 (6), pp. 997-1005
Author(s): Sandeep Tata, Navneet Potti, James B. Wendt, Lauro Beltrão Costa, Marc Najork, et al.

Extracting structured information from templatic documents is an important problem with the potential to automate many real-world business workflows such as payment, procurement, and payroll. The core challenge is that such documents can be laid out in a virtually unlimited number of ways. A good solution to this problem is one that generalizes well not only to known templates, such as invoices from a known vendor, but also to unseen ones. We developed a system called Glean to tackle this problem. Given a target schema for a document type and some labeled documents of that type, Glean uses machine learning to automatically extract structured information from other documents of that type. In this paper, we describe the overall architecture of Glean and discuss three key data management challenges: 1) managing the quality of ground truth data, 2) generating training data for the machine learning model using labeled documents, and 3) building tools that help a developer rapidly build and improve a model for a given document type. Through empirical studies on a real-world dataset, we show that these data management techniques allow us to train a model that is over 5 F1 points better than the exact same model architecture without them. We argue that for such information-extraction problems, designing abstractions that carefully manage the training data is at least as important as choosing a good model architecture.
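For illustration only, a target schema of the kind the abstract mentions could be modeled as below; the field names and types are hypothetical and not taken from the Glean paper.

```python
from dataclasses import dataclass

@dataclass
class SchemaField:
    """One field of a hypothetical target schema for a document type."""
    name: str          # e.g. "invoice_date"
    field_type: str    # e.g. "date", "currency_amount"
    repeated: bool = False

# Illustrative invoice schema; names are made up for this sketch.
invoice_schema = [
    SchemaField("invoice_id", "string"),
    SchemaField("invoice_date", "date"),
    SchemaField("total_amount", "currency_amount"),
    SchemaField("line_item", "nested", repeated=True),
]
```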


Author(s): Giuseppe Destefanis, Mahir Arzoky, Steve Counsell, Stephen Swift, Marco Ortu, et al.

Measuring software to get information about its properties and quality is one of the main issues in modern software engineering. The aim of this paper is to present a dataset of metrics associated with 113 versions of Tomcat. We describe the dataset along with the criteria adopted and the research opportunities it offers, and provide preliminary results. This dataset can enhance the reliability of empirical studies, enabling their reproducibility and reducing their cost, and it can foster further research on software quality and software metrics.
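A dataset of this shape lends itself to straightforward exploration; the sketch below assumes a hypothetical row-per-class-per-version layout with made-up column names, which may differ from the actual dataset.

```python
import pandas as pd

# Illustrative rows only; the real dataset covers 113 Tomcat versions.
df = pd.DataFrame({
    "version": ["7.0.0", "7.0.0", "7.0.1"],
    "class":   ["Foo", "Bar", "Foo"],
    "loc":     [120, 300, 125],   # lines of code
    "wmc":     [10, 25, 11],      # weighted methods per class
})

# Track how a size metric evolves across versions.
print(df.groupby("version")["loc"].agg(["mean", "max"]))
```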


2021, Vol 11 (11), pp. 4793
Author(s): Cong Pan, Minyan Lu, Biao Xu

Deep learning-based software defect prediction has become popular in recent years, and the release of the CodeBERT model has made it possible to apply pre-trained language models to many software engineering tasks. We propose several CodeBERT models targeting software defect prediction: CodeBERT-NT, CodeBERT-PS, CodeBERT-PK, and CodeBERT-PT. We perform empirical studies using these models in cross-version and cross-project software defect prediction to investigate whether a neural language model like CodeBERT can improve prediction performance. We also investigate the effects of different prediction patterns in software defect prediction using CodeBERT models, and discuss the empirical results.
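As a generic baseline for this kind of pipeline (not one of the paper's specific CodeBERT-NT/PS/PK/PT variants), one can embed code with the public microsoft/codebert-base checkpoint and train a simple classifier on the embeddings:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(code: str) -> torch.Tensor:
    """Mean-pooled CodeBERT embedding of one code snippet."""
    inputs = tokenizer(code, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

# Toy training set: 1 = defective, 0 = clean (labels are illustrative).
snippets = ["int add(int a, int b) { return a + b; }",
            "int div(int a, int b) { return a / b; }"]
labels = [0, 1]
features = torch.stack([embed(s) for s in snippets]).numpy()
clf = LogisticRegression(max_iter=1000).fit(features, labels)
```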


2020
Author(s): Natalie Best, Jordan Ott, Erik Linstead

Background: Transfer learning allows us to train deep architectures that require a large number of learned parameters, even when the amount of available data is limited, by leveraging existing models previously trained for another task. In previous attempts to classify image-based software artifacts in the absence of big data, it was noted that standard off-the-shelf deep architectures such as VGG-16 could not be utilized due to their large parameter space and therefore had to be replaced by customized architectures with fewer layers. This is challenging for empirical software engineers who would like to make use of existing architectures without the need for customization.
Findings: Here we explore the applicability of transfer learning, utilizing models pre-trained on non-software-engineering data, to the problem of classifying software UML diagrams. Our experimental results show that training reacts positively to transfer learning as sample size grows, even though the pre-trained model was not exposed to training instances from the software domain. We contrast the transferred network with other networks to show its advantage on different-sized training sets, which indicates that transfer learning is as effective as custom deep architectures when large amounts of training data are not available.
Conclusion: Our findings suggest that transfer learning, even when based on models that do not contain software engineering artifacts, can provide a pathway for using off-the-shelf deep architectures without customization. This provides an alternative for practitioners who want to apply deep learning to image-based classification but do not have the expertise or confidence to define their own network architectures.
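A minimal sketch of the off-the-shelf transfer-learning setup the abstract describes, using a VGG-16 backbone pre-trained on ImageNet with a small trainable head; the number of diagram classes and the head sizes are assumptions, not the paper's exact configuration:

```python
import tensorflow as tf

# VGG-16 pre-trained on ImageNet (no software engineering data).
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False   # reuse learned features; train only the head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. class vs. sequence diagram
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, ...) once UML images are loaded.
```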


Symmetry, 2019, Vol 11 (2), pp. 212
Author(s): Le Son, Nakul Pritam, Manju Khari, Raghvendra Kumar, Pham Phuong, et al.

Software defect prediction has been one of the key areas of exploration in the domain of software quality. In this paper, we perform a systematic mapping to analyze the software defect prediction literature available from 1995 to 2018 using a multi-stage process. A total of 156 studies are selected in the first step, and the final mapping is conducted based on these studies. The ability of a model to learn from data that does not come from the same project or organization will help organizations that do not have sufficient training data or are starting work on new projects. The findings of this research are useful not only to the software engineering domain but also to empirical studies that focus on symmetry, as they provide step-by-step solutions to the questions raised in the article.


Sensors, 2019, Vol 19 (4), pp. 752
Author(s): Jens Trogh, Wout Joseph, Luc Martens, David Plets

A major burden of signal-strength-based fingerprinting for indoor positioning is the generation and maintenance of a radio map, also known as a fingerprint database. Model-based radio maps are generated much faster than measurement-based radio maps but are generally not accurate enough. This work proposes a method to automatically construct and optimize a model-based radio map. The method is based on unsupervised learning: random walks, for which the ground truth locations are unknown, serve as input for the optimization, along with a floor plan and a location tracking algorithm. The approach needs no labor-intensive, time-consuming measurement campaign or site survey, and no inertial sensor measurements, which are often unavailable and consume additional power. Experiments in a large office building, covering over 1100 m², resulted in median accuracies of up to 2.07 m, or a relative improvement of 28.6%, with only 15 min of unlabeled training data.
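For context, the classic fingerprinting lookup that such a radio map serves looks roughly like the following k-nearest-neighbor sketch; this is the generic matching step, not the paper's unsupervised map-optimization method:

```python
import numpy as np

def locate(observed: np.ndarray, radio_map: np.ndarray,
           locations: np.ndarray, k: int = 3) -> np.ndarray:
    """Estimate a position by k-nearest-neighbor fingerprint matching.

    radio_map: (n_points, n_aps) signal strengths per reference point
    locations: (n_points, 2) x/y coordinates of those points
    """
    dists = np.linalg.norm(radio_map - observed, axis=1)
    nearest = np.argsort(dists)[:k]
    return locations[nearest].mean(axis=0)   # centroid of best matches

# Toy radio map: 4 reference points, 3 access points (RSSI in dBm).
rm = np.array([[-40, -70, -80], [-50, -60, -75],
               [-70, -50, -60], [-80, -45, -55]], float)
xy = np.array([[0, 0], [0, 5], [5, 5], [5, 0]], float)
print(locate(np.array([-48.0, -62.0, -74.0]), rm, xy, k=2))
```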


Author(s): Seetharam K., Sharana Basava Gowda, Varadaraj

In software engineering, software metrics have wide and deep scope. Many projects fail because of risks in software development [1]. Among the various risk factors, requirements creep is one. The paper discusses the approximate volume of creeping requirements that occur after the completion of the nominal requirements phase, using software size measured in function points at four different levels. The major risk factors depend, both directly and indirectly, on the size of the software under development. Hence, it is possible to predict the risk due to creep from size.
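As a back-of-the-envelope illustration of predicting creep volume from size (the growth rate below is an assumed figure for the example, not one reported in the paper):

```python
def creep_volume(size_fp: float, monthly_rate: float, months: float) -> float:
    """Approximate volume of creeping requirements, in function points.

    Simple linear model: a constant monthly growth rate applied to the
    nominal size. Both the model and the rate are illustrative assumptions.
    """
    return size_fp * monthly_rate * months

# A 1,000-FP project creeping at an assumed 2% per month over a
# 10-month schedule accumulates roughly 200 FP of new requirements.
print(creep_volume(1000, 0.02, 10))   # 200.0
```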


Author(s): Maria Ulan, Welf Löwe, Morgan Ericsson, Anna Wingkvist

A quality model is a conceptual decomposition of an abstract notion of quality into relevant, possibly conflicting characteristics and further into measurable metrics. For quality assessment and decision making, metrics values are aggregated to characteristics and ultimately to quality scores. Aggregation has often been problematic, as quality models do not provide the semantics of aggregation. This makes it hard to formally reason about metrics, characteristics, and quality. We argue that aggregation needs to be interpretable and mathematically well defined in order to assess, compare, and improve quality. To address this challenge, we propose a probabilistic approach to aggregation and define quality scores based on joint distributions of absolute metrics values. To evaluate the proposed approach and its implementation under realistic conditions, we conduct empirical studies on bug prediction for ca. 5,000 software classes, maintainability of ca. 15,000 open-source software systems, and the information quality of ca. 100,000 real-world technical documents. We found that our approach is feasible, accurate, and scalable in performance.
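One plausible formalization consistent with the abstract (the paper's precise definition may differ): for an artifact a with metric values m_1(a), ..., m_k(a), define its quality score as the joint cumulative probability

```latex
\[
  Q(a) \;=\; P\bigl(M_1 \le m_1(a),\; \dots,\; M_k \le m_k(a)\bigr),
\]
```

estimated empirically from the observed metric values of all artifacts, so that scores are interpretable as probabilities and comparable across artifacts.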


2021, Vol 16 (1), pp. 1-24
Author(s): Yaojin Lin, Qinghua Hu, Jinghua Liu, Xingquan Zhu, Xindong Wu

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data and to use features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is based on a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), combines label-specific features, label correlation, and a weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers, which are induced from the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the maximum-margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
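The per-label feature construction step resembles LIFT-style clustering; the sketch below shows that step only (not MULFE's correlation-based ensemble weighting), with cluster counts chosen arbitrarily for the example:

```python
import numpy as np
from sklearn.cluster import KMeans

def label_specific_features(X: np.ndarray, y: np.ndarray,
                            n_clusters: int = 2) -> np.ndarray:
    """LIFT-style label-specific features for one label.

    Cluster the label's positive and negative instances separately; the
    new features are each instance's distances to all cluster centers.
    """
    centers = np.vstack([
        KMeans(n_clusters=n_clusters, n_init=10).fit(X[y == 1]).cluster_centers_,
        KMeans(n_clusters=n_clusters, n_init=10).fit(X[y == 0]).cluster_centers_,
    ])
    # (n_samples, 2 * n_clusters) distance matrix
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

rng = np.random.default_rng(0)
X = rng.random((100, 10))
y = (rng.random(100) > 0.5).astype(int)
print(label_specific_features(X, y).shape)   # (100, 4)
```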

