Estimation of Target Defect Prediction Coverage in Heterogeneous Cross Software Projects

Author(s):  
Rohit Vashisht ◽  
Syed Afzal Murtaza Rizvi

Heterogeneous cross-project defect prediction (HCPDP) is an evolving area within the quality assurance domain that aims to predict defects in a target project which has limited historical defect data and completely non-uniform software metrics, using a model built on another source project. The article discusses the problem of defect prediction coverage (DPC) for a particular source project group and proposes a novel two-phase model to address this issue in HCPDP. The study evaluates DPC on 13 benchmarked datasets drawn from three open-source software projects. One hundred percent DPC is achieved, with high defect prediction accuracy, for two project group pairs. The issue of partial DPC is found in the third prediction pair, and a new strategy is proposed in the study to overcome it. Furthermore, the paper compares HCPDP modeling with within-project defect prediction (WPDP), both empirically and theoretically, finding that the performance of WPDP is highly comparable to that of HCPDP and that the gradient boosting method performs best among the three classifiers.
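The cross-project setup reported above, where a classifier trained on one project's metrics predicts defects in another, can be sketched as follows. This is a minimal illustration, not the paper's model: the synthetic metric data, the labeling rule, and the four-feature layout are all assumptions, and only the choice of a gradient boosting classifier follows the abstract.

```python
# Minimal sketch of cross-project defect prediction with gradient boosting.
# The metric values and defect labels below are synthetic, for illustration only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)

# Source project: rows are modules, columns are software metrics (e.g. size, complexity).
X_source = rng.normal(size=(200, 4))
# Mark a module defective when a noisy combination of its metrics is high (toy rule).
y_source = (X_source[:, 0] + 0.5 * X_source[:, 1]
            + rng.normal(scale=0.3, size=200) > 0).astype(int)

# Target project, assumed already mapped into the same metric space.
X_target = rng.normal(size=(50, 4))

clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_source, y_source)       # train on the source project only
pred = clf.predict(X_target)      # predict defect-proneness for target modules
```

In a real HCPDP pipeline the hard part is the metric matching between heterogeneous source and target feature spaces, which this sketch sidesteps by assuming identical columns.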

2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Haijin Ji ◽  
Song Huang

Different data preprocessing methods and classifiers have previously been established and evaluated for cross-project software defect prediction (SDP). These approaches have produced reasonably acceptable prediction results for different software projects. However, to the best of our knowledge, few researchers have combined data preprocessing with building a robust classifier to improve prediction performance in SDP. This paper therefore presents a complete framework for predicting fault-prone software modules, consisting of instance filtering, feature selection, instance reduction, and a newly established classifier. Additionally, we find that the 21 main software metrics commonly follow a non-normal distribution, according to a Kolmogorov-Smirnov test. The new classifier is therefore built on the maximum correntropy criterion (MCC), which is well known for its effectiveness in handling non-Gaussian noise. To evaluate the framework, an experimental study is carefully designed using nine open-source software projects with 32 releases in total, obtained from the PROMISE data repository. Prediction accuracy is evaluated using the F-measure, and state-of-the-art cross-project defect prediction methods are included for comparison. All of the evidence from the experiments verifies the effectiveness and robustness of the new framework.
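The normality check described above can be reproduced in miniature with a one-sample Kolmogorov-Smirnov test. The lognormal sample below is a stand-in assumption for a real, right-skewed software metric such as lines of code; the standardization step and the 0.05 threshold are conventional choices, not taken from the paper.

```python
# Sketch: testing whether a software metric is normally distributed with a
# Kolmogorov-Smirnov test. The lognormal sample imitates a skewed size metric.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
metric = rng.lognormal(mean=2.0, sigma=1.0, size=500)  # heavily right-skewed

# Standardize, then compare against the standard normal distribution.
z = (metric - metric.mean()) / metric.std()
statistic, p_value = stats.kstest(z, "norm")

# A small p-value rejects normality, motivating a criterion robust to
# non-Gaussian noise such as maximum correntropy.
normality_rejected = p_value < 0.05
```

For genuinely normal data the same test would typically yield a large p-value, so this check is what separates metrics that need a non-Gaussian-robust classifier from those that do not.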


2020 ◽  
Vol 10 (13) ◽  
pp. 4624
Author(s):  
Mitja Gradišnik ◽  
Tina Beranič ◽  
Sašo Karakatič

Software maintenance is one of the key stages of the software lifecycle and includes a variety of activities that consume a significant portion of a software project's costs. Previous research suggests that future software maintainability can be predicted from various source code aspects, but most of it focuses on prediction from the present state of the code and ignores its history. While taking history into account in software maintainability prediction seems intuitive, it has not been tested empirically; doing so is the main goal of this paper. The paper empirically evaluates the contribution of historical measurements of the Chidamber & Kemerer (C&K) software metrics to software maintainability prediction models. Its main contribution is building prediction models with classification and regression tree and random forest learners in iterations, gradually adding historical measurement data extracted from previous releases. The maintainability prediction models were built on software metric measurements obtained from real-world open-source software projects. The analysis of the results shows that additional historical metric measurements contribute to maintainability prediction. The study also evaluates the contribution of individual C&K software metrics to the performance of the maintainability prediction models.
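The iterative scheme described above, widening the feature set with one more past release per iteration, can be sketched roughly as below. Everything concrete here is an assumption: the release count, the synthetic C&K-style metric matrices, the toy maintainability target, and the in-sample scoring used only to show the loop shape.

```python
# Sketch: adding historical metric measurements release by release and
# refitting a random forest maintainability model each time (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n_classes, n_metrics, n_releases = 120, 6, 4

# metrics[r] holds the metric values of every class in release r (oldest first).
metrics = [rng.normal(size=(n_classes, n_metrics)) for _ in range(n_releases)]
# Toy maintainability target driven by the latest release's first metric.
maintainability = -metrics[-1][:, 0] + rng.normal(scale=0.2, size=n_classes)

scores = []
for history in range(1, n_releases + 1):
    # Features: the current release plus (history - 1) previous releases.
    X = np.hstack(metrics[-history:])
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X, maintainability)
    scores.append(model.score(X, maintainability))  # in-sample R^2, illustration only
```

A faithful replication would of course score on held-out classes or releases rather than in-sample; the loop only illustrates how the feature matrix grows with history.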


Author(s):  
Vandana Singh ◽  
Brice Bongiovanni

This article presents the results of a research study on the experiences of women in open-source software communities. The lack of women in computing professions is a cause of social inequity, and in this research we develop a nuanced understanding of the experiences of women participating in open-source software. In-depth qualitative interviews were conducted with eleven women representing multiple countries and a variety of open-source software projects. The theory of individual differences in gender and information technology (IT) laid the foundation for data analysis and interpretation. The results demonstrate the varied experiences of women, the need for woman-to-woman mentoring, and the need for the presence and enforcement of codes of conduct in online communities. The women shared their experiences of working in a variety of roles and the importance of all of these roles in product development and maintenance. Their persistence in OSS communities despite a toxic masculine culture, and their interest in improving the environment for other women and marginalized newcomers, was evident from the interviews.


2019 ◽  
Vol 107 ◽  
pp. 125-136 ◽  
Author(s):  
Chao Liu ◽  
Dan Yang ◽  
Xin Xia ◽  
Meng Yan ◽  
Xiaohong Zhang

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Yiwen Zhong ◽  
Kun Song ◽  
ShengKai Lv ◽  
Peng He

Cross-project defect prediction (CPDP) is a mainstream method for estimating the most defect-prone components of software when historical data are limited. Several studies have investigated how software metrics are used and how modeling techniques influence prediction performance; however, the impact of metric diversity on the predictor remains unclear. This paper therefore assesses the impact of various metric sets on CPDP and investigates the feasibility of CPDP with hybrid metrics. Based on four types of software metrics, we examine the impact of various metric sets on CPDP in terms of F-measure and statistical tests, and then validate the superior performance of CPDP with hybrid metrics. Finally, we verify the feasibility of CPDP-OSS, built with three types of metrics (object-oriented, semantic, and structural), and challenge it against two current models. The experimental results suggest that different metric sets affect CPDP performance in significantly distinct ways, with semantic and structural metrics performing better. The trials also indicate that appropriately increasing metric diversity helps CPDP, as the improvement of CPDP-OSS reaches up to 53.8%. Finally, compared with the two baseline methods, TCA+ and TDSelector, the optimized CPDP model is viable in practice, with improvement rates of up to 50.6% and 25.7%, respectively.
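The F-measure used above to compare metric sets is the harmonic mean of precision and recall over the defect class. A minimal sketch, with toy labels chosen for illustration (the function and example data are not from the paper):

```python
# Sketch: F-measure over binary defect labels (1 = defective, 0 = clean).
def f_measure(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
score = f_measure(actual, predicted)  # precision 0.75, recall 0.75 -> F = 0.75
```

F-measure is preferred over raw accuracy in defect prediction because defective modules are usually a small minority, and accuracy rewards predicting everything as clean.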


2020 ◽  
Vol 21 (2) ◽  
pp. 206-214
Author(s):  
V. S. Tynchenko ◽  
I. A. Golovenok ◽  
V. E. Petrenko ◽  
A. V. Milov ◽  
...  

2020 ◽  
Author(s):  
Sonali Srivastava ◽  
Shikha Rani ◽  
Shailly Singh ◽  
Saurabh Singh ◽  
Rohit Vashisht

Author(s):  
Maria Ulan ◽  
Welf Löwe ◽  
Morgan Ericsson ◽  
Anna Wingkvist

A quality model is a conceptual decomposition of an abstract notion of quality into relevant, possibly conflicting characteristics and further into measurable metrics. For quality assessment and decision making, metric values are aggregated to characteristics and ultimately to quality scores. Aggregation has often been problematic, as quality models do not provide the semantics of aggregation, which makes it hard to formally reason about metrics, characteristics, and quality. We argue that aggregation needs to be interpretable and mathematically well defined in order to assess, compare, and improve quality. To address this challenge, we propose a probabilistic approach to aggregation and define quality scores based on joint distributions of absolute metric values. To evaluate the proposed approach and its implementation under realistic conditions, we conduct empirical studies on bug prediction for ca. 5000 software classes, the maintainability of ca. 15000 open-source software systems, and the information quality of ca. 100000 real-world technical documents. We found that our approach is feasible, accurate, and scalable in performance.
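One way to ground the idea of scores based on joint distributions is to score each unit by the empirical probability that another unit matches or beats it on every metric at once. This is a loose sketch of the general idea, not the authors' definition: the data, the "lower is better" orientation, and the dominance rule are all assumptions.

```python
# Sketch: an empirical joint-distribution quality score. A unit's score is the
# fraction of the population that is at least as good on all metrics
# simultaneously (lower metric value = better, by assumption).
import numpy as np

rng = np.random.default_rng(7)
population = rng.random((1000, 3))  # 1000 units, 3 absolute metric values each

def quality_score(unit, population):
    # Units dominating `unit` on every metric; a high score means many units
    # are at least as good, i.e. this unit's relative quality is low.
    dominated = np.all(population <= unit, axis=1)
    return dominated.mean()

scores = np.array([quality_score(u, population) for u in population])
```

Because the score is a probability over the observed joint distribution, it stays in [0, 1] and remains interpretable and comparable across metrics with different scales, which is the property the abstract argues ad hoc aggregations lack.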

