Comments on "Researcher bias: The use of machine learning in software defect prediction"

2016 ◽  
Author(s):  
Chakkrit Tantithamthavorn ◽  
Shane McIntosh ◽  
Ahmed E Hassan ◽  
Kenichi Matsumoto

Shepperd et al. find that the reported performance of a defect prediction model shares a strong relationship with the group of researchers who construct the models. In this paper, we perform an alternative investigation of Shepperd et al.'s data. We observe that (a) research group shares a strong association with other explanatory variables (i.e., the dataset and metric families that are used to build a model); (b) the strong association among these explanatory variables makes it difficult to discern the impact of the research group on model performance; and (c) after mitigating the impact of this strong association, we find that the research group has a smaller impact than the metric family. These observations lead us to conclude that the relationship between the research group and the performance of a defect prediction model is more likely due to the tendency of researchers to reuse experimental components (e.g., datasets and metrics). We recommend that researchers experiment with a broader selection of datasets and metrics to combat any potential bias in their results.
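To make the style of analysis concrete, the sketch below shows one way to carry out such an investigation with standard statistical tooling: Cramér's V to measure how strongly the categorical explanatory variables are associated with one another, followed by a Type II ANOVA to partition the variance in model performance. The file name and column names (ResearcherGroup, DatasetFamily, MetricFamily, MCC) are hypothetical stand-ins, not the actual schema of Shepperd et al.'s data.

```python
# Sketch: quantify association among categorical explanatory variables
# (Cramér's V), then partition explained variance with a Type II ANOVA.
# All column/file names below are assumptions for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical variables (0 = independent, 1 = fully associated)."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))

df = pd.read_csv("shepperd_data.csv")  # hypothetical flat file, one row per model result

# (a) how strongly is research group tied to the other explanatory variables?
for other in ["DatasetFamily", "MetricFamily"]:
    print(other, cramers_v(df["ResearcherGroup"], df[other]))

# (c) share of explained variance per variable, after dropping whichever
# variables a redundancy check flags as collinear.
model = smf.ols("MCC ~ C(ResearcherGroup) + C(MetricFamily)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```

Dropping the collinear variable before fitting the ANOVA is the step the abstract refers to as mitigating the impact of the strong association.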


2011 ◽  
Vol 34 (6) ◽  
pp. 1148-1154 ◽  
Author(s):  
Hui-Yan JIANG ◽  
Mao ZONG ◽  
Xiang-Ying LIU

2019 ◽  
Vol 477 ◽  
pp. 399-409 ◽  
Author(s):  
Hua Wei ◽  
Changzhen Hu ◽  
Shiyou Chen ◽  
Yuan Xue ◽  
Quanxin Zhang

Author(s):  
Lina Gong ◽  
Gopi Krishnan Rajbahadur ◽  
Ahmed E. Hassan ◽  
S. Jiang

Author(s):  
Song Song ◽  
Youpeng Xu ◽  
Jiali Wang ◽  
Jinkang Du ◽  
Jianxin Zhang ◽  
...  

Distributed and semi-distributed models are considered sensitive to the spatial resolution of their input data. In this paper, we take a small catchment in the highly urbanized Yangtze River Delta, the Qinhuai catchment, as the study area to analyze the impact of the spatial resolution of precipitation and potential evapotranspiration (PET) data on the long-term runoff and flood runoff processes. The data sources include TRMM precipitation data, PET data downloaded from FEWS, and interpolated meteorological station data. GIS/RS techniques were used to collect and pre-process the geographical, precipitation and PET series, which then served as the input to the CREST (Coupled Routing and Excess Storage) model to simulate the runoff process. The results clearly show that the CREST model is applicable to the Qinhuai catchment; that the spatial resolution of precipitation has a strong influence on the modelled runoff, so TRMM data cannot substitute for meteorological station precipitation data in a small catchment; and that the CREST model is not sensitive to the spatial resolution of the PET data, although the formula used to estimate PET is correlated with model quality. This paper focuses on a small urbanized catchment, identifying the explanatory variables that influence model performance and providing a reliable reference for studies in similar areas.
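A common way to score such resolution experiments is the Nash–Sutcliffe efficiency (NSE) between observed and simulated discharge. The minimal sketch below, with hypothetical file names standing in for the gauge record and the CREST outputs at each input resolution, scores each run this way (the CREST simulations themselves are not reproduced here):

```python
# Sketch: compare CREST runs driven by precipitation inputs at different
# spatial resolutions using the Nash–Sutcliffe efficiency (NSE).
# File names are placeholders; real values would come from gauge
# observations and CREST simulation output.
import numpy as np

def nse(observed: np.ndarray, simulated: np.ndarray) -> float:
    """Nash–Sutcliffe efficiency: 1 is a perfect fit, <= 0 is no better than the mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((observed - simulated) ** 2) / np.sum((observed - observed.mean()) ** 2)

observed = np.loadtxt("qinhuai_observed_runoff.txt")   # hypothetical gauge series
for run in ["station_interp", "trmm_0.25deg"]:         # hypothetical CREST outputs
    simulated = np.loadtxt(f"crest_runoff_{run}.txt")
    print(run, round(nse(observed, simulated), 3))
```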


2017 ◽  
Vol 9 (2) ◽  
pp. 426-435 ◽  
Author(s):  
Marise Vermeulen

This study investigated the relationship between share returns and nine variables that previous research had shown to influence returns, using a multiple regression analysis. These variables are size, leverage, book-to-market ratio, earnings yield, dividend payout, earnings growth, return on equity, earnings per share and asset growth. The impact of several of these variables on share returns proved to be insignificant, and collinearity was identified among some of them. However, three significant variables remained, and the final regression model included the book-to-market ratio, dividend payout and leverage as the explanatory variables.
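A minimal sketch of this kind of analysis, assuming a flat dataset with hypothetical column names for the nine candidate variables, fits the full regression, screens for collinearity with variance inflation factors, and then fits the reduced model the study arrived at:

```python
# Sketch: multiple regression of share returns on candidate explanatory
# variables, with variance inflation factors (VIFs) to flag collinearity.
# The file and column names are hypothetical stand-ins for the study's data.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("share_returns.csv")  # hypothetical dataset
candidates = ["size", "leverage", "book_to_market", "earnings_yield",
              "dividend_payout", "earnings_growth", "roe", "eps", "asset_growth"]

# Full model with all nine candidate variables.
X = sm.add_constant(df[candidates])
print(sm.OLS(df["return"], X).fit().summary())

# VIF above ~10 is a common rule of thumb for problematic collinearity.
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.to_numpy(), i))

# Reduced model with the three significant variables the study retained.
final = sm.add_constant(df[["book_to_market", "dividend_payout", "leverage"]])
print(sm.OLS(df["return"], final).fit().summary())
```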


Author(s):  
Liqiong Chen ◽  
Shilong Song ◽  
Can Wang

Just-in-time software defect prediction (JIT-SDP) is a fine-grained software defect prediction technology that aims to identify defective code changes in software systems. Effort-aware software defect prediction takes the cost of code inspection into account, so that more defective code changes can be found within limited testing resources. Traditional effort-aware defect prediction models mainly measure effort by the number of lines of code (LOC) and rarely consider additional factors. This paper proposes a novel effort measure method called Multi-Metric Joint Calculation (MMJC). When measuring effort, MMJC takes into account not only LOC, but also the distribution of modified code across different files (Entropy), the number of developers that changed the files (NDEV) and the developer experience (EXP). In the simulation experiments, MMJC is combined with Linear Regression, Decision Tree, Random Forest, LightGBM, Support Vector Machine and Neural Network, respectively, to build software defect prediction models. Several comparative experiments are conducted between the MMJC-based models and baseline models. The results show that the ACC and P_opt indicators of the MMJC-based models improve by 35.3% and 15.9% on average, respectively, across the three verification scenarios compared with the baseline models.
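The abstract does not give the exact joint-calculation formula, but the sketch below illustrates the general shape of an effort-aware evaluation built on a multi-metric effort score. Combining the four factors by a min-max-normalized sum is an assumption, not the paper's formula, and ACC is computed here as the fraction of defective changes caught within the first 20% of cumulative inspection effort, as it is commonly defined in the effort-aware JIT-SDP literature.

```python
# Sketch of a multi-metric effort measure in the spirit of MMJC, plus the
# effort-aware ACC indicator. The min-max-normalized sum over LOC, Entropy,
# NDEV and EXP is an assumption; the paper's exact formula is not reproduced.
import pandas as pd

def mmjc_effort(df: pd.DataFrame) -> pd.Series:
    """Combine normalized effort factors into a single per-change effort score."""
    factors = ["loc", "entropy", "ndev", "exp"]
    normalized = (df[factors] - df[factors].min()) / (df[factors].max() - df[factors].min())
    return normalized.sum(axis=1)

def acc_at_20(df: pd.DataFrame, score: pd.Series, effort: pd.Series) -> float:
    """Fraction of defective changes caught within 20% of total effort,
    inspecting changes in descending order of predicted risk per unit effort."""
    order = (score / effort).sort_values(ascending=False).index
    cum_effort = effort.loc[order].cumsum()
    mask = (cum_effort <= 0.2 * effort.sum()).to_numpy()
    inspected = order[mask]
    return df.loc[inspected, "defective"].sum() / df["defective"].sum()

changes = pd.read_csv("jit_changes.csv")    # hypothetical per-change data
effort = mmjc_effort(changes)
score = changes["predicted_risk"]           # output of any JIT-SDP classifier
print(round(acc_at_20(changes, score, effort), 3))
```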

