Comments on "Researcher bias: The use of machine learning in software defect prediction"

2016 ◽  
Author(s):  
Chakkrit Tantithamthavorn ◽  
Shane McIntosh ◽  
Ahmed E Hassan ◽  
Kenichi Matsumoto

Shepperd et al. find that the reported performance of a defect prediction model shares a strong relationship with the group of researchers who construct the models. In this paper, we perform an alternative investigation of Shepperd et al.'s data. We observe that (a) research group shares a strong association with other explanatory variables (i.e., the dataset and metric families that are used to build a model); (b) the strong association among these explanatory variables makes it difficult to discern the impact of the research group on model performance; and (c) after mitigating the impact of this strong association, we find that the research group has a smaller impact than the metric family. These observations lead us to conclude that the relationship between the research group and the performance of a defect prediction model is more likely due to the tendency of researchers to reuse experimental components (e.g., datasets and metrics). We recommend that researchers experiment with a broader selection of datasets and metrics to combat any potential bias in their results.
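To make the style of analysis concrete, the sketch below shows one way to carry out such an investigation with standard statistical tooling: Cramér's V to measure how strongly the categorical explanatory variables are associated with one another, followed by a Type II ANOVA to partition the variance in model performance. The file name and column names (ResearcherGroup, DatasetFamily, MetricFamily, MCC) are hypothetical stand-ins, not the actual schema of Shepperd et al.'s data.

```python
# Sketch: quantify association among categorical explanatory variables
# (Cramér's V), then partition explained variance with a Type II ANOVA.
# All column/file names below are assumptions for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical variables (0 = independent, 1 = fully associated)."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))

df = pd.read_csv("shepperd_data.csv")  # hypothetical flat file, one row per model result

# (a) how strongly is research group tied to the other explanatory variables?
for other in ["DatasetFamily", "MetricFamily"]:
    print(other, cramers_v(df["ResearcherGroup"], df[other]))

# (c) share of explained variance per variable, after dropping whichever
# variables a redundancy check flags as collinear.
model = smf.ols("MCC ~ C(ResearcherGroup) + C(MetricFamily)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```

Dropping the collinear variable before fitting the ANOVA is the step the abstract refers to as mitigating the impact of the strong association.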


2011 ◽  
Vol 34 (6) ◽  
pp. 1148-1154 ◽  
Author(s):  
Hui-Yan JIANG ◽  
Mao ZONG ◽  
Xiang-Ying LIU

2019 ◽  
Vol 477 ◽  
pp. 399-409 ◽  
Author(s):  
Hua Wei ◽  
Changzhen Hu ◽  
Shiyou Chen ◽  
Yuan Xue ◽  
Quanxin Zhang

Author(s):  
Lina Gong ◽  
Gopi Krishnan Rajbahadur ◽  
Ahmed E. Hassan ◽  
S. Jiang

Author(s):  
Song Song ◽  
Youpeng Xu ◽  
Jiali Wang ◽  
Jinkang Du ◽  
Jianxin Zhang ◽  
...  

Distributed and semi-distributed models are considered sensitive to the spatial resolution of their input data. In this paper, we take a small catchment in the highly urbanized Yangtze River Delta, the Qinhuai catchment, as the study area to analyze the impact of the spatial resolution of precipitation and potential evapotranspiration (PET) data on the long-term runoff and flood runoff processes. The data sources include TRMM precipitation data, PET data downloaded from FEWS, and interpolated meteorological station data. GIS/RS techniques were used to collect and pre-process the geographical, precipitation and PET series, which then served as the input to the CREST (Coupled Routing and Excess Storage) model to simulate the runoff process. The results clearly show that the CREST model is applicable to the Qinhuai catchment; that the spatial resolution of precipitation has a strong influence on the modelled runoff, so TRMM data cannot substitute for meteorological station precipitation data in a small catchment; and that the CREST model is not sensitive to the spatial resolution of the PET data, although the formula used to estimate PET is correlated with model quality. This paper focuses on a small urbanized catchment, identifying the explanatory variables that influence model performance and providing a reliable reference for studies in similar areas.
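A common way to score such resolution experiments is the Nash–Sutcliffe efficiency (NSE) between observed and simulated discharge. The minimal sketch below, with hypothetical file names standing in for the gauge record and the CREST outputs at each input resolution, scores each run this way (the CREST simulations themselves are not reproduced here):

```python
# Sketch: compare CREST runs driven by precipitation inputs at different
# spatial resolutions using the Nash–Sutcliffe efficiency (NSE).
# File names are placeholders; real values would come from gauge
# observations and CREST simulation output.
import numpy as np

def nse(observed: np.ndarray, simulated: np.ndarray) -> float:
    """Nash–Sutcliffe efficiency: 1 is a perfect fit, <= 0 is no better than the mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((observed - simulated) ** 2) / np.sum((observed - observed.mean()) ** 2)

observed = np.loadtxt("qinhuai_observed_runoff.txt")   # hypothetical gauge series
for run in ["station_interp", "trmm_0.25deg"]:         # hypothetical CREST outputs
    simulated = np.loadtxt(f"crest_runoff_{run}.txt")
    print(run, round(nse(observed, simulated), 3))
```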


2017 ◽  
Vol 9 (2) ◽  
pp. 426-435 ◽  
Author(s):  
Marise Vermeulen

This study investigated the relationship between share returns and nine variables that previous research had shown to influence returns, using a multiple regression analysis. These variables are size, leverage, book-to-market ratio, earnings yield, dividend payout, earnings growth, return on equity, earnings per share and asset growth. The impact of several of these variables on share returns proved to be insignificant, and collinearity was identified among some of them. However, three significant variables remained, and the final regression model included the book-to-market ratio, dividend payout and leverage as the explanatory variables.
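A minimal sketch of this kind of analysis, assuming a flat dataset with hypothetical column names for the nine candidate variables, fits the full regression, screens for collinearity with variance inflation factors, and then fits the reduced model the study arrived at:

```python
# Sketch: multiple regression of share returns on candidate explanatory
# variables, with variance inflation factors (VIFs) to flag collinearity.
# The file and column names are hypothetical stand-ins for the study's data.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("share_returns.csv")  # hypothetical dataset
candidates = ["size", "leverage", "book_to_market", "earnings_yield",
              "dividend_payout", "earnings_growth", "roe", "eps", "asset_growth"]

# Full model with all nine candidate variables.
X = sm.add_constant(df[candidates])
print(sm.OLS(df["return"], X).fit().summary())

# VIF above ~10 is a common rule of thumb for problematic collinearity.
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.to_numpy(), i))

# Reduced model with the three significant variables the study retained.
final = sm.add_constant(df[["book_to_market", "dividend_payout", "leverage"]])
print(sm.OLS(df["return"], final).fit().summary())
```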


Author(s):  
Liqiong Chen ◽  
Shilong Song ◽  
Can Wang

Just-in-time software defect prediction (JIT-SDP) is a fine-grained software defect prediction technology that aims to identify defective code changes in software systems. Effort-aware software defect prediction takes the cost of code inspection into account, so that more defective code changes can be found within limited testing resources. Traditional effort-aware defect prediction models mainly measure effort by the number of lines of code (LOC) and rarely consider additional factors. This paper proposes a novel effort measure method called Multi-Metric Joint Calculation (MMJC). When measuring effort, MMJC takes into account not only LOC, but also the distribution of modified code across different files (Entropy), the number of developers that changed the files (NDEV) and the developer experience (EXP). In the simulation experiments, MMJC is combined with Linear Regression, Decision Tree, Random Forest, LightGBM, Support Vector Machine and Neural Network, respectively, to build software defect prediction models. Several comparative experiments are conducted between the MMJC-based models and baseline models. The results show that the ACC and P_opt indicators of the MMJC-based models improve by 35.3% and 15.9% on average, respectively, across the three verification scenarios compared with the baseline models.
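The abstract does not give the exact joint-calculation formula, but the sketch below illustrates the general shape of an effort-aware evaluation built on a multi-metric effort score. Combining the four factors by a min-max-normalized sum is an assumption, not the paper's formula, and ACC is computed here as the fraction of defective changes caught within the first 20% of cumulative inspection effort, as it is commonly defined in the effort-aware JIT-SDP literature.

```python
# Sketch of a multi-metric effort measure in the spirit of MMJC, plus the
# effort-aware ACC indicator. The min-max-normalized sum over LOC, Entropy,
# NDEV and EXP is an assumption; the paper's exact formula is not reproduced.
import pandas as pd

def mmjc_effort(df: pd.DataFrame) -> pd.Series:
    """Combine normalized effort factors into a single per-change effort score."""
    factors = ["loc", "entropy", "ndev", "exp"]
    normalized = (df[factors] - df[factors].min()) / (df[factors].max() - df[factors].min())
    return normalized.sum(axis=1)

def acc_at_20(df: pd.DataFrame, score: pd.Series, effort: pd.Series) -> float:
    """Fraction of defective changes caught within 20% of total effort,
    inspecting changes in descending order of predicted risk per unit effort."""
    order = (score / effort).sort_values(ascending=False).index
    cum_effort = effort.loc[order].cumsum()
    mask = (cum_effort <= 0.2 * effort.sum()).to_numpy()
    inspected = order[mask]
    return df.loc[inspected, "defective"].sum() / df["defective"].sum()

changes = pd.read_csv("jit_changes.csv")    # hypothetical per-change data
effort = mmjc_effort(changes)
score = changes["predicted_risk"]           # output of any JIT-SDP classifier
print(round(acc_at_20(changes, score, effort), 3))
```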

