Comments on "Researcher bias: The use of machine learning in software defect prediction"

Author(s):  
Chakkrit Tantithamthavorn ◽  
Shane McIntosh ◽  
Ahmed E Hassan ◽  
Kenichi Matsumoto

Shepperd et al. (2014) find that the reported performance of a defect prediction model shares a strong relationship with the group of researchers who construct the models. In this paper, we perform an alternative investigation of Shepperd et al. (2014)’s data. We observe that (a) researcher group shares a strong association with the dataset and metric families that are used to build a model; (b) the strong association among the explanatory variables introduces a large amount of interference when interpreting the impact of the researcher group on model performance; and (c) after mitigating the interference, we find that the researcher group has a smaller impact than the metric family. These observations lead us to conclude that the relationship between the researcher group and the performance of a defect prediction model may have more to do with the tendency of researchers to reuse experimental components (e.g., datasets and metrics). We recommend that researchers experiment with a broader selection of datasets and metrics to combat potential bias in their results.
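As a rough illustration of observation (a), the association between two categorical explanatory variables can be quantified with a measure such as Cramér's V. The abstract does not state which measure the authors used, so Cramér's V is an assumption here, and the records below (group labels `G1`/`G2`, metric families `static`/`process`) are hypothetical toy data, not values from Shepperd et al.'s dataset.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of categorical labels.

    Returns a value in [0, 1]: 0 means no association, 1 means that one
    variable fully determines the other.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed cell counts
    row = Counter(xs)             # marginal counts of the first variable
    col = Counter(ys)             # marginal counts of the second variable
    k = min(len(row), len(col))
    if k < 2:
        raise ValueError("each variable needs at least two categories")
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical case 1: each group always uses the same metric family.
groups = ["G1", "G1", "G1", "G2", "G2", "G2"]
metrics = ["static", "static", "static", "process", "process", "process"]
perfect = cramers_v(groups, metrics)

# Hypothetical case 2: each group uses both metric families equally often.
mixed = cramers_v(
    ["G1", "G1", "G2", "G2"],
    ["static", "process", "static", "process"],
)
```

When the value is close to 1, as in the first toy case, the two variables carry nearly the same information, which is the situation in which the effect of one is hard to separate from the effect of the other.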

2016 ◽  
Author(s):  
Chakkrit Tantithamthavorn ◽  
Shane McIntosh ◽  
Ahmed E Hassan ◽  
Kenichi Matsumoto

Shepperd et al. find that the reported performance of a defect prediction model shares a strong relationship with the group of researchers who construct the models. In this paper, we perform an alternative investigation of Shepperd et al.’s data. We observe that (a) research group shares a strong association with other explanatory variables (i.e., the dataset and metric families that are used to build a model); (b) the strong association among these explanatory variables makes it difficult to discern the impact of the research group on model performance; and (c) after mitigating the impact of this strong association, we find that the research group has a smaller impact than the metric family. These observations lead us to conclude that the relationship between the research group and the performance of a defect prediction model is more likely due to the tendency of researchers to reuse experimental components (e.g., datasets and metrics). We recommend that researchers experiment with a broader selection of datasets and metrics to combat any potential bias in their results.


2011 ◽  
Vol 34 (6) ◽  
pp. 1148-1154 ◽  
Author(s):  
Hui-Yan JIANG ◽  
Mao ZONG ◽  
Xiang-Ying LIU

Toxins ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 158
Author(s):  
Colin Eady

For 30 years, forage ryegrass breeders have known that the germplasm may contain a maternally inherited symbiotic Epichloë endophyte. These endophytes produce a suite of secondary alkaloid compounds, dependent upon strain. Many produce ergot and other alkaloids, which are associated with both insect deterrence and livestock health issues. The levels of alkaloids and other endophyte characteristics are influenced by strain, host germplasm, and environmental conditions. Some strains in the right host germplasm can confer an advantage over biotic and abiotic stressors, thus acting as a maternally inherited desirable ‘trait’. Through seed production, these mutualistic endophytes do not transmit into 100% of the crop seed and are less vigorous than the grass seed itself. This causes stability and longevity issues for seed production and storage should the ‘trait’ be desired in the germplasm. This makes understanding the precise nature of the relationship vitally important to the plant breeder. These Epichloë endophytes cannot be ‘bred’ in the conventional sense, as they are asexual. Instead, the breeder may modulate endophyte characteristics through selection of host germplasm, a sort of breeding by proxy. This article explores, from a forage seed company perspective, the issues that endophyte characteristics and breeding them by proxy raise for ryegrass breeding, and outlines the methods used to assess the ‘trait’ and the application of these through the breeding, production, and deployment processes. Finally, this article investigates opportunities for enhancing the utilisation of alkaloid-producing endophytes within pastures, with a focus on balancing alkaloid levels to further enhance pest deterrence and improve livestock outcomes.


Author(s):  
Song Song ◽  
Youpeng Xu ◽  
Jiali Wang ◽  
Jinkang Du ◽  
Jianxin Zhang ◽  
...  

Distributed and semi-distributed models are considered sensitive to the spatial resolution of their input data. In this paper, we take the Qinhuai catchment, a small catchment in the highly urbanized Yangtze River Delta, as the study area and analyze the impact of the spatial resolution of precipitation and potential evapotranspiration (PET) on long-term runoff and flood runoff processes. The data sources include TRMM precipitation data, PET data downloaded from FEWS, and interpolated meteorological station data. GIS/RS techniques were used to collect and pre-process the geographical, precipitation, and PET series, which then served as the input to the CREST (Coupled Routing and Excess Storage) model to simulate the runoff process. The results showed that the CREST model is applicable to the Qinhuai catchment; that the spatial resolution of precipitation had a strong influence on the modelled runoff, and that meteorological station precipitation data cannot be substituted by TRMM data in a small catchment; and that the CREST model was not sensitive to the spatial resolution of the PET data, while the estimation formula of the PET data was correlated with model quality. This paper focuses on a small urbanized catchment, identifies the explanatory variables that influence model performance, and provides a reliable reference for studies in similar areas.


2017 ◽  
Vol 9 (2) ◽  
pp. 426-435
Author(s):  
Marise Vermeulen

This study investigated the relationship between share returns and nine variables that had been proven to influence returns in previous research, using a multiple regression analysis. These variables are size, leverage, book-to-market ratio, earnings yield, dividend payout, earnings growth, return on equity, earnings per share and asset growth. The impact of some of the variables on share returns proved to be insignificant, and some collinearity was identified between some of the variables. However, three significant variables were identified and the final regression model included the book-to-market ratio, dividend payout and leverage as the explanatory variables.
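The collinearity mentioned above is often screened with a variance inflation factor (VIF). The abstract does not say which diagnostic the study used, so the following is only a minimal sketch under that assumption. It covers the two-predictor case, where the VIF reduces to 1 / (1 - r²) with r the Pearson correlation between the predictors. The function names and the sample values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """VIF of either predictor in a regression with exactly two predictors."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values (not data from the study).
predictor_a = [1.0, 2.0, 3.0, 4.0]
uncorrelated_b = [1.0, -1.0, -1.0, 1.0]  # zero correlation with predictor_a
collinear_b = [2.0, 4.0, 6.0, 9.0]       # almost a multiple of predictor_a

vif_low = vif_two_predictors(predictor_a, uncorrelated_b)
vif_high = vif_two_predictors(predictor_a, collinear_b)
```

A VIF near 1 indicates little shared variance between the predictors. Values above roughly 5 to 10 are a common rule-of-thumb signal that one of the collinear variables should be reconsidered before interpreting its coefficient.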

