Avoiding model selection bias in small-sample genomic datasets

2006 ◽  
Vol 22 (10) ◽  
pp. 1245-1250 ◽  
Author(s):  
D. Berrar ◽  
I. Bradbury ◽  
W. Dubitzky

2010 ◽  
Vol 20 (4) ◽  
pp. 768-786
Author(s):  
Kristien Wouters ◽  
José Cortiñas Abrahantes ◽  
Geert Molenberghs ◽  
Helena Geys ◽  
Abdellah Ahnaou ◽  
...  

2009 ◽  
Vol 62 (1) ◽  
pp. 117-125 ◽  
Author(s):  
Paul M. Lukacs ◽  
Kenneth P. Burnham ◽  
David R. Anderson

Economies ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 49 ◽  
Author(s):  
Waqar Badshah ◽  
Mehmet Bulut

The Bounds test of cointegration relies only on unstructured single-path model selection techniques, i.e., information criteria, for model selection. The aim of this paper was twofold. The first aim was to evaluate the performance of five routinely used information criteria (Akaike Information Criterion (AIC), Akaike Information Criterion Corrected (AICC), Schwarz/Bayesian Information Criterion (SIC/BIC), Schwarz/Bayesian Information Criterion Corrected (SICC/BICC), and Hannan and Quinn Information Criterion (HQC)) and three structured approaches (Forward Selection, Backward Elimination, and Stepwise) by assessing their size and power properties at different sample sizes using Monte Carlo simulations. The second aim was to assess the same procedures on real economic data, by evaluating the long-run relationship between three pairs of macroeconomic variables, i.e., Energy Consumption and GDP, Oil Price and GDP, and Broad Money and GDP, for the BRICS countries (Brazil, Russia, India, China and South Africa) using the Bounds cointegration test. Information criteria and structured procedures were found to have the same power for sample sizes of 50 or greater. However, BICC and Stepwise perform better at small sample sizes. In light of the simulation and real-data results, a modified Bounds test with a Stepwise model selection procedure may be used, as it is strongly supported theoretically and avoids noise in the model selection process.
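To make the information criteria named in this abstract concrete, the following is a minimal sketch using their standard textbook definitions. It is not the paper's code: the function names `information_criteria` and `select_model` are hypothetical, the inputs (maximized log-likelihood, parameter count, sample size) are assumed to come from an already fitted model, and the corrected BIC variant (SICC/BICC) is left out because the abstract does not define it.

```python
import math


def information_criteria(log_lik: float, k: int, n: int) -> dict:
    """Return AIC, AICc, BIC and HQC for one fitted model.

    log_lik: maximized log-likelihood of the model (assumed given)
    k:       number of estimated parameters
    n:       sample size
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_lik
    # Small-sample correction term added on top of AIC.
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_lik
    hqc = 2 * k * math.log(math.log(n)) - 2 * log_lik
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "HQC": hqc}


def select_model(candidates: dict, criterion: str = "BIC") -> str:
    """Pick the candidate with the smallest value of the chosen criterion.

    candidates maps a model name to a (log_lik, k, n) tuple.
    """
    return min(
        candidates,
        key=lambda name: information_criteria(*candidates[name])[criterion],
    )
```

Each criterion trades goodness of fit against a penalty on the number of parameters; they differ only in how fast that penalty grows with the sample size, which is why their behavior can diverge in small samples.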


2014 ◽  
Vol 30 (3) ◽  
pp. 325-332 ◽  
Author(s):  
Hema Mistry

Objectives: In economic evaluations of healthcare technologies, situations arise where data are not randomized and numbers are small. For this reason, obtaining reliable cost estimates of such interventions may be difficult. This study explores two approaches to obtaining cost estimates for pregnant women screened for a fetal cardiac anomaly. Methods: Two methods to reduce selection bias in health care, regression analyses and propensity scoring methods, were applied to the total mean costs of pregnancy for women who received specialist cardiac advice by means of two referral modes: telemedicine and direct referral. Results: The observed total mean costs of pregnancy were higher for the telemedicine group than the direct referral group (4,918 versus 4,311 GBP). The regression model found that referral mode was not a significant predictor of costs and the cost difference between the two groups was reduced from 607 to 94 GBP. After applying the various propensity score methods, the groups were balanced in terms of sizes and compositions; and again the cost differences between the two groups were smaller, ranging from -62 (matching “by hand”) to 333 GBP (kernel matching). Conclusions: Regression analyses and propensity scoring methods applied to the dataset may have increased the homogeneity and reduced the variance in the adjusted costs; that is, these methods have allowed the observed selection bias to be reduced. I believe that propensity scoring methods worked better for this dataset, because after matching the two groups were similar in terms of background characteristics and the adjusted cost differences were smaller.


2019 ◽  
Vol 34 (2) ◽  
pp. 41-61 ◽  
Author(s):  
Bidisha Chakrabarty ◽  
Scott Duellman ◽  
Michael A. Hyman

SYNOPSIS: Research on the association between abnormal audit fees (measuring audit effort) and financial misconduct has produced mixed results. The use of actual misstatements in this research creates small-sample inferences, introduces systematic selection bias, and reduces the scope of sample coverage. In this study, we use a metric based on Benford's Law to analyze the impact of abnormal audit fees on the likelihood of misconduct. This measure is parsimonious, avoids selection bias, and can be computed for a large sample of public firms. Consistent with theory, we find that greater audit effort reduces the likelihood of misconduct and auditor resignations are more likely for clients with higher misconduct likelihood. Our findings are not driven by audit firm size, client size, the governance structure of the client, or economic bonding explanations. The effect is not subsumed when controlling for alternative misconduct measurement metrics and is robust across multiple tests to address endogeneity. JEL Classifications: G32; M41.
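The abstract does not say which Benford-based metric the authors use. As an illustration only, the sketch below computes one common choice: the mean absolute deviation (MAD) between the observed first-digit frequencies of a set of numbers and the frequencies Benford's Law predicts, log10(1 + 1/d) for d = 1..9. The function names are hypothetical, and the choice of MAD is an assumption rather than the paper's stated measure.

```python
import math
from collections import Counter


def benford_expected() -> dict:
    """Expected first-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x: float) -> int:
    """Leading significant digit of a nonzero number."""
    # Scientific notation places the leading significant digit first;
    # 16 decimals avoid rounding a value such as 9.9999999 up to 10.
    return int(f"{abs(x):.16e}"[0])


def benford_mad(values) -> float:
    """Mean absolute deviation of first-digit frequencies from Benford."""
    digits = [first_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    expected = benford_expected()
    n = len(digits)
    return sum(
        abs(counts.get(d, 0) / n - expected[d]) for d in range(1, 10)
    ) / 9
```

A larger deviation means the digit distribution of the reported figures sits further from the Benford pattern. Because it needs only the reported numbers, a measure of this kind can be computed for any firm, which is the coverage advantage the abstract describes.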


2021 ◽  
Vol 30 (10) ◽  
pp. 2221-2238
Author(s):  
Sarah B Peskoe ◽  
David Arterburn ◽  
Karen J Coleman ◽  
Lisa J Herrinton ◽  
Michael J Daniels ◽  
...  

While electronic health records data provide unique opportunities for research, numerous methodological issues must be considered. Among these, selection bias due to incomplete/missing data has received far less attention than other issues. Unfortunately, standard missing data approaches (e.g. inverse-probability weighting and multiple imputation) generally fail to acknowledge the complex interplay of heterogeneous decisions made by patients, providers, and health systems that govern whether specific data elements in the electronic health records are observed. This, in turn, renders the missing-at-random assumption difficult to believe in standard approaches. In the clinical literature, the collection of decisions that gives rise to the observed data is referred to as the data provenance. Building on a recently proposed framework for modularizing the data provenance, we develop a general and scalable framework for estimation and inference with respect to regression models based on inverse-probability weighting that allows for a hierarchy of missingness mechanisms to better align with the complex nature of electronic health records data. We show that the proposed estimator is consistent and asymptotically Normal, derive the form of the asymptotic variance, and propose two consistent estimators. Simulations show that naïve application of standard methods may yield biased point estimates, that the proposed estimators have good small-sample properties, and that researchers may have to contend with a bias-variance trade-off as they consider how to handle missing data. The proposed methods are motivated by an ongoing electronic health records-based study of bariatric surgery.
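The idea of weighting by a hierarchy of missingness mechanisms can be illustrated with a much simpler quantity than the regression estimator the abstract describes. The sketch below is an assumption-laden toy, not the authors' method: it estimates a population mean from fully observed records, where each record carries a list of hypothetical per-stage observation probabilities (for example, one per decision in the data provenance) that are taken as already known. The overall observation probability is their product, and its inverse is the weight.

```python
from math import prod


def hierarchical_ipw_mean(records) -> float:
    """Normalized inverse-probability-weighted mean.

    records: iterable of (y, stage_probs) for fully observed units, where
    stage_probs lists the conditional probability of passing each stage
    of the missingness hierarchy (hypothetical, assumed known here).
    """
    numerator = 0.0
    denominator = 0.0
    for y, stage_probs in records:
        p_observed = prod(stage_probs)  # probability of passing every stage
        if not 0 < p_observed <= 1:
            raise ValueError("observation probability must be in (0, 1]")
        weight = 1.0 / p_observed
        numerator += weight * y
        denominator += weight
    if denominator == 0:
        raise ValueError("no observed records")
    return numerator / denominator
```

Units that were unlikely to be observed receive large weights, which is one source of the bias-variance trade-off mentioned in the abstract. In practice the stage probabilities would themselves be estimated from data rather than supplied.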

