Collaborative targeted maximum likelihood estimation for variable importance measure: Illustration for functional outcome prediction in mild traumatic brain injuries

2016 ◽  
Vol 27 (1) ◽  
pp. 286-297 ◽  
Author(s):  
Romain Pirracchio ◽  
John K Yue ◽  
Geoffrey T Manley ◽  
Mark J van der Laan ◽  
Alan E Hubbard ◽  
...  

Standard statistical practice for determining the relative importance of competing causes of disease typically relies on ad hoc methods, often byproducts of machine learning procedures (stepwise regression, random forest, etc.). A causal inference framework and data-adaptive methods may help to tailor parameters to the clinical question and free the analyst from arbitrary modeling assumptions. Our focus is on implementations of such semiparametric methods for a variable importance measure (VIM). We propose a fully automated procedure for VIM based on collaborative targeted maximum likelihood estimation (cTMLE), a method that optimizes the estimate of an association in the presence of potentially numerous competing causes. We applied the approach to data collected from traumatic brain injury patients in a prospective, observational study including three US Level-1 trauma centers. The primary outcome was a disability score, the Glasgow Outcome Scale-Extended (GOSE), collected three months post-injury. We identified clinically important predictors among a set of risk factors using a variable importance analysis based on targeted maximum likelihood estimators (TMLE) and on cTMLE. Via a parametric bootstrap, we demonstrate that the latter procedure has the potential for robust automated estimation of variable importance measures based upon machine-learning algorithms. The cTMLE estimator was associated with substantially less positivity bias than TMLE and with higher coverage of the 95% confidence interval (CI). This study confirms the power of an automated cTMLE procedure that can target model selection via machine learning to estimate VIMs in complicated, high-dimensional data.

2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Ghazaleh Dashti ◽  
Katherine J. Lee ◽  
Julie A. Simpson ◽  
Ian R. White ◽  
John B. Carlin ◽  
...  

Background: Causal inference from cohort studies is central to epidemiological research. Targeted Maximum Likelihood Estimation (TMLE) is an appealing doubly robust method for causal effect estimation, but it is unclear how missing data should be handled when it is used in conjunction with machine learning approaches for the exposure and outcome models. This is problematic because missing data are ubiquitous and can result in biased estimates and loss of precision if handled inappropriately. Methods: Based on a motivating example from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate the performance of available approaches for handling missing data when using TMLE with machine learning. These included complete-case analysis; an extended TMLE approach incorporating an outcome missingness probability model; the missing indicator approach for missing covariate data (MCMI); and multiple imputation (MI) using standard parametric approaches or machine learning algorithms. We considered 11 missingness mechanisms typical in cohort studies, and both a simple and a complex setting, in which exposure and outcome generation models included two-way and higher-order interactions. Results: MI using regression with no interactions and MI with random forest yielded estimates with the highest bias. MI with regression including two-way interactions was the best performing method overall. Of the non-MI approaches, MCMI performed the worst. Conclusions: When using TMLE with machine learning to estimate the average causal effect, avoiding standard MI with no interactions and MCMI is recommended. Key messages: We provide novel guidance for handling missing data for causal effect estimation using TMLE.
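As an illustration of the missing indicator approach for covariates (MCMI) named above, the following is a minimal sketch and not the authors' implementation. It assumes that missing covariate values are coded as NaN and are filled with a constant; the function name `add_missing_indicators` and the `fill_value` parameter are hypothetical.

```python
import numpy as np

def add_missing_indicators(X, fill_value=0.0):
    """Fill missing covariate values and append 0/1 missingness indicators.

    Assumption: missing entries are coded as NaN. Indicator columns are added
    only for covariates that have at least one missing value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                       # True where a value is missing
    filled = np.where(missing, fill_value, X)   # replace NaN with a constant
    has_missing = missing.any(axis=0)           # covariates needing an indicator
    indicators = missing[:, has_missing].astype(float)
    return np.hstack([filled, indicators])

# Toy covariate matrix with one missing value in each column
X = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [np.nan, 4.0]])
X_aug = add_missing_indicators(X)
```

The augmented matrix would then be passed to the exposure and outcome models in place of the original covariates. The abstract reports that this approach performed worst among the non-MI options, so the sketch is meant to clarify what was compared, not to recommend it.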


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Amir Almasi-Hashiani ◽  
Saharnaz Nedjat ◽  
Reza Ghiasvand ◽  
Saeid Safiri ◽  
Maryam Nazemipour ◽  
...  

Objectives: The relationship between reproductive factors and breast cancer (BC) risk has been investigated in previous studies. Considering the discrepancies in the results, the aim of this study was to estimate the causal effect of reproductive factors on BC risk in a case-control study using the doubly robust approach of targeted maximum likelihood estimation. Methods: This is a causal reanalysis of a case-control study conducted between 2005 and 2008 in Shiraz, Iran, in which 787 confirmed BC cases and 928 controls were enrolled. Targeted maximum likelihood estimation along with Super Learner was used to analyze the data, and the risk ratio (RR), risk difference (RD), and population attributable fraction (PAF) were reported. Results: Our findings did not support parity and age at the first pregnancy as risk factors for BC. The risk of BC was higher among postmenopausal women (RR = 3.3, 95% confidence interval (CI) = (2.3, 4.6)), women with age at first marriage ≥20 years (RR = 1.6, 95% CI = (1.3, 2.1)), and women with a history of oral contraceptive (OC) use (RR = 1.6, 95% CI = (1.3, 2.1)) or breastfeeding duration ≤60 months (RR = 1.8, 95% CI = (1.3, 2.5)). The PAFs for menopause status, breastfeeding duration, and OC use were 40.3% (95% CI = 39.5, 40.6), 27.3% (95% CI = 23.1, 30.8) and 24.4% (95% CI = 10.5, 35.5), respectively. Conclusions: Postmenopausal women, and women with a higher age at first marriage, shorter duration of breastfeeding, and history of OC use are at higher risk of BC.
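The three effect measures reported above can be written in terms of estimated risks. The sketch below uses the common definitions RR = risk under exposure / risk under no exposure, RD = their difference, and PAF = (overall risk - risk under no exposure) / overall risk. The abstract does not state which PAF formula the authors used, so that definition is an assumption, and the function name and input values are hypothetical illustrations, not data from the study.

```python
def effect_measures(risk_exposed, risk_unexposed, overall_risk):
    """Return (RR, RD, PAF) from three risks expressed as proportions.

    Assumption: PAF is defined as the proportional reduction in overall risk
    if everyone were unexposed.
    """
    rr = risk_exposed / risk_unexposed                 # risk ratio
    rd = risk_exposed - risk_unexposed                 # risk difference
    paf = (overall_risk - risk_unexposed) / overall_risk
    return rr, rd, paf

# Hypothetical risks chosen only to illustrate the arithmetic
rr, rd, paf = effect_measures(risk_exposed=0.30,
                              risk_unexposed=0.20,
                              overall_risk=0.25)
```

With these made-up inputs the risk ratio is 1.5, the risk difference is about 0.10, and the PAF is about 20%.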


2019 ◽  
Vol 189 (2) ◽  
pp. 133-145 ◽  
Author(s):  
Samantha F Ehrlich ◽  
Romain S Neugebauer ◽  
Juanran Feng ◽  
Monique M Hedderson ◽  
Assiamira Ferrara

This cohort study sought to estimate the differences in risk of delivering infants who were small or large for gestational age (SGA or LGA, respectively) according to exercise during the first trimester of pregnancy (vs. no exercise) among 2,286 women receiving care at Kaiser Permanente Northern California in 2013–2017. Exercise was assessed by questionnaire. SGA and LGA were determined by the sex- and gestational-age-specific birthweight distributions of the 2017 US Natality file. Risk differences were estimated by targeted maximum likelihood estimation, with and without data-adaptive prediction (machine learning). Analyses were also stratified by prepregnancy weight status. Overall, exercise at the cohort-specific 75th percentile was associated with an increased risk of SGA of 4.5 (95% CI: 2.1, 6.8) per 100 births, and decreased risk of LGA of 2.8 (95% CI: 0.5, 5.1) per 100 births; similar findings were observed among the underweight and normal-weight women, but no associations were found among those with overweight or obesity. Meeting Physical Activity Guidelines was associated with increased risk of SGA and decreased risk of LGA but only among underweight and normal-weight women. Any vigorous exercise reduced the risk of LGA in underweight and normal-weight women only and was not associated with SGA risk.
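To make the phrase "risk differences were estimated by targeted maximum likelihood estimation" more concrete, here is a minimal sketch of a TMLE for a risk difference with a binary exposure and binary outcome. It is not the authors' code. It assumes plain logistic regressions for both the outcome and exposure models (the "without data-adaptive prediction" variant), and all function names, the truncation bound, and the simulated data are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_logistic(X, y, offset=None, n_iter=50):
    """Logistic regression coefficients via Newton-Raphson (optional offset)."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def tmle_risk_difference(W, A, Y, bound=0.01):
    """TMLE of E[Y^1] - E[Y^0], assuming parametric logistic working models."""
    n = len(Y)
    ones, zeros = np.ones(n), np.zeros(n)
    # Step 1: initial outcome regression Q(A, W) and its counterfactual predictions
    bQ = fit_logistic(np.column_stack([ones, A, W]), Y)
    Q1 = np.clip(expit(np.column_stack([ones, ones, W]) @ bQ), 1e-6, 1 - 1e-6)
    Q0 = np.clip(expit(np.column_stack([ones, zeros, W]) @ bQ), 1e-6, 1 - 1e-6)
    QA = np.where(A == 1, Q1, Q0)
    # Step 2: propensity score g(W) = P(A = 1 | W), truncated away from 0 and 1
    Xg = np.column_stack([ones, W])
    g = np.clip(expit(Xg @ fit_logistic(Xg, A)), bound, 1 - bound)
    # Step 3: targeting step, a one-parameter fluctuation along the clever covariate
    H = A / g - (1 - A) / (1 - g)
    eps = fit_logistic(H[:, None], Y, offset=logit(QA))[0]
    Q1_star = expit(logit(Q1) + eps / g)
    Q0_star = expit(logit(Q0) - eps / (1 - g))
    # Step 4: plug the updated predictions into the risk difference
    return float(np.mean(Q1_star - Q0_star))

# Simulated data (hypothetical): one confounder W, exposure A, outcome Y
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=n)
A = rng.binomial(1, expit(0.5 * W))
Y = rng.binomial(1, expit(-1.0 + 1.0 * A + 0.8 * W))
rd_hat = tmle_risk_difference(W, A, Y)
```

In the data-adaptive variant described in the abstract, the two logistic fits in steps 1 and 2 would be replaced by machine learning predictions, while the targeting step stays the same.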

