The causal effect and impact of reproductive factors on breast cancer using super learner and targeted maximum likelihood estimation: a case-control study in Fars Province, Iran

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Amir Almasi-Hashiani ◽  
Saharnaz Nedjat ◽  
Reza Ghiasvand ◽  
Saeid Safiri ◽  
Maryam Nazemipour ◽  
...  

Abstract Objectives The relationship between reproductive factors and breast cancer (BC) risk has been investigated in previous studies. Considering the discrepancies in the results, the aim of this study was to estimate the causal effect of reproductive factors on BC risk in a case-control study using the doubly robust approach of targeted maximum likelihood estimation. Methods This is a causal reanalysis of a case-control study conducted between 2005 and 2008 in Shiraz, Iran, in which 787 confirmed BC cases and 928 controls were enrolled. Targeted maximum likelihood estimation along with Super Learner was used to analyze the data, and the risk ratio (RR), risk difference (RD), and population attributable fraction (PAF) were reported. Results Our findings did not support parity and age at first pregnancy as risk factors for BC. The risk of BC was higher among postmenopausal women (RR = 3.3, 95% confidence interval (CI) = (2.3, 4.6)), women whose age at first marriage was ≥20 years (RR = 1.6, 95% CI = (1.3, 2.1)), women with a history of oral contraceptive (OC) use (RR = 1.6, 95% CI = (1.3, 2.1)), and women with a breastfeeding duration ≤60 months (RR = 1.8, 95% CI = (1.3, 2.5)). The PAFs for menopause status, breastfeeding duration, and OC use were 40.3% (95% CI = 39.5, 40.6), 27.3% (95% CI = 23.1, 30.8) and 24.4% (95% CI = 10.5, 35.5), respectively. Conclusions Postmenopausal women and women with a higher age at first marriage, shorter duration of breastfeeding, or a history of OC use are at higher risk of BC.
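
The three effect measures named in this abstract can be illustrated with a minimal sketch. It assumes one common set of definitions: RR and RD contrast the risk if everyone were exposed with the risk if no one were exposed, and PAF is the share of the observed risk that would be removed if no one were exposed. The function name and the input numbers are hypothetical and are not taken from the study, whose exact estimands and case-control weighting may differ.

```python
def effect_measures(risk_exposed, risk_unexposed, risk_observed):
    """Return (RR, RD, PAF) from three risks.

    risk_exposed   : risk if everyone were exposed (hypothetical input)
    risk_unexposed : risk if no one were exposed (hypothetical input)
    risk_observed  : risk under the observed exposure pattern
    """
    rr = risk_exposed / risk_unexposed                     # risk ratio
    rd = risk_exposed - risk_unexposed                     # risk difference
    paf = (risk_observed - risk_unexposed) / risk_observed # attributable fraction
    return rr, rd, paf


# Illustrative numbers only, not results from the study.
rr, rd, paf = effect_measures(0.06, 0.02, 0.03)
print(f"RR={rr:.2f}, RD={rd:.3f}, PAF={paf:.1%}")
```

In a TMLE analysis the three risks would themselves be targeted estimates, and confidence intervals would come from the estimator's influence curve rather than from this arithmetic.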

2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Ghazaleh Dashti ◽  
Katherine J. Lee ◽  
Julie A. Simpson ◽  
Ian R. White ◽  
John B. Carlin ◽  
...  

Abstract Background Causal inference from cohort studies is central to epidemiological research. Targeted Maximum Likelihood Estimation (TMLE) is an appealing doubly robust method for causal effect estimation, but it is unclear how missing data should be handled when it is used in conjunction with machine learning approaches for the exposure and outcome models. This is problematic because missing data are ubiquitous and can result in biased estimates and loss of precision if handled inappropriately. Methods Based on a motivating example from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate the performance of available approaches for handling missing data when using TMLE with machine learning. These included complete-case analysis; an extended TMLE approach incorporating an outcome missingness probability model; the missing indicator approach for missing covariate data (MCMI); and multiple imputation (MI) using standard parametric approaches or machine learning algorithms. We considered 11 missingness mechanisms typical in cohort studies, and a simple and a complex setting, in which exposure and outcome generation models included two-way and higher-order interactions. Results MI using regression with no interactions and MI with random forest yielded estimates with the highest bias. MI with regression including two-way interactions was the best performing method overall. Of the non-MI approaches, MCMI performed the worst. Conclusions When using TMLE with machine learning to estimate the average causal effect, avoiding standard MI with no interactions and MCMI is recommended. Key messages We provide novel guidance for handling missing data for causal effect estimation using TMLE.
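
For readers unfamiliar with the missing indicator approach mentioned in this abstract, the sketch below shows the data transformation it usually refers to: each missing covariate value is replaced by a constant, and a 0/1 column flags where the value was missing. The function name and the fill value of 0 are assumptions for illustration; the abstract does not specify how the authors implemented it, and it reports that this approach performed worst among the non-MI methods.

```python
import numpy as np


def add_missing_indicators(X, fill_value=0.0):
    """Fill NaNs with a constant and append one indicator column
    for every covariate that has at least one missing value."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                       # True where a value is absent
    cols_with_missing = missing.any(axis=0)     # covariates needing a flag
    filled = np.where(missing, fill_value, X)   # constant fill
    indicators = missing[:, cols_with_missing].astype(float)
    return np.hstack([filled, indicators])


# Two covariates; the second is missing for the first subject.
X = np.array([[1.0, np.nan],
              [2.0, 3.0]])
print(add_missing_indicators(X))
```

The augmented matrix would then be passed to the exposure and outcome learners in place of the original covariates.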


Author(s):  
Iván Díaz ◽  
Michael Rosenblum

Abstract Targeted maximum likelihood estimation (TMLE) is a general method for estimating parameters in semiparametric and nonparametric models. The key step in any TMLE implementation is constructing a sequence of least-favorable parametric models for the parameter of interest. This has been done for a variety of parameters arising in causal inference problems, by augmenting standard regression models with a “clever-covariate.” That approach requires deriving such a covariate for each new type of problem; for some problems such a covariate does not exist. To address these issues, we give a general TMLE implementation based on exponential families. This approach does not require deriving a clever-covariate, and it can be used to implement TMLE for estimating any smooth parameter in the nonparametric model. A computational advantage is that each iteration of TMLE involves estimation of a parameter in an exponential family, which is a convex optimization problem for which software implementing reliable and computationally efficient methods exists. We illustrate the method in three estimation problems, involving the mean of an outcome missing at random, the parameter of a median regression model, and the causal effect of a continuous exposure, respectively. We conduct a simulation study comparing different choices for the parametric submodel. We find that the choice of submodel can have an important impact on the behavior of the estimator in finite samples.
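
As background for the "clever-covariate" construction that this abstract contrasts with its exponential-family proposal, here is a minimal sketch of the conventional approach for the first example problem, the mean of a binary outcome missing at random. It is not the authors' method. It assumes simple logistic working models for both the outcome regression and the missingness mechanism, a clever covariate of 1/g(W), and truncation of g at 0.01; all function names and the simulated data are hypothetical.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


def fit_logistic(X, y, offset=None, n_iter=50):
    """Logistic regression by Newton-Raphson; returns the coefficients."""
    offset = np.zeros(len(y)) if offset is None else offset
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta + offset)
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def tmle_mean_mar(W, delta, Y):
    """TMLE of E[Y] for binary Y observed only where delta == 1."""
    X = np.column_stack([np.ones(len(W)), W])
    obs = delta == 1
    # Step 1: initial outcome regression, fitted on observed outcomes only.
    Q = np.clip(expit(X @ fit_logistic(X[obs], Y[obs])), 1e-6, 1 - 1e-6)
    # Step 2: probability of being observed, truncated away from zero.
    g = np.clip(expit(X @ fit_logistic(X, delta.astype(float))), 0.01, 1.0)
    # Step 3: fluctuate Q along the clever covariate H = 1/g.
    H = 1.0 / g
    eps = fit_logistic(H[obs].reshape(-1, 1), Y[obs], offset=logit(Q[obs]))[0]
    Q_star = expit(logit(Q) + eps * H)
    # Step 4: plug the updated regression into the parameter mapping.
    return Q_star.mean()


def simulate(n, seed=0):
    """Toy data: outcome and missingness both depend on W."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=n)
    Y = rng.binomial(1, expit(0.5 + W)).astype(float)
    delta = rng.binomial(1, expit(1.0 - W))
    return W, delta, Y


W, delta, Y = simulate(5000)
print("full-data mean:", Y.mean())
print("complete-case mean:", Y[delta == 1].mean())
print("TMLE:", tmle_mean_mar(W, delta, Y))
```

The derivation of H is the problem-specific step that the abstract says its exponential-family formulation avoids; the fluctuation fit in step 3 is itself a one-parameter convex problem, which is the property the authors generalize.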


2019 ◽  
Vol 189 (2) ◽  
pp. 133-145 ◽  
Author(s):  
Samantha F Ehrlich ◽  
Romain S Neugebauer ◽  
Juanran Feng ◽  
Monique M Hedderson ◽  
Assiamira Ferrara

Abstract This cohort study sought to estimate the differences in risk of delivering infants who were small or large for gestational age (SGA or LGA, respectively) according to exercise during the first trimester of pregnancy (vs. no exercise) among 2,286 women receiving care at Kaiser Permanente Northern California in 2013–2017. Exercise was assessed by questionnaire. SGA and LGA were determined by the sex- and gestational-age-specific birthweight distributions of the 2017 US Natality file. Risk differences were estimated by targeted maximum likelihood estimation, with and without data-adaptive prediction (machine learning). Analyses were also stratified by prepregnancy weight status. Overall, exercise at the cohort-specific 75th percentile was associated with an increased risk of SGA of 4.5 (95% CI: 2.1, 6.8) per 100 births, and decreased risk of LGA of 2.8 (95% CI: 0.5, 5.1) per 100 births; similar findings were observed among the underweight and normal-weight women, but no associations were found among those with overweight or obesity. Meeting Physical Activity Guidelines was associated with increased risk of SGA and decreased risk of LGA but only among underweight and normal-weight women. Any vigorous exercise reduced the risk of LGA in underweight and normal-weight women only and was not associated with SGA risk.


2018 ◽  
Vol 28 (6) ◽  
pp. 1761-1780 ◽  
Author(s):  
Laura B Balzer ◽  
Wenjing Zheng ◽  
Mark J van der Laan ◽  
Maya L Petersen

We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual’s covariates on another’s outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.

