Evaluating the Effects of School Club Activities on Collaborative Competency Using Random Forests for Causal Inference

2021 ◽  
Vol 22 (4) ◽  
pp. 55-78
Author(s):  
Youmi Suk ◽  
Jinsil Lee
2020 ◽  
Author(s):  
Youmi Suk ◽  
Hyunseung Kang

Recently, there has been growing interest in using machine learning (ML) methods for causal inference due to their automatic and flexible abilities to model the propensity score and the outcome model. However, almost all the ML methods for causal inference have been studied under the assumption of no unmeasured confounding and there is little work on handling omitted/unmeasured variable bias. This paper focuses on an ML method based on random forests known as Causal Forests and presents five simple modifications for tuning Causal Forests so that they are robust to cluster-level unmeasured confounding. Our simulation study finds that adjusting the algorithm with the propensity score from fixed effects logistic regression and using demeaned variables make the estimates more robust to cluster-level unmeasured confounding. In particular, using demeaned variables is useful when we are not sure of the functional form of the propensity scores. We conclude by demonstrating our proposals in a real data study concerning the effect of taking an eighth-grade algebra course on math achievement scores from the Early Childhood Longitudinal Study.
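
As a rough illustration of what "demeaned variables" can mean in this setting, here is a minimal sketch that assumes demeaning refers to subtracting each cluster's mean from the values of its members (cluster-mean centering). The abstract does not spell out the exact procedure, so the function name `demean_by_cluster` and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def demean_by_cluster(values, cluster_ids):
    """Subtract each cluster's mean from the values of its members.

    Assumption: "demeaning" is read here as cluster-mean centering;
    this is an illustrative helper, not the authors' implementation.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, cluster in zip(values, cluster_ids):
        sums[cluster] += value
        counts[cluster] += 1
    means = {cluster: sums[cluster] / counts[cluster] for cluster in sums}
    return [value - means[cluster] for value, cluster in zip(values, cluster_ids)]


# Hypothetical toy data: three students in school "A", two in school "B".
scores = [10.0, 12.0, 14.0, 20.0, 30.0]
schools = ["A", "A", "A", "B", "B"]

centered = demean_by_cluster(scores, schools)
print(centered)  # [-2.0, 0.0, 2.0, -5.0, 5.0]
```

Centering within clusters removes any additive component that is constant within a cluster, which is the usual intuition for why such a transformation can help against cluster-level confounding.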


2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Stephanie Long ◽  
Genevieve Lefebvre ◽  
Tibor Schuster

Abstract
Background: Advances in causal inference have helped explain the longstanding birthweight and obesity paradoxes as selection bias due to conditioning on a collider variable, i.e., collider-stratification bias (CSB). The lessons learned have critical implications for the interpretation of machine learning (ML) methods, including decision trees and random forests (RFs), that implicitly condition on input variables. RFs are a popular approach for identifying important "predictors" from large data through variable importance, defined by the average decrease in prediction accuracy. While CSB has become a recognized concern when estimating exposure-outcome effects, knowledge of its impact on ML's variable importance measures (VIMs) is limited. Applying the causal inference framework, we investigated the accuracy of RFs' VIMs in data mechanisms prone to CSB.
Methods: A Monte Carlo simulation study was conducted, with binary outcome and collider variables generated from logistic models. Two exposure variables stochastically determined the outcome and a collider variable, independent of the outcome. VIMs from RFs were compared to the known causal relevance of the input variables on the outcome.
Results: While the variable importance of true exposure variables was not systematically affected by CSB, the validity of VIMs can be affected, leading to erroneous selection of collider variables, which are causally independent of the outcome, as outcome predictors.
Conclusions: In the presence of CSB, VIMs are not valid measures of the causal relevance of variables and may mislead the selection of truly important factors that affect the outcome.
Key messages: ML must consider causal data-generating mechanisms; otherwise, it may lead to erroneous assessment of variable importance regarding outcome prediction.


2019 ◽  
Author(s):  
Youmi Suk ◽  
Hyunseung Kang ◽  
Jee-Seon Kim

There is growing interest in using machine learning (ML) methods for causal inference due to their (nearly) automatic and flexible ability to model key quantities such as the propensity score or the outcome model. Unfortunately, most ML methods for causal inference have been studied under single-level settings where all individuals are independent of each other, and there is little work on using these methods with clustered or nested data, a common setting in education studies. This paper investigates using one particular ML method based on random forests, known as Causal Forests, to estimate treatment effects in multilevel observational data. We conduct simulation studies under different types of multilevel data, including two-level, three-level, and cross-classified data. Our simulation studies show that when the ML method is supplemented with estimated propensity scores from multilevel models that account for clustered/hierarchical structure, the modified ML method outperforms pre-existing methods in a wide variety of settings. We conclude by estimating the effect of private math lessons in the Trends in International Mathematics and Science Study data, a large-scale educational assessment where students are nested within schools.


2019 ◽  
Vol 42 ◽  
Author(s):  
Roberto A. Gulli

Abstract: The long-enduring coding metaphor is deemed problematic because it imbues correlational evidence with causal power. In neuroscience, most research is correlational or conditionally correlational; this research, in aggregate, informs causal inference. Rather than prescribing semantics used in correlational studies, it would be useful for neuroscientists to focus on a constructive syntax to guide principled causal inference.


2013 ◽  
Author(s):  
John F. Magnotti ◽  
Wei Ji Ma ◽  
Michael S. Beauchamp

2018 ◽  
Vol 10 (1) ◽  
pp. 219-234
Author(s):  
John H. Hitchcock ◽  
Anthony J. Onwuegbuzie ◽  
Shannon David ◽  
Anne-Maree Ruddy ◽  
...  
