Tuning Random Forests for Causal Inference Under Cluster-Level Unmeasured Confounding
Recently, there has been growing interest in using machine learning (ML) methods for causal inference due to their automatic and flexible abilities to model the propensity score and the outcome model. However, almost all the ML methods for causal inference have been studied under the assumption of no unmeasured confounding and there is little work on handling omitted/unmeasured variable bias. This paper focuses on an ML method based on random forests known as Causal Forests and presents five simple modifications for tuning Causal Forests so that they are robust to cluster-level unmeasured confounding. Our simulation study finds that adjusting the algorithm with the propensity score from fixed effects logistic regression and using demeaned variables make the estimates more robust to cluster-level unmeasured confounding. In particular, using demeaned variables is useful when we are not sure of the functional form of the propensity scores. We conclude by demonstrating our proposals in a real data study concerning the effect of taking an eighth-grade algebra course on math achievement scores from the Early Childhood Longitudinal Study.