Random Forests Approach for Causal Inference with Clustered Observational Data

Author(s):  
Youmi Suk ◽  
Hyunseung Kang ◽  
Jee-Seon Kim
2019 ◽  

There is growing interest in using machine learning (ML) methods for causal inference because of their (nearly) automatic and flexible ability to model key quantities such as the propensity score or the outcome model. Unfortunately, most ML methods for causal inference have been studied in single-level settings where all individuals are independent of each other, and there is little work on using these methods with clustered or nested data, a common setting in education studies. This paper investigates using one particular ML method based on random forests, known as Causal Forests, to estimate treatment effects in multilevel observational data. We conduct simulation studies under different types of multilevel data, including two-level, three-level, and cross-classified data. Our simulation studies show that when the ML method is supplemented with estimated propensity scores from multilevel models that account for clustered/hierarchical structure, the modified ML method outperforms pre-existing methods in a wide variety of settings. We conclude by estimating the effect of private math lessons in the Trends in International Mathematics and Science Study data, a large-scale educational assessment where students are nested within schools.
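The role of cluster-aware propensity scores can be illustrated with a toy calculation. The sketch below is not the Causal Forests procedure studied in the paper: it swaps in a plain normalized inverse-probability-weighted (IPW) estimator and uses the within-cluster treated fraction as a crude stand-in for a multilevel propensity model. The function names and the two-school data are hypothetical.

```python
from collections import defaultdict


def cluster_propensity(clusters, treated):
    """Treated fraction within each cluster, returned per unit.

    Assumption: a crude stand-in for a multilevel propensity model.
    """
    n_units = defaultdict(int)
    n_treated = defaultdict(int)
    for c, t in zip(clusters, treated):
        n_units[c] += 1
        n_treated[c] += t
    scores = [n_treated[c] / n_units[c] for c in clusters]
    if any(p <= 0.0 or p >= 1.0 for p in scores):
        # Positivity fails if a cluster is all treated or all untreated.
        raise ValueError("every cluster needs treated and untreated units")
    return scores


def ipw_ate(outcomes, treated, propensity):
    """Normalized (Hajek) IPW estimate of the average treatment effect."""
    rows = list(zip(outcomes, treated, propensity))
    w1 = sum(t / p for _, t, p in rows)
    w0 = sum((1 - t) / (1 - p) for _, t, p in rows)
    mu1 = sum(t * y / p for y, t, p in rows) / w1
    mu0 = sum((1 - t) * y / (1 - p) for y, t, p in rows) / w0
    return mu1 - mu0


def naive_difference(outcomes, treated):
    """Unadjusted difference in mean outcomes, ignoring clusters."""
    y1 = [y for y, t in zip(outcomes, treated) if t == 1]
    y0 = [y for y, t in zip(outcomes, treated) if t == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


# Hypothetical data: school A has a higher baseline and more treated students.
# Every treated outcome equals the school baseline plus 2.
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
treated = [1, 1, 1, 0, 1, 0, 0, 0]
outcomes = [12, 12, 12, 10, 2, 0, 0, 0]

scores = cluster_propensity(clusters, treated)
print("naive:", naive_difference(outcomes, treated))
print("cluster-aware IPW:", ipw_ate(outcomes, treated, scores))
```

In this constructed example the unadjusted comparison mixes the school baseline into the treatment contrast, while weighting by cluster-level propensity scores recovers the built-in effect of 2.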


JAMIA Open ◽  
2020 ◽  
Author(s):  
Michal Ozery-Flato ◽  
Yaara Goldschmidt ◽  
Oded Shaham ◽  
Sivan Ravid ◽  
Chen Yanover

Abstract
Objective: Observational medical databases, such as electronic health records and insurance claims, track the healthcare trajectory of millions of individuals. These databases provide real-world longitudinal information on large cohorts of patients and their medication prescription history. We present an easy-to-customize framework that systematically analyzes such databases to identify new indications for on-market prescription drugs.
Materials and Methods: Our framework provides an interface for defining study design parameters and extracting patient cohorts, disease-related outcomes, and potential confounders in observational databases. It then applies causal inference methodology to emulate hundreds of randomized controlled trials (RCTs) for prescribed drugs, while adjusting for confounding and selection biases. After correcting for multiple testing, it outputs the estimated effects and their statistical significance in each database.
Results: We demonstrate the utility of the framework in a case study of Parkinson’s disease (PD) and evaluate the effect of 259 drugs on various PD progression measures in two observational medical databases, covering more than 150 million patients. The results of these emulated trials reveal remarkable agreement between the two databases for the most promising candidates.
Discussion: Estimating drug effects from observational data is challenging due to data biases and noise. To tackle this challenge, we integrate causal inference methodology with domain knowledge and compare the estimated effects in two separate databases.
Conclusion: Our framework enables systematic search for drug repurposing candidates by emulating RCTs using observational data. The high level of agreement between separate databases strongly supports the identified effects.
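The abstract does not say which multiple-testing correction the framework applies. As a minimal sketch, assuming a Benjamini-Hochberg false discovery rate adjustment (one common choice when screening hundreds of candidates), the adjusted p-values could be computed as follows. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here for illustration only; the source
    abstract does not name its correction method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four emulated trials.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"p={p:.2f} adjusted={q:.4f}")
```

A candidate would then be flagged when its adjusted value falls below the chosen false discovery rate threshold.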


2021 ◽  
Vol 15 (5) ◽  
pp. 1-46
Author(s):  
Liuyi Yao ◽  
Zhixuan Chu ◽  
Sheng Li ◽  
Yaliang Li ◽  
Jing Gao ◽  
...  

Causal inference has been a critical research topic across many domains, such as statistics, computer science, education, public policy, and economics, for decades. Estimating causal effects from observational data has become an appealing research direction owing to the large amount of available data and the low budget requirement compared with randomized controlled trials. Driven by rapid developments in machine learning, various causal effect estimation methods for observational data have emerged. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether they require all three assumptions of the potential outcome framework or not. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. Plausible applications of these methods are also presented, including applications in advertising, recommendation, medicine, and so on. Moreover, the commonly used benchmark datasets and open-source code are summarized, which helps researchers and practitioners explore, evaluate, and apply the causal inference methods.


2019 ◽  
Vol 188 (9) ◽  
pp. 1682-1685 ◽  
Author(s):  
Hailey R Banack

Authors aiming to estimate causal effects from observational data frequently discuss 3 fundamental identifiability assumptions for causal inference: exchangeability, consistency, and positivity. However, too often, studies fail to acknowledge the importance of measurement bias in causal inference. In the presence of measurement bias, the aforementioned identifiability conditions are not sufficient to estimate a causal effect. The most fundamental requirement for estimating a causal effect is knowing who is truly exposed and unexposed. In this issue of the Journal, Caniglia et al. (Am J Epidemiol. 2019;000(00):000–000) present a thorough discussion of methodological challenges when estimating causal effects in the context of research on distance to obstetrical care. Their article highlights empirical strategies for examining nonexchangeability due to unmeasured confounding and selection bias and potential violations of the consistency assumption. In addition to the important considerations outlined by Caniglia et al., authors interested in estimating causal effects from observational data should also consider implementing quantitative strategies to examine the impact of misclassification. The objective of this commentary is to emphasize that you can’t drive a car with only three wheels, and you also cannot estimate a causal effect in the presence of exposure misclassification bias.
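One simple quantitative strategy of the kind the commentary calls for is a back-calculation of exposure counts under assumed classification accuracy. The sketch below is an illustration, not the commentary's own method: it assumes nondifferential misclassification with a known sensitivity and specificity, inverts the relation observed = Se × A + (1 − Sp) × (N − A) to recover the true exposed count A, and compares odds ratios before and after correction. All counts, accuracy values, and function names are hypothetical.

```python
def correct_exposed_count(observed_exposed, total, sensitivity, specificity):
    """Back-calculate the true number exposed in a group of size `total`.

    Solves observed = Se * A + (1 - Sp) * (total - A) for A.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed_exposed - (1.0 - specificity) * total) / denom


def odds_ratio(exposed_cases, n_cases, exposed_controls, n_controls):
    """Exposure odds ratio comparing cases with controls."""
    case_odds = exposed_cases / (n_cases - exposed_cases)
    control_odds = exposed_controls / (n_controls - exposed_controls)
    return case_odds / control_odds


# Hypothetical observed data: 48 of 100 cases and 34 of 100 controls
# are classified as exposed, with assumed Se = 0.9 and Sp = 0.8.
se, sp = 0.9, 0.8
cases_true = correct_exposed_count(48, 100, se, sp)
controls_true = correct_exposed_count(34, 100, se, sp)

print("observed OR:", odds_ratio(48, 100, 34, 100))
print("corrected OR:", odds_ratio(cases_true, 100, controls_true, 100))
```

In this example the observed odds ratio is pulled toward the null relative to the corrected one, which is the usual direction of bias for nondifferential misclassification of a binary exposure.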


2016 ◽  
Vol 21 (2) ◽  
pp. 192-218 ◽  
Author(s):  
Sandra C. Deshors ◽  
Stefan Th. Gries

In this paper, we explore verb complementation patterns with to and -ing in native English (British and American English) as compared to three Asian Englishes (Hong Kong, Indian, and Singaporean English). Based on data from the International Corpus of English annotated for variables describing the matrix verb and the complement, we run two random forests analyses to determine where the Asian Englishes have developed complementation preferences different from the two native speaker varieties. We find not only a variety of differences between the Asian and the native Englishes, but also that the Asian Englishes are more similar to (i.e. ‘better predicted by’) the American English data. Further, as the first study of its kind to extend the MuPDAR approach from the now frequent regression analyses to random forests analysis, this study adds a potentially useful analytical tool for the often messy and skewed observational data corpus linguists need to deal with.


2016 ◽  
Vol 44 (5) ◽  
pp. 409-415 ◽  
Author(s):  
Stefan Listl ◽  
Hendrik Jürges ◽  
Richard G. Watt

2012 ◽  
Vol 26 (4) ◽  
pp. 372-390 ◽  
Author(s):  
James J. Lee

Personality psychology aims to explain the causes and the consequences of variation in behavioural traits. Because of the observational nature of the pertinent data, this endeavour has provoked many controversies. In recent years, the computer scientist Judea Pearl has used a graphical approach to extend the innovations in causal inference developed by Ronald Fisher and Sewall Wright. Besides shedding much light on the philosophical notion of causality itself, this graphical framework now contains many powerful concepts of relevance to the controversies just mentioned. In this article, some of these concepts are applied to areas of personality research where questions of causation arise, including the analysis of observational data and the genetic sources of individual differences. Copyright © 2012 John Wiley & Sons, Ltd.

