High-dimensional generalized propensity score with application to omics data

Author(s):  
Qian Gao ◽  
Yu Zhang ◽  
Jie Liang ◽  
Hongwei Sun ◽  
Tong Wang

Abstract Propensity score (PS) methods are popular for estimating causal effects in non-randomized studies. Drawing causal conclusions relies on the unconfoundedness assumption. This assumption is untestable and is considered more plausible if a large number of pre-treatment covariates are included in the analysis. However, previous studies have shown that including unnecessary covariates in PS models can lead to bias and efficiency loss. With the ever-increasing amounts of available data, such as omics data, there is often little prior knowledge of the exact set of important covariates. Therefore, variable selection for causal inference in high-dimensional settings has received considerable attention in recent years. However, recent studies have focused mainly on binary treatments. In this study, we considered continuous treatments and proposed the generalized outcome-adaptive LASSO (GOAL) to select covariates that yield unbiased and statistically efficient estimates. Simulation studies showed that when the outcome model was linear, the GOAL selected almost all true confounders and predictors of outcome and excluded other covariates. The accuracy and precision of the estimates were close to ideal. Furthermore, the GOAL is robust to model misspecification. We applied the GOAL to seven DNA methylation datasets from the Gene Expression Omnibus database, which covered four brain regions, to estimate the causal effects of epigenetic aging acceleration on the incidence of Alzheimer’s disease.

2019 ◽  
Vol 36 (6) ◽  
pp. 1785-1794
Author(s):  
Jun Li ◽  
Qing Lu ◽  
Yalu Wen

Abstract Motivation: The use of human genome discoveries and other established factors to build an accurate risk prediction model is an essential step toward precision medicine. While multi-layer high-dimensional omics data provide unprecedented data resources for prediction studies, their corresponding analytical methods are much less developed. Results: We present a multi-kernel penalized linear mixed model with adaptive lasso (MKpLMM), a predictive modeling framework for multi-omics data analysis that extends the standard linear mixed models widely used in genomic risk prediction. MKpLMM can capture not only the predictive effects from each layer of omics data but also their interactions by using multiple kernel functions. It adopts a data-driven approach to select predictive regions as well as predictive layers of omics data, and achieves robust selection performance. Through extensive simulation studies, the analyses of PET-imaging outcomes from the Alzheimer’s Disease Neuroimaging Initiative study, and the analyses of 64 drug responses, we demonstrate that MKpLMM consistently outperforms competing methods in phenotype prediction. Availability and implementation: The R-package is available at https://github.com/YaluWen/OmicPred. Supplementary information: Supplementary data are available at Bioinformatics online.


2016 ◽  
Vol 5 (1) ◽  
pp. 1-18 ◽  
Author(s):  
Laura Balzer ◽  
Jennifer Ahern ◽  
Sandro Galea ◽  
Mark van der Laan

Abstract Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect or association of an exposure on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional mean of the outcome, given the exposure and measured confounders. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides stability and power to estimate the exposure effect. In finite sample simulations, the proposed estimator performed as well as, if not better than, alternative estimators, including a propensity score matching estimator, inverse probability of treatment weighted (IPTW) estimator, augmented-IPTW and the standard TMLE algorithm. The new estimator yielded consistent estimates if either the conditional mean outcome or the propensity score was consistently estimated. As a substitution estimator, TMLE guaranteed the point estimates were within the parameter range. We applied the estimator to investigate the association between permissive neighborhood drunkenness norms and alcohol use disorder. Our results highlight the potential for double robust, semiparametric efficient estimation with rare events and high dimensional covariates.
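A minimal sketch of how such bounding can be enforced, assuming the common approach of rescaling the conditional mean to the unit interval and updating it on the logit scale. The abstract does not spell out the authors' construction, so the function names, the supplied `epsilon`, and the `clever_covariate` value are illustrative assumptions rather than the published algorithm.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit; always returns a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def bounded_update(q_initial, epsilon, clever_covariate, lower, upper):
    """Shift an initial prediction while keeping it inside (lower, upper).

    Assumptions (not from the source): q_initial lies strictly inside the
    bounds, and epsilon is supplied by the caller instead of being estimated
    in a targeting step.
    """
    # Rescale the prediction from (lower, upper) to the unit interval.
    q_star = (q_initial - lower) / (upper - lower)
    # Update on the logit scale, so the result cannot leave (0, 1).
    shifted = expit(logit(q_star) + epsilon * clever_covariate)
    # Map back to the original scale; the bounds are respected by construction.
    return lower + shifted * (upper - lower)


# Hypothetical rare-outcome bounds: conditional risk assumed to be below 5%.
updated = bounded_update(0.02, epsilon=5.0, clever_covariate=3.0,
                         lower=0.0, upper=0.05)
print(0.0 < updated < 0.05)
```

Even a large shift on the logit scale leaves the prediction inside the assumed bounds, which is the stability property the abstract attributes to bounding.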


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ryosuke Matsukane ◽  
Hiroyuki Watanabe ◽  
Kojiro Hata ◽  
Kimitaka Suetsugu ◽  
Toshikazu Tsuji ◽  
...  

Abstract The liver is an essential organ for regulating innate and acquired immunity. We hypothesized that the pre-treatment hepatic function affects the clinical outcome of immune checkpoint inhibitors (ICIs) in non-small cell lung cancer (NSCLC). We analyzed 140 patients with NSCLC who received ICIs. We investigated the association between pre-treatment liver function, assessed using the albumin–bilirubin (ALBI) grade, and clinical outcomes in univariate, multivariate, and propensity score matching analyses. Patients were divided into four grades according to pre-treatment liver function. Eighty-eight patients had good hepatic reserve (ALBI grade 1 or 2a), whereas 52 patients had poor hepatic reserve (ALBI grade 2b or 3). In the univariate Kaplan–Meier analysis, the ALBI grade 1 or 2a group had significantly prolonged progression-free survival (PFS, 5.3 vs. 2.5 months, p = 0.0019) and overall survival (OS, 19.6 vs. 6.2 months, p = 0.0002). These results were consistent, regardless of whether the analysis was performed in patients with a performance status of 0 or 1 at pre-treatment (N = 124) or in those selected using propensity score matching (N = 76). In the multivariate analysis, pre-treatment ALBI grade was an independent prognostic factor for both PFS (hazard ratio [HR] 0.57, 95% confidence interval [95% CI] 0.38–0.86, p = 0.007) and OS (HR 0.45, 95% CI 0.29–0.72, p = 0.001). Our results suggest that pre-treatment hepatic function assessed by ALBI grade could be an essential biomarker for predicting the efficacy of treatment with ICIs in NSCLC.


2021 ◽  
Vol 83 ◽  
pp. 56-62
Author(s):  
Beth Ann Griffin ◽  
Marika Suttorp Booth ◽  
Monica Busse ◽  
Edward J. Wild ◽  
Claude Setodji ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Van Hoan Do ◽  
Stefan Canzar

Abstract Emerging single-cell technologies profile multiple types of molecules within individual cells. A fundamental step in the analysis of the produced high-dimensional data is their visualization using dimensionality reduction techniques such as t-SNE and UMAP. We introduce j-SNE and j-UMAP as their natural generalizations to the joint visualization of multimodal omics data. Our approach automatically learns the relative contribution of each modality to a concise representation of cellular identity that promotes discriminative features but suppresses noise. On eight datasets, j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Catalina Alvarado-Rojas ◽  
Michel Le Van Quyen

Little is known about the long-term dynamics of widely interacting cortical and subcortical networks during the wake-sleep cycle. Using large-scale intracranial recordings of epileptic patients during seizure-free periods, we investigated local- and long-range synchronization between multiple brain regions over several days. For such high-dimensional data, summary information is required for understanding and modelling the underlying dynamics. Here, we suggest that a compact yet useful representation is given by a state space based on the first principal components. Using this representation, we report, with a remarkable similarity across the patients with different locations of electrode placement, that the seemingly complex patterns of brain synchrony during the wake-sleep cycle can be represented by a small number of characteristic dynamic modes. In this space, transitions between behavioral states occur through specific trajectories from one mode to another. These findings suggest that, at a coarse level of temporal resolution, the different brain states are correlated with several dominant synchrony patterns which are successively activated across wake-sleep states.


2021 ◽  
pp. 096228022199750
Author(s):  
Zhaoxin Ye ◽  
Yeying Zhu ◽  
Donna L Coffman

Causal mediation effect estimates can be obtained from marginal structural models using inverse probability weighting with appropriate weights. To compute the weights, treatment and mediator propensity score models need to be fitted first. If the covariates are high-dimensional, parsimonious propensity score models can be developed by regularization methods including LASSO and its variants. Furthermore, in a mediation setup, more efficient direct or indirect effect estimators can be obtained by using outcome-adaptive LASSO, which incorporates the outcome information, to select variables for propensity score models. A simulation study is conducted to assess how different regularization methods can affect the performance of estimated natural direct and indirect effect odds ratios. Our simulation results show that regularizing propensity score models by outcome-adaptive LASSO can improve the efficiency of the natural effect estimators and that, by optimizing balance in the covariates, bias can be reduced in most cases. The regularization methods are then applied to the MIMIC-III database, an ICU database developed by MIT.
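As an illustration of how outcome information can enter the penalty, the sketch below computes adaptive penalty weights from outcome-model coefficients, assuming the usual adaptive-LASSO form w_j = |b_j|^(-gamma). The coefficient values, `gamma`, and the small `eps` guard are hypothetical choices, not taken from the study.

```python
def outcome_adaptive_weights(outcome_coefs, gamma=2.0, eps=1e-8):
    """Penalty weights for a propensity score model, one per covariate.

    Assumption (not from the source): weights take the adaptive-LASSO form
    |b_j| ** (-gamma), where b_j is the covariate's coefficient in a
    separately fitted outcome model. eps only guards against division by
    zero when a coefficient is exactly 0.
    """
    return [(abs(b) + eps) ** (-gamma) for b in outcome_coefs]


# Hypothetical outcome-model coefficients for three covariates:
# a strong outcome predictor, a weak one, and one unrelated to the outcome.
weights = outcome_adaptive_weights([2.0, 0.5, 0.0])

# Covariates that barely predict the outcome get the largest penalty, so a
# weighted L1 penalty tends to drop them from the propensity score model.
print(weights[0] < weights[1] < weights[2])
```

These weights would then multiply the per-coefficient L1 penalty when the treatment or mediator propensity score model is fitted.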

