Data-driven Covariate Selection for Confounding Adjustment by Focusing on the Stability of the Effect Estimator

2021
Author(s): Wen Wei Loh, Dongning Ren

Valid inference of cause-and-effect relations in observational studies necessitates adjusting for common causes of the focal predictor (i.e., treatment) and the outcome. When such common causes, henceforth termed confounders, remain unadjusted for, they generate spurious correlations that lead to biased causal effect estimates. But routine adjustment for all available covariates, when only a subset are truly confounders, is known to yield potentially inefficient and unstable estimators. In this article, we introduce a data-driven confounder selection strategy that focuses on stable estimation of the treatment effect. The approach exploits the causal knowledge that after adjusting for confounders to eliminate all confounding biases, adding any remaining non-confounding covariates associated with only treatment or outcome, but not both, should not systematically change the effect estimator. The strategy proceeds in two steps. First, we prioritize covariates for adjustment by probing how strongly each covariate is associated with treatment and outcome. Next, we gauge the stability of the effect estimator by evaluating its trajectory adjusting for different covariate subsets. The smallest subset that yields a stable effect estimate is then selected. Thus, the strategy offers direct insight into the (in)sensitivity of the effect estimator to the chosen covariates for adjustment. The ability to correctly select confounders and yield valid causal inference following data-driven covariate selection is evaluated empirically using extensive simulation studies. Furthermore, we compare the proposed method empirically with routine variable selection methods. Finally, we demonstrate the procedure using two publicly available real-world datasets.
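
A minimal sketch of the two-step procedure in Python, assuming a continuous outcome y, a binary treatment t and candidate covariates in a matrix X; the priority score and the simple forward path below are illustrative choices, not the authors' exact implementation:

```python
import numpy as np
import statsmodels.api as sm

def treatment_effect(y, t, X_cols):
    """OLS coefficient of the treatment after adjusting for the given covariate columns."""
    design = np.column_stack([t] + X_cols) if X_cols else t
    design = sm.add_constant(design)
    return sm.OLS(y, design).fit().params[1]

def stability_path(y, t, X):
    """Step 1: rank covariates by how strongly they relate to treatment and outcome.
    Step 2: trace the treatment-effect estimate as covariates are added in that order;
    the smallest subset after which the estimate stops changing systematically is selected."""
    p = X.shape[1]
    assoc_t = np.array([abs(np.corrcoef(X[:, j], t)[0, 1]) for j in range(p)])
    assoc_y = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
    order = np.argsort(assoc_t * assoc_y)[::-1]   # crude priority score (illustrative)
    path = [treatment_effect(y, t, [X[:, j] for j in order[:k]]) for k in range(p + 1)]
    return order, np.array(path)
```

Inspecting the returned path (unadjusted estimate first, then estimates after adding one covariate at a time) makes the (in)sensitivity of the effect estimator to the adjustment set directly visible.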


2021, Vol 11 (4), pp. 1829
Author(s): Davide Grande, Catherine A. Harris, Giles Thomas, Enrico Anderlini

Recurrent Neural Networks (RNNs) are increasingly being used for model identification, forecasting and control. When identifying physical models for which the mathematical structure of the system is unknown, Nonlinear AutoRegressive models with eXogenous inputs (NARX) or Nonlinear AutoRegressive Moving-Average models with eXogenous inputs (NARMAX) are typically used. In the context of data-driven control, machine learning algorithms have been shown to achieve performance comparable to advanced control techniques, but they lack the guarantees of traditional stability theory. This paper illustrates a method to prove a posteriori the stability of a generic neural network and shows its application to a state-of-the-art RNN architecture. The presented method relies on identifying the poles associated with the network identified from the input/output data. Providing a framework that guarantees the stability of any neural network architecture, combined with generalisability and applicability to different fields, can significantly broaden the use of such networks in dynamic systems modelling and control.
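
For intuition, a minimal sketch of the pole-based idea for a plain (vanilla) RNN, assuming the trained weights W_h, W_x and bias b are available as NumPy arrays; the linearisation around an equilibrium below is a generic illustration, not the authors' exact procedure for their state-of-the-art architecture:

```python
import numpy as np

def rnn_poles(W_h, W_x, b, x_eq, n_settle=500):
    """Linearise h[t+1] = tanh(W_h h[t] + W_x x[t] + b) around an equilibrium
    reached under a constant input x_eq, and return the local poles."""
    h = np.zeros(W_h.shape[0])
    for _ in range(n_settle):                      # iterate towards a fixed point
        h = np.tanh(W_h @ h + W_x @ x_eq + b)
    pre_activation = W_h @ h + W_x @ x_eq + b
    jacobian = np.diag(1.0 - np.tanh(pre_activation) ** 2) @ W_h
    return np.linalg.eigvals(jacobian)

def is_locally_stable(poles, tol=1e-9):
    """Local asymptotic stability requires every pole strictly inside the unit circle."""
    return bool(np.all(np.abs(poles) < 1.0 - tol))
```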


Author(s): Fernando Pires Hartwig, Kate Tilling, George Davey Smith, Deborah A Lawlor, Maria Carolina Borges

Background: Two-sample Mendelian randomization (MR) allows the use of freely accessible summary association results from genome-wide association studies (GWAS) to estimate causal effects of modifiable exposures on outcomes. Some GWAS adjust for heritable covariables in an attempt to estimate direct effects of genetic variants on the trait of interest. One, both or neither of the exposure GWAS and outcome GWAS may have been adjusted for covariables.
Methods: We performed a simulation study comprising different scenarios that could motivate covariable adjustment in a GWAS and analysed real data to assess the influence of using covariable-adjusted summary association results in two-sample MR.
Results: In the absence of residual confounding between exposure and covariable, between exposure and outcome, and between covariable and outcome, using covariable-adjusted summary associations for two-sample MR eliminated bias due to horizontal pleiotropy. However, covariable adjustment led to bias in the presence of residual confounding (especially between the covariable and the outcome), even in the absence of horizontal pleiotropy (when the genetic variants would be valid instruments without covariable adjustment). In an analysis using real data from the Genetic Investigation of ANthropometric Traits (GIANT) consortium and UK Biobank, the causal effect estimate of waist circumference on blood pressure changed direction upon adjustment of waist circumference for body mass index.
Conclusions: Our findings indicate that using covariable-adjusted summary associations in MR should generally be avoided. When that is not possible, careful consideration of the causal relationships underlying the data (including potentially unmeasured confounders) is required to direct sensitivity analyses and interpret results with appropriate caution.
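
For context, such per-variant summary statistics are typically combined with the standard inverse-variance weighted (IVW) two-sample MR estimator, which can be sketched as follows (a generic fixed-effect IVW illustration, not the simulation code used in the paper):

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect from per-variant
    variant-exposure and variant-outcome associations: ratio estimates
    weighted by the inverse variance of the outcome associations."""
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    w = 1.0 / np.asarray(se_outcome, dtype=float) ** 2
    estimate = np.sum(w * bx * by) / np.sum(w * bx ** 2)
    se = np.sqrt(1.0 / np.sum(w * bx ** 2))
    return estimate, se
```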


Author(s): Xiaochun Li, Changyu Shen

Propensity score–based methods or multiple regressions of the outcome are often used for confounding adjustment in the analysis of observational studies. In either approach, a model is needed: a model describing the relationship between the treatment assignment and the covariates in the propensity score–based method, or a model relating the outcome to the covariates in the multiple regression. The two models are usually unknown to the investigators and must be estimated. Correct model specification, therefore, is essential for the validity of the final causal estimate. We describe in this article a doubly robust estimator that combines both models to offer analysts two chances of obtaining a valid causal estimate, and we demonstrate its use through a data set from the Lindner Center Study.
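
A minimal sketch of one common doubly robust construction, the augmented inverse-probability-weighted (AIPW) estimator of the average treatment effect, using off-the-shelf scikit-learn working models (an illustration of the general idea; the article's estimator and the Lindner analysis may differ in detail):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ate(y, t, X):
    """Doubly robust ATE for a 0/1 treatment vector t: combines a propensity-score
    model with an outcome-regression model; the estimate is consistent if either
    working model is correctly specified."""
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    m1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)   # E[Y | X, T=1]
    m0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)   # E[Y | X, T=0]
    mu1 = m1 + t * (y - m1) / ps
    mu0 = m0 + (1 - t) * (y - m0) / (1 - ps)
    return np.mean(mu1 - mu0)
```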


2020, Vol 10 (3), pp. 1062
Author(s): Tarek Berghout, Leïla-Hayet Mouss, Ouahab Kadri, Lotfi Saïdi, Mohamed Benbouzid

Efficient data investigation for fast and accurate remaining useful life prediction of aircraft engines is a very important task for maintenance operations. In this context, the key issue is how to conduct an appropriate investigation that extracts important information from data-driven sequences in a high-dimensional space and thereby guarantees a reliable conclusion. In this paper, a new data-driven learning scheme based on an online sequential extreme learning machine algorithm is proposed for remaining useful life prediction. First, a new feature mapping technique based on stacked autoencoders is proposed to enhance feature representations through accurate reconstruction. In addition, to address dynamic programming based on environmental feedback, a new dynamic forgetting function based on the temporal difference of recursive learning is introduced to enhance the ability to track newly arriving data. Moreover, a new update selection strategy is developed to discard unwanted data sequences and to ensure the convergence of the training model parameters to their appropriate values. The proposed approach is validated on the C-MAPSS dataset, where experimental results confirm that it yields satisfactory accuracy and efficiency of the prediction model compared with other existing methods.
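
The online sequential extreme learning machine backbone can be sketched as follows; this simplified illustration uses a constant forgetting factor in place of the paper's dynamic forgetting function, and the class and parameter names are ours:

```python
import numpy as np

class OSELM:
    """Single-hidden-layer ELM whose output weights are updated by recursive
    least squares with an exponential forgetting factor."""
    def __init__(self, n_in, n_hidden, n_out, forgetting=0.99, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_hidden))   # fixed random input weights
        self.b = rng.standard_normal(n_hidden)
        self.beta = np.zeros((n_hidden, n_out))          # output weights (trained)
        self.P = np.eye(n_hidden) * 1e3                  # inverse covariance (large init)
        self.lam = forgetting

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def partial_fit(self, X, Y):
        """Recursive update of the output weights for a new data chunk (X, Y)."""
        H = self._hidden(X)
        K = np.linalg.solve(self.lam * np.eye(len(X)) + H @ self.P @ H.T, H @ self.P)
        self.P = (self.P - self.P @ H.T @ K) / self.lam
        self.beta = self.beta + self.P @ H.T @ (Y - H @ self.beta)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```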


Author(s): Adam Petrie, Xiaopeng Zhao

The stability of a dynamical system can be indicated by eigenvalues of its underlying mathematical model. However, eigenvalue analysis of a complicated system (e.g. the heart) may be extremely difficult because full models may be intractable or unavailable. We develop data-driven statistical techniques, which are independent of any underlying dynamical model, that use principal components and maximum-likelihood methods to estimate the dominant eigenvalues and their standard errors from the time series of one or a few measurable quantities, e.g. transmembrane voltages in cardiac experiments. The techniques are applied to predicting cardiac alternans that is characterized by an eigenvalue approaching −1. Cardiac alternans signals a vulnerability to ventricular fibrillation, the leading cause of death in the USA.
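
The core idea, estimating a dominant eigenvalue directly from measured time series, can be sketched with a simple lag-one regression (an illustrative least-squares variant; the paper develops principal-component and maximum-likelihood estimators together with standard errors):

```python
import numpy as np

def dominant_eigenvalue(series):
    """Estimate the dominant eigenvalue of beat-to-beat dynamics by regressing
    deviations from the mean at beat n+1 on those at beat n.
    An estimate approaching -1 signals period doubling (alternans)."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    return np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

def dominant_eigenvalue_pca(multichannel):
    """Multichannel version: project onto the first principal component,
    then apply the same lag-one regression."""
    X = np.asarray(multichannel, dtype=float)          # shape (n_beats, n_channels)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return dominant_eigenvalue(X @ Vt[0])
```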


2017, Vol 2645 (1), pp. 157-167
Author(s): Jishun Ou, Jingxin Xia, Yao-Jan Wu, Wenming Rao

Urban traffic flow forecasting is essential to proactive traffic control and management. Most existing forecasting methods depend on proper and reliable input features, for example, weather conditions and spatiotemporal lagged variables of traffic flow. However, feature selection is often done manually, without comprehensive evaluation, and leads to inaccurate results. To address that challenge, this paper presents an approach combining the bias-corrected random forests algorithm with a data-driven feature selection strategy for short-term urban traffic flow forecasting. First, several input features were extracted from traffic flow time series data. Then the importance of these features was quantified with the permutation importance measure. Next, a data-driven feature selection strategy was introduced to identify the most important features. Finally, the forecasting model was built on the bias-corrected random forests algorithm and the selected features. The proposed approach was validated with data collected from three types of urban roads (expressway, major arterial, and minor arterial) in Kunshan City, China, and was compared with 10 existing approaches to verify its effectiveness. The results of the validation and comparison show that, even without further model tuning, the proposed approach achieves the lowest average mean absolute error and root mean square error across six stations and the second-best average performance in mean absolute percentage error. Meanwhile, training efficiency is improved relative to the original random forests method owing to the use of the feature selection strategy.
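
A minimal sketch of the feature-selection step, using a standard random forest and scikit-learn's permutation importance (the plain RandomForestRegressor stands in for the bias-corrected variant, and the keep-fraction threshold is an assumption rather than the authors' selection rule):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

def select_features_and_fit(X, y, keep_fraction=0.5, random_state=0):
    """Quantify feature importance by permutation on held-out data, keep the
    top fraction of features, and refit the forecasting model on them."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25,
                                                random_state=random_state)
    rf = RandomForestRegressor(n_estimators=500, random_state=random_state)
    rf.fit(X_tr, y_tr)
    imp = permutation_importance(rf, X_val, y_val, n_repeats=10,
                                 random_state=random_state).importances_mean
    n_keep = max(1, int(keep_fraction * X.shape[1]))
    selected = np.argsort(imp)[::-1][:n_keep]
    final = RandomForestRegressor(n_estimators=500, random_state=random_state)
    final.fit(X[:, selected], y)
    return final, selected
```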


2021, pp. 00818-2020
Author(s): Sarah L. Finnegan, Kyle T.S. Pattinson, Josefin Sundh, Magnus Sköld, Christer Janson, ...

Introduction: Chronic breathlessness occurs across many different conditions, often independently of disease severity. Yet, despite being strongly linked to adverse outcomes, the consideration of chronic breathlessness as a stand-alone therapeutic target remains limited. Here we use data-driven techniques to identify and confirm the stability of underlying features (factors) driving breathlessness across different cardiorespiratory diseases.
Methods: Questionnaire data from 182 participants with main diagnoses of asthma (21.4%), COPD (24.7%), heart failure (19.2%), idiopathic pulmonary fibrosis (18.7%), other interstitial lung disease (5.5%), and "other diagnoses" (8.8%) were entered into an exploratory factor analysis (EFA). Participants were stratified based on their EFA factor scores. We then examined model stability using six-month follow-up data and established the most compact set of measures describing the breathlessness experience.
Results: In this dataset, we identified four stable factors that underlie the experience of breathlessness. These factors were assigned the following descriptive labels: 1) body burden, 2) affect/mood, 3) breathing burden and 4) anger/frustration. Stratifying patients by their scores across the four factors revealed two groups corresponding to high and low burden. These two groups were not related to the primary disease diagnosis and remained stable after six months.
Discussion: In this work we identified and confirmed the stability of underlying features of breathlessness. Previous work in this domain has largely been limited to single-diagnosis patient groups without subsequent re-testing of model stability. This work provides further evidence supporting disease-independent approaches to assessing breathlessness.
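
A minimal sketch of the factor-extraction and stratification steps using scikit-learn; the four-factor varimax solution and the k-means stratification are illustrative assumptions, not the authors' exact EFA and grouping procedure:

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

def breathlessness_factors(questionnaire, n_factors=4, n_groups=2, seed=0):
    """Extract latent factors from questionnaire items, score each participant
    on them, and stratify participants by their factor-score profiles."""
    items = StandardScaler().fit_transform(questionnaire)     # (participants, items)
    efa = FactorAnalysis(n_components=n_factors, rotation="varimax", random_state=seed)
    scores = efa.fit_transform(items)                         # per-participant factor scores
    loadings = efa.components_.T                              # item-by-factor loadings
    groups = KMeans(n_clusters=n_groups, random_state=seed, n_init=10).fit_predict(scores)
    return loadings, scores, groups
```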


Micromachines, 2021, Vol 12 (11), pp. 1390
Author(s): Khalid A. Alattas, Ardashir Mohammadzadeh, Saleh Mobayen, Ayman A. Aly, Bassem F. Felemban, ...

In this study, a novel data-driven control scheme is presented for MEMS gyroscopes (MEMS-Gs). Uncertainties are tackled by the suggested type-3 fuzzy system with non-singleton fuzzification (NT3FS). Besides the dynamic uncertainties, the suggested NT3FS can also handle input measurement errors. The rules of the NT3FS are tuned online to better compensate for disturbances. A data-driven scheme is designed from the input-output data set, and a new set of linear matrix inequalities (LMIs) is presented to ensure stability. Several simulations and comparisons demonstrate the superiority of the introduced control scheme.
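
The stability certificate rests on a Lyapunov-type linear matrix inequality. A minimal sketch for a linear model identified from input-output data, using CVXPY (a generic illustration; the LMI set for the fuzzy closed loop in the paper is more elaborate):

```python
import numpy as np
import cvxpy as cp

def identify_and_certify(states, inputs, eps=1e-6):
    """Fit x[k+1] ~ A x[k] + B u[k] by least squares, then search for P > 0
    with A' P A - P < 0, which certifies asymptotic stability of the identified model."""
    X0, X1, U = states[:-1].T, states[1:].T, inputs[:-1].T   # columns are samples
    AB = X1 @ np.linalg.pinv(np.vstack([X0, U]))
    n = X0.shape[0]
    A = AB[:, :n]
    P = cp.Variable((n, n), symmetric=True)
    constraints = [P >> eps * np.eye(n),
                   A.T @ P @ A - P << -eps * np.eye(n)]
    cp.Problem(cp.Minimize(0), constraints).solve(solver=cp.SCS)
    return A, (P.value if P.value is not None else None)   # None if no certificate found
```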


2020
Author(s): Christine A. Botosan, Adrienna Huffman, Mary Harris Stanford

This paper offers an in-depth, data-driven overview of the history and status, as of 2017, of segment reporting by public entities trading in U.S. capital markets. Our analysis focuses on the perceived issues identified in the Financial Accounting Standards Board (FASB) 2016 Invitation to Comment on FASB's Agenda: the extent of disaggregation into reportable segments, the stability of segmentation over time, the line items disclosed, and the reconciliation of segment to consolidated totals. We document the trends in and status of segment reporting as of 2017 as another round of efforts to improve segment reporting proceeds. The paper concludes with a discussion of several unanswered questions suggested by the data.
Keywords: Segment disclosures, SFAS 131, SFAS 14, ASC 280.

