General Transportability – Synthesizing Observations and Experiments from Heterogeneous Domains

The process of transporting and synthesizing experimental findings from heterogeneous data collections to construct causal explanations is arguably one of the most central and challenging problems in modern data science. This problem has been studied in the causal inference literature under the rubric of causal effect identifiability and transportability (Bareinboim and Pearl 2016). In this paper, we investigate a general version of this challenge where the goal is to learn conditional causal effects from an arbitrary combination of datasets collected under different conditions, observational or experimental, and from heterogeneous populations. Specifically, we introduce a unified graphical criterion that characterizes the conditions under which conditional causal effects can be uniquely determined from the disparate data collections. We further develop an efficient, sound, and complete algorithm that outputs an expression for the conditional effect whenever it exists, which synthesizes the available causal knowledge and empirical evidence; if the algorithm is unable to find a formula, then such synthesis is provably impossible, unless further parametric assumptions are made. Finally, we prove that do-calculus (Pearl 1995) is complete for this task, i.e., the inexistence of a do-calculus derivation implies the impossibility of constructing the targeted causal explanation.

Understanding the nature of association between anxiety phenotypes and anorexia nervosa: a triangulation approach

BMC Psychiatry ◽

10.1186/s12888-020-02883-8 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

E. Caitlin Lloyd ◽

Hannah M. Sallis ◽

Bas Verplanken ◽

Anne M. Haase ◽

Marcus R. Munafò

Keyword(s):

Anorexia Nervosa ◽

Longitudinal Study ◽

Anxiety Disorder ◽

Anxiety Disorders ◽

Causal Effect ◽

Causal Effects ◽

Observational Research ◽

Genome Wide Association Studies ◽

Genetic Liability ◽

Depressed Affect

Abstract Background Evidence from observational studies suggests an association between anxiety disorders and anorexia nervosa (AN), but causal inference is complicated by the potential for confounding in these studies. We triangulate evidence across a longitudinal study and a Mendelian randomization (MR) study, to evaluate whether there is support for anxiety disorder phenotypes exerting a causal effect on AN risk. Methods Study One assessed longitudinal associations of childhood worry and anxiety disorders with lifetime AN in the Avon Longitudinal Study of Parents and Children cohort. Study Two used two-sample MR to evaluate: causal effects of worry, and genetic liability to anxiety disorders, on AN risk; causal effects of genetic liability to AN on anxiety outcomes; and the causal influence of worry on anxiety disorder development. The independence of effects of worry, relative to depressed affect, on AN and anxiety disorder outcomes, was explored using multivariable MR. Analyses were completed using summary statistics from recent genome-wide association studies. Results Study One did not support an association between worry and subsequent AN, but there was strong evidence for anxiety disorders predicting increased risk of AN. Study Two outcomes supported worry causally increasing AN risk, but did not support a causal effect of anxiety disorders on AN development, or of AN on anxiety disorders/worry. Findings also indicated that worry causally influences anxiety disorder development. Multivariable analysis estimates suggested the influence of worry on both AN and anxiety disorders was independent of depressed affect. Conclusions Overall our results provide mixed evidence regarding the causal role of anxiety exposures in AN aetiology. 
The inconsistency between outcomes of Studies One and Two may be explained by limitations surrounding worry assessment in Study One, confounding of the anxiety disorder and AN association in observational research, and low power in MR analyses probing causal effects of genetic liability to anxiety disorders. The evidence for worry acting as a causal risk factor for anxiety disorders and AN supports targeting worry for prevention of both outcomes. Further research should clarify how a tendency to worry translates into AN risk, and whether anxiety disorder pathology exerts any causal effect on AN.

MuSA: a graphical user interface for multi-OMICs data integration in radiogenomic studies

Scientific Reports ◽

10.1038/s41598-021-81200-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Mario Zanfardino ◽

Rossana Castaldo ◽

Katia Pane ◽

Ornella Affinito ◽

Marco Aiello ◽

...

Keyword(s):

User Interface ◽

Data Integration ◽

Graphical User Interface ◽

Data Science ◽

Heterogeneous Data ◽

Biological Information ◽

Omics Data ◽

Correlation Clustering ◽

Downstream Analysis ◽

Omics Data Integration

Abstract Analysis of large-scale omics data along with biomedical images has been gaining huge interest for predicting phenotypic conditions towards personalized medicine. Multiple layers of investigation, such as genomics, transcriptomics and proteomics, have led to high dimensionality and heterogeneity of data. Multi-omics data integration can provide a meaningful contribution to early diagnosis and an accurate estimate of prognosis and treatment in cancer. Some multi-layer data structures have been developed to integrate multi-omics biological information, but none of these has been developed and evaluated to include radiomic data. We proposed to use MultiAssayExperiment (MAE) as an integrated data structure to combine multi-omics data, facilitating the exploration of heterogeneous data. We improved the usability of the MAE by developing a Multi-omics Statistical Approaches (MuSA) tool that uses a Shiny graphical user interface and simplifies the management and analysis of radiogenomic datasets. The capabilities of MuSA were shown using public breast cancer datasets from TCGA-TCIA databases. The MuSA architecture is modular and can be divided into Pre-processing and Downstream analysis. The pre-processing section allows data filtering and normalization. The downstream analysis section contains modules for data science such as correlation, clustering (i.e., heatmap) and feature selection methods. The results are dynamically shown in MuSA. The MuSA tool provides an easy-to-use way to create, manage and analyze radiogenomic data. The application is specifically designed to guide non-programmer researchers through different computational steps. Integration analysis is implemented in a modular structure, making MuSA easily extensible open-source software.

The Price Effects of Competition from Parallel Imports and Therapeutic Alternatives: Using Dynamic Models to Estimate the Causal Effect on the Extensive and Intensive Margins

Review of Industrial Organization ◽

10.1007/s11151-021-09834-x ◽

2021 ◽

Author(s):

David Granlund

Keyword(s):

Dynamic Models ◽

Causal Effect ◽

The Other ◽

Causal Effects ◽

Parallel Imports ◽

Extensive And Intensive Margins ◽

Price Effects ◽

Short And Long Term ◽

Therapeutic Alternatives

Abstract This paper studies responses to competition with the use of dynamic models that distinguish between short- and long-term price effects. The dynamic models also allow lagged numbers of competitors to become valid and strong instruments for the current numbers, which enables studying the causal effects using flexible specifications. A first parallel trader is found to decrease prices of exchangeable products by 7% in the long term. On the other hand, prices do not respond to the first competitor that sells therapeutic alternatives; but competition from four or more competitors that sell on-patent therapeutic alternatives decreases prices by about 10% in the long term.

Causal Returns to Education

Jahrbücher für Nationalökonomie und Statistik ◽

10.1515/jbnst-2006-0103 ◽

2006 ◽

Vol 226 (1) ◽

Author(s):

Anton L. Flossmann ◽

Winfried Pohlmeier

Keyword(s):

Educational Attainment ◽

Educational Policy ◽

Empirical Evidence ◽

Returns To Education ◽

Causal Effect ◽

Causal Effects ◽

Continuous Treatment ◽

Treatment Variable ◽

Secondary Schooling ◽

Upper Secondary

Summary This paper surveys the empirical evidence on causal effects of education on earnings for Germany and compares alternative studies in the light of their underlying identifying assumptions. We work out the different assumptions made by various studies, which lead to rather different interpretations of the estimated causal effect. In particular, we are interested in the question of to what extent causal return estimates are informative regarding educational policy advice. Despite the substantial methodological differences, we have to conclude that the empirical findings for Germany are quite robust and do not deviate substantially from each other. This also holds for the few studies which rely on ignorability conditions, regardless of whether they use educational attainment as a continuous treatment variable or as a discrete treatment indicator. Our own estimates based on the matching approach indicate that the selection into upper secondary schooling is suboptimal.

Causal inference via string diagram surgery

Mathematical Structures in Computer Science ◽

10.1017/s096012952100027x ◽

2021 ◽

pp. 1-22

Author(s):

Bart Jacobs ◽

Aleks Kissinger ◽

Fabio Zanasi

Keyword(s):

Causal Inference ◽

Probabilistic Reasoning ◽

Causal Effect ◽

Sufficient Conditions ◽

Causal Effects ◽

Stochastic Matrices ◽

Counterfactual Reasoning ◽

Special Cases ◽

String Diagram ◽

Set Up

Abstract Extracting causal relationships from observed correlations is a growing area in probabilistic reasoning, originating with the seminal work of Pearl and others from the early 1990s. This paper develops a new, categorically oriented view based on a clear distinction between syntax (string diagrams) and semantics (stochastic matrices), connected via interpretations as structure-preserving functors. A key notion in the identification of causal effects is that of an intervention, whereby a variable is forcefully set to a particular value independent of any prior propensities. We represent the effect of such an intervention as an endo-functor which performs ‘string diagram surgery’ within the syntactic category of string diagrams. This diagram surgery in turn yields a new, interventional distribution via the interpretation functor. While in general there is no way to compute interventional distributions purely from observed data, we show that this is possible in certain special cases using a calculational tool called comb disintegration. We demonstrate the use of this technique on two well-known toy examples. In the first, we predict the causal effect of smoking on cancer in the presence of a confounding common cause, and show that this technique provides simple sufficient conditions for computing interventions which apply to a wide variety of situations considered in the causal inference literature. The second is an illustration of counterfactual reasoning where the same interventional techniques are used, but now in a ‘twinned’ set-up, with two versions of the world (one factual and one counterfactual) joined together via exogenous variables that capture the uncertainties at hand.

Novel bounds for causal effects based on sensitivity parameters on the risk difference scale

Journal of Causal Inference ◽

10.1515/jci-2021-0024 ◽

2021 ◽

Vol 9 (1) ◽

pp. 190-210

Author(s):

Arvid Sjölander ◽

Ola Hössjer

Keyword(s):

Risk Ratio ◽

Simulation Study ◽

Observational Studies ◽

Causal Effect ◽

Real Data ◽

Risk Difference ◽

Causal Effects ◽

Ratio Scale ◽

Unmeasured Confounding ◽

Range Of Values

Abstract Unmeasured confounding is an important threat to the validity of observational studies. A common way to deal with unmeasured confounding is to compute bounds for the causal effect of interest, that is, a range of values that is guaranteed to include the true effect, given the observed data. Recently, bounds have been proposed that are based on sensitivity parameters, which quantify the degree of unmeasured confounding on the risk ratio scale. These bounds can be used to compute an E-value, that is, the degree of confounding required to explain away an observed association, on the risk ratio scale. We complement and extend this previous work by deriving analogous bounds, based on sensitivity parameters on the risk difference scale. We show that our bounds can also be used to compute an E-value, on the risk difference scale. We compare our novel bounds with previous bounds through a real data example and a simulation study.
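
As a point of reference for the risk-ratio-scale E-value mentioned in this abstract, a minimal Python sketch follows. It assumes the closed-form expression from the earlier E-value literature (VanderWeele and Ding), E = RR + sqrt(RR × (RR − 1)) for RR ≥ 1, with protective ratios inverted first. It does not implement the risk-difference bounds derived in the paper, and the function name `e_value` is purely illustrative.

```python
import math


def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (assumed VanderWeele-Ding formula).

    Illustrative helper, not the paper's risk-difference method.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective associations (RR < 1) are inverted so the formula applies.
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


if __name__ == "__main__":
    # Hypothetical observed risk ratio of 2.0.
    print(round(e_value(2.0), 3))
```

Under this formula, a null association (RR = 1) gives an E-value of 1, meaning no unmeasured confounding is needed to explain it away.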

Causal Effect Between Total Cholesterol and HDL Cholesterol as Risk Factors for Chronic Kidney Disease: A Mendelian Randomization Study

10.21203/rs.3.rs-52857/v1 ◽

2020 ◽

Author(s):

Liu Miao ◽

Yan Min ◽

Chuan-Meng Zhu ◽

Jian-Hong Chen ◽

Bin Qi ◽

...

Keyword(s):

Total Cholesterol ◽

Serum Lipid ◽

Mendelian Randomization ◽

Causal Effect ◽

HDL Cholesterol ◽

Causal Effects ◽

Lipid Levels ◽

Serum Lipid Levels ◽

Genome Wide ◽

MR Study

Abstract Background/Aims: While observational studies show an association between serum lipid levels and cardiovascular disease (CVD), intervention studies that examine the preventive effects of serum lipid levels on the development of CKD are lacking. Methods: To estimate the role of serum lipid levels in the etiology of CKD, we conducted a two-sample Mendelian randomization (MR) study on serum lipid levels. Single nucleotide polymorphisms (SNPs), which were significantly associated genome-wide with plasma serum lipid levels from the GLGC and CKDGen consortium genome-wide association study (GWAS), including total cholesterol (TC, n = 187365), triglyceride (TG, n = 177861), HDL cholesterol (HDL-C, n = 187167), LDL cholesterol (LDL-C, n = 173082), apolipoprotein A1 (ApoA1, n = 20687), apolipoprotein B (ApoB, n = 20690) and CKD (n = 117165), were used as instrumental variables. None of the lipid-related SNPs was associated with CKD (all P > 0.05). Results: MR analysis genetically predicted the causal effect between TC/HDL-C and CKD. The odds ratio (OR) and 95% confidence interval (CI) of TC within CKD was 0.756 (0.579 to 0.933) (P = 0.002), and HDL-C was 0.85 (0.687 to 1.012) (P = 0.049). No causal effects between TG, LDL-C, ApoA1, ApoB and CKD were observed. Sensitivity analyses confirmed that TC and HDL-C were significantly associated with CKD. Conclusions: The findings from this MR study indicate causal effects between TC, HDL-C and CKD. Decreased TC and elevated HDL-C may reduce the incidence of CKD but need to be further confirmed by using a genetic and environmental approach.
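
To make the two-sample MR idea concrete, here is a minimal Python sketch of a fixed-effect inverse-variance weighted (IVW) estimate, one common way of combining per-SNP summary statistics. The abstract does not say which MR estimator the authors used, so IVW is an assumption here, and the SNP effect sizes below are made-up numbers for illustration only.

```python
from math import exp, sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_exposure: SNP-exposure effects (e.g. on a lipid level)
    beta_outcome:  SNP-outcome effects (e.g. log-odds of disease)
    se_outcome:    standard errors of the SNP-outcome effects
    Returns (estimate, standard_error) on the log-odds scale.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have equal length")
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return num / den, sqrt(1.0 / den)


if __name__ == "__main__":
    # Hypothetical summary statistics for three SNPs (not from the study).
    bx = [0.10, 0.20, 0.30]
    by = [-0.03, -0.05, -0.08]
    se = [0.01, 0.02, 0.01]
    est, se_est = ivw_estimate(bx, by, se)
    # Exponentiate to report an odds ratio with a 95% interval.
    print(exp(est), exp(est - 1.96 * se_est), exp(est + 1.96 * se_est))
```

Each SNP contributes a ratio of its outcome effect to its exposure effect, weighted by how precisely the outcome effect is measured; the result is only as trustworthy as the instrumental-variable assumptions behind the chosen SNPs.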

Education, intelligence and Alzheimer’s disease: Evidence from a multivariable two-sample Mendelian randomization study

10.1101/401042 ◽

2018 ◽

Cited By ~ 13

Author(s):

Emma L Anderson ◽

Laura D Howe ◽

Kaitlin H Wade ◽

Yoav Ben-Shlomo ◽

W. David Hill ◽

...

Keyword(s):

Alzheimer's Disease ◽

Educational Attainment ◽

Mendelian Randomization ◽

Causal Effect ◽

Univariate Analysis ◽

Causal Effects ◽

Training Interventions ◽

Bidirectional Relationship ◽

Years Of Schooling

Abstract Objectives: To examine whether educational attainment and intelligence have causal effects on risk of Alzheimer’s disease (AD), independently of each other. Design: Two-sample univariable and multivariable Mendelian Randomization (MR) to estimate the causal effects of education on intelligence and vice versa, and the total and independent causal effects of both education and intelligence on risk of AD. Participants: 17,008 AD cases and 37,154 controls from the International Genomics of Alzheimer’s Project (IGAP) consortium. Main outcome measure: Odds ratio of AD per standard deviation increase in years of schooling and intelligence. Results: There was strong evidence of a causal, bidirectional relationship between intelligence and educational attainment, with the magnitude of effect being similar in both directions. Similar overall effects were observed for both educational attainment and intelligence on AD risk in the univariable MR analysis; with each SD increase in years of schooling and intelligence, odds of AD were, on average, 37% (95% CI: 23% to 49%) and 35% (95% CI: 25% to 43%) lower, respectively. There was little evidence from the multivariable MR analysis that educational attainment affected AD risk once intelligence was taken into account, but intelligence affected AD risk independently of educational attainment to a similar magnitude observed in the univariate analysis. Conclusions: There is robust evidence for an independent, causal effect of intelligence in lowering AD risk, potentially supporting a role for cognitive training interventions to improve aspects of intelligence. However, given the observed causal effect of educational attainment on intelligence, there may also be support for policies aimed at increasing length of schooling to lower incidence of AD.

Perspectives on Marine Data Science as a Blueprint for Emerging Data Science Disciplines

Frontiers in Marine Science ◽

10.3389/fmars.2021.678404 ◽

2021 ◽

Vol 8 ◽

Author(s):

Maria-Theresia Verwega ◽

Carola Trahms ◽

Avan N. Antia ◽

Thorsten Dickhaus ◽

Enno Prigge ◽

...

Keyword(s):

Data Science ◽

Academic Career ◽

Heterogeneous Data ◽

Early Career ◽

Earth System ◽

Doctoral Research ◽

Crucial Component ◽

Interface Science ◽

Marine Data ◽

New Research

Earth System Sciences have been generating increasingly larger amounts of heterogeneous data in recent years. We identify the need to combine Earth System Sciences with Data Sciences, and give our perspective on how this could be accomplished within the sub-field of Marine Sciences. Marine data hold abundant information and insights that Data Science techniques can reveal. There is high demand and potential to combine skills and knowledge from Marine and Data Sciences to best take advantage of the vast amount of marine data. This can be accomplished by establishing Marine Data Science as a new research discipline. Marine Data Science is an interface science that applies Data Science tools to extract information, knowledge, and insights from the exponentially increasing body of marine data. Marine Data Scientists need to be trained Data Scientists with a broad basic understanding of Marine Sciences and expertise in knowledge transfer. Marine Data Science doctoral researchers need targeted training for these specific skills, a crucial component of which is co-supervision from both parental sciences. They also might face challenges of scientific recognition and lack of an established academic career path. In this paper, we, Marine and Data Scientists at different stages of their academic career, present perspectives to define Marine Data Science as a distinct discipline. We draw on experiences of a Doctoral Research School, MarDATA, dedicated to training a cohort of early career Marine Data Scientists. We characterize the methods of Marine Data Science as a toolbox including skills from their two parental sciences. All of these aim to analyze and interpret marine data, which build the foundation of Marine Data Science.

Targeted Minimum Loss-Based Estimation of Causal Effects in Right-Censored Survival Data with Time-Dependent Covariates: Warfarin, Stroke, and Death in Atrial Fibrillation

Journal of Causal Inference ◽

10.1515/jci-2013-0001 ◽

2013 ◽

Vol 1 (2) ◽

pp. 235-254 ◽

Cited By ~ 4

Author(s):

Jordan C. Brooks ◽

Mark J. van der Laan ◽

Daniel E. Singer ◽

Alan S. Go

Keyword(s):

Survival Data ◽

Causal Effect ◽

Estimating Equation ◽

Dependent Censoring ◽

Time Dependent ◽

Causal Effects ◽

Consistent Estimation ◽

Censored Survival Data ◽

Inverse Probability ◽

Time Dependent Covariates

Abstract Causal effects in right-censored survival data can be formally defined as the difference in the marginal cumulative event probabilities under particular interventions. Conventional estimators, such as the Kaplan-Meier (KM), fail to consistently estimate these marginal parameters under dependent treatment assignment or dependent censoring. Several modern estimators have been developed that reduce bias under both dependent treatment assignment and dependent censoring by incorporating information from baseline and time-dependent covariates. In the present article we describe a recently developed targeted minimum loss-based estimation (TMLE) algorithm for general longitudinal data structures and present in detail its application in right-censored survival data with time-dependent covariates. The treatment-specific marginal cumulative event probability is defined via a series of iterated conditional expectations in a time-dependent counting process framework. The TMLE involves an initial estimator of each conditional expectation and sequentially updates these such that the resulting estimator solves the efficient influence curve estimating equation in the nonparametric statistical model. We describe the assumptions required for consistent estimation of statistical parameters and additional assumptions required for consistent estimation of the causal effect parameter. Using simulated right-censored survival data, the mean squared error, bias, and 95% confidence interval coverage probability of the TMLE are compared with those of the conventional KM and the inverse probability of censoring weight estimating equation, conventional maximum likelihood substitution estimator, and the double robust augmented inverse probability of censoring weighted estimating equation. 
We conclude the article with estimation of the causal effect of warfarin medical therapy on the probability of “stroke or death” within a 1-year time frame using data from the ATRIA-1 observational cohort of persons with atrial fibrillation. Our results suggest that a fixed policy of warfarin treatment for all patients would result in 2% fewer deaths or strokes within 1 year as compared with a policy of withholding warfarin from all patients.
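
For readers unfamiliar with the conventional baseline named in this abstract, the following is a minimal Python sketch of the Kaplan-Meier product-limit estimator. It is not the TMLE algorithm the article describes, and the toy follow-up data are invented for illustration. As the abstract notes, this estimator is only consistent when censoring and treatment assignment are independent of the outcome process.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Hypothetical data: five subjects, two of them censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Subjects censored at a given time are counted as still at risk at that time, which is the usual convention for tied event and censoring times.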
