EPISPOT: an epigenome-driven approach for detecting and interpreting hotspots in molecular QTL studies

Abstract We present EPISPOT, a fully joint framework that exploits large panels of epigenetic marks as variant-level information to enhance molecular quantitative trait locus (QTL) mapping. Thanks to a purpose-built Bayesian inferential algorithm, our approach effectively couples simultaneous QTL analysis of thousands of genetic variants and molecular traits genome-wide with hypothesis-free selection of biologically interpretable marks that directly contribute to the QTL effects. This unified learning approach boosts statistical power and sheds light on the regulatory basis of the uncovered associations. EPISPOT is also tailored to the modelling of trans-acting genetic variants, including QTL hotspots, whose detection and functional interpretation are challenging with standard approaches. We illustrate the advantages of EPISPOT in simulations emulating real-data conditions and in an epigenome-driven monocyte expression QTL study, which confirms known hotspots and reveals new ones, as well as plausible mechanisms of action. In particular, based on monocyte DNase-I sensitivity site annotations selected by the method from more than 150 epigenetic annotations, we clarify the mediation effects and cell-type specificity of well-known master hotspots in the vicinity of the lysozyme gene. EPISPOT is radically new in that it makes it possible to forgo the daunting and underpowered task of one-mark-at-a-time enrichment analyses for the prioritisation of QTL hits. Our method can be used to enhance the discovery and functional understanding of signals in QTL problems with all types of outcomes, be they transcriptomic, proteomic, lipidomic, metabolic or clinical.
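EPISPOT's own inference is a purpose-built Bayesian algorithm that the abstract does not spell out. As a rough illustration of what a "hotspot" means downstream of any such fit, the sketch below assumes a hypothetical variants × traits matrix of posterior probabilities of association (PPIs) and summarises each variant by its expected number of associated traits (the row sum of PPIs) and by the number of traits whose PPI exceeds an assumed 0.5 threshold. The function name `hotspot_sizes`, the threshold and the example matrix are illustrative assumptions, not part of EPISPOT.

```python
import numpy as np


def hotspot_sizes(ppi, threshold=0.5):
    """Per-variant hotspot summaries from a variants x traits PPI matrix.

    Returns (expected_size, n_called):
      expected_size -- expected number of associated traits (row sums of PPIs)
      n_called      -- number of traits whose PPI exceeds `threshold`
    """
    ppi = np.asarray(ppi, dtype=float)
    if ppi.ndim != 2 or np.any((ppi < 0) | (ppi > 1)):
        raise ValueError("ppi must be a 2-D array of probabilities")
    expected_size = ppi.sum(axis=1)
    n_called = (ppi > threshold).sum(axis=1)
    return expected_size, n_called


# Hypothetical PPIs for 3 variants and 4 molecular traits.
ppi = np.array([
    [0.95, 0.90, 0.80, 0.70],  # associated with every trait: candidate hotspot
    [0.10, 0.05, 0.60, 0.02],  # a single likely association
    [0.01, 0.02, 0.03, 0.04],  # no evidence of association
])
expected, called = hotspot_sizes(ppi)
print(expected, called)
```

A variant whose expected size is much larger than that of its neighbours is a candidate hotspot; how EPISPOT calibrates this, and how epigenetic marks enter its prior, is not captured by this sketch.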


Model Checking via Testing for Direct Effects in Mendelian Randomization and Transcriptome-wide Association Studies

10.1101/2021.07.09.451811 ◽

2021 ◽

Author(s):

Yangqing Deng ◽

Wei Pan

Keyword(s):

Model Checking ◽

Genetic Variants ◽

Statistical Power ◽

Mendelian Randomization ◽

Association Studies ◽

Real Data ◽

Direct Effects ◽

Statistical Efficiency ◽

Challenging Situation ◽

High Statistical Power

It is of great interest to discover causal relationships between pairs of exposures and outcomes using genetic variants as instrumental variables (IVs) to deal with hidden confounding in observational studies. The two most popular approaches are Mendelian randomization (MR), which usually uses independent genetic variants/SNPs across the genome as IVs, and transcriptome-wide association studies (TWAS), which use cis-SNPs local to a gene. In spite of their many promising applications, both approaches face a major challenge: the validity of their causal conclusions depends on three critical assumptions on valid IVs, which may not hold in practice. The most likely, as well as most challenging, situation arises from widespread horizontal pleiotropy, which violates two of the three IV assumptions and thus biases statistical inference. Although some methods have been proposed as being robust to various degrees to the violation of some modeling assumptions, they often give different and even conflicting results due to their own modeling assumptions and possibly lower statistical efficiency, making it difficult for practitioners to choose among methods and to interpret their varying results. Hence, it would help to directly test whether any assumption is violated. In particular, such tests are lacking for TWAS. We propose a new and general goodness-of-fit (GOF) test, called TEDE (TEsting Direct Effects), applicable to both correlated and independent SNPs/IVs (as commonly used in TWAS and MR, respectively). Through simulation studies and real data examples, we demonstrate the high statistical power and advantages of our new method, while confirming that modeling (including IV) assumptions are frequently violated in practice, and thus the importance of model checking by applying such a test in MR/TWAS analysis.


Mendelian Randomization With Refined Instrumental Variables From Genetic Score Improves Accuracy and Reduces Bias

Frontiers in Genetics ◽

10.3389/fgene.2021.618829 ◽

2021 ◽

Vol 12 ◽

Author(s):

Lijuan Lin ◽

Ruyang Zhang ◽

Hui Huang ◽

Ying Zhu ◽

Yi Li ◽

...

Keyword(s):

Genetic Variants ◽

Statistical Power ◽

Complex Disease ◽

Type I Error ◽

Mendelian Randomization ◽

Causal Effect ◽

Type I ◽

Individual Data ◽

Genetic Score ◽

Mediation Effects

Mendelian randomization (MR) can estimate the causal effect of a risk factor on a complex disease using genetic variants as instrumental variables (IVs). A variety of generalized MR methods have been proposed to integrate results arising from multiple IVs in order to increase power. One of these methods constructs a genetic score (GS) as a linear combination of the multiple IVs using a multiple regression model, and has been widely applied in medical research. However, GS-based MR requires individual-level data, which greatly limits its application in clinical research. We propose an alternative method called Mendelian Randomization with Refined Instrumental Variable from Genetic Score (MR-RIVER) to construct a genetic IV by integrating multiple genetic variants based on summarized results rather than individual data. Compared with inverse-variance weighted (IVW) and generalized summary-data-based Mendelian randomization (GSMR), MR-RIVER controlled the type I error rate while having more statistical power than the competing methods. MR-RIVER also showed smaller bias and mean squared error than IVW and GSMR. We further applied the proposed method to estimate the effects of blood metabolites on educational attainment, by integrating results from several publicly available resources. MR-RIVER provided robust results under different LD pruning criteria and identified three metabolites associated with years of schooling and an additional 15 metabolites with indirect mediation effects through butyrylcarnitine. MR-RIVER, which extends score-based MR to summarized results in lieu of individual data and incorporates multiple correlated IVs, provides a more accurate and powerful means for the discovery of novel risk factors.
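MR-RIVER itself is not described in enough detail in the abstract to reproduce, but one of its comparators, the inverse-variance weighted (IVW) estimator, has a standard closed form on summary statistics: the weighted mean of per-SNP Wald ratios beta_y/beta_x with weights beta_x²/se_y². The sketch below assumes independent (LD-pruned) SNPs, per-SNP exposure effects `beta_x`, outcome effects `beta_y` with standard errors `se_y`, and first-order weights that ignore uncertainty in `beta_x`; the function name `ivw_estimate` and the example numbers are hypothetical.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    Each SNP j contributes a Wald ratio beta_y[j] / beta_x[j] with first-order
    standard error se_y[j] / |beta_x[j]|; IVW is the weighted mean of these
    ratios with weights beta_x[j]**2 / se_y[j]**2.
    """
    if not beta_x or not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [bx * bx / (s * s) for bx, s in zip(beta_x, se_y)]
    numerator = sum(bx * by / (s * s) for bx, by, s in zip(beta_x, beta_y, se_y))
    est = numerator / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return est, se


# Hypothetical summary statistics for three independent SNPs,
# all with a Wald ratio of 0.2.
est, se = ivw_estimate([0.10, 0.20, 0.15], [0.02, 0.04, 0.03], [0.01, 0.01, 0.02])
print(est, se)
```

Because every SNP here has the same ratio, the IVW estimate equals that ratio and only the precision depends on the weights.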


Model checking via testing for direct effects in Mendelian Randomization and transcriptome-wide association studies

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009266 ◽

2021 ◽

Vol 17 (8) ◽

pp. e1009266

Author(s):

Yangqing Deng ◽

Wei Pan

Keyword(s):

Model Checking ◽

Genetic Variants ◽

Statistical Power ◽

Goodness Of Fit ◽

Mendelian Randomization ◽

Association Studies ◽

Real Data ◽

Direct Effects ◽

Statistical Efficiency ◽

Challenging Situation

It is of great interest to discover causal relationships between pairs of exposures and outcomes using genetic variants as instrumental variables (IVs) to deal with hidden confounding in observational studies. The two most popular approaches are Mendelian randomization (MR), which usually uses independent genetic variants/SNPs across the genome as IVs, and transcriptome-wide association studies (TWAS) (or their generalizations), which use cis-SNPs local to a gene (or some genome-wide and likely dependent SNPs) as IVs. In spite of their many promising applications, both approaches face a major challenge: the validity of their causal conclusions depends on three critical assumptions on valid IVs, and more generally on other modeling assumptions, which may not hold in practice. The most likely, as well as most challenging, situation arises from widespread horizontal pleiotropy, which violates two of the three IV assumptions and thus biases statistical inference. More generally, it is desirable to conduct a goodness-of-fit (GOF) test to check the model being used. Although some methods have been proposed as being robust to various degrees to the violation of some modeling assumptions, they often give different and even conflicting results due to their own modeling assumptions and possibly lower statistical efficiency, making it difficult for practitioners to choose among methods and to interpret their varying results. Hence, it would help to directly test whether any assumption is violated. In particular, such tests are lacking for TWAS. We propose a new and general GOF test, called TEDE (TEsting Direct Effects), applicable to both correlated and independent SNPs/IVs (as commonly used in TWAS and MR, respectively). Through simulation studies and real data examples, we demonstrate the high statistical power and advantages of our new method, while confirming that modeling (including valid IV) assumptions are frequently violated in practice, and thus the importance of model checking by applying such a test in MR/TWAS analysis.
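The abstract does not give TEDE's test statistic, so the sketch below does not implement it. For orientation only, it shows an older and simpler goodness-of-fit check for MR with independent SNPs, Cochran's Q heterogeneity statistic around the fixed-effect IVW estimate: if every SNP is a valid IV with no direct effect on the outcome, the per-SNP Wald ratios should agree up to sampling error, and Q is approximately chi-squared with J - 1 degrees of freedom for J SNPs. Unlike TEDE as described, Q assumes independent SNPs, so it does not cover the correlated cis-SNPs used in TWAS. The function name `cochran_q` and the example numbers are illustrative assumptions.

```python
def cochran_q(beta_x, beta_y, se_y):
    """Cochran's Q around the fixed-effect IVW estimate, with its degrees of freedom."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]           # per-SNP Wald ratios
    weights = [bx * bx / (s * s) for bx, s in zip(beta_x, se_y)]   # first-order IVW weights
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1


beta_x = [0.10, 0.20, 0.15]
se_y = [0.01, 0.01, 0.02]

# All three SNPs imply the same causal effect (ratio 0.2): no heterogeneity.
q_consistent, df = cochran_q(beta_x, [0.02, 0.04, 0.03], se_y)

# The third SNP has a hypothetical direct (pleiotropic) effect on the outcome.
q_pleio, _ = cochran_q(beta_x, [0.02, 0.04, 0.09], se_y)
print(q_consistent, q_pleio, df)
```

Here 5.99 is the 95th percentile of the chi-squared distribution with 2 degrees of freedom, so the pleiotropic third SNP pushes Q (about 8.1) past a 5% level check, while the consistent set gives Q of essentially zero.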


Variation in the SERPINA6/SERPINA1 locus alters morning plasma cortisol, hepatic corticosteroid binding globulin expression, gene expression in peripheral tissues, and risk of cardiovascular disease

Journal of Human Genetics ◽

10.1038/s10038-020-00895-6 ◽

2021 ◽

Author(s):

Andrew A. Crawford ◽


Sean Bankier ◽

Elisabeth Altmaier ◽

Catriona L. K. Barnes ◽

...

Keyword(s):

Gene Expression ◽

Cardiovascular Disease ◽

Genetic Variants ◽

Plasma Cortisol ◽

Statistical Power ◽

Mendelian Randomisation ◽

Eqtl Analysis ◽

Causative Role ◽

Peripheral Tissues ◽

Corticosteroid Binding Globulin

Abstract The stress hormone cortisol modulates fuel metabolism, cardiovascular homoeostasis, mood, inflammation and cognition. The CORtisol NETwork (CORNET) consortium previously identified a single locus associated with morning plasma cortisol. Identifying additional genetic variants that explain more of the variance in cortisol could provide new insights into cortisol biology and the statistical power to test the causative role of cortisol in common diseases. The CORNET consortium extended its genome-wide association meta-analysis for morning plasma cortisol from 12,597 to 25,314 subjects and from ~2.2 M to ~7 M SNPs, in 17 population-based cohorts of European ancestry. We confirmed the genetic association with SERPINA6/SERPINA1. This locus contains the genes encoding corticosteroid binding globulin (CBG) and α1-antitrypsin. Expression quantitative trait loci (eQTL) analyses undertaken in the STARNET cohort of 600 individuals showed that specific genetic variants within the SERPINA6/SERPINA1 locus influence expression of SERPINA6 rather than SERPINA1 in the liver. Moreover, trans-eQTL analysis demonstrated effects on adipose tissue gene expression, suggesting that variations in CBG levels affect the delivery of cortisol to peripheral tissues. Two-sample Mendelian randomisation analyses provided evidence that each genetically determined standard deviation (SD) increase in morning plasma cortisol was associated with increased odds of chronic ischaemic heart disease (0.32, 95% CI 0.06–0.59) and myocardial infarction (0.21, 95% CI 0.00–0.43) in UK Biobank and similarly in CARDIoGRAMplusC4D. These findings reveal a causative pathway for CBG in determining cortisol action in peripheral tissues and thereby contributing to the aetiology of cardiovascular disease.
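The abstract does not say which two-sample MR estimator was used. When the instrument is a single lead variant, as can happen at a locus such as SERPINA6/SERPINA1, a standard choice is the Wald ratio of the variant-outcome association to the variant-exposure association, with a delta-method standard error. The sketch below is a generic version; the function name `wald_ratio`, the `second_order` switch and the input numbers are hypothetical and are not taken from the CORNET analysis. The second-order term assumes the two GWAS samples are independent, so the covariance between the two estimates is taken as zero.

```python
import math


def wald_ratio(beta_x, se_x, beta_y, se_y, second_order=False):
    """Wald ratio MR estimate for one variant, with a delta-method standard error.

    beta_x, se_x: variant-exposure association and its standard error
    beta_y, se_y: variant-outcome association and its standard error
    """
    if beta_x == 0:
        raise ValueError("beta_x must be non-zero")
    ratio = beta_y / beta_x
    var = se_y ** 2 / beta_x ** 2                      # first-order term
    if second_order:
        var += beta_y ** 2 * se_x ** 2 / beta_x ** 4   # uncertainty in beta_x
    return ratio, math.sqrt(var)


ratio, se1 = wald_ratio(0.5, 0.05, 0.1, 0.02)
_, se2 = wald_ratio(0.5, 0.05, 0.1, 0.02, second_order=True)
print(ratio, se1, se2)
```

The second-order standard error is never smaller than the first-order one, which matters when the exposure association is imprecisely estimated.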


Bias in two-sample Mendelian randomization when using heritable covariable-adjusted summary associations

International Journal of Epidemiology ◽

10.1093/ije/dyaa266 ◽

2021 ◽

Author(s):

Fernando Pires Hartwig ◽

Kate Tilling ◽

George Davey Smith ◽

Deborah A Lawlor ◽

Maria Carolina Borges

Keyword(s):

Waist Circumference ◽

Genetic Variants ◽

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

Real Data ◽

Sensitivity Analyses ◽

Effect Estimate ◽

Genome Wide Association Studies ◽

Residual Confounding

Abstract Background Two-sample Mendelian randomization (MR) allows the use of freely accessible summary association results from genome-wide association studies (GWAS) to estimate causal effects of modifiable exposures on outcomes. Some GWAS adjust for heritable covariables in an attempt to estimate direct effects of genetic variants on the trait of interest. One, both or neither of the exposure GWAS and outcome GWAS may have been adjusted for covariables. Methods We performed a simulation study comprising different scenarios that could motivate covariable adjustment in a GWAS and analysed real data to assess the influence of using covariable-adjusted summary association results in two-sample MR. Results In the absence of residual confounding between exposure and covariable, between exposure and outcome, and between covariable and outcome, using covariable-adjusted summary associations for two-sample MR eliminated bias due to horizontal pleiotropy. However, covariable adjustment led to bias in the presence of residual confounding (especially between the covariable and the outcome), even in the absence of horizontal pleiotropy (when the genetic variants would be valid instruments without covariable adjustment). In an analysis using real data from the Genetic Investigation of ANthropometric Traits (GIANT) consortium and UK Biobank, the causal effect estimate of waist circumference on blood pressure changed direction upon adjustment of waist circumference for body mass index. Conclusions Our findings indicate that using covariable-adjusted summary associations in MR should generally be avoided. When that is not possible, careful consideration of the causal relationships underlying the data (including potentially unmeasured confounders) is required to direct sensitivity analyses and interpret results with appropriate caution.
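The mechanism reported here, bias from covariable adjustment under residual confounding even without horizontal pleiotropy, can be reproduced in a toy linear model. The sketch below assumes a standardised variant G with no effect on the outcome Y, a heritable covariable C = G + U + noise, and a confounder U shared by C and Y. The unadjusted G-Y regression coefficient is about 0, but adjusting for C gives (Var(C)·Cov(G,Y) - Cov(G,C)·Cov(C,Y)) / (Var(G)·Var(C) - Cov(G,C)²) = (3·0 - 1·1) / (1·3 - 1) = -0.5. The simulation design and the function name `simulate_adjustment_bias` are assumptions for illustration, not the authors' scenarios.

```python
import numpy as np


def simulate_adjustment_bias(n=100_000, seed=0):
    """Return (unadjusted, covariable-adjusted) G coefficients for outcome Y."""
    rng = np.random.default_rng(seed)
    g = rng.normal(size=n)           # standardised genetic variant
    u = rng.normal(size=n)           # confounder of covariable and outcome
    c = g + u + rng.normal(size=n)   # heritable covariable (collider of G and U)
    y = u + rng.normal(size=n)       # outcome: G has NO effect on Y
    ones = np.ones(n)
    # Ordinary least squares with an intercept; index 1 is the G coefficient.
    unadj = np.linalg.lstsq(np.column_stack([ones, g]), y, rcond=None)[0][1]
    adj = np.linalg.lstsq(np.column_stack([ones, g, c]), y, rcond=None)[0][1]
    return unadj, adj


unadj, adj = simulate_adjustment_bias()
print(unadj, adj)
```

Using such an adjusted association as the numerator of a Wald ratio would attribute a spurious causal effect to the exposure, which is the kind of bias the abstract warns about.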


Improved inference and prediction of bacterial genotype-phenotype associations using pangenome-spanning regressions

10.1101/852426 ◽

2019 ◽

Cited By ~ 3

Author(s):

John A. Lees ◽

T. Tien Mai ◽

Marco Galardini ◽

Nicole E. Wheeler ◽

Jukka Corander

Keyword(s):

Genetic Variants ◽

Statistical Power ◽

Genome Wide Association Study ◽

Bacterial Species ◽

False Positive Rate ◽

Model Fitting ◽

Joint Modeling ◽

Inference Procedure ◽

Modeling Framework ◽

Narrow Sense Heritability

ABSTRACT Discovery of influential genetic variants and prediction of phenotypes such as antibiotic resistance are becoming routine tasks in bacterial genomics. Genome-wide association study (GWAS) methods can be applied to study bacterial populations, with a particular emphasis on alignment-free approaches, which are necessitated by the more plastic nature of bacterial genomes. Here we advance bacterial GWAS by introducing a computationally scalable joint modeling framework, where genetic variants covering the entire pangenome are compactly represented by unitigs, and the model is fitted using elastic net penalization. In contrast to current leading GWAS approaches, which test each variant's genotype-phenotype association separately, our joint modelling approach is shown to increase statistical power while maintaining control of the false positive rate. Our inference procedure also delivers an estimate of the narrow-sense heritability, which is gaining considerable interest in studies of bacteria. Using an extensive set of state-of-the-art bacterial population genomic datasets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. We expect that these advances will pave the way for the next generation of high-powered association and prediction studies for an increasing number of bacterial species.
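The abstract names elastic net penalisation on a unitig presence/absence matrix. The sketch below is a generic cyclic coordinate-descent elastic net in the glmnet-style parameterisation, minimising (1/2n)·||y - b0 - Xb||² + lam·(alpha·||b||_1 + (1 - alpha)/2·||b||²), run on simulated binary "unitig" columns. It is not the authors' implementation; the function name `elastic_net_cd`, the penalty values and the simulated data are assumptions, and a real analysis would choose lam by cross-validation and account for population structure.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Lasso soft-thresholding operator S(z, gamma)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, lam=0.1, alpha=0.5, n_iter=200, tol=1e-6):
    """Cyclic coordinate descent for the elastic net; returns (intercept, beta)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring absorbs the intercept
    z = (Xc ** 2).sum(axis=0) / n            # per-column scale
    beta = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue                      # constant column: leave at zero
            old = beta[j]
            rho = Xc[:, j] @ resid / n + z[j] * old
            new = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            if new != old:
                resid -= Xc[:, j] * (new - old)   # keep residuals in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return y_mean - x_mean @ beta, beta


# Simulated presence/absence of 50 "unitigs" in 500 isolates; two are causal.
rng = np.random.default_rng(0)
X = rng.binomial(1, 0.3, size=(500, 50))
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + rng.normal(0.0, 0.5, size=500)
intercept, beta = elastic_net_cd(X, y, lam=0.05, alpha=0.9)
print(np.flatnonzero(beta))
```

The L1 part sets most non-causal coefficients exactly to zero, while the small L2 part stabilises estimates when unitigs are strongly correlated, as they typically are within a gene.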


Automatically Building Linking Relations between Lane-Level Map and Commercial Navigation Map Using Topological Networks Matching

Journal of Navigation ◽

10.1017/s0373463320000259 ◽

2020 ◽

Vol 73 (5) ◽

pp. 1159-1178

Author(s):

Lu Tao ◽

Pan Zhang ◽

Lixin Yan ◽

Dunyao Zhu

Keyword(s):

Real Data ◽

Autonomous Driving ◽

Data Sets ◽

Driving System ◽

Forward Path ◽

Level Information ◽

Data Source ◽

System Mapping ◽

Autonomous Driving System ◽

Mapping Information

The lane-level map, which contains lane-level information that widely used commercial navigation maps severely lack, has become an essential data source for autonomous driving systems. Linking relations between a lane-level map and a commercial navigation map allow an autonomous driving system to map information between applications that use different maps. In this paper, an approach is proposed to build these linking relations automatically. The two topological networks are first reconstructed into similar structures. Then, an adaptive multi-filter algorithm and a forward path exploring algorithm are proposed to detect corresponding junctions and paths, respectively. The approach is validated on two real data sets covering more than 150 km of roads, mainly highways. Linking relations were built successfully for nearly 94% of the total road length.


Optimal selection of genetic variants for adjustment of population stratification in European association studies

Briefings in Bioinformatics ◽

10.1093/bib/bbz023 ◽

2019 ◽

Vol 21 (3) ◽

pp. 753-761 ◽

Cited By ~ 2

Author(s):

Regina Brinster ◽

Dominique Scherer ◽

Justo Lorenzo Bermejo

Keyword(s):

Genetic Variants ◽

Population Stratification ◽

Statistical Power ◽

Type I Error ◽

Association Studies ◽

Reference Sample ◽

Error Rates ◽

The Cancer Genome Atlas ◽

Type I ◽

Genotype Data

Abstract Population stratification is usually corrected using principal component analysis (PCA) of genome-wide genotype data, even in populations considered genetically homogeneous, such as Europeans. Genotyping only a small number of genetic variants that show large differences in allele frequency among subpopulations (so-called ancestry-informative markers, AIMs), instead of the whole genome, for stratification adjustment could be an advantage for replication studies and candidate gene/pathway studies. Here we compare the correction performance of classical and robust principal components (PCs) with the use of AIMs selected according to four different methods: the informativeness for assignment measure ($IN$-AIMs), the combination of PCA and F-statistics, PCA-correlated measurement and the PCA weighted loadings for each genetic variant. We used real genotype data from the Population Reference Sample and The Cancer Genome Atlas to simulate European genetic association studies and to quantify type I error rate and statistical power in different case–control settings. In studies with the same numbers of cases and controls per country and control-to-case ratios reflecting actual rates of disease prevalence, no adjustment for population stratification was required. The unnecessary inclusion of the country of origin, PCs or AIMs as covariates in the regression models translated into increased type I error rates. In studies with cases and controls from separate countries, no investigated method was able to adequately correct for population stratification. The first classical PC and the first two robust PCs achieved the lowest (although still inflated) type I error, followed at some distance by the first eight $IN$-AIMs.
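The PCA correction the abstract starts from can be sketched as follows: standardise each SNP's 0/1/2 genotype column by its sample allele frequency p (centre at 2p, scale by sqrt(2p(1 - p)), one common convention), take a singular value decomposition, and use the leading component scores as covariates in the association model. The simulated two-subpopulation data, the function name `genotype_pcs` and k = 2 are assumptions; the sketch does not implement any of the AIM selection methods compared in the paper.

```python
import numpy as np


def genotype_pcs(genotypes, k=2):
    """Top-k principal component scores of a samples x SNPs 0/1/2 genotype matrix."""
    G = np.asarray(genotypes, dtype=float)
    freq = G.mean(axis=0) / 2.0                  # sample allele frequencies
    keep = (freq > 0.0) & (freq < 1.0)           # drop monomorphic SNPs
    G, freq = G[:, keep], freq[keep]
    Z = (G - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]                      # component scores per sample


# Two simulated subpopulations of 100 individuals with shifted allele frequencies.
rng = np.random.default_rng(42)
n_snps = 500
f_a = rng.uniform(0.1, 0.9, n_snps)
f_b = np.clip(f_a + rng.normal(0.0, 0.15, n_snps), 0.01, 0.99)
geno = np.vstack([
    rng.binomial(2, f_a, size=(100, n_snps)),
    rng.binomial(2, f_b, size=(100, n_snps)),
])
pcs = genotype_pcs(geno, k=2)
```

In a case-control study the columns of `pcs` would be added as covariates to each per-SNP regression; the abstract's point is that doing so can inflate type I error when stratification is absent and can fail to fix it when cases and controls come from different countries.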


The asymptotic distribution of modularity in weighted signed networks

Biometrika ◽

10.1093/biomet/asaa059 ◽

2020 ◽

Author(s):

Rong Ma ◽

Ian Barnett

Keyword(s):

Community Structure ◽

Asymptotic Distribution ◽

Statistical Power ◽

Type I Error ◽

Real Data ◽

Asymptotic Distributions ◽

Edge Weight ◽

Type I ◽

Largest Eigenvalue ◽

The Largest Eigenvalue

Summary Modularity is a popular metric for quantifying the degree of community structure within a network. The distribution of the largest eigenvalue of a network’s edge weight or adjacency matrix is well studied and is frequently used as a substitute for modularity when performing statistical inference. However, we show that the largest eigenvalue and modularity are asymptotically uncorrelated, which suggests the need for inference directly on modularity itself when the network is large. To this end, we derive the asymptotic distribution of modularity in the case where the network’s edge weight matrix belongs to the Gaussian orthogonal ensemble, and study the statistical power of the corresponding test for community structure under some alternative models. We empirically explore universality extensions of the limiting distribution and demonstrate the accuracy of these asymptotic distributions through Type I error simulations. We also compare the empirical powers of the modularity-based tests and some existing methods. Our method is then used to test for the presence of community structure in two real data applications.
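For reference, the classical Newman-Girvan modularity of a partition of a nonnegatively weighted network is Q = (1/2m)·Σ_ij (A_ij - k_i·k_j/2m)·δ(c_i, c_j), where k_i are weighted degrees and 2m is their sum. The sketch computes Q and, for contrast, the largest eigenvalue of A on two disconnected triangles. It is an assumption that this textbook definition matches the one analysed in the paper; for signed weights, and under the Gaussian orthogonal ensemble setting, the authors' normalisation may differ. The function name `modularity` is illustrative.

```python
import numpy as np


def modularity(A, labels):
    """Newman-Girvan modularity of a partition of a symmetric weighted network."""
    A = np.asarray(A, dtype=float)
    if A.shape[0] != A.shape[1] or not np.allclose(A, A.T):
        raise ValueError("A must be a symmetric square matrix")
    k = A.sum(axis=1)                          # weighted degrees
    two_m = k.sum()                            # twice the total edge weight
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    B = A - np.outer(k, k) / two_m             # modularity matrix
    return float((B * same).sum() / two_m)


# Two disconnected triangles: an obvious two-community structure.
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
A = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
Q = modularity(A, [0, 0, 0, 1, 1, 1])
Q_trivial = modularity(A, [0] * 6)           # everything in one community
lam_max = np.linalg.eigvalsh(A)[-1]          # eigvalsh sorts ascending
print(Q, Q_trivial, lam_max)
```

The true partition scores Q = 0.5 and the trivial one-community partition scores 0, while the largest eigenvalue (2 here) depends only on A and not on any partition, which is one way to see why the two quantities can carry different information.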


Factorial Mendelian randomization: using genetic variants to assess interactions

International Journal of Epidemiology ◽

10.1093/ije/dyz161 ◽

2019 ◽

Vol 49 (4) ◽

pp. 1147-1158 ◽

Cited By ~ 6

Author(s):

Jessica M B Rees ◽

Christopher N Foley ◽

Stephen Burgess

Keyword(s):

Risk Factors ◽

Instrumental Variables ◽

Genetic Variants ◽

Mendelian Randomization ◽

Real Data ◽

Uk Biobank ◽

Pharmacological Interventions ◽

Natural Break ◽

Using Data ◽

Randomization Analysis

Abstract Background Factorial Mendelian randomization is the use of genetic variants to answer questions about interactions. Although the approach has been used in applied investigations, little methodological advice is available on how to design or perform a factorial Mendelian randomization analysis. Previous analyses have employed a 2 × 2 approach, using dichotomized genetic scores to divide the population into four subgroups as in a factorial randomized trial. Methods We describe two distinct contexts for factorial Mendelian randomization: investigating interactions between risk factors, and investigating interactions between pharmacological interventions on risk factors. We propose two-stage least squares methods using all available genetic variants and their interactions as instrumental variables, and using continuous genetic scores as instrumental variables rather than dichotomized scores. We illustrate our methods using data from UK Biobank to investigate the interaction between body mass index and alcohol consumption on systolic blood pressure. Results Simulated and real data show that efficiency is maximized using the full set of interactions between genetic variants as instruments. In the applied example, between 4- and 10-fold improvement in efficiency is demonstrated over the 2 × 2 approach. Analyses using continuous genetic scores are more efficient than those using dichotomized scores. Efficiency is improved by finding genetic variants that divide the population at a natural break in the distribution of the risk factor, or else divide the population into more equal-sized groups. Conclusions Previous factorial Mendelian randomization analyses may have been underpowered. Efficiency can be improved by using all genetic variants and their interactions as instrumental variables, rather than the 2 × 2 approach.
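The two-stage least squares idea can be sketched on simulated data with two continuous genetic scores g1 and g2, their product g1·g2 as an extra instrument, and an unmeasured confounder u acting on both risk factors and the outcome. The first stage regresses each of x1, x2 and x1·x2 on the instruments; the second regresses y on the fitted values. All effect sizes, the sample size and the function name `two_stage_least_squares` are assumptions for illustration, not the paper's UK Biobank analysis; the second-stage coefficients are the 2SLS point estimates, but naive second-stage standard errors would be wrong and are not computed here.

```python
import numpy as np


def two_stage_least_squares(y, W, Z):
    """2SLS point estimates: regress each column of W on instruments Z,
    then regress y on the first-stage fitted values."""
    W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]   # first stage
    return np.linalg.lstsq(W_hat, y, rcond=None)[0]    # second stage


rng = np.random.default_rng(1)
n = 100_000
g1, g2 = rng.normal(size=n), rng.normal(size=n)   # continuous genetic scores
u = rng.normal(size=n)                            # unmeasured confounder
x1 = 0.5 * g1 + u + rng.normal(size=n)            # risk factor 1
x2 = 0.5 * g2 + u + rng.normal(size=n)            # risk factor 2
y = 0.3 * x1 + 0.2 * x2 + 0.1 * x1 * x2 + u + rng.normal(size=n)

ones = np.ones(n)
W = np.column_stack([ones, x1, x2, x1 * x2])      # intercept, main effects, interaction
Z = np.column_stack([ones, g1, g2, g1 * g2])      # instruments incl. their interaction
beta_2sls = two_stage_least_squares(y, W, Z)
beta_ols = np.linalg.lstsq(W, y, rcond=None)[0]   # confounded comparison
print(beta_2sls, beta_ols)
```

In this toy model, ordinary least squares inflates the main effects through u, whereas 2SLS recovers approximately (0.3, 0.2, 0.1) for the two main effects and the interaction.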
