A powerful and efficient two-stage method for detecting gene-to-gene interactions in GWAS

Jakub Pecanka; Marianne A. Jonker; Zoltan Bochdanovits; Aad W. Van Der Vaart;

doi:10.1093/biostatistics/kxw060

Biostatistics ◽

10.1093/biostatistics/kxw060 ◽

2017 ◽

Vol 18 (3) ◽

pp. 477-494 ◽

Cited By ~ 5

Author(s):

Jakub Pecanka ◽

Marianne A. Jonker ◽

Zoltan Bochdanovits ◽

Aad W. Van Der Vaart ◽

Keyword(s):

Complex Traits ◽

Multiple Testing ◽

Statistical Power ◽

Genome Wide Association Study ◽

Score Test ◽

Interaction Model ◽

Type I ◽

Two Stage ◽

Genome Wide ◽

Strong Control

Summary: For over a decade, functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the “missing heritability” of complex traits. However, searching for epistasis on the genome-wide scale has been challenging due to the prohibitively large number of tests, which results in a serious loss of statistical power as well as computational challenges. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci, while the more demanding score tests are restricted to the most promising pairs, a genome-wide association study (GWAS) for epistasis becomes feasible. By design, our method provides strong control of the type I error. Its favourable power properties, especially under the practically relevant misspecification of the interaction model, are illustrated. Ready-to-use software is available. Using the method, we analyzed Parkinson’s disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
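
The two-stage idea described in the summary can be sketched roughly as follows. This is an illustrative sketch only, not the authors' software: the function names (`independence_pvalue`, `two_stage`), the stage-one cutoff `alpha1`, and the Bonferroni correction over the pairs that pass stage one are all assumptions made for the example, and the stage-two interaction test is left as a hypothetical user-supplied callable rather than an actual logistic regression score test.

```python
import math
from itertools import combinations


def chi2_sf(x, df):
    """Chi-square survival function, closed forms for df in {1, 2, 4}."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("unsupported degrees of freedom")


def independence_pvalue(g1, g2):
    """Pearson chi-square independence test for two genotype vectors (0/1/2)."""
    n = len(g1)
    rows, cols = sorted(set(g1)), sorted(set(g2))
    df = (len(rows) - 1) * (len(cols) - 1)
    if df == 0:
        return 1.0  # a monomorphic locus carries no information
    counts = {}
    for a, b in zip(g1, g2):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    row_tot = {a: sum(1 for x in g1 if x == a) for a in rows}
    col_tot = {b: sum(1 for x in g2 if x == b) for b in cols}
    stat = 0.0
    for a in rows:
        for b in cols:
            expected = row_tot[a] * col_tot[b] / n
            stat += (counts.get((a, b), 0) - expected) ** 2 / expected
    return chi2_sf(stat, df)


def two_stage(case_genotypes, stage2_pvalue, alpha1=0.05, alpha=0.05):
    """Screen all pairs among cases, then test only the pairs that pass.

    case_genotypes: dict mapping SNP name -> genotypes observed in cases.
    stage2_pvalue:  hypothetical callable (snp_a, snp_b) -> p-value of the
                    interaction test computed on the full case-control data.
    """
    # Stage 1: cheap independence test on every pair, cases only.
    passed = [
        (a, b)
        for a, b in combinations(sorted(case_genotypes), 2)
        if independence_pvalue(case_genotypes[a], case_genotypes[b]) <= alpha1
    ]
    if not passed:
        return passed, []
    # Stage 2: costly test on survivors; Bonferroni over survivors (assumed).
    threshold = alpha / len(passed)
    hits = [(a, b) for a, b in passed if stage2_pvalue(a, b) <= threshold]
    return passed, hits
```

Calling `two_stage(cases, my_score_test)` returns the pairs that survive the screen and the subset declared significant. The point of the design is that the significance threshold in the second stage is divided by the number of surviving pairs rather than by the number of all possible pairs.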

Genome-wide analyses in 1,987,836 participants identify 39 genetic loci associated with sleep apnoea

10.1101/2020.09.29.20199893 ◽

2020 ◽

Author(s):

Adrian I Campos ◽

Nathan Ingold ◽

Yunru Huang ◽

Pik Fang Kho ◽

Xikun Han ◽

...

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Genome Wide Association Study ◽

Meta Analysis ◽

Genetic Correlations ◽

Sleep Apnoea ◽

Risk Scores ◽

Genetic Associations ◽

Genome Wide ◽

A Genome

Rationale: Sleep apnoea is a complex disorder characterised by periods of halted breathing during sleep. Despite its association with serious health conditions such as cardiovascular disease, the aetiology of sleep apnoea remains understudied, and previous genetic studies have failed to identify replicable genetic risk factors. Objective: To advance our understanding of factors that increase susceptibility to sleep apnoea by identifying novel genetic associations. Methods: We conducted a genome-wide association study (GWAS) meta-analysis of sleep apnoea across five cohorts, and a previously published GWAS of apnoea-hypopnea index (N_total = 510,484). Further, we used multi-trait analysis of GWAS (MTAG) to boost statistical power, leveraging the high genetic correlations between apnoea, snoring and body mass index. Replication was performed in an independent sample from 23andMe, Inc (N_total = 1,477,352; N_cases = 175,522). Results: Our results revealed 39 independent genomic loci robustly associated with sleep apnoea risk, and significant genetic correlations with multisite chronic pain, sleep disorders, diabetes, high blood pressure, osteoarthritis, asthma and BMI-related traits. We also derived polygenic risk scores for sleep apnoea in a leave-one-out independent cohort and predicted probable sleep apnoea in participants (OR = 1.15 to 1.22; variance explained = 0.4 to 0.9%). Conclusions: We report novel genetic markers robustly associated with sleep apnoea risk and substantial molecular overlap with other complex traits, thus advancing our understanding of the underlying biological mechanisms of susceptibility to sleep apnoea.

Genome-wide Marginal Epistatic Association Mapping in Case-Control Studies

10.1101/374983 ◽

2018 ◽

Cited By ~ 1

Author(s):

Lorin Crawford ◽

Xiang Zhou

Keyword(s):

Complex Traits ◽

Statistical Power ◽

Association Studies ◽

Computational Cost ◽

Case Control ◽

Type I ◽

Genome Wide Association Studies ◽

Case Control Studies ◽

Control Data ◽

Genome Wide

Abstract: Epistasis, commonly defined as the interaction between genetic loci, is an important contributor to the genetic architecture underlying many complex traits and common diseases. Most existing epistatic mapping methods in genome-wide association studies explicitly search over all pairwise or higher-order interactions. However, due to the potentially large search space and the resulting multiple testing burden, these conventional approaches often suffer from heavy computational cost and low statistical power. A recently proposed attractive alternative for mapping epistasis focuses instead on detecting marginal epistasis, which is defined as the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional epistatic mapping procedures. However, previous marginal epistatic mapping methods are based on quantitative trait models. As we will show here, these lack statistical power in case-control studies. Here, we develop a liability threshold mixed model that extends marginal epistatic mapping to case-control studies. Our method properly accounts for case-control ascertainment and the binary nature of case-control data. We refer to this method as the liability threshold marginal epistasis test (LT-MAPIT). With simulations, we illustrate the benefits of LT-MAPIT in terms of providing effective type I error control, and being more powerful than both existing marginal epistatic mapping methods and conventional explicit search-based approaches in case-control data. We finally apply LT-MAPIT to identify both marginal and pairwise epistasis in seven complex diseases from the Wellcome Trust Case Control Consortium (WTCCC) 1 study.

Integration of multidimensional splicing data and GWAS summary statistics for risk gene discovery

10.1101/2021.09.13.460009 ◽

2021 ◽

Author(s):

Ying Ji ◽

Qiang Wei ◽

Rui Chen ◽

Quan Wang ◽

Ran Tao ◽

...

Keyword(s):

Correlation Analysis ◽

Complex Traits ◽

Multiple Testing ◽

Genome Wide Association ◽

Type I ◽

Summary Statistics ◽

Expression Data ◽

Genome Wide ◽

Causal Genes ◽

Trait Associations

Abstract: A common strategy for the functional interpretation of genome-wide association study (GWAS) findings has been the integrative analysis of GWAS and expression data. Using this strategy, many association methods (e.g., PrediXcan and FUSION) have been successful in identifying trait-associated genes via mediating effects on RNA expression. However, these approaches often ignore the effects of splicing, which carries as much disease risk as expression. Compared to expression data, one challenge in detecting associations using splicing data is the large multiple testing burden due to multidimensional splicing events within genes. Here, we introduce a multidimensional splicing gene (MSG) approach, which consists of two stages: 1) we use sparse canonical correlation analysis (sCCA) to construct latent canonical vectors (CVs) by identifying sparse linear combinations of genetic variants and splicing events that are maximally correlated with each other; and 2) we test for the association between the genetically regulated splicing CVs and the trait of interest using GWAS summary statistics. Simulations show that MSG has proper type I error control and substantial power gains over existing multidimensional expression analysis methods (i.e., S-MultiXcan, UTMOST, and sCCA+ACAT) under diverse scenarios. When applied to the Genotype-Tissue Expression Project data and GWAS summary statistics of 14 complex human traits, MSG identified on average 83%, 115%, and 223% more significant genes than sCCA+ACAT, S-MultiXcan, and UTMOST, respectively. We highlight MSG’s applications to Alzheimer’s disease, low-density lipoprotein cholesterol, and schizophrenia, and found that the majority of MSG-identified genes would have been missed from expression-based analyses. Our results demonstrate that aggregating splicing data through MSG can improve power in identifying gene-trait associations and help better understand the genetic risk of complex traits.

Author summary: While genome-wide association studies (GWAS) have successfully mapped thousands of loci associated with complex traits, it remains difficult to identify which genes they regulate and in which biological contexts. This interpretation challenge has motivated the development of computational methods to prioritize causal genes at GWAS loci. Most available methods have focused on linking risk variants with differential gene expression. However, genetic control of splicing and expression are comparable in their complex trait risk, and few studies have focused on identifying causal genes using splicing information. To study splicing-mediated effects, one important statistical challenge is the large multiple testing burden generated from multidimensional splicing events. In this study, we develop a new approach, MSG, to test the mediating role of splicing variation on complex traits. We integrate multidimensional splicing data using sparse canonical correlation analysis and then combine evidence for splicing-trait associations across features using a joint test. We show this approach has higher power to identify causal genes using splicing data than current state-of-the-art methods designed to model multidimensional expression data. We illustrate the benefits of our approach through extensive simulations and applications to real data sets of 14 complex traits.

GxEsum: a novel approach to estimate the phenotypic variance explained by genome-wide GxE interaction based on GWAS summary statistics for biobank-scale data

Genome Biology ◽

10.1186/s13059-021-02403-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Jisu Shin ◽

Sang Hong Lee

Keyword(s):

Complex Traits ◽

Error Rates ◽

Type I ◽

Phenotypic Variance ◽

Environment Interaction ◽

Summary Statistics ◽

Gxe Interaction ◽

Genome Wide ◽

Scale Data ◽

Variance Explained

Abstract: Genetic variation in response to the environment, that is, genotype-by-environment interaction (GxE), is fundamental in the biology of complex traits and diseases. However, existing methods are computationally demanding and infeasible to handle biobank-scale data. Here, we introduce GxEsum, a method for estimating the phenotypic variance explained by genome-wide GxE based on GWAS summary statistics. Through comprehensive simulations and analysis of UK Biobank with 288,837 individuals, we show that GxEsum can handle a large-scale biobank dataset with controlled type I error rates and unbiased GxE estimates, and its computational efficiency can be hundreds of times higher than existing GxE methods.

Exploiting the GTEx resources to decipher the mechanisms at GWAS loci

Genome Biology ◽

10.1186/s13059-020-02252-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Alvaro N. Barbeira ◽

Rodrigo Bonazzola ◽

Eric R. Gamazon ◽

Yanyu Liang ◽

...

Keyword(s):

Complex Traits ◽

Target Genes ◽

Genome Wide Association Study ◽

Data Driven ◽

Functional Interpretation ◽

Transcriptome Regulation ◽

Genome Wide ◽

Causal Genes ◽

Dose Dependent ◽

Single Approach

Abstract: The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.

Analysis of genetic dominance in the UK Biobank

10.1101/2021.08.15.456387 ◽

2021 ◽

Author(s):

Duncan S Palmer ◽

Wei Zhou ◽

Liam Abbott ◽

Nik Baya ◽

Claire Churchhouse ◽

...

Keyword(s):

Complex Traits ◽

Multiple Testing ◽

Model Organisms ◽

Systematic Evaluation ◽

Hair Color ◽

Phenotypic Variance ◽

Additive Effects ◽

Uk Biobank ◽

Genome Wide ◽

The Uk

In classical statistical genetic theory, a dominance effect is defined as the deviation from a purely additive genetic effect for a biallelic variant. Dominance effects are well documented in model organisms. However, evidence in humans is limited to a handful of traits, particularly those with strong single locus effects such as hair color. We carried out the largest systematic evaluation of dominance effects on phenotypic variance in the UK Biobank. We curated and tested over 1,000 phenotypes for dominance effects through GWAS scans, identifying 175 loci at genome-wide significance correcting for multiple testing (P < 4.7 × 10⁻¹¹). Power to detect non-additive loci is much lower than power to detect additive effects for complex traits: based on the relative effect sizes at genome-wide significant additive loci, we estimate a factor of 20-30 increase in sample size will be necessary to capture clear evidence of dominance similar to those currently observed for additive effects. However, these localised dominance hits do not extend to a significant aggregate contribution to phenotypic variance genome-wide. By deriving a version of LD-score regression to detect dominance effects tagged by common variation genome-wide (minor allele frequency > 0.05), we found no strong evidence of a contribution to phenotypic variance when accounting for multiple testing. Across the 267 continuous and 793 binary traits the median contribution was 5.73 × 10⁻⁴, with unbiased point estimates ranging from -0.261 to 0.131. Finally, we introduce dominance fine-mapping to explore whether the more rapid decay of dominance LD can be leveraged to find causal variants. These results provide the most comprehensive assessment of dominance trait variation in humans to date.

Genome-wide association study identifies 48 common genetic variants associated with handedness

10.1101/831321 ◽

2019 ◽

Author(s):

Gabriel Cuellar Partida ◽

Joyce Y Tung ◽

Nicholas Eriksson ◽

Eva Albrecht ◽

Fazil Aliev ◽

...

Keyword(s):

Association Study ◽

Genetic Variants ◽

Complex Traits ◽

Genome Wide Association Study ◽

Genetic Correlations ◽

Genome Wide Association ◽

Left Handedness ◽

Left Handed ◽

Genome Wide ◽

Common Genetic Variants

Abstract: Handedness, a consistent asymmetry in skill or use of the hands, has been studied extensively because of its relationship with language and the over-representation of left-handers in some neurodevelopmental disorders. Using data from the UK Biobank, 23andMe and 32 studies from the International Handedness Consortium, we conducted the world’s largest genome-wide association study of handedness (1,534,836 right-handed, 194,198 (11.0%) left-handed and 37,637 (2.1%) ambidextrous individuals). We found 41 genetic loci associated with left-handedness and seven associated with ambidexterity at genome-wide levels of significance (P < 5 × 10⁻⁸). Tissue enrichment analysis implicated the central nervous system and brain tissues including the hippocampus and cerebrum in the etiology of left-handedness. Pathways including regulation of microtubules, neurogenesis, axonogenesis and hippocampus morphology were also highlighted. We found suggestive positive genetic correlations between being left-handed and some neuropsychiatric traits including schizophrenia and bipolar disorder. SNP heritability analyses indicated that additive genetic effects of genotyped variants explained 5.9% (95% CI = 5.8% – 6.0%) of the underlying liability of being left-handed, while the narrow sense heritability was estimated at 12% (95% CI = 7.2% – 17.7%). Further, we show that genetic correlation between left-handedness and ambidexterity is low (rg = 0.26; 95% CI = 0.08 – 0.43), implying that these traits are largely influenced by different genetic mechanisms. In conclusion, our findings suggest that handedness, like many other complex traits, is highly polygenic, and that the genetic variants that predispose to left-handedness may underlie part of the association with some psychiatric disorders that has been observed in multiple observational studies.

Genome Wide Association Study in the New Haven Lexinome Project Identifies GARRE1 as a Novel Gene for Reading Performance

10.1101/2021.01.05.423827 ◽

2021 ◽

Author(s):

Andrew K. Adams ◽

Emily L. Guertin ◽

Dongnhu T. Truong ◽

Elizabeth G. Atkinson ◽

Mellissa M.C. DeMille ◽

...

Keyword(s):

Association Study ◽

Genome Wide Association Study ◽

Reading Performance ◽

Genome Wide Association ◽

Type I ◽

Clinical Effects ◽

New Haven ◽

Chromosome 19 ◽

Genome Wide ◽

Minor Alleles

Abstract: Despite high prevalence and high heritability, few candidate genes have been identified for reading disability. To identify novel genetic variants we performed a genome-wide association study (GWAS) using high-depth whole genome sequencing and predicated on reading performance in 407 subjects enrolled in a longitudinal study of response-to-intervention, called the New Haven Lexinome Project. The primary GWAS identified a single peak of 31 SNPs on chromosome 19 that achieved the threshold for genome-wide significance (rs2599553 P = 3.13 × 10⁻⁸) located over an expression quantitative trait locus (eQTL) for GARRE1 (Granule Associated Rac And RHOG Effector 1). Little is known about the function of GARRE1, except that it is highly and developmentally expressed in human cerebellum relative to cortex. Local ancestry regression showed the strongest association for the lead variant in African or Admixed American populations, who have been under-represented in previous genetic studies of reading. We replicated our chromosome 19 results in the Genes, Reading, and Dyslexia (GRaD) cohort and found a moderating effect of age with implications for the consideration of developmental effects in the design of future analyses. Growth curve modeling demonstrated that minor alleles of the lead SNP are related to reading longitudinally from Grade 1 to Grade 5, and that children with at least 1 minor allele of rs2599553 persistently underperformed relative to their peers by 0.33 to 0.5 standard deviations on standardized assessments of non-word decoding and reading fluency.

Significance Statement: To the best of our knowledge, this work represents the only GWAS predicated on longitudinal reading performance data. Starting with initial discovery, we replicate our association in a second cohort, address common causes of type I error, localize the signal to a single gene, implicate a region of the brain most likely to be affected by variation in our candidate, show a gene-by-age effect with implications for study design in this field, and demonstrate that minor alleles of our lead SNP are associated with significant and persistent clinical effects on reading development in children.

The harmonic mean p-value for combining dependent tests

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1814092116 ◽

2019 ◽

Vol 116 (4) ◽

pp. 1195-1200 ◽

Cited By ~ 43

Author(s):

Daniel J. Wilson

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Scientific Discovery ◽

Association Studies ◽

Harmonic Mean ◽

P Value ◽

Genome Wide Association Studies ◽

Familywise Error Rate ◽

Significance Threshold ◽

Genome Wide

Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human–pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini–Hochberg procedure to detect significant hypotheses, although the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
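
As a rough illustration of the quantity named in this abstract, the weighted harmonic mean of a set of p-values can be computed as below. This is a minimal sketch under stated assumptions: the function name `harmonic_mean_p` and the default of equal weights are choices made for the example, and the calibration the paper uses to turn the HMP into a test with FWER control is not reproduced here, so comparing the returned value directly with a significance threshold should be treated as approximate at best.

```python
def harmonic_mean_p(pvalues, weights=None):
    """Weighted harmonic mean of p-values: sum(w) / sum(w / p).

    pvalues: sequence of p-values, each in (0, 1].
    weights: optional non-negative weights (equal weights if omitted);
             they are normalized implicitly by the sum in the numerator.
    """
    pvalues = list(pvalues)
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * len(pvalues)
    weights = list(weights)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    if any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
        raise ValueError("weights must be non-negative with a positive sum")
    return sum(weights) / sum(w / p for w, p in zip(weights, pvalues))
```

Because small p-values dominate the sum of reciprocals, the harmonic mean is pulled towards the strongest signals in a group, which is the intuition behind using it to combine individually nonsignificant tests.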

Bayesian Two-Stage Adaptive Design in Bioequivalence

The International Journal of Biostatistics ◽

10.1515/ijb-2018-0105 ◽

2019 ◽

Vol 16 (1) ◽

Cited By ~ 2

Author(s):

Shengjie Liu ◽

Jun Gao ◽

Yuling Zheng ◽

Lei Huang ◽

Fangrong Yan

Keyword(s):

Statistical Power ◽

Adaptive Design ◽

Type I Error ◽

Probability Model ◽

Type I ◽

Two Stage ◽

Stage Design ◽

Estimation Strategy ◽

Drug Products ◽

Two Stage Design

Abstract: Bioequivalence (BE) studies are an integral component of the new drug development process, and play an important role in the approval and marketing of generic drug products. However, existing design and evaluation methods are mostly built on frequentist theory, while few implement Bayesian ideas. Based on the bioequivalence predictive probability model and a sample re-estimation strategy, we propose a new Bayesian two-stage adaptive design and explore its application in bioequivalence testing. The new design differs from existing two-stage designs (such as Potvin’s methods B and C) in the following aspects. First, it not only incorporates historical information and expert information, but further combines experimental data flexibly to aid decision-making. Second, its sample re-estimation strategy is based on the ratio of the information in the interim analysis to the total information, which is simpler to calculate than Potvin’s method. Simulation results showed that the two-stage design can be combined with various stopping boundary functions, and that the results differ between them. Moreover, the proposed method saves sample size compared to Potvin’s method under the conditions that the type I error rate is below 0.05 and statistical power reaches 80%.
