Joint genetic analysis using variant sets reveals polygenic gene-context interactions

2016
Author(s): Francesco Paolo Casale, Danilo Horta, Barbara Rakitsch, Oliver Stegle

Abstract
Joint genetic models for multiple traits have helped to enhance association analyses. Most existing multi-trait models have been designed to increase power for detecting associations, whereas the analysis of interactions has received considerably less attention. Here, we propose iSet, a method based on linear mixed models to test for interactions between sets of variants and environmental states or other contexts. Our model generalizes previous interaction tests and, in particular, provides a test for local differences in the genetic architecture between contexts. We first validate iSet using simulations and then apply the model to the analysis of genotype-environment interactions in an eQTL study. Our model retrieves a larger number of interactions than alternative methods and reveals that up to 20% of cases show context-specific configurations of causal variants. Finally, we apply iSet to test for subgroup-specific genetic effects on blood lipid levels in a large human cohort, where we identify a gene-sex interaction for C-reactive protein that is missed by alternative methods.

Author summary
Genetic effects on phenotypes can depend on external contexts, including the environment. Statistical tests for identifying such interactions are important for understanding how individual genetic variants act in different contexts. Interaction effects can be studied either by measuring a given phenotype in different contexts on the same genetic backgrounds, or by stratifying a population into subgroups. Here, we derive a method based on linear mixed models that can be applied to both of these designs. iSet enables testing for interactions between context and sets of variants, and accounts for polygenic effects. We validate our model using simulations before applying it to the genetic analysis of gene expression and to genome-wide association studies of human blood lipid levels. We find that modeling interactions with variant sets offers increased power, uncovering interactions that cannot be detected by alternative methods.
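To make the idea of a polygenic set-by-context effect concrete, the sketch below builds the kind of Kronecker-structured covariance used by multi-trait set tests. It assumes, as a simplification not spelled out in the abstract, that the variant set's contribution to vec(Y) (n individuals × 2 contexts) has covariance C ⊗ R, where R = XXᵀ/m is the genetic similarity computed from the set's m variants and C is a 2 × 2 context covariance. A C proportional to the all-ones matrix corresponds to an identical effect in both contexts, a general rank-one C to the same causal variants acting with different magnitudes, and a full-rank C to context-specific configurations of causal variants. The function names and the `scales` and `rho` values are hypothetical; this is not the iSet implementation, and it only constructs covariances without fitting or testing anything.

```python
import numpy as np


def set_kernel(X_set):
    """Genetic similarity from one variant set: R = X X^T / m (X centered, n x m)."""
    return X_set @ X_set.T / X_set.shape[1]


def context_covariances(scales=(1.0, 0.5), rho=0.3):
    """Hypothetical 2 x 2 context covariances C for three kinds of set effect."""
    ones = np.ones((2, 1))
    a = np.asarray(scales, dtype=float).reshape(2, 1)
    persistent = ones @ ones.T                          # identical effect in both contexts
    rescaling = a @ a.T                                 # same causal variants, different magnitudes
    heterogeneity = np.array([[1.0, rho], [rho, 1.0]])  # only partly shared causal configuration
    return {"persistent": persistent, "rescaling": rescaling, "heterogeneity": heterogeneity}


def vec_covariance(C, R):
    """Covariance of vec(Y) for Y (n x contexts), individuals stacked within each context."""
    return np.kron(C, R)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.binomial(2, 0.3, size=(50, 20)).astype(float)
    X -= X.mean(axis=0)  # center each variant
    R = set_kernel(X)
    for name, C in context_covariances().items():
        K = vec_covariance(C, R)
        print(name, "rank(C) =", np.linalg.matrix_rank(C), "cov shape:", K.shape)
```

Turning these nested structures into tests would mean comparing model fits (for example with likelihood-ratio tests); that step is left out of the sketch.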

2017
Author(s): Carl Kadie, David Heckerman

Abstract
We have developed Ludicrous Speed Linear Mixed Models, a version of FaST-LMM optimized for the cloud. The approach can perform a genome-wide association analysis of one million SNPs across one million individuals at a cost of about 868 CPU days, with an elapsed time on the order of two weeks. A Python implementation is available at https://fastlmm.github.io/.

Significance
Identifying SNP-phenotype correlations using GWAS is difficult because effect sizes are so small for common, complex diseases. To address this issue, institutions are creating extremely large cohorts with sample sizes on the order of one million. Unfortunately, such cohorts are likely to contain confounding factors such as population structure and family or cryptic relatedness. The linear mixed model (LMM) can often correct for such confounding factors, but at this scale it is too slow to use, even with the algebraic speedups known as FaST-LMM. We present a cloud implementation of FaST-LMM, called Ludicrous Speed LMM, that can process one million samples and one million test SNPs in a reasonable amount of time and at a reasonable cost.
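The algebraic speedups mentioned above rest on rotating the data once by the eigenvectors of the n × n genetic similarity (kinship) matrix, after which each SNP's generalized least squares fit needs only diagonal weights. The sketch below illustrates that rotation in NumPy for a fixed variance ratio delta = σe²/σg², which it assumes has been estimated beforehand (for example once under the null model). It is a simplified illustration, not the fastlmm package API, and it omits the low-rank and distributed-computing machinery that Ludicrous Speed LMM adds.

```python
import numpy as np


def rotate_once(K, y, covariates):
    """Eigendecompose the n x n kinship once (O(n^3)) and rotate phenotype and covariates."""
    s, U = np.linalg.eigh(K)
    return s, U, U.T @ y, U.T @ covariates


def snp_effect(s, U, y_rot, cov_rot, snp, delta):
    """GLS effect estimate for one SNP under V = sigma_g^2 * (K + delta * I).

    After rotation by U, V becomes diagonal with entries s + delta,
    so GLS reduces to weighted least squares with weights 1 / (s + delta).
    """
    x_rot = U.T @ snp                      # one O(n^2) rotation per SNP
    X_rot = np.column_stack([cov_rot, x_rot])
    w = 1.0 / (s + delta)
    XtWX = X_rot.T @ (w[:, None] * X_rot)
    XtWy = X_rot.T @ (w * y_rot)
    beta = np.linalg.solve(XtWX, XtWy)
    return beta[-1]                        # coefficient of the tested SNP


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 40
    G = rng.normal(size=(n, 100))
    K = G @ G.T / 100                      # toy kinship from 100 standardized variants
    y = rng.normal(size=n)
    intercept = np.ones((n, 1))
    s, U, y_rot, c_rot = rotate_once(K, y, intercept)
    snp = rng.binomial(2, 0.4, size=n).astype(float)
    print(snp_effect(s, U, y_rot, c_rot, snp, delta=0.5))
```

The cubic eigendecomposition is paid once; each additional SNP then costs one quadratic rotation plus a small weighted solve, which suggests why even this exact approach becomes expensive at a million individuals.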


2012, Vol 9 (6), pp. 525-526
Author(s): Jennifer Listgarten, Christoph Lippert, Carl M Kadie, Robert I Davidson, Eleazar Eskin, ...

2021, Vol 12
Author(s): Zheng Ning, Yakov A. Tsepilov, Sodbo Zh. Sharapov, Zhipeng Wang, Alexander K. Grishenko, ...

The ever-growing number of genome-wide association studies (GWAS) has revealed widespread pleiotropy. To exploit this, various methods that jointly consider the associations of a genetic variant with multiple traits have been developed. Most efforts have focused on improving GWAS discovery power. However, how to replicate the discovered pleiotropic loci has yet to be discussed thoroughly. Unlike single-trait replication, multi-trait replication is not trivial, given the underlying genotype-multi-phenotype map of the associations. Here, we evaluate four methods for replicating multi-trait associations, corresponding to four levels of replication strength. Weak replication cannot justify pleiotropic genetic effects, whereas strong replication using our correlation methods can indicate consistent pleiotropic genetic effects across the discovery and replication samples. We provide a protocol for replicating multi-trait genetic associations in practice. The described methods are implemented in the free and open-source R package MultiABEL.
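The abstract does not say how its correlation methods are computed, so the sketch below only illustrates the underlying intuition: if a variant's pleiotropic effects replicate, its vector of per-trait effect estimates in the replication sample should correlate with the vector from the discovery sample. It uses a plain Pearson correlation on hypothetical effect estimates; it is not the MultiABEL method or API, and with only a handful of traits such a correlation is very noisy.

```python
import math


def pearson(u, v):
    """Plain Pearson correlation of two equal-length sequences."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [x - mu for x in u]
    dv = [y - mv for y in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    if den == 0:
        raise ValueError("correlation undefined for constant input")
    return num / den


def effect_pattern_correlation(beta_discovery, beta_replication):
    """Correlate one variant's per-trait effects between discovery and replication."""
    return pearson(beta_discovery, beta_replication)


if __name__ == "__main__":
    # Hypothetical per-trait effect estimates for a single variant across four traits.
    discovery = [0.10, -0.05, 0.02, 0.08]
    replication = [0.09, -0.04, 0.01, 0.07]
    print(round(effect_pattern_correlation(discovery, replication), 3))
```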


2020
Author(s): Eiji Yamamoto, Hiroshi Matsunaga

ABSTRACT
Genotype-by-environment interactions (GEIs) are important not only for a precise understanding of genotype–phenotype relationships but also for improving the environmental adaptability of crops. Although many formulae have been proposed to model GEI effects, their efficacy in genome-wide association studies (GWASs) has not been comprehensively compared. Therefore, the advantages and disadvantages of the formulae are not well recognized. In this study, linear mixed models (LMMs) consisting of various combinations of foreground fixed genetic and background random genetic effect terms were constructed. Next, the power to detect quantitative trait loci (QTLs) with GEI effects was compared across the LMMs using simulations. The fixed genetic effect terms of the genotype main effects and GEI (GGE) model were preferred over those based on the additive main effects and multiplicative interaction (AMMI) model, because the latter showed p-value inflation whereas the former yielded the theoretically expected p-value distribution. With regard to the background random genetic effects, including genotype-by-trial interaction effects is recommended to achieve high power and robustness when phenotype data are obtained from multiple environments and multiple trials. The recommended form of LMM was applied to GWASs of real agronomic trait data from tomato F1 varieties (Solanum lycopersicum L.) obtained in two different cropping seasons. The GWASs detected QTLs with not only persistent effects across the cropping seasons but also cropping season-specific effects. Thus, applying a GWAS strategy to phenotypic data from multiple environments and trials allowed the detection of more QTLs with GEI effects.
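A stripped-down version of GEI mapping is a nested-model comparison: does adding a marker × environment term to a model that already has environment and marker main effects explain significantly more variance? That fixed-effect structure is in the spirit of the GGE terms described above. The sketch below computes the corresponding F statistic with ordinary least squares in NumPy. It deliberately omits the background random genetic effects and genotype-by-trial terms the study recommends, assumes a 0/1 environment indicator (for example two cropping seasons), and returns only the statistic, not a p-value; it is not the authors' LMM.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an ordinary least squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def gei_f_test(y, marker, env):
    """F statistic for a marker x environment term (env is a 0/1 indicator)."""
    n = len(y)
    ones = np.ones(n)
    X0 = np.column_stack([ones, env, marker])                # main effects only
    X1 = np.column_stack([ones, env, marker, marker * env])  # adds the GEI term
    rss0, rss1 = rss(X0, y), rss(X1, y)
    df1 = X1.shape[1] - X0.shape[1]
    df2 = n - X1.shape[1]
    return ((rss0 - rss1) / df1) / (rss1 / df2)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 400
    env = rng.integers(0, 2, size=n).astype(float)          # two hypothetical seasons
    marker = rng.binomial(2, 0.4, size=n).astype(float)     # allele counts 0/1/2
    y = 0.5 * env + 0.3 * marker + 3.0 * marker * env + rng.normal(size=n)
    print(gei_f_test(y, marker, env))
```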


2011, Vol 8 (10), pp. 833-835
Author(s): Christoph Lippert, Jennifer Listgarten, Ying Liu, Carl M Kadie, Robert I Davidson, ...

2014, Vol 4 (1)
Author(s): Christian Widmer, Christoph Lippert, Omer Weissbrod, Nicolo Fusi, Carl Kadie, ...

2019, Vol 17 (06), pp. 1940012
Author(s): Yuan Liu, Yongchao Ma, Evan Salsman, Frank A. Manthey, Elias M. Elias, ...

Mapping short reads to a reference genome is an essential step in many next-generation sequencing (NGS) analyses. In plants with large genomes, a large fraction of the reads can align to multiple locations in the genome with equally good alignment scores. How to map these ambiguous reads is a challenging problem with a large impact on downstream analysis. Traditionally, the default method is to assign an ambiguous read randomly to one of its many potential locations. In this study, we explore two alternative methods based on the hypothesis that the probability that an ambiguous read was generated by a location is proportional to the total number of reads produced by that location: (1) the enrichment method, which assigns an ambiguous read to the potential location that has produced the most reads, and (2) the probability method, which assigns an ambiguous read to a location with probability proportional to the number of reads that location produces. We systematically compared the performance of the proposed methods with that of the default random method. Our results showed that the enrichment method outperformed both the default random method and the probability method in the discovery of single nucleotide polymorphisms (SNPs). Not only did it produce more SNP markers, but it also produced SNP markers of better quality, as demonstrated using multiple mainstay genomic analyses, including genome-wide association studies (GWAS), minor allele distribution, population structure, and genomic prediction.
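The three assignment rules can be written in a few lines. The sketch below assumes, since the abstract does not specify it, that each candidate location's read count has already been tallied (for example from uniquely mapped reads); the location names, the count dictionary, and the tie-breaking and zero-count fallback choices are hypothetical, not the authors' pipeline.

```python
import random


def assign_random(locations, rng):
    """Default method: pick one candidate location uniformly at random."""
    return rng.choice(locations)


def assign_enrichment(locations, counts):
    """Enrichment method: pick the location with the most reads.

    Ties are broken by list order (max returns the first maximum); this is an assumption.
    """
    return max(locations, key=lambda loc: counts.get(loc, 0))


def assign_probability(locations, counts, rng):
    """Probability method: pick a location with probability proportional to its read count."""
    weights = [counts.get(loc, 0) for loc in locations]
    if sum(weights) == 0:
        return rng.choice(locations)  # assumed fallback when no candidate has reads
    return rng.choices(locations, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = {"locA": 30, "locB": 5, "locC": 0}  # hypothetical per-location read counts
    candidates = ["locA", "locB", "locC"]
    print(assign_random(candidates, rng))
    print(assign_enrichment(candidates, counts))
    print(assign_probability(candidates, counts, rng))
```

The enrichment rule is deterministic, so every ambiguous read from the same set of candidates lands at the same location, whereas the probability rule spreads reads in proportion to the counts.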


2021, Vol 22 (1)
Author(s): Camilo Broc, Therese Truong, Benoit Liquet

Abstract
Background
The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes groups of variables into account, together with a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signals at both the variable level and the group level.
Results
Our method has the advantage of proposing a global, readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method with other benchmark methods on simulated data and applied it to real data, with the aim of highlighting common susceptibility variants for breast and thyroid cancers.
Conclusion
The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited to data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and sets of observations. Furthermore, although the method has been applied to a genetic study, its formulation is suitable for data with a high number of variables and an explicit a priori group architecture in other fields of application.
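Group-level selection in sgPLS-type methods comes from a sparse-group penalty on the loading vectors: a Lasso term zeroes individual variables (SNPs), while a group term can zero out a whole gene or pathway. The sketch below shows the standard proximal (soft-thresholding) operator for a penalty of the form lam1·‖v‖₁ + lam2·Σ_g √p_g·‖v_g‖₂, assuming this common sparse-group lasso form. It is not the joint-sgPLS code, the parameter names are hypothetical, and the extra penalization linking the independent data sets is not shown.

```python
import math


def soft_threshold(x, lam):
    """Elementwise Lasso shrinkage: sign(x) * max(|x| - lam, 0)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def sparse_group_prox(v, groups, lam1, lam2):
    """Proximal operator of lam1*||v||_1 + lam2*sum_g sqrt(p_g)*||v_g||_2.

    groups is a list of index lists that partition v (e.g. SNPs grouped by gene).
    """
    out = [0.0] * len(v)
    for idx in groups:
        shrunk = [soft_threshold(v[i], lam1) for i in idx]   # variable-level sparsity
        norm = math.sqrt(sum(s * s for s in shrunk))
        thresh = lam2 * math.sqrt(len(idx))                   # group size weighting
        if norm <= thresh:
            continue                                          # whole group set to zero
        scale = 1.0 - thresh / norm                           # group-level shrinkage
        for i, s in zip(idx, shrunk):
            out[i] = scale * s
    return out


if __name__ == "__main__":
    v = [3.0, -2.0, 0.1, 0.2, 0.05]       # hypothetical loadings for five SNPs
    groups = [[0, 1], [2, 3, 4]]          # two hypothetical genes
    print(sparse_group_prox(v, groups, lam1=0.5, lam2=0.5))
```

With these illustrative values the second group's loadings all fall below the Lasso threshold, so the whole group is removed, while the first group is kept but shrunk.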


2021, Vol 11 (1)
Author(s): Nadav Brandes, Nathan Linial, Michal Linial

Abstract
The characterization of germline genetic variation affecting cancer risk, known as cancer predisposition, is fundamental to preventive and personalized medicine. Studies of genetic cancer predisposition typically identify significant genomic regions based on family-based cohorts or genome-wide association studies (GWAS). However, the results of such studies rarely provide biological insight or functional interpretation. In this study, we conducted a comprehensive analysis of cancer predisposition in the UK Biobank cohort using a new gene-based method that detects functionally interpretable associations with protein-coding genes. Specifically, we conducted proteome-wide association studies (PWAS) to identify genetic associations mediated by alterations to protein function. With PWAS, we identified 110 significant gene-cancer associations in 70 unique genomic regions across nine cancer types and pan-cancer. In 48 of the 110 PWAS associations (44%), estimated gene damage is associated with reduced rather than elevated cancer risk, suggesting a protective effect. Together with standard GWAS, we implicated 145 unique genomic loci in cancer risk. While most of these genomic regions are supported by external evidence, our results also highlight many novel loci. Based on the capacity of PWAS to detect non-additive genetic effects, we found that 46% of the PWAS-significant cancer regions exhibited exclusively recessive inheritance. These results highlight the importance of recessive genetic effects, without relying on familial studies. Finally, we show that many of the detected genes confer substantial cancer risk in the studied cohort, as determined by a quantitative functional description, suggesting their relevance for diagnosis and genetic counseling.
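The recessive finding depends on coding genetic variation non-additively. The abstract does not describe how PWAS scores genes, so the sketch below is only a generic illustration, not the PWAS scoring scheme: it shows the textbook additive, dominant and recessive codings of a single allele count (0, 1 or 2 copies of a damaging allele). Under recessive coding only carriers of two copies differ from everyone else, which is the kind of pattern an additive test can underpower.

```python
def encode(genotype, model):
    """Map an allele count (0, 1 or 2 damaging alleles) to a regression predictor."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    if model == "additive":
        return float(genotype)                   # effect grows with each copy
    if model == "dominant":
        return 1.0 if genotype >= 1 else 0.0     # one copy is enough
    if model == "recessive":
        return 1.0 if genotype == 2 else 0.0     # both copies required
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    for model in ("additive", "dominant", "recessive"):
        print(model, [encode(g, model) for g in (0, 1, 2)])
```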

