TVAR: Assessing Tissue-specific Functional Effects of Non-coding Variants with Deep Learning

Author(s):  
Hai Yang ◽  
Rui Chen ◽  
Quan Wang ◽  
Qiang Wei ◽  
Ying Ji ◽  
...  

Abstract Analysis of whole-genome sequencing (WGS) data for the genetics of disease is still a challenge due to the lack of accurate functional annotation of noncoding variants, especially the rare ones. As eQTLs have been extensively implicated in the genetics of human diseases, we hypothesize that noncoding rare variants discovered in WGS play a regulatory role in predisposing disease risk. With thousands of tissue- and cell type-specific epigenomic features, we propose TVAR, a multi-label learning-based deep neural network that predicts the functionality of noncoding variants in the genome based on eQTLs across 49 human tissues in GTEx. TVAR learns the relationships between high-dimensional epigenomics and eQTLs across tissues, taking the correlation among tissues into account to learn shared and tissue-specific eQTL effects. As a result, TVAR outputs tissue-specific annotations, with an average of 0.77 across these tissues. We evaluate TVAR’s performance on four complex diseases (coronary artery disease, breast cancer, type 2 diabetes, and schizophrenia), using TVAR’s tissue-specific annotations, and observe its superior performance in predicting functional variants for both common and rare variants, compared to five existing state-of-the-art tools. We further evaluate TVAR’s G-score, a scoring scheme across all tissues, on ClinVar, fine-mapped GWAS loci, and massively parallel reporter assay (MPRA)-validated variants, and observe consistently better performance of TVAR compared to other competing tools.
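The multi-label setup described in this abstract (one shared representation feeding one output per tissue) can be illustrated with a toy forward pass. This is a minimal sketch and not TVAR's actual architecture: the single hidden layer, the function names `forward` and `multilabel_bce`, and all weight shapes are assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(features, w_shared, w_tissue):
    """Shared hidden layer followed by one sigmoid output per tissue.

    features: list of floats (epigenomic features for one variant)
    w_shared: hidden_units x n_features weight matrix (hypothetical)
    w_tissue: n_tissues x hidden_units weight matrix (hypothetical)
    """
    # Shared representation used by every tissue: ReLU(w_shared @ features)
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features))) for row in w_shared]
    # One independent probability per tissue (multi-label, so no softmax)
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_tissue]

def multilabel_bce(probs, labels):
    """Mean binary cross-entropy over tissues for one variant."""
    eps = 1e-12  # guards against log(0)
    total = sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    )
    return -total / len(labels)
```

Because every tissue output reads the same hidden layer, a training signal from one tissue updates features that the others also use, which is one simple way a model can capture shared as well as tissue-specific effects.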

2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S221-S221
Author(s):  
Luke C Pilling ◽  
Luigi Ferrucci ◽  
David Melzer

Abstract Thousands of loci across the genome have been identified for specific diseases in genome-wide association studies (GWAS), yet very few are associated with lifespan itself. We hypothesized that specific biological pathways transcend individual diseases and affect health and lifespan more broadly. Using the published results for the most recent GWAS for 10 key age-related diseases (including coronary artery disease, type-2 diabetes, and several cancers) we identified 22 loci with a strong genetic association with at least three of the diseases. These multi-trait aging loci include known genes affecting multiple diverse health end points, such as CDKN2A/B (9p21.3) and APOE. There are also novel multi-trait genes including SH2B3 and CASC8, likely involved in hallmark pathways of aging biology, including telomere shortening and inflammation. Several of these loci involve trade-offs between chronic disease risk and cancer.
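The selection step described here (keeping loci associated with at least three of the diseases) amounts to counting distinct diseases per locus. The following is a minimal sketch under assumptions: the input format, the function name `multi_trait_loci`, and the placeholder locus and disease labels used with it are hypothetical, and the abstract does not state the significance threshold that defines a "strong" association.

```python
from collections import defaultdict

def multi_trait_loci(hits, min_traits=3):
    """Return loci associated with at least `min_traits` distinct diseases.

    hits: iterable of (locus, disease) pairs, one per association that
          already passed whatever significance threshold the study used.
    """
    diseases_by_locus = defaultdict(set)
    for locus, disease in hits:
        diseases_by_locus[locus].add(disease)  # a set ignores duplicate hits
    return sorted(
        locus for locus, diseases in diseases_by_locus.items()
        if len(diseases) >= min_traits
    )
```

Calling `multi_trait_loci` on a list such as `[("locus_A", "disease_1"), ("locus_A", "disease_2"), ("locus_A", "disease_3"), ("locus_B", "disease_1")]` keeps only `locus_A`.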


2011 ◽  
Vol 57 (2) ◽  
pp. 241-254 ◽  
Author(s):  
Emma Ahlqvist ◽  
Tarunveer Singh Ahluwalia ◽  
Leif Groop

BACKGROUND Type 2 diabetes (T2D) is a complex disorder that is affected by multiple genetic and environmental factors. Extensive efforts have been made to identify the disease-affecting genes to better understand the disease pathogenesis, find new targets for clinical therapy, and allow prediction of disease. CONTENT Our knowledge about the genes involved in disease pathogenesis has increased substantially in recent years, thanks to genomewide association studies and international collaborations joining efforts to collect the huge numbers of individuals needed to study complex diseases on a population level. We have summarized what we have learned so far about the genes that affect T2D risk and their functions. Although more than 40 loci associated with T2D or glycemic traits have been reported and reproduced, only a minor part of the genetic component of the disease has been explained, and the causative variants and affected genes are unknown for many of the loci. SUMMARY Great advances have recently occurred in our understanding of the genetics of T2D, but much remains to be learned about the disease etiology. The genetics of T2D has so far been driven by technology, and we now hope that next-generation sequencing will provide important information on rare variants with stronger effects. Even when variants are known, however, great effort will be required to discover how they affect disease risk.


2015 ◽  
Vol 242 (1) ◽  
pp. 334-339 ◽  
Author(s):  
Sabrina Prudente ◽  
Diego Bailetti ◽  
Christine Mendonca ◽  
Gaia Chiara Mannino ◽  
Andrea Fontana ◽  
...  

2021 ◽  
Author(s):  
Aleksejs Sazonovs ◽  
Christine R Stevens ◽  
Guhan R Venkataraman ◽  
Kai Yuan ◽  
Brandon Avila ◽  
...  

Genome-wide association studies (GWAS) have identified hundreds of loci associated with Crohn's disease (CD); however, as with all complex diseases, deriving pathogenic mechanisms from these non-coding GWAS discoveries has been challenging. To complement GWAS and better define actionable biological targets, we analysed sequence data from more than 30,000 CD cases and 80,000 population controls. We observe rare coding variants in established CD susceptibility genes as well as ten genes where coding variation directly implicates the gene in disease risk for the first time.


2020 ◽  
Author(s):  
Ricky Lali ◽  
Michael Chong ◽  
Arghavan Omidi ◽  
Pedrum Mohammadi-Shemirani ◽  
Ann Le ◽  
...  

ABSTRACT Rare variants are collectively numerous and may underlie a considerable proportion of complex disease risk. However, identifying genuine rare variant associations is challenging due to small effect sizes, the presence of technical artefacts, and heterogeneity in population structure. We hypothesized that rare variant burden over a large number of genes can be combined into a predictive rare variant genetic risk score (RVGRS). We propose a novel method (RV-EXCALIBER) that leverages summary-level data from a large public exome sequencing database (gnomAD) as controls and robustly calibrates rare variant burden to account for the aforementioned biases. An RVGRS was found to strongly associate with coronary artery disease (CAD) in European and South Asian populations. The calibrated RVGRS captures the aggregate effect of rare variants through a polygenic model of inheritance, identifies 1.5% of the population with substantial risk of early CAD, and confers risk even when adjusting for known Mendelian CAD genes, clinical risk factors, and common variant gene scores.


2016 ◽  
Author(s):  
Xin Li ◽  
Yungil Kim ◽  
Emily K. Tsang ◽  
Joe R. Davis ◽  
Farhan N. Damani ◽  
...  

Abstract Rare genetic variants are abundant in humans yet their functional effects are often unknown and challenging to predict. The Genotype-Tissue Expression (GTEx) project provides a unique opportunity to identify the functional impact of rare variants through combined analyses of whole genomes and multi-tissue RNA-sequencing data. Here, we identify gene expression outliers, or individuals with extreme expression levels, across 44 human tissues, and characterize the contribution of rare variation to these large changes in expression. We find 58% of underexpression and 28% of overexpression outliers have underlying rare variants compared with 9% of non-outliers. Large expression effects are enriched for proximal loss-of-function, splicing, and structural variants, particularly variants near the TSS and at evolutionarily conserved sites. Known disease genes have expression outliers, underscoring that rare variants can contribute to genetic disease risk. To prioritize functional rare regulatory variants, we develop RIVER, a Bayesian approach that integrates RNA and whole genome sequencing data from the same individual. RIVER predicts functional variants significantly better than models using genomic annotations alone, and is an extensible tool for personal genome interpretation. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues with potential health consequences, and provide an integrative method for interpreting rare variants in individual genomes.
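The notion of an expression outlier (an individual with an extreme expression level for a gene) can be sketched with a per-gene z-score. This is only an illustration: the abstract does not give the outlier definition, so the z-score approach, the cutoff of 2.0, and the function name `expression_outliers` are assumptions.

```python
import statistics

def expression_outliers(expression, threshold=2.0):
    """Flag individuals whose expression of one gene is extreme.

    expression: dict mapping individual ID -> expression value for one gene.
    threshold: |z| cutoff (an assumed value, not taken from the abstract).
    Returns a dict of individual ID -> "under" or "over".
    """
    values = list(expression.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, needs >= 2 values
    if sd == 0:
        return {}  # no variation, so nobody can be an outlier
    outliers = {}
    for individual, value in expression.items():
        z = (value - mean) / sd
        if z <= -threshold:
            outliers[individual] = "under"
        elif z >= threshold:
            outliers[individual] = "over"
    return outliers
```

A real multi-tissue analysis would also need to normalize expression and combine evidence across tissues, which this single-gene sketch leaves out.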


2009 ◽  
Vol 10 (1) ◽  
pp. 29-34 ◽  
Author(s):  
Samantha Manfredi ◽  
Debora Calvi ◽  
Martina del Fiandra ◽  
Nicoletta Botto ◽  
Andrea Biagini ◽  
...  

Author(s):  
Pyry Helkkula ◽  
Tuomo Kiiskinen ◽  
Aki S. Havulinna ◽  
Juha Karjalainen ◽  
Seppo Koskinen ◽  
...  

Abstract Protein-truncating variants (PTVs) affecting dyslipidemia risk may point to therapeutic targets for cardiometabolic disease. Our objective was to identify PTVs that associated with both lipid levels and cardiometabolic disease risk and assess their possible associations with risks of other diseases. To achieve this aim, we leveraged the enrichment of PTVs in the Finnish population and tested the association of low-frequency PTVs in 1,209 genes with serum lipid levels in the Finrisk Study (n = 23,435). We then tested which of the lipid-associated PTVs also associated with risks of cardiometabolic diseases or 2,264 disease endpoints curated in the FinnGen Study (n = 176,899). Three PTVs were associated with both lipid levels and the risk of cardiometabolic disease: triglyceride-lowering variants in ANGPTL8 (−24.0 [−30.4 to −16.9] mg/dL per rs760351239-T allele, P = 3.4 × 10⁻⁹) and ANGPTL4 (−14.4 [−18.6 to −9.8] mg/dL per rs746226153-G allele, P = 4.3 × 10⁻⁹) and the HDL cholesterol-elevating variant in LIPG (10.2 [7.5 to 13.0] mg/dL per rs200435657-A allele, P = 5.0 × 10⁻¹³). The risk of type 2 diabetes was lower in carriers of ANGPTL8 (odds ratio [OR] = 0.67 [0.47-0.92], P = 0.01), ANGPTL4 (OR = 0.70 [0.60-0.82], P = 1.4 × 10⁻⁵) and LIPG (OR = 0.67 [0.48-0.91], P = 0.01) PTVs than in noncarriers. Moreover, the odds of coronary artery disease were 44% lower in carriers of a PTV in ANGPTL8 (OR = 0.56 [0.38-0.83], P = 0.004). Finally, the phenome-wide scan of the ANGPTL8 PTV showed a markedly higher associated risk of esophagitis (585 cases, OR = 174.3 [17.7-1715.1], P = 9.7 × 10⁻⁶) and sensorineural hearing loss (12,250 cases, OR = 2.45 [1.63-3.68], P = 1.8 × 10⁻⁵). The ANGPTL8 PTV carriers were less likely to use statin therapy (53,518 cases, OR = 0.53 [0.41-0.71], P = 1.2 × 10⁻⁵). Our findings provide genetic evidence of potential long-term efficacy and safety of therapeutic targeting of dyslipidemias.
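The statement that the odds of coronary artery disease were 44% lower follows directly from the reported odds ratio of 0.56, since the percent change in odds is (OR − 1) × 100. A small helper makes the conversion explicit; the function name `percent_change_in_odds` is hypothetical and introduced only for illustration.

```python
def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percent change in odds
    relative to the reference group (here, noncarriers)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return (odds_ratio - 1.0) * 100.0
```

Applied to the OR of 0.56 this gives a change of about −44%, matching the abstract; the same conversion applied to the type 2 diabetes OR of 0.67 gives about −33%.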


Author(s):  
David Curtis

Abstract Weighted burden analysis has been used in exome-sequenced case-control studies to identify genes in which there is an excess of rare and/or functional variants associated with phenotype. Implementation in a ridge regression framework allows simultaneous analysis of all variants along with relevant covariates such as population principal components. In order to apply the approach to a quantitative phenotype, a weighted burden score is derived for each subject and included in a linear regression analysis. The weighting scheme is adjusted in order to apply differential weights to rare and very rare variants, and a score is derived based on both the frequency and predicted effect of each variant. When applied to an ethnically heterogeneous dataset consisting of 49,790 exome-sequenced UK Biobank subjects and using BMI as the phenotype, the method produces a very inflated test statistic. However, this is almost completely corrected by including 20 population principal components as covariates. When this is done, the top 30 genes include a few which are quite plausibly associated with the phenotype, including LYPLAL1 and NSDHL. This approach offers a way to carry out gene-based analyses of rare variants identified by exome sequencing in heterogeneous datasets without requiring that data from ethnic minority subjects be discarded. This research has been conducted using the UK Biobank Resource.
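A per-subject weighted burden score of the kind described here, combining allele frequency and predicted effect, can be sketched as follows. The abstract does not give the weighting function, so the linear frequency weighting, the 0.01 rarity cutoff, the factor of 10 for very rare variants, and the names `variant_weight` and `burden_score` are all assumptions for illustration.

```python
def variant_weight(allele_freq, effect_weight, rare_cutoff=0.01, very_rare_factor=10.0):
    """Hypothetical weighting: rarer variants and more damaging predicted
    effects get larger weights. The functional form is an assumption."""
    if not 0.0 < allele_freq <= rare_cutoff:
        return 0.0  # ignore variants that are absent or not rare
    # Interpolate linearly from very_rare_factor (freq near 0) down to 1 (freq == cutoff)
    freq_weight = very_rare_factor - (very_rare_factor - 1.0) * (allele_freq / rare_cutoff)
    return freq_weight * effect_weight

def burden_score(genotypes, variants):
    """Sum of weight * alt-allele count over one subject's variants in a gene.

    genotypes: list of alt-allele counts (0, 1, or 2), one per variant
    variants: list of (allele_freq, effect_weight) tuples in the same order
    """
    return sum(g * variant_weight(f, e) for g, (f, e) in zip(genotypes, variants))
```

In the analysis the abstract describes, a score like this would then enter a linear regression on the phenotype alongside covariates such as the population principal components; that regression step is not shown here.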

