Functional Interpretation of Genetic Variants Using Deep Learning Predicts Impact on Epigenome

2018 ◽  
Author(s):  
Gabriel E. Hoffman ◽  
Eric E. Schadt ◽  
Panos Roussos

ABSTRACT Identifying causal variants underlying disease risk and adoption of personalized medicine are currently limited by the challenge of interpreting the functional consequences of genetic variants. Predicting the functional effects of disease-associated protein-coding variants is increasingly routine. Yet the vast majority of risk variants are non-coding, and predicting the functional consequence and prioritizing variants for functional validation remains a major challenge. Here we develop a deep learning model to accurately predict locus-specific signals from four epigenetic assays using only DNA sequence as input. Given the predicted epigenetic signal from DNA sequence for the reference and alternative alleles at a given locus, we generate a score of the predicted epigenetic consequences for 438 million variants. These impact scores are assay-specific, are predictive of allele-specific transcription factor binding and are enriched for variants associated with gene expression and disease risk. Nucleotide-level functional consequence scores for non-coding variants can refine the mechanism of known causal variants, identify novel risk variants and prioritize downstream experiments.

2019 ◽  
Vol 47 (20) ◽  
pp. 10597-10611 ◽  
Author(s):  
Gabriel E Hoffman ◽  
Jaroslav Bendl ◽  
Kiran Girdhar ◽  
Eric E Schadt ◽  
Panos Roussos

Abstract Identifying functional variants underlying disease risk and adoption of personalized medicine are currently limited by the challenge of interpreting the functional consequences of genetic variants. Predicting the functional effects of disease-associated protein-coding variants is increasingly routine. Yet, the vast majority of risk variants are non-coding, and predicting the functional consequence and prioritizing variants for functional validation remains a major challenge. Here, we develop a deep learning model to accurately predict locus-specific signals from four epigenetic assays using only DNA sequence as input. Given the predicted epigenetic signal from DNA sequence for the reference and alternative alleles at a given locus, we generate a score of the predicted epigenetic consequences for 438 million variants observed in previous sequencing projects. These impact scores are assay-specific, are predictive of allele-specific transcription factor binding and are enriched for variants associated with gene expression and disease risk. Nucleotide-level functional consequence scores for non-coding variants can refine the mechanism of known functional variants, identify novel risk variants and prioritize downstream experiments.
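The scoring idea in this abstract, comparing the predicted signal for the reference and alternative alleles, can be sketched as follows. This is a minimal illustration, not the authors' implementation: `variant_impact` and `toy_gc_model` are hypothetical names, the toy model (GC fraction) merely stands in for a trained sequence-to-signal network, and taking a simple difference of the two predictions is an assumption about how the score is formed.

```python
from typing import Callable


def variant_impact(ref_seq: str, pos: int, alt: str,
                   predict: Callable[[str], float]) -> float:
    """Predicted signal for the alternative allele minus the reference allele.

    ref_seq : reference DNA sequence window around the variant
    pos     : 0-based position of the variant inside ref_seq
    alt     : single alternative base
    predict : any function mapping a sequence to a predicted assay signal
    """
    if not 0 <= pos < len(ref_seq):
        raise ValueError("pos lies outside the sequence window")
    # Build the alternative-allele sequence by substituting one base.
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return predict(alt_seq) - predict(ref_seq)


def toy_gc_model(seq: str) -> float:
    """Hypothetical stand-in for a trained model: fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


# Substituting A -> G at position 0 raises the toy "signal".
score = variant_impact("ACGTACGTAA", pos=0, alt="G", predict=toy_gc_model)
print(f"impact score: {score:+.2f}")
```

A positive score means the alternative allele is predicted to increase the signal, and a negative score means it is predicted to decrease it. Any trained model can be passed in through the `predict` argument.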


Science ◽  
2020 ◽  
Vol 369 (6503) ◽  
pp. 561-565 ◽  
Author(s):  
Siwei Zhang ◽  
Hanwen Zhang ◽  
Yifan Zhou ◽  
Min Qiao ◽  
Siming Zhao ◽  
...  

Most neuropsychiatric disease risk variants are in noncoding sequences and lack functional interpretation. Because regulatory sequences often reside in open chromatin, we reasoned that neuropsychiatric disease risk variants may affect chromatin accessibility during neurodevelopment. Using human induced pluripotent stem cell (iPSC)–derived neurons that model developing brains, we identified thousands of genetic variants exhibiting allele-specific open chromatin (ASoC). These neuronal ASoCs were partially driven by altered transcription factor binding, overrepresented in brain gene enhancers and expression quantitative trait loci, and frequently associated with distal genes through chromatin contacts. ASoCs were enriched for genetic variants associated with brain disorders, enabling identification of functional schizophrenia risk variants and their cis-target genes. This study highlights ASoC as a functional mechanism of noncoding neuropsychiatric risk variants, providing a powerful framework for identifying disease causal variants and genes.


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241215
Author(s):  
Seung Mi Lee ◽  
Yoomi Park ◽  
Young Ju Kim ◽  
Han-Sung Hwang ◽  
Heewon Seo ◽  
...  

Introduction Ritodrine is one of the most commonly used tocolytics in preterm labor, acting as a β2-adrenergic agonist that reduces intracellular calcium levels and prevents myometrial activation. Ritodrine infusion can result in serious maternal complications, and pulmonary edema is a particular concern among these. The cause of pulmonary edema following ritodrine treatment is multifactorial; however, the contributing genetic factors remain poorly understood. This study investigates the genetic variants associated with ritodrine-induced pulmonary edema. Methods In this case-control study, 16 patients who developed pulmonary edema during ritodrine infusion [case], and 16 pregnant women who were treated with ritodrine and did not develop pulmonary edema [control] were included. The control pregnant women were selected after matching for plurality and gestational age at the time of tocolytic use. Maternal blood was collected during admission for tocolytic treatment, and whole exome sequencing was performed with the stored blood samples. Results Gene-wise variant burden (GVB) analysis resulted in a total of 71 candidate genes by comparing the cumulative effects of multiple coding variants for 19729 protein-coding genes between the patients with pulmonary edema and the matched controls. Subsequent data analysis selected only the statistically significant and deleterious variants compatible with ritodrine-induced pulmonary edema. Two final candidate variants in CPT2 and ADRA1A were confirmed by Sanger sequencing. Conclusions We identified new potential variants in genes that play a role in cyclic adenosine monophosphate (cAMP)/protein kinase A (PKA) regulation, which supports their putative involvement in the predisposition to ritodrine-induced pulmonary edema in pregnant women.


2017 ◽  
Vol 242 (13) ◽  
pp. 1325-1334 ◽  
Author(s):  
Yizhou Zhu ◽  
Cagdas Tazearslan ◽  
Yousin Suh

Genome-wide association studies have shown that the vast majority of disease-associated variants reside in the non-coding regions of the genome, suggesting that gene regulatory changes contribute to disease risk. Identifying truly causal non-coding variants and their affected target genes remains challenging but is a critical step in translating genetic associations into molecular mechanisms and, ultimately, clinical applications. Here we review genomic/epigenomic resources and in silico tools that can be used to identify causal non-coding variants and experimental strategies to validate their functionalities. Impact statement Most signals from genome-wide association studies (GWASs) map to the non-coding genome, and functional interpretation of these associations remains challenging. We review recent progress in methodologies for studying the non-coding genome and argue that no single approach allows one to effectively identify the causal regulatory variants from GWAS results. By illustrating the advantages and limitations of each method, our review provides a guideline for taking a combinatorial approach to accurately predict, prioritize, and eventually experimentally validate the causal variants.


2021 ◽  
Author(s):  
Thanh Thanh Le Nguyen ◽  
Huanyao Gao ◽  
Duan Liu ◽  
Zhenqing Ye ◽  
Jeong-Heon Lee ◽  
...  

Abstract Understanding the function of non-coding genetic variants represents a formidable challenge for biomedicine. We previously identified genetic variants that influence gene expression only after exposure to a hormone or drug. Using glucocorticoid signaling as a model system, we have now demonstrated, in a genome-wide manner, that exposure to glucocorticoids triggered disease risk variants with previously unclear function to influence the expression of genes involved in autoimmunity, metabolic and mood disorders, osteoporosis and cancer. Integrating a series of genomic and epigenomic assays, we identified the cis-regulatory elements and 3-dimensional interactions underlying the ligand-dependent associations between those genetic variants and distant risk genes. These observations increase our understanding of mechanisms of non-coding genetic variant-chemical environment interactions and advance the fine-mapping of disease risk and pharmacogenomic loci. One Sentence Summary Genomic and epigenomic fine-mapping of ligand-dependent genetic variants unmasks novel disease risk genes.


2018 ◽  
Vol 78 (4) ◽  
pp. 446-453 ◽  
Author(s):  
Yukinori Okada ◽  
Stephen Eyre ◽  
Akari Suzuki ◽  
Yuta Kochi ◽  
Kazuhiko Yamamoto

Study of the genetics of rheumatoid arthritis (RA) began about four decades ago with the discovery of HLA-DRB1. Since the beginning of this century, a number of non-HLA risk loci have been identified through genome-wide association studies (GWAS). We now know that over 100 loci are associated with RA risk. Because genetic information implies a clear causal relationship to the disease, research into the pathogenesis of RA should be promoted. However, only 20% of GWAS loci contain coding variants, with the remaining variants occurring in non-coding regions, and therefore, the majority of causal genes and causal variants remain to be identified. The use of epigenetic studies, high-resolution mapping of open chromatin, chromosomal conformation technologies and other approaches could identify many of the missing links between genetic risk variants and causal genetic components, thus expanding our understanding of RA genetics.


2021 ◽  
Vol 11 (3) ◽  
pp. 332
Author(s):  
Rachel Raybould ◽  
Rebecca Sims

Sporadic Alzheimer’s disease (AD) is a complex genetic disease, and the leading cause of dementia worldwide. Over the past 3 decades, extensive pioneering research has discovered more than 70 common and rare genetic risk variants. These discoveries have contributed massively to our understanding of the pathogenesis of AD but approximately half of the heritability for AD remains unaccounted for. There are regions of the genome that are not assayed by mainstream genotype and sequencing technology. These regions, known as the Dark Genome, often harbour large structural DNA variants that are likely relevant to disease risk. Here, we describe the dark genome and review current technological and bioinformatics advances that will enable researchers to shed light on these hidden regions of the genome. We highlight the potential importance of the hidden genome in complex disease and how these strategies will assist in identifying the missing heritability of AD. Identification of novel protein-coding structural variation that increases risk of AD will open new avenues for translational research and new drug targets that have the potential for clinical benefit to delay or even prevent clinical symptoms of disease.


2020 ◽  
Author(s):  
Nima C. Emami ◽  
Taylor B. Cavazos ◽  
Sara R. Rashkin ◽  
Clinton L. Cario ◽  
Rebecca E. Graff ◽  
...  

ABSTRACT The potential association between rare germline genetic variants and prostate cancer (PrCa) susceptibility has been understudied due to challenges with assessing rare variation. Furthermore, although common risk variants for PrCa have shown limited individual effect sizes, their cumulative effect may be of similar magnitude as high penetrance mutations. To identify rare variants associated with PrCa susceptibility, and better characterize the mechanisms and cumulative disease risk associated with common risk variants, we analyzed large population-based cohorts, custom genotyping microarrays, and imputation reference panels in an integrative study of PrCa genetic etiology. In particular, 11,649 men (6,196 PrCa cases, 5,453 controls) of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health, ProHealth Study, and California Men’s Health Study were genotyped and meta-analyzed with 196,269 European-ancestry male subjects (7,917 PrCa cases, 188,352 controls) from the UK Biobank. Six novel loci were genome-wide significant in our meta-analysis, including two rare variants (minor allele frequency < 0.01, at 3p21.31 and 8p12). Gene-based rare variant tests implicated a previously discovered PrCa gene (HOXB13) as well as a novel candidate (ILDR1) highly expressed in prostate tissue. Haplotypic patterns of long-range linkage disequilibrium were observed for rare genetic variants at HOXB13 and other loci, reflecting their evolutionary history. Furthermore, a polygenic risk score (PRS) of 187 known, largely common PrCa variants was strongly associated with risk in non-Hispanic whites (90th vs. 10th decile OR = 7.66, P = 1.80 × 10⁻²³⁹). Many of the 187 variants exhibited functional signatures of gene expression regulation or transcription factor binding, including a six-fold difference in log-probability of Androgen Receptor binding at the variant rs2680708 (17q22).
Our finding of two novel rare variants associated with PrCa should motivate further consideration of the role of low frequency polymorphisms in PrCa, while the considerable effect of PrCa PRS profiles should prompt discussion of their role in clinical practice.
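For readers unfamiliar with polygenic risk scores, the conventional construction is a weighted sum of risk-allele counts. The sketch below illustrates that general technique only. The abstract does not state how the 187-variant PRS was weighted, so the use of log odds ratios as weights is an assumption, and the function name, weights and genotype are invented for illustration.

```python
import math
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float],
                         log_odds_ratios: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(log_odds_ratios):
        raise ValueError("need exactly one weight per variant")
    return sum(d * w for d, w in zip(dosages, log_odds_ratios))


# Made-up per-allele odds ratios for three hypothetical variants.
weights = [math.log(1.10), math.log(1.25), math.log(0.90)]

# One hypothetical individual: two, one and zero copies of each risk allele.
person = [2, 1, 0]

prs = polygenic_risk_score(person, weights)
print(f"PRS (log-odds scale): {prs:.4f}")
print(f"relative odds vs. zero-dosage baseline: {math.exp(prs):.4f}")
```

In practice, scores like this are computed for every individual in a cohort and then binned, for example into deciles, before comparing risk between bins.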


2021 ◽  
Author(s):  
Lambert Moyon ◽  
Camille Berthelot ◽  
Alexandra Louis ◽  
Nga Thi Thuy Nguyen ◽  
Hugues Roest Crollius

Whole genome sequencing (WGS) is increasingly used to diagnose medical conditions of genetic origin. While both coding and non-coding DNA variants contribute to a wide range of diseases, most patients who receive a WGS-based diagnosis today harbour a protein-coding mutation. Functional interpretation and prioritization of non-coding variants represents a persistent challenge, and disease-causing non-coding variants remain largely unidentified. Depending on the disease, WGS fails to identify a candidate variant in 20-80% of patients, severely limiting the usefulness of sequencing for personalised medicine. Here we present FINSURF, a machine-learning approach to predict the functional impact of non-coding variants in regulatory regions. FINSURF outperforms state-of-the-art methods, owing to control optimisation during training. In addition to ranking candidate variants, FINSURF also delivers diagnostic information on functional consequences of mutations. We applied FINSURF to a diverse set of 30 diseases with described causative non-coding mutations, and correctly identified the disease-causative non-coding variant within the ten top hits in 22 cases. FINSURF is implemented as an online server as well as custom browser tracks, and provides a quick and efficient solution to prioritize candidate non-coding variants in realistic clinical settings.


2017 ◽  
Author(s):  
Margret R. Hoehe ◽  
Ralf Herwig ◽  
Qing Mao ◽  
Brock A. Peters ◽  
Radoje Drmanac ◽  
...  

Abstract To fully understand human genetic variation, one must assess the specific distribution of variants between the two chromosomal homologues of genes, and any functional units of interest, as the phase of variants can significantly impact gene function and phenotype. To this end, we have systematically analyzed 18,121 autosomal protein-coding genes in 1,092 statistically phased genomes from the 1000 Genomes Project, and an unprecedented number of 184 experimentally phased genomes from the Personal Genome Project. Here we show that mutations predicted to functionally alter the protein, and coding variants as a whole, are not randomly distributed between the two homologues of a gene, but do occur significantly more frequently in cis- than trans-configurations, with cis/trans ratios of ∼60:40. Significant cis-abundance was observed in virtually all individual genomes in all populations. Nearly all variable genes exhibited either cis, or trans configurations of protein-altering mutations in significant excess, allowing distinction of cis- and trans-abundant genes. These common patterns of phase were largely constituted by a shared, global set of phase-sensitive genes. We show significant enrichment of this global set with gene sets indicating its involvement in adaptation and evolution. Moreover, cis- and trans-abundant genes were found functionally distinguishable, and exhibited strikingly different distributional patterns of protein-altering mutations. This work establishes common patterns of phase as key characteristics of diploid human exomes and provides evidence for their potential functional significance. Thus, it highlights the importance of phase for the interpretation of protein-coding genetic variation, challenging the current conceptual and functional interpretation of autosomal genes.
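The cis/trans distinction described above can be illustrated with phased genotypes written in the common `0|1` notation, where the two sides of the bar are the two homologues. The sketch below uses a simplified rule that is an assumption, not the authors' exact procedure: only heterozygous variants are considered, a gene needs at least two of them, and the gene is called cis when all alternative alleles sit on the same homologue. The function name and example genes are hypothetical.

```python
from typing import Dict, List, Optional


def phase_configuration(genotypes: List[str]) -> Optional[str]:
    """Classify a gene's heterozygous variants as 'cis' or 'trans'.

    genotypes: phased genotype strings such as '0|1' or '1|0', one per
    protein-altering variant in the gene. Returns None when fewer than
    two heterozygous variants are present, so no configuration is defined.
    """
    het = [gt for gt in genotypes if gt in ("0|1", "1|0")]
    if len(het) < 2:
        return None
    # All alternative alleles on one homologue -> cis; split across both -> trans.
    return "cis" if len(set(het)) == 1 else "trans"


# Hypothetical genes in one phased genome.
genome: Dict[str, List[str]] = {
    "GENE_A": ["0|1", "0|1"],         # both alternative alleles on homologue 2
    "GENE_B": ["0|1", "1|0"],         # one alternative allele on each homologue
    "GENE_C": ["1|0", "1|0", "1|0"],  # all on homologue 1
    "GENE_D": ["0|1"],                # single variant, not classifiable
}

calls = {gene: phase_configuration(gts) for gene, gts in genome.items()}
classified = [c for c in calls.values() if c is not None]
cis_fraction = classified.count("cis") / len(classified)
print(calls)
print(f"cis fraction among classifiable genes: {cis_fraction:.2f}")
```

Aggregating such per-gene calls across a genome yields a cis/trans ratio of the kind reported in the abstract.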

