Functional interpretation of genetic variants using deep learning predicts impact on chromatin accessibility and histone modification

2019 ◽  
Vol 47 (20) ◽  
pp. 10597-10611 ◽  
Author(s):  
Gabriel E Hoffman ◽  
Jaroslav Bendl ◽  
Kiran Girdhar ◽  
Eric E Schadt ◽  
Panos Roussos

Abstract Identifying functional variants underlying disease risk and adoption of personalized medicine are currently limited by the challenge of interpreting the functional consequences of genetic variants. Predicting the functional effects of disease-associated protein-coding variants is increasingly routine. Yet, the vast majority of risk variants are non-coding, and predicting the functional consequence and prioritizing variants for functional validation remains a major challenge. Here, we develop a deep learning model to accurately predict locus-specific signals from four epigenetic assays using only DNA sequence as input. Given the predicted epigenetic signal from DNA sequence for the reference and alternative alleles at a given locus, we generate a score of the predicted epigenetic consequences for 438 million variants observed in previous sequencing projects. These impact scores are assay-specific, are predictive of allele-specific transcription factor binding and are enriched for variants associated with gene expression and disease risk. Nucleotide-level functional consequence scores for non-coding variants can refine the mechanism of known functional variants, identify novel risk variants and prioritize downstream experiments.
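The scoring idea in the abstract above (predict the epigenetic signal for the reference and the alternative allele, then compare) can be illustrated with a minimal sketch. This is not the authors' model: `toy_signal` is a hypothetical stand-in that returns GC fraction, and scoring by a simple alternative-minus-reference difference is an assumption made for illustration only.

```python
from typing import Callable


def toy_signal(seq: str) -> float:
    """Hypothetical stand-in for a trained sequence model: GC fraction."""
    return sum(base in "GC" for base in seq) / len(seq)


def impact_score(ref_seq: str, pos: int, alt: str,
                 predict: Callable[[str], float] = toy_signal) -> float:
    """Predicted signal for the alternative allele minus the reference.

    ref_seq: reference DNA sequence around the variant
    pos:     0-based position of the variant within ref_seq
    alt:     alternative base at that position
    """
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("alt must be a single base from A, C, G, T")
    if not 0 <= pos < len(ref_seq):
        raise ValueError("pos is outside the sequence")
    # Substitute the alternative base into the reference sequence.
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return predict(alt_seq) - predict(ref_seq)


# Illustrative input only: a T>G substitution at position 3.
score = impact_score("ATATATAT", 3, "G")
same_allele_score = impact_score("ATATATAT", 3, "T")
```

A real predictor would replace `toy_signal`; the sketch only shows how a per-variant score follows from two predictions on sequences that differ at a single base.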

2018 ◽  
Author(s):  
Gabriel E. Hoffman ◽  
Eric E. Schadt ◽  
Panos Roussos

Abstract Identifying causal variants underlying disease risk and adoption of personalized medicine are currently limited by the challenge of interpreting the functional consequences of genetic variants. Predicting the functional effects of disease-associated protein-coding variants is increasingly routine. Yet the vast majority of risk variants are non-coding, and predicting the functional consequence and prioritizing variants for functional validation remains a major challenge. Here we develop a deep learning model to accurately predict locus-specific signals from four epigenetic assays using only DNA sequence as input. Given the predicted epigenetic signal from DNA sequence for the reference and alternative alleles at a given locus, we generate a score of the predicted epigenetic consequences for 438 million variants. These impact scores are assay-specific, are predictive of allele-specific transcription factor binding and are enriched for variants associated with gene expression and disease risk. Nucleotide-level functional consequence scores for non-coding variants can refine the mechanism of known causal variants, identify novel risk variants and prioritize downstream experiments.


Science ◽  
2020 ◽  
Vol 369 (6503) ◽  
pp. 561-565 ◽  
Author(s):  
Siwei Zhang ◽  
Hanwen Zhang ◽  
Yifan Zhou ◽  
Min Qiao ◽  
Siming Zhao ◽  
...  

Most neuropsychiatric disease risk variants are in noncoding sequences and lack functional interpretation. Because regulatory sequences often reside in open chromatin, we reasoned that neuropsychiatric disease risk variants may affect chromatin accessibility during neurodevelopment. Using human induced pluripotent stem cell (iPSC)–derived neurons that model developing brains, we identified thousands of genetic variants exhibiting allele-specific open chromatin (ASoC). These neuronal ASoCs were partially driven by altered transcription factor binding, overrepresented in brain gene enhancers and expression quantitative trait loci, and frequently associated with distal genes through chromatin contacts. ASoCs were enriched for genetic variants associated with brain disorders, enabling identification of functional schizophrenia risk variants and their cis-target genes. This study highlights ASoC as a functional mechanism of noncoding neuropsychiatric risk variants, providing a powerful framework for identifying disease causal variants and genes.


2017 ◽  
Vol 242 (13) ◽  
pp. 1325-1334 ◽  
Author(s):  
Yizhou Zhu ◽  
Cagdas Tazearslan ◽  
Yousin Suh

Genome-wide association studies have shown that the vast majority of disease-associated variants reside in the non-coding regions of the genome, suggesting that gene regulatory changes contribute to disease risk. To identify truly causal non-coding variants and their affected target genes remains challenging but is a critical step to translate the genetic associations to molecular mechanisms and ultimately clinical applications. Here we review genomic/epigenomic resources and in silico tools that can be used to identify causal non-coding variants and experimental strategies to validate their functionalities. Impact statement Most signals from genome-wide association studies (GWASs) map to the non-coding genome, and functional interpretation of these associations remained challenging. We reviewed recent progress in methodologies of studying the non-coding genome and argued that no single approach allows one to effectively identify the causal regulatory variants from GWAS results. By illustrating the advantages and limitations of each method, our review potentially provided a guideline for taking a combinatorial approach to accurately predict, prioritize, and eventually experimentally validate the causal variants.


2020 ◽  
Author(s):  
Hai Yang ◽  
Rui Chen ◽  
Quan Wang ◽  
Qiang Wei ◽  
Ying Ji ◽  
...  

Abstract Analysis of whole-genome sequencing (WGS) for genetics of disease is still a challenge due to lack of accurate functional annotation of noncoding variants, especially the rare ones. As eQTLs have been extensively implicated in genetics of human diseases, we hypothesize that noncoding rare variants discovered in WGS play a regulatory role in predisposing disease risk. With thousands of tissue- and cell type-specific epigenomic features, we propose TVAR, a multi-label learning based deep neural network that predicts the functionality of noncoding variants in the genome based on eQTLs across 49 human tissues in GTEx. TVAR learns the relationships between high-dimensional epigenomics and eQTLs across tissues, taking the correlation among tissues into account to learn shared and tissue-specific eQTL effects. As a result, TVAR outputs tissue-specific annotations, with an average of 0.77 across these tissues. We evaluate TVAR’s performance on four complex diseases (coronary artery disease, breast cancer, Type 2 diabetes, and Schizophrenia), using TVAR’s tissue-specific annotations, and observe its superior performance in predicting functional variants for both common and rare variants, compared to five existing state-of-the-art tools. We further evaluate TVAR’s G-score, a scoring scheme across all tissues, on ClinVar, fine-mapped GWAS loci, and Massively Parallel Reporter Assay (MPRA) validated variants, and observe consistently better performance of TVAR compared to other competing tools.


2021 ◽  
Author(s):  
Thanh Thanh Le Nguyen ◽  
Huanyao Gao ◽  
Duan Liu ◽  
Zhenqing Ye ◽  
Jeong-Heon Lee ◽  
...  

Abstract Understanding the function of non-coding genetic variants represents a formidable challenge for biomedicine. We previously identified genetic variants that influence gene expression only after exposure to a hormone or drug. Using glucocorticoid signaling as a model system, we have now demonstrated, in a genome-wide manner, that exposure to glucocorticoids triggered disease risk variants with previously unclear function to influence the expression of genes involved in autoimmunity, metabolic and mood disorders, osteoporosis and cancer. Integrating a series of genomic and epigenomic assays, we identified the cis-regulatory elements and 3-dimensional interactions underlying the ligand-dependent associations between those genetic variants and distant risk genes. These observations increase our understanding of mechanisms of non-coding genetic variant-chemical environment interactions and advance the fine-mapping of disease risk and pharmacogenomic loci. One Sentence Summary Genomic and epigenomic fine-mapping of ligand-dependent genetic variants unmasks novel disease risk genes


2021 ◽  
Author(s):  
Tian Zhou ◽  
Xinyi Zhu ◽  
Zhizhong Ye ◽  
Yongfei Wang ◽  
Chao Yao ◽  
...  

Dysregulated transcription factors represent a major class of drug targets that mediate the abnormal expression of many critical genes involved in SLE and other autoimmune diseases. Although strong evidence suggests that natural human genetic variation affects basal and inducible gene expression, it is still a considerable challenge to establish a biological link between GWAS-identified non-coding genetic risk variants and their regulated gene targets. Here, we combine genetic data, epigenomic data, and CRISPR activation (CRISPRa) assays to screen for functional variants regulating IRF8 expression. Using CRISPR-mediated deletion and 3D chromatin structure analysis, we demonstrate that the locus containing rs2280381 is a cell-type-specific distal enhancer for IRF8 that spatially interacts with the IRF8 promoter. Further, rs2280381 mediates IRF8 expression through enhancer RNA AC092723.1, which recruits TET1 to the IRF8 promoter to modulate IRF8 expression by affecting methylation levels. The alleles of rs2280381 modulate PU.1 binding and chromatin state to differentially regulate AC092723.1 and IRF8 expression. Our work illustrates a strategy to define the functional genetic variants modulating transcription factor gene expression levels and identifies the biologic mechanism by which autoimmune disease risk variants contribute to the pathogenesis of disease.


2020 ◽  
Author(s):  
Nima C. Emami ◽  
Taylor B. Cavazos ◽  
Sara R. Rashkin ◽  
Clinton L. Cario ◽  
Rebecca E. Graff ◽  
...  

Abstract The potential association between rare germline genetic variants and prostate cancer (PrCa) susceptibility has been understudied due to challenges with assessing rare variation. Furthermore, although common risk variants for PrCa have shown limited individual effect sizes, their cumulative effect may be of similar magnitude as high penetrance mutations. To identify rare variants associated with PrCa susceptibility, and better characterize the mechanisms and cumulative disease risk associated with common risk variants, we analyzed large population-based cohorts, custom genotyping microarrays, and imputation reference panels in an integrative study of PrCa genetic etiology. In particular, 11,649 men (6,196 PrCa cases, 5,453 controls) of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health, ProHealth Study, and California Men’s Health Study were genotyped and meta-analyzed with 196,269 European-ancestry male subjects (7,917 PrCa cases, 188,352 controls) from the UK Biobank. Six novel loci were genome-wide significant in our meta-analysis, including two rare variants (minor allele frequency < 0.01, at 3p21.31 and 8p12). Gene-based rare variant tests implicated a previously discovered PrCa gene (HOXB13) as well as a novel candidate (ILDR1) highly expressed in prostate tissue. Haplotypic patterns of long-range linkage disequilibrium were observed for rare genetic variants at HOXB13 and other loci, reflecting their evolutionary history. Furthermore, a polygenic risk score (PRS) of 187 known, largely common PrCa variants was strongly associated with risk in non-Hispanic whites (90th vs. 10th decile OR = 7.66, P = 1.80 × 10^-239). Many of the 187 variants exhibited functional signatures of gene expression regulation or transcription factor binding, including a six-fold difference in log-probability of Androgen Receptor binding at the variant rs2680708 (17q22). Our finding of two novel rare variants associated with PrCa should motivate further consideration of the role of low frequency polymorphisms in PrCa, while the considerable effect of PrCa PRS profiles should prompt discussion of their role in clinical practice.
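For readers unfamiliar with polygenic risk scores, the usual additive form is a sum of risk-allele dosages weighted by the log odds ratio of each variant. The sketch below assumes that standard form; the abstract does not state how the 187-variant PRS was constructed, and the odds ratios and dosages shown are hypothetical values chosen purely for illustration.

```python
import math
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float],
                         odds_ratios: Sequence[float]) -> float:
    """Additive PRS: sum over variants of dosage * ln(odds ratio).

    dosages:     risk-allele counts per variant (0, 1 or 2, or imputed values)
    odds_ratios: per-allele odds ratio for each variant, in the same order
    """
    if len(dosages) != len(odds_ratios):
        raise ValueError("dosages and odds_ratios must have the same length")
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


# Hypothetical three-variant example; these are not values from the study.
example_odds_ratios = [1.10, 1.25, 0.90]
example_dosages = [2, 1, 0]
prs = polygenic_risk_score(example_dosages, example_odds_ratios)
```

Individuals can then be ranked by score and grouped into deciles, which is the kind of comparison the reported 90th vs. 10th decile odds ratio summarizes.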


Nature ◽  
2017 ◽  
Vol 550 (7675) ◽  
pp. 239-243 ◽  
Author(s):  
Xin Li ◽  
Yungil Kim ◽  
Emily K. Tsang ◽  
Joe R. Davis ◽  
...  

Abstract Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk [1,2,3,4]. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants [1,5]. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles [1,6,7], but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues [8,9,10,11], but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release [12]. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
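The notion of an expression outlier, an individual with an extreme expression level for a particular gene, can be sketched with a simple z-score rule. This is a simplified single-gene, single-tissue illustration, not the multi-tissue procedure used in the study; the threshold of 2.0, the function name and the sample values are all assumptions introduced here.

```python
from statistics import mean, stdev
from typing import Dict


def expression_outliers(expr_by_individual: Dict[str, float],
                        threshold: float = 2.0) -> Dict[str, float]:
    """Return {individual: z-score} for individuals with |z| >= threshold.

    expr_by_individual: expression of one gene, keyed by individual ID
    threshold:          assumed cutoff on the absolute z-score
    """
    values = list(expr_by_individual.values())
    if len(values) < 2:
        raise ValueError("need at least two individuals")
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return {}  # no variation, so nobody is an outlier
    z_scores = {ind: (v - mu) / sd for ind, v in expr_by_individual.items()}
    return {ind: z for ind, z in z_scores.items() if abs(z) >= threshold}


# Hypothetical expression values for one gene; "s10" is strongly underexpressed.
example_expression = {
    "s1": 5.0, "s2": 5.2, "s3": 4.8, "s4": 5.1, "s5": 4.9,
    "s6": 5.0, "s7": 5.1, "s8": 4.9, "s9": 5.0, "s10": 0.5,
}
outliers = expression_outliers(example_expression)
```

A negative z-score marks an underexpression outlier and a positive one an overexpression outlier, matching the two classes the abstract reports separately.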


2021 ◽  
pp. jrheum.201684
Author(s):  
Aichang Ji ◽  
Amara Shaukat ◽  
Riku Takei ◽  
Matthew Bixley ◽  
Murray Cadzow ◽  
...  

Objective: The Māori and Pacific (Polynesian) population of Aotearoa New Zealand (NZ) has a high prevalence of gout. Our aim was to identify potentially functional missense genetic variants in candidate inflammatory genes amplified in frequency that may underlie the increased prevalence of gout in Polynesian populations. Methods: A list of 712 inflammatory disease-related genes was generated. An in silico targeted exome set was extracted from whole genome sequencing data in people with gout of various ancestral groups (Polynesian, European, East Asian; n = 55, 780, 135, respectively) to identify Polynesian-amplified common missense variants (AF > 0.05). Candidate functional variants were tested for association with gout by multivariable-adjusted regression analysis in 2,528 individuals of Polynesian ancestry. Results: We identified 26 variants common in the Polynesian population and uncommon in the European and East Asian populations. Three of the 26 population-specific variants were nominally associated with the risk of gout (rs1635712, KIAA0319, ORmeta = 1.28, Pmeta = 0.028; rs16869924, CLNK, ORmeta = 1.37, Pmeta = 0.0017; rs2070025, FGA, ORmeta = 1.34, Pmeta = 0.017). The CLNK variant, within the established SLC2A9 gout locus, was genetically independent of the association signal at SLC2A9. Conclusion: We provide nominal evidence for the existence of population-amplified genetic variants conferring risk of gout in Polynesian populations. Polymorphisms in CLNK have previously been associated with gout in other populations, supporting our evidence for association of this gene with gout.


Science ◽  
2019 ◽  
Vol 366 (6469) ◽  
pp. 1134-1139 ◽  
Author(s):  
Alexi Nott ◽  
Inge R. Holtman ◽  
Nicole G. Coufal ◽  
Johannes C. M. Schlachetzki ◽  
Miao Yu ◽  
...  

Noncoding genetic variation is a major driver of phenotypic diversity, but functional interpretation is challenging. To better understand common genetic variation associated with brain diseases, we defined noncoding regulatory regions for major cell types of the human brain. Whereas psychiatric disorders were primarily associated with variants in transcriptional enhancers and promoters in neurons, sporadic Alzheimer’s disease (AD) variants were largely confined to microglia enhancers. Interactome maps connecting disease-risk variants in cell-type–specific enhancers to promoters revealed an extended microglia gene network in AD. Deletion of a microglia-specific enhancer harboring AD-risk variants ablated BIN1 expression in microglia, but not in neurons or astrocytes. These findings revise and expand the list of genes likely to be influenced by noncoding variants in AD and suggest the probable cell types in which they function.

