A Systemic Analysis of Transcriptomic and Epigenomic Data To Reveal Regulation Patterns for Complex Disease

2017, Vol 7 (7), pp. 2271-2279
Author(s): Chao Xu, Ji-Gang Zhang, Dongdong Lin, Lan Zhang, Hui Shen, ...

Abstract Integrating diverse genomics data can provide a global view of the complex biological processes related to human complex diseases. Although substantial efforts have been made to integrate different omics data, multi-omics integration methods face at least three challenges: (i) how to simultaneously consider the effects of various genomic factors, since these factors jointly influence the phenotypes; (ii) how to effectively incorporate information from publicly accessible databases and omics datasets to fully capture the interactions among (epi)genomic factors from diverse omics data; and (iii) to date, the combination of more than two omics datasets has been poorly explored. Current integration approaches are not sufficient to address all of these challenges together. We proposed a novel integrative analysis framework that incorporates a sparse model, multivariate analysis, a Gaussian graphical model, and network analysis to address these three challenges simultaneously. Based on this strategy, we performed a systemic analysis of glioblastoma multiforme (GBM) integrating genome-wide gene expression, DNA methylation, and miRNA expression data. We identified three regulatory modules of genomic factors associated with GBM survival time and revealed a global regulatory pattern for GBM by combining the three modules through their common regulatory factors. Our method can not only identify disease-associated dysregulated genomic factors from different omics but, more importantly, can also incorporate information from publicly accessible databases and omics datasets to infer a comprehensive interaction map of all these dysregulated genomic factors. Our work represents an innovative approach to enhance our understanding of the molecular genomic mechanisms underlying human complex diseases.
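
In a Gaussian graphical model, two factors are conditionally independent given all the others exactly when the corresponding off-diagonal entry of the precision (inverse covariance) matrix is zero. The sketch below illustrates that idea by computing partial correlations from a covariance matrix in pure Python. It is an unpenalized illustration only, assuming a small, invertible covariance matrix; it does not reproduce the sparse estimation used in the framework above, and the toy matrix is invented so that factors 0 and 2 are correlated marginally but independent given factor 1.

```python
from typing import List

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Build the augmented matrix [m | I].
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov: Matrix) -> Matrix:
    """Partial correlation of i and j given all other variables:
    rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the precision matrix."""
    p = invert(cov)
    n = len(p)
    return [[1.0 if i == j else -p[i][j] / (p[i][i] * p[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


# Toy covariance for a chain 0 - 1 - 2 (its precision matrix is tridiagonal).
cov = [[0.75, 0.5, 0.25],
       [0.5, 1.0, 0.5],
       [0.25, 0.5, 0.75]]
pc = partial_correlations(cov)
print(pc)
```

Here factors 0 and 2 have a marginal correlation of 1/3, yet their partial correlation is essentially zero, so a graphical model would draw no edge between them.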

2019
Author(s): Kushal K. Dey, Bryce Van de Geijn, Samuel Sungil Kim, Farhad Hormozdiari, David R. Kelley, ...

Abstract Deep learning models have shown great promise in predicting genome-wide regulatory effects from DNA sequence, but their informativeness for human complex diseases and traits is not fully understood. Here, we evaluate the disease informativeness of allelic-effect annotations (absolute value of the predicted difference between reference and variant alleles) constructed using two previously trained deep learning models, DeepSEA and Basenji. We apply stratified LD score regression (S-LDSC) to 41 independent diseases and complex traits (average N=320K) to evaluate each annotation's informativeness for disease heritability conditional on a broad set of coding, conserved, regulatory and LD-related annotations from the baseline-LD model and other sources; as a secondary metric, we also evaluate the accuracy of models that incorporate deep learning annotations in predicting disease-associated or fine-mapped SNPs. We aggregated annotations across all tissues (resp. blood cell types or brain tissues) in meta-analyses across all 41 traits (resp. 11 blood-related traits or 8 brain-related traits). These allelic-effect annotations were highly enriched for disease heritability but produced only limited conditionally significant results: Basenji-H3K4me3 in meta-analyses across all 41 traits, and brain-specific Basenji-H3K4me3 in meta-analyses across 8 brain-related traits. We conclude that deep learning models have yet to achieve their full potential to provide a considerable amount of unique information for complex disease, and that the informativeness of deep learning models for disease beyond established functional annotations cannot be inferred from metrics based on their accuracy in predicting regulatory annotations.
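
The allelic-effect annotation defined above is the absolute difference between a model's prediction for the variant-allele sequence and its prediction for the reference sequence. The sketch below assumes a hypothetical `predict` callable that maps a DNA sequence to a single score (standing in for one DeepSEA or Basenji output track); the GC-content "model" is a toy for illustration and is not either network.

```python
from typing import Callable


def allelic_effect(seq: str, pos: int, ref: str, alt: str,
                   predict: Callable[[str], float]) -> float:
    """|predict(variant sequence) - predict(reference sequence)| at one SNP.
    `predict` is a hypothetical sequence-to-score model."""
    if seq[pos] != ref:
        raise ValueError(f"reference allele mismatch at position {pos}")
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return abs(predict(alt_seq) - predict(seq))


def toy_gc_predictor(seq: str) -> float:
    """Toy stand-in for a deep learning model: fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


seq = "ATGCATGCAT"  # hypothetical 10 bp reference window
effect = allelic_effect(seq, 0, "A", "G", toy_gc_predictor)
print(effect)
```

Taking the absolute value makes the annotation direction-agnostic: a SNP that strongly increases or strongly decreases the predicted signal scores equally high.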


2019
Author(s): Fengzhe Xu, Yuanqing Fu, Ting-yu Sun, Zengliang Jiang, Zelei Miao, ...

Abstract There is increasing interest in the interplay between host genetics and the gut microbiome in human complex diseases, with prior evidence mainly derived from animal models. In addition, the shared and distinct microbiome features among human complex diseases remain largely unclear. We performed a microbiome genome-wide association study to identify host genetic variants associated with the gut microbiome in a Chinese population of 1475 participants. We then conducted bi-directional Mendelian randomization analyses to examine the potential causal associations between the gut microbiome and human complex diseases. We found that Saccharibacteria (also known as the TM7 phylum) could potentially improve renal function by affecting renal function biomarkers (i.e., creatinine and estimated glomerular filtration rate). In the opposite direction, atrial fibrillation, chronic kidney disease and prostate cancer, as predicted by host genetics, had potential causal effects on the gut microbiome. Further disease-microbiome feature analysis suggested that gut microbiome features revealed novel relationships among human complex diseases. These results suggest that different human complex diseases share common and distinct gut microbiome features, which may help reshape our understanding of disease etiology in humans.
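
Mendelian randomization uses genetic variants as instruments for an exposure. The abstract does not name the estimator, so as an assumption the sketch below shows the widely used fixed-effect inverse-variance weighted (IVW) estimator, which pools per-variant ratio estimates. A bi-directional analysis runs it once with the microbial feature as exposure and once with the disease as exposure (using instruments for the disease). All effect sizes are invented for illustration.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exp: Sequence[float], beta_out: Sequence[float],
                 se_out: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    beta_exp: per-SNP effects on the exposure
    beta_out: per-SNP effects on the outcome
    se_out:   standard errors of the outcome effects
    """
    if not (len(beta_exp) == len(beta_out) == len(se_out)):
        raise ValueError("inputs must have equal length")
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exp, se_out))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical SNP effects on a taxon (exposure) and a disease biomarker (outcome)
bx = [0.10, 0.20, 0.15]
by = [0.02, 0.04, 0.03]
se_y = [0.01, 0.01, 0.01]
estimate, se = ivw_estimate(bx, by, se_y)
print(estimate, se)
```

Because every toy outcome effect is exactly 0.2 times its exposure effect, the pooled causal estimate is 0.2.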


2016
Author(s): Qiongshi Lu, Ryan L. Powles, Sarah Abdallah, Derek Ou, Qian Wang, ...

Abstract Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system) increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells) provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer's disease (LOAD). Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson's disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases.
Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline.
Author Summary
After years of community efforts, many experimental and computational approaches have been developed and applied for functional annotation of the human genome, yet proper annotation still remains challenging, especially in non-coding regions. As complex disease research rapidly advances, increasing evidence suggests that non-coding regulatory DNA elements may be the primary regions harboring risk variants in human complex diseases. In this paper, we introduce GenoSkyline-Plus, a principled annotation framework to identify tissue and cell type-specific functional regions in the human genome through integration of diverse high-throughput epigenomic and transcriptomic data. Through validation of known non-coding tissue-specific regulatory regions, enrichment analyses on 45 complex traits, and an in-depth case study of neurodegenerative diseases, we demonstrate the ability of GenoSkyline-Plus to accurately identify tissue-specific functionality in the human genome and provide unbiased, genome-wide insights into the genetic basis of human complex diseases.
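
Heritability enrichment, as commonly reported in partitioned-heritability analyses, is the proportion of SNP heritability attributed to an annotation divided by the proportion of SNPs the annotation covers; values above 1 mean the annotated regions explain more heritability than their size alone would predict. A minimal sketch of that ratio, assuming this standard definition and using invented proportions:

```python
def heritability_enrichment(prop_h2: float, prop_snps: float) -> float:
    """Enrichment = (fraction of SNP heritability explained) / (fraction of SNPs annotated)."""
    if not 0 < prop_snps <= 1:
        raise ValueError("prop_snps must be in (0, 1]")
    return prop_h2 / prop_snps


# Hypothetical: a tissue annotation covering 5% of SNPs explains 30% of heritability
enrichment = heritability_enrichment(0.30, 0.05)
print(enrichment)
```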


2020, Vol 11 (1)
Author(s): Kushal K. Dey, Bryce van de Geijn, Samuel Sungil Kim, Farhad Hormozdiari, David R. Kelley, ...

Abstract Deep learning models have shown great promise in predicting regulatory effects from DNA sequence, but their informativeness for human complex diseases is not fully understood. Here, we evaluate genome-wide SNP annotations from two previous deep learning models, DeepSEA and Basenji, by applying stratified LD score regression to 41 diseases and traits (average N = 320K), conditioning on a broad set of coding, conserved and regulatory annotations. We aggregated annotations across all (respectively blood or brain) tissues/cell-types in meta-analyses across all (respectively 11 blood or 8 brain) traits. The annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: non-tissue-specific and brain-specific Basenji-H3K4me3 for all traits and brain traits respectively. We conclude that deep learning models have yet to achieve their full potential to provide considerable unique information for complex disease, and that their conditional informativeness for disease cannot be inferred from their accuracy in predicting regulatory annotations.
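
For reference, stratified LD score regression models the expected association statistic of SNP $j$ as a linear function of its LD scores with respect to each annotation $C$ in the model. Because all annotations (the deep learning annotations plus the baseline coding, conserved and regulatory annotations) are fitted jointly, each coefficient $\tau_C$ measures the annotation's contribution conditional on the others, which is what "conditionally significant" refers to above:

```latex
E\left[\chi^2_j\right] = N \sum_{C} \tau_C \, \ell(j, C) + N a + 1,
\qquad
\ell(j, C) = \sum_{k} a_C(k)\, r^2_{jk}
```

Here $N$ is the GWAS sample size, $a_C(k)$ is the value of annotation $C$ at SNP $k$, $r_{jk}$ is the correlation between SNPs $j$ and $k$, and $a$ absorbs confounding such as population stratification.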


2008, Vol 54 (7), pp. 1116-1124
Author(s): Struan F A Grant, Hakon Hakonarson

Abstract Background: There is a revolution occurring in single nucleotide polymorphism (SNP) genotyping technology, with high-throughput methods now allowing large numbers of SNPs (10⁵–10⁶) to be genotyped in large cohort studies. This has enabled large-scale genome-wide association (GWA) studies in complex diseases, such as diabetes, asthma, and inflammatory bowel disease, to be undertaken for the first time. Content: The GWA approach serves the critical need for a comprehensive and unbiased strategy to identify causal genes related to complex disease, and is rapidly replacing the more traditional candidate gene studies and microsatellite-based linkage mapping approaches that have dominated gene discovery attempts for common diseases. As a consequence of employing array-based technologies, over the last 3 years dramatic discoveries of key variants involved in multiple complex diseases and related traits have been reported in the top scientific literature and, most importantly, have been largely replicated by independent investigator groups. As a result, several novel genes have been identified, most notably in the metabolic, cardiovascular, autoimmune, and oncology disease areas, that are clearly rooted in the biology of these disorders. These discoveries have opened up new avenues for investigators to address novel molecular pathways that were not previously linked to or considered in relation to these diseases. Summary: This review provides a synopsis of recent advances and of what we may still expect to emerge from this field.


2020
Author(s): Kushal K. Dey, Samuel S. Kim, Steven Gazal, Joseph Nasser, Jesse M. Engreitz, ...

Abstract Deep learning models have achieved great success in predicting genome-wide regulatory effects from DNA sequence, but recent work has reported that SNP annotations derived from these predictions contribute limited unique information for human complex disease. Here, we explore three integrative approaches to improve the disease informativeness of allelic-effect annotations (predicted difference between reference and variant alleles) constructed using two previously trained deep learning models, DeepSEA and Basenji. First, we employ gradient boosting to learn optimal combinations of deep learning annotations, using (off-chromosome) fine-mapped SNPs and matched control SNPs for training. Second, we improve the specificity of these annotations by restricting them to SNPs implicated by (proximal and distal) SNP-to-gene (S2G) linking strategies, e.g. prioritizing SNPs involved in gene regulation. Third, we predict gene expression (and derive allelic-effect annotations) from deep learning annotations at SNPs implicated by S2G linking strategies, generalizing the previously proposed ExPecto approach, which incorporates deep learning annotations based on distance to TSS. We evaluated these approaches using stratified LD score regression, using functional data in blood and focusing on 11 autoimmune diseases and blood-related traits (average N=306K). We determined that the three approaches produced SNP annotations that were uniquely informative for these diseases/traits, despite the fact that linear combinations of the underlying DeepSEA and Basenji blood annotations were not uniquely informative for these diseases/traits. Our results highlight the benefits of integrating SNP annotations produced by deep learning models with other types of data, including data linking SNPs to genes.
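
The second approach, restricting annotations to SNPs implicated by S2G linking strategies, can be pictured as keeping each deep learning annotation value only at SNPs that some S2G strategy links to a gene and setting it to zero elsewhere. A minimal sketch, assuming annotations are stored as a dictionary keyed by SNP ID and the S2G output is a set of linked SNP IDs (both the IDs and the values below are hypothetical):

```python
from typing import Dict, Set


def restrict_to_s2g(annotation: Dict[str, float], s2g_snps: Set[str]) -> Dict[str, float]:
    """Keep annotation values at S2G-linked SNPs; set all other SNPs to 0."""
    return {snp: (value if snp in s2g_snps else 0.0) for snp, value in annotation.items()}


deep_annot = {"rs1": 0.8, "rs2": 0.3, "rs3": 0.5}  # hypothetical allelic-effect values
linked = {"rs1", "rs3"}                            # hypothetical S2G-linked SNPs
restricted = restrict_to_s2g(deep_annot, linked)
print(restricted)
```

Zeroing rather than dropping unlinked SNPs keeps the annotation defined genome-wide, which is the form a stratified LD score regression annotation needs.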


BioTech, 2021, Vol 10 (1), pp. 3
Author(s): Yinhao Du, Kun Fan, Xi Lu, Cen Wu

Gene-environment (G×E) interaction is critical for understanding the genetic basis of complex disease beyond genetic and environmental main effects. In addition to existing tools for interaction studies, penalized variable selection has emerged as a promising alternative for dissecting G×E interactions. Despite this success, variable selection is limited in its ability to account for multidimensional measurements: published variable selection methods cannot accommodate structured sparsity when integrating multi-omics data for disease outcomes. In this paper, we have developed a novel variable selection method to integrate multi-omics measurements in G×E interaction studies. Extensive studies have already revealed that analyzing omics data across multiple platforms is not only biologically sensible but also results in improved identification and prediction performance. Our integrative model can efficiently pinpoint important regulators of gene expression through sparse dimensionality reduction, and can link disease outcomes to multiple effects in integrative G×E studies by accommodating a sparse bi-level structure. Simulation studies show that the integrative model leads to better identification of G×E interactions and regulators than alternative methods. In two G×E lung cancer studies with high-dimensional multi-omics data, the integrative model leads to improved prediction and to findings with important biological implications.
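
A "sparse bi-level structure" selects both at the group level (for example, a gene together with its G×E interaction terms) and at the level of individual effects within a selected group. The abstract does not give the exact penalty, so as an assumption the sketch below uses the sparse group lasso, whose proximal operator for one group is elementwise soft-thresholding followed by group-wise shrinkage. This is one building block of a proximal-gradient fit, not the authors' method; the group layout and penalty values are illustrative.

```python
import math
from typing import List


def soft_threshold(x: float, thresh: float) -> float:
    """Lasso shrinkage of a single coefficient toward zero."""
    return math.copysign(max(abs(x) - thresh, 0.0), x)


def sparse_group_prox(group: List[float], lam_ind: float, lam_grp: float,
                      step: float = 1.0) -> List[float]:
    """Proximal operator of step * (lam_ind * ||b||_1 + lam_grp * ||b||_2) for one group.

    Individual-level sparsity is applied first; the whole group is then
    shrunk and set to zero if its remaining norm is too small.
    """
    s = [soft_threshold(v, step * lam_ind) for v in group]
    norm = math.sqrt(sum(v * v for v in s))
    if norm == 0.0:
        return [0.0] * len(group)
    scale = max(1.0 - step * lam_grp / norm, 0.0)
    return [scale * v for v in s]


# Hypothetical groups: one gene's main effect plus its interactions with two environmental factors
strong = sparse_group_prox([3.0, 0.2, -2.0], lam_ind=0.5, lam_grp=1.0)
weak = sparse_group_prox([0.6, -0.4, 0.3], lam_ind=0.5, lam_grp=1.0)
print(strong, weak)
```

In the strong group the small middle interaction is zeroed while the group survives (within-group sparsity); the weak group is removed entirely (group-level sparsity).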

