A deep learning framework for predicting human essential genes from population and functional genomic data

2021 ◽  
Author(s):  
Troy M LaPolice ◽  
Yi-Fei Huang

Being able to predict essential genes intolerant to loss-of-function (LOF) mutations can dramatically improve our ability to identify genes associated with genetic disorders. Numerous computational methods have recently been developed to predict human essential genes from population genomic data; however, the existing methods have limited power in pinpointing short essential genes due to the sparsity of polymorphisms in the human genome. Here we present an evolution-based deep learning model, DeepLOF, which integrates population and functional genomic data to improve gene essentiality prediction. Compared to previous methods, DeepLOF shows unmatched performance in predicting ClinGen haploinsufficient genes, mouse essential genes, and essential genes in human cell lines. Furthermore, DeepLOF discovers 109 potentially essential genes that are too short to be identified by previous methods. Altogether, DeepLOF is a powerful computational method to aid in the discovery of essential genes.

2008 ◽  
Vol 40 (7) ◽  
pp. 854-861 ◽  
Author(s):  
Jun Zhu ◽  
Bin Zhang ◽  
Erin N Smith ◽  
Becky Drees ◽  
Rachel B Brem ◽  
...  

2018 ◽  
Author(s):  
Yeping Lina Qiu ◽  
Hong Zheng ◽  
Olivier Gevaert

Motivation: The presence of missing values is a frequent problem encountered in genomic data analysis. Lost data can be an obstacle to downstream analyses that require complete data matrices. State-of-the-art imputation techniques including Singular Value Decomposition (SVD) and K-Nearest Neighbors (KNN) based methods usually achieve good performances, but are computationally expensive especially for large datasets such as those involved in pan-cancer analysis. Results: This study describes a new method: a denoising autoencoder with partial loss (DAPL) as a deep learning based alternative for data imputation. Results on pan-cancer gene expression data and DNA methylation data from over 11,000 samples demonstrate significant improvement over standard denoising autoencoder for both data missing-at-random cases with a range of missing percentages, and missing-not-at-random cases based on expression level and GC-content. We discuss the advantages of DAPL over traditional imputation methods and show that it achieves comparable or better performance with less computational burden. Availability: https://github.com/gevaertlab/[email protected]
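The abstract above does not define "partial loss". One plausible reading, offered here only as an assumption, is a reconstruction loss restricted to entries that were actually observed in the original data, so that missing cells do not contribute to training. The sketch below illustrates that idea in plain Python; the function name `partial_mse` and the mask layout are hypothetical and are not taken from the DAPL code.

```python
def partial_mse(original, reconstructed, observed_mask):
    """Mean squared error over observed entries only.

    ASSUMPTION: this masked loss is one possible interpretation of
    "partial loss"; it is not confirmed by the abstract.
    All three arguments are equally shaped lists of rows.
    """
    total = 0.0
    count = 0
    for x_row, y_row, m_row in zip(original, reconstructed, observed_mask):
        for x, y, observed in zip(x_row, y_row, m_row):
            if observed:  # skip cells that were missing in the original data
                total += (x - y) ** 2
                count += 1
    if count == 0:
        raise ValueError("no observed entries to score")
    return total / count


# Toy 2x2 matrix; the top-right cell is treated as missing.
original = [[1.0, 0.0], [2.0, 4.0]]
reconstructed = [[1.5, 9.0], [2.0, 3.0]]
observed_mask = [[True, False], [True, True]]
loss = partial_mse(original, reconstructed, observed_mask)
```

In this toy case the large error in the missing cell is ignored, and the loss is averaged over the three observed cells only.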


2011 ◽  
Vol 7 (6) ◽  
pp. e1002073 ◽  
Author(s):  
Nathan L. Nehrt ◽  
Wyatt T. Clark ◽  
Predrag Radivojac ◽  
Matthew W. Hahn

Author(s):  
Zheng Zhao ◽  
Ke’nan Zhang ◽  
Qiangwei Wang ◽  
Guanzhang Li ◽  
Fan Zeng ◽  
...  

Gliomas are the most common and malignant intracranial tumours in adults. Recent studies have shown that functional genomics greatly aids in the understanding of the pathophysiology and therapy of glioma. However, comprehensive genomic data and analysis platforms are relatively limited. In this study, we developed the Chinese Glioma Genome Atlas (CGGA, http://www.cgga.org.cn), a user-friendly data portal for storage and interactive exploration of multi-dimensional functional genomic data that includes nearly 2,000 primary and recurrent glioma samples from Chinese cohorts. CGGA currently provides access to whole-exome sequencing (286 samples), messenger RNA sequencing (1,018 samples) and microarray (301 samples), DNA methylation microarray (159 samples), and microRNA microarray (198 samples) data, as well as detailed clinical data (e.g., WHO grade, histological type, critical molecular genetic information, age, sex, chemoradiotherapy status and survival data). In addition, we developed an analysis tool to allow users to browse mutational, mRNA/microRNA expression, and DNA methylation profiles and perform survival and correlation analyses of specific glioma subtypes. CGGA greatly reduces the barriers between complex functional genomic data and glioma researchers who seek rapid, intuitive, and high-quality access to data resources and enables researchers to use these immeasurable data sources for biological research and clinical application. Importantly, the free provision of data will allow researchers to quickly generate and provide data to the research community.


2017 ◽  
Author(s):  
Casey W. Dunn ◽  
Felipe Zapata ◽  
Catriona Munro ◽  
Stefan Siebert ◽  
Andreas Hejnol

There is considerable interest in comparing functional genomic data across species. One goal of such work is to provide an integrated understanding of genome and phenotype evolution. Most comparative functional genomic studies have relied on multiple pairwise comparisons between species, an approach that does not incorporate information about the evolutionary relationships among species. The statistical problems that arise from not considering these relationships can lead pairwise approaches to the wrong conclusions, and are a missed opportunity to learn about biology that can only be understood in an explicit phylogenetic context. Here we examine two recently published studies that compare gene expression across species with pairwise methods, and find reason to question the original conclusions of both. One study interpreted pairwise comparisons of gene expression as support for the ortholog conjecture, the hypothesis that orthologs tend to be more similar than paralogs. The other study interpreted pairwise comparisons of embryonic gene expression across distantly related animals as evidence for a distinct evolutionary process that gave rise to phyla. In each study, distinct patterns of pairwise similarity among species were originally interpreted as evidence of particular evolutionary processes, but instead we find they reflect species relationships. These reanalyses concretely demonstrate the inadequacy of pairwise comparisons for analyzing functional genomic data across species. It will be critical to adopt phylogenetic comparative methods in future functional genomic work. Fortunately, phylogenetic comparative biology is also a rapidly advancing field with many methods that can be directly applied to functional genomic data.

Significance: Comparisons of genome function between species are providing important insight into the evolutionary origins of diversity. Here we demonstrate that comparative functional genomics studies can come to the wrong conclusions if they do not take the relationships of species into account and instead rely on pairwise comparisons between species, as is common practice. We re-examined two previously published studies and found problems with pairwise comparisons that draw both their original conclusions into question. One study had found support for the ortholog conjecture and the other had concluded that the evolution of gene expression was different between animal phyla than within them. Our results demonstrate that to answer evolutionary questions about genome function, it is critical to consider evolutionary relationships.


2018 ◽  
Author(s):  
Yang Yang ◽  
Quanquan Gu ◽  
Yang Zhang ◽  
Takayo Sasaki ◽  
Julianna Crivello ◽  
...  

A large amount of multi-species functional genomic data from high-throughput assays are becoming available to help understand the molecular mechanisms for phenotypic diversity across species. However, continuous-trait probabilistic models, which are key to such comparative analysis, remain underexplored. Here we develop a new model, called phylogenetic hidden Markov Gaussian processes (Phylo-HMGP), to simultaneously infer heterogeneous evolutionary states of functional genomic features in a genome-wide manner. Both simulation studies and real data application demonstrate the effectiveness of Phylo-HMGP. Importantly, we applied Phylo-HMGP to analyze a new cross-species DNA replication timing (RT) dataset from the same cell type in five primate species (human, chimpanzee, orangutan, gibbon, and green monkey). We demonstrate that our Phylo-HMGP model enables discovery of genomic regions with distinct evolutionary patterns of RT. Our method provides a generic framework for comparative analysis of multi-species continuous functional genomic signals to help reveal regions with conserved or lineage-specific regulatory roles.


Cell Systems ◽  
2018 ◽  
Vol 7 (2) ◽  
pp. 208-218.e11 ◽  
Author(s):  
Yang Yang ◽  
Quanquan Gu ◽  
Yang Zhang ◽  
Takayo Sasaki ◽  
Julianna Crivello ◽  
...  

2018 ◽  
Vol 115 (3) ◽  
pp. E409-E417 ◽  
Author(s):  
Casey W. Dunn ◽  
Felipe Zapata ◽  
Catriona Munro ◽  
Stefan Siebert ◽  
Andreas Hejnol

There is considerable interest in comparing functional genomic data across species. One goal of such work is to provide an integrated understanding of genome and phenotype evolution. Most comparative functional genomic studies have relied on multiple pairwise comparisons between species, an approach that does not incorporate information about the evolutionary relationships among species. The statistical problems that arise from not considering these relationships can lead pairwise approaches to the wrong conclusions and are a missed opportunity to learn about biology that can only be understood in an explicit phylogenetic context. Here, we examine two recently published studies that compare gene expression across species with pairwise methods, and find reason to question the original conclusions of both. One study interpreted pairwise comparisons of gene expression as support for the ortholog conjecture, the hypothesis that orthologs tend to have more similar attributes (expression in this case) than paralogs. The other study interpreted pairwise comparisons of embryonic gene expression across distantly related animals as evidence for a distinct evolutionary process that gave rise to phyla. In each study, distinct patterns of pairwise similarity among species were originally interpreted as evidence of particular evolutionary processes, but instead, we find that they reflect species relationships. These reanalyses concretely show the inadequacy of pairwise comparisons for analyzing functional genomic data across species. It will be critical to adopt phylogenetic comparative methods in future functional genomic work. Fortunately, phylogenetic comparative biology is also a rapidly advancing field with many methods that can be directly applied to functional genomic data.
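One way to see why pairwise comparisons are statistically problematic is to count them. Among n species there are n(n-1)/2 pairwise comparisons, but a fully bifurcating rooted tree with n tips has only n-1 internal nodes, and therefore only n-1 phylogenetically independent contrasts. The sketch below is illustrative arithmetic only, not a reanalysis from the paper; the function names are hypothetical.

```python
from math import comb


def n_pairwise_comparisons(n_species):
    """Number of distinct species pairs among n_species."""
    return comb(n_species, 2)


def n_independent_contrasts(n_species):
    """Number of independent contrasts on a fully bifurcating rooted tree.

    Such a tree with n tips has n - 1 internal nodes, and each internal
    node yields one contrast between its two descendant lineages.
    """
    if n_species < 2:
        raise ValueError("need at least two species")
    return n_species - 1


# Example with an arbitrary number of species.
pairs = n_pairwise_comparisons(5)
contrasts = n_independent_contrasts(5)
```

For five species this gives 10 pairwise comparisons against 4 independent contrasts, so most pairwise comparisons reuse the same shared branches of the tree and cannot be treated as independent observations.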


2016 ◽  
Author(s):  
Yi-Fei Huang ◽  
Brad Gulko ◽  
Adam Siepel

Across many species, a large fraction of genetic variants that influence phenotypes of interest is located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here, we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which therefore are likely to be phenotypically important. LINSIGHT combines a simple neural network for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the “Big Data” available in modern genomics. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell-type, tissue specificity, and constraints at associated promoters.

