DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier

2019 ◽  
Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

Abstract
Motivation: Predicting the phenotypes resulting from molecular perturbations is one of the key challenges in genetics. Both forward and reverse genetic screens are employed to identify the molecular mechanisms underlying phenotypes and disease, and these have resulted in a large number of genotype–phenotype associations being available for humans and model organisms. Combined with recent advances in machine learning, it may now be possible to predict human phenotypes resulting from particular molecular aberrations.
Results: We developed DeepPheno, a neural network-based hierarchical multi-class multi-label classification method for predicting the phenotypes resulting from complete loss-of-function in single genes. DeepPheno uses the functional annotations of gene products to predict the phenotypes resulting from a loss-of-function; additionally, we employ a two-step procedure in which we predict these functions first and then predict phenotypes. Prediction of phenotypes is ontology-based and we propose a novel ontology-based classifier suitable for very large hierarchical classification tasks. These methods allow us to predict phenotypes associated with any known protein-coding gene. We evaluate our approach using evaluation metrics established by the CAFA challenge and compare with top-performing CAFA2 methods as well as several state-of-the-art phenotype prediction approaches, demonstrating the improvement of DeepPheno over state-of-the-art methods. Furthermore, we show that predictions generated by DeepPheno are applicable to predicting gene–disease associations based on comparing phenotypes, and that a large number of new predictions made by DeepPheno interact with a gene that is already associated with the predicted phenotype.
Availability: https://github.com/bio-ontology-research-group/[email protected]

2020 ◽  
Vol 16 (11) ◽  
pp. e1008453
Author(s):  
Maxat Kulmanov ◽  
Robert Hoehndorf

Predicting the phenotypes resulting from molecular perturbations is one of the key challenges in genetics. Both forward and reverse genetic screens are employed to identify the molecular mechanisms underlying phenotypes and disease, and these have resulted in a large number of genotype–phenotype associations being available for humans and model organisms. Combined with recent advances in machine learning, it may now be possible to predict human phenotypes resulting from particular molecular aberrations. We developed DeepPheno, a neural network-based hierarchical multi-class multi-label classification method for predicting the phenotypes resulting from loss-of-function in single genes. DeepPheno uses the functional annotations of gene products to predict the phenotypes resulting from a loss-of-function; additionally, we employ a two-step procedure in which we predict these functions first and then predict phenotypes. Prediction of phenotypes is ontology-based and we propose a novel ontology-based classifier suitable for very large hierarchical classification tasks. These methods allow us to predict phenotypes associated with any known protein-coding gene. We evaluate our approach using evaluation metrics established by the CAFA challenge and compare with top-performing CAFA2 methods as well as several state-of-the-art phenotype prediction approaches, demonstrating the improvement of DeepPheno over established methods. Furthermore, we show that predictions generated by DeepPheno are applicable to predicting gene–disease associations based on comparing phenotypes, and that a large number of new predictions made by DeepPheno have recently been added to phenotype databases.
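The abstract refers to the evaluation metrics established by the CAFA challenge without naming them. One widely used CAFA metric is the protein-centric Fmax, and the following is a minimal sketch of it, not the authors' evaluation code. The function name `fmax`, the dictionary layout, and the threshold grid of 100 steps are assumptions made for illustration; the sketch also assumes that both the true annotations and the predictions have already been propagated to ancestor classes in the ontology.

```python
from typing import Dict, Set


def fmax(true_annots: Dict[str, Set[str]],
         pred_scores: Dict[str, Dict[str, float]],
         steps: int = 100) -> float:
    """Protein-centric Fmax over a grid of score thresholds (hypothetical helper).

    true_annots: gene -> set of true phenotype classes.
    pred_scores: gene -> {phenotype class: score in [0, 1]}.
    """
    # Only genes with at least one true annotation are evaluated.
    genes = [g for g, truth in true_annots.items() if truth]
    if not genes:
        return 0.0

    best = 0.0
    for i in range(1, steps + 1):
        threshold = i / steps
        precisions = []
        recall_sum = 0.0
        for gene in genes:
            truth = true_annots[gene]
            scores = pred_scores.get(gene, {})
            predicted = {c for c, s in scores.items() if s >= threshold}
            true_pos = len(predicted & truth)
            # Precision is averaged only over genes with at least one prediction.
            if predicted:
                precisions.append(true_pos / len(predicted))
            # Recall is averaged over all evaluated genes.
            recall_sum += true_pos / len(truth)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(genes)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Averaging precision only over genes that receive a prediction at the current threshold, while averaging recall over all evaluated genes, follows the usual CAFA convention; other metrics used in CAFA, such as the minimum semantic distance, are not covered by this sketch.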


Author(s):  
Esra Sefik ◽  
Ryan H. Purcell ◽  
Elaine F. Walker ◽  
Gary J. Bassell ◽  
Jennifer G. Mulle ◽  
...  

Abstract: The 3q29 deletion (3q29Del) confers >40-fold increased risk for schizophrenia. However, no single gene in this interval is definitively associated with disease, prompting the hypothesis that neuropsychiatric sequelae emerge upon loss of multiple functionally-connected genes. 3q29 genes are unevenly annotated and the impact of 3q29Del on the human neural transcriptome is unknown. To systematically formulate unbiased hypotheses about molecular mechanisms linking 3q29Del to neuropsychiatric illness, we conducted a systems-level network analysis of the non-pathological adult human cortical transcriptome and generated evidence-based predictions that relate 3q29 genes to novel functions and disease associations. The 21 protein-coding genes located in the interval segregated into seven clusters of highly co-expressed genes, demonstrating both convergent and distributed effects of 3q29Del across the interrogated transcriptomic landscape. Pathway analysis of these clusters indicated involvement in nervous-system functions, including synaptic signaling and organization, as well as core cellular functions, including transcriptional regulation, post-translational modifications, chromatin remodeling and mitochondrial metabolism. Top network-neighbors of 3q29 genes showed significant overlap with known schizophrenia, autism and intellectual disability-risk genes, suggesting that 3q29Del biology is relevant to idiopathic disease. Leveraging “guilt by association”, we propose nine 3q29 genes, including one hub gene, as prioritized drivers of neuropsychiatric risk. These results provide testable hypotheses for experimental analysis on causal drivers and mechanisms of the largest known genetic risk factor for schizophrenia and highlight the study of normal function in non-pathological post-mortem tissue to further our understanding of psychiatric genetics, especially for rare syndromes like 3q29Del, where access to neural tissue from carriers is unavailable or limited.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Esra Sefik ◽  
Ryan H. Purcell ◽  
Katrina Aberizk ◽  
Hallie Averbach ◽  
Emily Black ◽  
...  

Abstract: The 3q29 deletion (3q29Del) confers high risk for schizophrenia and other neurodevelopmental and psychiatric disorders. However, no single gene in this interval is definitively associated with disease, prompting the hypothesis that neuropsychiatric sequelae emerge upon loss of multiple functionally-connected genes. 3q29 genes are unevenly annotated and the impact of 3q29Del on the human neural transcriptome is unknown. To systematically formulate unbiased hypotheses about molecular mechanisms linking 3q29Del to neuropsychiatric illness, we conducted a systems-level network analysis of the non-pathological adult human cortical transcriptome and generated evidence-based predictions that relate 3q29 genes to novel functions and disease associations. The 21 protein-coding genes located in the interval segregated into seven clusters of highly co-expressed genes, demonstrating both convergent and distributed effects of 3q29Del across the interrogated transcriptomic landscape. Pathway analysis of these clusters indicated involvement in nervous-system functions, including synaptic signaling and organization, as well as core cellular functions, including transcriptional regulation, posttranslational modifications, chromatin remodeling, and mitochondrial metabolism. Top network-neighbors of 3q29 genes showed significant overlap with known schizophrenia, autism, and intellectual disability-risk genes, suggesting that 3q29Del biology is relevant to idiopathic disease. Leveraging “guilt by association”, we propose nine 3q29 genes, including one hub gene, as prioritized drivers of neuropsychiatric risk. These results provide testable hypotheses for experimental analysis on causal drivers and mechanisms of the largest known genetic risk factor for schizophrenia and highlight the study of normal function in non-pathological postmortem tissue to further our understanding of psychiatric genetics, especially for rare syndromes like 3q29Del, where access to neural tissue from carriers is unavailable or limited.


2006 ◽  
Vol 8 (17) ◽  
pp. 1-19 ◽  
Author(s):  
Thorsten Enklaar ◽  
Bernhard U. Zabel ◽  
Dirk Prawitt

Beckwith–Wiedemann syndrome (BWS) is a congenital overgrowth condition with an increased risk of developing embryonic tumours, such as Wilms' tumour. The cardinal features are abdominal wall defects, macroglossia and gigantism. BWS is generally sporadic; only 10–15% of cases are familial. A variety of molecular aberrations have been associated with BWS. The only mutations within a gene are loss-of-function mutations in the CDKN1C gene, which codes for an imprinted cell-cycle regulator. CDKN1C mutations appear to be particularly associated with umbilical abnormalities, but not with increased predisposition to Wilms' tumour. In the remaining BWS subgroups, a disturbance of the tight epigenetic regulation of gene expression (patUPD 11p, microdeletions or epimutations) seems to be the cause of the syndrome. Here we describe the clinical presentation of BWS and its dissociation from phenotypically overlapping overgrowth syndromes. We then review the current concepts of causative molecular genetic and epigenetic mechanisms, and discuss future directions of research.


2021 ◽  
Author(s):  
Sarah M Alghamdi ◽  
Paul N Schofield ◽  
Robert Hoehndorf

Computing phenotypic similarity has been shown to be useful in identification of new disease genes and for rare disease diagnostic support. Genotype–phenotype data from orthologous genes in model organisms can compensate for lack of human data to greatly increase genome coverage. Work over the past decade has demonstrated the power of cross-species phenotype comparisons, and several cross-species phenotype ontologies have been developed for this purpose. The relative contribution of different model organisms to identifying disease-associated genes using computational approaches is not yet fully explored. We use methods based on phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in different model organisms to disease-associated phenotypes in humans. Semantic machine learning methods are used to measure how much different model organisms contribute to the identification of known human gene–disease associations. We find that only mouse phenotypes can accurately predict human gene–disease associations. Our work has implications for the future development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation.


2021 ◽  
Vol 12 ◽  
Author(s):  
Jiahui Zhang ◽  
Taijie Jin ◽  
Ivona Aksentijevich ◽  
Qing Zhou

RIPK1 (receptor-interacting serine/threonine-protein kinase 1) is a key molecule for mediating apoptosis, necroptosis, and inflammatory pathways downstream of death receptors (DRs) and pattern recognition receptors (PRRs). RIPK1 functions are regulated by multiple post-translational modifications (PTMs), including ubiquitination, phosphorylation, and caspase-8-mediated cleavage. Dysregulation of these modifications leads to an immune deficiency or a hyperinflammatory disease in humans. Over the last decades, numerous studies of RIPK1 function in model organisms have provided insights into the molecular mechanisms of RIPK1's role in the maintenance of immune homeostasis. However, the physiological role of RIPK1 in the regulation of cell survival and cell death signaling in humans remained elusive. Recently, RIPK1 loss-of-function (LoF) mutations and cleavage-deficient mutations have been identified in humans. This review discusses the molecular pathogenesis of RIPK1 deficiency and cleavage-resistant RIPK1-induced autoinflammatory (CRIA) disorders and summarizes the clinical manifestations of the respective diseases to help with the identification of new patients.


2017 ◽  
Author(s):  
Nadezhda A. Potapova ◽  
Maria A. Andrianova ◽  
Georgii A. Bazykin ◽  
Alexey S. Kondrashov

Abstract: A gene which carries a bona fide loss-of-function mutation effectively becomes a functionless pseudogene, free from selective constraint. However, there are a number of molecular mechanisms that may lead to at least a partial preservation of the function of genes carrying even drastic alleles. We performed a direct measurement of the strength of negative selection acting on nonsense alleles of protein-coding genes in the Zambian population of Drosophila melanogaster. Within those exons that carry nonsense mutations, negative selection, assayed by the ratio of missense over synonymous nucleotide diversity levels, appears to be absent, consistent with total loss of function. In other exons of nonsense alleles, negative selection was deeply relaxed but likely not completely absent, and the per-site number of missense alleles declined significantly with the distance from the premature stop codon. This pattern may be due to alternative splicing, which preserves the function of some isoforms of nonsense alleles of genes.
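The ratio of missense over synonymous nucleotide diversity mentioned above can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes biallelic SNPs, a fixed number of sampled chromosomes, and externally supplied counts of missense and synonymous sites. The function names and parameters are hypothetical.

```python
from typing import Sequence


def nucleotide_diversity(alt_counts: Sequence[int],
                         n_chrom: int,
                         n_sites: float) -> float:
    """Per-site nucleotide diversity (pi) from biallelic SNP allele counts.

    alt_counts: alternative-allele count at each polymorphic site.
    n_chrom: number of sampled chromosomes (assumed equal at every site).
    n_sites: number of sites of this class that were surveyed.
    """
    if n_chrom < 2 or n_sites <= 0:
        raise ValueError("need at least two chromosomes and one site")
    ordered_pairs = n_chrom * (n_chrom - 1)
    # Each SNP contributes the fraction of chromosome pairs that differ at it.
    pairwise_diff = sum(2 * k * (n_chrom - k) / ordered_pairs for k in alt_counts)
    return pairwise_diff / n_sites


def missense_to_synonymous_ratio(missense_counts: Sequence[int],
                                 synonymous_counts: Sequence[int],
                                 n_chrom: int,
                                 missense_sites: float,
                                 synonymous_sites: float) -> float:
    """Ratio of missense to synonymous diversity (hypothetical helper)."""
    pi_missense = nucleotide_diversity(missense_counts, n_chrom, missense_sites)
    pi_synonymous = nucleotide_diversity(synonymous_counts, n_chrom, synonymous_sites)
    if pi_synonymous == 0:
        raise ValueError("synonymous diversity is zero; ratio undefined")
    return pi_missense / pi_synonymous
```

Under the usual interpretation, a ratio well below 1 indicates that missense variants are being removed by negative selection, while a ratio near 1 is what would be expected if selection on the protein sequence were absent.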


2016 ◽  
Author(s):  
Ricardo Mallarino ◽  
Tess A. Linden ◽  
Catherine R. Linnen ◽  
Hopi E. Hoekstra

Abstract: A central goal of evolutionary biology is to understand the molecular mechanisms underlying phenotypic adaptation. While the contributions of protein-coding and cis-regulatory mutations to adaptive traits have been well documented, additional sources of variation (such as the production of alternative RNA transcripts, or isoforms, from a single gene) have been understudied. Here, we focus on the pigmentation gene Agouti, known to express multiple alternative transcripts, to investigate the role of isoform usage in the evolution of cryptic color phenotypes in deer mice (genus Peromyscus). We first characterize the Agouti isoforms expressed in the Peromyscus skin and find two novel isoforms not previously identified in Mus. Next, we show that a locally adapted light-colored population of P. maniculatus living on the Nebraska Sand Hills shows an up-regulation of a single Agouti isoform, termed 1C, compared to their ancestral dark-colored conspecifics. Using in vitro assays, we show that this preference for isoform 1C may be driven by isoform-specific differences in translation. In addition, using an admixed population of wild-caught mice, we find that variation in overall Agouti expression maps to a region near exon 1C, which also has patterns of nucleotide variation consistent with strong positive selection. Finally, we show that the independent evolution of cryptic light pigmentation in a different species, P. polionotus, has been driven by a preference for the same Agouti isoform. Together, these findings present an example of the role of alternative transcript processing in adaptation and demonstrate molecular convergence at the level of isoform regulation.


2021 ◽  
Author(s):  
Shrey Gandhi ◽  
Anika Witten ◽  
Federica deMajo ◽  
Martijn Gilbers ◽  
Jos Maessen ◽  
...  

Abstract: Cardiovascular disease (CVD) remains the leading cause of death worldwide. A deeper characterization of the regional transcription patterns within different heart chambers may help improve our understanding of the molecular mechanisms involved in the function of the heart as well as our ability to develop novel therapeutic strategies. Here, we determined differentially expressed protein-coding, long non-coding (lncRNA) and circular RNA (CircRNA) genes within various heart chambers across seven vertebrate species. We identified chamber-specific genes, lncRNAs and pathways that are evolutionarily conserved in vertebrates. Further, we identified lncRNA homologs based on sequence, secondary structure, synteny and expressional conservation. Interestingly, most lncRNAs were found to be syntenically conserved. Various factors affect the co-expression patterns of transcripts including (i) genomic overlap, (ii) strandedness and (iii) transcript biotype. We also provide a catalogue of CircRNAs which are abundantly expressed across vertebrate hearts. Finally, we established a repository called EvoACTG (http://evoactg.uni-muenster.de/), which provides information about the conserved expression patterns for both PC genes and non-coding RNAs (ncRNAs) in the various heart chambers, and may serve as a community resource for investigators interested in the (patho)-physiology of CVD. We believe that this study will inform researchers working in the field of cardiovascular biology to explore the conserved yet intertwined nature of both coding and non-coding cardiac transcriptome across various popular model organisms in CVD research.


2019 ◽  
Author(s):  
Nicola Whiffin ◽  
Irina M. Armean ◽  
Aaron Kleinman ◽  
Jamie L. Marshall ◽  
Eric V. Minikel ◽  
...  

Abstract: Human genetic variants causing loss-of-function (LoF) of protein-coding genes provide natural in vivo models of gene inactivation, which are powerful indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes [1,2]. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson’s disease [3,4], suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. Whilst preclinical studies in model organisms have raised some on-target toxicity concerns [5–8], the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here we systematically analyse LoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD) [9] and over 4 million participants in the 23andMe genotyped dataset, to assess their impact at a molecular and phenotypic level. After thorough variant curation, we identify 1,358 individuals with high-confidence predicted LoF variants in LRRK2, several with experimental validation. We show that heterozygous LoF of LRRK2 reduces LRRK2 protein level by ~50% but is not associated with reduced life expectancy, or with any specific phenotype or disease state. These data suggest that therapeutics that downregulate LRRK2 levels or kinase activity by up to 50% are unlikely to have major on-target safety liabilities. Our results demonstrate the value of large-scale genomic databases and phenotyping of human LoF carriers for target validation in drug discovery.

