How much do model organism phenotypes contribute to the computational identification of human disease genes?

Computing phenotypic similarity has been shown to be useful in identification of new disease genes and for rare disease diagnostic support. Genotype–phenotype data from orthologous genes in model organisms can compensate for lack of human data to greatly increase genome coverage. Work over the past decade has demonstrated the power of cross-species phenotype comparisons, and several cross-species phenotype ontologies have been developed for this purpose. The relative contribution of different model organisms to identifying disease-associated genes using computational approaches is not yet fully explored. We use methods based on phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in different model organisms to disease-associated phenotypes in humans. Semantic machine learning methods are used to measure how much different model organisms contribute to the identification of known human gene–disease associations. We find that only mouse phenotypes can accurately predict human gene–disease associations. Our work has implications for the future development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation.
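The idea of relating model organism phenotypes to disease phenotypes through an ontology can be illustrated with a minimal sketch. It is not the semantic machine learning method of the paper: it swaps in a plain Jaccard index over ancestor closures, and the toy ontology, term names, and function names are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical toy ontology (an assumption): term -> set of direct parent terms.
PARENTS: Dict[str, Set[str]] = {
    "abnormal_heart": {"abnormal_cardiovascular"},
    "abnormal_vessel": {"abnormal_cardiovascular"},
    "abnormal_cardiovascular": {"phenotype"},
    "abnormal_eye": {"phenotype"},
    "phenotype": set(),
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(terms: Set[str]) -> Set[str]:
    """Union of the ancestor sets of every term in `terms`."""
    closed: Set[str] = set()
    for term in terms:
        closed |= ancestors(term)
    return closed


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of the two ancestor-closed phenotype sets."""
    closed_a, closed_b = closure(a), closure(b)
    union = closed_a | closed_b
    return len(closed_a & closed_b) / len(union) if union else 0.0
```

Under these assumptions, a gene annotated with a heart phenotype scores higher against a disease with a vessel phenotype than against one with an eye phenotype, because the first pair shares a more specific common ancestor.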

Losses of human disease-associated genes in placental mammals

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqz012 ◽

2019 ◽

Vol 2 (1) ◽

Cited By ~ 4

Author(s):

Virag Sharma ◽

Michael Hiller

Keyword(s):

Uric Acid ◽

Human Disease ◽

High Serum ◽

Disease Genes ◽

Loss Of Function ◽

Disease Symptoms ◽

Poor Vision ◽

Disease Associated Genes ◽

Human Disease Genes ◽

Respective Species

Abstract We systematically investigate whether losses of human disease-associated genes occurred in other mammals during evolution. We first show that genes lost in any of 62 non-human mammals generally have a lower degree of pleiotropy, and are highly depleted in essential and disease-associated genes. Despite this under-representation, we discovered multiple genes implicated in human disease that are truly lost in non-human mammals. In most cases, traits resembling human disease symptoms are present but not deleterious in gene-loss species, exemplified by losses of genes causing human eye or teeth disorders in poor-vision or enamel-less mammals. We also found widespread losses of PCSK9 and CETP genes, where loss-of-function mutations in humans protect from atherosclerosis. Unexpectedly, we discovered losses of disease genes (TYMP, TBX22, ABCG5, ABCG8, MEFV, CTSE) where deleterious phenotypes do not manifest in the respective species. A remarkable example is the uric acid-degrading enzyme UOX, which we found to be inactivated in elephants and manatees. While UOX loss in hominoids led to high serum uric acid levels and a predisposition for gout, elephants and manatees exhibit low uric acid levels, suggesting alternative ways of metabolizing uric acid. Together, our results highlight numerous mammals that are ‘natural knockouts’ of human disease genes.

CoCoCoNet: Conserved and Comparative Co-expression Across a Diverse Set of Species

10.1101/2020.04.21.053900 ◽

2020 ◽

Author(s):

John Lee ◽

Manthan Shah ◽

Sara Ballouz ◽

Megan Crow ◽

Jesse Gillis

Keyword(s):

Alternative Model ◽

Model Systems ◽

Model Organisms ◽

Disease Genes ◽

Gene Module ◽

Link Type ◽

Network Properties ◽

Gene Modules ◽

Human Disease Genes ◽

Insight Into

Abstract Co-expression analysis has provided insight into gene function in organisms from Arabidopsis to zebrafish. Comparison across species has the potential to enrich these results, for example by prioritizing among candidate human disease genes based on their network properties, or by finding alternative model systems where their co-expression is conserved. Here, we present CoCoCoNet as a tool for identifying conserved gene modules and comparing co-expression networks. CoCoCoNet is a resource for both data and methods, providing gold-standard networks and sophisticated tools for on-the-fly comparative analyses across 14 species. We show how CoCoCoNet can be used in two use cases. In the first, we demonstrate deep conservation of a nucleolus gene module across very divergent organisms, and in the second, we show how the heterogeneity of autism mechanisms in humans can be broken down by functional groups, and translated to model organisms. CoCoCoNet is free to use and available to all at https://milton.cshl.edu/CoCoCoNet, with data and R scripts available at ftp://milton.cshl.edu/data.

Exploring Aβ Proteotoxicity and Therapeutic Candidates Using Drosophila melanogaster

International Journal of Molecular Sciences ◽

10.3390/ijms221910448 ◽

2021 ◽

Vol 22 (19) ◽

pp. 10448

Author(s):

Greta Elovsson ◽

Liza Bergkvist ◽

Ann-Christin Brorsson

Keyword(s):

Alzheimer’s Disease ◽

Drosophila Melanogaster ◽

Model Organism ◽

Amyloid Β ◽

Therapeutic Strategies ◽

Disease Genes ◽

Amyloid Β Peptide ◽

Drug Candidates ◽

Human Disease Genes

Alzheimer’s disease is a widespread and devastating neurological disorder associated with proteotoxic events caused by the misfolding and aggregation of the amyloid-β peptide. In the search for therapeutic strategies to combat this disease, Drosophila melanogaster has proved to be an excellent model organism for uncovering anti-proteotoxic candidates, owing to its outstanding genetic toolbox and the resemblance of its genes to human disease genes. In this review, we highlight the use of Drosophila melanogaster both to study the proteotoxicity of the amyloid-β peptide and to screen for drug candidates. Expanding the knowledge of how the etiology of Alzheimer’s disease is related to proteotoxicity and how drugs can be used to block disease progression will hopefully shed further light on the field in the search for disease-modifying treatments.

A surprising abundance of human disease genes in a simple “basal” animal, the starlet sea anemone (Nematostella vectensis)

Genome ◽

10.1139/g07-045 ◽

2007 ◽

Vol 50 (7) ◽

pp. 689-692 ◽

Cited By ~ 19

Author(s):

James C. Sullivan ◽

John R. Finnerty

Keyword(s):

Experimental Model ◽

Human Disease ◽

Human Diseases ◽

Model Organisms ◽

Nematostella Vectensis ◽

Disease Genes ◽

Invertebrate Animal ◽

Invertebrate Animals ◽

Invertebrate Model ◽

Human Disease Genes

Invertebrate animals have provided important insights into the mechanisms of, and treatment for, numerous human diseases. A surprisingly high proportion of genes underlying human disease are present in the genome of a simple, evolutionarily basal invertebrate animal, Nematostella vectensis, including some genes that are absent in established invertebrate model organisms. This, together with the laboratory tractability and regenerative capability of N. vectensis, recommends the species as an important new experimental model for the study of genes underlying human disease.

DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier

10.1101/839332 ◽

2019 ◽

Author(s):

Maxat Kulmanov ◽

Robert Hoehndorf

Keyword(s):

Research Group ◽

Molecular Mechanisms ◽

State Of The Art ◽

Single Gene ◽

Hierarchical Classification ◽

Model Organisms ◽

Loss Of Function ◽

Protein Coding ◽

Disease Associations ◽

Molecular Aberrations

Abstract Motivation: Predicting the phenotypes resulting from molecular perturbations is one of the key challenges in genetics. Both forward and reverse genetic screens are employed to identify the molecular mechanisms underlying phenotypes and disease, and these have resulted in a large number of genotype–phenotype associations being available for humans and model organisms. Combined with recent advances in machine learning, it may now be possible to predict human phenotypes resulting from particular molecular aberrations. Results: We developed DeepPheno, a neural network based hierarchical multi-class multi-label classification method for predicting the phenotypes resulting from complete loss-of-function in single genes. DeepPheno uses the functional annotations of gene products to predict the phenotypes resulting from a loss-of-function; additionally, we employ a two-step procedure in which we predict these functions first and then predict phenotypes. Prediction of phenotypes is ontology-based and we propose a novel ontology-based classifier suitable for very large hierarchical classification tasks. These methods allow us to predict phenotypes associated with any known protein-coding gene. We evaluate our approach using evaluation metrics established by the CAFA challenge and compare with top performing CAFA2 methods as well as several state-of-the-art phenotype prediction approaches, demonstrating the improvement of DeepPheno over state-of-the-art methods. Furthermore, we show that predictions generated by DeepPheno are applicable to predicting gene–disease associations based on comparing phenotypes, and that a large number of new predictions made by DeepPheno interact with a gene that is already associated with the predicted phenotype. Availability: https://github.com/bio-ontology-research-group/[email protected]

CoCoCoNet: conserved and comparative co-expression across a diverse set of species

Nucleic Acids Research ◽

10.1093/nar/gkaa348 ◽

2020 ◽

Vol 48 (W1) ◽

pp. W566-W571 ◽

Cited By ~ 3

Author(s):

John Lee ◽

Manthan Shah ◽

Sara Ballouz ◽

Megan Crow ◽

Jesse Gillis

Keyword(s):

Alternative Model ◽

Model Systems ◽

Model Organisms ◽

Disease Genes ◽

Gene Module ◽

Network Properties ◽

Gene Modules ◽

Conserved Gene ◽

Human Disease Genes ◽

Insight Into

Abstract Co-expression analysis has provided insight into gene function in organisms from Arabidopsis to zebrafish. Comparison across species has the potential to enrich these results, for example by prioritizing among candidate human disease genes based on their network properties or by finding alternative model systems where their co-expression is conserved. Here, we present CoCoCoNet as a tool for identifying conserved gene modules and comparing co-expression networks. CoCoCoNet is a resource for both data and methods, providing gold standard networks and sophisticated tools for on-the-fly comparative analyses across 14 species. We show how CoCoCoNet can be used in two use cases. In the first, we demonstrate deep conservation of a nucleolus gene module across very divergent organisms, and in the second, we show how the heterogeneity of autism mechanisms in humans can be broken down by functional groups and translated to model organisms. CoCoCoNet is free to use and available to all at https://milton.cshl.edu/CoCoCoNet, with data and R scripts available at ftp://milton.cshl.edu/data.

Model organisms contribute to diagnosis and discovery in the undiagnosed diseases network: current state and a future vision

Orphanet Journal of Rare Diseases ◽

10.1186/s13023-021-01839-9 ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Dustin Baldridge ◽

Michael F. Wangler ◽

Angela N. Bowman ◽

Shinya Yamamoto ◽

...

Keyword(s):

Rare Diseases ◽

Model Organism ◽

Position Statement ◽

Model Organisms ◽

Multidisciplinary Teams ◽

Disease Genes ◽

Functional Studies ◽

Underlying Mechanisms ◽

Undiagnosed Diseases ◽

Undiagnosed Diseases Network

Abstract Decreased sequencing costs have led to an explosion of genetic and genomic data. These data have revealed thousands of candidate human disease variants. Establishing which variants cause phenotypes and diseases, however, has remained challenging. Significant progress has been made, including advances by the National Institutes of Health (NIH)-funded Undiagnosed Diseases Network (UDN). However, 6000–13,000 additional disease genes remain to be identified. The continued discovery of rare diseases and their genetic underpinnings provides benefits to affected patients, of whom there are more than 400 million worldwide, and also advances understanding of the mechanisms of more common diseases. Platforms employing model organisms enable discovery of novel gene-disease relationships, help establish variant pathogenicity, and often lead to the exploration of underlying mechanisms of pathophysiology that suggest new therapies. The Model Organism Screening Center (MOSC) of the UDN is a unique resource dedicated to utilizing informatics and functional studies in model organisms, including worm (Caenorhabditis elegans), fly (Drosophila melanogaster), and zebrafish (Danio rerio), to aid in diagnosis. The MOSC has directly contributed to the diagnosis of challenging cases, including multiple patients with complex, multi-organ phenotypes. In addition, the MOSC provides a framework for how basic scientists and clinicians can collaborate to drive diagnoses. Customized experimental plans take into account patient presentations, specific genes and variant(s), and appropriateness of each model organism for analysis. The MOSC also generates bioinformatic and experimental tools and reagents for the wider scientific community. Two elements of the MOSC that have been instrumental in its success are (1) multidisciplinary teams with expertise in variant bioinformatics and in human and model organism genetics, and (2) mechanisms for ongoing communication with clinical teams.
Here we provide a position statement regarding the central role of model organisms for continued discovery of disease genes, and we advocate for the continuation and expansion of MOSC-type research entities as a Model Organisms Network (MON) to be funded through grant applications submitted to the NIH, family groups focused on specific rare diseases, other philanthropic organizations, industry partnerships, and other sources of support.

DeepPheno: Predicting single gene loss-of-function phenotypes using an ontology-aware hierarchical classifier

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008453 ◽

2020 ◽

Vol 16 (11) ◽

pp. e1008453

Author(s):

Maxat Kulmanov ◽

Robert Hoehndorf

Keyword(s):

Molecular Mechanisms ◽

Single Gene ◽

Hierarchical Classification ◽

Model Organisms ◽

Loss Of Function ◽

Protein Coding ◽

Functional Annotations ◽

Step Procedure ◽

Disease Associations ◽

Molecular Aberrations

Predicting the phenotypes resulting from molecular perturbations is one of the key challenges in genetics. Both forward and reverse genetic screens are employed to identify the molecular mechanisms underlying phenotypes and disease, and these have resulted in a large number of genotype–phenotype associations being available for humans and model organisms. Combined with recent advances in machine learning, it may now be possible to predict human phenotypes resulting from particular molecular aberrations. We developed DeepPheno, a neural network based hierarchical multi-class multi-label classification method for predicting the phenotypes resulting from loss-of-function in single genes. DeepPheno uses the functional annotations of gene products to predict the phenotypes resulting from a loss-of-function; additionally, we employ a two-step procedure in which we predict these functions first and then predict phenotypes. Prediction of phenotypes is ontology-based and we propose a novel ontology-based classifier suitable for very large hierarchical classification tasks. These methods allow us to predict phenotypes associated with any known protein-coding gene. We evaluate our approach using evaluation metrics established by the CAFA challenge and compare with top performing CAFA2 methods as well as several state-of-the-art phenotype prediction approaches, demonstrating the improvement of DeepPheno over established methods. Furthermore, we show that predictions generated by DeepPheno are applicable to predicting gene–disease associations based on comparing phenotypes, and that a large number of new predictions made by DeepPheno have recently been added to phenotype databases.
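What "ontology-aware" hierarchical classification requires can be shown with a minimal sketch of one common consistency rule: a term should score at least as high as any of its subclasses. This is an illustration only and not necessarily how DeepPheno's classifier is built. The toy hierarchy, the raw scores, and the function name `propagate_scores` are hypothetical, and the hierarchy is assumed to be acyclic.

```python
from typing import Dict, List

# Hypothetical toy hierarchy (an assumption): term -> direct subclasses.
CHILDREN: Dict[str, List[str]] = {
    "phenotype": ["abnormal_cardiovascular", "abnormal_eye"],
    "abnormal_cardiovascular": ["abnormal_heart"],
    "abnormal_heart": [],
    "abnormal_eye": [],
}


def propagate_scores(raw: Dict[str, float]) -> Dict[str, float]:
    """Give each term the maximum of its own raw score and its descendants' scores."""
    final: Dict[str, float] = {}

    def visit(term: str) -> float:
        # Memoised depth-first traversal; assumes the hierarchy has no cycles.
        if term not in final:
            child_scores = [visit(child) for child in CHILDREN.get(term, [])]
            final[term] = max([raw.get(term, 0.0)] + child_scores)
        return final[term]

    for term in CHILDREN:
        visit(term)
    return final
```

With this rule, a confident prediction for a specific phenotype automatically implies an equally confident prediction for every more general phenotype above it.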

The Xenopus Phenotype Ontology: bridging model organism phenotype data to human health and development.

10.1101/2021.11.12.467727 ◽

2021 ◽

Author(s):

Malcolm E Fisher ◽

Erik J Segerdell ◽

Nicolas Matentzoglu ◽

Mardi J Nenni ◽

Joshua D Fortriede ◽

...

Keyword(s):

Design Patterns ◽

Model Organism ◽

Model Organisms ◽

Phenotype Ontology ◽

Phenotypic Data ◽

Anatomy Ontology ◽

Ontology Language ◽

Phenotype Data ◽

Vertebrate Model ◽

Research Continuum

Background: Ontologies of precisely defined, controlled vocabularies are essential to curate the results of biological experiments such that the data are machine searchable, can be computationally analyzed, and are interoperable across the biomedical research continuum. There is also an increasing need for methods to interrelate phenotypic data easily and accurately from experiments in animal models with human development and disease. Results: Here we present the Xenopus Phenotype Ontology (XPO) to annotate phenotypic data from experiments in Xenopus, one of the major vertebrate model organisms used to study gene function in development and disease. The XPO implements design patterns from the Unified Phenotype Ontology (uPheno), and the principles outlined by the Open Biological and Biomedical Ontologies (OBO Foundry) to maximize interoperability with other species and facilitate ongoing ontology management. Constructed in Web Ontology Language (OWL), the XPO combines the existing uPheno library of ontology design patterns with additional terms from the Xenopus Anatomy Ontology (XAO), the Phenotype and Trait Ontology (PATO) and the Gene Ontology (GO). The integration of these different ontologies into the XPO enables rich phenotypic curation, whilst the uPheno bridging axioms allow phenotypic data from Xenopus experiments to be related to phenotype data from other model organisms and human disease. Moreover, the simple post-composed uPheno design patterns facilitate ongoing XPO development as the generation of new terms and classes of terms can be substantially automated. Conclusions: The XPO serves as an example of current best practices to help overcome many of the inherent challenges in harmonizing phenotype data between different species.
The XPO currently consists of approximately 22,000 terms and is being used to curate phenotypes by Xenbase, the Xenopus Model Organism Knowledgebase, forming a standardized corpus of genotype-phenotype data that can be directly related to other uPheno compliant resources.

Is the average shortest path length of gene set a reflection of their biological relatedness?

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016600027 ◽

2016 ◽

Vol 14 (06) ◽

pp. 1660002 ◽

Cited By ~ 6

Author(s):

Varsha Embar ◽

Adam Handen ◽

Madhavi K. Ganapathiraju

Keyword(s):

Shortest Path ◽

Path Length ◽

Gene Expression Analysis ◽

Average Distance ◽

Random Permutation ◽

Disease Genes ◽

Human Interactome ◽

Control Sets ◽

Disease Associations ◽

Disease Associated Genes

When a set of genes is identified as related to a disease, say through gene expression analysis, it is common to examine the average distance among their protein products in the human interactome as a measure of the biological relatedness of these genes. The reasoning is that genes associated with a disease would tend to be functionally related, and that functionally related genes would be closely connected to each other in the interactome. Typically, the average shortest path length (ASPL) of disease genes (although referred to as genes in the context of disease associations, the interactions are among the protein products of these genes) is compared to the ASPL of randomly selected genes or to the ASPL in a randomly permuted network. We examined whether the ASPL of a set of genes is indeed a good measure of biological relatedness or whether it is simply a characteristic of the degree distribution of those genes. We examined the ASPL of gene sets of some disease and pathway associations and compared them to the ASPL of three types of randomly selected control sets: uniform selection from the entire proteome, degree-matched selection, and random permutation of the network. We found that disease-associated genes and their degree-matched random genes have comparable ASPL. In other words, ASPL is a characteristic of the degree of the genes and the network topology, and not that of functional coherence.
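The comparison described above, ASPL of a gene set against a degree-matched control set, can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the authors' code: the toy network `TOY`, the gene names, and all function names are hypothetical, and unreachable pairs are simply skipped.

```python
import random
from collections import deque
from itertools import combinations
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]

# Hypothetical toy interactome (an assumption): a simple path A-B-C-D-E.
TOY: Graph = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"},
}


def shortest_path_length(graph: Graph, source: str, target: str) -> Optional[int]:
    """Breadth-first search distance; None if the target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def aspl(graph: Graph, genes: Set[str]) -> float:
    """Average shortest path length over all connected pairs in `genes`."""
    lengths: List[int] = []
    for a, b in combinations(sorted(genes), 2):
        dist = shortest_path_length(graph, a, b)
        if dist is not None:
            lengths.append(dist)
    if not lengths:
        raise ValueError("no connected gene pairs")
    return sum(lengths) / len(lengths)


def degree_matched_sample(graph: Graph, genes: Set[str], rng: random.Random) -> Set[str]:
    """Draw one distinct node of equal degree for every gene in `genes`."""
    chosen: Set[str] = set()
    for gene in sorted(genes):
        degree = len(graph[gene])
        candidates = sorted(
            n for n in graph if len(graph[n]) == degree and n not in chosen
        )
        chosen.add(rng.choice(candidates))
    return chosen
```

In practice one would draw many degree-matched control sets and compare the gene set's ASPL against the resulting distribution; a single draw is shown here only to keep the sketch short.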
