Phen2Gene: Rapid Phenotype-Driven Gene Prioritization for Rare Diseases

2019 ◽  
Author(s):  
Mengge Zhao ◽  
James M. Havrilla ◽  
Li Fang ◽  
Ying Chen ◽  
Jacqueline Peng ◽  
...  

Abstract Human Phenotype Ontology (HPO) terms are increasingly used in diagnostic settings to aid in the characterization of patient phenotypes. The HPO annotation database is updated frequently and can provide detailed phenotype knowledge on various human diseases, and many HPO terms are now mapped to candidate causal genes with binary relationships. To further improve the genetic diagnosis of rare diseases, we incorporated these HPO annotations, gene-disease databases, and gene-gene databases in a probabilistic model to build a novel HPO-driven gene prioritization tool, Phen2Gene. Phen2Gene accesses a database built upon this information called the HPO2Gene Knowledgebase (H2GKB), which provides weighted and ranked gene lists for every HPO term. Phen2Gene is then able to access the H2GKB for patient-specific lists of HPO terms or PhenoPackets descriptions supported by GA4GH (http://phenopackets.org/), calculate a prioritized gene list based on a probabilistic model, and output gene-disease relationships with great accuracy. Phen2Gene outperforms existing gene prioritization tools in speed, and acts as a real-time phenotype-driven gene prioritization tool to aid the clinical diagnosis of rare undiagnosed diseases. In addition to a command line tool released under the MIT license (https://github.com/WGLab/Phen2Gene), we also developed a web server and web service (https://phen2gene.wglab.org/) for running the tool via web interface or RESTful API queries. Finally, we have curated a large amount of benchmarking data for phenotype-to-gene tools involving 197 patients across 76 scientific articles and 85 patients’ de-identified HPO term data from CHOP.

2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Mengge Zhao ◽  
James M Havrilla ◽  
Li Fang ◽  
Ying Chen ◽  
Jacqueline Peng ◽  
...  

Abstract Human Phenotype Ontology (HPO) terms are increasingly used in diagnostic settings to aid in the characterization of patient phenotypes. The HPO annotation database is updated frequently and can provide detailed phenotype knowledge on various human diseases, and many HPO terms are now mapped to candidate causal genes with binary relationships. To further improve the genetic diagnosis of rare diseases, we incorporated these HPO annotations, gene–disease databases and gene–gene databases in a probabilistic model to build a novel HPO-driven gene prioritization tool, Phen2Gene. Phen2Gene accesses a database built upon this information called the HPO2Gene Knowledgebase (H2GKB), which provides weighted and ranked gene lists for every HPO term. Phen2Gene is then able to access the H2GKB for patient-specific lists of HPO terms or PhenoPacket descriptions supported by GA4GH (http://phenopackets.org/), calculate a prioritized gene list based on a probabilistic model and output gene–disease relationships with great accuracy. Phen2Gene outperforms existing gene prioritization tools in speed and acts as a real-time phenotype-driven gene prioritization tool to aid the clinical diagnosis of rare undiagnosed diseases. In addition to a command line tool released under the MIT license (https://github.com/WGLab/Phen2Gene), we also developed a web server and web service (https://phen2gene.wglab.org/) for running the tool via web interface or RESTful API queries. Finally, we have curated a large amount of benchmarking data for phenotype-to-gene tools involving 197 patients across 76 scientific articles and 85 patients’ de-identified HPO term data from the Children’s Hospital of Philadelphia.
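
The idea in the abstract above (a ranked, weighted gene list per HPO term, merged into one patient-level ranking) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the term identifiers and gene names are placeholders, the toy scores are invented, and the plain weighted sum stands in for Phen2Gene's probabilistic model, which the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical per-term gene scores. The term IDs and gene names are
# placeholders, not real HPO terms or H2GKB content.
TOY_KB = {
    "HP:X000001": {"GENE_A": 0.9, "GENE_B": 0.4},
    "HP:X000002": {"GENE_B": 0.8, "GENE_C": 0.5},
    "HP:X000003": {"GENE_A": 0.3, "GENE_C": 0.2},
}


def prioritize(hpo_terms, kb, term_weights=None):
    """Rank genes by the weighted sum of their per-term scores.

    This weighted sum is an illustrative stand-in, not the published model.
    """
    term_weights = term_weights or {}
    totals = defaultdict(float)
    for term in hpo_terms:
        weight = term_weights.get(term, 1.0)  # unweighted terms count once
        for gene, score in kb.get(term, {}).items():
            totals[gene] += weight * score
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


ranking = prioritize(["HP:X000001", "HP:X000002"], TOY_KB)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

A gene supported by several of the patient's terms (here `GENE_B`) rises above a gene with one strong association, which is the intuition behind combining per-term lists.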


2021 ◽  
Author(s):  
Eyal Simonovsky ◽  
Moran Sharon ◽  
Maya Ziv ◽  
Omry Mauer ◽  
Idan Hekselman ◽  
...  

Abstract Genetic studies of Mendelian and rare diseases face the critical challenges of identifying pathogenic gene variants and their modes-of-action. Previous efforts rarely utilized the tissue-selective manifestation of these diseases for their elucidation. Here we introduce an interpretable machine learning (ML) platform that utilizes heterogeneous and large-scale tissue-aware datasets of human genes, and rigorously, concurrently and quantitatively assesses hundreds of candidate mechanisms per disease. The resulting tissue-aware ML platform is applicable in gene-specific, tissue-specific, or patient-specific modes. Application of the platform to selected Mendelian disease genes pinpointed mechanisms that lead to tissue-specific disease manifestation. When applied jointly to diseases that manifest in the same tissue, the models revealed common known and previously underappreciated factors that underlie tissue-selective disease manifestation. Lastly, we harnessed our ML platform toward genetic diagnosis of tissue-selective rare diseases. Patient-specific models of candidate disease-causing genes from 50 patients successfully prioritized the pathogenic gene in 86% of the cases, implying that the tissue-selectivity of rare diseases aids in filtering out unlikely candidate genes. Thus, interpretable tissue-aware ML models can boost mechanistic understanding and genetic diagnosis of tissue-selective heritable diseases. A webserver supporting gene prioritization is available at https://netbio.bgu.ac.il/trace/.


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. SCI-38-SCI-38 ◽  
Author(s):  
Kathleen Freson

Abstract Inherited platelet disorders (IPDs) comprise a heterogeneous group of disorders with a complex genetic etiology, characterized by impairments in platelet formation, morphology and function. Since the implementation of next generation sequencing (NGS) in 2009, the gene list for diagnosis of IPDs rapidly expanded from 39 to 53 genes. A diagnostic high-throughput targeted NGS platform (referred to as ThromboGenomics; www.thrombogenomics.org.uk) was very recently described as an affordable DNA-based test of 76 genes to diagnose patients 'suspected of having a known inherited platelet, thrombotic or bleeding disorder' (BPD). When the phenotype is strongly indicative of the presence of a particular disease etiology but the variants are unknown, sensitivity remains high (>90% based on 61 samples), while patients included with an uncertain disease, such as delta storage pool disease, mostly receive no genetic diagnosis (a genetic diagnosis was obtained in only 10%). Such IPDs should be included in gene discovery NGS programs such as the BRIDGE-BPD2 study. For this study, whole genome sequencing results of the DNA samples of nearly 1000 probands with uncharacterized IPDs, analyzed using assigned Human Phenotype Ontology (HPO) terms, have helped to identify pathogenic variants in almost 20% of cases. New clustering algorithms to group cases with similar phenotypes have been used to identify two novel IPD genes (DIAPH1 and SRC [2]) and several putative ones. Still, many IPD patients do not receive a genetic diagnosis. A majority of cases either harbor pathogenic variants in unknown genes or in regulatory regions or are the result of a digenic mode of inheritance. NGS combined with data from RNA-seq, ChIP-seq, gene regulatory network analysis, epigenome, proteomics and mouse knock-out studies, amongst others, will also help explore the non-coding regulatory space and gene-gene interactions.
Given the existence of many non-pathogenic variants in any individual's genome, the main challenge faced by researchers when interpreting NGS data of an IPD case is determining which variants are causing the disorder [3]. Interpreting the functional consequences of novel rare variants is not easy, and it is extremely important to apply rigorous standards when assigning pathogenicity. Clinical genomic data are the same as other complex medical data and should be interpreted by a multidisciplinary team, typically comprising a statistical geneticist, a clinical geneticist, and genetic counselors, who have the skills to interpret these results in the context of the test methodology, the theoretical background of genetics, Bayesian reasoning, and a myriad of other factors.
1. Simeoni I, Stephens JC, Hu F, et al. A comprehensive high-throughput sequencing test for the diagnosis of inherited bleeding, thrombotic and platelet disorders. Blood. 2016;127:279.
2. Turro E, Greene D, Wijgaerts A, et al. A dominant gain-of-function mutation in universal tyrosine kinase SRC causes thrombocytopenia, myelofibrosis, bleeding, and bone pathologies. Sci Transl Med. 2016;8:328.
3. Lentaigne C, Freson K, Laffan MA, et al. Inherited platelet disorders: towards DNA-based diagnosis. Blood. 2016;127:2814.
Disclosures: No relevant conflicts of interest to declare.


2021 ◽  
pp. 1-21
Author(s):  
Antonio Atalaia ◽  
Rabah Ben Yaou ◽  
Karim Wahbi ◽  
Annachiara De Sandre-Giovannoli ◽  
Corinne Vigouroux ◽  
...  

Background: Variants in the LMNA gene, encoding lamins A/C, are responsible for a growing number of diseases, all of which comply with the definition of rare diseases. LMNA-related disorders have a varied phenotypic expression with more than 15 syndromes described, belonging to five phenotypic groups: Muscular Dystrophies, Neuropathies, Cardiomyopathies, Lipodystrophies and Progeroid Syndromes. Overlapping phenotypes are also reported. Linking gene and variants with phenotypic expression, disease mechanisms, and corresponding treatments is particularly challenging in laminopathies. Treatment recommendations are limited, and very few are variant-based. Objective: The Treatabolome initiative aims to provide a shareable dataset of existing variant-specific treatment for rare diseases within the Solve-RD EU project. As part of this project, we gathered evidence of specific treatments for laminopathies via a systematic literature review adopting the FAIR (Findable, Accessible, Interoperable, and Reusable) guidelines for scientific data production. Methods: Treatments for LMNA-related conditions were systematically collected from MEDLINE and Embase bibliographic databases and clinical trial registries (Cochrane Central Registry of Controlled Trials, ClinicalTrials.gov and EudraCT). Two investigators extracted and analyzed the literature data independently. The included papers were assessed using the Oxford Centre for Evidence-Based Medicine 2011 Levels of Evidence. Results: From the 4783 articles selected by a systematic approach, we identified 78 papers for our final analysis that corresponded to the profile of data defined in the inclusion and exclusion criteria. These papers include 2 guidelines/consensus papers, 4 meta-analyses, 14 single-arm trials, 15 case series, 13 cohort studies, 21 case reports, 8 expert reviews and 1 expert opinion. The treatments were summarized electronically according to significant phenome-genome associations.
The specificity of treatments according to the different laminopathic phenotypical presentations is variable. Conclusions: We have extracted Treatabolome-worthy treatment recommendations for patients with different forms of laminopathies based on significant phenome-genome pairings. This dataset will be available on the Treatabolome website and, through interoperability, on genetic diagnosis and treatment support tools such as RD-Connect’s Genome Phenome Analysis Platform.


2021 ◽  
Vol 9 (Suppl 3) ◽  
pp. A572-A572
Author(s):  
Samra Turajlic ◽  
Mariam Jamal-Hanjani ◽  
Andrew Furness ◽  
Ruth Plummer ◽  
Judith Cave ◽  
...  

Background: Ex-vivo expanded tumour infiltrating lymphocytes (TIL) show promise in delivering durable responses among several solid tumour indications. However, characterising, quantifying and tracking the active component of TIL therapy remains challenging as the expansion process does not distinguish between tumour reactive and bystander T-cells. Achilles Therapeutics has developed ATL001, a patient-specific TIL-based product, manufactured using the VELOS™ process that specifically targets clonal neoantigens present in all tumour cells within a patient. Two Phase I/IIa clinical trials of ATL001 are ongoing in patients with advanced Non-Small Cell Lung Cancer, CHIRON (NCT04032847), and metastatic or recurrent melanoma, THETIS (NCT03997474). Extensive product characterisation and immune-monitoring are performed through Achilles’ manufacturing and translational science programme. This enables precise quantification and characterisation of the active component of this therapy – Clonal Neoantigen T cells (cNeT) – during manufacture and following patient administration, offering unique insight into the mechanism of action of ATL001 and aiding the development of next generation processes. Methods: ATL001 was manufactured using procured tumour and matched whole blood from 8 patients enrolled in the THETIS (n=5) and CHIRON (n=3) clinical trials. Following administration of ATL001, peripheral blood samples were collected up to week 6. The active component of the product was detected via re-stimulation with clonal neoantigen peptide pools and evaluation of IFN-γ and/or TNF-α production. Deconvolution of individual reactivities was achieved via ELISPOT assays. Immune reconstitution was evaluated by flow cytometry. cNeT expansion was evaluated by restimulation of isolated PBMCs with peptide pools and individual peptide reactivities (ELISPOT). Results: The median age was 57 (range 30 – 71) and 6/8 patients were male. The median number of previous lines of systemic anti-cancer treatment at the time of ATL001 dosing was 2.5 (range 1 – 5). Proportion of cNeT in manufactured products ranged from 0.20% - 77.43% (mean 26.78%) and unique single peptide reactivities were observed in 7 of 8 products (range 0 – 28, mean 8.6). Post-dosing, cNeTs were detected in 5/8 patients and cNeT expansion was observed in 3/5 patients. Conclusions: These data underscore our ability to sensitively detect, quantify and track the patient-specific cNeT component of ATL001 – during manufacture and post dosing. As the dataset matures, these metrics of detection and expansion will be correlated with product, clinical and genomic characteristics to determine variables associated with peripheral cNeT dynamics and clinical response. References: NCT04032847, NCT03997474. Ethics Approval: The first 8 patients described have all been located within the UK and both trials (CHIRON and THETIS) have been approved by the UK MHRA (among other international bodies, e.g. FDA). Additionally, these trials have been approved by local ethics boards at active sites within the UK. Patients are fully informed by provided materials and investigators prior to consenting to enrol into either ATL001 trial.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 709 ◽  
Author(s):  
Liis Kolberg ◽  
Uku Raudvere ◽  
Ivan Kuzmin ◽  
Jaak Vilo ◽  
Hedi Peterson

g:Profiler (https://biit.cs.ut.ee/gprofiler) is a widely used gene list functional profiling and namespace conversion toolset that has been contributing to reproducible biological data analysis since 2007. Here we introduce the accompanying R package, gprofiler2, developed to facilitate programmatic access to g:Profiler computations and databases via REST API. The gprofiler2 package provides easy-to-use functionality that enables researchers to incorporate functional enrichment analysis into automated analysis pipelines written in R. The package also implements interactive visualisation methods to help interpret the enrichment results and to illustrate them for publications. In addition, gprofiler2 gives access to the versatile gene/protein identifier conversion functionality in g:Profiler, making it possible to map between hundreds of different identifier types or orthologous species. The gprofiler2 package is freely available at the CRAN repository.


2019 ◽  
Vol 28 (5) ◽  
pp. 576-586 ◽  
Author(s):  
Omamah A. Jiman ◽  
Rachel L. Taylor ◽  
Eva Lenassi ◽  
Jill Clayton Smith ◽  
...  

Abstract Thirty percent of all inherited retinal disease (IRD) is accounted for by conditions with extra-ocular features. This study aimed to establish the genetic diagnostic pick-up rate for IRD patients with one or more extra-ocular features undergoing panel-based screening in a clinical setting. One hundred and six participants, tested on a gene panel which contained both isolated and syndromic IRD genes, were retrospectively ascertained from the Manchester Genomic Diagnostics Laboratory database spanning 6 years (2012–2017). Phenotypic features were extracted from the clinical notes and classified according to Human Phenotype Ontology; all identified genetic variants were interpreted in accordance with the American College of Medical Genetics and Genomics guidelines. Overall, 49% (n = 52) of patients received a probable genetic diagnosis. A further 6% (n = 6) had a single disease-associated variant in an autosomal recessive disease-relevant gene. Fifty-two percent (n = 55) of patients had a clinical diagnosis at the time of testing. Of these, 71% (n = 39) received a probable genetic diagnosis. By contrast, for those without a provisional clinical diagnosis (n = 51), only 25% (n = 13) received a probable genetic diagnosis. The clinical diagnosis of Usher (n = 33) and Bardet–Biedl syndrome (n = 10) was confirmed in 67% (n = 22) and 80% (n = 8), respectively. The testing diagnostic rate in patients with clinically diagnosed multisystemic IRD conditions was significantly higher than those without one (71% versus 25%; p value < 0.001). The lower pick-up rate in patients without a clinical diagnosis suggests that panel-based approaches are unlikely to be the most effective means of achieving a molecular diagnosis for this group. Here, we suggest that genome-wide approaches (whole exome or genome) are more appropriate.
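
The percentages quoted in the abstract above can be recomputed from the reported counts. The short Python sketch below does that and also runs a two-sided Fisher's exact test on the 39/55 versus 13/51 comparison. The abstract does not say which test produced its p value, so the choice of Fisher's exact test is an assumption made for illustration; the function and variable names are likewise hypothetical.

```python
from math import comb

# Counts as reported in the abstract:
# (patients with a probable genetic diagnosis, group size)
groups = {
    "all patients": (52, 106),
    "with clinical diagnosis": (39, 55),
    "without clinical diagnosis": (13, 51),
    "Usher syndrome": (22, 33),
    "Bardet-Biedl syndrome": (8, 10),
}


def rate_percent(diagnosed, total):
    """Diagnostic rate as a percentage, rounded to the nearest whole number."""
    return round(100 * diagnosed / total)


rates = {name: rate_percent(d, n) for name, (d, n) in groups.items()}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one (an assumed choice of test).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)  # tolerance for float ties
    )


# Diagnosed vs undiagnosed, with vs without a prior clinical diagnosis.
p_value = fisher_exact_two_sided(39, 55 - 39, 13, 51 - 13)

for name, pct in rates.items():
    print(f"{name}: {pct}%")
print(f"Fisher's exact test, two-sided: p = {p_value:.2e}")
```

The recomputed rates match the rounded figures in the abstract, and the p value falls below the 0.001 threshold it reports.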


2021 ◽  
Author(s):  
Magdalena Navarro ◽  
T Ian Simpson

Abstract Motivation: Autism spectrum disorder (ASD) has a strong, yet heterogeneous, genetic component. Among the various methods that are being developed to help reveal the underlying molecular aetiology of the disease, one that is gaining popularity is the combination of gene expression and clinical genetic data. For ASD, the SFARI-gene database comprises lists of curated genes in which presumed causative mutations have been identified in patients. In order to predict novel candidate SFARI-genes we built classification models combining differential gene expression data for ASD patients and unaffected individuals with a gene’s status in the SFARI-gene list. Results: SFARI-genes were not found to be significantly associated with differential gene expression patterns, nor were they enriched in gene co-expression network modules that had a strong correlation with ASD diagnosis. However, network analysis and machine learning models that incorporate information from the whole gene co-expression network were able to predict novel candidate genes that share features of existing SFARI genes and have support for roles in ASD in the literature. We found a statistically significant bias related to the absolute level of gene expression for existing SFARI genes and their scores. It is essential that this bias be taken into account when studies interpret ASD gene expression data at gene, module and whole-network levels. Availability: Source code is available from GitHub (https://doi.org/10.5281/zenodo.4463693) and the accompanying data from The University of Edinburgh DataStore (https://doi.org/10.7488/ds/2980).


2021 ◽  
Author(s):  
Chengyao Peng ◽  
Simon Dieck ◽  
Alexander Schmid ◽  
Ashar Ahmad ◽  
Alexej Knaus ◽  
...  

Abstract Many rare syndromes can be well described and delineated from other disorders by a combination of characteristic symptoms. These phenotypic features are best documented with terms of the human phenotype ontology (HPO), which is also increasingly used in electronic health records (EHRs). Many algorithms that perform HPO-based gene prioritization have also been developed; however, the performance of many such tools suffers from an overrepresentation of atypical cases in the medical literature. This is certainly the case if the algorithm cannot handle features that occur with reduced frequency in a disorder. With CADA we built a knowledge graph that is based on case annotations and disorder annotations and show that CADA exhibits superior performance, particularly for patients that present with the pathognomonic findings of a disease. Crucial in the design of our approach is the use of the growing amount of phenotypic information that diagnostic labs deposit in databases such as ClinVar. By this means CADA is an ideal reference tool for differential diagnostics in rare disorders that can also be updated regularly.


2021 ◽  
Author(s):  
Ann-Christin Liebers-Kyungay ◽  
Klaus Mohnike ◽  
Corine van Lingen ◽  
Anita Bressan ◽  
Katja Palm ◽  
...  

Abstract Background: Finding a diagnosis for rare diseases is a challenge for patients and those treating them. Establishing a uniform methodology for specifying the symptoms of a patient seems useful. This, as well as a database with clinical parameters reported in patients already diagnosed with the corresponding disease or that have led to the diagnosis, would facilitate the global data exchange between specialists and subsequently diagnosis. The aim of this work is to develop standardized data sets with the most frequent symptoms exemplarily for the three rare diseases late-onset Pompe disease, Gaucher disease Type I and Smith-Lemli-Opitz syndrome (SLOS). Methods and results: A systematic literature review of characteristic symptoms and diagnostic criteria was performed for each of the three disorders. These parameters were converted into vocabulary standardized by The Human Phenotype Ontology (HPO), so-called HPO terms. Subsequently, a retrospective analysis of the patient files of 23 late-onset Pompe disease patients, 21 Gaucher disease Type I patients and 25 SLOS patients was carried out together with the University Children's Hospital Magdeburg and the Center of excellence for Rare Metabolic Diseases at the Charité Berlin. Features present in ≥ 40 % of the cohort and collected simultaneously in a certain minimum number of patients were filtered out. The analysis resulted in data sets with 20 diagnostic parameters for late-onset Pompe disease, 16 features for Gaucher disease Type I and 17 parameters for SLOS. After the statistical evaluation, the results were discussed comparatively with similar studies exemplarily for SLOS. Conclusion: The developed datasets for the three diseases provide a good basis for expansion with further patient examples and for extending the methodology to other diseases with the aim of improving the diagnostic pathway and thus the health care of patients with rare diseases.
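
The frequency criterion mentioned in the abstract above (features present in at least 40% of a cohort) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the cohort and feature names are invented placeholders, "filtered out" is read here as "selected for the data set", and the second criterion (a minimum number of patients in whom features were collected simultaneously) is not modelled because the abstract gives no threshold for it.

```python
from collections import Counter


def frequent_features(patients, min_fraction=0.40):
    """Return {feature: fraction} for features seen in >= min_fraction of patients.

    `patients` is a list of per-patient feature collections; each feature is
    counted at most once per patient.
    """
    if not patients:
        return {}
    counts = Counter(feature for features in patients for feature in set(features))
    total = len(patients)
    return {
        feature: counts[feature] / total
        for feature in sorted(counts)
        if counts[feature] / total >= min_fraction
    }


# Invented toy cohort of five patients; the feature names are placeholders
# standing in for HPO terms.
toy_cohort = [
    {"feature_a", "feature_b"},
    {"feature_a"},
    {"feature_a", "feature_c"},
    {"feature_b"},
    {"feature_a", "feature_b"},
]

selected = frequent_features(toy_cohort)
for feature, fraction in selected.items():
    print(f"{feature}: {fraction:.0%}")
```

With real data, each patient's set would hold the HPO terms extracted from that patient's file, and the threshold could be varied to see how sensitive the resulting data set is to the 40% cut-off.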

