Human Monogenic Disease Genes Have Frequently Functionally Redundant Paralogs

Abstract Gene losses provide an insightful route for studying the morphological and physiological adaptations of species, but their discovery is challenging. Existing genome annotation tools focus on annotating intact genes and do not attempt to distinguish nonfunctional genes from genes missing annotation due to sequencing and assembly artifacts. Previous attempts to annotate gene losses have required significant manual curation, which hampers their scalability for the ever-increasing deluge of newly sequenced genomes. Using extreme sequence erosion (amino acid deletions and substitutions) and sister species support as an unambiguous signature of loss, we developed an automated approach for detecting high-confidence gene loss events across a species tree. Our approach relies solely on gene annotation in a single reference genome, raw assemblies for the remaining species to analyze, and the associated phylogenetic tree for all organisms involved. Using human as reference, we discovered over 400 unique human ortholog erosion events across 58 mammals. This includes dozens of clade-specific losses of genes that result in early mouse lethality or are associated with severe human congenital diseases. Our discoveries yield intriguing potential for translational medical genetics and evolutionary biology, and our approach is readily applicable to large-scale genome sequencing efforts across the tree of life.

Essentiality-specific pathogenicity prioritization gene score to improve filtering of disease sequence data

Briefings in Bioinformatics ◽

10.1093/bib/bbaa029 ◽

2020 ◽

Author(s):

Dareen Alyousfi ◽

Diana Baralle ◽

Andrew Collins

Keyword(s):

Developmental Disorders ◽

Sequence Data ◽

Single Gene ◽

Causal Variant ◽

Monogenic Disease ◽

Disease Genes ◽

Gene Score ◽

Gene Essentiality ◽

Individual Gene ◽

Pathogenic Variation

Abstract The causal genetic variants underlying more than 50% of single gene (monogenic) disorders are yet to be discovered. Many patients with conditions likely to have a monogenic basis do not receive a confirmed molecular diagnosis, which has potential impacts on clinical management. We have developed a gene-specific score, essentiality-specific pathogenicity prioritization (ESPP), to guide the recognition of genes likely to underlie monogenic disease variation and so assist in filtering genome sequence data. When a patient genome is sequenced, several plausibly pathogenic variants are frequently identified in different genes. Recognition of the single gene most likely to include pathogenic variation can guide the identification of a causal variant. The ESPP score integrates gene-level scores which are broadly related to gene essentiality. Previous work towards the recognition of monogenic disease genes proposed a model with increasing gene essentiality from ‘non-essential’ to ‘essential’ genes (for which pathogenic variation may be incompatible with survival), with genes liable to contain disease variation positioned between these two extremes. We demonstrate that the ESPP score is useful for recognizing genes with high potential for pathogenic disease-related variation. Genes classed as essential have particularly high scores, as do genes recently recognized as strong candidates for developmental disorders. Through the integration of individual gene-specific scores, which have different properties and assumptions, we demonstrate the utility of an essentiality-based gene score to improve the filtering of genome sequence data.

Computational determination of gene age and characterization of evolutionary dynamics in human

Briefings in Bioinformatics ◽

10.1093/bib/bby074 ◽

2018 ◽

Vol 20 (6) ◽

pp. 2141-2149 ◽

Cited By ~ 1

Author(s):

Hongyan Yin ◽

Mengwei Li ◽

Lin Xia ◽

Chaozu He ◽

Zhang Zhang

Keyword(s):

Evolutionary Dynamics ◽

Biological Diversity ◽

Monogenic Disease ◽

Disease Genes ◽

Mendelian Disease ◽

Evolutionary Time ◽

New Genes ◽

Human Genes ◽

Gene Age

Abstract Genes originate at different evolutionary time scales and possess different ages, accordingly presenting diverse functional characteristics and reflecting distinct adaptive evolutionary innovations. In the past decades, progress has been made in gene age identification by a variety of methods that are principally based on comparative genomics. Here we summarize methods for computational determination of gene age and evaluate the effectiveness of different computational methods for age identification. Our results show that improved age determination can be achieved by combining homolog clustering with phylogeny inference, which enables more accurate age identification in human genes. Accordingly, we characterize evolutionary dynamics of human genes based on an extremely long evolutionary time scale spanning ~4,000 million years from archaea/bacteria to human, revealing that young genes are clustered on certain chromosomes and that Mendelian disease genes (including monogenic disease and polygenic disease genes) and cancer genes exhibit divergent evolutionary origins. Taken together, deciphering genes’ ages as well as their evolutionary dynamics is of fundamental significance in unveiling the underlying mechanisms during evolution and better understanding how young or new genes become indispensable integrants coupled with novel phenotypes and biological diversity.

Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics

10.1101/045260 ◽

2016 ◽

Cited By ~ 33

Author(s):

Alvaro N. Barbeira ◽

Scott P. Dickinson ◽

Jason M. Torres ◽

Jiamao Zheng ◽

Eric S. Torstenson ◽

...

Keyword(s):

Gene Expression ◽

Meta Analysis ◽

Mathematical Expression ◽

Monogenic Disease ◽

Specific Gene ◽

Disease Genes ◽

Gene Expression Variation ◽

Tissue Specific ◽

Expression Variation ◽

Summary Data

Abstract Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets. We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown. Most of the associations were tissue specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for an agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes.

InpherNet provides attractive monogenic disease gene hypotheses using patient genes indirect neighbors

10.1101/2020.07.10.20150425 ◽

2020 ◽

Author(s):

Boyoung Yoo ◽

Johannes Birgmeier ◽

Jon A Bernstein ◽

Gill Bejerano

Keyword(s):

Disease Gene ◽

Current Knowledge ◽

Indirect Evidence ◽

Monogenic Disease ◽

Disease Genes ◽

Mendelian Disease ◽

Causative Gene ◽

Human Patient ◽

Pathogenic Variants ◽

Many Sources

Close to 70% of patients suspected to have a Mendelian disease remain undiagnosed after genome sequencing, partly because our current knowledge about disease-causing genes is incomplete. Although hundreds of new disease-causing genes are discovered every year, the discovery rate has been constant for over a decade. Generating an attractive novel disease gene hypothesis from patient data can be time-consuming, as each patient's genome can contain dozens to hundreds of rare, possibly pathogenic variants. To generate the most plausible hypothesis, many sources of indirect evidence about each candidate variant may be considered. We introduce InpherNet, a network-based machine learning approach to accelerate this process. InpherNet ranks candidate genes based on gene neighbors from 4 graphs: orthologs, paralogs, functional pathway members, and co-localized interaction partners. As such, InpherNet can be used both to prioritize potentially novel disease genes and to help reveal known disease genes whose direct annotation is missing or partial. InpherNet is applied to over 100 patient cases for whom the causative gene is incorrectly given low priority by two clinical gene ranking methods that rely exclusively on human patient-derived evidence. It correctly ranks the causative gene among its top 5 candidates in 68% of the cases, compared to 9-44% using comparable tools including Phevor, Phive and hiPhive.

S94. MUTATION-INTOLERANT GENES AND MONOGENIC DISEASE GENES IN 145 LOCI OF SCHIZOPHRENIA (SCZ) GWAS ARE LINKED TO THE ISCHEMIA-HYPOXIA RESPONSE

Schizophrenia Bulletin ◽

10.1093/schbul/sbz020.639 ◽

2019 ◽

Vol 45 (Supplement_2) ◽

pp. S342-S343

Author(s):

Rainald Schmidt-Kastner ◽

Sinan Guloksuz ◽

Thomas Kietzmann ◽

Jim van Os ◽

Bart P F Rutten

Keyword(s):

Monogenic Disease ◽

Disease Genes ◽

Hypoxia Response

seqr: a web-based analysis and collaboration tool for rare disease genomics

10.1101/2021.10.27.21265326 ◽

2021 ◽

Author(s):

Lynn Pais ◽

Hana Snow ◽

Ben Weisburd ◽

Shifa Zhang ◽

Samantha Baxter ◽

...

Keyword(s):

Open Source ◽

Rare Disease ◽

Research Collaboration ◽

Low Cost ◽

Genomic Analysis ◽

Disease Diagnosis ◽

Causal Variant ◽

Monogenic Disease ◽

Disease Genes ◽

Web Based

Exome and genome sequencing have become the tools of choice for rare disease diagnosis, leading to large amounts of data available for analyses. To identify causal variants in these datasets, powerful filtering and decision support tools that can be efficiently used by clinicians and researchers are required. To address this need, we developed seqr - an open source, web-based tool for family-based monogenic disease analysis that allows researchers to work collaboratively to search and annotate genomic callsets. To date, seqr is being used in several research pipelines and one clinical diagnostic lab. In our own experience through the Broad Institute Center for Mendelian Genomics, seqr has enabled analyses of over 10,000 families, supporting the diagnosis of more than 3,800 individuals with rare disease and discovery of over 300 novel disease genes. Here we describe a framework for genomic analysis in rare disease that leverages seqr's capabilities for variant filtration, annotation, and causal variant identification, as well as support for research collaboration and data sharing. The seqr platform is available as open source software, allowing low-cost participation in rare disease research, and a community effort to support diagnosis and gene discovery in rare disease.

POPDC proteins and cardiac function

Biochemical Society Transactions ◽

10.1042/bst20190249 ◽

2019 ◽

Vol 47 (5) ◽

pp. 1393-1404 ◽

Cited By ~ 4

Author(s):

Thomas Brand

Keyword(s):

Potential Role ◽

Striated Muscle ◽

Short Review ◽

Effector Proteins ◽

Muscle Disease ◽

Disease Genes ◽

Protein Protein Interaction ◽

Interaction Partners ◽

And Function

Abstract The Popeye domain-containing gene family encodes a novel class of cAMP effector proteins in striated muscle tissue. In this short review, we first introduce the protein family and discuss their structure and function with an emphasis on their role in cyclic AMP signalling. Another focus of this review is the recently discovered role of POPDC genes as striated muscle disease genes, which have been associated with cardiac arrhythmia and muscular dystrophy. The pathological phenotypes observed in patients will be compared with phenotypes present in null and knock-in mutants in zebrafish and mouse. A number of protein–protein interaction partners have been discovered, and the potential role of POPDC proteins in controlling the subcellular localization and function of these interacting proteins will be discussed. Finally, we outline several areas where research is urgently needed.
