An Improved Phenotype-Driven Tool for Rare Mendelian Variant Prioritization: Benchmarking Exomiser on Real Patient Whole-Exome Data

Next-generation sequencing has revolutionized rare disease diagnostics, but many patients remain without a molecular diagnosis, in part because many candidate variants survive even strict filtering. Exomiser was launched in 2014 as a Java tool that performs an integrative analysis of patients’ sequencing data and their phenotypes encoded with Human Phenotype Ontology (HPO) terms. It prioritizes variants by leveraging information on variant frequency, predicted pathogenicity, and gene-phenotype associations derived from human diseases, model organisms, and protein–protein interactions. Early published releases of Exomiser were able to prioritize disease-causative variants as top candidates in up to 97% of simulated whole-exomes. However, the real patient datasets tested and published so far are very limited in size. Here, we present the latest Exomiser version 12.0.1 with many new features. We assessed its performance using a set of 134 whole-exomes from patients with a range of rare retinal diseases and a known molecular diagnosis. Using default settings, Exomiser ranked the correctly diagnosed variants as the top candidate in 74% of the dataset and within the top 5 in 94%; not using the patients’ HPO profiles (i.e., variant-only analysis) decreased the performance to 3% and 27%, respectively. In conclusion, Exomiser is an effective support tool for rare Mendelian phenotype-driven variant prioritization.
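The top-candidate and top-5 percentages quoted above are top-k recall figures. The following is a minimal sketch of how such a figure can be computed from per-case ranks; the function name and the example ranks are hypothetical and are not taken from the study or from Exomiser itself.

```python
from typing import Iterable


def top_k_fraction(ranks: Iterable[int], k: int) -> float:
    """Fraction of cases whose known causal variant is ranked at position k or better (1-based)."""
    ranks = list(ranks)
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the diagnosed variant in five cases (illustrative only).
example_ranks = [1, 1, 3, 7, 2]
top1 = top_k_fraction(example_ranks, 1)  # share of cases ranked first
top5 = top_k_fraction(example_ranks, 5)  # share of cases ranked in the top 5
```

Applied to a benchmark such as the one described, the input would be one rank per exome, namely the position of the diagnosed variant in the prioritized output.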


A massively parallel barcoded sequencing pipeline enables generation of the first ORFeome and interactome map for rice

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1918068117 ◽

2020 ◽

Vol 117 (21) ◽

pp. 11836-11842 ◽

Cited By ~ 1

Author(s):

Shayne D. Wierbowski ◽

Tommy V. Vo ◽

Pascal Falter-Braun ◽

Timothy O. Jobe ◽

Lars H. Kruse ◽

...

Keyword(s):

Protein Interactions ◽

Massively Parallel ◽

Model Organisms ◽

Protein Protein Interactions ◽

Numerous Model ◽

High Quality Protein ◽

Protein Interactome ◽

Wide Range ◽

Dna Elements ◽

General Tool

Systematic mappings of protein interactome networks have provided invaluable functional information for numerous model organisms. Here we develop PCR-mediated Linkage of barcoded Adapters To nucleic acid Elements for sequencing (PLATE-seq), which serves as a general tool to rapidly sequence thousands of DNA elements. We validate its utility by generating the ORFeome for Oryza sativa covering 2,300 genes and constructing a high-quality protein–protein interactome map consisting of 322 interactions between 289 proteins, expanding the known interactions in rice by roughly 50%. Our work paves the way for high-throughput profiling of protein–protein interactions in a wide range of organisms.


Conservation and Variability of Synaptonemal Complex Proteins in Phylogenesis of Eukaryotes

International Journal of Evolutionary Biology ◽

10.1155/2014/856230 ◽

2014 ◽

Vol 2014 ◽

pp. 1-16 ◽

Cited By ~ 20

Author(s):

Tatiana M. Grishaeva ◽

Yuri F. Bogdanov

Keyword(s):

Synaptonemal Complex ◽

Protein Interactions ◽

Phylogenetic Trees ◽

Flowering Plants ◽

Model Organisms ◽

Protein Protein Interactions ◽

Lateral Element ◽

Origin And Evolution ◽

Eukaryotic Proteomes ◽

Related Proteins

The problems of the origin and evolution of meiosis include the enigmatic variability of the synaptonemal complexes (SCs) which, while morphologically similar, consist of different proteins in different eukaryotic phyla. Using bioinformatics methods, we screened all available eukaryotic proteomes to find proteins similar to known SC proteins of model organisms. We found proteins similar to SC lateral element (LE) proteins and possessing the HORMA domain in the majority of the eukaryotic taxa and assume them to be the most ancient among all SC proteins. Vertebrate LE proteins SYCP2, SYCP3, and SC65 proved to have related proteins in many invertebrate taxa. Proteins of the SC central space are the most evolutionarily variable, which means that different protein-protein interactions can exist to connect LEs. Proteins similar to the known SC proteins were not found in Euglenophyta, Chrysophyta, Charophyta, Xanthophyta, Dinoflagellata, and primitive Coelomata. We conclude that different proteins whose common feature is the presence of domains with a certain conformation are involved in the formation of the SC in different eukaryotic phyla. This permits a targeted search for orthologs of the SC proteins using phylogenetic trees. Here we consider examples of phylogenetic trees for protozoans, fungi, algae, mosses, and flowering plants.


Transmission distortion and genetic incompatibilities between alleles in a multigenerational mouse advanced intercross line

10.1101/2021.06.09.447720 ◽

2021 ◽

Author(s):

Danny Arends ◽

Stefan Kärst ◽

Sebastian Heise ◽

Paula Korkuc ◽

Deike Hesse ◽

...

Keyword(s):

Protein Interactions ◽

Complex Traits ◽

Genetic Background ◽

Inbred Strains ◽

Parental Origin ◽

Protein Protein Interactions ◽

Sequencing Data ◽

Synonymous Snps ◽

Advanced Intercross Line ◽

Overrepresentation Analysis

Background/Objectives: While direct additive and dominance effects on complex traits have been mapped repeatedly, additional genetic factors contributing to the heterogeneity of complex traits have been scarcely investigated. To assess genetic background effects, we investigated transmission ratio distortions (TRDs) of alleles from parent to offspring using an advanced intercross line (AIL) of an initial cross between the mouse inbred strains C57BL/6NCrl (B6N) and BFMI860-12 (BFMI). Subjects/Methods: 341 males of generation 28 and their respective 61 parents and 66 grandparents were genotyped using Mega Mouse Universal Genotyping Arrays (MegaMUGA). TRDs were investigated using allele transmission asymmetry tests, and pathway overrepresentation analysis was performed. Sequencing data were used to test for overrepresentation of non-synonymous SNPs in TRD regions. Genetic incompatibilities were tested using the Bateson-Dobzhansky-Muller two-locus model. Results: 62 TRD regions were detected, many in close proximity to the telocentric centromere. TRD regions contained 44.5% more non-synonymous SNPs than randomly selected regions (182 vs. 125.9 ± 17.0, P < 1 × 10⁻⁴). Testing for genetic incompatibilities between TRD regions identified 29 genome-wide significant incompatibilities (P(BF) < 0.05). Pathway overrepresentation analysis of genes in TRD regions showed that DNA methylation, epigenetic regulation of RNA, and meiotic/meiosis regulation pathways were affected independent of the parental origin of the TRD. Paternal BFMI TRD regions showed overrepresentation in the small interfering RNA (siRNA) biogenesis and in the metabolism of lipids and lipoproteins. Maternal B6N TRD regions harbored genes involved in meiotic recombination, cell death, and apoptosis pathways.
The analysis of genes in TRD regions suggests the potential distortion of protein-protein interactions accounting for obesity and diabetic retinopathy as a result of disadvantageous combinations of allelic variants in Aass, Pgx6 and Nme8. Conclusions: Since genes in TRD regions showed a significant increase in the number of non-synonymous SNPs, these loci likely co-evolved to ensure protein-protein interaction compatibility, survival and optimal adaptation to the genetic background environment. Genes in these regions provide new targets for investigating genetic adaptation, protein-protein interactions, and determinants of complex traits such as obesity.
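Transmission ratio distortion is, at its core, a departure from the expected 1:1 transmission of the two alleles of a heterozygous parent. The sketch below shows one generic way to test for such a departure, a chi-square goodness-of-fit test with one degree of freedom. It is an assumption made for illustration and is not necessarily the allele transmission asymmetry test used by the authors; the function name and counts are hypothetical.

```python
import math
from typing import Tuple


def transmission_chi2(n_allele_a: int, n_allele_b: int) -> Tuple[float, float]:
    """Chi-square test (1 df) of observed transmission counts against a 1:1 expectation.

    Returns (statistic, p_value).
    """
    total = n_allele_a + n_allele_b
    if total == 0:
        raise ValueError("at least one transmission must be observed")
    # With expected counts total/2 each, the statistic simplifies to (a - b)^2 / (a + b).
    stat = (n_allele_a - n_allele_b) ** 2 / total
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: allele A transmitted 70 times, allele B 30 times.
stat, p_value = transmission_chi2(70, 30)
```

In a genome-wide scan such a test would be repeated per marker, so the resulting p-values would need correction for multiple testing.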


NGPINT: A Next-generation protein-protein interaction software

10.1101/2020.09.11.277483 ◽

2020 ◽

Cited By ~ 1

Author(s):

Sagnik Banerjee ◽

Valeria Velásquez-Zapata ◽

Gregory Fuerst ◽

J. Mitch Elmore ◽

Roger P. Wise

Keyword(s):

Protein Interactions ◽

Model Organisms ◽

Cdna Libraries ◽

Published Data ◽

Data Sets ◽

Next Generation ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

Alternative Approach ◽

Simulated Test

Abstract Mapping protein-protein interactions at a proteome scale is critical to understanding how cellular signaling networks respond to stimuli. Since eukaryotic genomes encode thousands of proteins, testing their interactions one-by-one is a challenging prospect. High-throughput yeast two-hybrid (Y2H) assays that employ next-generation sequencing to interrogate cDNA libraries represent an alternative approach that optimizes scale, cost, and effort. We present NGPINT, a robust and scalable software tool to identify all putative interactors of a protein using Y2H in batch culture. NGPINT combines diverse tools to align sequence reads to target genomes, reconstruct prey fragments and compute gene enrichment under reporter selection. Central to this pipeline is the identification of fusion reads containing sequences derived from both the Y2H expression plasmid and the cDNA of interest. To reduce false positives, these fusion reads are evaluated as to whether the cDNA fragment forms an in-frame translational fusion with the Y2H transcription factor. NGPINT successfully recognized 95% of interactions in simulated test runs. As proof of concept, NGPINT was tested using published data sets and recognized all validated interactions. NGPINT can be used in any organism with an available reference, thus facilitating the discovery of protein-protein interactions in non-model organisms.
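The in-frame check described above can be illustrated with a short sketch. This is a simplified model written for illustration, not NGPINT's implementation: it assumes the reading-frame phase of the vector at the junction and the position of the first prey base within its annotated coding sequence are already known, and both parameter names are hypothetical.

```python
def is_in_frame(vector_phase: int, prey_cds_position: int) -> bool:
    """Return True if the prey fragment continues the vector's reading frame.

    vector_phase: codon position (0, 1 or 2) that the first prey base occupies
        in the reading frame carried over from the Y2H transcription factor.
    prey_cds_position: 0-based position of that same base within the prey's
        own coding sequence.
    """
    if vector_phase not in (0, 1, 2):
        raise ValueError("vector_phase must be 0, 1 or 2")
    if prey_cds_position < 0:
        raise ValueError("prey_cds_position must be non-negative")
    # The fusion is in frame when the base sits at the same codon position
    # in both the vector frame and the prey's native frame.
    return prey_cds_position % 3 == vector_phase


# Hypothetical junctions: a prey fragment starting at CDS position 6 is in
# frame with a vector phase of 0; one starting at position 7 is not.
in_frame = is_in_frame(0, 6)
out_of_frame = is_in_frame(0, 7)
```

A real pipeline would also have to handle fragments that start in untranslated regions and stop codons upstream of the coding sequence, which this sketch ignores.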


Expanding Interactome Analyses beyond Model Eukaryotes

10.20944/preprints202110.0185.v1 ◽

2021 ◽

Author(s):

Katherine James ◽

Anil Wipat ◽

Simon Cockell

Keyword(s):

Genome Sequence ◽

Protein Interactions ◽

Model Organisms ◽

Protein Protein Interactions ◽

Data Types ◽

Interaction Prediction ◽

Diverse Species ◽

Eukaryotic Species

Interactome analyses have traditionally been applied to yeast, human and other model organisms due to the availability of protein-protein interaction data for these species. Recently, these techniques have been applied to more diverse species using computational interaction prediction from genome sequence and other data types. This review describes the various types of computational interactome networks that can be created and how they have been used in diverse eukaryotic species, highlighting some of the key interactome studies in non-model organisms.


Circadian Interactomics: How Research Into Protein-Protein Interactions Beyond the Core Clock Has Influenced the Model of Circadian Timekeeping

Journal of Biological Rhythms ◽

10.1177/07487304211014622 ◽

2021 ◽

pp. 074873042110146

Author(s):

Alexander E. Mosier ◽

Jennifer M. Hurley

Keyword(s):

Circadian Clock ◽

Protein Interactions ◽

Large Scale ◽

Protein Complexes ◽

Model Organisms ◽

Protein Protein Interactions ◽

Macromolecular Complexes ◽

The Core ◽

Circadian Control ◽

Macromolecular Protein

The circadian clock is the broadly conserved, protein-based, timekeeping mechanism that synchronizes biology to the Earth’s 24-h light-dark cycle. Studies of the mechanisms of circadian timekeeping have placed great focus on the role that individual protein-protein interactions play in the creation of the timekeeping loop. However, research has shown that clock proteins most commonly act as part of large macromolecular protein complexes to facilitate circadian control over physiology. The formation of these complexes has led to the large-scale study of the proteins that comprise these complexes, termed here “circadian interactomics.” Circadian interactomic studies of the macromolecular protein complexes that comprise the circadian clock have uncovered many basic principles of circadian timekeeping as well as mechanisms of circadian control over cellular physiology. In this review, we examine the wealth of knowledge accumulated using circadian interactomics approaches to investigate the macromolecular complexes of the core circadian clock, including insights into the core mechanisms that impart circadian timing and the clock’s regulation of many physiological processes. We examine data acquired from the investigation of the macromolecular complexes centered on both the activating and repressing arm of the circadian clock and from many circadian model organisms.


pathDIP 4: an extended pathway annotations and enrichment analysis resource for human, model organisms and domesticated species

Nucleic Acids Research ◽

10.1093/nar/gkz989 ◽

2019 ◽

Cited By ~ 5

Author(s):

Sara Rahmati ◽

Mark Abovsky ◽

Chiara Pastrello ◽

Max Kotlyar ◽

Richard Lu ◽

...

Keyword(s):

Protein Interactions ◽

Enrichment Analysis ◽

Human Model ◽

Model Organisms ◽

Pathway Enrichment Analysis ◽

Protein Protein Interactions ◽

Proteome Coverage ◽

Pathway Annotation ◽

Human Proteins ◽

Integrated Pathway

Abstract PathDIP was introduced to increase proteome coverage of literature-curated human pathway databases. PathDIP 4 now integrates 24 major databases. To further reduce the number of proteins with no curated pathway annotation, pathDIP integrates pathways with physical protein–protein interactions (PPIs) to predict significant physical associations between proteins and curated pathways. For human, it provides pathway annotations for 5366 pathway orphans. Integrated pathway annotation now includes six model organisms and ten domesticated animals. A total of 6401 core and ortholog pathways have been curated from the literature or by annotating orthologs of human proteins in the literature-curated pathways. Extended pathways are the result of combining these pathways with protein-pathway associations that are predicted using organism-specific PPIs. Extended pathways expand proteome coverage from 81 088 to 120 621 proteins, making pathDIP 4 the largest publicly available pathway database for these organisms and providing a necessary platform for comprehensive pathway-enrichment analysis. PathDIP 4 users can customize their search and analysis by selecting organism, identifier and subset of pathways. Enrichment results and detailed annotations for input list can be obtained in different formats and views. To support automated bioinformatics workflows, Java, R and Python APIs are available for batch pathway annotation and enrichment analysis. PathDIP 4 is publicly available at http://ophid.utoronto.ca/pathDIP.
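Pathway enrichment analysis is commonly framed as an over-representation test. The sketch below uses a one-sided hypergeometric test, which is a standard choice but an assumption here: the abstract does not state which statistic pathDIP applies, and the function and parameter names are hypothetical rather than part of any pathDIP API.

```python
from math import comb


def enrichment_p_value(population: int, pathway_size: int,
                       sample_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    population:   number of annotated proteins in the background
    pathway_size: number of background proteins in the pathway
    sample_size:  number of proteins in the input list
    overlap:      number of input proteins that fall in the pathway
    """
    if not (0 <= pathway_size <= population and 0 <= sample_size <= population):
        raise ValueError("pathway_size and sample_size must lie within the population")
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    # Sum the probability mass for every overlap at least as large as observed.
    favourable = sum(
        comb(pathway_size, i) * comb(population - pathway_size, sample_size - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return favourable / total


# Hypothetical toy numbers: 10 proteins in total, 5 in the pathway,
# 5 in the input list, all 5 of which hit the pathway.
p_toy = enrichment_p_value(10, 5, 5, 5)
```

When many pathways are tested at once, the resulting p-values are usually adjusted for multiple comparisons before being reported.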


The tapeworm interactome: inferring confidence scored protein-protein interactions from the proteome of Hymenolepis microstoma

10.1101/668988 ◽

2019 ◽

Cited By ~ 1

Author(s):

Katherine James ◽

Peter D. Olson

Keyword(s):

Transcription Factors ◽

Protein Interaction ◽

Protein Interactions ◽

System Level ◽

Model Organisms ◽

Phylogenetic Distance ◽

Interaction Data ◽

Protein Protein Interactions ◽

Gene Models ◽

Hymenolepis Microstoma

Abstract Reference genome and transcriptome assemblies of helminths have reached a level of completion whereby secondary analyses that rely on accurate gene estimation or syntenic relationships can now be conducted with a high level of confidence. Recent public release of the v.3 assembly of the mouse bile-duct tapeworm, Hymenolepis microstoma, provides chromosome-level characterisation of the genome and a stabilised set of protein coding gene models underpinned by both bioinformatic and empirical data. However, interactome data have not been produced. Conserved protein-protein interactions in other organisms, termed interologs, can be used to transfer interactions between species, allowing systems-level analysis in non-model organisms. Here, we describe a probabilistic, integrated network of interologs for the H. microstoma proteome, based on conserved protein interactions found in eukaryote model species. Almost a third of the 10,139 gene models in the v.3 assembly could be assigned interaction data and assessment of the resulting network indicates that topologically-important proteins are related to essential cellular pathways, and that the network clusters into biologically meaningful components. Moreover, network parameters are similar to those of single-species interaction networks that we constructed in the same way for S. cerevisiae, C. elegans and H. sapiens, demonstrating that information-rich, system-level analyses can be conducted even on species separated by a large phylogenetic distance from the major model organisms on which most protein interaction evidence is based. Using the interolog network, we then focused on sub-networks of interactions assigned to discrete suites of genes of interest, including signalling components and transcription factors, germline ‘multipotency’ genes, and differentially-expressed genes between larval and adult worms.
These analyses not only showed an expected bias toward highly-conserved proteins, such as components of intracellular signal transduction, but in some cases predicted interactions with transcription factors that aid in identifying their target genes. With the completion of key helminth genomes, such systems level analyses can provide an important predictive framework to guide basic and applied research on helminths and will become increasingly informative as protein-protein interaction data accumulate.
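The interolog idea, transferring an interaction from a model species to a target species when both partners have orthologs there, can be sketched as follows. This is an illustrative simplification: the study describes a probabilistic, confidence-scored network, whereas this sketch omits scoring entirely, and all identifiers and names in it are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def transfer_interologs(
    model_interactions: Iterable[Tuple[str, str]],
    orthologs: Dict[str, Set[str]],
) -> Set[FrozenSet[str]]:
    """Predict target-species interactions from model-species interactions.

    model_interactions: pairs of interacting proteins in the model species.
    orthologs: maps each model protein to its orthologs in the target species.
    Returns undirected predicted interactions as frozensets of protein IDs.
    """
    predicted: Set[FrozenSet[str]] = set()
    for protein_a, protein_b in model_interactions:
        # An interaction is transferred only if both partners have orthologs.
        targets_a = orthologs.get(protein_a, set())
        targets_b = orthologs.get(protein_b, set())
        for target_a, target_b in product(targets_a, targets_b):
            predicted.add(frozenset((target_a, target_b)))
    return predicted


# Hypothetical identifiers: model proteins A, B, C and target proteins t1..t3.
model_ppis = [("A", "B"), ("B", "C")]
ortholog_map = {"A": {"t1"}, "B": {"t2", "t3"}}
predicted = transfer_interologs(model_ppis, ortholog_map)
```

Here the A-B interaction yields two predictions because B has two orthologs, while B-C yields none because C has no ortholog in the target species.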


A Collection of Benchmark Data Sets for Knowledge Graph-based Similarity in the Biomedical Domain

Database ◽

10.1093/database/baaa078 ◽

2020 ◽

Vol 2020 ◽

Author(s):

Carlota Cardoso ◽

Rita T Sousa ◽

Sebastian Köhler ◽

Catia Pesquita

Keyword(s):

Semantic Similarity ◽

Protein Interactions ◽

Knowledge Graph ◽

Data Sets ◽

Biomedical Domain ◽

Protein Protein Interactions ◽

Data Set ◽

Human Phenotype ◽

Benchmark Data ◽

Gene Similarity

Abstract The ability to compare entities within a knowledge graph is a cornerstone technique for several applications, ranging from the integration of heterogeneous data to machine learning. It is of particular importance in the biomedical domain, where semantic similarity can be applied to the prediction of protein–protein interactions, associations between diseases and genes, cellular localization of proteins, among others. In recent years, several knowledge graph-based semantic similarity measures have been developed, but building a gold standard data set to support their evaluation is non-trivial. We present a collection of 21 benchmark data sets that aim at circumventing the difficulties in building benchmarks for large biomedical knowledge graphs by exploiting proxies for biomedical entity similarity. These data sets include data from two successful biomedical ontologies, Gene Ontology and Human Phenotype Ontology, and explore proxy similarities calculated based on protein sequence similarity, protein family similarity, protein–protein interactions and phenotype-based gene similarity. Data sets have varying sizes and cover four different species at different levels of annotation completion. For each data set, we also provide semantic similarity computations with state-of-the-art representative measures. Database URL: https://github.com/liseda-lab/kgsim-benchmark.
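To make the notion of semantic similarity between annotated entities concrete, the sketch below computes a Jaccard index over two sets of ontology term identifiers. Jaccard is used here only as a simple stand-in and is an assumption; the abstract does not name the measures included in the benchmark, and the term identifiers are placeholders rather than real Gene Ontology or Human Phenotype Ontology terms.

```python
from typing import Set


def jaccard_similarity(terms_a: Set[str], terms_b: Set[str]) -> float:
    """Jaccard index of two annotation sets: |A ∩ B| / |A ∪ B|."""
    union = terms_a | terms_b
    if not union:
        # Two entities without annotations share nothing that can be compared.
        return 0.0
    return len(terms_a & terms_b) / len(union)


# Placeholder term identifiers for two hypothetical genes.
gene_1_terms = {"TERM:1", "TERM:2"}
gene_2_terms = {"TERM:2", "TERM:3"}
similarity = jaccard_similarity(gene_1_terms, gene_2_terms)
```

Measures designed for ontologies typically also exploit the term hierarchy, for example by including ancestor terms or weighting terms by information content, which a plain set overlap does not capture.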


Corynebacterium glutamicum Regulation beyond Transcription: Organizing Principles and Reconstruction of an Extended Regulatory Network Incorporating Regulations Mediated by Small RNA and Protein–Protein Interactions

Microorganisms ◽

10.3390/microorganisms9071395 ◽

2021 ◽

Vol 9 (7) ◽

pp. 1395

Author(s):

Juan M. Escorcia-Rodríguez ◽

Andreas Tauch ◽

Julio A. Freyre-González

Keyword(s):

Corynebacterium Glutamicum ◽

Protein Interactions ◽

Regulatory Network ◽

Regulatory Networks ◽

Network Models ◽

Global Scale ◽

System Level ◽

Model Organisms ◽

Protein Protein Interactions ◽

Regulatory Structure

Corynebacterium glutamicum is a Gram-positive bacterium found in soil, where changing conditions demand plasticity of the regulatory machinery. The study of such machinery at the global scale has been challenged by the lack of data integration. Here, we report three regulatory network models for C. glutamicum: strong (3040 interactions), constructed solely with regulations previously supported by directed experiments; all evidence (4665 interactions), containing the strong network, regulations previously supported by nondirected experiments, and protein–protein interactions with a direct effect on gene transcription; and sRNA (5222 interactions), containing the all evidence network and sRNA-mediated regulations. Compared to the previous version (2018), the strong and all evidence networks increased by 75 and 1225 interactions, respectively. We analyzed the system-level components of the three networks to identify how they differ and compared their structures against those for the networks of more than 40 species. The inclusion of the sRNA-mediated regulations changed the proportions of the system-level components and increased the number of modules but decreased their size. The C. glutamicum regulatory structure contrasted with other bacterial regulatory networks. Finally, we used the strong networks of three model organisms to provide insights and future directions for the characterization of the C. glutamicum regulatory network.
