Ribosome occupancy profiles are conserved between structurally and evolutionarily related yeast domains

Bioinformatics ◽

10.1093/bioinformatics/btab020 ◽

2021 ◽

Author(s):

Daniel A Nissley ◽

Anna Carbery ◽

Mark Chonofsky ◽

Charlotte M Deane

Keyword(s):

Amino Acid ◽

Large Scale ◽

Sequence Similarity ◽

Ribosome Profiling ◽

Supplementary Information ◽

Rare Codon ◽

Homologous Proteins ◽

Translation Speed ◽

And Function ◽

Related Proteins

Abstract

Motivation: Protein synthesis is a non-equilibrium process, meaning that the speed of translation can influence the ability of proteins to fold and function. Assuming that structurally similar proteins fold by similar pathways, the profile of translation speed along an mRNA should be evolutionarily conserved between related proteins to direct correct folding and downstream function. The only evidence to date for such conservation of translation speed between homologous proteins has used codon rarity as a proxy for translation speed. There are, however, many other factors including mRNA structure and the chemistry of the amino acids in the A- and P-sites of the ribosome that influence the speed of amino acid addition.

Results: Ribosome profiling experiments provide a signal directly proportional to the underlying translation times at the level of individual codons. We compared ribosome occupancy profiles (extracted from five different large-scale yeast ribosome profiling studies) between related protein domains to more directly test if their translation schedule was conserved. Our analysis reveals that the ribosome occupancy profiles of paralogous domains tend to be significantly more similar to one another than to profiles of non-paralogous domains. This trend does not depend on domain length, structural classes, amino acid composition or sequence similarity. Our results indicate that entire ribosome occupancy profiles and not just rare codon locations are conserved between even distantly related domains in yeast, providing support for the hypothesis that translation schedule is conserved between structurally related domains to retain folding pathways and facilitate efficient folding.

Availability and implementation: Python3 code is available on GitHub at https://github.com/DanNissley/Compare-ribosome-occupancy.

Supplementary information: Supplementary data are available at Bioinformatics online.
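
As a rough illustration of the kind of comparison described in this abstract, the sketch below scales two per-codon ribosome occupancy profiles to a mean of 1 and scores their similarity with a Pearson correlation. The function names, the mean normalization and the choice of Pearson correlation are assumptions made for illustration only and are not taken from the paper; the authors' actual procedure is in their GitHub repository. The sketch also assumes the two profiles have already been aligned to equal length.

```python
from math import sqrt


def normalize_profile(reads):
    """Scale per-codon read counts so the profile has mean 1 (assumed normalization)."""
    mean = sum(reads) / len(reads)
    if mean == 0:
        raise ValueError("profile has no reads")
    return [r / mean for r in reads]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("profiles must be aligned to equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def profile_similarity(reads_a, reads_b):
    """Hypothetical similarity score between two aligned occupancy profiles."""
    return pearson(normalize_profile(reads_a), normalize_profile(reads_b))
```

Under these assumptions, two profiles that differ only in sequencing depth score close to 1, which is why some form of normalization usually precedes the comparison.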

Ribosome elongation kinetics of consecutively charged residues are coupled to electrostatic force

10.1101/2021.08.04.455055 ◽

2021 ◽

Author(s):

Sarah E Leininger ◽

Judith Rodriguez ◽

Quyen V Vu ◽

Yang Jiang ◽

Ma Suan Li ◽

...

Keyword(s):

Amino Acid ◽

Conformational Changes ◽

Electrostatic Force ◽

Ribosome Profiling ◽

Peptide Bond Formation ◽

Translation Speed ◽

Charged Residues ◽

Nascent Chain ◽

Chain Conformations

The speed of protein synthesis can dramatically change when consecutively charged residues are incorporated into an elongating nascent protein by the ribosome. The molecular origins of this class of allosteric coupling remain unknown. We demonstrate, using multi-scale simulations, that positively charged residues generate large forces that pull the P-site amino acid away from the A-site amino acid. Negatively charged residues generate forces of similar magnitude but opposite direction. These conformational changes, respectively, raise and lower the transition state barrier height to peptide bond formation, explaining how charged residues mechanochemically alter translation speed. This mechanochemical mechanism is consistent with in vivo ribosome profiling data exhibiting a proportionality between translation speed and the number of charged residues, experimental data characterizing nascent chain conformations, and a previously published cryo-EM structure of a ribosome-nascent chain complex containing consecutive lysines. These results expand the role of mechanochemistry in translation and provide a framework for interpreting experimental results on translation speed.

Detecting and correcting misclassified sequences in the large-scale public databases

Bioinformatics ◽

10.1093/bioinformatics/btaa586 ◽

2020 ◽

Vol 36 (18) ◽

pp. 4699-4705

Author(s):

Hamid Bagheri ◽

Andrew J Severin ◽

Hridesh Rajan

Keyword(s):

Large Scale ◽

Sequence Similarity ◽

Heuristic Method ◽

Simulated Data ◽

Supplementary Information ◽

Small Subset ◽

Taxonomic Assignment ◽

User Input ◽

Public Repositories ◽

Taxonomic Assignments

Abstract

Motivation: As the cost of sequencing decreases, the amount of data being deposited into public repositories is increasing rapidly. Public databases rely on the user to provide metadata for each submission that is prone to user error. Unfortunately, most public databases, such as non-redundant (NR), rely on user input and do not have methods for identifying errors in the provided metadata, leading to the potential for error propagation. Previous research on a small subset of the NR database analyzed misclassification based on sequence similarity. To the best of our knowledge, the amount of misclassification in the entire database has not been quantified. We propose a heuristic method to detect potentially misclassified taxonomic assignments in the NR database. We applied a curation technique and quality control to find the most probable taxonomic assignment. Our method incorporates provenance and frequency of each annotation from manually and computationally created databases and clustering information at 95% similarity.

Results: We found more than two million potentially taxonomically misclassified proteins in the NR database. Using simulated data, we show a high precision of 97% and a recall of 87% for detecting taxonomically misclassified proteins. The proposed approach and findings could also be applied to other databases.

Availability and implementation: Source code, dataset, documentation, Jupyter notebooks and Docker container are available at https://github.com/boalang/nr.

Supplementary information: Supplementary data are available at Bioinformatics online.
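
To make the precision and recall figures concrete, here is a deliberately simplified sketch. It flags any protein in a sequence cluster whose taxonomic label differs from the most frequent label in that cluster, then scores the flags against a known set of misclassified proteins. The majority-vote rule and all names are assumptions for illustration; the published method additionally weighs annotation provenance and frequency, which this sketch omits.

```python
from collections import Counter


def flag_minority_assignments(cluster):
    """Return ids whose taxon differs from the cluster's most frequent taxon.

    cluster: dict mapping protein id -> taxon label (hypothetical input format).
    """
    counts = Counter(cluster.values())
    consensus, _ = counts.most_common(1)[0]
    return {pid for pid, taxon in cluster.items() if taxon != consensus}


def precision_recall(flagged, truly_misclassified):
    """Precision and recall of a set of flagged ids against the true set."""
    true_positives = len(flagged & truly_misclassified)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_misclassified) if truly_misclassified else 0.0
    return precision, recall
```

Precision is the fraction of flagged proteins that are truly misclassified, and recall is the fraction of truly misclassified proteins that were flagged, so the reported 97% and 87% describe few false alarms and a moderate share of misses on the simulated data.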

Three-Dimensional Modelling of Honeybee Venom Allergenic Proteases: Relation to Allergenicity

Zeitschrift für Naturforschung C ◽

10.1515/znc-2011-5-615 ◽

2011 ◽

Vol 66 (5-6) ◽

pp. 305-312 ◽

Cited By ~ 2

Author(s):

Dessislava Georgieva ◽

Kerstin Greunke ◽

Raghuvir K. Arni ◽

Christian Betzel

Keyword(s):

Amino Acid ◽

Serine Proteases ◽

Three Dimensional ◽

Amino Acid Sequences ◽

Honeybee Venom ◽

Linker Peptide ◽

Cub Domain ◽

And Function ◽

Substrate Binding Sites ◽

Related Proteins

Api SI and Api SII are serine proteases of the honeybee venom containing allergenic determinants. Each protease consists of two structural modules: an N-terminal CUB (Api SI) or a clip domain (Api SII) and a C-terminal serine protease-like (SPL) domain. Both domains are connected with a linker peptide. Knowledge about the structure and function of Api SI and Api SII is limited mainly to their amino acid sequences. We constructed 3-D models of the two proteases using their amino acid sequences and crystallographic coordinates of related proteins. The models of the SPL domains were built using the structure of the prophenoloxidase-activating factor (PPAF)-II as a template. For modelling of the Api SI CUB domain the coordinates of porcine spermadhesin PSP-I were used. The models revealed the catalytic and substrate-binding sites and the negatively charged residue responsible for the trypsin-like activity. IgE-binding and antigenic sites in the two allergens were predicted using the models and programs based on the structure of known epitopes. Api SI and Api SII show structural and functional similarity to the members of the PPAF-II family. Most probably, they are part of the defence system of Apis mellifera.

Positive epistasis between disease-causing missense mutations and silent polymorphism with effect on mRNA translation velocity

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2010612118 ◽

2021 ◽

Vol 118 (4) ◽

pp. e2010612118

Author(s):

Robert Rauscher ◽

Giovana B. Bampi ◽

Marta Guevara-Ferrer ◽

Leonardo A. Santos ◽

Disha Joshi ◽

...

Keyword(s):

Cystic Fibrosis ◽

Amino Acid ◽

Time Window ◽

mRNA Translation ◽

Underlying Mechanism ◽

Missense Mutations ◽

Translation Speed ◽

Functional Studies ◽

Translation Velocity ◽

And Function

Epistasis refers to the dependence of a mutation on other mutation(s) and the genetic context in general. In the context of human disorders, epistasis complicates the spectrum of disease symptoms and has been proposed as a major contributor to variations in disease outcome. The nonadditive relationship between mutations and the lack of complete understanding of the underlying physiological effects limit our ability to predict phenotypic outcome. Here, we report positive epistasis between intragenic mutations in the cystic fibrosis transmembrane conductance regulator (CFTR)—the gene responsible for cystic fibrosis (CF) pathology. We identified a synonymous single-nucleotide polymorphism (sSNP) that is invariant for the CFTR amino acid sequence but inverts translation speed at the affected codon. This sSNP in cis exhibits positive epistatic effects on some CF disease–causing missense mutations. Individually, both mutations alter CFTR structure and function, yet when combined, they lead to enhanced protein expression and activity. The most robust effect was observed when the sSNP was present in combination with missense mutations that, along with the primary amino acid change, also alter the speed of translation at the affected codon. Functional studies revealed that synergistic alteration in ribosomal velocity is the underlying mechanism; alteration of translation speed likely increases the time window for establishing crucial domain–domain interactions that are otherwise perturbed by each individual mutation.

Methods for Automatic Reference Trees and Multilevel Phylogenetic Placement

10.1101/299792 ◽

2018 ◽

Cited By ~ 1

Author(s):

Lucas Czech ◽

Alexandros Stamatakis

Keyword(s):

Large Scale ◽

Sequence Data ◽

Sequence Similarity ◽

Computational Effort ◽

Supplementary Information ◽

Data Sets ◽

Metagenomic Sequencing ◽

Sequencing Studies ◽

Manual Selection ◽

Supplementary Material

Abstract

Motivation: In most metagenomic sequencing studies, the initial analysis step consists of assessing the evolutionary provenance of the sequences. Phylogenetic (or Evolutionary) Placement methods can be employed to determine the evolutionary position of sequences with respect to a given reference phylogeny. These placement methods do, however, face certain limitations: the manual selection of reference sequences is labor-intensive; the computational effort to infer reference phylogenies is substantially larger than for methods that rely on sequence similarity; the number of taxa in the reference phylogeny should be small enough to allow for visually inspecting the results.

Results: We present algorithms to overcome the above limitations. First, we introduce a method to automatically construct representative sequences from databases to infer reference phylogenies. Second, we present an approach for conducting large-scale phylogenetic placements on nested phylogenies. Third, we describe a preprocessing pipeline that allows for handling huge sequence data sets. Our experiments on empirical data show that our methods substantially accelerate the workflow and yield highly accurate placement results.

Implementation: Freely available under GPLv3 at http://github.com/lczech/[email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

FUpred: detecting protein domains through deep-learning-based contact map prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa217 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3749-3757 ◽

Cited By ~ 1

Author(s):

Wei Zheng ◽

Xiaogen Zhou ◽

Qiqige Wuyun ◽

Robin Pearce ◽

Yang Li ◽

...

Keyword(s):

Large Scale ◽

Control Method ◽

Domain Boundary ◽

Protein Domains ◽

Protein Domain ◽

Supplementary Information ◽

Contact Maps ◽

Core Idea ◽

Matthew’s Correlation Coefficient ◽

And Function

Abstract

Motivation: Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence.

Results: We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthew’s correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins.

Availability and implementation: https://zhanglab.ccmb.med.umich.edu/FUpred.

Supplementary information: Supplementary data are available at Bioinformatics online.
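
The core idea stated in this abstract, placing a boundary where few contacts cross it, can be sketched for the simplest case of one continuous boundary in a binary contact map. The scoring function below (inter-domain contacts divided by the number of possible inter-domain pairs) and the `min_len` parameter are assumptions chosen for illustration; this is not FUpred's actual score, and it does not handle discontinuous domains.

```python
def best_two_domain_split(contacts, min_len=2):
    """Find the split position with the lowest inter-domain contact density.

    contacts: symmetric n x n matrix of 0/1 values (list of lists).
    Returns (k, density), where residues [0, k) form the first domain and
    residues [k, n) the second.
    """
    n = len(contacts)
    best = None
    for k in range(min_len, n - min_len + 1):
        # Count contacts between a residue in the first part and one in the second.
        inter = sum(contacts[i][j] for i in range(k) for j in range(k, n))
        density = inter / (k * (n - k))
        if best is None or density < best[1]:
            best = (k, density)
    if best is None:
        raise ValueError("chain too short for the requested min_len")
    return best
```

Normalizing by the number of possible pairs is one way to avoid trivially favoring a split near either terminus, where the raw count of crossing contacts is small simply because one part is tiny.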

A structural homology approach for computational protein design with flexible backbone

Bioinformatics ◽

10.1093/bioinformatics/bty975 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2418-2426 ◽

Cited By ~ 2

Author(s):

David Simoncini ◽

Kam Y J Zhang ◽

Thomas Schiex ◽

Sophie Barbe

Keyword(s):

Amino Acid ◽

Protein Design ◽

Protein Sequence ◽

Critical Role ◽

Protein Structures ◽

Amino Acid Sequences ◽

Computational Protein Design ◽

Supplementary Information ◽

Structural Homology ◽

Homologous Proteins

Abstract

Motivation: Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. Energy functions remain imperfect, however, and injecting relevant information from known structures into the design process should lead to improved designs.

Results: We introduce Shades, a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades (Structural Homology Algorithm for protein DESign) is based on customized libraries of non-contiguous in-contact amino acid residue motifs. We have tested Shades on a public benchmark of 40 proteins selected from different protein families. When excluding homologous proteins, Shades achieved a protein sequence recovery of 30% and a protein sequence similarity of 46% on average, compared with the PFAM protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%.

Availability and implementation: Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8 with a curated protein structure database and ITEM library creation software.

Supplementary information: Supplementary data are available at Bioinformatics online.

Master Blaster: an approach to sensitive identification of remotely related proteins

Scientific Reports ◽

10.1038/s41598-021-87833-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Chintalapati Janaki ◽

Venkatraman S. Gowri ◽

Narayanaswamy Srinivasan

Keyword(s):

Evolutionary Divergence ◽

Homology Detection ◽

Homologous Proteins ◽

Scoring Matrices ◽

A Genome ◽

Successful Approach ◽

Family Connections ◽

And Function ◽

Related Proteins ◽

Remote Homologs

Abstract

Genome sequencing projects unearth the sequences of all the proteins encoded in a genome. As the first step, homology detection is employed to obtain clues to the structure and function of these proteins. However, high evolutionary divergence between homologous proteins challenges our ability to detect distant relationships. In the past, an approach involving multiple Position Specific Scoring Matrices (PSSMs) was found to be more effective than traditional single PSSMs. Cascaded search is another successful approach where hits of a search are queried to detect more homologues. We propose a protocol, ‘Master Blaster’, which combines the principles adopted in these two approaches to enhance our ability to detect remote homologues even further. Assessment of the approach was performed using known relationships available in the SCOP70 database, and the results were compared against those of PSI-BLAST and HHblits, a hidden Markov model-based method. Compared to PSI-BLAST, Master Blaster resulted in 10% improvement with respect to detection of cross superfamily connections, nearly 35% improvement in cross family and more than 80% improvement in intra family connections. The results show that HHblits is more sensitive than Master Blaster in detecting remote homologues. However, there are true hits from 46 folds for which Master Blaster reported homologues that are not reported by HHblits even using the optimal parameters, indicating that combining multiple methods based on different approaches can be more effective for detecting remote homologues. Master Blaster stand-alone code is available for download in the supplementary archive.
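
The cascaded search mentioned in this abstract, in which hits of one search become queries of the next, can be sketched generically as follows. The `search` argument is a hypothetical callable standing in for any homology search tool; it is not the Master Blaster interface, and the `max_rounds` cutoff is an assumption for illustration.

```python
def cascaded_search(query, search, max_rounds=2):
    """Collect hits by re-querying each newly found hit for a fixed number of rounds.

    query: identifier of the starting sequence.
    search: callable mapping an identifier to an iterable of hit identifiers
            (hypothetical stand-in for a real search tool).
    Returns the set of all hits found, excluding the query itself.
    """
    found = {query}
    frontier = [query]
    for _ in range(max_rounds):
        next_frontier = []
        for current in frontier:
            for hit in search(current):
                if hit not in found:
                    found.add(hit)
                    next_frontier.append(hit)
        if not next_frontier:
            break  # no new sequences, so further rounds cannot add anything
        frontier = next_frontier
    found.discard(query)
    return found
```

In practice each extra round trades sensitivity for a higher risk of drifting into unrelated sequences, which is one reason a cap on the number of rounds is a common design choice.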

Large-scale sequence similarity analysis reveals the scope of sequence and function divergence in PilZ domain proteins

10.1101/2020.02.11.943704 ◽

2020 ◽

Author(s):

Qing Wei Cheang ◽

Shuo Sheng ◽

Linghui Xu ◽

Zhao-Xun Liang

Keyword(s):

Large Scale ◽

Sequence Similarity ◽

Protein Domain ◽

Divergent Evolution ◽

Cellular Functions ◽

Vast Number ◽

Future Studies ◽

Function Relationship ◽

And Function ◽

Scale Sequence

Abstract

PilZ domain-containing proteins constitute a superfamily of widely distributed bacterial signalling proteins. Although studies have established the canonical PilZ domain as an adaptor protein domain evolved to specifically bind the second messenger c-di-GMP, mounting evidence suggests that the PilZ domain has undergone enormous divergent evolution to generate a superfamily of proteins that are characterized by a wide range of c-di-GMP-binding affinity, binding partners and cellular functions. The divergent evolution has even generated families of non-canonical PilZ domains that completely lack c-di-GMP binding ability. In this study, we performed a large-scale sequence analysis on more than 28,000 single- and di-domain PilZ proteins using the sequence similarity networking tool created originally to analyse functionally diverse enzyme superfamilies. The sequence similarity networks (SSNs) generated by the analysis feature a large number of putative isofunctional protein clusters and thus provide an unprecedented panoramic view of the sequence-function relationship and function diversification in PilZ proteins. Some of the protein clusters in the networks are unexplored clusters that contain proteins with completely unknown biological function, whereas others contain one, two or a few functionally known proteins, enabling us to infer the cellular function of uncharacterized homologs or orthologs. With the ultimate goal of elucidating the diverse roles played by PilZ proteins in bacterial signal transduction, the work described here will facilitate the annotation of the vast number of PilZ proteins encoded by bacterial genomes and help to prioritize functionally unknown PilZ proteins for future studies.

Importance

Although the PilZ domain is best known as the protein domain evolved specifically for the binding of the second messenger c-di-GMP, divergent evolution has generated a superfamily of PilZ proteins with a diversity of ligand or protein-binding properties and cellular functions. We analysed the sequences of more than 28,000 PilZ proteins using the sequence similarity networking (SSN) tool to yield a global view of the sequence-function relationship and function diversification in PilZ proteins. The results will facilitate the annotation of the vast number of PilZ proteins encoded by bacterial genomes and help us prioritize PilZ proteins for future studies.

CoRINs: A tool to compare residue interaction networks from homologous proteins and conformers

10.1101/2020.06.29.178541 ◽

2020 ◽

Author(s):

Felipe V. da Fonseca ◽

Romildo O. Souza Júnior ◽

Marília V. A. de Almeida ◽

Thiago D. Soares ◽

Diego A. A. Morais ◽

...

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Conformational Changes ◽

Protein Function ◽

Protein Structures ◽

Software Tool ◽

Interaction Networks ◽

Homologous Proteins ◽

Residue Interaction ◽

And Function

Abstract

Motivation: A useful approach to evaluate protein structure and quickly visualize crucial physicochemical interactions related to protein function is to construct Residue Interaction Networks (RINs). In this application of graph theory, the amino acid residues constitute the nodes, and the edges represent their interactions with other structural elements. Although several tools that construct RINs are available, many of them do not compare RINs from distinct protein structures. This comparison can give valuable insights into the understanding of conformational changes and the effects of amino acid substitutions in protein structure and function. With that in mind, we present CoRINs (Comparator of Residue Interaction Networks), a software tool that extensively compares RINs. The program has an accessible and user-friendly web interface, which summarizes the differences in several network parameters using interactive plots and tables. As a usage example of CoRINs, we compared RINs from conformers of two cancer-associated proteins.

Availability: The program is available at https://github.com/LasisUFRN/CoRINs.
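
A minimal sketch of what comparing two RINs can mean in practice: each network is treated as a set of undirected edges between residue labels, and the comparison reports shared and network-specific edges along with per-residue degree. The data layout, function names and residue labels are assumptions for illustration and do not reflect the CoRINs implementation or its input format.

```python
def compare_rins(edges_a, edges_b):
    """Compare two residue interaction networks given as lists of residue pairs.

    Edges are undirected, so ("A10", "B20") and ("B20", "A10") count as the same edge.
    """
    set_a = {frozenset(edge) for edge in edges_a}
    set_b = {frozenset(edge) for edge in edges_b}
    return {
        "shared": set_a & set_b,
        "only_a": set_a - set_b,
        "only_b": set_b - set_a,
    }


def degree(edges):
    """Number of distinct interactions per residue in one network."""
    counts = {}
    for edge in {frozenset(e) for e in edges}:
        for residue in edge:
            counts[residue] = counts.get(residue, 0) + 1
    return counts
```

Edges present in only one conformer point to interactions gained or lost between the two structures, which is the kind of difference such a comparison is meant to surface.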
