ODiNPred: comprehensive prediction of protein order and disorder

Abstract Structural disorder is widespread in eukaryotic proteins and is vital for their function in diverse biological processes. It is therefore highly desirable to be able to predict the degree of order and disorder from amino acid sequence. It is, however, notoriously difficult to predict the degree of local flexibility within structured domains and the presence and nuances of localized rigidity within intrinsically disordered regions. To identify such instances, we used the CheZOD database, which encompasses accurate, balanced, and continuous-valued quantification of protein (dis)order at amino acid resolution based on NMR chemical shifts. To computationally forecast the spectrum of protein disorder in the most comprehensive manner possible, we constructed the sequence-based protein order/disorder predictor ODiNPred, trained on an expanded version of CheZOD. ODiNPred applies a deep neural network comprising 157 unique sequence features to 1325 protein sequences together with the experimental NMR chemical shift data. Cross-validation for 117 protein sequences shows that ODiNPred better predicts the continuous variation in order along the protein sequence, suggesting that contemporary predictors are limited by the quality of training data. The inclusion of evolutionary features reduces the performance gap between ODiNPred and its peers, but analysis shows that it retains greater accuracy for the more challenging prediction of intermediate disorder.

PlaToLoCo: the first web meta-server for visualization and annotation of low complexity regions in proteins

Nucleic Acids Research ◽

10.1093/nar/gkaa339 ◽

2020 ◽

Vol 48 (W1) ◽

pp. W77-W84 ◽

Cited By ~ 4

Author(s):

Patryk Jarnot ◽

Joanna Ziemska-Legiecka ◽

Laszlo Dobson ◽

Matthew Merski ◽

Pablo Mier ◽

...

Keyword(s):

Amino Acid ◽

Query Sequence ◽

Protein Sequences ◽

Low Complexity ◽

Web Based ◽

Functional Annotations ◽

Additional Information ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Protein Functions

Abstract Low complexity regions (LCRs) in protein sequences are characterized by a less diverse amino acid composition compared to typically observed sequence diversity. Recent studies have shown that LCRs may co-occur with intrinsically disordered regions, are highly conserved in many organisms, and often play important roles in protein functions and in diseases. In previous decades, several methods have been developed to identify regions with LCRs or amino acid bias, but most of them are stand-alone applications, and currently there is no web-based tool that allows users to explore LCRs in protein sequences with additional functional annotations. We aim to fill this gap by providing PlaToLoCo (PLAtform of TOols for LOw COmplexity), a meta-server that integrates and collects the output of five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. In addition, the union or intersection of the results of the search on a query sequence can be obtained. By developing the PlaToLoCo meta-server, we provide the community with a fast and easily accessible tool for the analysis of LCRs with additional information included to aid the interpretation of the results. The PlaToLoCo platform is available at: http://platoloco.aei.polsl.pl/.
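
The union and intersection of per-tool results mentioned in this abstract can be illustrated with a minimal sketch. It assumes each tool reports LCRs as 1-based, inclusive (start, end) residue ranges; the function names and the tool labels `tool_a` and `tool_b` are hypothetical and do not come from the PlaToLoCo server.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # assumed convention: 1-based, inclusive residue range


def to_positions(intervals: List[Interval]) -> Set[int]:
    """Expand (start, end) ranges into the set of covered residue positions."""
    return {pos for start, end in intervals for pos in range(start, end + 1)}


def to_intervals(positions: Set[int]) -> List[Interval]:
    """Collapse residue positions back into sorted, contiguous ranges."""
    merged: List[Interval] = []
    for pos in sorted(positions):
        if merged and pos == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], pos)  # extend the current range
        else:
            merged.append((pos, pos))  # start a new range
    return merged


def combine(per_tool: Dict[str, List[Interval]], mode: str = "union") -> List[Interval]:
    """Combine the LCR calls of several tools by union or intersection."""
    sets = [to_positions(ivs) for ivs in per_tool.values()]
    if not sets:
        return []
    if mode == "union":
        covered = set.union(*sets)
    elif mode == "intersection":
        covered = set.intersection(*sets)
    else:
        raise ValueError("mode must be 'union' or 'intersection'")
    return to_intervals(covered)


# Hypothetical calls from two tools on the same query sequence.
calls = {"tool_a": [(5, 12), (30, 35)], "tool_b": [(10, 20)]}
print(combine(calls, "union"))         # [(5, 20), (30, 35)]
print(combine(calls, "intersection"))  # [(10, 12)]
```

The union favors sensitivity (any tool's call counts), while the intersection keeps only residues on which every tool agrees.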

Embeddings from deep learning transfer GO annotations beyond homology

Scientific Reports ◽

10.1038/s41598-020-80786-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Maria Littmann ◽

Michael Heinzinger ◽

Christian Dallago ◽

Tobias Olenyi ◽

Burkhard Rost

Keyword(s):

Protein Function ◽

Protein Sequences ◽

Language Models ◽

Evolutionary Information ◽

Pairwise Sequence Identity ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Sequence Identity ◽

Experimental Function ◽

Go Terms

Abstract Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with < 20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
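
Annotation transfer by proximity in embedding space can be sketched as a nearest-neighbor lookup. The example below is only an illustration: it assumes fixed-length per-protein embeddings and Euclidean distance, uses two-dimensional toy vectors instead of real SeqVec output, and the function name `transfer_go_terms` and the protein labels are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_go_terms(
    query: List[float],
    annotated: Dict[str, Tuple[List[float], Set[str]]],
    k: int = 1,
) -> Set[str]:
    """Return the GO terms of the k annotated proteins closest to the query."""
    ranked = sorted(annotated.items(), key=lambda item: euclidean(query, item[1][0]))
    terms: Set[str] = set()
    for _, (_, go_terms) in ranked[:k]:
        terms |= go_terms
    return terms


# Toy lookup set: placeholder 2-D vectors, not real embeddings.
lookup = {
    "protA": ([0.0, 1.0], {"GO:0005515"}),
    "protB": ([5.0, 5.0], {"GO:0003677"}),
}
print(transfer_go_terms([0.1, 0.9], lookup))  # {'GO:0005515'}
```

Because the lookup depends on distance in embedding space rather than on an alignment, it can still return a hit when no sequence-similar annotated protein exists.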

Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines

Nucleic Acids Research ◽

10.1093/nar/gkl166 ◽

2006 ◽

Vol 34 (Web Server) ◽

pp. W164-W168 ◽

Cited By ~ 89

Author(s):

A. Vullo ◽

O. Bortolami ◽

G. Pollastri ◽

S. C. E. Tosatto

Keyword(s):

Protein Sequences ◽

Kernel Machines ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Disordered Regions

Random coil chemical shifts for serine, threonine and tyrosine phosphorylation over a broad pH range

Journal of Biomolecular NMR ◽

10.1007/s10858-019-00283-z ◽

2019 ◽

Vol 73 (12) ◽

pp. 713-725 ◽

Cited By ~ 4

Author(s):

Ruth Hendus-Altenburger ◽

Catarina B. Fernandes ◽

Katrine Bugge ◽

Micha B. A. Kunze ◽

Wouter Boomsma ◽

...

Keyword(s):

Chemical Shift ◽

Secondary Structure ◽

Intrinsically Disordered Proteins ◽

Chemical Shifts ◽

Random Coil ◽

Disordered Proteins ◽

Phosphoryl Group ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Secondary Chemical

Abstract Phosphorylation is one of the main regulators of cellular signaling, typically occurring in flexible parts of folded proteins and in intrinsically disordered regions. It can have distinct effects on the chemical environment as well as on the structural properties near the modification site. Secondary chemical shift analysis is the main NMR method for detection of transiently formed secondary structure in intrinsically disordered proteins (IDPs), and the reliability of the analysis depends on an appropriate choice of random coil model. Random coil chemical shifts and sequence correction factors were previously determined for an Ac-QQXQQ-NH2-peptide series with X being any of the 20 common amino acids. However, a matching dataset on the phosphorylated states has so far only been incompletely determined or determined only at a single pH value. Here we extend the database by the addition of the random coil chemical shifts of the phosphorylated states of serine, threonine and tyrosine measured over a range of pH values covering the pKas of the phosphates and at several temperatures (www.bio.ku.dk/sbinlab/randomcoil). The combined results allow for accurate random coil chemical shift determination of phosphorylated regions at any pH and temperature, minimizing systematic biases of the secondary chemical shifts. Comparison of chemical shifts using random coil sets with and without inclusion of the phosphoryl group revealed under- or overestimation of helicity of up to 33%. The expanded set of random coil values will improve the reliability in detection and quantification of transient secondary structure in phosphorylation-modified IDPs.
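
A secondary chemical shift is the observed shift minus the random coil reference for the same nucleus and residue, which is why the choice of reference set matters. The sketch below shows only that subtraction; all numbers are hypothetical placeholder values in ppm, not measured data, and the function and variable names are assumptions made for illustration.

```python
from typing import Dict


def secondary_shifts(
    observed: Dict[int, float], random_coil: Dict[int, float]
) -> Dict[int, float]:
    """Observed minus random coil shift (ppm) for residues present in both sets."""
    return {
        residue: round(observed[residue] - random_coil[residue], 3)
        for residue in observed
        if residue in random_coil
    }


# Hypothetical C-alpha shifts keyed by residue number (placeholder values).
observed_ca = {10: 58.9, 11: 56.0}
reference_unmodified = {10: 58.3, 11: 56.2}  # reference ignoring the phosphoryl group
reference_phospho = {10: 58.7, 11: 56.2}     # reference for a phosphorylated residue 10

print(secondary_shifts(observed_ca, reference_unmodified))  # {10: 0.6, 11: -0.2}
print(secondary_shifts(observed_ca, reference_phospho))     # {10: 0.2, 11: -0.2}
```

In this toy case the same observed shift at residue 10 gives a threefold larger secondary shift with the mismatched reference, the kind of systematic bias the abstract describes.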

Differential roles of two DDX17 isoforms in the formation of membraneless organelles

The Journal of Biochemistry ◽

10.1093/jb/mvaa023 ◽

2020 ◽

Vol 168 (1) ◽

pp. 33-40

Author(s):

Yuya Hirai ◽

Eisuke Domae ◽

Yoshihiro Yoshikawa ◽

Keizo Tomonaga

Keyword(s):

Amino Acid ◽

Enzymatic Activity ◽

Rna Helicase ◽

Intracellular Distribution ◽

Amino Acid Sequences ◽

Nucleolar Localization ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Dead Box ◽

Additional Amino Acid

Abstract The RNA helicase DDX17 is a member of the DEAD-box protein family. DDX17 has two isoforms: p72 and p82. The p82 isoform has additional amino acid sequences called intrinsically disordered regions (IDRs), which are related to the formation of membraneless organelles (MLOs). Here, we reveal that p72 is mostly localized to the nucleoplasm, while p82 is localized to the nucleoplasm and nucleoli. Additionally, p82 exhibited slower intranuclear mobility than p72. Furthermore, the enzymatic mutants of both p72 and p82 accumulate in stress granules. The enzymatic mutation of p82 abolishes its nucleolar localization. Our findings suggest the importance of IDRs and enzymatic activity of DEAD-box proteins in the intracellular distribution and formation of MLOs.

Identifying short disorder-to-order binding regions in disordered proteins with a deep convolutional neural network method

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720019500045 ◽

2019 ◽

Vol 17 (01) ◽

pp. 1950004 ◽

Cited By ~ 2

Author(s):

Chun Fang ◽

Yoshitaka Moriwaki ◽

Aikui Tian ◽

Caihong Li ◽

Kentaro Shimizu

Keyword(s):

Neural Network ◽

Amino Acid ◽

Convolutional Neural Network ◽

Intrinsically Disordered Proteins ◽

Protein Sequences ◽

Interaction Network ◽

Deep Convolutional Neural Network ◽

Disordered Proteins ◽

Related Factors ◽

Intrinsically Disordered

Molecular recognition features (MoRFs) are key functional regions of intrinsically disordered proteins (IDPs), which play important roles in the molecular interaction network of cells and are implicated in many serious human diseases. Identifying MoRFs is essential for both functional studies of IDPs and drug design. This study uses deep learning to develop a model for improving MoRF prediction. We propose a method named en_DCNNMoRF (ensemble deep convolutional neural network-based MoRF predictor). It combines the outcomes of two independent deep convolutional neural network (DCNN) classifiers that take advantage of different features. The first, DCNNMoRF1, employs a position-specific scoring matrix (PSSM) and 22 types of amino acid-related factors to describe protein sequences. The second, DCNNMoRF2, employs a PSSM and 13 types of amino acid indexes to describe protein sequences. For both single classifiers, a DCNN with a novel two-dimensional attention mechanism was adopted, and an averaging strategy was added to further process the output probabilities of each DCNN model. Finally, en_DCNNMoRF combines the two models by averaging their final scores. When compared with other well-known tools applied to the same datasets, the accuracy of the proposed method was comparable with that of state-of-the-art methods. The related web server can be accessed freely via http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/en_MoRFs.php.
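
The final ensembling step described in this abstract, averaging the scores of two classifiers, can be sketched as follows. This covers only the averaging, not the networks themselves; it assumes each classifier yields one score per residue, and the function name `ensemble_average` and the example scores are hypothetical.

```python
from typing import List


def ensemble_average(scores_1: List[float], scores_2: List[float]) -> List[float]:
    """Average two per-residue score lists position by position."""
    if len(scores_1) != len(scores_2):
        raise ValueError("both classifiers must score the same residues")
    return [(a + b) / 2 for a, b in zip(scores_1, scores_2)]


# Hypothetical per-residue scores from two classifiers for a 3-residue stretch.
model_1 = [0.25, 1.0, 0.5]
model_2 = [0.75, 0.5, 0.5]
print(ensemble_average(model_1, model_2))  # [0.5, 0.75, 0.5]
```

Plain averaging gives both classifiers equal weight, so a residue is scored highly only when the two feature sets broadly agree.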

Markov Models of Amino Acid Substitution to Study Proteins with Intrinsically Disordered Regions

PLoS ONE ◽

10.1371/journal.pone.0020488 ◽

2011 ◽

Vol 6 (5) ◽

pp. e20488 ◽

Cited By ~ 27

Author(s):

Adam M. Szalkowski ◽

Maria Anisimova

Keyword(s):

Amino Acid ◽

Amino Acid Substitution ◽

Markov Models ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Disordered Regions

Proteome-wide signatures of function in highly diverged intrinsically disordered regions

10.1101/578716 ◽

2019 ◽

Author(s):

Taraneh Zarin ◽

Bob Strome ◽

Alex N Nguyen Ba ◽

Simon Alberti ◽

Julie D Forman-Kay ◽

...

Keyword(s):

Amino Acid ◽

Amino Acid Sequences ◽

Functional Annotations ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Molecular Features ◽

Primary Amino ◽

Function Relationship ◽

Disordered Regions

Abstract Intrinsically disordered regions make up a large part of the proteome, but the sequence-to-function relationship in these regions is poorly understood, in part because the primary amino acid sequences of these regions are poorly conserved in alignments. Here we use an evolutionary approach to detect molecular features that are preserved in the amino acid sequences of orthologous intrinsically disordered regions. We find that most disordered regions contain multiple molecular features that are preserved, and we define these as “evolutionary signatures” of disordered regions. We demonstrate that intrinsically disordered regions with similar evolutionary signatures can rescue function in vivo, and that groups of intrinsically disordered regions with similar evolutionary signatures are strongly enriched for functional annotations and phenotypes. We propose that evolutionary signatures can be used to predict function for many disordered regions from their amino acid sequences.

Structural propensity database of proteins

10.1101/144840 ◽

2017 ◽

Author(s):

Kamil Tamiola ◽

Matthew M Heberling ◽

Jan Domanski

Keyword(s):

Structural Dynamics ◽

Protein Function ◽

Chemical Shifts ◽

Protein Disorder ◽

Computational Biophysics ◽

Intrinsic Protein ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Intrinsic Protein Disorder ◽

Structural Insights

Abstract An overwhelming amount of experimental evidence suggests that elucidations of protein function, interactions, and pathology are incomplete without inclusion of intrinsic protein disorder and structural dynamics. Thus, to expand our understanding of intrinsic protein disorder, we have created a database of secondary structure (SS) propensities for proteins (dSPP) as a reference resource for experimental research and computational biophysics. The dSPP comprises SS propensities of 7,094 unrelated proteins, as gauged from NMR chemical shift measurements in solution and solid state. Here, we explain the concept of SS propensity and analyze dSPP entries of therapeutic relevance, α-synuclein, MOAG-4, and the ZIKA NS2B-NS3 complex, to show: (1) how propensity mapping generates novel structural insights into intrinsically disordered regions of pathologically relevant proteins, (2) how computational biophysics tools can benefit from propensity mapping, and (3) how the residual disorder estimation based on NMR chemical shifts compares with sequence-based disorder predictors. This work demonstrates the benefit of propensity estimation as a method that reports on protein structure, lability, and disorder.

Learning protein constitutive motifs from sequence data

eLife ◽

10.7554/elife.39397 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 30

Author(s):

Jérôme Tubiana ◽

Simona Cocco ◽

Rémi Monasson

Keyword(s):

Sequence Data ◽

Protein Sequences ◽

Sequence Information ◽

Ligand Specificity ◽

Protein Families ◽

Restricted Boltzmann Machines ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Model Protein ◽

Lattice Proteins

Statistical analysis of evolutionary-related protein sequences provides information about their structure, function, and history. We show that Restricted Boltzmann Machines (RBM), designed to learn complex high-dimensional data and their statistical features, can efficiently model protein families from sequence information. We here apply RBM to 20 protein families, and present detailed results for two short protein domains (Kunitz and WW), one long chaperone protein (Hsp70), and synthetic lattice proteins for benchmarking. The features inferred by the RBM are biologically interpretable: they are related to structure (residue-residue tertiary contacts, extended secondary motifs (α-helixes and β-sheets) and intrinsically disordered regions), to function (activity and ligand specificity), or to phylogenetic identity. In addition, we use RBM to design new protein sequences with putative properties by composing and 'turning up' or 'turning down' the different modes at will. Our work therefore shows that RBM are versatile and practical tools that can be used to unveil and exploit the genotype–phenotype relationship for protein families.
