Comparative Assessment of Intrinsic Disorder Predictions with a Focus on Protein and Nucleic Acid-Binding Proteins

Biomolecules ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 1636
Author(s):  
Akila Katuwawala ◽  
Lukasz Kurgan

With over 60 disorder predictors, users need help navigating the predictor selection task. We review 28 surveys of disorder predictors, showing that only 11 include assessment of predictive performance. We identify and address a few drawbacks of these past surveys. To this end, we release a novel benchmark dataset with reduced similarity to the training sets of the considered predictors. We use this dataset to perform a first-of-its-kind comparative analysis that targets two large functional families of disordered proteins that interact with proteins and with nucleic acids. We show that limiting sequence similarity between the benchmark and the training datasets has a substantial impact on predictive performance. We also demonstrate that predictive quality is sensitive to the use of the well-annotated order and inclusion of the fully structured proteins in the benchmark datasets, both of which should be considered in future assessments. We identify three predictors that provide favorable results using the new benchmark set. While we find that VSL2B offers the most accurate and robust results overall, ESpritz-DisProt and SPOT-Disorder perform particularly well for disordered proteins. Moreover, we find that predictions for the disordered protein-binding proteins suffer low predictive quality compared to generic disordered proteins and the disordered nucleic acids-binding proteins. This can be explained by the high disorder content of the disordered protein-binding proteins, which makes it difficult for the current methods to accurately identify ordered regions in these proteins. This finding motivates the development of a new generation of methods that would target these difficult-to-predict disordered proteins. We also discuss resources that support users in collecting and identifying high-quality disorder predictions.

2020 ◽  
Vol 36 (18) ◽  
pp. 4797-4804
Author(s):  
Shu Yang ◽  
Xiaoxi Liu ◽  
Raymond T Ng

Motivation: The interaction between proteins and nucleic acids plays a crucial role in gene regulation and cell function. Determining the binding preferences of nucleic acid-binding proteins (NBPs), namely RNA-binding proteins (RBPs) and transcription factors (TFs), is the key to deciphering the protein–nucleic acid interaction code. Today, available NBP binding data from in vivo or in vitro experiments are still limited, which leaves a large portion of NBPs uncovered. Unfortunately, existing computational methods that model NBP binding preferences are mostly protein specific: they need experimental data for the specific protein of interest, and thus only focus on experimentally characterized NBPs. The binding preferences of experimentally unexplored NBPs remain largely unknown. Results: Here, we introduce ProbeRating, a nucleic acid recommender system that utilizes techniques from deep learning and word embeddings from natural language processing. ProbeRating is developed to predict binding profiles for unexplored or poorly studied NBPs by exploiting their homologous NBPs that currently have available binding data. Requiring only sequence information as input, ProbeRating adapts FastText from Facebook AI Research to extract biological features. It then builds a neural network-based recommender system. We evaluate the performance of ProbeRating on two different tasks: one for RBPs and one for TFs. ProbeRating outperforms previous methods on both tasks. The results show that ProbeRating can be a useful tool to study the binding mechanism for the many NBPs that lack direct experimental evidence. Availability and implementation: The source code is freely available at <https://github.com/syang11/ProbeRating>. Supplementary information: Supplementary data are available at Bioinformatics online.


2014 ◽  
Vol 192 (11) ◽  
pp. 5390-5397 ◽  
Author(s):  
Marshall P. Thomas ◽  
Jennifer Whangbo ◽  
Geoffrey McCrossan ◽  
Aaron J. Deutsch ◽  
Kimberly Martinod ◽  
...  

2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i735-i744
Author(s):  
Fuhao Zhang ◽  
Wenbo Shi ◽  
Jian Zhang ◽  
Min Zeng ◽  
Min Li ◽  
...  

Motivation: Knowledge of protein-binding residues (PBRs) improves our understanding of protein–protein interactions, contributes to the prediction of protein functions and facilitates protein–protein docking calculations. While many sequence-based predictors of PBRs have been published, they offer modest levels of predictive performance and most of them cross-predict residues that interact with other partners. One unexplored option to improve the predictive quality is to design consensus predictors that combine results produced by multiple methods. Results: We empirically investigate the predictive performance of a representative set of nine predictors of PBRs. We report substantial differences in predictive quality when these methods are used to predict individual proteins, which contrast with the dataset-level benchmarks that are currently used to assess and compare these methods. Our analysis provides new insights into the cross-prediction concern, dissects complementarity between predictors and demonstrates that the predictive performance of the top methods depends on unique characteristics of the input protein sequence. Using these insights, we developed PROBselect, a first-of-its-kind consensus predictor of PBRs. Our design is based on dynamic predictor selection at the protein level, where the selection relies on regression-based models that accurately estimate the predictive performance of selected predictors directly from the sequence. Empirical assessment using a low-similarity test dataset shows that PROBselect provides significantly improved predictive quality when compared with the current predictors and conventional consensuses that combine residue-level predictions. Moreover, PROBselect informs the users about the expected predictive quality for the prediction generated from a given input protein. Availability and implementation: PROBselect is available at http://bioinformatics.csu.edu.cn/PROBselect/home/index.
Supplementary information: Supplementary data are available at Bioinformatics online.
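
The protein-level dynamic selection described in this abstract can be illustrated with a minimal Python sketch. It is not the PROBselect implementation: the amino acid composition features, the linear form of the regression models, and the names `composition`, `estimate_quality` and `select_predictor` are all assumptions made for illustration, and the model weights would have to come from training that is not shown here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each of the 20 standard amino acids."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def estimate_quality(features, weights, bias):
    """Linear regression estimate of a predictor's expected quality.

    The linear form is an assumption; any regression model could be used.
    """
    return bias + sum(w * x for w, x in zip(weights, features))


def select_predictor(sequence, models):
    """Pick the predictor with the highest estimated quality for one protein.

    `models` maps a (hypothetical) predictor name to a (weights, bias) pair.
    Returns the chosen name and its estimated quality, so the caller can
    also report the expected quality of the resulting prediction.
    """
    features = composition(sequence)
    estimates = {
        name: estimate_quality(features, weights, bias)
        for name, (weights, bias) in models.items()
    }
    best = max(estimates, key=estimates.get)
    return best, estimates[best]
```

The selection happens once per input protein, so the residue-level output comes from a single predictor rather than from a blend of residue-level scores, which is the contrast with conventional consensuses that the abstract draws.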


2021 ◽  
Vol 22 (2) ◽  
pp. 922
Author(s):  
Martin Bartas ◽  
Jiří Červeň ◽  
Simona Guziurová ◽  
Kristyna Slychko ◽  
Petr Pečinka

Nucleic acid-binding proteins are traditionally divided into two categories: those that bind DNA and those that bind RNA. In the light of new knowledge, this categorization should be reconsidered because a large proportion of proteins can bind both DNA and RNA. Another, even more important, feature of nucleic acid-binding proteins is their so-called sequence or structure specificity. Proteins able to bind nucleic acids in a sequence-specific manner usually contain one or more well-defined structural motifs (zinc finger, leucine zipper, helix-turn-helix, or helix-loop-helix). In contrast, many proteins do not recognize the nucleic acid sequence but rather local DNA or RNA structures (G-quadruplexes, i-motifs, triplexes, cruciforms, left-handed DNA/RNA forms, and others). Finally, there are also proteins that recognize both sequence and local structural properties of nucleic acids (e.g., the well-known tumor suppressor p53). In this mini-review, we aim to summarize current knowledge about the amino acid composition of various types of nucleic acid-binding proteins, with a special focus on significant enrichment and/or depletion in each category.
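
As a rough illustration of what enrichment or depletion of an amino acid in a protein category means, the Python sketch below compares the pooled amino acid composition of one group of sequences against a background set and reports a log2 ratio per residue. The function names, the pseudocount, and the choice of a log2 ratio are assumptions made for this example, not the procedure used in the mini-review, and the sketch does not test statistical significance.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pooled_composition(sequences):
    """Fraction of each standard amino acid pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Ignore non-standard symbols such as X or gaps.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def log2_enrichment(group, background, pseudocount=1e-6):
    """Log2 ratio of group to background frequency for each amino acid.

    Positive values indicate enrichment in the group, negative values
    indicate depletion. The pseudocount avoids division by zero.
    """
    g = pooled_composition(group)
    b = pooled_composition(background)
    return {
        a: math.log2((g[a] + pseudocount) / (b[a] + pseudocount))
        for a in AMINO_ACIDS
    }
```

A significance claim would additionally need a statistical test over per-protein compositions, which is omitted here.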


Author(s):  
Stephen D. Jett

The electrophoresis gel mobility shift assay is a popular method for the study of protein-nucleic acid interactions. The binding of proteins to DNA is characterized by a reduction in the electrophoretic mobility of the nucleic acid. Binding affinity, stoichiometry, and kinetics can be obtained from such assays; however, it is often desirable to image the various species in the gel bands using TEM. Present methods for isolation of nucleoproteins from gel bands are inefficient and often destroy the native structure of the complexes. We have developed a technique, called “snapshot blotting,” by which nucleic acids and nucleoprotein complexes in electrophoresis gels can be electrophoretically transferred directly onto carbon-coated grids for TEM imaging.


PLoS ONE ◽  
2012 ◽  
Vol 7 (10) ◽  
pp. e47233 ◽  
Author(s):  
Guy Caljon ◽  
Karin De Ridder ◽  
Benoît Stijlemans ◽  
Marc Coosemans ◽  
Stefan Magez ◽  
...  

2007 ◽  
Vol 47 (supplement) ◽  
pp. S54
Author(s):  
Koji HASEGAWA ◽  
Tatsushi GOTO ◽  
Daisuke KITANO ◽  
Mari KOTOURA ◽  
Fumio TOKUNAGA ◽  
...  
