Statistical analysis of pair-wise compatibility of spatially nearest neighbor and adjacent residues in α-helix and β-strands: application to a minimal model for secondary structure prediction

Accurate determination of protein secondary structure from chemical shift information is a key step in NMR tertiary structure determination. Relatively little work has been done on this subject. A systematic investigation is needed of algorithms that are (a) robust for large datasets; (b) easily extendable to new (dynamic) databases; and (c) close to the limit of achievable accuracy. We introduce new approaches that use the k-nearest neighbor algorithm for the basic prediction and the BCJR algorithm to smooth the predictions and to combine predictions based on chemical shifts with predictions based on sequence information only. Our new system, SUCCES, improves on the accuracy of all existing methods on a large dataset of 805 proteins (86% Q3 accuracy, and 92.6% accuracy when boundary residues are ignored), and it is easily extendable to any new dataset without requiring any new training. The software is publicly available at .
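To make the two reported figures concrete, the following is a minimal Python sketch of how a Q3 score might be computed, with and without boundary residues. It is not the SUCCES implementation. The function name `q3_accuracy` is hypothetical, and the definition of a boundary residue used here (any residue whose true state differs from that of an immediate neighbor) is an assumption, since the abstract does not define it.

```python
def q3_accuracy(true_ss: str, pred_ss: str, ignore_boundaries: bool = False) -> float:
    """Fraction of residues whose three-state label (H, E, C) is predicted correctly."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have equal length")
    n = len(true_ss)
    kept = []
    for i in range(n):
        if ignore_boundaries:
            # Assumption: a boundary residue is one whose true label
            # differs from the label of its left or right neighbor.
            left_differs = i > 0 and true_ss[i - 1] != true_ss[i]
            right_differs = i < n - 1 and true_ss[i + 1] != true_ss[i]
            if left_differs or right_differs:
                continue
        kept.append(true_ss[i] == pred_ss[i])
    if not kept:
        raise ValueError("no residues left to score")
    return sum(kept) / len(kept)


# Toy example with made-up labels (not data from the paper).
true_labels = "CCHHHHCCEEEC"
pred_labels = "CCCHHHCCEEEE"
overall = q3_accuracy(true_labels, pred_labels)
interior = q3_accuracy(true_labels, pred_labels, ignore_boundaries=True)
```

In this toy case both errors fall on segment edges, so the interior-only score is higher than the overall score, which mirrors the direction of the gap between the two figures quoted above.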

A fast and efficient nearest neighbor method for protein secondary structure prediction

2011 3rd International Conference on Advanced Computer Control ◽

10.1109/icacc.2011.6016402 ◽

2011 ◽

Cited By ~ 1

Author(s):

Wei Yang ◽

Kuanquan Wang ◽

Wangmeng Zuo

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Nearest Neighbor ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Protein Secondary Structure Prediction

Quantitative high-throughput tests of ubiquitous RNA secondary structure prediction algorithms via RNA/protein binding

10.1101/571588 ◽

2019 ◽

Cited By ~ 2

Author(s):

Winston R. Becker ◽

Inga Jarmoskaite ◽

Kalli Kappel ◽

Pavanapuresan P. Vaidyanathan ◽

Sarah K. Denny ◽

...

Keyword(s):

Secondary Structure ◽

Protein Binding ◽

Rna Structure ◽

Structure Prediction ◽

Rna Secondary Structure ◽

Nearest Neighbor ◽

Secondary Structure Prediction ◽

Structural Features ◽

Vast Number ◽

Prediction Algorithms

Abstract: Nearest-neighbor (NN) rules provide a simple and powerful quantitative framework for RNA structure prediction that is strongly supported for canonical Watson-Crick duplexes from a plethora of thermodynamic measurements. Predictions of RNA secondary structure based on NN rules are routinely used to understand biological function and to engineer and control new functions in biotechnology. However, NN applications to RNA structural features such as internal and terminal loops rely on approximations and assumptions, with sparse experimental coverage of the vast number of possible sequence and structural features. To test to what extent NN rules accurately predict thermodynamic stabilities across RNAs with non-WC features, we tested their predictions using a quantitative high-throughput assay platform, RNA-MaP. Using a thermodynamic assay with coupled protein binding, we carried out equilibrium measurements for over 1000 RNAs with a range of predicted secondary structure stabilities. Our results revealed substantial scatter and systematic deviations between NN predictions and observed stabilities. Solution salt effects and incorrect or omitted loop parameters contribute to these observed deviations. Our results demonstrate the need to independently and quantitatively test NN computational algorithms to identify their capabilities and limitations. RNA-MaP and related approaches can be used to test computational predictions and can be adapted to obtain experimental data to improve RNA secondary structure and other prediction algorithms.

Significance statement: RNA secondary structure prediction algorithms are routinely used to understand, predict and design functional RNA structures in biology and biotechnology. Given the vast number of RNA sequence and structural features, these predictions rely on a series of approximations, and independent tests are needed to quantitatively evaluate the accuracy of predicted RNA structural stabilities. Here we measure the stabilities of over 1000 RNA constructs by using a coupled protein binding assay. Our results reveal substantial deviations from the RNA stabilities predicted by popular algorithms, and identify factors contributing to the observed deviations. We demonstrate the importance of quantitative, experimental tests of computational RNA structure predictions and present an approach that can be used to routinely test and improve the prediction accuracy.
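As a rough illustration of what a nearest-neighbor rule computes for a fully paired duplex, the Python sketch below sums one stacking term per pair of adjacent base pairs plus an initiation term. It is not the authors' code and not any published parameter set: the function name `duplex_stability` is hypothetical, the energies in `TOY_STACK` are placeholder numbers chosen only so the arithmetic is easy to follow, and loops, terminal penalties and salt corrections (the features the abstract identifies as problematic) are deliberately left out.

```python
from typing import Dict, Tuple

# Placeholder stacking energies in kcal/mol, for illustration ONLY.
# These are NOT measured or published nearest-neighbor parameters.
TOY_STACK: Dict[Tuple[str, str], float] = {
    ("GC", "CG"): -3.0,
    ("CG", "AU"): -2.0,
}


def duplex_stability(top: str, bottom: str,
                     stack_energy: Dict[Tuple[str, str], float],
                     initiation: float = 0.0) -> float:
    """Sum nearest-neighbor stacking terms for a fully paired duplex.

    `top` is written 5'->3' and `bottom` 3'->5', so top[i] pairs with bottom[i].
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    # Each base pair is named by its top base followed by its bottom base.
    pairs = [t + b for t, b in zip(top, bottom)]
    total = initiation
    # One energy term for every two adjacent base pairs (a "stack").
    for first, second in zip(pairs, pairs[1:]):
        total += stack_energy[(first, second)]
    return total


# Toy duplex: 5'-GCA-3' paired with 3'-CGU-5'.
dg = duplex_stability("GCA", "CGU", TOY_STACK, initiation=4.0)
```

With these placeholder values the total is 4.0 - 3.0 - 2.0 = -1.0. The additive form is the point of the sketch: any structural feature without a well-measured term has to be approximated, which is where the abstract reports deviations.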

PROFILES AND FUZZY K-NEAREST NEIGHBOR ALGORITHM FOR PROTEIN SECONDARY STRUCTURE PREDICTION

Proceedings of the 3rd Asia-Pacific Bioinformatics Conference ◽

10.1142/9781860947322_0009 ◽

2005 ◽

Cited By ~ 6

Author(s):

RAJKUMAR BONDUGULA ◽

OGNEN DUZLEVSKI ◽

DONG XU

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Nearest Neighbor ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

K Nearest Neighbor ◽

Protein Secondary Structure Prediction ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

The ‘30K’ superfamily of viral movement proteins

Microbiology ◽

10.1099/0022-1317-81-1-257 ◽

2000 ◽

Vol 81 (1) ◽

pp. 257-266 ◽

Cited By ~ 206

Author(s):

Ulrich Melcher

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Consensus Sequence ◽

Amino Acid Sequences ◽

Movement Proteins ◽

Consensus Sequences ◽

Viral Movement ◽

Secondary Structure Predictions ◽

Α Helix

Relationships among the amino acid sequences of viral movement proteins related to the 30 kDa (‘30K’) movement protein of tobacco mosaic virus – the 30K superfamily – were explored. Sequences were grouped into 18 families. A comparison of secondary structure predictions for each family revealed a common predicted core structure flanked by variable N- and C-terminal domains. The core consisted of a series of β-elements flanked by an α-helix on each end. Consensus sequences for each of the families were generated and aligned with one another. From this alignment an overall secondary structure prediction was generated and a consensus sequence that can recognize each family in database searches was obtained. The analysis led to criteria that were used to evaluate other virus-encoded proteins for possible membership of the 30K superfamily. A rhabdoviral and a tenuiviral protein were identified as 30K superfamily members, as were plant-encoded phloem proteins. Parsimony analysis grouped tubule-forming movement proteins separate from others. Establishment of the alignment of residues of diverse families facilitates comparison of mutagenesis experiments done on different movement proteins and should serve as a guide for further such experiments.
