Fuzzy k-Nearest Neighbor Method for Protein Secondary Structure Prediction and Its Parallel Implementation

Accurate determination of protein secondary structure from chemical shift information is a key step in NMR tertiary structure determination. Relatively little work has been done on this subject. A systematic investigation is needed of algorithms that are (a) robust on large datasets; (b) easily extendable to new (dynamic) databases; and (c) close to the limit of achievable accuracy. We introduce new approaches that use the k-nearest neighbor algorithm for the basic prediction and the BCJR algorithm both to smooth the predictions and to combine predictions based on chemical shifts with predictions based on sequence information only. Our new system, SUCCES, improves on the accuracy of all existing methods on a large dataset of 805 proteins (86% Q3 accuracy, and 92.6% accuracy when the boundary residues are ignored), and it is easily extendable to any new dataset without requiring any new training. The software is publicly available at .
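The two-stage idea in the abstract above (a k-nearest neighbor prediction per residue, followed by BCJR smoothing along the chain) can be illustrated with a minimal sketch. This is not the SUCCES implementation: the feature vectors, the Euclidean distance, the add-one smoothing, the transition matrix, and the function names `knn_probs` and `forward_backward` are all assumptions made for illustration. BCJR is shown in its basic forward-backward form, and the k-NN class frequencies are treated as emission scores, which is a simplification.

```python
import math
from collections import Counter

STATES = "HEC"  # helix, strand, coil (three-state alphabet assumed here)

def knn_probs(query, train, k=3):
    """Class probabilities for one residue from its k nearest training points.

    train is a list of (feature_vector, label) pairs; Euclidean distance
    is an assumption, not the distance used by the paper.
    """
    nearest = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    # Add-one smoothing so that no state receives probability zero.
    return [(counts[s] + 1) / (len(nearest) + len(STATES)) for s in STATES]

def forward_backward(emissions, trans, prior):
    """Posterior state probabilities per position (forward-backward / BCJR).

    emissions[t][j]: score of state j at position t (here: k-NN output).
    trans[i][j]:     probability of moving from state i to state j.
    prior[j]:        probability of state j at the first position.
    """
    n, m = len(emissions), len(prior)
    fwd = []
    for t in range(n):
        if t == 0:
            row = [prior[j] * emissions[0][j] for j in range(m)]
        else:
            row = [emissions[t][j] * sum(fwd[t - 1][i] * trans[i][j] for i in range(m))
                   for j in range(m)]
        z = sum(row)
        fwd.append([x / z for x in row])  # rescale to avoid underflow
    bwd = [[1.0] * m for _ in range(n)]
    for t in range(n - 2, -1, -1):
        row = [sum(trans[i][j] * emissions[t + 1][j] * bwd[t + 1][j] for j in range(m))
               for i in range(m)]
        z = sum(row)
        bwd[t] = [x / z for x in row]
    posteriors = []
    for f, b in zip(fwd, bwd):
        row = [x * y for x, y in zip(f, b)]
        z = sum(row)
        posteriors.append([x / z for x in row])
    return posteriors

def labels(rows):
    """Most probable state at each position."""
    return "".join(STATES[max(range(len(STATES)), key=lambda j: row[j])] for row in rows)
```

With a "sticky" transition matrix (a high probability of staying in the same state), an isolated residue whose raw k-NN vote disagrees with its neighbors tends to be pulled toward the surrounding state, which is the smoothing effect the abstract refers to.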

A fast and efficient nearest neighbor method for protein secondary structure prediction

2011 3rd International Conference on Advanced Computer Control, 2011
DOI: 10.1109/icacc.2011.6016402
Cited By ~ 1

Author(s): Wei Yang, Kuanquan Wang, Wangmeng Zuo

Keyword(s): Secondary Structure, Structure Prediction, Nearest Neighbor, Secondary Structure Prediction, Protein Secondary Structure, Protein Secondary Structure Prediction

Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization

Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2020
DOI: 10.1145/3388440.3414212

Author(s): Spencer Krieger, John Kececioglu

Keyword(s): Secondary Structure, Structure Prediction, Nearest Neighbor, Secondary Structure Prediction, Protein Secondary Structure, Nearest Neighbor Search, Protein Secondary Structure Prediction, Neighbor Search

Protein Secondary Structure Prediction Using Nearest-neighbor Methods

Journal of Molecular Biology, 1993, Vol 232 (4), pp. 1117-1129
DOI: 10.1006/jmbi.1993.1464
Cited By ~ 105

Author(s): Tau-Mu Yi, Eric S. Lander

Keyword(s): Secondary Structure, Structure Prediction, Nearest Neighbor, Secondary Structure Prediction, Protein Secondary Structure, Protein Secondary Structure Prediction

Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization

Bioinformatics, 2020, Vol 36 (Supplement_1), pp. i317-i325
DOI: 10.1093/bioinformatics/btaa336

Author(s): Spencer Krieger, John Kececioglu

Keyword(s): Secondary Structure, Structure Prediction, Nearest Neighbor, Secondary Structure Prediction, State Of The Art, Hybrid Approach, Protein Secondary Structure, Nearest Neighbor Search, Protein Secondary Structure Prediction, Neighbor Search

Abstract

Motivation: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy.

Method: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods.

Results: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2–10%, and Q3 accuracy by more than 1–3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction.

Availability and implementation: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu.
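The core template-based step described in the abstract (nearest neighbor search over fixed-length words, followed by dynamic programming for a valid maximum-likelihood labeling) might be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the Nnessy algorithm: Hamming distance stands in for the metric, the search is brute force, the minimum segment lengths in `MIN_RUN` are a hypothetical notion of "physically valid", and every function name is invented for this example.

```python
import math

STATES = "HEC"
# Hypothetical validity rule: each helix run has >= 3 residues, each strand run >= 2.
MIN_RUN = {"H": 3, "E": 2, "C": 1}

def hamming(a, b):
    """Number of mismatching positions; an assumed stand-in for the paper's metric."""
    return sum(x != y for x, y in zip(a, b))

def word_probs(word, templates, k=2):
    """Class-membership estimates for the central residue of `word`.

    templates is a list of (word, label_of_central_residue) pairs.
    """
    nearest = sorted(templates, key=lambda t: hamming(word, t[0]))[:k]
    return {s: (sum(lab == s for _, lab in nearest) + 1) / (len(nearest) + len(STATES))
            for s in STATES}

def best_valid_labeling(probs):
    """Maximum-likelihood labeling in which every run respects MIN_RUN."""
    n = len(probs)
    # Prefix sums of log-probabilities, so a run's score is one subtraction.
    pre = {s: [0.0] for s in STATES}
    for row in probs:
        for s in STATES:
            pre[s].append(pre[s][-1] + math.log(row[s]))
    neg_inf = float("-inf")
    # best[i][s]: best score for the first i residues when the last run is in state s.
    best = [{s: neg_inf for s in STATES} for _ in range(n + 1)]
    back = [{s: None for s in STATES} for _ in range(n + 1)]
    for i in range(1, n + 1):
        for s in STATES:
            for j in range(0, i - MIN_RUN[s] + 1):  # last run covers residues j..i-1
                run = pre[s][i] - pre[s][j]
                if j == 0:
                    cand, prev = run, None
                else:
                    prev = max((t for t in STATES if t != s), key=lambda t: best[j][t])
                    cand = best[j][prev] + run
                if cand > best[i][s]:
                    best[i][s], back[i][s] = cand, (j, prev)
    # Trace the runs back from the best final state.
    s, i, runs = max(STATES, key=lambda t: best[n][t]), n, []
    while i > 0:
        j, prev = back[i][s]
        runs.append(s * (i - j))
        i, s = j, prev
    return "".join(reversed(runs))

def predict(seq, templates, w=3, k=2):
    """Slide a length-w window over seq (padded with '-') and label every residue."""
    pad = "-" * (w // 2)
    padded = pad + seq + pad
    probs = [word_probs(padded[i:i + w], templates, k) for i in range(len(seq))]
    return best_valid_labeling(probs)
```

The dynamic program works over whole runs rather than single residues, so a minimum-length constraint can be enforced directly; a real system would also replace the brute-force scan with an index suited to metric-space search.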
