Testing Machine Learning Techniques for General Application by using Protein Secondary Structure Prediction. A Brief Survey with Studies of Pitfalls and Benefits Using a Simple Progressive Learning Approach

Motivation: Protein structure prediction, which combines theoretical chemistry and bioinformatics, is an increasingly important technique in biotechnology and biomedical research, for example in the design of novel enzymes and drugs. Here, we present a new ensemble bi-layered machine learning architecture that builds directly on ten existing pipelines and provides rapid, high-accuracy, 3-state secondary structure prediction of proteins.

Results: After training on 1348 solved protein structures, we evaluated the model with four independent datasets: JPRED4, compiled by the authors of the successful predictor of the same name, and CASP11, CASP12 and CASP13, assembled by the Critical Assessment of protein Structure Prediction consortium, which runs biennial experiments focused on objective testing of predictors. These rigorous, pre-established protocols included 7-fold cross-validation and blind testing. Hermes achieved a mean accuracy of 95.5%, significantly (p<0.05) better than the ten previously published models analysed in this paper. Furthermore, Hermes yielded a lower standard deviation, fewer lower-boundary outliers, and reduced dependency on solved structures of homologous proteins, as measured by NEFF score. This architecture provides advantages over other pipelines while remaining accessible to users at any level of bioinformatics experience.

Availability and Implementation: The source code for Hermes is freely available at: https://github.com/HermesPrediction/Hermes. This page also includes the cross-validation with corresponding models, and all training/testing data presented in this study with predictions and accuracy.
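The abstract reports accuracy for 3-state prediction without defining the metric. A minimal sketch, assuming the usual per-residue Q3 measure (the fraction of residues whose H/E/C label is predicted correctly), is given below. The function name `q3_accuracy` and the toy sequences are hypothetical illustrations and are not taken from the Hermes code base.

```python
from statistics import mean, stdev


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state label (H, E or C) matches the reference."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and reference must have the same length")
    if not observed:
        raise ValueError("empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical toy data: (predicted, observed) label strings for two proteins.
toy_set = [
    ("CCHHHHHCCCEEEECCC", "CCHHHHHHCCEEEEECC"),
    ("CEEEECCHHHHC", "CEEEECCHHHHC"),
]

per_protein = [q3_accuracy(pred, obs) for pred, obs in toy_set]

# Summary statistics of the kind reported per test set (mean and spread).
print(f"mean Q3 = {mean(per_protein):.3f}, sd = {stdev(per_protein):.3f}")
```

Averaging per protein, as here, weights short and long chains equally; pooling all residues before dividing is an equally common convention and can give a slightly different figure.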

Machine Learning Techniques for Protein Secondary Structure Prediction: An Overview and Evaluation

Current Bioinformatics ◽

10.2174/157489308784340676 ◽

2008 ◽

Vol 3 (2) ◽

pp. 74-86 ◽

Cited By ~ 21

Author(s):

Paul Yoo ◽

Bing Zhou ◽

Albert Zomaya

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Protein Secondary Structure ◽

Machine Learning Techniques ◽

Learning Techniques

A multi-stage protein secondary structure prediction system using machine learning and information theory

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2015.7359867 ◽

2015 ◽

Cited By ~ 1

Author(s):

Masood Zamani ◽

Stefan C. Kremer

Keyword(s):

Machine Learning ◽

Information Theory ◽

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Prediction System ◽

Protein Secondary Structure Prediction ◽

Multi Stage

Application of machine learning to structural molecular biology

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.1994.0075 ◽

1994 ◽

Vol 344 (1310) ◽

pp. 365-371 ◽

Cited By ~ 21

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Molecular Biology ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Inductive Logic ◽

Protein Secondary Structure ◽

Quantitative Structure Activity Relationship ◽

Human Inspection

A technique of machine learning, inductive logic programming implemented in the program GOLEM, has been applied to three problems in structural molecular biology. These problems are: the prediction of protein secondary structure; the identification of rules governing the arrangement of β-sheet strands in the tertiary folding of proteins; and the modelling of a quantitative structure activity relationship (QSAR) of a series of drugs. For secondary structure prediction and the QSAR, GOLEM yielded predictions comparable with contemporary approaches including neural networks. Rules for β-strand arrangement are derived and it is planned to contrast their accuracy with those obtained by human inspection. In all three studies GOLEM discovered rules that provided insight into the stereochemistry of the system. We conclude that machine learning, used together with human intervention, will provide a powerful tool to discover patterns in biological sequences and structures.

Protein secondary structure prediction using machine learning

Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005. ◽

10.1109/ijcnn.2005.1555887 ◽

2006 ◽

Cited By ~ 2

Author(s):

BaiFang Zhang ◽

Zhihang Chen ◽

Yi Lu Murphey

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Protein Secondary Structure Prediction

Protein secondary structure prediction based on fusion of machine learning classifiers

Proceedings of the 36th Annual ACM Symposium on Applied Computing ◽

10.1145/3412841.3442067 ◽

2021 ◽

Author(s):

Gabriel Bianchin de Oliveira ◽

Helio Pedrini ◽

Zanoni Dias

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Protein Secondary Structure Prediction ◽

Machine Learning Classifiers ◽

Learning Classifiers

A secondary structure-based position-specific scoring matrix applied to the improvement in protein secondary structure prediction

PLoS ONE ◽

10.1371/journal.pone.0255076 ◽

2021 ◽

Vol 16 (7) ◽

pp. e0255076

Author(s):

Teng-Ruei Chen ◽

Sheng-Hung Juan ◽

Yu-Wei Huang ◽

Yen-Cheng Lin ◽

Wei-Cheng Lo

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Learning Algorithm ◽

Protein Secondary Structure ◽

Position Specific Scoring Matrix ◽

Protein Secondary Structure Prediction ◽

Scoring Matrix

Protein secondary structure prediction (SSP) has a variety of applications; however, accuracy has improved relatively little for years. With a view to advancing all related fields, we aimed to make a fundamental advance in SSP. Many admirable efforts have been made to improve the machine learning algorithms for SSP; this work instead took a step back and manipulated the input features. A secondary structure element-based position-specific scoring matrix (SSE-PSSM) is proposed, on which a new set of machine learning features can be built. The feasibility of this new PSSM was evaluated by rigorous independent tests in which the training and testing datasets shared <25% sequence identity. In all experiments, the proposed PSSM outperformed the traditional amino acid PSSM. The new PSSM can be easily combined with the amino acid PSSM, and the improvement in accuracy was remarkable. Preliminary tests combining the SSE-PSSM with well-known SSP methods showed 2.0% and 5.2% average improvements in three- and eight-state SSP accuracies, respectively. If this PSSM can be integrated into state-of-the-art SSP methods, the overall accuracy of SSP may break through its current ceiling and eventually benefit all research and applications in which secondary structure prediction plays a vital role. To facilitate the application and integration of the SSE-PSSM with modern SSP methods, we have established a web server and standalone programs for generating SSE-PSSM, available at http://10.life.nctu.edu.tw/SSE-PSSM.
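The abstract says the SSE-PSSM can be combined with the amino acid PSSM but does not say how. One simple possibility, sketched below, is to concatenate the two profiles residue by residue and then build sliding-window feature vectors, a common input encoding for SSP models. The column counts (20 and 3), the window size, the zero padding and all function names are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]  # one row of scores per residue


def combine_profiles(aa_pssm: Matrix, sse_pssm: Matrix) -> Matrix:
    """Concatenate two per-residue profiles column-wise."""
    if len(aa_pssm) != len(sse_pssm):
        raise ValueError("profiles must cover the same number of residues")
    return [aa_row + sse_row for aa_row, sse_row in zip(aa_pssm, sse_pssm)]


def window_features(profile: Matrix, half_width: int) -> Matrix:
    """Flatten a window of 2*half_width+1 rows around each residue.

    Positions beyond the chain ends are filled with zero rows.
    """
    width = len(profile[0])
    pad_row = [0.0] * width
    padded = [pad_row] * half_width + profile + [pad_row] * half_width
    size = 2 * half_width + 1
    return [
        [value for row in padded[i:i + size] for value in row]
        for i in range(len(profile))
    ]


# Hypothetical toy input: 4 residues, 20 amino acid columns, 3 SSE columns.
aa_pssm = [[float(i)] * 20 for i in range(1, 5)]
sse_pssm = [[0.1 * i] * 3 for i in range(1, 5)]

combined = combine_profiles(aa_pssm, sse_pssm)
features = window_features(combined, half_width=2)
print(len(features), len(features[0]))  # one vector per residue, 5 * 23 values each
```

Concatenation keeps both profiles intact and leaves it to the downstream model to weight them, which is why it is a convenient baseline when adding a new feature source.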

Ensemble of Template-Free and Template-Based Classifiers for Protein Secondary Structure Prediction

International Journal of Molecular Sciences ◽

10.3390/ijms222111449 ◽

2021 ◽

Vol 22 (21) ◽

pp. 11449

Author(s):

Gabriel Bianchin de Oliveira ◽

Helio Pedrini ◽

Zanoni Dias

Keyword(s):

Computational Methods ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Protein Secondary Structure ◽

Secondary Structures ◽

Machine Learning Techniques ◽

Laboratory Methods ◽

Protein Secondary Structures ◽

Learning Techniques ◽

Template Free

Protein secondary structures are important in many biological processes and applications. Owing to advances in sequencing methods, many proteins have been sequenced, but fewer have secondary structures determined by laboratory methods. With the development of computer technology, computational methods have started to become the most important methodologies for predicting secondary structures. Motivated by recent results obtained by computational methods in this task, we evaluated two different approaches to this problem: i) template-free classifiers, based on machine learning techniques; and ii) template-based classifiers, based on searching tools. Both approaches are formed by different sub-classifiers (six for template-free and two for template-based), each with a specific view of the protein. Our results show that these ensembles improve on the results of each approach used individually.
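The abstract does not state how the sub-classifier outputs are fused. As a minimal sketch of one common fusion rule, the example below averages per-residue class probabilities across classifiers and takes the most probable state. The state order "HEC", the function name `ensemble_predict` and the toy probabilities are all hypothetical and should not be read as the authors' method.

```python
from typing import List

STATES = "HEC"  # assumed order: helix, strand, coil

# One probability triple per residue, for a single classifier.
Probabilities = List[List[float]]


def ensemble_predict(classifier_outputs: List[Probabilities]) -> str:
    """Average class probabilities over classifiers, then pick the top state."""
    if not classifier_outputs:
        raise ValueError("at least one classifier output is required")
    n_residues = len(classifier_outputs[0])
    if any(len(out) != n_residues for out in classifier_outputs):
        raise ValueError("all classifiers must cover the same residues")
    labels = []
    for i in range(n_residues):
        averaged = [
            sum(out[i][k] for out in classifier_outputs) / len(classifier_outputs)
            for k in range(len(STATES))
        ]
        best = max(range(len(STATES)), key=averaged.__getitem__)
        labels.append(STATES[best])
    return "".join(labels)


# Hypothetical outputs of three sub-classifiers for a 3-residue chain.
clf_a = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.1, 0.2, 0.7]]
clf_b = [[0.6, 0.1, 0.3], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
clf_c = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]

print(ensemble_predict([clf_a, clf_b, clf_c]))  # prints "HEC"
```

In the second residue, one classifier favours helix while the averaged probabilities favour strand, which illustrates how an ensemble can overrule an individual sub-classifier.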
