Assigning Secondary Structure in Proteins using AI

2021 ◽  
Author(s):  
Jisna Vellara Antony ◽  
Prayagh Madhu ◽  
Jayaraj Pottekkattuvalappil Balakrishnan

Abstract. Knowledge about protein structure assignment enriches the structural and functional understanding of proteins. Accurate and reliable structure assignment data is crucial for secondary structure prediction systems. Since the 1980s, various methods based on hydrogen bond analysis and atomic coordinate geometry, followed by machine learning, have been employed in protein structure assignment. However, the assignment process becomes challenging when atoms are missing from protein files. We develop a multi-class classifier named DLFSA that assigns protein Secondary Structure Elements (SSE) using Convolutional Neural Networks (CNN). A fast and efficient GPU-based parallel procedure extracts fragments from protein files. The model is trained on a subset of protein fragments and achieves 88.1% training accuracy and 82.5% test accuracy. It uses only Cα coordinates for secondary structure assignment and has also been tested successfully on a few full-length proteins. Results from the fragment-based studies demonstrate the feasibility of applying deep learning to structure assignment problems.


2019 ◽  
Author(s):  
Larry Bliss ◽  
Ben Pascoe ◽  
Samuel K Sheppard

Abstract. Motivation: Protein structure prediction, which combines theoretical chemistry and bioinformatics, is an increasingly important technique in biotechnology and biomedical research, for example in the design of novel enzymes and drugs. Here, we present a new ensemble bi-layered machine learning architecture that builds directly on ten existing pipelines, providing rapid, high-accuracy, 3-state secondary structure prediction of proteins. Results: After training on 1348 solved protein structures, we evaluated the model with four independent datasets: JPRED4, compiled by the authors of the successful predictor of the same name, and CASP11, CASP12 and CASP13, assembled by the Critical Assessment of protein Structure Prediction consortium, which runs biennial experiments focused on objective testing of predictors. These rigorous, pre-established protocols included 7-fold cross-validation and blind testing. This led to a mean Hermes accuracy of 95.5%, significantly (p<0.05) better than the ten previously published models analysed in this paper. Furthermore, Hermes yielded a reduction in standard deviation, fewer lower-boundary outliers, and reduced dependency on solved structures of homologous proteins, as measured by NEFF score. This architecture provides advantages over other pipelines while remaining accessible to users at any level of bioinformatics experience. Availability and Implementation: The source code for Hermes is freely available at https://github.com/HermesPrediction/Hermes. This page also includes the cross-validation with corresponding models, and all training/testing data presented in this study with predictions and accuracy.


2012 ◽  
Author(s):  
Satya Nanda Vel Arjunan ◽  
Safaai Deris ◽  
Rosli Md Illias

In the wake of large-scale DNA sequencing projects, accurate tools are needed to predict protein structures. The problem of predicting protein structure from DNA sequence remains fundamentally unsolved even after more than three decades of intensive research. In this paper, the fundamental theory of protein structure will be presented as a general guide to protein secondary structure prediction research. An overview of the state of the art in sequence analysis and some principles of the methods involved will be described. Key words: protein secondary structure prediction; neural networks.


2019 ◽  
Author(s):  
Jie Hou

Protein structure prediction is one of the most important scientific problems in the field of bioinformatics and computational biology. The availability of protein three-dimensional (3D) structure is crucial for studying biological and cellular functions of proteins. The importance of four major sub-problems in protein structure prediction has been clearly recognized: first, protein secondary structure prediction; second, protein fold recognition; third, protein quality assessment; and fourth, multi-domain assembly. In recent years, deep learning has proved to be a highly effective machine learning method, which has brought revolutionary advances in computer vision, speech recognition and bioinformatics. In this dissertation, five contributions are described. First, DNSS2, a method for protein secondary structure prediction using a one-dimensional deep convolutional network. Second, DeepSF, a method that applies a deep convolutional network to classify a protein sequence into one of thousands of known folds. Third, CNNQA and DeepRank, two deep neural network approaches to systematically evaluate the quality of predicted protein structures and select the most accurate model as the final protein structure prediction. Fourth, MULTICOM, a protein structure prediction system empowered by deep learning and protein contact prediction. Finally, SAXSDOM, a data-assisted method for protein domain assembly using small-angle X-ray scattering data. All the methods are freely available to the scientific community as software tools or web servers.


Author(s):  
Fawaz H. H. Mahyoub ◽  
Rosni Abdullah

The prediction of protein secondary structure from a protein sequence provides useful information for predicting the three-dimensional structure and function of the protein. In recent decades, protein secondary structure prediction systems have improved, benefiting from advances in computational techniques as well as the growth and increased availability of solved protein structures in protein data banks. Existing methods for predicting the secondary structure of proteins can be roughly subdivided into statistical, nearest-neighbor, machine learning, meta-predictor, and deep learning approaches. This chapter provides an overview of these computational approaches, focusing on deep learning techniques and highlighting key aspects of each approach.


Understanding intermediate protein structure prediction is a crucial step in determining the function of amino acid residues. This paper focuses on intermediate protein structure, using feed-forward and feedback methods and an enhanced sliding-window concept. Secondary structure prediction is a major problem in bioinformatics, and progress in predicting how proteins fold could bring substantial benefits to the medical sciences. Our main aim is to improve the accuracy of secondary structure prediction and minimize the error. Experimentally, we use a multilayer ADALINE network for learning, Keras with TensorFlow to train the weight matrix, and a sigmoid function with backpropagation to compute the output. The reported results improve on existing methods in secondary structure prediction accuracy.


2010 ◽  
Vol 4 (1) ◽  
pp. 17-30 ◽  
Author(s):  
Leong Lee ◽  
Jennifer L. Leopold ◽  
Cyriac Kandoth ◽  
Ronald L. Frank

Protein structure prediction has always been an important research area in biochemistry. In particular, the prediction of protein secondary structure has been a well-studied research topic. The experimental methods currently used to determine protein structure are accurate, yet costly both in terms of equipment and time. Despite the recent breakthrough of combining multiple sequence alignment information and artificial intelligence algorithms to predict protein secondary structure, the Q3 accuracy of various computational prediction methods rarely has exceeded 75%. In this paper, a newly developed rule-based data-mining approach called RT-RICO (Relaxed Threshold Rule Induction from Coverings) is presented. This method identifies dependencies between amino acids in a protein sequence and generates rules that can be used to predict secondary structure. RT-RICO achieved a Q3 score of 81.75% on the standard test dataset RS126 and a Q3 score of 79.19% on the standard test dataset CB396, an improvement over comparable computational methods.
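For readers unfamiliar with the Q3 score quoted above, the following is a minimal sketch of how it is usually computed: the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed one. The function name `q3_accuracy` and the two example strings are illustrative assumptions and are not taken from RT-RICO or its test datasets.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the 3-state labels agree.

    Both arguments are per-residue label strings over H (helix),
    E (strand) and C (coil). Hypothetical helper for illustration only.
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up labels for a 16-residue fragment (not real data).
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEEHCC"
score = q3_accuracy(predicted, observed)
print(f"Q3 = {score:.2f}%")
```

Two of the sixteen positions differ in this toy example, so the sketch reports a Q3 of 87.50%.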


2021 ◽  
Author(s):  
Shutong Yang ◽  
Yuhong Wang ◽  
Kennie Cruz-Gutierrez ◽  
Fangling Wu ◽  
Chuan-Fan Ding

Abstract. Background: Protein secondary structure prediction (PSSP) is important for protein structure modeling and design. Over the past few years, deep learning models have shown promising results for PSSP. However, the current best performers often require evolutionary information such as multiple sequence alignments, and even real protein structures (templates), entire protein sequences, and amino acid property profiles. Results: In this study, we used a fixed-size window of adjacent residues and only amino acid sequences, without any evolutionary information, as inputs, and developed a very simple yet accurate RNN model, LocalNet. The accuracy for three states of secondary structure is as high as 85.15%, indicating that the local amino acid sequence itself contains enough information for PSSP, a well-known classical view. Compared with other predictors, we also achieve state-of-the-art accuracy on the CASP11, CASP12 and CASP13 datasets. Conclusion: The well-trained models are expected to have good applications in protein structure modeling and protein design. The model can be downloaded from https://github.com/lake-chao/protein-secondary-structure-prediction.
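The fixed-size window input described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not LocalNet's actual preprocessing: the window size, the padding symbol, the one-hot layout, and the helper names `one_hot`, `windows` and `encode` are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # assumed padding symbol for positions beyond the chain ends


def one_hot(residue: str) -> list[int]:
    """Encode one residue as a 21-slot vector; the last slot is padding/unknown."""
    vec = [0] * (len(AMINO_ACIDS) + 1)
    idx = AMINO_ACIDS.find(residue)
    vec[idx if idx >= 0 else len(AMINO_ACIDS)] = 1
    return vec


def windows(sequence: str, size: int) -> list[str]:
    """Return one window of `size` residues centred on each position."""
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = size // 2
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + size] for i in range(len(sequence))]


def encode(sequence: str, size: int) -> list[list[list[int]]]:
    """One-hot encode every window: shape (len(sequence), size, 21)."""
    return [[one_hot(r) for r in w] for w in windows(sequence, size)]


# Toy example with a made-up three-residue sequence and a window of 3.
example_windows = windows("MKV", 3)
example_encoded = encode("MKV", 3)
print(example_windows)
```

Each window would be paired with the secondary structure label of its central residue during training, so a model sees only local sequence context and no evolutionary profile.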

