Combining Artificial Neural Networks and GOR-V Information Theory to Predict Protein Secondary Structure from Amino Acid Sequences

2005 ◽  
Vol 1 (4) ◽  
pp. 53-72
Author(s):  
Saad Osman Abdalla Subair ◽  
Safaai Deris

Protein secondary-structure prediction is a fundamental step in determining the 3D structure of a protein. In this chapter, a new method for predicting protein secondary structure from amino-acid sequences is proposed and implemented. The Cuff and Barton 513-protein data set is used to train and test the prediction methods under the same hardware, platforms, and environments. The newly developed method uses the knowledge of GOR-V information theory and the power of neural networks to classify a novel protein sequence into one of three secondary-structure classes (i.e., helices, strands, and coils). This method (NN-GORV-I) is further improved by applying a filtering mechanism to the searched database; the improved version is named NN-GORV-II. The developed prediction methods are rigorously analyzed and tested together with five other well-known prediction methods in this domain to allow easy comparison and clear conclusions.
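
As a rough illustration of the information-theory side of such a method, the sketch below scores each residue with single-residue information values summed over a 17-residue window, in the spirit of the original GOR formulation. It is not GOR-V and not the NN-GORV method of the chapter: it omits the pair statistics, the evolutionary information, and the neural-network stage. The function names, the pseudocount smoothing, and the toy training data are assumptions made for this example.

```python
import math

STATES = "HEC"      # helix, strand, coil (labels are assumed to use only these letters)
HALF_WINDOW = 8     # 17-residue window of the classic GOR formulation


def train_information_table(sequences, structures, pseudocount=1.0):
    """Estimate I(S; R at offset m) = log(P(S | R, m) / P(S)) from labeled chains."""
    state_counts = {s: pseudocount for s in STATES}
    pair_counts = {}      # (state, offset, residue) -> observed count
    context_counts = {}   # (offset, residue) -> observed count
    for seq, ss in zip(sequences, structures):
        for i, state in enumerate(ss):
            state_counts[state] += 1
            for m in range(-HALF_WINDOW, HALF_WINDOW + 1):
                j = i + m
                if 0 <= j < len(seq):
                    key = (state, m, seq[j])
                    pair_counts[key] = pair_counts.get(key, 0) + 1
                    ctx = (m, seq[j])
                    context_counts[ctx] = context_counts.get(ctx, 0) + 1
    total = sum(state_counts.values())

    def info(state, offset, residue):
        # Pseudocounts (an assumption of this sketch) keep the logarithm finite.
        joint = pair_counts.get((state, offset, residue), 0) + pseudocount
        context = context_counts.get((offset, residue), 0) + pseudocount * len(STATES)
        return math.log((joint / context) / (state_counts[state] / total))

    return info


def predict(seq, info):
    """Assign each residue the state with the largest summed window information."""
    labels = []
    for i in range(len(seq)):
        scores = {
            s: sum(
                info(s, m, seq[i + m])
                for m in range(-HALF_WINDOW, HALF_WINDOW + 1)
                if 0 <= i + m < len(seq)
            )
            for s in STATES
        }
        labels.append(max(STATES, key=scores.get))
    return "".join(labels)


# Toy data, invented for illustration only.
info = train_information_table(["AAAAAAAAAAGGGGGGGGGG"], ["HHHHHHHHHHCCCCCCCCCC"])
prediction = predict("AAAAGGGG", info)
```

In a combined scheme, per-residue scores of this kind could serve as one input alongside a neural network's outputs; how NN-GORV-I actually combines the two is described in the chapter itself, not here.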


Author(s):  
BO YANG ◽  
XIAOHONG SU ◽  
YADONG WANG

Learning with very large-scale datasets is always necessary when handling real problems using artificial neural networks. However, how to balance computing efficiency and learning stability remains an open question, because traditional neural networks consume a large amount of running time and memory on large-scale learning datasets. In this paper, we report the first evaluation of neural network distributed-learning strategies in large-scale classification of protein secondary structure. Our accomplishments include: (1) an architecture analysis of distributed learning, (2) the development of a scalable distributed system for large-scale dataset classification, (3) the description of a novel distributed-learning strategy based on chips, (4) a theoretical analysis of structure-distributed and data-distributed learning strategies, and (5) an investigation and experimental evaluation of the chip-based distributed-learning strategy with respect to time complexity and its effect on the classification accuracy of artificial neural networks. We demonstrate that the novel distributed-learning strategy balances parallel computing efficiency and stability better than previous algorithms. The application to protein secondary structure prediction demonstrates that the method is feasible and effective in practice.
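
To make the term "data-distributed" concrete, the sketch below shows a generic synchronous data-parallel update: the dataset is split into shards, each worker computes a gradient on its own shard, and the size-weighted average of those gradients drives one shared weight update. This is a textbook illustration under assumed names (`shard_gradient`, `distributed_step`) with a linear model and squared error; it is not the chip-based strategy of the paper, whose details the abstract does not give.

```python
def shard_gradient(weights, shard):
    """Mean squared-error gradient of a linear model over one data shard."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for k, xi in enumerate(x):
            grad[k] += 2.0 * err * xi / len(shard)
    return grad


def distributed_step(weights, shards, lr=0.05):
    """One synchronous update from size-weighted averaged shard gradients."""
    # In a real system each call below would run on a separate worker.
    grads = [shard_gradient(weights, shard) for shard in shards]
    total = sum(len(shard) for shard in shards)
    averaged = [
        sum(len(shard) * g[k] for shard, g in zip(shards, grads)) / total
        for k in range(len(weights))
    ]
    return [w - lr * a for w, a in zip(weights, averaged)]


# Toy regression data for y = 2x, invented for illustration only.
data = [((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0), ((4.0,), 8.0)]
shards = [data[:1], data[1:]]   # deliberately unequal shard sizes

weights = [0.0]
for _ in range(50):
    weights = distributed_step(weights, shards)
```

Weighting each shard's gradient by its size makes the averaged gradient equal to the gradient over the full dataset, so the update matches single-machine training while the per-shard work can proceed in parallel.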

