Sars-Cov-2 Spike protein function prediction using a convolutional neural network ensemble

Design Engineering, 2021, pp. 7831-7845

DOI: 10.17762/de.vi.4293

Author(s): Raghad Monther Eid, Eman K. Elsayed, Fatma T. Ghanam

Keyword(s): Neural Network, Amino Acid, Protein Function, Protein Function Prediction, Small Error, Amino Acid Sequences, Spike Protein, Neural Network Ensemble, Classification Problems, Past Experiences

Introduction: SARS-CoV-2 has caused a worldwide pandemic that affects all aspects of life; therefore, numerous organizations and open research foundations are focusing their efforts on finding viable therapeutics. Given past experience with SARS, the primary focus has been the Spike protein, considered the ideal target for COVID-19 immunotherapies. Most of the vaccines being developed target the Spike protein because it covers the virus and helps it invade human cells. Methods: The application of deep neural networks is a quickly expanding field that now reaches many areas, including proteomics. Results: Specifically, convolutional neural networks have been used to identify the functional role of amino acid sequences because of their ability to give nearly accurate results for multi-label classification problems. Here we present a modified convolutional deep learning model that identifies whether a given amino acid sequence is a Spike protein, based on the length of the sequence and the function of the protein, with a short execution time and a relatively small error rate. Conclusion: CNNs are an efficient tool for supervised multi-label classification problems.


Convolutional neural networks with image representation of amino acid sequences for protein function prediction

Computational Biology and Chemistry, 2021, Vol 92, pp. 107494

DOI: 10.1016/j.compbiolchem.2021.107494

Author(s): Samia Tasnim Sara, Md Mehedi Hasan, Ahsan Ahmad, Swakkhar Shatabda

Keyword(s): Neural Networks, Amino Acid, Convolutional Neural Networks, Protein Function, Protein Function Prediction, Image Representation, Function Prediction, Amino Acid Sequences


Deep_CNN_LSTM_GO: Protein function prediction from amino-acid sequences

Computational Biology and Chemistry, 2021, Vol 95, pp. 107584

DOI: 10.1016/j.compbiolchem.2021.107584

Author(s): Mohamed E.M. Elhaj-Abdou, Hassan El-Dib, Amr El-Helw, Mohamed El-Habrouk

Keyword(s): Amino Acid, Protein Function, Protein Function Prediction, Function Prediction, Amino Acid Sequences


ProteInfer: deep networks for protein functional inference

2021

DOI: 10.1101/2021.09.20.461077

Author(s): Theo Sanderson, Maxwell L Bileschi, David Belanger, Lucy Colwell

Keyword(s): Amino Acid, Amino Acid Sequence, Protein Function, Protein Function Prediction, Query Sequence, Functional Space, Amino Acid Sequences, Deep Convolutional Neural Networks, Software Interfaces, Downstream Analysis

Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we instead employ deep convolutional neural networks to predict a variety of protein functions (EC numbers and GO terms) directly from an unaligned amino acid sequence. This approach provides precise predictions that complement alignment-based methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces. We demonstrate this with an in-browser graphical interface for protein function prediction in which all computation is performed on the user's personal computer, with no data uploaded to remote servers. Moreover, these models place full-length amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read the interactive version of this paper, visit https://google-research.github.io/proteinfer/
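
The idea of feeding an unaligned sequence to a convolutional network can be illustrated with a minimal sketch. This is not the ProteInfer architecture: the filter count, filter width, random weights, and helper names (`one_hot`, `conv1d_max_pool`, `random_filters`) are all assumptions made for illustration. It shows only how one-hot encoding followed by a 1D convolution and max pooling over positions yields a fixed-length vector for sequences of any length, with no alignment step.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a sequence as a list of 20-dimensional one-hot rows."""
    rows = []
    for aa in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:  # unknown residues (e.g. 'X') stay all-zero
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def conv1d_max_pool(encoded, filters):
    """Slide each filter along the sequence, apply ReLU, max-pool over positions.

    filters: list of filters, each a list of `width` rows of 20 weights.
    Returns one pooled activation per filter, whatever the sequence length.
    """
    pooled = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU floor: negative activations are clipped to zero
        for start in range(len(encoded) - width + 1):
            window = encoded[start:start + width]
            act = sum(w * x
                      for frow, xrow in zip(filt, window)
                      for w, x in zip(frow, xrow))
            best = max(best, act)
        pooled.append(best)
    return pooled


def random_filters(n_filters, width, seed=0):
    """Hypothetical untrained filters; a real model would learn these weights."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(width)]
            for _ in range(n_filters)]


filters = random_filters(n_filters=4, width=3)
# Two arbitrary example strings of different lengths.
short = conv1d_max_pool(one_hot("MFVFLVLLPLVSSQ"), filters)
longer = conv1d_max_pool(one_hot("MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNS"), filters)
print(len(short), len(longer))  # both vectors have one entry per filter
```

In a trained model the pooled vector would feed a classification layer that scores each functional label; that part is omitted here.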


Using Deep Learning to Annotate the Protein Universe

2019, Cited By ~ 9

DOI: 10.1101/626507

Author(s): Maxwell L. Bileschi, David Belanger, Drew Bryant, Theo Sanderson, Brandon Carter, ...

Keyword(s): Deep Learning, Amino Acid, Protein Function, Protein Function Prediction, Structural Disorder, Amino Acid Sequences, Computationally Efficient, Learning Sequence, Convolutional Networks, The Relationship

Understanding the relationship between amino acid sequence and protein function is a long-standing problem in molecular biology with far-reaching scientific implications. Despite six decades of progress, state-of-the-art techniques cannot annotate 1/3 of microbial protein sequences, hampering our ability to exploit sequences collected from diverse organisms. In this paper, we explore an alternative methodology based on deep learning that learns the relationship between unaligned amino acid sequences and their functional annotations across all 17929 families of the Pfam database. Using the Pfam seed sequences we establish rigorous benchmark assessments that use both random and clustered data splits to control for potentially confounding sequence similarities between train and test sequences. Using Pfam full, we report convolutional networks that are significantly more accurate and computationally efficient than BLASTp, while learning sequence features such as structural disorder and transmembrane helices. Our model co-locates sequences from unseen families in embedding space, allowing sequences from novel families to be accurately annotated. These results suggest deep learning models will be a core component of future protein function prediction tools.
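
The difference between a random split and a clustered split can be sketched as follows. This is an illustration under stated assumptions, not the benchmark procedure from the paper: the similarity measure (Jaccard overlap of 3-mers), the greedy clustering, the 0.5 threshold, and the function names are all hypothetical stand-ins for an alignment-based clustering. The point is that whole clusters go to either train or test, so no held-out sequence has a close relative in the training set.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_clusters(seqs, threshold=0.5, k=3):
    """Put each sequence in the first cluster whose representative is similar enough."""
    reps = []      # k-mer sets of the cluster representatives
    clusters = []  # lists of sequence indices
    for idx, seq in enumerate(seqs):
        ks = kmers(seq, k)
        for c, rep in enumerate(reps):
            if jaccard(ks, rep) >= threshold:
                clusters[c].append(idx)
                break
        else:  # no existing cluster matched: start a new one
            reps.append(ks)
            clusters.append([idx])
    return clusters


def clustered_split(seqs, test_fraction=0.4, threshold=0.5):
    """Hold out whole clusters, smallest first, without exceeding the test budget."""
    clusters = greedy_clusters(seqs, threshold)
    target = test_fraction * len(seqs)
    train, test = [], []
    for members in sorted(clusters, key=len):
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return train, test


# Made-up toy sequences: two near-duplicate pairs and one singleton.
seqs = ["MKTAYIAKQR", "GSHMLEDPVV", "MKTAYIAKQK", "PPPPGGGGWW", "GSHMLEDPVA"]
clusters = greedy_clusters(seqs)
train, test = clustered_split(seqs)
print(clusters, train, test)
```

A random split of the same data could place `MKTAYIAKQR` in train and `MKTAYIAKQK` in test, which would let a model score well by memorising near-duplicates.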


Prediction of protein function using a deep convolutional neural network ensemble

PeerJ Computer Science, 2017, Vol 3, pp. e124, Cited By ~ 12

DOI: 10.7717/peerj-cs.124

Author(s): Evangelia I. Zacharaki

Keyword(s): Neural Network, Amino Acid, Convolutional Neural Network, Protein Function, Protein Function Prediction, Function Prediction, Deep Convolutional Neural Network, Supervised Machine Learning, Support Vector, Homologous Proteins

Background: The availability of large databases containing high-resolution three-dimensional (3D) models of proteins in conjunction with functional annotation allows the exploitation of advanced supervised machine learning techniques for automatic protein function prediction. Methods: In this work, novel shape features are extracted representing protein structure in the form of local (per amino acid) distribution of angles and amino acid distances, respectively. Each of the multi-channel feature maps is introduced into a deep convolutional neural network (CNN) for function prediction and the outputs are fused through support vector machines or a correlation-based k-nearest neighbor classifier. Two different architectures are investigated employing either one CNN per multi-channel feature set, or one CNN per image channel. Results: Cross validation experiments on single-functional enzymes (n = 44,661) from the PDB database achieved 90.1% correct classification, demonstrating an improvement over previous results on the same dataset when sequence similarity was not considered. Discussion: The automatic prediction of protein function can provide quick annotations on extensive datasets, opening the path for relevant applications such as pharmacological target identification. The proposed method shows promise for structure-based protein function prediction, but sufficient data may not yet be available to properly assess the method’s performance on non-homologous proteins and thus reduce the confounding factor of evolutionary relationships.
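
One possible reading of a "local distribution of amino acid distances" is a per-residue histogram of distances to every other residue. The sketch below follows that reading only; the abstract does not specify the binning, the atoms used, or the exact feature layout, so the bin edges, the toy coordinates, and the function name `distance_histograms` are assumptions, and the angle features are not shown.

```python
import math


def distance_histograms(coords, bin_edges):
    """For each residue, count its distances to all other residues per bin.

    coords: one (x, y, z) point per residue (e.g. an alpha-carbon position).
    bin_edges: ascending edges; bin k covers [bin_edges[k], bin_edges[k + 1]).
    Returns one histogram (list of counts) per residue.
    """
    n_bins = len(bin_edges) - 1
    features = []
    for i, a in enumerate(coords):
        hist = [0] * n_bins
        for j, b in enumerate(coords):
            if i == j:
                continue  # skip the residue's distance to itself
            d = math.dist(a, b)
            for k in range(n_bins):
                if bin_edges[k] <= d < bin_edges[k + 1]:
                    hist[k] += 1
                    break  # distances beyond the last edge are ignored
        features.append(hist)
    return features


# Made-up coordinates (in angstroms) for a four-residue toy chain.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
bins = [0.0, 4.0, 8.0, 12.0]
features = distance_histograms(coords, bins)
print(features)
```

Stacking such histograms across residues gives a 2D array that can be treated as one image channel of a CNN input, which is the general shape of input the abstract describes.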


ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network

Molecules, 2017, Vol 22 (10), pp. 1732, Cited By ~ 84

DOI: 10.3390/molecules22101732

Author(s): Renzhi Cao, Colton Freitas, Leong Chan, Miao Sun, Haiqing Jiang, ...

Keyword(s): Neural Network, Machine Translation, Recurrent Neural Network, Protein Function, Protein Function Prediction, Function Prediction, Neural Machine Translation


A Deep Neural Network Based Hierarchical Multi-Label Classifier for Protein Function Prediction

2019 International Conference on Computer, Information and Telecommunication Systems (CITS), 2019

DOI: 10.1109/cits.2019.8862034

Author(s): Xin Yuan, Weite Li, Kui Lin, Jinglu Hu

Keyword(s): Neural Network, Protein Function, Deep Neural Network, Protein Function Prediction, Function Prediction


ENSEMBLES OF NEURAL NETWORKS BASED ON THE ALTERATION OF INPUT FEATURE VALUES

International Journal of Neural Systems, 2012, Vol 22 (01), pp. 77-87, Cited By ~ 22

DOI: 10.1142/s0129065712003079

Author(s): M. A. H. AKHAND, K. MURASE

Keyword(s): Neural Network, Neural Networks, Experimental Investigation, Pattern Generation, Neural Network Ensemble, Classification Problems, Input Feature, A New Technique, Feature Values, Better Than

An ensemble performs well when the component classifiers are diverse yet accurate, so that the failure of one is compensated for by the others. A number of methods have been investigated for constructing ensembles, some of which train classifiers with generated patterns. This study investigates a new technique for generating training patterns. The method alters the input feature values of some patterns using the values of other patterns, generating different patterns for different classifiers. The effectiveness of neural network ensembles based on the proposed technique was evaluated using a suite of 25 benchmark classification problems and was found to achieve performance better than or competitive with related conventional methods. Experimental investigation of different input value alteration techniques finds that alteration with pattern values from the same class is better for generalization, although other alteration techniques may offer more diversity.
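
The same-class alteration described above can be sketched in a few lines. The abstract does not give the selection rule or the proportion of values altered, so the `alter_rate` parameter, the uniform choice of donor pattern, and the function name `alter_patterns` are assumptions for illustration rather than the authors' procedure.

```python
import random


def alter_patterns(X, y, alter_rate=0.3, seed=0):
    """Return a copy of X with some feature values swapped in from same-class patterns.

    Each feature value is replaced, with probability `alter_rate`, by the value
    of the same feature in another randomly chosen pattern of the same class.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)

    altered = [list(row) for row in X]  # leave the original data untouched
    for i, row in enumerate(altered):
        donors = [j for j in by_class[y[i]] if j != i]
        if not donors:
            continue  # a class with a single pattern cannot be altered
        for f in range(len(row)):
            if rng.random() < alter_rate:
                row[f] = X[rng.choice(donors)][f]
    return altered


# Made-up two-class toy data: three patterns per class, two features each.
X = [[0.1, 1.0], [0.2, 1.1], [0.3, 0.9], [5.0, 7.0], [5.2, 7.1], [4.9, 6.8]]
y = [0, 0, 0, 1, 1, 1]

# One differently altered training set per ensemble member.
training_sets = [alter_patterns(X, y, seed=s) for s in range(3)]
```

Each ensemble member would then be trained on its own altered set and the members' predictions combined, for example by majority vote; drawing donor values from the same class keeps every generated pattern consistent with its label.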


Prediction of protein function using a deep convolutional neural network ensemble

2017

DOI: 10.7287/peerj.preprints.2778

Author(s): Evangelia I Zacharaki

Keyword(s): Neural Network, Amino Acid, Convolutional Neural Network, Protein Function, Protein Structures, Function Prediction, Deep Convolutional Neural Network, Supervised Machine Learning, Support Vector, Feature Maps

Background. The availability of large databases containing high resolution three-dimensional (3D) models of proteins in conjunction with functional annotation allows the exploitation of advanced supervised machine learning techniques for automatic protein function prediction. Methods. In this work, novel shape features are extracted representing protein structure in the form of local (per amino acid) distribution of angles and amino acid distances, respectively. Each of the multi-channel feature maps is introduced into a deep convolutional neural network (CNN) for function prediction and the outputs are fused through Support Vector Machines (SVM) or a correlation-based k-nearest neighbor classifier. Two different architectures are investigated employing either one CNN per multi-channel feature set, or one CNN per image channel. Results. Cross validation experiments on enzymes (n = 44,661) from the PDB database achieved 90.1% correct classification demonstrating the effectiveness of the proposed method for automatic function annotation of protein structures. Discussion. The automatic prediction of protein function can provide quick annotations on extensive datasets opening the path for relevant applications, such as pharmacological target identification.


In Silico Pharmaco-Gene-Informatic Identification of Insulin-Like Proteins in Plants

Bioinformatics, 2013, pp. 948-963, Cited By ~ 1

DOI: 10.4018/978-1-4666-3604-0.ch051

Author(s): Koona Saradha Jyothi, G. R. Sridhar, Kudipudi Srinivas, B. Subba Rao, Allam Apparao

Keyword(s): Amino Acid, Human Insulin, Vigna Unguiculata, Protein Function, In Silico, Cell Envelope, Amino Acid Sequences, Jack Bean, Canavalia Ensiformis, Bauhinia Purpurea

This chapter presents an extension of the authors’ earlier work, where they showed that nucleotide/amino acid sequences related to insulin occurred in the plant kingdom. It was believed that plants neither had nor needed insulin, a protein hormone considered to be restricted to the animal kingdom. In the current study, the human insulin sequence was initially obtained from UniProt/SwissProt (accession no. P01308). Plant genome sequences were obtained from NCBI PubMed (Bauhinia purpurea [Gi|229412], Vigna unguiculata [P83770], and Canavalia ensiformis [Gi|7438602]). Scores were obtained from ProtFun 2.2 [http://www.cbs.dtu.dk/services/ProtFun/]. At the next stage, functions of insulin and glucokinin (insulin-like proteins in plants) were predicted by the Protein Function Prediction database (http://dragon.bio.purdue.edu/pfp/index.html), followed by functional site prediction from the ELM database (http://elm.eu.org/). ProtFun predicted the following functions: human insulin (Cell envelope), Jack bean (Energy metabolism), Bauhinia purpurea (Translation). The amino acid Glycine at 32 positions was most highly conserved. Present predictions advocate the use of these sequences (QHLCGS motif) as targets for probing the other plants with lesser homology. In summary, our in silico studies have suggested that Bauhinia purpurea (Purple orchid tree-BP), Vigna unguiculata (Cow pea-CP) and Canavalia ensiformis (Jack bean-JB) have conserved the important regions of the human insulin protein.
