Structural Learning of Proteins Using Graph Convolutional Neural Networks

The exponential growth of protein structure databases has motivated the development of efficient deep learning methods that perform structural analysis tasks at large scale, ranging from the classification of experimentally determined proteins to the quality assessment and ranking of computationally generated protein models in the context of protein structure prediction. Yet the literature discussing these methods does not usually interpret what the models learned during training or identify the specific data attributes that contribute to the classification or regression task. While 3D and 2D CNNs have been widely used to deal with structural data, they have several limitations when applied to structural proteomics data. We posit that graph-based convolutional neural networks (GCNNs) are an efficient alternative that also produces interpretable results. In this work, we demonstrate the applicability of GCNNs to protein structure classification problems. We define a novel spatial graph convolution network architecture which employs graph reduction methods to reduce the total number of trainable parameters and promote abstraction in intermediate representations. We show that GCNNs are able to learn effectively from simplistic graph representations of protein structures while providing the ability to interpret what the network learns during training and how it applies that knowledge to perform its task. GCNNs perform comparably to their 2D CNN counterparts in predictive performance but are outperformed by them in training speed. The graph-based data representation makes GCNNs a more efficient option than 3D CNNs when working with large-scale datasets, as preprocessing costs and data storage requirements are negligible in comparison.
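The abstract does not spell out the graph construction or the layer definition, so the following is only a minimal sketch of the general idea: a protein is reduced to a residue contact graph, and one spatial graph convolution averages each node's features with those of its neighbors before a linear map and ReLU. The names `contact_graph` and `graph_conv`, the 8 Å cutoff, and the mean aggregation rule are all assumptions chosen for illustration, not the authors' architecture.

```python
import math

def contact_graph(coords, cutoff=8.0):
    """Adjacency lists linking residues whose representative atoms
    lie within `cutoff` angstroms (the cutoff is an assumed value)."""
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def graph_conv(features, neighbors, weights):
    """One generic spatial graph convolution (not the paper's layer):
    average each node with its neighbors, apply a linear map, then ReLU.
    `weights` holds one row of input-dimension weights per output channel."""
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [features[i]] + [features[j] for j in nbrs]
        mean = [sum(col) / len(group) for col in zip(*group)]
        out.append([max(0.0, sum(m * w for m, w in zip(mean, row)))
                    for row in weights])
    return out

# Toy chain of four residues; the last one is too far away to form a contact.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
neighbors = contact_graph(coords)
hidden = graph_conv([[1.0], [2.0], [3.0], [4.0]], neighbors, [[1.0]])
print(neighbors, hidden)
```

Storing only coordinates and adjacency lists is what keeps preprocessing and storage small relative to voxel grids, which is the efficiency argument the abstract makes against 3D CNNs.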


Literature Survey of Protein Secondary Structure Prediction

Jurnal Teknologi ◽

10.11113/jt.v34.642 ◽

2012 ◽

Author(s):

Satya Nanda Vel Arjunan ◽

Safaai Deris ◽

Rosli Md Illias

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Structure Prediction ◽

Large Scale ◽

Secondary Structure Prediction ◽

Protein Structures ◽

Protein Secondary Structure ◽

Fundamental Theory ◽

Protein Secondary Structure Prediction ◽

General Guide

In the wake of large-scale DNA sequencing projects, accurate tools are needed to predict protein structures. The problem of predicting protein structure from DNA sequence remains fundamentally unsolved even after more than three decades of intensive research. In this paper, the fundamental theory of protein structure is presented as a general guide to protein secondary structure prediction research. An overview of the state of the art in sequence analysis and some principles of the methods involved are described. Key words: protein secondary structure prediction; neural networks.


Prediction of Protein Secondary Structure

Jurnal Teknologi ◽

10.11113/jt.v35.605 ◽

2012 ◽

Author(s):

Satya Nanda Vel Arjunan ◽

Safaai Deris ◽

Rosli Md Illias

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Structure Prediction ◽

Large Scale ◽

Secondary Structure Prediction ◽

State Of The Art ◽

Protein Structures ◽

Protein Secondary Structure ◽

Protein Secondary Structure Prediction ◽

General Guide

In the wake of large-scale DNA sequencing projects, accurate tools are needed to predict protein structures. The problem of predicting protein structure from DNA sequence remains fundamentally unsolved even after more than three decades of intensive research. In this paper, the fundamental theory of protein structure is presented as a general guide to protein secondary structure prediction research. An overview of the state of the art in sequence analysis and some principles of the methods involved are described. Key words: protein secondary structure prediction; neural networks.


Learning the local landscape of protein structures with convolutional neural networks

10.1101/2021.08.19.456994 ◽

2021 ◽

Author(s):

Anastasiya V Kulikova ◽

Daniel J Diaz ◽

James M Loy ◽

Andrew D Ellington ◽

Claus O Wilke

Keyword(s):

Neural Networks ◽

Protein Structure ◽

Amino Acid ◽

Protein Engineering ◽

Convolutional Neural Networks ◽

Fitness Landscape ◽

Fundamental Problem ◽

Protein Structures ◽

Wild Type ◽

Multiple Sequence

The fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding a site of interest. We find that the network can predict wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely going to be correct or not. Predictions of consensus are less accurate, and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mis-predictions of the wild type may identify sites that are primed for mutation and likely targets for protein engineering.


Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

BioMed Research International ◽

10.1155/2015/563674 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

Anuj Sharma ◽

Elias S. Manolakos

Keyword(s):

Protein Structure ◽

Large Scale ◽

Protein Structures ◽

Structural Proteomics ◽

Single Chip ◽

Structure Comparison ◽

Protein Structure Comparison ◽

Processor Architectures ◽

Comparison Algorithms ◽

Many Core

Fast-increasing computational demand for all-to-all protein structure comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structure comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel’s experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel’s Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
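The reported efficiency follows from the standard definition of parallel efficiency as speedup divided by the number of cores. The short check below assumes that definition is the one the authors use; the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Parallel efficiency under the usual definition: speedup / core count."""
    return speedup / cores

# Figures quoted in the abstract: speedup of 42 on the 48-core SCC.
efficiency = parallel_efficiency(42, 48)
print(efficiency)            # 0.875
print(round(efficiency, 1))  # 0.9, matching the rounded value in the abstract
```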


Estimating Protein Structure Prediction Models Quality Using Convolutional Neural Networks

2018 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2018.8489051 ◽

2018 ◽

Cited By ~ 3

Author(s):

Emerson Correia Lima ◽

Fabio Lima Custodio ◽

Gregorio Kappaun Rocha ◽

Laurent E. Dardenne

Keyword(s):

Neural Networks ◽

Protein Structure ◽

Protein Structure Prediction ◽

Convolutional Neural Networks ◽

Structure Prediction ◽

Prediction Models


Deep learning methods for protein prediction problem

10.32469/10355/65461 ◽

2017 ◽

Author(s):

Son Phong Nguyen

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Structure Prediction ◽

State Of The Art ◽

Protein Structures ◽

Distance Matrix ◽

Loop Modeling ◽

Deep Convolutional Neural Networks

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Computational protein structure prediction is very important for many applications in bioinformatics. Many prediction methods have been developed, including Modeller, HHpred, I-TASSER, Robetta, and MUFOLD. In the process of predicting protein structures, it is essential to accurately assess the quality of generated models. Consensus quality assessment (QA) methods, such as Pcons-net and MULTICOM-refine, which are based on structure similarity, performed well on QA tasks. The drawback of consensus QA methods is that they require a pool of diverse models to work well, which is not always available. More importantly, they cannot evaluate the quality of a single protein model, which is a very common task in protein predictions and other applications. Although many single-model quality assessment methods, such as ProQ2, MQAPmulti, OPUS-CA, DOPE, DFIRE, and RW, have been developed to address that problem, their accuracy is not good enough for most real applications. In this dissertation, based on the idea of using the Cα atom distance matrix and deep learning methods, two methods have been proposed for assessing the quality of protein structures. First, a novel algorithm based on deep learning techniques, called DL-Pro, is proposed. From training examples of distance matrices corresponding to good and bad models, DL-Pro learns a stacked autoencoder network as a classifier. In experiments on selected targets from the Critical Assessment of Structure Prediction (CASP) competition, DL-Pro obtained promising results, outperforming state-of-the-art energy/scoring functions, including OPUS-CA, DOPE, DFIRE, and RW. Second, a new method, DeepCon-QA, is developed to predict the quality of a single protein model. Based on the idea of using a protein vector representation and distance matrix, DeepCon-QA was able to achieve performance comparable with the best state-of-the-art QA method in our experiments. It also takes advantage of the strength of deep convolutional neural networks to “learn” and “understand” the input data to be able to predict output data precisely. On the other hand, this dissertation also proposes several new methods for solving the loop modeling problem. Five new loop modeling methods based on machine learning techniques, called NearLooper, ConLooper, ResLooper, HyLooper1 and HyLooper2, are proposed. NearLooper is based on the nearest neighbor technique; ConLooper applies deep convolutional neural networks to predict the Cα atom distance matrix as an orientation-independent representation of protein structure; ResLooper uses residual neural networks instead of deep convolutional neural networks; HyLooper1 combines the results of NearLooper and ConLooper, while HyLooper2 combines NearLooper and ResLooper. Three commonly used benchmarks for loop modeling are used to compare the performance of these methods with existing state-of-the-art methods. The experimental results show promising performance: our best method improves on existing state-of-the-art methods by 28% and 54% in average RMSD on two datasets while being comparable on the other one.
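The Cα distance matrix is called orientation-independent because pairwise distances do not change when the whole structure is rotated or translated. A minimal sketch of that property follows; the helper names `distance_matrix` and `rotate_z` and the toy coordinates are illustrative assumptions, not code from the dissertation.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rotate_z(coords, angle):
    """Rotate every point about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

# Toy C-alpha trace (made-up coordinates, in angstroms).
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (5.0, 3.6, 3.8)]

original = distance_matrix(trace)
rotated = distance_matrix(rotate_z(trace, math.radians(40.0)))

# The two matrices agree up to floating-point rounding.
same = all(
    math.isclose(p, q, abs_tol=1e-9)
    for row_p, row_q in zip(original, rotated)
    for p, q in zip(row_p, row_q)
)
print(same)
```

Because the matrix is a fixed-size 2D array per pair of residues, it can be fed to image-style networks such as the convolutional and residual models named in the abstract without first aligning the structure to a reference frame.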


Sequence Specific Dihedral Angle Distribution: Application in Protein Structure Prediction and Evaluation

Plant Tissue Culture and Biotechnology ◽

10.3329/ptcb.v19i2.5439 ◽

2009 ◽

Vol 19 (2) ◽

pp. 217-226

Author(s):

S. M. Minhaz Ud-Dean ◽

Mahdi Muhammad Moosa

Keyword(s):

Protein Structure ◽

Dihedral Angle ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Protein Structures ◽

Angle Distribution ◽

Ramachandran Plot ◽

Specific Data ◽

Specific Distribution ◽

Structure Evaluation

Protein structure prediction and evaluation is one of the major fields of computational biology. Estimation of dihedral angles can provide information about the acceptability of both theoretically predicted and experimentally determined structures. Here we report on the sequence-specific dihedral angle distribution of high-resolution protein structures available in the PDB and have developed Sasichandran, a tool for sequence-specific dihedral angle prediction and structure evaluation. This tool allows evaluation of a protein structure in PDB format from the sequence-specific distribution of Ramachandran angles. Additionally, it allows retrieval of the most probable Ramachandran angles for a given sequence along with the sequence-specific data. Key words: Torsion angle, φ-ψ distribution, sequence specific Ramachandran plot, Ramasekharan, protein structure appraisal. D.O.I. 10.3329/ptcb.v19i2.5439 Plant Tissue Cult. & Biotech. 19(2): 217-226, 2009 (December)
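For readers unfamiliar with the quantity being tabulated, a backbone dihedral angle is computed from four consecutive atoms: φ from C(i-1), N, Cα, C and ψ from N, Cα, C, N(i+1). The sketch below uses the standard atan2 formulation on the two plane normals; it is a generic illustration with a hypothetical function name `dihedral`, not the implementation used in the tool described above.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points,
    measured about the p1 -> p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Cis arrangement (first and last atoms on the same side of the bond): 0 degrees.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)))
```

Evaluating a structure against a sequence-specific distribution would then amount to computing (φ, ψ) per residue this way and looking up how often that pair occurs for the same local sequence in the reference set.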


Large-Scale E-Commerce Image Retrieval with Top-Weighted Convolutional Neural Networks

Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval - ICMR '16 ◽

10.1145/2911996.2912052 ◽

2016 ◽

Cited By ~ 2

Author(s):

Shichao Zhao ◽

Youjiang Xu ◽

Yahong Han

Keyword(s):

Neural Networks ◽

Image Retrieval ◽

Convolutional Neural Networks ◽

Large Scale


SketchGNN: Semantic Sketch Segmentation with Graph Neural Networks

ACM Transactions on Graphics ◽

10.1145/3450284 ◽

2021 ◽

Vol 40 (3) ◽

pp. 1-13

Author(s):

Lumin Yang ◽

Jiajie Zhuang ◽

Hongbo Fu ◽

Xiangzhi Wei ◽

Kun Zhou ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Network Architecture ◽

Large Scale ◽

State Of The Art ◽

Semantic Segmentation ◽

Structure Information ◽

Graph Neural Networks ◽

Node Labels ◽

Point Level

We introduce SketchGNN, a convolutional graph neural network for semantic segmentation and labeling of freehand vector sketches. We treat an input stroke-based sketch as a graph with nodes representing the sampled points along input strokes and edges encoding the stroke structure information. To predict the per-node labels, our SketchGNN uses graph convolution and a static-dynamic branching network architecture to extract the features at three levels, i.e., point-level, stroke-level, and sketch-level. SketchGNN significantly improves the accuracy of the state-of-the-art methods for semantic sketch segmentation (by 11.2% in the pixel-based metric and 18.2% in the component-based metric over a large-scale challenging SPG dataset) and has orders of magnitude fewer parameters than both image-based and sequence-based methods.
