DeepInterface: Protein-protein interface validation using 3D Convolutional Neural Networks

Abstract: Motivation. Protein–protein interactions are crucial in almost all biological processes. Proteins interact through their interfaces. It is important to determine how proteins interact through interfaces to understand protein binding mechanisms and to predict new protein-protein interactions. Results. We present DeepInterface, a deep learning based method which predicts, for a given protein complex, whether the interface between the proteins of the complex is a true interface or not. The model is a 3-dimensional convolutional neural network; the positive datasets are obtained from all complexes in the Protein Data Bank, and the negative datasets are the incorrect solutions among the docking decoys. The model analyzes a given interface structure and outputs the probability of the given structure being an interface. The accuracy of the model for several interface data sets, including PIFACE, PPI4DOCK, and DOCKGROUND, is approximately 88% on the validation dataset and 75% on the test dataset. The method can be used to improve the accuracy of template based PPI predictions.
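
As a rough illustration of how a 3D convolutional model can turn an interface volume into a probability, the sketch below runs one valid 3D convolution, a ReLU, global average pooling, and a sigmoid. It is a toy forward pass under stated assumptions, not the DeepInterface architecture: the nested-list volume format, the single kernel, and the names `conv3d_valid` and `interface_probability` are all hypothetical.

```python
import math

def conv3d_valid(volume, kernel):
    """Valid (no padding) 3D cross-correlation on nested lists [z][y][x]."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for z in range(D - kd + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

def interface_probability(volume, kernel, bias=0.0):
    """Toy pipeline (assumed): conv -> ReLU -> global average pool -> sigmoid."""
    fmap = conv3d_valid(volume, kernel)
    acts = [max(0.0, v) for plane in fmap for row in plane for v in row]
    pooled = sum(acts) / len(acts)
    return 1.0 / (1.0 + math.exp(-(pooled + bias)))
```

A trained model would learn many kernels and stack several layers; the point here is only that the final sigmoid maps the pooled response to a value between 0 and 1 that can be read as the probability of a true interface.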

Faculty Opinions recommendation of Comparative assessment of large-scale data sets of protein-protein interactions.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1006598.82257 ◽

2002 ◽

Author(s):

Rob Russell

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Comparative Assessment ◽

Data Sets ◽

Protein Protein Interactions ◽

Large Scale Data ◽

Scale Data ◽

Large Scale Data Sets

Convolutional Neural Networks for Scientific Images and Other Large Data Sets

10.1007/978-3-030-70388-2_6 ◽

2021 ◽

pp. 149-172

Author(s):

Ryan G. McClarren

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Scientific Images

EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation

PeerJ ◽

10.7717/peerj.4750 ◽

2018 ◽

Vol 6 ◽

pp. e4750 ◽

Cited By ~ 24

Author(s):

Afshine Amidi ◽

Shervine Amidi ◽

Dimitrios Vlachakis ◽

Vasileios Megalooikonomou ◽

Nikos Paragios ◽

...

Keyword(s):

Neural Networks ◽

Amino Acid ◽

Convolutional Neural Networks ◽

Protein Function ◽

Enzyme Commission Number ◽

Data Bank ◽

Biochemical Properties ◽

Data Availability ◽

Binary Representation ◽

Enzymatic Function

During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques have become increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank (PDB) has increased more than 15-fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and is therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural network classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The two-layer architecture was investigated on a large dataset of 63,558 enzymes from the PDB and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.
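
To make the idea of a binary, voxel-based representation of protein shape concrete, the following sketch maps atom coordinates onto a cubic occupancy grid. It is only an illustration under assumptions: the function name `voxelize`, the grid size of 8, and the centering and scaling scheme are hypothetical and are not taken from the EnzyNet code.

```python
def voxelize(coords, grid_size=8):
    """Map (x, y, z) atom coordinates to a binary occupancy grid.

    Assumed scheme: center the atoms on their centroid, then scale so the
    farthest coordinate falls on the edge of the grid.
    """
    if not coords:
        raise ValueError("need at least one atom")
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    centered = [[c[i] - center[i] for i in range(3)] for c in coords]
    # Largest absolute coordinate; fall back to 1.0 for a single atom.
    radius = max(max(abs(v) for v in c) for c in centered) or 1.0
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    for c in centered:
        idx = []
        for v in c:
            # Map [-radius, radius] onto voxel indices [0, grid_size - 1].
            i = int((v / radius + 1.0) / 2.0 * (grid_size - 1) + 0.5)
            idx.append(min(max(i, 0), grid_size - 1))
        grid[idx[0]][idx[1]][idx[2]] = 1  # 1 = voxel occupied, 0 = empty
    return grid
```

A grid produced this way carries only shape information, which matches the "binary representation" mentioned in the abstract; biochemical properties would need additional channels.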

Deep learning based prediction of reversible HAT/HDAC-specific lysine acetylation

Briefings in Bioinformatics ◽

10.1093/bib/bbz107 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1798-1805 ◽

Cited By ~ 1

Author(s):

Kai Yu ◽

Qingfeng Zhang ◽

Zekun Liu ◽

Yimeng Du ◽

Xinjiao Gao ◽

...

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Web Server ◽

Data Sets ◽

Lysine Acetylation ◽

Protein Acetylation ◽

Protein Protein Interactions ◽

Cellular Processes ◽

Protein Lysine Acetylation ◽

Specific Regulation

Abstract: Protein lysine acetylation is an important molecular mechanism for regulating cellular processes and plays critical physiological and pathological roles in cancers and other diseases. Although a large number of acetylation sites have been identified through experiments and high-throughput proteomics techniques, their enzyme-specific regulation remains largely unknown. Here, we developed the deep learning-based protein lysine acetylation modification prediction (Deep-PLA) software for histone acetyltransferase (HAT)/histone deacetylase (HDAC)-specific acetylation prediction. Experimentally identified substrates and sites of several HATs and HDACs were curated from the literature to generate enzyme-specific data sets. We integrated various protein sequence features with a deep neural network and optimized the hyperparameters with particle swarm optimization, which achieved satisfactory performance. In comparisons based on cross-validation and testing data sets, the model outperformed those of previous studies. Meanwhile, we found that protein–protein interactions could enrich enzyme-specific acetylation regulatory relations and visualized this information in the Deep-PLA web server. Furthermore, a cross-cancer analysis of acetylation-associated mutations revealed that acetylation regulation was intensively disrupted by mutations in cancers and heavily implicated in the regulation of cancer signaling. These prediction and analysis results might provide helpful information to reveal the regulatory mechanism of protein acetylation in various biological processes to promote the research on prognosis and treatment of cancers. Therefore, the Deep-PLA predictor and protein acetylation interaction networks could provide helpful information for studying the regulation of protein acetylation. The web server of Deep-PLA could be accessed at http://deeppla.cancerbio.info.

Preprocessing for Enhancing the Classification of Pulmonary Data Sets using Convolutional Neural Networks

PROGRAMMNAYA INGENERIA ◽

10.17587/prin.9.369-374 ◽

2018 ◽

Vol 9 (10) ◽

pp. 369-374

Author(s):

N. Esmaeilishahmirzadi ◽

H. Mortezapour

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Data Sets

NGPINT: A Next-generation protein-protein interaction software

10.1101/2020.09.11.277483 ◽

2020 ◽

Cited By ~ 1

Author(s):

Sagnik Banerjee ◽

Valeria Velásquez-Zapata ◽

Gregory Fuerst ◽

J. Mitch Elmore ◽

Roger P. Wise

Keyword(s):

Protein Interactions ◽

Model Organisms ◽

Cdna Libraries ◽

Published Data ◽

Data Sets ◽

Next Generation ◽

Protein Protein Interactions ◽

Protein Protein Interaction ◽

Alternative Approach ◽

Simulated Test

Abstract: Mapping protein-protein interactions at a proteome scale is critical to understanding how cellular signaling networks respond to stimuli. Since eukaryotic genomes encode thousands of proteins, testing their interactions one-by-one is a challenging prospect. High-throughput yeast two-hybrid (Y2H) assays that employ next-generation sequencing to interrogate cDNA libraries represent an alternative approach that optimizes scale, cost, and effort. We present NGPINT, robust and scalable software to identify all putative interactors of a protein using Y2H in batch culture. NGPINT combines diverse tools to align sequence reads to target genomes, reconstruct prey fragments, and compute gene enrichment under reporter selection. Central to this pipeline is the identification of fusion reads containing sequences derived from both the Y2H expression plasmid and the cDNA of interest. To reduce false positives, these fusion reads are evaluated as to whether the cDNA fragment forms an in-frame translational fusion with the Y2H transcription factor. NGPINT successfully recognized 95% of interactions in simulated test runs. As proof of concept, NGPINT was tested using published data sets and recognized all validated interactions. NGPINT can be used in any organism with an available reference, thus facilitating the discovery of protein-protein interactions in non-model organisms.
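
The in-frame test on fusion reads can be sketched as simple modular arithmetic on the junction position. The example below is a simplified illustration, not NGPINT's implementation: it uses exact string matching in place of a read aligner, and the function name `classify_fusion_read` and the parameters `vector_flank`, `cds_start`, and `vector_phase` are hypothetical.

```python
def classify_fusion_read(read, vector_flank, transcript, cds_start, vector_phase=0):
    """Classify a read as a plasmid/cDNA fusion and test its reading frame.

    read         -- sequenced read (string of A/C/G/T)
    vector_flank -- plasmid sequence expected immediately before the cDNA insert
    transcript   -- reference transcript the cDNA should map to
    cds_start    -- 0-based index of the first base of the start codon
    vector_phase -- bases (0, 1 or 2) of an unfinished codon left by the plasmid
    """
    i = read.find(vector_flank)
    if i < 0:
        return "not_fusion"
    cdna = read[i + len(vector_flank):]
    if not cdna:
        return "not_fusion"
    # Exact matching stands in for alignment in this sketch.
    pos = transcript.find(cdna)
    if pos < 0:
        return "unmapped"
    # The native codon position of the first cDNA base must equal the
    # position at which the plasmid's reading frame arrives at the junction.
    if (pos - cds_start) % 3 == vector_phase % 3:
        return "in_frame"
    return "out_of_frame"
```

For example, with a transcript whose start codon begins at index 2, a cDNA fragment starting at index 5 continues the frame of a plasmid that ends on a complete codon, while a fragment starting at index 6 does not.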

Design space exploration of Convolutional Neural Networks based on Evolutionary Algorithms

Journal of Computational Vision and Imaging Systems ◽

10.15353/vsnl.v3i1.162 ◽

2017 ◽

Vol 3 (1) ◽

Cited By ~ 4

Author(s):

Abeer Al-Hyari ◽

Shawki Areibi

Keyword(s):

Neural Networks ◽

Genetic Algorithms ◽

Evolutionary Algorithms ◽

Convolutional Neural Networks ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

Handwritten Digit ◽

Fully Connected ◽

The Given

This paper proposes a framework for design space exploration of Convolutional Neural Networks (CNNs) using Genetic Algorithms (GAs). CNNs have many hyperparameters that need to be tuned carefully in order to achieve favorable results when used for image classification tasks or similar vision applications. Genetic Algorithms are adopted to efficiently traverse the huge search space of CNN hyperparameters and generate the best architecture that fits the given task. Some of the hyperparameters that were tested include the number of convolutional and fully connected layers, the number of filters for each convolutional layer, and the number of nodes in the fully connected layers. The proposed approach was tested using the MNIST dataset for handwritten digit classification, and the results obtained indicate that the proposed approach is able to generate a CNN architecture with validation accuracy up to 96.66% on average.
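
A genetic search over the hyperparameters named above might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the value ranges in `SPACE`, the truncation selection with elitism, the uniform crossover, the mutation rate, and above all `toy_fitness`, which stands in for training a CNN and measuring validation accuracy.

```python
import random

# Hypothetical search space; the abstract names these hyperparameters
# but does not give their ranges.
SPACE = {
    "conv_layers": [1, 2, 3, 4],
    "fc_layers": [1, 2, 3],
    "filters": [8, 16, 32, 64],
    "fc_nodes": [32, 64, 128, 256],
}

def random_genome(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal chance.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}

def mutate(g, rng, rate=0.2):
    # Each gene is resampled from its range with probability `rate`.
    return {k: (rng.choice(SPACE[k]) if rng.random() < rate else v) for k, v in g.items()}

def toy_fitness(g):
    """Stand-in for validation accuracy; a real run would train a CNN here."""
    target = {"conv_layers": 3, "fc_layers": 2, "filters": 32, "fc_nodes": 128}
    return sum(1 for k in SPACE if g[k] == target[k]) / len(SPACE)

def evolve(fitness, generations=20, pop_size=12, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection
        children = [pop[0]]             # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return max(pop, key=fitness), history
```

Calling `evolve(toy_fitness)` returns the best genome found and the per-generation best fitness. Because the best individual is always carried over, that history never decreases; in practice the cost is dominated by the fitness evaluation, since each call means training a network.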

Multi-Branch-CNN: classification of ion channel interacting peptides using parallel convolutional neural networks

10.1101/2021.11.13.468342 ◽

2021 ◽

Author(s):

Jielu Yan ◽

Bob Zhang ◽

Mingliang Zhou ◽

Hang Fai Kwok ◽

Shirley W.I. Siu

Keyword(s):

Neural Networks ◽

Ion Channels ◽

Ion Channel ◽

Convolutional Neural Networks ◽

Data Sets ◽

The Novel ◽

Sodium Potassium ◽

Test Set ◽

Drug Candidates

Ligand peptides that have high affinity for ion channels are critical for regulating ion flux across the plasma membrane. These peptides are now being considered as potential drug candidates for many diseases, such as cardiovascular disease and cancers. Several studies have identified ion channel interacting peptides computationally but, to the best of our knowledge, none of them has published an available tool for prediction. To provide a solution, we present Multi-Branch-CNN, a parallel convolutional neural network (CNN) method for identifying three types of ion channel peptide binders (sodium, potassium, and calcium). Our experiment shows that the Multi-Branch-CNN method performs comparably to thirteen traditional ML algorithms (TML13) on the test sets of three ion channels. To evaluate the predictive power of our method with respect to novel sequences, as is the case in real-world applications, we created an additional test set for each ion channel, called the novel-test set, which has little or no similarity to the sequences in either the training set or the test set. In the novel-test experiment, Multi-Branch-CNN performs significantly better than TML13, showing an improvement in accuracy of 6%, 14%, and 15% for sodium, potassium, and calcium channels, respectively. We confirmed the effectiveness of Multi-Branch-CNN by comparing it to the standard CNN method with one input branch (Single-Branch-CNN) and an ensemble method (TML13-Stack). To facilitate applications, the data sets, script files to reproduce the experiments, and the final predictive models are freely available at https://github.com/jieluyan/Multi-Branch-CNN.
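
The idea of a novel-test set, keeping only sequences that resemble nothing in the existing data, can be sketched as a similarity filter. The abstract does not say which similarity measure or cutoff the authors used, so the example below substitutes Python's `difflib.SequenceMatcher` ratio and a hypothetical threshold of 0.4; the function names are likewise illustrative.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in similarity in [0, 1]; a real pipeline might use alignment identity."""
    return SequenceMatcher(None, a, b).ratio()

def novel_test_set(candidates, reference, max_similarity=0.4):
    """Keep candidates whose similarity to every reference sequence is low.

    `reference` would be the union of the training and test sequences;
    `max_similarity` is an assumed cutoff, not a value from the paper.
    """
    return [
        c for c in candidates
        if all(similarity(c, r) <= max_similarity for r in reference)
    ]
```

Filtering against both the training and the test sequences is what makes accuracy on the resulting set a measure of generalization to unseen peptide families rather than of memorized near-duplicates.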

Using Convolutional Neural Networks for Detection and Morphometric Analysis of Carolina Bays from Publicly Available Digital Elevation Models

Remote Sensing ◽

10.3390/rs13183770 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3770

Author(s):

Mark A. Lundine ◽

Arthur C. Trembanis

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Feature Detection ◽

Digital Elevation Models ◽

Open Water ◽

Atlantic Coastal Plain ◽

Detection Scheme ◽

Carolina Bays ◽

Digital Elevation ◽

Almost All

Carolina Bays are oriented and sandy-rimmed depressions that are ubiquitous throughout the Atlantic Coastal Plain (ACP). Their origin has been a highly debated topic since the 1800s and remains unsolved. Past population estimates of Carolina Bays have varied vastly, ranging from as few as 10,000 to as many as 500,000. With such a large uncertainty around the actual population size, mapping these enigmatic features is a problem that requires an automated detection scheme. Using publicly available LiDAR-derived digital elevation models (DEMs) of the ACP as training images, various types of convolutional neural networks (CNNs) were trained to detect Carolina Bays. The detection results were assessed for accuracy and scalability, as well as analyzed for various morphologic, land-use and land cover, and hydrologic characteristics. Overall, the detector found over 23,000 Carolina Bays from southern New Jersey to northern Florida, with highest densities along interfluves. Carolina Bays in Delmarva were found to be smaller and shallower than Bays in the southeastern ACP. At least a third of Carolina Bays have been converted to agricultural lands and almost half of all Carolina Bays are forested. Few Carolina Bays are classified as open water basins, yet almost all of the detected Bays were within 2 km of a water body. In addition, field investigations based upon detection results were performed to describe the sedimentology of Carolina Bays. Sedimentological investigations showed that Bays typically have 1.5 m to 2.5 m thick sand rims that show a gradient in texture, with coarser sand at the bottom and finer sand and silt towards the top. Their basins were found to be 0.5 m to 2 m thick and showed a mix of clayey, silty, and sandy deposits. Last, the results compiled during this study were compared to similar depressional features (i.e., playa-lunette systems) to pinpoint any similarities in origin processes. Altogether, this study shows that CNNs are valuable tools for automated geomorphic feature detection and can lead to new insights when coupled with various forms of remotely sensed and field-based datasets.

Introspective analysis of convolutional neural networks for improving discrimination performance and feature visualisation

PeerJ Computer Science ◽

10.7717/peerj-cs.497 ◽

2021 ◽

Vol 7 ◽

pp. e497

Author(s):

Shakeel Shafiq ◽

Tayyaba Azim

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Network ◽

Convolutional Neural Networks ◽

Discrimination Performance ◽

Input Image ◽

Data Sets ◽

Discrimination Power ◽

Level Information ◽

Fully Connected

Deep neural networks have been widely explored and utilised as a useful tool for feature extraction in computer vision and machine learning. It is often observed that the last fully connected (FC) layers of a convolutional neural network possess higher discrimination power than the convolutional and maxpooling layers, whose goal is to preserve local and low-level information of the input image and downsample it to avoid overfitting. Inspired by the functionality of the local binary pattern (LBP) operator, this paper proposes to induce discrimination into the mid layers of a convolutional neural network by introducing a discriminatively boosted alternative to pooling (DBAP) layer that has been shown to serve as a favourable replacement for the early maxpooling layer in a convolutional neural network (CNN). A thorough review of the related work shows that the proposed change in the neural architecture is novel and has not been proposed before to bring enhanced discrimination and feature visualisation power from the mid layer features. The empirical results reveal that the introduction of the DBAP layer in popular neural architectures such as AlexNet and LeNet produces competitive classification results in comparison to their baseline models as well as other ultra-deep models on several benchmark data sets. In addition, better visualisation of intermediate features can allow one to seek understanding and interpretation of the black box behaviour of convolutional neural networks, used widely by the research community.
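
For readers unfamiliar with the local binary pattern operator that motivates the DBAP layer, the sketch below computes the classic 8-neighbour LBP code for each interior pixel of a small grayscale image. It shows the standard LBP operator only; the DBAP layer itself is not specified in the abstract and is not reproduced here. The function names and the clockwise bit ordering are choices made for this example.

```python
def lbp_code(image, r, c):
    """8-neighbour LBP code for the interior pixel at row r, column c.

    Each neighbour contributes one bit: 1 if it is >= the centre value.
    Bits are assigned clockwise starting from the top-left neighbour.
    """
    center = image[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_map(image):
    """LBP codes for all interior pixels; the output loses a 1-pixel border."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]
```

Because each code depends only on whether neighbours are brighter or darker than the centre, the result is invariant to monotonic changes in illumination, which is one reason LBP features are considered discriminative.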
