Protein docking model evaluation by 3D deep convolutional neural networks

Xiao Wang; Genki Terashi; Charles W Christoffer; Mengmeng Zhu; Daisuke Kihara

doi:10.1093/bioinformatics/btz870

Bioinformatics ◽

10.1093/bioinformatics/btz870 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2113-2118 ◽

Cited By ~ 7

Author(s):

Xiao Wang ◽

Genki Terashi ◽

Charles W Christoffer ◽

Mengmeng Zhu ◽

Daisuke Kihara

Keyword(s):

Neural Network ◽

Structure Prediction ◽

Deep Neural Network ◽

Molecular Mechanisms ◽

Complex Structure ◽

Protein Docking ◽

Supplementary Information ◽

Atomic Interaction ◽

Deep Convolutional Neural Networks ◽

Docking Model

Abstract. Motivation: Many important cellular processes involve physical interactions of proteins. Therefore, determining protein quaternary structures provides critical insights for understanding the molecular mechanisms of the functions of the complexes. To complement experimental methods, many computational methods have been developed to predict structures of protein complexes. One of the challenges in computational protein complex structure prediction is to identify near-native models from a large pool of generated models. Results: We developed a convolutional deep neural network-based approach named DOcking decoy selection with Voxel-based deep neural nEtwork (DOVE) for evaluating protein docking models. To evaluate a protein docking model, DOVE scans the protein–protein interface of the model with a 3D voxel and considers atomic interaction types and their energetic contributions as input features applied to the neural network. The deep learning models were trained and validated on docking models available in the ZDock and DockGround databases. Among the different combinations of features tested, almost all outperformed existing scoring functions. Availability and implementation: Code is available at http://github.com/kiharalab/DOVE and http://kiharalab.org/dove/. Supplementary information: Supplementary data are available at Bioinformatics online.
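As a rough illustration of the voxel-based input described in this abstract, the sketch below bins atoms into a fixed-size 3D grid with one channel per atom type. The grid size, voxel spacing, atom-type channels and the function name `voxelize` are assumptions made for illustration; they are not DOVE's actual settings or code, and the energetic features mentioned in the abstract are left out.

```python
import numpy as np

# Hypothetical atom-type channels chosen for this sketch only.
ATOM_TYPES = ("C", "N", "O", "S")


def voxelize(coords, types, center, grid_size=20, spacing=1.0):
    """Count atoms per voxel in a cube centred on `center`.

    coords: (n_atoms, 3) array-like of x, y, z positions in angstroms.
    types:  sequence of n_atoms element symbols.
    Returns an array of shape (len(ATOM_TYPES), grid_size, grid_size, grid_size).
    """
    coords = np.asarray(coords, dtype=float)
    grid = np.zeros((len(ATOM_TYPES),) + (grid_size,) * 3, dtype=np.float32)
    half = grid_size * spacing / 2.0
    # Shift so the cube's corner is the origin, then convert to voxel indices.
    idx = np.floor((coords - np.asarray(center) + half) / spacing).astype(int)
    for (i, j, k), t in zip(idx, types):
        inside = 0 <= i < grid_size and 0 <= j < grid_size and 0 <= k < grid_size
        # Atoms outside the cube or of an unlisted type are skipped.
        if inside and t in ATOM_TYPES:
            grid[ATOM_TYPES.index(t), i, j, k] += 1.0
    return grid


# Made-up coordinates: two atoms fall inside the cube, one far outside.
coords = [[0.2, 0.2, 0.2], [1.5, 0.0, 0.0], [50.0, 0.0, 0.0]]
types = ["C", "N", "O"]
grid = voxelize(coords, types, center=[0.0, 0.0, 0.0])
print(grid.shape, grid.sum())
```

A grid of this kind is the sort of tensor a 3D convolutional network can take as input; in practice the cube would be centred on the docking interface rather than on the origin.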

Varmole: a biologically drop-connect deep neural network model for prioritizing disease risk variants and genes

Bioinformatics ◽

10.1093/bioinformatics/btaa866 ◽

2020 ◽

Author(s):

Nam D Nguyen ◽

Ting Jin ◽

Daifeng Wang

Keyword(s):

Neural Network ◽

Network Architecture ◽

Deep Neural Network ◽

Molecular Mechanisms ◽

Genome Wide Association Study ◽

Disease Risk ◽

Supplementary Information ◽

Neural Network Architecture ◽

Risk Variants ◽

Genome Wide

Abstract. Summary: Population studies such as genome-wide association studies have identified a variety of genomic variants associated with human diseases. To further understand potential mechanisms of disease variants, recent statistical methods associate functional omic data (e.g. gene expression) with genotype and phenotype and link variants to individual genes. However, interpreting molecular mechanisms from such associations, especially across omics, is still challenging. To address this problem, we developed an interpretable deep learning method, Varmole, to simultaneously reveal genomic functions and mechanisms while predicting phenotype from genotype. In particular, Varmole embeds multi-omic networks into a deep neural network architecture and prioritizes variants, genes and regulatory linkages via biological drop-connect without needing prior feature selection. Availability and implementation: Varmole is available as a Python tool on GitHub at https://github.com/daifengwanglab/Varmole. Supplementary information: Supplementary data are available at Bioinformatics online.

Integrating ab initio and template-based algorithms for protein–protein complex structure prediction

Bioinformatics ◽

10.1093/bioinformatics/btz623 ◽

2019 ◽

Vol 36 (3) ◽

pp. 751-757 ◽

Cited By ~ 1

Author(s):

Sweta Vangaveti ◽

Thom Vreven ◽

Yang Zhang ◽

Zhiping Weng

Keyword(s):

Protein Complex ◽

Structure Prediction ◽

Protein Complexes ◽

Complex Structure ◽

Protein Docking ◽

Supplementary Information ◽

Test Case ◽

Binding Modes ◽

Success Rates ◽

Template Free

Abstract. Motivation: Template-based and template-free methods have both been widely used in predicting the structures of protein–protein complexes. Template-based modeling is effective when a reliable template is available, while template-free methods are required for predicting the binding modes or interfaces that have not been previously observed. Our goal is to combine the two methods to improve computational protein–protein complex structure prediction. Results: Here, we present a method to identify and combine high-confidence predictions of a template-based method (SPRING) with a template-free method (ZDOCK). Cross-validated using the protein–protein docking benchmark version 5.0, our method (ZING) achieved a success rate of 68.2%, outperforming SPRING and ZDOCK, with success rates of 52.1% and 35.9% respectively, when the top 10 predictions were considered per test case. In conclusion, a statistics-based method that evaluates and integrates predictions from template-based and template-free methods is more successful than either method independently. Availability and implementation: ZING is available for download as a GitHub repository (https://github.com/weng-lab/ZING.git). Supplementary information: Supplementary data are available at Bioinformatics online.
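The success rates quoted in this abstract are reported over the top 10 predictions per test case. A minimal sketch of such a metric is given below, assuming that a test case counts as a success when at least one of its top-ranked predictions is judged acceptable; the function name and the toy rankings are hypothetical and are not taken from the paper or from the ZING code.

```python
def top_n_success_rate(cases, n=10):
    """Fraction of test cases with at least one acceptable model in the top n.

    cases: list of lists of booleans, each inner list ordered by rank
           (True marks a prediction judged acceptable for that case).
    """
    if not cases:
        raise ValueError("no test cases given")
    hits = sum(1 for ranked in cases if any(ranked[:n]))
    return hits / len(cases)


# Toy, made-up rankings for three test cases (not data from the paper).
toy_cases = [
    [False, True, False],   # first acceptable model at rank 2
    [False] * 12 + [True],  # first acceptable model at rank 13
    [False, False, False],  # no acceptable model
]
print(top_n_success_rate(toy_cases, n=10))
print(top_n_success_rate(toy_cases, n=20))
```

Widening the cutoff from 10 to 20 turns the second toy case into a success, which is why success rates are only comparable when they are quoted at the same number of top predictions.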

Protein Docking Model Evaluation by Graph Neural Networks

10.1101/2020.12.30.424859 ◽

2020 ◽

Author(s):

Xiao Wang ◽

Sean T Flannery ◽

Daisuke Kihara

Keyword(s):

Neural Network ◽

Molecular Mechanisms ◽

Protein Complexes ◽

Chemical Properties ◽

Protein Docking ◽

Cellular Processes ◽

Docking Model ◽

Experimental Approaches ◽

Physical Interactions ◽

Graph Neural Networks

Abstract. Physical interactions of proteins play key roles in many important cellular processes. Therefore, it is crucial to determine the structure of protein complexes to understand molecular mechanisms of interactions. To complement experimental approaches, which usually take a considerable amount of time and resources, various computational methods have been developed to predict the structures of protein complexes. In computational modeling, one of the challenges is to identify near-native structures from a large pool of generated models. Here, we developed a deep learning-based approach named Graph Neural Network-based DOcking decoy eValuation scorE (GNN-DOVE). To evaluate a protein docking model, GNN-DOVE extracts the interface area and represents it as a graph. The chemical properties of atoms and the inter-atom distances are used as features of nodes and edges in the graph, respectively. GNN-DOVE was trained and validated on docking models in the Dockground database. GNN-DOVE performed better than existing methods including DOVE, which is our previous development that uses a convolutional neural network on voxelized structure models.

Protein Docking Model Evaluation by Graph Neural Networks

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.647915 ◽

2021 ◽

Vol 8 ◽

Author(s):

Xiao Wang ◽

Sean T. Flannery ◽

Daisuke Kihara

Keyword(s):

Neural Network ◽

Molecular Mechanisms ◽

Protein Complexes ◽

Chemical Properties ◽

Protein Docking ◽

Functional Roles ◽

Cellular Processes ◽

Docking Model ◽

Experimental Approaches ◽

Graph Neural Networks

Physical interactions of proteins play key functional roles in many important cellular processes. To understand molecular mechanisms of such functions, it is crucial to determine the structure of protein complexes. To complement experimental approaches, which usually take a considerable amount of time and resources, various computational methods have been developed for predicting the structures of protein complexes. In computational modeling, one of the challenges is to identify near-native structures from a large pool of generated models. Here, we developed a deep learning–based approach named Graph Neural Network–based DOcking decoy eValuation scorE (GNN-DOVE). To evaluate a protein docking model, GNN-DOVE extracts the interface area and represents it as a graph. The chemical properties of atoms and the inter-atom distances are used as features of nodes and edges in the graph, respectively. GNN-DOVE was trained, validated, and tested on docking models in the Dockground database and further tested on a combined dataset of Dockground and ZDOCK benchmark as well as a CAPRI scoring dataset. GNN-DOVE performed better than existing methods, including DOVE, which is our previous development that uses a convolutional neural network on voxelized structure models.
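To make the interface-as-graph idea in this abstract more concrete, the sketch below selects atoms near the other chain and connects nearby atoms with distance-labelled edges. The two cutoffs, the function name `interface_graph` and the toy coordinates are assumptions for illustration only; node features such as chemical properties are omitted, and this is not GNN-DOVE's implementation.

```python
import numpy as np


def interface_graph(rec_xyz, lig_xyz, interface_cutoff=10.0, edge_cutoff=5.0):
    """Build a simple graph over interface atoms of a two-chain model.

    Returns (nodes, edges): nodes is an (n, 3) coordinate array and edges
    maps (i, j) with i < j to the distance in angstroms between nodes i and j.
    """
    rec = np.asarray(rec_xyz, dtype=float)
    lig = np.asarray(lig_xyz, dtype=float)
    # Pairwise receptor-ligand distances, shape (n_rec, n_lig).
    cross = np.linalg.norm(rec[:, None, :] - lig[None, :, :], axis=-1)
    # Keep atoms that have a partner on the other chain within the cutoff.
    rec_if = rec[(cross < interface_cutoff).any(axis=1)]
    lig_if = lig[(cross < interface_cutoff).any(axis=0)]
    nodes = np.vstack([rec_if, lig_if])
    # All-against-all distances among the selected interface atoms.
    dist = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=-1)
    edges = {
        (i, j): float(dist[i, j])
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
        if dist[i, j] < edge_cutoff
    }
    return nodes, edges


# Made-up coordinates: the second receptor atom is far from the ligand.
rec = [[0.0, 0.0, 0.0], [30.0, 0.0, 0.0]]
lig = [[3.0, 0.0, 0.0], [4.0, 3.0, 0.0]]
nodes, edges = interface_graph(rec, lig)
print(len(nodes), edges)
```

The all-against-all distance matrix keeps the sketch short but grows quadratically with the number of atoms; a spatial index would be a more natural choice for full-size complexes.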

SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz801 ◽

2019 ◽

Vol 36 (6) ◽

pp. 1779-1784 ◽

Cited By ~ 1

Author(s):

Chuanqi Wang ◽

Jun Li

Keyword(s):

Neural Network ◽

Single Cell ◽

Count Data ◽

Deep Neural Network ◽

Sequencing Depth ◽

Supplementary Information ◽

Neural Network Classifier ◽

Rna Seq ◽

Scale Invariant ◽

Downstream Analysis

Abstract. Motivation: Scaling by sequencing depth is usually the first step in the analysis of bulk or single-cell RNA-seq data, but estimating sequencing depth accurately can be difficult, especially for single-cell data, which puts the validity of downstream analysis at risk. It is thus of interest to eliminate the use of sequencing depth and analyze the original count data directly. Results: We call an analysis method ‘scale-invariant’ (SI) if it gives the same result under different estimates of sequencing depth and hence can use the original count data without scaling. For the problem of classifying samples into pre-specified classes, such as normal versus cancerous, we develop a deep-neural-network based SI classifier named scale-invariant deep neural-network classifier (SINC). On nine bulk and single-cell datasets, the classification accuracy of SINC is better than or competitive with the best of eight other classifiers. SINC is easier to use and more reliable on data where proper sequencing depth is hard to determine. Availability and implementation: The source code of SINC is available at https://www.nd.edu/~jli9/SINC.zip. Supplementary information: Supplementary data are available at Bioinformatics online.
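The scale-invariance property defined in this abstract can be checked numerically: dividing a sample's counts by a different depth estimate amounts to multiplying them by a positive constant, so a scale-invariant statistic should not change. The sketch below uses two toy statistics and a hypothetical helper `is_scale_invariant`; it illustrates the definition only and is not the SINC network.

```python
import math


def log_ratio_score(counts, gene_a=0, gene_b=1):
    """Toy scale-invariant statistic: log ratio of two genes' counts."""
    return math.log(counts[gene_a] / counts[gene_b])


def raw_count_score(counts, gene_a=0):
    """Toy statistic that is NOT scale-invariant: it depends on depth."""
    return float(counts[gene_a])


def is_scale_invariant(score, counts, factors=(0.5, 2.0, 10.0), tol=1e-9):
    """Check that rescaling the sample by each factor leaves `score` unchanged."""
    base = score(counts)
    return all(
        abs(score([c * f for c in counts]) - base) <= tol for f in factors
    )


sample = [120, 30, 7]  # made-up counts for one sample
print(is_scale_invariant(log_ratio_score, sample))
print(is_scale_invariant(raw_count_score, sample))
```

The ratio of two counts cancels any common scaling factor, whereas a raw count is multiplied by it, which is the distinction the definition draws.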

Feature selection may improve deep neural networks for the bioinformatics problems

Bioinformatics ◽

10.1093/bioinformatics/btz763 ◽

2019 ◽

Cited By ~ 5

Author(s):

Zheng Chen ◽

Meng Pang ◽

Zixin Zhao ◽

Shuainan Li ◽

Rui Miao ◽

...

Keyword(s):

Neural Network ◽

Feature Selection ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Binary Classification ◽

Supplementary Information ◽

Good Prediction ◽

Programming Environment ◽

Data Types ◽

Selection Algorithms

Abstract. Motivation: Deep neural network (DNN) algorithms have recently been used to predict various biomedical phenotypes and have demonstrated very good prediction performance without feature selection. This study hypothesized that DNN models may be further improved by feature selection algorithms. Results: A comprehensive comparative study was carried out by evaluating 11 feature selection algorithms on three conventional DNN algorithms, i.e. convolutional neural network (CNN), deep belief network (DBN) and recurrent neural network (RNN), and three recent DNNs, i.e. MobilenetV2, ShufflenetV2 and Squeezenet. Five binary classification methylomic datasets were chosen to calculate the prediction performance of CNN/DBN/RNN models using features selected by the 11 feature selection algorithms. Seventeen binary classification transcriptome datasets and two multi-class transcriptome datasets were also used to evaluate how the hypothesis may generalize to different data types. The experimental data supported our hypothesis that feature selection algorithms may improve DNN models, and the DBN models using features selected by SVM-RFE usually achieved the best prediction accuracies on the five methylomic datasets. Availability and implementation: All the algorithms were implemented and tested under the programming environment Python version 3.6.6. Supplementary information: Supplementary data are available at Bioinformatics online.

Quality Assessment of Protein Docking Models Based on Graph Neural Network

Frontiers in Bioinformatics ◽

10.3389/fbinf.2021.693211 ◽

2021 ◽

Vol 1 ◽

Author(s):

Ye Han ◽

Fei He ◽

Yongbing Chen ◽

Wenyuan Qin ◽

Helong Yu ◽

...

Keyword(s):

Neural Network ◽

Quality Assessment ◽

Chemical Properties ◽

Protein Docking ◽

Structural Basis ◽

Docking Model ◽

Testing Dataset ◽

Independent Testing Dataset

Protein docking provides a structural basis for the design of drugs and vaccines. Among the processes of protein docking, quality assessment (QA) is used to pick near-native models from numerous protein docking candidate conformations, and it directly determines the final docking results. Although extensive efforts have been made to improve QA accuracy, it is still the bottleneck of current protein docking systems. In this paper, we present a Deep Graph Attention Neural Network (DGANN) to evaluate and rank protein docking candidate models. DGANN learns inter-residue physicochemical properties and structural fitness across the two protein monomers in a docking model and generates the probability that the model is near-native. On the ZDOCK decoy benchmark, our DGANN outperformed the ranking provided by ZDOCK in terms of ranking good models into the top selections. Furthermore, we conducted comparative experiments on an independent testing dataset, and the results also demonstrated the superiority and generalization of our proposed method.

Deep Neural Network for Protein Contact Prediction by Weighting Sequences in a Multiple Sequence Alignment

10.1101/331926 ◽

2018 ◽

Author(s):

Hiroyuki Fukuda ◽

Kentaro Tomii

Keyword(s):

Neural Network ◽

Supervised Learning ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structure Prediction ◽

Deep Neural Network ◽

Multiple Sequence ◽

Contact Prediction ◽

Meta Learning ◽

Correlation Information

Abstract. Protein contact prediction is a crucially important step for protein structure prediction. To predict a contact, approaches of two types are used: evolutionary coupling analysis (ECA) and supervised learning. ECA uses a large multiple sequence alignment (MSA) of homologous sequences and extracts correlation information between residues. Supervised learning uses ECA results as input features and can produce higher accuracy. As described herein, we present a new approach to contact prediction which can both extract correlation information and predict contacts in a supervised manner directly from an MSA using a deep neural network (DNN). Using the DNN, we can obtain higher accuracy than with earlier ECA methods. Simultaneously, we can weight each sequence in the MSA to eliminate noise sequences automatically in a supervised way. It is expected that the combination of our method and other meta-learning methods can provide much higher accuracy of contact prediction.

DNN-Dom: predicting protein domain boundary from sequence alone by deep neural network

Bioinformatics ◽

10.1093/bioinformatics/btz464 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5128-5136 ◽

Cited By ~ 3

Author(s):

Qiang Shi ◽

Weiya Chen ◽

Siqi Huang ◽

Fanglin Jin ◽

Yinghao Dong ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Structure Prediction ◽

Domain Boundary ◽

Protein Domain ◽

Supplementary Information ◽

High Dimensions ◽

Long Range Interactions ◽

Domain Boundary Prediction ◽

And Function

Abstract. Motivation: Accurate delineation of protein domain boundaries plays an important role in protein engineering and structure prediction. Although machine-learning methods are widely used to predict domain boundaries, these approaches often ignore long-range interactions among residues, which have been proven to improve prediction performance. However, how to simultaneously model the local and global interactions to further improve domain boundary prediction is still a challenging problem. Results: This article employs a hybrid deep learning method that combines convolutional neural network and gated recurrent unit models for domain boundary prediction. It not only captures the local and non-local interactions, but also fuses these features for prediction. Additionally, we adopt balanced Random Forest for classification to deal with the high imbalance of samples and the high dimensionality of deep features. Experimental results show that our proposed approach (DNN-Dom) outperforms existing machine-learning-based methods for boundary prediction. We expect that DNN-Dom can be useful for assisting protein structure and function prediction. Availability and implementation: The method is available as DNN-Dom Server at http://isyslab.info/DNN-Dom/. Supplementary information: Supplementary data are available at Bioinformatics online.

PASSION: an ensemble neural network approach for identifying the binding sites of RBPs on circRNAs

Bioinformatics ◽

10.1093/bioinformatics/btaa522 ◽

2020 ◽

Vol 36 (15) ◽

pp. 4276-4282 ◽

Cited By ~ 7

Author(s):

Cangzhi Jia ◽

Yue Bi ◽

Jinxiang Chen ◽

André Leier ◽

Fuyi Li ◽

...

Keyword(s):

Neural Network ◽

Binding Sites ◽

Deep Neural Network ◽

Sequence Similarity ◽

Supplementary Information ◽

Circular Rnas ◽

Support Vector ◽

Feature Subset ◽

K Nearest Neighbor ◽

Ensemble Neural Network

Abstract. Motivation: Unlike traditional linear RNAs (containing 5′ and 3′ ends), circular RNAs (circRNAs) are a special type of RNA with a closed ring structure. Accumulating evidence has indicated that circRNAs can directly bind proteins and participate in a myriad of different biological processes. Results: To identify the interactions of circRNAs with 37 different types of circRNA-binding proteins (RBPs), we develop an ensemble neural network, termed PASSION, which is based on the concatenated artificial neural network (ANN) and hybrid deep neural network frameworks. Specifically, the input of the ANN is the optimal feature subset for each RBP, which has been selected from six types of feature encoding schemes through incremental feature selection and application of the XGBoost algorithm. In turn, the input of the hybrid deep neural network is a stacked codon-based scheme. Benchmarking experiments indicate that the ensemble neural network reaches the average best area under the curve (AUC) of 0.883 across the 37 circRNA datasets when compared with XGBoost, k-nearest neighbor, support vector machine, random forest, logistic regression and Naive Bayes. Moreover, each of the 37 RBP models is extensively tested by performing independent tests, with the varying sequence similarity thresholds of 0.8, 0.7, 0.6 and 0.5, respectively. The corresponding average AUCs obtained are 0.883, 0.876, 0.868 and 0.883, respectively, highlighting the effectiveness and robustness of PASSION. Extensive benchmarking experiments demonstrate that PASSION achieves a competitive performance for identifying binding sites between circRNA and RBPs, when compared with several state-of-the-art methods. Availability and implementation: A user-friendly web server of PASSION is publicly accessible at http://flagship.erc.monash.edu/PASSION/. Supplementary information: Supplementary data are available at Bioinformatics online.
