LIBRUS: combined machine learning and homology information for sequence-based ligand-binding residue prediction

2009 ◽  
Vol 25 (23) ◽  
pp. 3099-3107 ◽  
Author(s):  
C. Kauffman ◽  
G. Karypis
2020 ◽  
Vol 36 (10) ◽  
pp. 3018-3027 ◽  
Author(s):  
Chun-Qiu Xia ◽  
Xiaoyong Pan ◽  
Hong-Bin Shen

Abstract

Motivation: Knowledge of protein–ligand binding residues is important for understanding the functions of proteins and their interaction mechanisms. Accurately identifying the potential binding sites of a specific ligand from an experimentally solved protein structure remains a challenging problem. Compared with structure-alignment-based methods, machine learning algorithms provide a flexible alternative that is less dependent on annotated homogeneous protein structures. Several factors are important for an efficient protein–ligand prediction model, e.g. a discriminative feature representation and an effective learning architecture that can handle data that are both large-scale and severely imbalanced.

Results: In this study, we propose a novel deep-learning-based method called DELIA for protein–ligand binding residue prediction. In DELIA, a hybrid deep neural network is designed to integrate 1D sequence-based features with 2D structure-based amino acid distance matrices. To overcome the severe data imbalance between binding and nonbinding residues, strategies of oversampling in mini-batch, random undersampling and stacking ensemble are designed to enhance the model. Experimental results on five benchmark datasets demonstrate the effectiveness of the proposed DELIA pipeline.

Availability and implementation: The web server of DELIA is available at www.csbio.sjtu.edu.cn/bioinf/delia/.

Supplementary information: Supplementary data are available at Bioinformatics online.
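
The DELIA abstract names "oversampling in mini-batch" as one remedy for the imbalance between binding and nonbinding residues but does not describe the procedure. The following is a minimal sketch of one generic reading of that idea, not DELIA's actual implementation: every mini-batch receives a fixed number of binding residues drawn with replacement, while the nonbinding residues are each visited once per epoch. The function name `oversampled_batches`, its parameters, and the toy label vector are all hypothetical.

```python
import random
from typing import Iterator, List, Sequence


def oversampled_batches(
    labels: Sequence[int],
    batch_size: int,
    pos_per_batch: int,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield batches of residue indices with a fixed number of positives.

    labels: 1 for a binding residue, 0 for a nonbinding residue.
    Each nonbinding index appears exactly once per epoch; binding indices
    are sampled with replacement, so they repeat across batches.
    This is an illustrative assumption, not the published DELIA procedure.
    """
    if not 0 < pos_per_batch < batch_size:
        raise ValueError("pos_per_batch must be between 1 and batch_size - 1")

    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    rng.shuffle(neg)
    neg_per_batch = batch_size - pos_per_batch
    for start in range(0, len(neg), neg_per_batch):
        # Majority class: a disjoint slice of the shuffled negatives.
        batch = neg[start:start + neg_per_batch]
        # Minority class: oversample with replacement inside the batch.
        batch += rng.choices(pos, k=pos_per_batch)
        rng.shuffle(batch)
        yield batch


if __name__ == "__main__":
    # Toy data (hypothetical): 5 binding residues among 100.
    toy_labels = [1] * 5 + [0] * 95
    for batch in oversampled_batches(toy_labels, batch_size=20, pos_per_batch=5):
        n_pos = sum(toy_labels[i] for i in batch)
        print(len(batch), n_pos)
```

In this sketch the last batch of an epoch can be smaller than `batch_size` because the negatives run out; a training loop could drop or pad it. Fixing the positive count per batch keeps the class ratio seen by the optimizer constant regardless of how rare binding residues are in the full dataset.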


2013 ◽  
Vol 317 ◽  
pp. 219-223 ◽  
Author(s):  
Zhijun Qiu ◽  
Cuili Qin ◽  
Min Jiu ◽  
Xicheng Wang

2010 ◽  
Vol 26 (8) ◽  
pp. 1022-1028 ◽  
Author(s):  
Wei-Yao Chou ◽  
Wei-I Chou ◽  
Tun-Wen Pai ◽  
Shu-Chuan Lin ◽  
Ting-Ying Jiang ◽  
...  

Author(s):  
Stefan Holderbach ◽  
Lukas Adam ◽  
Bhyravabhotla Jayaram ◽  
Rebecca Wade ◽  
Goutam Mukherjee

The virtual screening of large numbers of compounds against target protein binding sites has become an integral component of drug discovery workflows. This screening is often done by computationally docking ligands into a protein binding site of interest, but this has the drawback that a large number of poses must be evaluated to obtain accurate estimates of protein-ligand binding affinity. Here we introduce a fast prefiltering method for ligand prioritization that is based on a set of machine learning models and uses simple pose-invariant physicochemical descriptors of the ligands and the protein binding pocket. Our method, Rapid Screening with Physicochemical Descriptors + machine learning (RASPD+), is trained on PDBbind data and achieves a regression performance that is better than that of the original RASPD method and comparable to that of traditional scoring functions on a range of different test sets, without the need to generate ligand poses. Additionally, we use RASPD+ to identify molecular features important for binding affinity and assess the ability of RASPD+ to enrich active molecules from decoys.


2020 ◽  
Author(s):  
Paul Francoeur ◽  
Tomohide Masuda ◽  
David R. Koes

One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and there is no standard dataset of sufficient size to compare performance between models. We present a new dataset for structure-based machine learning, the CrossDocked2020 set, with 22.5 million poses of ligands docked into multiple similar binding pockets across the Protein Data Bank, and perform a comprehensive evaluation of grid-based convolutional neural network models on this dataset. We also demonstrate how the partitioning of the training data and test data can impact the results of models trained with the PDBbind dataset, how performance improves by adding more, lower-quality training data, and how training with docked poses imparts pose sensitivity to the predicted affinity of a complex. Our best performing model, an ensemble of 5 densely connected convolutional networks, achieves a root mean squared error of 1.42 and Pearson R of 0.612 on the affinity prediction task, an AUC of 0.956 at binding pose classification, and a 68.4% accuracy at pose selection on the CrossDocked2020 set. By providing data splits for clustered cross-validation and the raw data for the CrossDocked2020 set, we establish the first standardized dataset for training machine learning models to recognize ligands in non-cognate target structures while also greatly expanding the number of poses available for training. To facilitate community adoption of this dataset for benchmarking protein-ligand binding affinity prediction, we provide our models, weights, and the CrossDocked2020 set at https://github.com/gnina/models.
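
The affinity results in the CrossDocked2020 abstract are reported as a root mean squared error and a Pearson R. As a reminder of what those two numbers measure, here is a minimal standard-library sketch of both metrics. The function names `rmse` and `pearson_r` and the toy affinity values are hypothetical and are not taken from the authors' code or data.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values (hypothetical), standing in for affinities in pK units.
    measured = [4.2, 6.8, 5.5, 7.9, 3.1]
    predicted = [4.9, 6.1, 5.8, 7.0, 4.0]
    print(f"RMSE      = {rmse(measured, predicted):.3f}")
    print(f"Pearson R = {pearson_r(measured, predicted):.3f}")
```

The two metrics answer different questions: RMSE penalizes absolute error in the predicted affinity, whereas Pearson R only rewards a linear relationship between predictions and measurements, so a model can score well on one and poorly on the other.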


2021 ◽  
Author(s):  
Fergus Boyles ◽  
Charlotte M Deane ◽  
Garrett Morris

Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes.

We explore how the use of docked, rather than crystallographic, poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function fails to generalise to a new data set, demonstrating the need for improved scoring functions and additional validation benchmarks.

Code and data to reproduce our results are available from https://github.com/oxpig/learning-from-docked-poses.

