Developing a Scoring Function for NMR Structure-based Assignments using Machine Learning

Author(s):  
Mehmet Çağri Çalpur ◽  
Hakan Erdoğan ◽  
Bülent Çatay ◽  
Bruce R. Donald ◽  
Mehmet Serkan Apaydin

Author(s):  
Jun Pei ◽  
Zheng Zheng ◽  
Hyunji Kim ◽  
Lin Song ◽  
Sarah Walworth ◽  
...  

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant pairwise atom interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying the atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resulting RF models were tested on CASF-2013. In a comparison against 29 scoring functions, we found that our RF models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to assess the importance of the GARF potential in the RF models: (1) a scrambled probability function set, obtained by mixing up the assignment of probability functions to atom pairs in GARF, and (2) a uniform probability function set, which shares the same peak positions as GARF but has fixed peak heights. Comparing the accuracy of RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
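The two control potentials described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes, hypothetically, that each atom-pair probability function is stored as a list of (peak position, peak height) tuples, and the atom-pair labels, the fixed height of 1.0, and the helper names `scramble_potential` and `uniform_potential` are all invented for the example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: each atom-pair label maps to its probability
# function, stored as a list of (peak_position_in_angstrom, peak_height) tuples.
Potential = Dict[str, List[Tuple[float, float]]]


def scramble_potential(potential: Potential, seed: int = 0) -> Potential:
    """Scrambled control: reassign the existing functions to atom pairs at random."""
    pairs = list(potential)
    functions = [potential[pair] for pair in pairs]
    random.Random(seed).shuffle(functions)  # permute which pair gets which function
    return dict(zip(pairs, functions))


def uniform_potential(potential: Potential, height: float = 1.0) -> Potential:
    """Uniform control: keep every peak position but give all peaks one fixed height."""
    return {
        pair: [(position, height) for position, _ in peaks]
        for pair, peaks in potential.items()
    }


# Made-up atom-pair labels and peak values, for illustration only.
garf_like = {
    "pair_A": [(3.4, 0.8), (4.6, 0.3)],
    "pair_B": [(2.9, 1.5)],
    "pair_C": [(3.8, 0.6)],
}
scrambled = scramble_potential(garf_like, seed=1)
uniform = uniform_potential(garf_like)
```

Training the same RF pipeline on features derived from `garf_like`, `scrambled` and `uniform` would then mirror the three-way comparison described in the abstract; the scrambled set keeps the overall collection of functions intact, while the uniform set removes only the well-depth information.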


Molecules ◽  
2019 ◽  
Vol 24 (15) ◽  
pp. 2747 ◽  
Author(s):  
Eliane Briand ◽  
Ragnar Thomsen ◽  
Kristian Linnet ◽  
Henrik Berg Rasmussen ◽  
Søren Brunak ◽  
...  

The human carboxylesterase 1 (CES1), responsible for the biotransformation of many diverse therapeutic agents, may contribute to the occurrence of adverse drug reactions and therapeutic failure through drug interactions. The present study is designed to address the issue of potential drug interactions resulting from the inhibition of CES1. Based on an ensemble of 10 crystal structures complexed with different ligands and a set of 294 known CES1 ligands, we used docking (Autodock Vina) and machine learning methodologies (LDA, QDA and multilayer perceptron), considering the individual energy terms of the scoring function, to find the combination best suited to identifying CES1 inhibitors. The protocol was then applied to a library of 1114 FDA-approved drugs, and eight drugs were selected for in vitro testing of CES1 inhibition. An inhibition effect was observed for diltiazem (IC50 = 13.9 µM). Three other drugs (benztropine, iloprost and treprostinil) exhibited weak CES1 inhibitory effects, with IC50 values of 298.2 µM, 366.8 µM and 391.6 µM, respectively. In conclusion, the binding site of CES1 is relatively flexible and can adapt its conformation to different types of ligands. Combining ensemble docking and machine learning approaches improves the prediction of CES1 inhibitors compared to a docking study using only one crystal structure.
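One way to turn ensemble-docking output into classifier input is sketched below. It assumes, as a simplification not stated in the abstract, that each ligand keeps the energy-term vector of its best-scoring (lowest) pose across all crystal structures in the ensemble. The term names follow the usual Vina terms, but the record layout, the `vina_score` key and the `best_pose_features` helper are hypothetical; real Vina output would first need to be parsed into this form.

```python
from typing import Dict, List

# Assumed feature order; the record layout below is hypothetical.
TERMS = ["gauss1", "gauss2", "repulsion", "hydrophobic", "hydrogen"]


def best_pose_features(records: List[dict]) -> Dict[str, List[float]]:
    """For each ligand, keep the energy terms of its lowest-scoring (best) pose
    across all receptor structures in the ensemble."""
    best: Dict[str, dict] = {}
    for rec in records:
        ligand = rec["ligand"]
        if ligand not in best or rec["vina_score"] < best[ligand]["vina_score"]:
            best[ligand] = rec
    # One feature vector per ligand, in the fixed TERMS order.
    return {lig: [rec["terms"][t] for t in TERMS] for lig, rec in best.items()}


# Example: ligand L1 docked into two structures, L2 into one.
records = [
    {"ligand": "L1", "structure": "S1", "vina_score": -7.0, "terms": {t: 1.0 for t in TERMS}},
    {"ligand": "L1", "structure": "S2", "vina_score": -8.5, "terms": {t: 2.0 for t in TERMS}},
    {"ligand": "L2", "structure": "S1", "vina_score": -6.0, "terms": {t: 3.0 for t in TERMS}},
]
features = best_pose_features(records)
```

The resulting vectors could then be passed to a classifier such as scikit-learn's `LinearDiscriminantAnalysis`, `QuadraticDiscriminantAnalysis` or `MLPClassifier`; the abstract does not say which software the study used, and keeping all per-structure terms instead of only the best pose is an equally plausible design.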


2021 ◽  
Author(s):  
Fergus Boyles ◽  
Charlotte M Deane ◽  
Garrett Morris

Machine learning scoring functions for protein-ligand binding affinity have been found to consistently outperform classical scoring functions when trained and tested on crystal structures of bound protein-ligand complexes. However, it is less clear how these methods perform when applied to docked poses of complexes.

We explore how the use of docked, rather than crystallographic, poses for both training and testing affects the performance of machine learning scoring functions. Using the PDBbind Core Sets as benchmarks, we show that the performance of a structure-based machine learning scoring function trained and tested on docked poses is lower than that of the same scoring function trained and tested on crystallographic poses. We construct a hybrid scoring function by combining both structure-based and ligand-based features, and show that its ability to predict binding affinity using docked poses is comparable to that of purely structure-based scoring functions trained and tested on crystal poses. Despite strong performance on docked poses of the PDBbind Core Sets, we find that our hybrid scoring function fails to generalise to a new data set, demonstrating the need for improved scoring functions and additional validation benchmarks.

Code and data to reproduce our results are available from https://github.com/oxpig/learning-from-docked-poses.
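A hybrid feature vector of the kind described above can be sketched by concatenating structure-based and ligand-based descriptors. This is not the authors' implementation (their code is in the repository linked in the abstract): here the structure-based part counts protein-ligand element pairs within a distance cutoff, in the style of RF-Score, and the ligand-based part is any precomputed descriptor list. The element sets, the 12 Å cutoff and the function names are assumptions chosen for illustration.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (element symbol, xyz coordinates)

# Assumed element sets and cutoff, chosen for illustration only.
PROTEIN_ELEMENTS = ["C", "N", "O", "S"]
LIGAND_ELEMENTS = ["C", "N", "O", "S"]
CUTOFF = 12.0  # angstrom


def contact_counts(protein: Sequence[Atom], ligand: Sequence[Atom]) -> List[int]:
    """Structure-based part: count protein-ligand element pairs within CUTOFF."""
    index = {pair: i for i, pair in enumerate(product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS))}
    counts = [0] * len(index)
    for (p_el, p_xyz), (l_el, l_xyz) in product(protein, ligand):
        key = (p_el, l_el)
        if key in index and math.dist(p_xyz, l_xyz) <= CUTOFF:
            counts[index[key]] += 1
    return counts


def hybrid_features(protein: Sequence[Atom], ligand: Sequence[Atom],
                    ligand_descriptors: Sequence[float]) -> List[float]:
    """Concatenate pose-dependent contact counts with pose-independent ligand descriptors."""
    return [float(c) for c in contact_counts(protein, ligand)] + list(ligand_descriptors)
```

Only the contact counts change when a docked pose replaces the crystal pose; the ligand descriptors do not depend on the pose at all. One plausible reading of the abstract is that this is why the hybrid function tolerates docked poses better, although that interpretation is not stated there.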


2020 ◽  
Author(s):  
Pedro Ballester

Interest in docking technologies has grown in parallel with the ever-increasing number and diversity of 3D models for macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims at leveraging these experimental structures to discover the necessary starting points for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets from targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is the most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to select an existing scoring function for the target, along with a third approach that consists of generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives for benchmarking scoring functions for SBVS, and how to generate or use them with freely available software.
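As a concrete illustration of selecting among existing scoring functions for a target, the sketch below ranks candidates by their enrichment factor (EF) on that target's known actives and decoys, where EF at the top fraction x is (actives in the top x / size of the top x) divided by (all actives / all compounds). EF is one common SBVS metric, not necessarily the one analysed here; the convention that lower scores are better (as with docking energies) and the helper names are assumptions for illustration.

```python
from typing import Dict, Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """EF at the top `fraction` of the ranked list; lower scores rank first (assumed)."""
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("need at least one active compound")
    n_top = max(1, int(n_total * fraction))  # at least one compound in the top set
    ranked = sorted(range(n_total), key=lambda i: scores[i])
    hits = sum(is_active[i] for i in ranked[:n_top])
    return (hits / n_top) / (n_actives / n_total)


def pick_scoring_function(candidates: Dict[str, Sequence[float]],
                          is_active: Sequence[bool], fraction: float = 0.01) -> str:
    """Return the name of the candidate with the highest EF on this target."""
    return max(candidates,
               key=lambda name: enrichment_factor(candidates[name], is_active, fraction))
```

With small screening sets the top 1% may contain a single compound, so EF estimates become very noisy; this is one of the benchmark limitations that makes the choice of evaluation protocol matter.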


2021 ◽  
pp. 1-18
Author(s):  
Seyed Reza Shahamiri ◽  
Fadi Thabtah ◽  
Neda Abdelhamid

BACKGROUND: Autistic Spectrum Disorder (ASD) is a neurodevelopmental condition that is normally linked with substantial healthcare costs. Typical ASD screening techniques are time-consuming, whereas early detection of ASD could reduce such costs and help limit the development of the condition. OBJECTIVE: We propose an automated approach to detect autistic traits that replaces the scoring function used in current ASD screening with a more intelligent and less subjective approach. METHODS: The proposed approach employs deep neural networks (DNNs) to detect hidden patterns in previously labelled cases and controls, then applies the derived knowledge to classify the individual being screened. The specificity, sensitivity, and accuracy of the proposed approach are evaluated using ten-fold cross-validation. A comparative analysis has also been conducted to compare the DNNs’ performance with other prominent machine learning algorithms. RESULTS: Results indicate that deep learning technologies can be embedded within existing ASD screening to assist stakeholders in the early identification of ASD traits. CONCLUSION: The proposed system will facilitate access to needed support for the social, physical, and educational well-being of the patient and family by making ASD screening more intelligent and accurate.
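The evaluation protocol described above (ten-fold cross-validation reporting sensitivity, specificity and accuracy) can be sketched in plain Python. The classifier here is a trivial, hypothetical placeholder (`fit_majority`) rather than a deep neural network, so only the evaluation logic is shown; any model exposing the same fit-then-predict shape could be substituted.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A fitted model maps one feature vector to a predicted label (1 = case, 0 = control).
Model = Callable[[Sequence[float]], int]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


def cross_validate(X, y, fit: Callable[[list, list], Model], k: int = 10, seed: int = 0):
    """Train on k-1 folds, predict the held-out fold, then score all out-of-fold predictions."""
    y_pred = [0] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            y_pred[i] = model(X[i])
    return confusion_metrics(y, y_pred)


def fit_majority(X_train: list, y_train: list) -> Model:
    """Hypothetical placeholder classifier: always predicts the majority training label."""
    majority = 1 if 2 * sum(y_train) >= len(y_train) else 0
    return lambda x: majority
```

This version pools the out-of-fold predictions and scores them once; averaging the three metrics per fold is a common alternative, and the abstract does not say which was used.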


2011 ◽  
Vol 11 (2-3) ◽  
pp. 263-296 ◽  
Author(s):  
SHAY B. COHEN ◽  
ROBERT J. SIMMONS ◽  
NOAH A. SMITH

Weighted logic programming, a generalization of bottom-up logic programming, is a well-suited framework for specifying dynamic programming algorithms. In this setting, proofs correspond to the algorithm's output space, such as a path through a graph or a grammatical derivation, and are given a real-valued score (often interpreted as a probability) that depends on the real weights of the base axioms used in the proof. The desired output is a function over all possible proofs, such as a sum of scores or an optimal score. We describe the product transformation, which can merge two weighted logic programs into a new one. The resulting program optimizes a product of proof scores from the original programs, constituting a scoring function known in machine learning as a “product of experts.” Through the addition of intuitive constraining side conditions, we show that several important dynamic programming algorithms can be derived by applying product to weighted logic programs corresponding to simpler weighted logic programs. In addition, we show how the computation of Kullback–Leibler divergence, an information-theoretic measure, can be interpreted using product.
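The effect of the product transformation can be illustrated outside weighted logic programming with a familiar special case: the product (intersection) construction for two weighted finite-state automata over the same alphabet, where the side condition is that both component derivations read the same input symbol. The sketch below is an analogy in Python, not the paper's transformation; the automaton encoding and all names are hypothetical. In the real sum-product semiring, the string weight computed on the product automaton equals the product of the two original string weights, which is a product of experts.

```python
from itertools import product
from typing import Dict, Tuple

# Hypothetical encoding: transitions[(state, symbol)] = {next_state: weight};
# `start` maps states to initial weights and `final` maps states to stopping weights.
WFSA = Tuple[Dict[tuple, Dict], Dict, Dict]


def string_weight(wfsa: WFSA, string: str) -> float:
    """Forward algorithm: sum over all paths reading `string` of the product of weights."""
    transitions, start, final = wfsa
    alpha = dict(start)
    for symbol in string:
        nxt: Dict = {}
        for state, a in alpha.items():
            for target, w in transitions.get((state, symbol), {}).items():
                nxt[target] = nxt.get(target, 0.0) + a * w
        alpha = nxt
    return sum(a * final.get(state, 0.0) for state, a in alpha.items())


def product_wfsa(m1: WFSA, m2: WFSA) -> WFSA:
    """Pair up states; a joint transition exists only when both machines read the same symbol."""
    t1, s1, f1 = m1
    t2, s2, f2 = m2
    transitions: Dict[tuple, Dict] = {}
    for ((q1, a), out1), ((q2, b), out2) in product(t1.items(), t2.items()):
        if a != b:  # side condition: both derivations consume the same symbol
            continue
        row = transitions.setdefault(((q1, q2), a), {})
        for (r1, w1), (r2, w2) in product(out1.items(), out2.items()):
            row[(r1, r2)] = row.get((r1, r2), 0.0) + w1 * w2
    start = {(p, q): w1 * w2 for (p, w1), (q, w2) in product(s1.items(), s2.items())}
    final = {(p, q): w1 * w2 for (p, w1), (q, w2) in product(f1.items(), f2.items())}
    return transitions, start, final
```

The identity holds because the sum over pairs of paths of a product of weights factorizes into the product of the two separate path sums; replacing sum with max would give the analogous result for best-scoring derivations.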


Author(s):  
Alexander Mitrofanov ◽  
Omer S. Alkhnbashi ◽  
Sergey A. Shmakov ◽  
Kira S. Makarova ◽  
Eugene V. Koonin ◽  
...  

CRISPR-Cas are adaptive immune systems that degrade foreign genetic elements in archaea and bacteria. In carrying out their immune functions, CRISPR-Cas systems heavily rely on RNA components. These CRISPR (cr) RNAs are repeat-spacer units that are produced by processing of pre-crRNA, the transcript of CRISPR arrays, and guide Cas protein(s) to the cognate invading nucleic acids, enabling their destruction. Several bioinformatics tools have been developed to detect CRISPR arrays based solely on DNA sequences, but all these tools employ the same strategy of looking for repetitive patterns, which might correspond to CRISPR array repeats. The identified patterns are evaluated using a fixed, built-in scoring function, and arrays exceeding a cut-off value are reported. Here, we instead introduce a data-driven approach that uses machine learning to detect and differentiate true CRISPR arrays from false ones based on several features. Our CRISPR detection tool, CRISPRIdentify, performs three steps: detection, feature extraction and classification based on manually curated sets of positive and negative examples of CRISPR arrays. The identified CRISPR arrays are then reported to the user accompanied by detailed annotation. We demonstrate that our approach identifies not only previously detected CRISPR arrays, but also CRISPR array candidates not detected by other tools. Compared to other methods, our tool has a drastically reduced false-positive rate. In contrast to the existing tools, our approach not only provides the user with the basic statistics on the identified CRISPR arrays but also produces a certainty score as a practical measure of the likelihood that a given genomic region is a CRISPR array.
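The conventional first step, looking for repetitive patterns that may be CRISPR repeats, can be illustrated with a deliberately naive scanner that looks for an exact repeat recurring at repeat-spacer intervals. This is a sketch under stated assumptions, not CRISPRIdentify's algorithm: it allows only exact repeat copies, and the repeat and spacer length bounds and the minimum number of copies are illustrative defaults rather than values taken from the tool.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    start: int          # index of the first repeat copy
    repeat: str         # the repeated sequence
    spacers: List[str]  # sequences between consecutive repeat copies


def find_repeat_arrays(seq: str,
                       repeat_len: Tuple[int, int] = (23, 47),
                       spacer_len: Tuple[int, int] = (20, 60),
                       min_repeats: int = 3) -> List[Candidate]:
    """Naively report exact repeats that recur at least `min_repeats` times,
    each copy separated from the previous one by a spacer of allowed length."""
    found: List[Candidate] = []
    covered_until = -1
    for start in range(len(seq)):
        if start <= covered_until:  # skip positions inside an array already reported
            continue
        for r in range(repeat_len[1], repeat_len[0] - 1, -1):  # try longer repeats first
            repeat = seq[start:start + r]
            if len(repeat) < r:  # window runs off the end of the sequence
                continue
            positions, spacers, pos = [start], [], start
            while True:
                # The next copy must start after a spacer of allowed length
                # and must fit entirely inside the search window.
                lo = pos + r + spacer_len[0]
                hi = pos + r + spacer_len[1] + r
                nxt = seq.find(repeat, lo, hi)
                if nxt == -1:
                    break
                spacers.append(seq[pos + r:nxt])
                positions.append(nxt)
                pos = nxt
            if len(positions) >= min_repeats:
                found.append(Candidate(start, repeat, spacers))
                covered_until = positions[-1] + r - 1
                break
    return found
```

A scanner like this only produces candidates. The abstract's point is that the subsequent decision should come from features and a trained classifier rather than a fixed built-in score, which this sketch does not attempt.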


2020 ◽  
Vol 60 (3) ◽  
pp. 1122-1136 ◽  
Author(s):  
Minyi Su ◽  
Guoqin Feng ◽  
Zhihai Liu ◽  
Yan Li ◽  
Renxiao Wang
