Unsupervised Representation Learning for Proteochemometric Modeling

2021 ◽  
Vol 22 (23) ◽  
pp. 12882
Author(s):  
Paul T. Kim ◽  
Robin Winter ◽  
Djork-Arné Clevert

In silico protein–ligand binding prediction is an ongoing area of research in computational chemistry and machine-learning-based drug discovery, as an accurate predictive model could greatly reduce the time and resources needed to detect and prioritize possible drug candidates. Proteochemometric modeling (PCM) attempts to create an accurate model of the protein–ligand interaction space by combining explicit protein and ligand descriptors. This requires information-rich, uniform, and computer-interpretable representations of proteins and ligands. Previous PCM studies rely on pre-defined, handcrafted feature extraction methods, and many use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can generate embeddings that outperform complex, human-engineered representations. Several embedding methods for proteins and molecules have been developed based on various language-modeling methods. Here, we demonstrate the utility of these unsupervised representations and compare three protein embeddings and two compound embeddings in a fair manner. We evaluate performance on various splits of a benchmark dataset, as well as on an internal dataset of protein–ligand binding activities, and find that unsupervised-learned representations significantly outperform handcrafted representations.
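The core PCM idea of combining a protein descriptor with a ligand descriptor can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the embedding values are toy placeholders, the identifiers (`P1`, `L1`, and so on) and function names are hypothetical, and the 1-nearest-neighbour predictor is a stand-in rather than the model used in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def pcm_descriptor(protein_vec: Sequence[float], ligand_vec: Sequence[float]) -> Vector:
    """Concatenate one protein embedding and one ligand embedding."""
    return list(protein_vec) + list(ligand_vec)


def build_dataset(
    pairs: List[Tuple[str, str, float]],
    protein_emb: Dict[str, Vector],
    ligand_emb: Dict[str, Vector],
) -> Tuple[List[Vector], List[float]]:
    """Turn (protein_id, ligand_id, activity) records into features and labels."""
    X, y = [], []
    for protein_id, ligand_id, activity in pairs:
        X.append(pcm_descriptor(protein_emb[protein_id], ligand_emb[ligand_id]))
        y.append(activity)
    return X, y


def predict_1nn(X_train: List[Vector], y_train: List[float], x: Vector) -> float:
    """Stand-in model: return the activity of the closest training pair."""
    distances = [math.dist(row, x) for row in X_train]
    return y_train[distances.index(min(distances))]


# Toy placeholder embeddings (hypothetical values, not learned representations).
protein_emb = {"P1": [0.1, 0.9], "P2": [0.8, 0.2]}
ligand_emb = {"L1": [1.0, 0.0, 0.0], "L2": [0.0, 1.0, 0.0]}
pairs = [("P1", "L1", 7.2), ("P1", "L2", 5.1), ("P2", "L1", 6.0)]

X, y = build_dataset(pairs, protein_emb, ligand_emb)
query = pcm_descriptor(protein_emb["P2"], ligand_emb["L2"])
prediction = predict_1nn(X, y, query)
print(len(X[0]), prediction)
```

Because the protein and ligand parts are simply concatenated, any fixed-length embedding can be swapped in on either side without changing the downstream model code.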

2020 ◽  
Author(s):  
Paul Kim ◽  
Robin Winter ◽  
Djork-Arné Clevert

In silico protein-ligand binding prediction is an ongoing area of research in computational chemistry and machine-learning-based drug discovery, as an accurate predictive model could greatly reduce the time and resources needed to detect and prioritize possible drug candidates. Proteochemometric modeling (PCM) attempts to build an accurate model of the protein-ligand interaction space by combining explicit protein and ligand descriptors. This requires information-rich, uniform, and computer-interpretable representations of proteins and ligands. Previous PCM work relies on pre-defined, handcrafted feature extraction methods, and many methods use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can generate embeddings that outperform complex, human-engineered representations. We apply this reasoning to propose a novel proteochemometric modeling methodology that, for the first time, uses embeddings generated via unsupervised representation learning for both the protein and ligand descriptors. We evaluate performance on various splits of a benchmark dataset, including a challenging split that tests the model’s ability to generalize to proteins for which bioactivity data is greatly limited, and we find that our method consistently outperforms state-of-the-art methods.
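The abstract does not spell out how its splits are constructed. As a rough illustration of a split that probes generalization across proteins, the sketch below holds out whole proteins, so that no test protein contributes any training record. This is a stricter, generic variant chosen for simplicity, not the authors' protocol; the record layout, function name, and toy values are hypothetical.

```python
import random
from typing import List, Tuple

Record = Tuple[str, str, float]  # (protein_id, ligand_id, activity)


def split_by_protein(
    records: List[Record], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Assign every record of a held-out protein to the test set."""
    proteins = sorted({protein_id for protein_id, _, _ in records})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, round(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    train = [r for r in records if r[0] not in test_proteins]
    test = [r for r in records if r[0] in test_proteins]
    return train, test


# Toy records (hypothetical identifiers and activity values).
records = [
    ("P1", "L1", 7.2), ("P1", "L2", 5.1),
    ("P2", "L1", 6.0), ("P2", "L3", 4.4),
    ("P3", "L2", 8.1), ("P4", "L3", 5.9),
]
train, test = split_by_protein(records)
print(len(train), len(test))
```

A random split over records would typically place the same protein on both sides, which tends to overstate how well a model transfers to sparsely characterized targets.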


Author(s):  
Lennart Gundelach ◽  
Christofer S Tautermann ◽  
Thomas Fox ◽  
Chris-Kriton Skylaris

The accurate prediction of protein-ligand binding free energies with tractable computational methods has the potential to revolutionize drug discovery. Modeling the protein-ligand interaction at a quantum mechanical level, instead of...


2020 ◽  
Author(s):  
Ben Geoffrey A S ◽  
Pavan Preetham Valluri ◽  
Akhil Sanker ◽  
Rafal Madaj ◽  
Host Antony Davidd ◽  
...  

Network data is composed of nodes and edges. Machine learning and deep learning algorithms have been applied successfully to node classification and link prediction on social network data, where they drive highly customized suggestions for users. Similarly, one can apply such algorithms to biological network data to generate scientifically useful predictions. In the present work, a compound-drug target interaction data set from BindingDB was used to train machine learning/deep learning algorithms that predict the drug targets for any PubChem compound queried by the user. The user inputs the PubChem Compound ID (CID) of the compound of interest, and the tool outputs the RCSB PDB IDs of the predicted drug targets. The tool also incorporates a feature to perform automated in silico modelling of the compound and the predicted drug targets to uncover their protein-ligand interaction profiles. The program fetches the structures of the compound and the predicted drug targets, prepares them for molecular docking using the standard AutoDock scripts that are part of MGLTools, performs molecular docking and protein-ligand interaction profiling of the targets and the compound, and stores the visualized results in the user's working folder. The program is hosted, supported and maintained at the following GitHub repository: https://github.com/bengeof/Compound2Drug


2019 ◽  
Vol 32 (10) ◽  
pp. 459-469 ◽  
Author(s):  
Abhinav R Jain ◽  
Zachary T Britton ◽  
Chester E Markwalter ◽  
Anne S Robinson

The tachykinin 2 receptor (NK2R) plays critical roles in gastrointestinal, respiratory and mental disorders and is a well-recognized target for therapeutic intervention. To date, therapeutics targeting NK2R have failed to gain regulatory approval, due in large part to the limited characterization of the receptor-ligand interaction and downstream signaling. Herein, we report a protein engineering strategy to improve the yield of ligand-binding- and signaling-competent human NK2R, which enables a yeast-based NK2R signaling platform, by creating chimeras that use sequences from rat NK2R. We demonstrate that NK2R chimeras incorporating the rat NK2R C-terminus exhibited improved ligand-binding yields and downstream signaling in engineered yeast strains and mammalian cells, with observed yields more than 4-fold higher than wild type. This work builds on our previous studies, which suggest that exchanging the C-termini of related and well-expressed family members may be a general protein engineering strategy to overcome limitations to ligand-binding and signaling-competent G protein-coupled receptor yields in yeast. We expect these efforts to result in NK2R drug candidates with better characterized signaling properties.


2013 ◽  
Vol 53 (4) ◽  
pp. 763-772 ◽  
Author(s):  
Vladimir Chupakhin ◽  
Gilles Marcou ◽  
Igor Baskin ◽  
Alexandre Varnek ◽  
Didier Rognan

2020 ◽  
Vol 6 ◽  
pp. e253
Author(s):  
Nafees Sadique ◽  
Al Amin Neaz Ahmed ◽  
Md Tajul Islam ◽  
Md. Nawshad Pervage ◽  
Swakkhar Shatabda

Proteins are the building blocks of all cells in humans and all other living organisms, and they perform most of the work in a living organism. Proteins are macromolecules: polymers of amino acid monomers. The tertiary structure of a protein is its three-dimensional shape, and it governs the protein’s function, classification and binding sites. If two protein structures are alike, the two proteins can be of the same kind, implying a similar structural class and similar ligand-binding properties. In this paper, we use protein tertiary structure to generate effective features for structural-similarity applications, namely detecting structural class and ligand binding. First, we analyze the effectiveness of a group of image-based features for predicting the structural class of a protein. These features are derived from the image generated by the distance matrix of the tertiary structure of a given protein. They include the local binary pattern (LBP) histogram, the Gabor-filtered LBP histogram, the separate row multiplication matrix with uniform LBP histogram, the neighbor block subtraction matrix with uniform LBP histogram, and atom bond. The separate row multiplication matrix and neighbor block subtraction matrix filters, as well as atom bond, are our novel contributions. The experiments were done on a standard benchmark dataset. We demonstrate the effectiveness of these features across a large variety of supervised machine learning algorithms. Experiments suggest that the support vector machine is the best-performing classifier on the selected dataset using this set of features. We believe the excellent accuracy of Hybrid LBP will motivate researchers and practitioners to use it to identify protein structural class. To facilitate that, a classification model using Hybrid LBP is available at http://brl.uiu.ac.bd/PL/.
Protein-ligand binding governs the function of biological receptors, which matters for curing diseases among other applications. Therefore, predicting binding between a protein and a ligand is important for understanding a protein’s activity and for accelerating docking computations in virtual-screening-based drug design. Protein-ligand binding prediction requires the three-dimensional tertiary structure of the target protein, which is searched for ligand binding. In this paper, we propose a supervised learning algorithm for predicting protein-ligand binding: a similarity-based clustering approach using the same set of features. Our algorithm works better than the most popular and widely used machine learning algorithms.
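The image-based features described above start from a distance matrix treated as an image. A minimal Python sketch of that first step follows, computing pairwise distances and a basic 8-neighbour LBP histogram. It assumes one 3D coordinate per residue and a plain (not "uniform" or Gabor-filtered) LBP; the function names and toy coordinates are hypothetical, and the authors' exact atom selection, neighbourhood, and normalization are not stated in the abstract.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def distance_matrix(coords: List[Point]) -> List[List[float]]:
    """Pairwise Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def lbp_histogram(image: List[List[float]]) -> List[int]:
    """256-bin histogram of basic 8-neighbour LBP codes over interior pixels."""
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            center = image[i][j]
            code = 0
            for bit, (di, dj) in enumerate(offsets):
                # Set the bit when the neighbour is at least as large as the center.
                if image[i + di][j + dj] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist


# Toy coordinates standing in for one atom per residue (hypothetical values).
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (5.5, 3.4, 0.0),
    (3.0, 6.0, 1.2),
    (0.5, 4.1, 2.9),
]
dm = distance_matrix(coords)
hist = lbp_histogram(dm)
print(sum(hist))
```

The histogram has a fixed length regardless of protein size, which is what allows it to serve as input to a standard classifier such as a support vector machine.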


Molecules ◽  
2021 ◽  
Vol 26 (9) ◽  
pp. 2452
Author(s):  
Enade P. Istyastono ◽  
Nunung Yuniarti ◽  
Vivitri D. Prasasty ◽  
Sudi Mungkasi

Identifying the molecular determinants of receptor-ligand binding could significantly increase the quality of structure-based virtual screening protocols. In turn, the drug design process, especially fragment-based approaches, could benefit from this knowledge. The main techniques used here were retrospective virtual screening campaigns employing AutoDock Vina, followed by protein-ligand interaction fingerprinting (PLIF) identification using the recently published PyPLIF HIPPOS. The ligand and decoy datasets from the enhanced version of the database of useful decoys (DUDE) targeting human G protein-coupled receptors (GPCRs) were employed in this research, since mutation data are available and could be used to retrospectively verify the predictions. The results show that the method presented in this article could pinpoint some retrospectively verified molecular determinants. The method is therefore suggested for routine use in drug design and discovery.

