A programmatic tool for automatic ease in coronavirus drug discovery through programmatically automated data mining, QSAR and In Silico modelling

<p>The work comprises a Python-based programmatic tool that automates the dry-lab drug discovery workflow for coronavirus. First, a Python program automates data mining of the PubChem database to collect the data required by a machine learning based AutoQSAR algorithm, which generates drug leads for coronavirus. Data were acquired from PubChem using Python web scraping techniques. The AutoQSAR workflow involves feature learning and descriptor selection, QSAR modelling, validation and prediction. The drug leads generated by the program are required to satisfy Lipinski’s drug-likeness criteria, because compounds that satisfy these criteria are more likely to be orally active drugs in humans. The drug leads are then fed programmatically into an In Silico modelling package that models the interaction between each lead compound and the coronaviral drug target with PDB ID 6Y84. The results are stored in the user's working folder. The program also performs protein-ligand interaction profiling and stores the visualized images in the user's working folder. Selected drug leads were studied further using Molecular Dynamics simulations, and the best binders and their reactivity profiles were analysed using Molecular Dynamics and Density Functional Theory calculations. Thus our programmatic tool ushers in a new age of automatic ease in drug identification for coronavirus. </p><p><br></p><p><br></p><p>The program is hosted, maintained and supported at the GitHub repository link given below</p><p><br></p><p>https://github.com/bengeof/Programmatic-tool-to-automate-the-drug-discovery-workflow-for-coronavirus</p>
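The Lipinski filter applied to the drug leads can be sketched as a simple rule-of-five check. This is a minimal illustration, not the tool's own code: it assumes the four descriptors are already available per compound (PubChem reports them as MolecularWeight, XLogP, HBondDonorCount and HBondAcceptorCount), and the `Descriptors` class and `max_violations` parameter are hypothetical names introduced here. The default of zero violations follows the "required to satisfy" wording; a common relaxation allows one.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Per-compound descriptor values (hypothetical container, PubChem-style fields)."""
    molecular_weight: float  # Da
    xlogp: float             # computed octanol-water partition coefficient
    hbd: int                 # hydrogen-bond donor count
    hba: int                 # hydrogen-bond acceptor count


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are violated."""
    rules = [
        d.molecular_weight <= 500.0,  # molecular weight <= 500 Da
        d.xlogp <= 5.0,               # logP <= 5
        d.hbd <= 5,                   # <= 5 hydrogen-bond donors
        d.hba <= 10,                  # <= 10 hydrogen-bond acceptors
    ]
    return sum(not ok for ok in rules)


def passes_lipinski(d: Descriptors, max_violations: int = 0) -> bool:
    """Keep a compound as a drug lead only if it stays within the allowed violations."""
    return lipinski_violations(d) <= max_violations
```

For example, `passes_lipinski(Descriptors(180.2, 1.2, 1, 4))` is `True`, whereas a hypothetical compound with MW 650 and XLogP 6.1 violates two rules and is rejected even with `max_violations=1`.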


Automated In Silico Identification of Drug Candidates for Coronavirus Through a Novel Programmatic Tool and Extensive Computational (MD, DFT) Studies of Select Drug Candidates

10.26434/chemrxiv.12423638

2020

Author(s):

Ben Geoffrey A S

Rafal Madaj

Akhil Sanker

Mario Sergio Valdés Tresanco

Host Antony Davidd

...

Keyword(s):

Machine Learning

Molecular Dynamics

Drug Discovery

In Silico

Density Functional

Density Functional Theory Calculations

Ligand Interaction

Drug Candidates

Descriptor Selection

Drug Leads



A Program for Automated Data Mining of PubChem to Screen a Billion Compounds and Generate by Machine Learning Based AutoQSAR Algorithm Anti-Corona Viral Drug Leads (Replicase Polyprotein 1ab Inhibitors) and in Silico Study of the Top Drug Lead Compounds

10.26434/chemrxiv.12423638.v1

2020

Author(s):

Ben Geoffrey A S

Akhil Sanker

Host Antony Davidd

Judith Gracia

Keyword(s):

Machine Learning

Data Mining

In Silico

Drug Target

Data Set

Lead Compounds

PubChem Database

Drug Lead

Lead Generation

Drug Leads

Our work comprises a Python program that automatically mines the PubChem database for data associated with the coronavirus drug target replicase polyprotein 1ab (UniProt identifier: P0C6X7), namely active compounds, their activity values (IC50) and their chemical/molecular descriptors, and then runs a machine learning based AutoQSAR algorithm on this data set to generate anti-coronaviral drug leads. The AutoQSAR algorithm involves feature selection, QSAR modelling, validation and prediction. An important dynamic feature of the program is that the drug leads generated on each run reflect the current state of the constantly growing PubChem database, which enables fast, up-to-date anti-coronaviral drug lead generation. After screening the PubChem library of over a billion compounds, the program prints out the top anti-coronaviral drug leads. The interaction between the top drug lead compounds and two coronaviral drug target proteins, 3C-like protease (3CLpro) and papain-like protease (PLpro), was studied and analysed using molecular docking tools. The compounds generated as drug leads showed favourable interactions with the drug target proteins; we therefore recommend the program for anti-coronaviral drug lead generation, as it reduces the complexity of virtual screening and ushers in an age of automatic ease in drug lead generation. The leads generated by the program can be tested further for drug potential through In Silico, In Vitro and In Vivo studies. <div><br></div><div><div>The program is hosted, maintained and supported at the GitHub repository link given below</div><div><br></div><div>https://github.com/bengeof/Drug-Discovery-P0C6X7</div></div><div><br></div>
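The AutoQSAR steps (feature selection, modelling, validation, prediction) can be illustrated with a deliberately small sketch. It is not the program's algorithm: it assumes IC50 values in micromolar, converts them to pIC50, selects the single descriptor with the highest absolute Pearson correlation, fits an ordinary least-squares line, and scores it with R². All function names are hypothetical; a real AutoQSAR would use many descriptors and a proper train/test split.

```python
import math


def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar to pIC50 = -log10(IC50 in mol/L)."""
    return -math.log10(ic50_um * 1e-6)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_descriptor(table, activity):
    """Feature selection: keep the descriptor most correlated (|r|) with activity."""
    return max(table, key=lambda name: abs(pearson(table[name], activity)))


def fit_ols(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(observed, predicted):
    """Coefficient of determination, used here as the validation metric."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

In use, `select_descriptor` and `fit_ols` would run on a training subset, `r_squared` on held-out compounds, and the fitted line `a + b * x` would then score unscreened PubChem compounds, ranking them by predicted pIC50.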


New machine learning and physics-based scoring functions for drug discovery

Scientific Reports

10.1038/s41598-021-82410-1

2021

Vol 11 (1)

Cited By ~ 1

Author(s):

Isabella A. Guedes

André M. S. Barreto

Diogo Marinho

Eduardo Krempser

Mélaine A. Kuenemann

...

Keyword(s):

Machine Learning

Drug Discovery

Protein Interactions

In Silico

Drug Targets

Support Vector

Scoring Functions

Protein Protein Interactions

Energy Prediction

Original Class

Scoring functions are essential for modern in silico drug discovery. However, accurately predicting binding affinity with scoring functions remains challenging, and their performance is very heterogeneous across target classes. Scoring functions based on precise physics-based descriptors that better represent the protein–ligand recognition process are therefore strongly needed. We developed a set of new empirical scoring functions, named DockTScore, that explicitly account for physics-based terms combined with machine learning. Target-specific scoring functions were developed for two important classes of drug targets, proteases and protein–protein interactions, representing an original class of molecules for drug discovery. Multiple linear regression (MLR), support vector machine and random forest algorithms were employed to derive general and target-specific scoring functions involving optimized MMFF94S force-field terms, solvation and lipophilic interaction terms, and an improved term accounting for the contribution of ligand torsional entropy to binding. The DockTScore scoring functions proved competitive with the current best-evaluated scoring functions in binding energy prediction and ranking on four DUD-E datasets, and will be useful for in silico drug design for diverse proteins as well as for specific targets such as proteases and protein–protein interactions. Currently, the MLR DockTScore is available at www.dockthor.lncc.br.
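Because a multiple linear regression scoring function is, by construction, a weighted sum of descriptor terms, its general form can be written as below. This is a generic sketch, not DockTScore's published equation: each $x_k$ stands for one of the physics-based terms named above (MMFF94S force-field terms, solvation and lipophilic terms, the ligand torsional entropy term), $y_i$ is the experimental binding affinity of training complex $i$, and $X$ is the $N \times (K+1)$ design matrix with a leading column of ones. The actual terms and fitted weights are those reported by the authors; the SVM and random forest variants are nonlinear and do not follow this form.

```latex
\Delta G_{\text{bind}} \approx \beta_0 + \sum_{k=1}^{K} \beta_k \, x_k ,
\qquad
\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{N}
    \Bigl( y_i - \beta_0 - \sum_{k=1}^{K} \beta_k \, x_{ik} \Bigr)^{2}
  = \left( X^{\top} X \right)^{-1} X^{\top} \mathbf{y}
```

The closed-form solution on the right holds only when $X^{\top} X$ is invertible, i.e. when the descriptor terms are not collinear over the training set.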


Automated identification of small drug molecules for Hepatitis C virus through a novel programmatic tool and extensive Molecular Dynamics studies of select drug candidates

10.1101/2020.07.07.192518

2020

Author(s):

Rafal Madaj

Akhil Sanker

Ben Geoffrey A S

Host Antony David

Shubham Verma

...

Keyword(s):

Machine Learning

Molecular Dynamics

Hepatitis C Virus

Hepatitis C

Drug Discovery

Molecular Mechanics

In Silico

Drug Identification

Drug Leads

Viral Helicase

We report a novel Python-based programmatic tool that automates the dry-lab drug discovery workflow for Hepatitis C virus. First, a Python program automates data mining of the PubChem database to collect the data required by a machine learning based AutoQSAR algorithm, which generates drug leads for Hepatitis C virus. The AutoQSAR workflow involves feature learning and descriptor selection, QSAR modelling, validation and prediction. The drug leads generated by the program are required to satisfy Lipinski’s drug-likeness criteria. The program feeds 50 of the generated drug leads into an In Silico modelling package for fast virtual screening and computer modelling of their interaction with the drug target, a viral helicase of Hepatitis C. The results are stored automatically in the user's working folder. The program also performs protein-ligand interaction profiling and stores the visualized images in the user's working folder. Protein-ligand complexes of structurally diverse ligands with the lowest binding energies were selected for extensive molecular dynamics simulations and subsequent molecular mechanics generalized Born surface area (MMGBSA) calculations with pairwise decomposition. These molecular mechanics studies predict In Silico that the compounds generated by the program inhibit the viral helicase of Hepatitis C and prevent the replication of the virus. Thus our programmatic tool, which completely automates the dry-lab drug discovery workflow, ushers in a new age of automatic ease in drug identification for Hepatitis C virus. The program is hosted, maintained and supported at the GitHub repository link given below https://github.com/bengeof/Automated-drug-identification-programmatic-tool-for-Hepatitis-C-virus
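The MMGBSA post-processing step can be summarized as a difference of free energies averaged over MD snapshots. The sketch below assumes the common single-trajectory protocol, in which complex, receptor and ligand energies are all evaluated on the same frames, and that per-frame energies (kcal/mol) and per-residue decomposition totals have already been computed by an MMGBSA package. The function names and residue labels are hypothetical.

```python
from statistics import mean, stdev


def mmgbsa_delta_g(g_complex, g_receptor, g_ligand):
    """Single-trajectory MM-GBSA estimate of the binding free energy.

    Each argument is a list of per-frame free energies (kcal/mol) evaluated on
    the same MD snapshots. Returns (mean dG_bind, standard deviation over frames).
    """
    if not g_complex or not (len(g_complex) == len(g_receptor) == len(g_ligand)):
        raise ValueError("need one energy value per frame for every species")
    # Per-frame dG_bind = G_complex - G_receptor - G_ligand
    deltas = [c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)]
    spread = stdev(deltas) if len(deltas) > 1 else 0.0
    return mean(deltas), spread


def top_contributors(per_residue, n=3):
    """Rank residues by decomposed energy contribution, most favourable (most negative) first."""
    return sorted(per_residue.items(), key=lambda item: item[1])[:n]
```

A more negative mean ΔG indicates tighter predicted binding, and `top_contributors` highlights the helicase residues that dominate the interaction in the pairwise decomposition.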


Target2drug : A Novel Programmatic Workflow to Automate in Silico Drug Discovery

10.26434/chemrxiv.13262603

2020

Author(s):

Ben Geoffrey A S

Rafal Madaj

Akhil Sanker

Pavan Preetham Valluri

Judith Gracia

...

Keyword(s):

Drug Discovery

Computational Biology

In Silico

Drug Target

Drug Targets

QSAR Model

Learning Tools

Active Compounds

QSAR Modelling

Drug Leads

As the Big Data and Artificial Intelligence (AI) revolution continues to affect every area of our lives, its influence is also felt in bioinformatics, computational biology and drug discovery. Machine/deep learning tools have been developed to predict compound-target interactions and, conversely, to predict the target interactions of a given compound. In the presented work, we report a programmatic tool that incorporates many features of the bioinformatics, computational biology and AI-driven drug discovery revolutions into a single workflow. When a user needs to identify drugs against a new drug target, the user provides target signatures, in the form of the target's amino acid sequence or its corresponding nucleotide sequence, as input to the tool. The tool runs a BLAST protocol to identify known protein drug targets similar to the submitted target and collects the data linked to those targets, namely active compounds, their activity values and their molecular descriptors, to perform QSAR modelling and to generate drug leads from the predictions of the validated QSAR model. The tool then performs In Silico modelling to generate interaction profiles of the drug lead compounds with the target and stores the results in the user's working folder. To demonstrate the tool, we applied it to the target signatures of SARS-CoV-2, the virus causing the current pandemic. However, the tool can be used against any target and is expected to help grow our knowledge graph of targets and interacting compounds. <br>
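Because the tool accepts either an amino acid or a nucleotide sequence, a protein query has to be obtained before a protein-protein BLAST search. The sketch below assumes the nucleotide input is a coding sequence in reading frame +1 and translates it with the standard genetic code; the tool itself may instead use a translated search such as blastx, so treat this as an illustration only. The sequence-type check is a heuristic (a short peptide made only of A, C, G and T residues would be misread as DNA), and all names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def looks_like_nucleotide(seq: str) -> bool:
    """Heuristic: treat the input as DNA/RNA if it contains only A, C, G, T, U or N."""
    letters = set(seq.upper())
    return bool(letters) and letters <= set("ACGTUN")


def translate(seq: str) -> str:
    """Translate a nucleotide sequence in frame +1, stopping at the first stop codon."""
    dna = seq.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # 'X' for codons containing N
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_query_protein(target_signature: str) -> str:
    """Return an amino acid query for a protein BLAST search from either input type."""
    seq = "".join(target_signature.split())  # drop whitespace and line breaks
    return translate(seq) if looks_like_nucleotide(seq) else seq.upper()
```

For instance, `to_query_protein("AUG GCC UAA")` yields `"MA"`, while an amino acid signature such as `"mkv"` is passed through as `"MKV"`.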


QPoweredTarget2DeNovoDrugPropMax : a novel programmatic tool incorporating deep learning and in silico methods for automated de novo drug design for any target of interest

10.31219/osf.io/b8y79

2021

Author(s):

Ben Geoffrey

Rafal Madaj

Pavan Preetham Valluri

Akhil Sanker

Keyword(s):

Machine Learning

Deep Learning

Drug Discovery

In Silico

Drug Target

Data Science

De Novo

AutoDock Vina

De Novo Drug Design

Ligand Interaction

The past decade has seen a surge in the application of data science, machine learning, deep learning and AI methods to drug discovery. The presented work assembles a variety of AI methods for drug discovery and incorporates in silico techniques to provide a holistic tool for automated drug discovery. When drug candidates need to be identified for a particular drug target of interest, the user provides the tool with target signatures in the form of an amino acid sequence or its corresponding nucleotide sequence. The tool collects the data registered on PubChem that are required to perform an automated QSAR, and the validated QSAR model is then used for prediction and drug lead generation. We call this protocol Target2Drug. It is followed by a protocol we call Target2DeNovoDrug, in which novel molecules with likely activity against the target are generated de novo using a generative LSTM model. Because drug discovery often requires the generated molecules to possess certain properties such as drug-likeness, we optimize the de novo molecules toward the required drug-like property using a deep learning model called DeepFMPO; we call this protocol Target2DeNovoDrugPropMax. Next, fast automated AutoDock Vina based in silico modeling and profiling of the interaction between the optimized drug leads and the drug target are carried out. Finally, a Molecular Dynamics protocol is executed automatically for the complex with the best protein-ligand interaction in the AutoDock Vina based virtual screening. The results are stored in the user's working folder.
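Automating the AutoDock Vina step mostly means generating one configuration file per ligand and launching `vina --config <file>`. The helper below is a sketch under stated assumptions: receptor and ligands are already prepared as PDBQT files, the search-box center and size (in angstroms) are known for the binding site, and the file names are hypothetical. The keys used are standard Vina configuration options; `exhaustiveness = 8` and `num_modes = 9` mirror Vina's defaults.

```python
def vina_config_text(receptor, ligand, center, size,
                     exhaustiveness=8, num_modes=9, out=None):
    """Build the text of an AutoDock Vina configuration file (one 'key = value' per line).

    center and size are (x, y, z) tuples in angstroms describing the search box.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"num_modes = {num_modes}",
    ]
    if out is not None:
        lines.append(f"out = {out}")  # output file for the docked poses
    return "\n".join(lines) + "\n"
```

A driver script would loop over the optimized leads, write each returned string to a file, run Vina on it, and parse the best score from each output to pick the complex passed on to the Molecular Dynamics protocol.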
The code is maintained, supported and provided for use in the following GitHub repository: https://github.com/bengeof/Target2DeNovoDrugPropMax. Anticipating the rise of quantum computing and quantum machine learning in drug discovery, we use the PennyLane interface to quantum hardware to turn the classical Keras layers of our machine/deep learning models into quantum layers and introduce them into our classical models, producing a quantum-classical hybrid machine/deep learning version of our tool. The corresponding code is provided at https://github.com/bengeof/QPoweredTarget2DeNovoDrugPropMax


Target2drug : A Novel Programmatic Workflow to Automate in Silico Drug Discovery

10.26434/chemrxiv.13262603.v1

2020

Author(s):

Ben Geoffrey A S

Rafal Madaj

Akhil Sanker

Pavan Preetham Valluri

Judith Gracia

...

Keyword(s):

Drug Discovery

Computational Biology

In Silico

Drug Target

Drug Targets

QSAR Model

Learning Tools

Active Compounds

QSAR Modelling

Drug Leads



Compound2Drug – a Machine/deep Learning Tool for Predicting the Bioactivity of PubChem Compounds

10.26434/chemrxiv.13052951

2020

Author(s):

Ben Geoffrey A S

Pavan Preetham Valluri

Akhil Sanker

Rafal Madaj

Host Antony Davidd

...

Keyword(s):

Machine Learning

Deep Learning

Molecular Docking

Drug Target

Drug Targets

Learning Algorithms

Network Data

Ligand Interaction

PubChem Compound

Protein Ligand Interaction

<p>Network data are composed of nodes and edges. Machine learning/deep learning algorithms have been applied successfully to network data for node classification and link prediction in social networks, where they are used to offer highly customized suggestions to users. Similarly, machine learning/deep learning algorithms can be applied to biological network data to generate scientifically useful predictions. In the present work, a compound-drug target interaction data set from BindingDB has been used to train machine learning/deep learning algorithms that predict the drug targets of any PubChem compound queried by the user. The user inputs the PubChem Compound ID (CID) of the compound whose predicted biological activity they wish to explore, and the tool outputs the RCSB PDB IDs of the predicted drug targets. The tool also performs automated <i>In Silico</i> modelling of the compound and the predicted drug targets to uncover their protein-ligand interaction profiles. The program fetches the structures of the compound and the predicted drug targets, prepares them for molecular docking using the standard AutoDock scripts that are part of MGLTools, performs molecular docking and protein-ligand interaction profiling, and stores the visualized results in the user's working folder. The program is hosted, supported and maintained at the following GitHub repository </p> <p><a href="https://github.com/bengeof/Compound2Drug">https://github.com/bengeof/Compound2Drug</a></p>
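To make the link-prediction idea concrete, the sketch below uses a much simpler, swapped-in technique than the trained ML/DL models described above: a nearest-neighbour baseline that scores each target by the highest Tanimoto similarity between the query compound and any training compound known to bind it. Fingerprints are represented as sets of "on" bit indices, and the training pairs and target labels are hypothetical stand-ins for BindingDB records mapped to PDB IDs.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_targets(query_bits, training, k=3):
    """Rank targets by the best similarity of any training compound annotated to them.

    training: iterable of (fingerprint_bits, target_id) pairs, i.e. the edges of a
    compound-target interaction network.
    """
    best = {}
    for bits, target in training:
        score = tanimoto(query_bits, bits)
        if score > best.get(target, -1.0):
            best[target] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

Such a baseline is useful mainly as a sanity check: a learned link-prediction model should rank known targets at least as well as this similarity heuristic before its predictions are passed on to docking.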


Providing the ‘Best’ Lipophilicity Assessment in a Drug Discovery Environment

10.26434/chemrxiv.14292485

2021

Author(s):

George Chang

Nathaniel Woody

Christopher Keefer

Keyword(s):

Machine Learning

Drug Discovery

High Throughput

In Silico

Shake Flask

Chromatographic Method

Learning Approach

Rule Based

Machine Learning Approach

High Throughput Screens

Lipophilicity is a fundamental structural property that influences almost every aspect of drug discovery. Within Pfizer, we have two complementary high-throughput screens for measuring lipophilicity as a distribution coefficient (LogD) – a miniaturized shake-flask method (SFLogD) and a chromatographic method (ELogD). The results from these two assays are not the same (see Figure 1), with each assay being applicable or more reliable in particular chemical spaces. In addition to LogD assays, the ability to predict the LogD value for virtual compounds is equally vital. Here we present an in-silico LogD model, applicable to all chemical spaces, based on the integration of the LogD data from both assays. We developed two approaches towards a single LogD model – a Rule-based and a Machine Learning approach. Ultimately, the Machine Learning LogD model was found to be superior to both internally developed and commercial LogD models.<br>
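LogD differs from logP because it counts all ionization states of a compound at a given pH. The textbook relationship for a monoprotic compound, assuming only the neutral species partitions into octanol, is sketched below to show why LogD depends on pH and pKa. This is background for the quantity being modelled, not the rule-based or machine learning LogD model described above; the function name and example values are hypothetical.

```python
import math


def logd_monoprotic(logp: float, pka: float, ph: float = 7.4, acid: bool = True) -> float:
    """LogD of a monoprotic acid or base, assuming only the neutral form partitions.

    acid:  logD = logP - log10(1 + 10**(pH - pKa))
    base:  logD = logP - log10(1 + 10**(pKa - pH))
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)
```

For a hypothetical acid with logP 2.0 and pKa 4.0, LogD at pH 7.4 is about -1.40, whereas a weak base with pKa 2.0 is essentially neutral at pH 7.4, so its LogD equals its logP.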
