MolData, A Molecular Benchmark for Disease and Target Based Machine Learning

Deep learning’s automatic feature extraction has been a revolutionary addition to computational drug discovery, bringing the ability both to learn abstract features and to discover complex molecular patterns from molecular data. Since biological and chemical knowledge is necessary for overcoming the challenges of data curation, balancing, training, and evaluation, it is important for databases to contain meaningful information regarding the exact target and disease of each bioassay. Existing repositories such as PubChem and ChEMBL offer screening data for millions of molecules against a variety of cells and targets; however, their bioassays contain complex biological information, which can hinder their use by the machine learning community. In this work, a comprehensive disease- and target-based dataset is collected from PubChem in order to facilitate and accelerate molecular machine learning for better drug discovery. MolData is one of the largest efforts to date for democratizing molecular machine learning, with roughly 170 million drug screening results from 1.4 million unique molecules assigned to specific diseases and targets. It also provides 30 unique categories of targets and diseases. Correlation analysis of the MolData bioassays unveils valuable information for drug repurposing for multiple diseases, including cancer, metabolic disorders, and infectious diseases. Finally, we provide a benchmark of more than 30 models trained on each category using multitask learning. MolData aims to pave the way for computational drug discovery and accelerate the advancement of molecular artificial intelligence in a practical manner. The MolData benchmark data is available at https://github.com/Transilico/MolData as well as within the supplementary materials.
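The abstract does not say how the correlation analysis between bioassays is computed. As a rough illustration only, the sketch below assumes each assay is a mapping from molecule ID to a binary label (1 = active, 0 = inactive) and computes a Pearson correlation over the molecules screened in both assays. The function names, molecule IDs, and labels are all hypothetical and are not taken from MolData.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant label vector has no defined correlation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def assay_correlation(assay_a, assay_b):
    """Correlate two assays over the molecules screened in both.

    Each assay is a dict: molecule ID -> 1 (active) or 0 (inactive).
    Returns (correlation, number of shared molecules).
    """
    shared = sorted(set(assay_a) & set(assay_b))
    labels_a = [assay_a[m] for m in shared]
    labels_b = [assay_b[m] for m in shared]
    return pearson(labels_a, labels_b), len(shared)


# Made-up labels for illustration; not real screening results.
assay_x = {"mol1": 1, "mol2": 0, "mol3": 1, "mol4": 0, "mol5": 1}
assay_y = {"mol2": 0, "mol3": 1, "mol4": 0, "mol5": 1, "mol6": 0}

r, n_shared = assay_correlation(assay_x, assay_y)
print(f"shared molecules: {n_shared}, correlation: {r:.2f}")
```

Restricting the calculation to shared molecules matters because different assays typically screen different subsets of compounds, so a correlation is only meaningful where both labels exist.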

10.21203/rs.3.rs-968557/v1 ◽

2021 ◽

Author(s):

Arash Keshavarzi Arshadi

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Learning Community ◽

Drug Repurposing ◽

Molecular Data ◽

Molecular Machine ◽

Biological Information ◽

Automatic Feature Extraction ◽

Molecular Patterns ◽

Computational Drug Discovery

Transformation of Drug Discovery towards Artificial Intelligence: An in Silico Approach

10.5772/intechopen.99018 ◽

2021 ◽

Author(s):

Ruby Srivastava

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Drug Discovery ◽

In Silico ◽

Drug Repurposing ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Modern Drug ◽

Short Period

Computational methods play a key role in the design of therapeutically important molecules for modern drug development. With these “in silico” approaches, machines are learning and offering solutions to some of the most complex drug-related problems, which has positioned them as a next frontier for potential breakthroughs in drug discovery. Machine learning (ML) methods are used to predict compounds with pharmacological activity and with specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties, in order to evaluate drugs and their various applications. Modern artificial intelligence (AI) has the capacity to significantly enhance the role of computational methodology in drug discovery. Use of AI in drug discovery and development, drug repurposing, improving pharmaceutical productivity, and clinical trials will certainly reduce the human workload and help achieve targets in a short period of time. This chapter elaborates on the crosstalk between machine learning techniques and computational tools, and on the future of AI in the pharmaceutical industry.

Machine Learning Models Identify Inhibitors of SARS-CoV-2

10.1101/2020.06.16.154765 ◽

2020 ◽

Author(s):

Victor O. Gawriljuk ◽

Phyo Phyo Kyaw Zin ◽

Daniel H. Foil ◽

Jean Bernatchez ◽

Sungjun Beck ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Antiviral Drug ◽

Drug Repurposing ◽

Published Data ◽

Learning Models ◽

Machine Learning Model ◽

Bayesian Machine Learning ◽

Machine Learning Models

With the ongoing SARS-CoV-2 pandemic, there is an urgent need for the discovery of a treatment for the coronavirus disease (COVID-19). Drug repurposing is one of the most rapid strategies for addressing this need, and numerous compounds have already been selected for in vitro testing by several groups. These efforts have led to a growing database of molecules with in vitro activity against the virus. Machine learning models can assist drug discovery through prediction of the best compounds based on previously published data. Herein we have implemented several machine learning methods to develop predictive models from recent SARS-CoV-2 in vitro inhibition data and used them to prioritize additional FDA-approved compounds, selected from our in-house compound library, for in vitro testing. From the compounds predicted with a Bayesian machine learning model, CPI1062 and CPI1155 showed antiviral activity in HeLa-ACE2 cell-based assays and represent potential repurposing opportunities for COVID-19. This approach can be greatly expanded to exhaustively screen available molecules in silico for predicted activity against this virus, and it can serve as a prioritization tool for SARS-CoV-2 antiviral drug discovery programs. The very latest model for SARS-CoV-2 is available at www.assaycentral.org.

Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases

Briefings in Bioinformatics ◽

10.1093/bib/bby061 ◽

2018 ◽

Vol 20 (5) ◽

pp. 1878-1912 ◽

Cited By ~ 45

Author(s):

Ahmet Sureyya Rifaioglu ◽

Heval Atas ◽

Maria Jesus Martin ◽

Rengul Cetin-Atalay ◽

Volkan Atalay ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Drug Discovery ◽

Machine Intelligence ◽

New Drugs ◽

Machine Learning Techniques ◽

The Past ◽

Screening Experiments ◽

Learning Techniques ◽

Computational Drug Discovery

The identification of interactions between drugs/compounds and their targets is crucial for the development of new drugs. In vitro screening experiments (i.e. bioassays) are frequently used for this purpose; however, experimental approaches are insufficient to explore novel drug-target interactions, mainly because of feasibility problems, as they are labour intensive, costly and time consuming. A computational field known as ‘virtual screening’ (VS) has emerged in the past decades to aid experimental drug discovery studies by statistically estimating unknown bio-interactions between compounds and biological targets. These methods use the physico-chemical and structural properties of compounds and/or target proteins, along with experimentally verified bio-interaction information, to generate predictive models. Lately, sophisticated machine learning techniques have been applied in VS to improve predictive performance. The objective of this study is to examine and discuss the recent applications of machine learning techniques in VS, including deep learning, which became highly popular after giving rise to epochal developments in the fields of computer vision and natural language processing. The past 3 years have witnessed an unprecedented number of research studies considering the application of deep learning in biomedicine, including computational drug discovery. In this review, we first describe the main instruments of VS methods, including compound and protein features (i.e. representations and descriptors), frequently used libraries and toolkits for VS, bioactivity databases and gold-standard data sets for system training and benchmarking. We subsequently review recent VS studies with a strong emphasis on deep learning applications. Finally, we discuss the present state of the field, including the current challenges, and suggest future directions. We believe that this survey will provide insight to researchers working in the field of computational drug discovery in terms of comprehending and developing novel bio-prediction methods.
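To make the idea of using structural properties of compounds for virtual screening concrete, the sketch below ranks a small library by Tanimoto similarity to a known active. This is one common ligand-based technique and is not taken from the review itself. It assumes each compound is represented by a set of "on" bit indices from some structural fingerprint; the bit sets and compound names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def rank_library(query_fp, library):
    """Rank library compounds by descending similarity to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each set holds the indices of bits that are on.
known_active = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 9},  # same bits as the query
    "cmpd_B": {1, 4, 8},     # shares two bits with the query
    "cmpd_C": {2, 3, 5},     # shares no bits with the query
}

ranking = rank_library(known_active, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit, and a trained model (including the deep learning approaches the review covers) would usually replace this simple similarity score.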

Machine learning-driven protein engineering: a case study in computational drug discovery

Engineering Biology ◽

10.1049/enb.2019.0019 ◽

2020 ◽

Vol 4 (1) ◽

pp. 7-9

Author(s):

Harry F. Rickerby ◽

Katya Putintseva ◽

Christopher Cozens

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Protein Engineering ◽

Computational Drug Discovery

Transforming Computational Drug Discovery with Machine Learning and AI

ACS Medicinal Chemistry Letters ◽

10.1021/acsmedchemlett.8b00437 ◽

2018 ◽

Vol 9 (11) ◽

pp. 1065-1069 ◽

Cited By ~ 21

Author(s):

Justin S. Smith ◽

Adrian E. Roitberg ◽

Olexandr Isayev

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Computational Drug Discovery

ORGANIC (1).pdf

10.26434/chemrxiv.5309668.v1 ◽

2017 ◽

Author(s):

Benjamin Sanchez-Lengeling ◽

Carlos Outeiral ◽

Gabriel L. Guimaraes ◽

Alan Aspuru-Guzik

Keyword(s):

Machine Learning ◽

Learning Community ◽

Chemical Species ◽

Material Design ◽

Organic Photovoltaic ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks ◽

Photovoltaic Material

Molecular discovery seeks to generate chemical species tailored to very specific needs. In this paper, we present ORGANIC, a framework based on Objective-Reinforced Generative Adversarial Networks (ORGAN), capable of producing a distribution over molecular space that matches a certain set of desirable metrics. This methodology combines two successful techniques from the machine learning community: a Generative Adversarial Network (GAN), to create non-repetitive, sensible molecular species, and Reinforcement Learning (RL), to bias this generative distribution towards certain attributes. We explore several applications, from optimization of random physicochemical properties to candidates for drug discovery and organic photovoltaic material design.

Sitagliptin: a potential drug for the treatment of SARS-CoV-2?

10.31222/osf.io/78mbt ◽

2020 ◽

Author(s):

Sanaa Bardaweel

Keyword(s):

Clinical Trials ◽

Drug Discovery ◽

Antimalarial Drug ◽

Drug Repurposing ◽

Spike Protein ◽

Diabetic Patients ◽

Potential Drug ◽

Chemokine Production ◽

Drug Discovery And Development ◽

Sequence Identity

Recently, an outbreak of a fatal coronavirus, SARS-CoV-2, emerged from China and is rapidly spreading worldwide. As the coronavirus pandemic rages, drug discovery and development become even more challenging. Drug repurposing of the antimalarial drug chloroquine and its hydroxylated form has demonstrated apparent effectiveness in the treatment of COVID-19-associated pneumonia in clinical trials. The SARS-CoV-2 spike protein shares 31.9% sequence identity with the spike protein present in the Middle East Respiratory Syndrome Coronavirus (MERS-CoV), which infects cells through the interaction of its spike protein with the DPP4 receptor found on macrophages. Sitagliptin, a DPP4 inhibitor known for its antidiabetic, immunoregulatory, anti-inflammatory, and beneficial cardiometabolic effects, has been shown to reverse macrophage responses in MERS-CoV infection and reduce CXCL10 chemokine production in AIDS patients. We suggest that sitagliptin may be a beneficial alternative for the treatment of COVID-19, especially in diabetic patients and patients with preexisting cardiovascular conditions, who are already at higher risk of COVID-19 infection.

Recent Advances in Drug Repurposing for Parkinson’s Disease

Current Medicinal Chemistry ◽

10.2174/0929867325666180719144850 ◽

2019 ◽

Vol 26 (28) ◽

pp. 5340-5362 ◽

Cited By ~ 2

Author(s):

Xin Chen ◽

Giuseppe Gumina ◽

Kristopher G. Virga

Keyword(s):

Parkinson’s Disease ◽

Drug Discovery ◽

De Novo ◽

Drug Repositioning ◽

Drug Repurposing ◽

Cost Effective ◽

New Drugs ◽

Recent Advances ◽

Clinical Drugs

As a long-term degenerative disorder of the central nervous system that mostly affects older people, Parkinson’s disease is a growing health threat to our ever-aging population. Despite remarkable advances in our understanding of this disease, all currently available therapeutics only improve symptoms and cannot stop disease progression. Therefore, it is essential that more effective drug discovery methods and approaches are developed, validated, and used for the discovery of disease-modifying treatments for Parkinson’s disease. Drug repurposing, also known as drug repositioning, is the process of finding new uses for existing or abandoned pharmaceuticals. It has been recognized as a cost-effective and time-efficient way to develop new drugs, and as being as promising as de novo drug discovery in the field of neurodegeneration and, more specifically, for Parkinson’s disease. The availability of several established libraries of clinical drugs and rapid progress in disease biology, genomics and bioinformatics have stimulated momentum in both in silico and activity-based drug repurposing. With the successful clinical introduction of several repurposed drugs for Parkinson’s disease, drug repurposing has now become a robust alternative approach to the discovery and development of novel drugs for this disease. In this review, recent advances in drug repurposing for Parkinson’s disease will be discussed.

Applications of Quantitative Structure-Activity Relationships (QSAR) based Virtual Screening in Drug Design: A Review

Mini-Reviews in Medicinal Chemistry ◽

10.2174/1389557520666200429102334 ◽

2020 ◽

Vol 20 (14) ◽

pp. 1375-1388 ◽

Cited By ~ 2

Author(s):

Patnala Ganga Raju Achary

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Virtual Screening ◽

Model Building ◽

Chemical Space ◽

QSAR Model ◽

Quantitative Structure ◽

Efficient Manner ◽

QSAR Analysis ◽

Structure Activity

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have so far been registered in the Chemical Abstracts Service. According to a recent study, there are at present around 10^60 molecules that are classified as new drug-like molecules. The library of such molecules is now considered ‘dark chemical space’ or ‘dark chemistry’. In order to explore such hidden molecules scientifically, a good number of live and updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, ‘genomics’, ‘proteomics’ and ‘in-silico simulation’, will revolutionize the process of drug discovery. The screening of a sizable number of drug-like molecules is a challenge, and it must be handled in an efficient manner. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques extensively used in VS. QSAR is well known for its high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) chemo-genomics data collection from a database or the literature; (ii) calculation of the right descriptors from the molecular representation; (iii) establishing a relationship (model) between biological activity and the selected descriptors; and (iv) application of the QSAR model to predict the biological property for new molecules. All the hits obtained by the VS technique need to be experimentally verified. The present mini-review highlights the web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
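The four QSAR model-building steps listed in the abstract can be sketched end to end as follows. This is a deliberately minimal illustration under strong assumptions: the "descriptor" is a toy carbon count read from a SMILES string, the activity values are synthetic numbers invented for the example, and the model is a one-variable least-squares line. None of these choices come from the review, and a real QSAR study would use curated data, many descriptors, and proper validation.

```python
def carbon_count(smiles):
    """Toy descriptor: number of 'C' characters in a SMILES string.

    Assumption for illustration only; it ignores aromatic carbons ('c')
    and would miscount atoms such as 'Cl'.
    """
    return smiles.count("C")


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor has no variance; cannot fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# (i) Data collection: synthetic activities, NOT measured values.
training_set = {"CO": 1.5, "CCO": 2.0, "CCCO": 2.5, "CCCCO": 3.0}

# (ii) Descriptor calculation from the molecular representation.
descriptors = [carbon_count(s) for s in training_set]
activities = list(training_set.values())

# (iii) Establish a relationship between activity and the descriptor.
slope, intercept = fit_line(descriptors, activities)

# (iv) Apply the model to predict the property of an unseen molecule.
predicted = slope * carbon_count("CCCCCO") + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

As the abstract notes, any hit predicted this way would still need experimental verification.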
