MolData, A Molecular Benchmark for Disease and Target Based Machine Learning

Author(s):  
Arash Keshavarzi Arshadi ◽  
Milad Salem ◽  
Arash Firouzbakht ◽  
Jiann Shiun Yuan

Abstract Deep learning’s automatic feature extraction has been a revolutionary addition to computational drug discovery, bringing both the ability to learn abstract features and the ability to discover complex molecular patterns from molecular data. Since biological and chemical knowledge is necessary for overcoming the challenges of data curation, balancing, training, and evaluation, it is important for databases to contain meaningful information regarding the exact target and disease of each bioassay. Existing repositories such as PubChem or ChEMBL offer screening data for millions of molecules against a variety of cells and targets; however, their bioassays contain complex biological information that can hinder their use by the machine learning community. In this work, a comprehensive disease- and target-based dataset is collected from PubChem in order to facilitate and accelerate molecular machine learning for better drug discovery. MolData is one of the largest efforts to date to democratize molecular machine learning, with roughly 170 million drug screening results from 1.4 million unique molecules assigned to specific diseases and targets. It also provides 30 unique categories of targets and diseases. Correlation analysis of the MolData bioassays reveals valuable information for drug repurposing for multiple diseases, including cancer, metabolic disorders, and infectious diseases. Finally, we provide a benchmark of more than 30 models trained on each category using multitask learning. MolData aims to pave the way for computational drug discovery and accelerate the advancement of molecular artificial intelligence in a practical manner. The MolData benchmark data is available at https://github.com/Transilico/MolData as well as within the supplementary materials.
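
A common way to train multitask models on sparse screening data like this is to compute the loss only over molecule-assay pairs that were actually measured. The sketch below assumes labels are stored as 1 (active), 0 (inactive) or None (not tested); the function name `masked_multitask_bce` is hypothetical, and this is not necessarily the loss the MolData authors used.

```python
import math
from typing import Optional, Sequence


def masked_multitask_bce(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Sequence[Optional[int]]],
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy over observed (molecule, assay) pairs only.

    probs[i][t]  : predicted probability that molecule i is active in assay t.
    labels[i][t] : 1 (active), 0 (inactive) or None (molecule not tested).
    """
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y is None:
                continue  # untested pair: excluded from the loss entirely
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    if count == 0:
        raise ValueError("no observed labels in this batch")
    return total / count
```

For example, `masked_multitask_bce([[0.9, 0.2], [0.5, 0.7]], [[1, None], [0, 1]])` averages over the three observed entries, so changing the prediction for the untested pair (0.2) leaves the loss unchanged.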

2021 ◽  
Author(s):  
Arash Keshavarzi Arshadi

Abstract Deep learning’s automatic feature extraction has been a revolutionary addition to computational drug discovery, bringing both the ability to learn abstract features and the ability to discover complex molecular patterns from molecular data. Since biological and chemical knowledge are necessary for overcoming the challenges of data curation, balancing, training, and evaluation, it is important for databases to contain meaningful information regarding the exact target and disease of each bioassay. Existing repositories such as PubChem or ChEMBL offer screening data for millions of molecules against a variety of cells and targets; however, their bioassays contain complex biological information that can hinder their use by the machine learning community. In this work, a comprehensive disease- and target-based dataset is collected from PubChem in order to facilitate and accelerate molecular machine learning for better drug discovery. MolData is one of the largest efforts to date to democratize molecular machine learning, with roughly 170 million drug screening results from 1.4 million unique molecules assigned to specific diseases and targets. It also provides 30 unique categories of targets and diseases. Correlation analysis of the MolData bioassays reveals valuable information for drug repurposing for multiple diseases, including cancer, metabolic disorders, and infectious diseases. Finally, we provide a benchmark of more than 30 models trained on each category using multitask learning. MolData aims to pave the way for computational drug discovery and accelerate the advancement of molecular artificial intelligence in a practical manner. The MolData benchmark data is available at https://github.com/Transilico/MolData as well as within the supplementary materials.
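
The bioassay correlation analysis mentioned in this abstract can be illustrated, under simplifying assumptions, by comparing two assays over the molecules tested in both. Here each assay is a hypothetical mapping from a molecule identifier to a binary activity label, and the Pearson correlation of binary labels (the phi coefficient) stands in for whatever statistic the authors actually used; `assay_correlation` is a hypothetical name.

```python
import math
from typing import Mapping, Optional


def assay_correlation(a: Mapping[str, float], b: Mapping[str, float]) -> Optional[float]:
    """Pearson correlation between two assays over their shared molecules.

    `a` and `b` map molecule identifiers to activity labels (e.g. 1 = active,
    0 = inactive); for binary labels this equals the phi coefficient.
    Returns None when fewer than two molecules overlap or either assay is
    constant on the overlap, since the correlation is then undefined.
    """
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return None
    xs = [float(a[m]) for m in shared]
    ys = [float(b[m]) for m in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return None
    return cov / math.sqrt(vx * vy)
```

A strongly positive value between, say, a cancer assay and a metabolic-disease assay would flag shared actives worth examining as repurposing candidates; computing this for every assay pair yields a correlation matrix.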


2021 ◽  
Author(s):  
Ruby Srivastava

Computational methods play a key role in the design of therapeutically important molecules for modern drug development. With these “in silico” approaches, machines are learning and offering solutions to some of the most complex drug-related problems, which has positioned them as the next frontier for potential breakthroughs in drug discovery. Machine learning (ML) methods are used to predict compounds with pharmacological activity and specific pharmacodynamic and ADMET (absorption, distribution, metabolism, excretion and toxicity) properties in order to evaluate drugs and their various applications. Modern artificial intelligence (AI) has the capacity to significantly enhance the role of computational methodology in drug discovery. The use of AI in drug discovery and development, drug repurposing, pharmaceutical productivity, and clinical trials is expected to reduce the human workload and to help achieve targets in a shorter period of time. This chapter elaborates on the crosstalk between machine learning techniques, computational tools and the future of AI in the pharmaceutical industry.


2020 ◽  
Author(s):  
Victor O. Gawriljuk ◽  
Phyo Phyo Kyaw Zin ◽  
Daniel H. Foil ◽  
Jean Bernatchez ◽  
Sungjun Beck ◽  
...  

Abstract With the ongoing SARS-CoV-2 pandemic, there is an urgent need to discover a treatment for coronavirus disease (COVID-19). Drug repurposing is one of the most rapid strategies for addressing this need, and numerous compounds have already been selected for in vitro testing by several groups. These efforts have produced a growing database of molecules with in vitro activity against the virus. Machine learning models can assist drug discovery by predicting the best compounds based on previously published data. Herein we have implemented several machine learning methods to develop predictive models from recent SARS-CoV-2 in vitro inhibition data and used them to prioritize additional FDA-approved compounds from our in-house compound library for in vitro testing. Of the compounds predicted by a Bayesian machine learning model, CPI1062 and CPI1155 showed antiviral activity in HeLa-ACE2 cell-based assays and represent potential repurposing opportunities for COVID-19. This approach can be greatly expanded to exhaustively screen available molecules virtually for predicted activity against this virus, and can serve as a prioritization tool for SARS-CoV-2 antiviral drug discovery programs. The latest model for SARS-CoV-2 is available at www.assaycentral.org.
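
The abstract does not detail the Bayesian model. As a stand-in, the sketch below implements a plain Bernoulli naive Bayes classifier with Laplace smoothing over binary fingerprint bits; it is an assumption for illustration, not the Assay Central model. Fingerprints are represented as sets of on-bit indices, and the class name `FingerprintNaiveBayes` is hypothetical.

```python
import math
from typing import AbstractSet, Dict, List, Sequence


class FingerprintNaiveBayes:
    """Bernoulli naive Bayes over binary fingerprint bits, with Laplace smoothing."""

    def __init__(self, n_bits: int, alpha: float = 1.0) -> None:
        self.n_bits = n_bits
        self.alpha = alpha  # pseudo-count added to every bit and class count
        self.log_prior: Dict[int, float] = {}
        self.log_on: Dict[int, List[float]] = {}
        self.log_off: Dict[int, List[float]] = {}

    def fit(self, fps: Sequence[AbstractSet[int]], y: Sequence[int]) -> "FingerprintNaiveBayes":
        for c in (0, 1):  # 0 = inactive, 1 = active
            members = [fp for fp, label in zip(fps, y) if label == c]
            n_c = len(members)
            self.log_prior[c] = math.log((n_c + self.alpha) / (len(fps) + 2 * self.alpha))
            counts = [0] * self.n_bits
            for fp in members:
                for bit in fp:
                    counts[bit] += 1
            # Smoothed estimate of P(bit is on | class c)
            p_on = [(k + self.alpha) / (n_c + 2 * self.alpha) for k in counts]
            self.log_on[c] = [math.log(p) for p in p_on]
            self.log_off[c] = [math.log(1.0 - p) for p in p_on]
        return self

    def predict_proba(self, fp: AbstractSet[int]) -> float:
        """Posterior probability that a molecule with fingerprint `fp` is active."""
        scores = {}
        for c in (0, 1):
            s = self.log_prior[c]
            for bit in range(self.n_bits):
                s += self.log_on[c][bit] if bit in fp else self.log_off[c][bit]
            scores[c] = s
        # Normalise in log space to avoid underflow on long fingerprints
        m = max(scores.values())
        e0, e1 = math.exp(scores[0] - m), math.exp(scores[1] - m)
        return e1 / (e0 + e1)
```

In a prioritization setting, an in-house library would be scored with `predict_proba` and the highest-scoring compounds sent for in vitro testing.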


2018 ◽  
Vol 20 (5) ◽  
pp. 1878-1912 ◽  
Author(s):  
Ahmet Sureyya Rifaioglu ◽  
Heval Atas ◽  
Maria Jesus Martin ◽  
Rengul Cetin-Atalay ◽  
Volkan Atalay ◽  
...  

Abstract The identification of interactions between drugs/compounds and their targets is crucial for the development of new drugs. In vitro screening experiments (i.e. bioassays) are frequently used for this purpose; however, experimental approaches are insufficient to explore novel drug-target interactions, mainly because of feasibility problems, as they are labour-intensive, costly and time-consuming. A computational field known as ‘virtual screening’ (VS) has emerged in the past decades to aid experimental drug discovery studies by statistically estimating unknown bio-interactions between compounds and biological targets. These methods use the physico-chemical and structural properties of compounds and/or target proteins along with experimentally verified bio-interaction information to generate predictive models. Lately, sophisticated machine learning techniques have been applied in VS to improve predictive performance. The objective of this study is to examine and discuss recent applications of machine learning techniques in VS, including deep learning, which became highly popular after producing landmark advances in the fields of computer vision and natural language processing. The past 3 years have witnessed an unprecedented amount of research on the application of deep learning in biomedicine, including computational drug discovery. In this review, we first describe the main instruments of VS methods, including compound and protein features (i.e. representations and descriptors), frequently used libraries and toolkits for VS, and bioactivity databases and gold-standard data sets for system training and benchmarking. We subsequently review recent VS studies with a strong emphasis on deep learning applications. Finally, we discuss the present state of the field, including current challenges, and suggest future directions.
We believe that this survey will provide insight to the researchers working in the field of computational drug discovery in terms of comprehending and developing novel bio-prediction methods.
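
One of the simplest ligand-based forms of virtual screening ranks library compounds by structural similarity to known actives. The sketch below assumes fingerprints are already computed and stored as sets of on-bit indices, and uses the Tanimoto coefficient with max-similarity fusion over the reference actives; the function names are hypothetical and this is only one of many VS approaches covered by such reviews.

```python
from typing import AbstractSet, Dict, List, Sequence, Tuple


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as on-bit sets."""
    if not a and not b:
        return 0.0  # convention for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def rank_library(
    actives: Sequence[AbstractSet[int]], library: Dict[str, AbstractSet[int]]
) -> List[Tuple[str, float]]:
    """Score each library compound by its maximum similarity to any known active
    and return (compound_id, score) pairs, most similar first."""
    if not actives:
        raise ValueError("need at least one reference active")
    scored = [
        (cid, max(tanimoto(fp, ref) for ref in actives)) for cid, fp in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For instance, with one reference active `{1, 2, 3}`, a library compound `{2, 3, 4}` scores 2 / 4 = 0.5, while an identical compound scores 1.0 and ranks first.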


2020 ◽  
Vol 4 (1) ◽  
pp. 7-9
Author(s):  
Harry F. Rickerby ◽  
Katya Putintseva ◽  
Christopher Cozens

2018 ◽  
Vol 9 (11) ◽  
pp. 1065-1069 ◽  
Author(s):  
Justin S. Smith ◽  
Adrian E. Roitberg ◽  
Olexandr Isayev

2017 ◽  
Author(s):  
Benjamin Sanchez-Lengeling ◽  
Carlos Outeiral ◽  
Gabriel L. Guimaraes ◽  
Alan Aspuru-Guzik

Molecular discovery seeks to generate chemical species tailored to very specific needs. In this paper, we present ORGANIC, a framework based on Objective-Reinforced Generative Adversarial Networks (ORGAN), capable of producing a distribution over molecular space that matches a given set of desirable metrics. This methodology combines two successful techniques from the machine learning community: a Generative Adversarial Network (GAN), to create non-repetitive, sensible molecular species, and Reinforcement Learning (RL), to bias this generative distribution towards certain attributes. We explore several applications, from optimization of random physicochemical properties to candidates for drug discovery and organic photovoltaic material design.
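
The core of this approach is mixing two reward signals when the generator is trained with reinforcement learning. The sketch below assumes the λ-weighted form R(x) = λ·D(x) + (1 − λ)·O(x), where D is the discriminator output and O the chemical objective. The function names and the zero reward for invalid molecules are assumptions for illustration, and the policy-gradient update that consumes these rewards is omitted.

```python
from typing import List, Optional, Sequence


def organ_reward(d_prob: float, objective: float, lam: float) -> float:
    """Blend adversarial and objective rewards: lam * D + (1 - lam) * O.

    d_prob:    discriminator's probability that the molecule looks like real data.
    objective: task score O(x), assumed already scaled to [0, 1].
    lam:       weight on the discriminator; 1.0 is pure GAN, 0.0 is pure RL.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * d_prob + (1.0 - lam) * objective


def batch_rewards(
    d_probs: Sequence[float], objectives: Sequence[Optional[float]], lam: float
) -> List[float]:
    """Rewards for a batch; None marks an invalid molecule, assumed to get 0."""
    return [
        0.0 if o is None else organ_reward(d, o, lam)
        for d, o in zip(d_probs, objectives)
    ]
```

Raising `lam` keeps samples closer to the training distribution, while lowering it pushes harder toward the objective at the risk of unrealistic or repetitive molecules.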


2020 ◽  
Author(s):  
Sanaa Bardaweel

Recently, an outbreak of a fatal coronavirus, SARS-CoV-2, has emerged from China and is rapidly spreading worldwide. As the coronavirus pandemic rages, drug discovery and development become even more challenging. Repurposing of the antimalarial drug chloroquine and its hydroxylated form, hydroxychloroquine, has demonstrated apparent effectiveness in the treatment of COVID-19-associated pneumonia in clinical trials. The SARS-CoV-2 spike protein shares 31.9% sequence identity with the spike protein present in Middle East Respiratory Syndrome Coronavirus (MERS-CoV), which infects cells through the interaction of its spike protein with the DPP4 receptor found on macrophages. Sitagliptin, a DPP4 inhibitor known for its antidiabetic, immunoregulatory, anti-inflammatory, and beneficial cardiometabolic effects, has been shown to reverse macrophage responses in MERS-CoV infection and to reduce CXCL10 chemokine production in AIDS patients. We suggest that sitagliptin may be a beneficial alternative for the treatment of COVID-19, especially in diabetic patients and patients with preexisting cardiovascular conditions, who are already at higher risk of COVID-19 infection.


2019 ◽  
Vol 26 (28) ◽  
pp. 5340-5362 ◽  
Author(s):  
Xin Chen ◽  
Giuseppe Gumina ◽  
Kristopher G. Virga

As a long-term degenerative disorder of the central nervous system that mostly affects older people, Parkinson’s disease is a growing health threat to our ever-aging population. Despite remarkable advances in our understanding of this disease, all currently available therapeutics only improve symptoms and cannot stop disease progression. Therefore, it is essential that more effective drug discovery methods and approaches be developed, validated, and used to discover disease-modifying treatments for Parkinson’s disease. Drug repurposing, also known as drug repositioning, is the process of finding new uses for existing or abandoned pharmaceuticals; it has been recognized as a cost-effective and time-efficient way to develop new drugs and is as promising as de novo drug discovery in the field of neurodegeneration and, more specifically, for Parkinson’s disease. The availability of several established libraries of clinical drugs and rapid progress in disease biology, genomics and bioinformatics have given momentum to both in silico and activity-based drug repurposing. With the successful clinical introduction of several repurposed drugs for Parkinson’s disease, drug repurposing has now become a robust alternative approach to the discovery and development of novel drugs for this disease. In this review, recent advances in drug repurposing for Parkinson’s disease are discussed.


2020 ◽  
Vol 20 (14) ◽  
pp. 1375-1388 ◽  
Author(s):  
Patnala Ganga Raju Achary

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have so far been registered with Chemical Abstracts Service. According to a recent study, there are around 10^60 molecules that can be classified as new drug-like molecules. The library of such molecules is now referred to as ‘dark chemical space’ or ‘dark chemistry.’ To explore such hidden molecules scientifically, a good number of live, regularly updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences (genomics, proteomics and in silico simulation) will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge and must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques extensively used in VS. QSAR is well known for enabling fast, high-throughput screening with a satisfactory hit rate. QSAR model building involves (i) collecting chemogenomics data from a database or the literature, (ii) calculating the right descriptors from the molecular representation, (iii) establishing a relationship (model) between biological activity and the selected descriptors, and (iv) applying the QSAR model to predict the biological property of new molecules. All hits obtained by VS need to be experimentally verified. The present mini-review highlights web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
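
The four QSAR steps (i) to (iv) can be sketched with the simplest possible model, a multiple linear regression fitted by ordinary least squares. This is an illustration under stated assumptions: descriptors are taken as already computed (step ii would normally use a cheminformatics toolkit), the function names are hypothetical, and real QSAR work typically adds descriptor selection and validation.

```python
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Step (iii): fit activity = w0 + w1*d1 + ... + wk*dk by least squares.

    Solves the normal equations (X^T X) w = X^T y with Gauss-Jordan
    elimination; an intercept column is prepended. Returns [w0, w1, ..., wk].
    """
    rows = [[1.0, *map(float, x)] for x in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))  # partial pivoting
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; model is not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(w: Sequence[float], descriptors: Sequence[float]) -> float:
    """Step (iv): predict the activity of a new molecule from its descriptors."""
    return w[0] + sum(wi * di for wi, di in zip(w[1:], descriptors))
```

With training data (step i) in which activity equals 1 + 2·d1 − 0.5·d2 exactly, `fit_ols` recovers approximately `[1.0, 2.0, -0.5]`, and `predict` then gives about 6.0 for descriptors `[3, 2]`.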

