ChemInformatics Model Explorer (CIME): Exploratory analysis of chemical model explanations

Author(s):  
Christina Humer ◽  
Henry Heberle ◽  
Floriane Montanari ◽  
Thomas Wolf ◽  
Florian Huber ◽  
...  

The introduction of machine learning to small molecule research – an inherently multidisciplinary field in which chemists and data scientists combine their expertise and collaborate – has been vital to making screening processes more efficient. In recent years, numerous models that predict pharmacokinetic properties or bioactivity have been published, and these are used on a daily basis by chemists to make decisions and prioritize ideas. The emerging field of explainable artificial intelligence is opening up new possibilities for understanding the reasoning that underlies a model. In small molecule research, this means relating contributions of substructures of compounds to their predicted properties, which in turn also allows the areas of the compounds that have the greatest influence on the outcome to be identified. However, there is no interactive visualization tool that facilitates such interdisciplinary collaborations towards interpretability of machine learning models for small molecules. To fill this gap, we present CIME (ChemInformatics Model Explorer), an interactive web-based system that allows users to inspect chemical data sets, visualize model explanations, compare interpretability techniques, and explore subgroups of compounds. The tool is model-agnostic and can be run on a server or a workstation.

Database ◽  
2021 ◽  
Vol 2021 ◽  
Author(s):  
Eryk Kropiwnicki ◽  
John E Evangelista ◽  
Daniel J Stein ◽  
Daniel J B Clarke ◽  
Alexander Lachmann ◽  
...  

Abstract: Understanding the underlying molecular and structural similarities between seemingly heterogeneous sets of drugs can aid in identifying drug repurposing opportunities and assist in the discovery of novel properties of preclinical small molecules. A wealth of information about drug and small molecule structure, targets, indications and side effects; induced gene expression signatures; and other attributes is publicly available through web-based tools, databases and repositories. By processing, abstracting and aggregating information from these resources into drug set libraries, knowledge about novel properties of drugs and small molecules can be systematically imputed with machine learning. In addition, drug set libraries can be used as the underlying database for drug set enrichment analysis. Here, we present Drugmonizome, a database with a search engine for querying annotated sets of drugs and small molecules for performing drug set enrichment analysis. Utilizing the data within Drugmonizome, we also developed Drugmonizome-ML. Drugmonizome-ML enables users to construct customized machine learning pipelines using the drug set libraries from Drugmonizome. To demonstrate the utility of Drugmonizome, drug sets from 12 independent SARS-CoV-2 in vitro screens were subjected to consensus enrichment analysis. Despite the low overlap among these 12 independent in vitro screens, we identified common biological processes critical for blocking viral replication. To demonstrate Drugmonizome-ML, we constructed a machine learning pipeline to predict whether approved and preclinical drugs may induce peripheral neuropathy as a potential side effect. Overall, the Drugmonizome and Drugmonizome-ML resources provide rich and diverse knowledge about drugs and small molecules for direct systems pharmacology applications. Database URL: https://maayanlab.cloud/drugmonizome/.


2021 ◽  
Author(s):  
Khyati Gohil ◽  
M. Zain Kazmi ◽  
Florence Williams

Neurotrophic small molecule natural products are functional analogs of signaling proteins called neurotrophins, which cause a pro-growth, pro-survival, or pro-differentiation response in neuronal cells. While these phenotypic responses are desirable to combat neurodegenerative disease progression, the pharmacokinetic properties of neurotrophins present challenges to their administration. Therefore, neurotrophic small molecules such as the cis- and trans-banglenes offer attractive alternatives. We describe the synthesis and testing of banglene derivatives and establish a structure-activity response for the banglene family. We demonstrate that (–) trans-banglene is the primarily active enantiomer, and that select modifications on the cyclohexene ring of trans-banglene do not significantly impair its bioactivity. Finally, we demonstrate that (–) trans-banglene potentiation of NGF induced neuritogenesis is unaffected by the presence of these Erk1/2, Akt and Pkc inhibitors. Our structure-activity results also suggest that (–) trans-banglene neurotrophic activity and its potentiation of NGF activity might be distinct unassociated processes.


2019 ◽  
Author(s):  
Bryce K Allen ◽  
Nagi G Ayad ◽  
Stephan C Schürer

Deep learning is a machine learning technique that attempts to model high-level abstractions in data by utilizing a graph composed of multiple processing layers that apply various linear and non-linear transformations. This technique has been shown to perform well for applications in drug discovery, utilizing structural features of small molecules to predict activity. However, the application of deep learning to discriminating features of kinase inhibitors has not been well explored. Small molecule kinase inhibitors are an important class of anti-cancer agents and have demonstrated impressive clinical efficacy in several different diseases. However, resistance is often observed, mediated by adaptive Kinome reprogramming or subpopulation diversity. Therefore, polypharmacology and combination therapies offer potential therapeutic strategies for patients with resistant disease. Their development would benefit from more comprehensive and dense knowledge of small-molecule inhibition across the human Kinome. Because such data is not publicly available, we evaluated multiple machine learning methods to predict small molecule inhibition of 342 kinases using over 650K aggregated bioactivity annotations for over 300K small molecules curated from ChEMBL and the Kinase Knowledge Base (KKB). Our results demonstrated that multi-task deep neural networks outperform classical single-task methods, offering potential towards predicting activity profiles and filling gaps in the available data.


Author(s):  
Sara S. El Zahed ◽  
Shawn French ◽  
Maya A. Farha ◽  
Garima Kumar ◽  
Eric D. Brown

Discovering new Gram-negative antibiotics has been a challenge for decades. This has been largely attributed to a limited understanding of the molecular descriptors governing Gram-negative permeation and efflux evasion. Herein, we address the contribution of efflux using a novel approach that applies multivariate analysis, machine learning, and structure-based clustering to some 4,500 actives from a small molecule screen in efflux-compromised Escherichia coli. We employed principal-component analysis and trained two decision tree-based machine learning models to investigate descriptors contributing to the antibacterial activity and efflux susceptibility of these actives. This approach revealed that the Gram-negative activity of hydrophobic and planar small molecules with low molecular stability is limited to efflux-compromised E. coli. Further, molecules with reduced branching and compactness showed increased susceptibility to efflux. Given these distinct properties that govern efflux, we developed the first machine learning model, called Susceptibility to Efflux Random Forest (SERF), as a tool to analyze the molecular descriptors of small molecules and predict those that could be susceptible to efflux pumps in silico. Here, SERF demonstrated high accuracy in identifying such molecules. Further, we clustered all 4,500 actives based on their core structures and identified distinct clusters highlighting side chain moieties that cause marked changes in efflux susceptibility. In all, our work reveals a role for physicochemical and structural parameters in governing efflux, presents a machine learning tool for rapid in silico analysis of efflux susceptibility, and provides a proof of principle for the potential of exploiting side chain modification to design novel antimicrobials evading efflux pumps.


2006 ◽  
Vol 5 (3) ◽  
pp. 185-191 ◽  
Author(s):  
Alex Ivanov ◽  
Dianne Cyr

Electronic brainstorming systems have been shown to lead to more ideas, yet unsupported face-to-face brainstorming is still widely preferred. This paper proposes a graphical user interface for a web-based system for design problem-solving or other intellective tasks involving convergent and divergent thinking. Referring to the literature on group support systems and information and knowledge visualization, the study extends features of concept mapping and synthesizes these into a prototype called the Concept Plot (CP). Based on an advertising design task, the paper shows how the CP can be collaboratively constructed in two directions, as text and pictures are uploaded onto nodes, and these nodes are scaled up or down as users click to evaluate ideas. The expectation is that this integrated visualization would diminish information overload, while enhancing the social dynamics of the process. Also presented is the pilot deployment of a Flash prototype. The results were inconclusive, yet they suggest that a study with more participants might demonstrate the functional and affective benefits of the CP.


2021 ◽  
Author(s):  
Bas Stringer ◽  
Hans De Ferrante ◽  
Sanne Abeln ◽  
Jaap Heringa ◽  
K. Anton A. Feenstra ◽  
...  

Motivation: Protein interactions play an essential role in many biological and cellular processes, such as protein-protein interaction (PPI) in signaling pathways, binding to DNA in transcription, and binding to small molecules in receptor activation or enzymatic activity. Experimental identification of protein binding interface residues is a time-consuming, costly, and challenging task. Several machine learning and other computational approaches exist which predict such interface residues. Here we explore whether Deep Learning (DL) can be used effectively for this prediction task, and which learning strategies and architectures may be most efficient. We introduce seven DL architectures that are applied to eleven independent test sets, focused on the residues involved in PPI interfaces and in binding RNA/DNA and small molecule ligands. Results: We constructed a large data set dubbed BioDL, comprising protein-protein interaction data from the PDB and protein-ligand interactions (DNA, RNA and small molecules) from the BioLip database. Additionally, we reused our existing curated homo- and heteromeric PPI data sets. We performed several experiments to assess the impact of different data features, spatial forms, encoding schemes, network initializations, loss functions, regularization mechanisms, and activation functions on the performance of the predictors. Benchmarking the resulting DL models with an independent test set (ZK448) shows that no single DL architecture performs best on all instances, but that an ensemble of DL architectures consistently achieves peak prediction performance. Our PIPENN ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on all interaction types, achieving AUCs of 0.718 (protein-protein), 0.823 (protein-nucleotide) and 0.842 (protein-small molecule), respectively. Availability: Source code and data sets at https://github.com/ibivu/


Biomolecules ◽  
2019 ◽  
Vol 9 (2) ◽  
pp. 43 ◽  
Author(s):  
Ya Chen ◽  
Conrad Stork ◽  
Steffen Hirte ◽  
Johannes Kirchmair

Natural products (NPs) remain the most prolific resource for the development of small-molecule drugs. Here we report a new machine learning approach that allows the identification of natural products with high accuracy. The method also generates similarity maps, which highlight atoms that contribute significantly to the classification of small molecules as a natural product or synthetic molecule. The method can hence be utilized to (i) identify natural products in large molecular libraries, (ii) quantify the natural product-likeness of small molecules, and (iii) visualize atoms in small molecules that are characteristic of natural products or synthetic molecules. The models are based on random forest classifiers trained on data sets consisting of more than 265,000 to 322,000 natural products and synthetic molecules. Two-dimensional molecular descriptors, MACCS keys and Morgan2 fingerprints were explored. On an independent test set the models reached areas under the receiver operating characteristic curve (AUC) of 0.997 and Matthews correlation coefficients (MCCs) of 0.954 and higher. The method was further tested on data from the Dictionary of Natural Products, ChEMBL and other resources. The best-performing models are accessible as a free web service at http://npscout.zbh.uni-hamburg.de/npscout.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Fidelia Cascini ◽  
Nadia De Giovanni ◽  
Ilaria Inserra ◽  
Federico Santaroni ◽  
Luigi Laura

Abstract: Machine learning has been used for many purposes in science, but no applications to illegal drugs have been reported before. This study proposes a new web-based system for cocaine classification, profiling relations and comparison that is capable of producing meaningful output from a large amount of chemical profiling data. In particular, the Profiling Relations In Drug trafficking in Europe (PRIDE) system offers several advantages to intelligence actions across Europe. It provides a standardized, broad methodology that uses machine learning algorithms to classify and compare drug profiles and to indicate how similar drug samples are and how probable it is that they share a common origin, batch, or preparation process. We evaluated the proposed algorithms using precision and recall metrics and analyzed the quality of their predictions with respect to our gold standard. In our experiments, we reached a value of 88% for F0.5-measure, 91% for precision, and 78% for recall, confirming our main hypothesis: machine learning can be applied to the automatic classification of cocaine profiles.
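
As a consistency check on the figures reported above, the following minimal sketch recomputes the F0.5-measure from the stated precision and recall. It assumes the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); the abstract does not say how the authors aggregated their scores, so the helper name `f_beta` and this single-pair calculation are illustrative assumptions, not the PRIDE implementation.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily than recall.
    """
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract (91% and 78%).
score = f_beta(precision=0.91, recall=0.78, beta=0.5)
print(f"F0.5 = {score:.3f}")
```

Under that assumption, the result rounds to the 88% reported in the abstract. With beta = 0.5 the measure weights precision more heavily than recall, so it sits closer to the precision value than a balanced F1 score would.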


2021 ◽  
Vol 9 (01) ◽  
pp. 38-42
Author(s):  
Adriansa Wahyu Pramudita ◽  
Ramos Somya

The Lembaga Penjamin Mutu (LPM), the Quality Assurance Institution of Universitas Kristen Satya Wacana (UKSW), has many responsibilities for improving the quality of UKSW, such as the quality of the curriculum, lecturers, and facilities. LPM works with large amounts of data on a daily basis and therefore needs to make its work easier and more efficient. In particular, LPM often has to report UKSW student data to the government through Pangkalan Data Pendidikan Tinggi (PDDikti). A web-based system that filters active students from an Excel file is therefore needed to make LPM's work easier and more efficient.

