Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning

10.26434/chemrxiv.12902288 ◽

2020 ◽

Author(s):

Alice Capecchi ◽

Jean-Louis Reymond

Keyword(s):

Machine Learning ◽

Natural Products ◽

Chemical Space ◽

Chemical Properties ◽

Structural Diversity ◽

Physico Chemical ◽

Machine Learning Model ◽

Interactive Map ◽

Microbial Natural Products ◽

Tree Map

Microbial natural products (NPs) are an important source of drugs. However, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with a diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin downloaded from https://www.npatlas.org/joomla/. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP) (http://tmap.gdb.tools). The resulting interactive map (https://tm.gdb.tools/map4/npatlas_map_tmap/) organizes molecules by physico-chemical properties and compound families such as peptides, glycosides, polyphenols or terpenoids. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.
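The MinHash idea behind a fingerprint such as MAP4 can be illustrated with a toy sketch. This is not the MAP4 implementation: as an assumption made purely for illustration, overlapping character substrings of a SMILES string stand in for the atom-pair shingles, which would need a cheminformatics toolkit to compute. The function names (`shingles`, `minhash`, `estimated_jaccard`) and the two example molecules are hypothetical choices, not taken from the paper.

```python
import hashlib


def shingles(text, k=4):
    """Return the set of overlapping k-character substrings of `text`."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def _salted_hash(item, seed):
    """Deterministic 64-bit hash of `item`, salted with `seed`."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(items, n_perm=256):
    """MinHash signature: for each seed, keep the smallest salted hash."""
    return [min(_salted_hash(x, seed) for x in items) for seed in range(n_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity of two sets, for comparison."""
    return len(set_a & set_b) / len(set_a | set_b)


# Toy input: SMILES strings of aspirin and salicylic acid.
aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
salicylic_acid = "C1=CC=C(C(=C1)C(=O)O)O"
set_a, set_b = shingles(aspirin), shingles(salicylic_acid)

print("exact Jaccard:   ", exact_jaccard(set_a, set_b))
print("MinHash estimate:", estimated_jaccard(minhash(set_a), minhash(set_b)))
```

Because each signature position agrees with probability equal to the Jaccard similarity, 256 positions give an estimate with a standard error of at most about 0.03. Fixed-length signatures of this kind are what make similarity search and map layouts tractable for sets of very different sizes.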

Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning

Biomolecules ◽

10.3390/biom10101385 ◽

2020 ◽

Vol 10 (10) ◽

pp. 1385

Author(s):

Alice Capecchi ◽

Jean-Louis Reymond

Keyword(s):

Machine Learning ◽

Natural Products ◽

Chemical Space ◽

Chemical Properties ◽

Structural Diversity ◽

Physico Chemical ◽

Machine Learning Model ◽

Interactive Map ◽

Microbial Natural Products ◽

Tree Map

Microbial natural products (NPs) are an important source of drugs; however, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with a diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP). The resulting interactive map organizes molecules by physico-chemical properties and compound families such as peptides and glycosides. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.

Prediction of antimicrobial peptides toxicity based on their physico-chemical properties using machine learning techniques

BMC Bioinformatics ◽

10.1186/s12859-021-04468-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Hossein Khabbaz ◽

Mohammad Hossein Karimi-Jafari ◽

Ali Akbar Saboury ◽

Bagher BabaAli

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Antimicrobial Peptides ◽

Mammalian Cells ◽

Chemical Properties ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Physico Chemical ◽

Machine Learning Model ◽

Low Toxicity

Background: Antimicrobial peptides are promising tools to fight ever-growing antibiotic resistance. However, despite many advantages, their toxicity to mammalian cells is a critical obstacle in clinical application and needs to be addressed. Results: In this study, a machine learning model was successfully trained on an up-to-date dataset to predict the toxicity of antimicrobial peptides. A comprehensive set of physico-chemical and linguistic-based features, both local and global in nature, underwent feature selection to identify the key properties behind the toxicity of antimicrobial peptides. After feature selection, the hybrid model showed the best performance, with a recall of 0.876 and an F1 score of 0.849. Conclusions: The obtained model can be useful in extracting AMPs with low toxicity from AMP libraries in clinical applications. In addition, the final selected features include several local properties, such as the positions of strand-forming and hydrophobic residues, which shows that these properties are critical determinants of peptide properties and should be considered when developing models for activity prediction of peptides. The executable code is available at https://git.io/JRZaT.
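For readers less familiar with the two metrics quoted above, the following minimal sketch shows how recall and F1 are computed from confusion-matrix counts and how the F1 formula can be inverted. The function names and the example counts are hypothetical; nothing here is taken from the authors' code.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive
    and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for P, given R and F1."""
    return f1 * recall / (2 * recall - f1)


# Hypothetical counts: 8 toxic peptides found, 2 false alarms, 2 missed.
print(precision_recall_f1(tp=8, fp=2, fn=2))

# Precision implied by a recall of 0.876 and an F1 score of 0.849.
print(implied_precision(recall=0.876, f1=0.849))
```

Assuming both reported figures refer to the same class and the same evaluation set, the inversion implies a precision of roughly 0.82. That number is an arithmetic consequence of the formula, not a value reported in the abstract.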

Enzymatic dimerization in the biosynthetic pathway of microbial natural products

Natural Product Reports ◽

10.1039/d0np00063a ◽

2021 ◽

Author(s):

Jiawang Liu ◽

Anan Liu ◽

Youcai Hu

Keyword(s):

Natural Products ◽

Biosynthetic Pathway ◽

Structural Diversity ◽

Cytochrome P450s ◽

Microbial Natural Products

Cytochrome P450s, laccases, and intermolecular [4 + 2] cyclases, along with other enzymes, catalyze varied dimerizations of mature natural products, creating structural diversity and complexity in microorganisms.

Prioritizing Small Molecule as Candidates for Drug Repositioning using Machine Learning

10.1101/331975 ◽

2018 ◽

Author(s):

Khader Shameer ◽

Kipp W. Johnson ◽

Benjamin S. Glicksberg ◽

Rachel Hodos ◽

Ben Readhead ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Small Molecule ◽

Chemical Space ◽

Drug Repositioning ◽

Chemical Properties ◽

Support Vector ◽

Feature Engineering ◽

Connectivity Map ◽

Molecular Features

Drug repositioning, i.e. identifying new uses for existing drugs and research compounds, is a cost-effective drug discovery strategy that continues to grow in popularity. Prioritizing and identifying drugs capable of being repositioned may improve the productivity and success rate of the drug discovery cycle, especially if the drug has already proven to be safe in humans. In previous work, we have shown that drugs that have been successfully repositioned have different chemical properties than those that have not. Hence, there is an opportunity to use machine learning to prioritize drug-like molecules as candidates for future repositioning studies. We have developed a feature engineering and machine learning approach that leverages data from publicly available drug discovery resources: RepurposeDB and DrugBank. ChemVec is the chemoinformatics-based feature engineering strategy designed to compile molecular features representing the chemical space of all drug molecules in the study. ChemVec was trained with a variety of supervised classification algorithms (Naïve Bayes, Random Forest, Support Vector Machines and an ensemble model combining the three algorithms). Models were created using various combinations of datasets: a Connectivity Map based model, a DrugBank approved-compounds model, and a model based on the full set of DrugBank compounds. Of these, the Random Forest trained on Connectivity Map based data performed best (AUC = 0.674). In brief, our study presents a novel approach for evaluating small molecules for repositioning opportunities and may further improve the discovery of pleiotropic drugs, i.e. those that treat multiple indications.
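As a reading aid for the AUC figure above, here is a minimal sketch of the ROC AUC in its pairwise form: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = repositioned drug, 0 = not repositioned.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value of 0.674 means a repositioned compound outranks a non-repositioned one in about two out of three such pairs.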

BIOFACQUIM: A Mexican Compound Database of Natural Products

Biomolecules ◽

10.3390/biom9010031 ◽

2019 ◽

Vol 9 (1) ◽

pp. 31 ◽

Cited By ~ 20

Author(s):

B. Pilón-Jiménez ◽

Fernanda Saldívar-González ◽

Bárbara Díaz-Eufracio ◽

José Medina-Franco

Keyword(s):

Natural Products ◽

Drug Discovery ◽

Physicochemical Properties ◽

Chemical Space ◽

Structural Diversity ◽

Proof Of Concept ◽

Molecular Fingerprints ◽

The Public ◽

Compound Database

Compound databases of natural products have a major impact on drug discovery projects and other areas of research. The number of databases in the public domain with compounds with natural origins is increasing. Several countries, Brazil, France, Panama and, recently, Vietnam, have initiatives in place to construct and maintain compound databases that are representative of their diversity. In this proof-of-concept study, we discuss the first version of BIOFACQUIM, a novel compound database with natural products isolated and characterized in Mexico. We discuss its construction, curation, and a complete chemoinformatic characterization of the content and coverage in chemical space. The profile of physicochemical properties, scaffold content, and diversity, as well as structural diversity based on molecular fingerprints is reported. BIOFACQUIM is available for free.

Further biochemical profiling of Hypholoma fasciculare metabolome reveals its chemogenetic diversity

10.1101/2020.05.28.122176 ◽

2020 ◽

Author(s):

Suhad A.A. Al-Salihi ◽

Ian Bull ◽

Raghad A. Al-Salhi ◽

Paul J. Gates ◽

Kifah Salih ◽

...

Keyword(s):

Natural Products ◽

Chemical Properties ◽

Structural Diversity ◽

Gene Clusters ◽

Biosynthetic Gene Cluster ◽

Biosynthetic Gene ◽

Microbial Drug Resistance ◽

Bioactive Natural Products ◽

Highly Active ◽

Comprehensive Picture

There is a desperate need to continue the search for natural products with novel mechanisms of action to battle the constant increase in microbial drug resistance. Mushroom-forming fungi were previously neglected as a source of novel antibiotics because of the difficulties associated with their culture preparation and genetic tractability. However, modern fungal molecular and synthetic biology tools have renewed interest in exploring mushroom fungi for novel therapeutics. The aim of this study was to obtain a comprehensive picture of the secondary metabolites (SMs) of nine basidiomycetes and to screen their biological and chemical properties in order to describe the genetic pathways associated with their production. H. fasciculare proved to be a highly active antagonistic species, with antimicrobial activity against three different microorganisms: Bacillus subtilis, Escherichia coli and Saccharomyces cerevisiae. Extensive genomic comparison and chemical analysis using analytical chromatography led to the characterisation of more than 15 variant biosynthetic gene clusters and to the first identification in this species of a potent antibacterial metabolite, 3,5-dichloromethoxy benzoic acid (3,5-D), for which a biosynthetic gene cluster was predicted. This work demonstrates the great potential of mushroom-forming fungi as a currently unexplored reservoir of bioactive natural products, and shows that access to their genomic data and structurally diverse natural products, through modern computational analysis and efficient chemical methods, could accelerate the development and application of such distinct molecules in both the pharmaceutical and agrochemical industries.

Feature optimization for atomistic machine learning yields a data-driven construction of the periodic table of the elements

Physical Chemistry Chemical Physics ◽

10.1039/c8cp05921g ◽

2018 ◽

Vol 20 (47) ◽

pp. 29661-29668 ◽

Cited By ~ 31

Author(s):

Michael J. Willatt ◽

Félix Musil ◽

Michele Ceriotti

Keyword(s):

Machine Learning ◽

Periodic Table ◽

Chemical Space ◽

Learning Model ◽

Data Driven ◽

Machine Learning Model ◽

Feature Optimization ◽

Low Dimensional

By representing elements as points in a low-dimensional chemical space it is possible to improve the performance of a machine-learning model for a chemically-diverse dataset. The resulting coordinates are reminiscent of the main groups of the periodic table.

Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions

Nature Communications ◽

10.1038/s41467-019-12875-2 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 64

Author(s):

K. T. Schütt ◽

M. Gastegger ◽

A. Tkatchenko ◽

K.-R. Müller ◽

R. J. Maurer

Keyword(s):

Machine Learning ◽

Quantum Chemistry ◽

Degrees Of Freedom ◽

Large Scale ◽

Materials Science ◽

Chemical Space ◽

Chemical Properties ◽

Molecular Structures ◽

Learning Framework ◽

Molecular Wavefunctions

Machine learning advances chemistry and materials science by enabling large-scale exploration of chemical space based on quantum chemical calculations. While these models supply fast and accurate predictions of atomistic chemical properties, they do not explicitly capture the electronic degrees of freedom of a molecule, which limits their applicability for reactive chemistry and chemical analysis. Here we present a deep learning framework for the prediction of the quantum mechanical wavefunction in a local basis of atomic orbitals from which all other ground-state properties can be derived. This approach retains full access to the electronic structure via the wavefunction at force-field-like efficiency and captures quantum mechanics in an analytically differentiable representation. On several examples, we demonstrate that this opens promising avenues to perform inverse design of molecular structures for targeting electronic property optimisation and a clear path towards increased synergy of machine learning and quantum chemistry.

Dataset’s chemical diversity limits the generalizability of machine learning predictions

Journal of Cheminformatics ◽

10.1186/s13321-019-0391-2 ◽

2019 ◽

Vol 11 (1) ◽

Cited By ~ 8

Author(s):

Marta Glavatskikh ◽

Jules Leguy ◽

Gilles Hunault ◽

Thomas Cauchy ◽

Benoit Da Mota

Keyword(s):

Machine Learning ◽

Density Functional ◽

Chemical Space ◽

Chemical Properties ◽

Density Functional Theory Calculations ◽

Real Data ◽

Chemical Diversity ◽

Energy Prediction ◽

The Neural Network ◽

Golden Standard

The QM9 dataset has become the golden standard for Machine Learning (ML) predictions of various chemical properties. QM9 is based on the GDB, which is a combinatorial exploration of the chemical space. ML molecular predictions have been recently published with an accuracy on par with Density Functional Theory calculations. Such ML models need to be tested and generalized on real data. PC9, a new QM9 equivalent dataset (only H, C, N, O and F and up to 9 “heavy” atoms) of the PubChemQC project is presented in this article. A statistical study of bonding distances and chemical functions shows that this new dataset encompasses more chemical diversity. Kernel Ridge Regression, Elastic Net and the Neural Network model provided by SchNet have been used on both datasets. The overall accuracy in energy prediction is higher for the QM9 subset. However, a model trained on PC9 shows a stronger ability to predict energies of the other dataset.
