Assigning the Origin of Microbial Natural Products by Chemical Space Map and Machine Learning

Author(s):  
Alice Capecchi ◽  
Jean-Louis Reymond

Microbial natural products (NPs) are an important source of drugs. However, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with a diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin downloaded from https://www.npatlas.org/joomla/. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP) (http://tmap.gdb.tools). The resulting interactive map (https://tm.gdb.tools/map4/npatlas_map_tmap/) organizes molecules by physico-chemical properties and compound families such as peptides, glycosides, polyphenols or terpenoids. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.
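
The MinHash idea behind a fingerprint such as MAP4 can be illustrated with a minimal sketch. This is not the authors' MAP4 implementation: the feature strings are made-up stand-ins for atom-pair features, and the function names, the BLAKE2b hash, and the 64-slot signature length are assumptions chosen for illustration. The sketch only shows that the fraction of matching signature slots approximates the Jaccard similarity of two feature sets.

```python
import hashlib


def minhash_signature(features, num_hashes=64):
    """Build a MinHash signature: one minimum hash value per salted hash function."""
    if not features:
        raise ValueError("features must be non-empty")
    signature = []
    for seed in range(num_hashes):
        # A different salt yields a different hash function over the same features.
        salt = seed.to_bytes(8, "little")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        f.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "big",
                )
                for f in features
            )
        )
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of equal signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity of two sets, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical feature sets (illustrative tokens, not real MAP4 shingles).
mol_a = {"C|1|N", "C|2|O", "N|3|O", "C|4|C"}
mol_b = {"C|1|N", "C|2|O", "N|3|O", "C|5|S"}

sig_a = minhash_signature(mol_a)
sig_b = minhash_signature(mol_b)
print("exact:", exact_jaccard(mol_a, mol_b))
print("estimated:", estimated_jaccard(sig_a, sig_b))
```

The two toy sets share three of five distinct features, so the exact similarity is 3/5. The estimate fluctuates around that value and tightens as `num_hashes` grows.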


Biomolecules ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. 1385
Author(s):  
Alice Capecchi ◽  
Jean-Louis Reymond

Microbial natural products (NPs) are an important source of drugs; however, their structural diversity remains poorly understood. Here we used our recently reported MinHashed Atom Pair fingerprint with a diameter of four bonds (MAP4), a fingerprint suitable for molecules across very different sizes, to analyze the Natural Products Atlas (NPAtlas), a database of 25,523 NPs of bacterial or fungal origin. To visualize NPAtlas by MAP4 similarity, we used the dimensionality reduction method tree map (TMAP). The resulting interactive map organizes molecules by physico-chemical properties and compound families such as peptides and glycosides. Remarkably, the map separates bacterial and fungal NPs from one another, revealing that these two compound families are intrinsically different despite their related biosynthetic pathways. We used these differences to train a machine learning model capable of distinguishing between NPs of bacterial or fungal origin.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hossein Khabbaz ◽  
Mohammad Hossein Karimi-Jafari ◽  
Ali Akbar Saboury ◽  
Bagher BabaAli

Abstract Background: Antimicrobial peptides are promising tools to fight ever-growing antibiotic resistance. However, despite many advantages, their toxicity to mammalian cells is a critical obstacle in clinical application and needs to be addressed. Results: In this study, a machine learning model was trained on an up-to-date dataset to predict the toxicity of antimicrobial peptides. A comprehensive set of physico-chemical and linguistic-based features, both local and global in nature, underwent feature selection to identify the key properties behind the toxicity of antimicrobial peptides. After feature selection, the hybrid model showed the best performance, with a recall of 0.876 and an F1 score of 0.849. Conclusions: The obtained model can be useful for extracting AMPs with low toxicity from AMP libraries in clinical applications. In addition, several local properties among the final selected features, including the positions of strand-forming and hydrophobic residues, show that these properties are critical determinants of peptide properties and should be considered when developing models for predicting peptide activity. The executable code is available at https://git.io/JRZaT.
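
For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how recall and the F1 score are computed from confusion-matrix counts. The function name and the counts in the usage lines are hypothetical; they are not taken from the study and do not reproduce its reported values.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true-positive, false-positive
    and false-negative counts. Undefined ratios are returned as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        # F1 is the harmonic mean of precision and recall.
        f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a toxic-vs-non-toxic peptide classifier.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(precision, recall, f1)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is often reported alongside recall for imbalanced toxicity data.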


2021 ◽  
Author(s):  
Jiawang Liu ◽  
Anan Liu ◽  
Youcai Hu

Cytochrome P450s, laccases, and intermolecular [4 + 2] cyclases, along with other enzymes, are used by microorganisms to catalyze diverse dimerizations of mature natural products, creating structural diversity and complexity.


2018 ◽  
Author(s):  
Khader Shameer ◽  
Kipp W. Johnson ◽  
Benjamin S. Glicksberg ◽  
Rachel Hodos ◽  
Ben Readhead ◽  
...  

Abstract Drug repositioning, i.e. identifying new uses for existing drugs and research compounds, is a cost-effective drug discovery strategy that continues to grow in popularity. Prioritizing and identifying drugs capable of being repositioned may improve the productivity and success rate of the drug discovery cycle, especially if the drug has already proven to be safe in humans. In previous work, we have shown that drugs that have been successfully repositioned have different chemical properties than those that have not. Hence, there is an opportunity to use machine learning to prioritize drug-like molecules as candidates for future repositioning studies. We have developed a feature engineering and machine learning approach that leverages data from publicly available drug discovery resources: RepurposeDB and DrugBank. ChemVec is the chemoinformatics-based feature engineering strategy designed to compile molecular features representing the chemical space of all drug molecules in the study. ChemVec features were used to train a variety of supervised classification algorithms (Naïve Bayes, Random Forest, Support Vector Machines and an ensemble model combining the three algorithms). Models were created using various combinations of datasets: a Connectivity Map based model, a DrugBank approved compounds based model, and a model based on the full set of DrugBank compounds. Of these, the Random Forest trained on Connectivity Map based data performed best (AUC = 0.674). In brief, our study represents a novel approach to evaluating a small molecule for drug repositioning opportunities and may further improve the discovery of pleiotropic drugs, or those that treat multiple indications.
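
The AUC figure quoted above can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it that way and also averages the scores of three models as one possible way to form an ensemble. The abstract does not say how its ensemble was combined, so the averaging rule, the function names, and all scores here are assumptions for illustration only.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties between a positive and a negative score count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_scores(*model_scores):
    """Combine several models by averaging their per-example scores (assumed rule)."""
    return [sum(column) / len(column) for column in zip(*model_scores)]


# Hypothetical labels (1 = repositioned) and scores from three toy models.
labels = [1, 1, 0, 0]
naive_bayes = [0.9, 0.2, 0.8, 0.1]
random_forest = [0.9, 0.8, 0.3, 0.1]
svm = [0.6, 0.4, 0.5, 0.2]

ensemble = average_scores(naive_bayes, random_forest, svm)
for name, scores in [("NB", naive_bayes), ("RF", random_forest),
                     ("SVM", svm), ("ensemble", ensemble)]:
    print(name, roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported value of 0.674.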


Biomolecules ◽  
2019 ◽  
Vol 9 (1) ◽  
pp. 31 ◽  
Author(s):  
B. Pilón-Jiménez ◽  
Fernanda Saldívar-González ◽  
Bárbara Díaz-Eufracio ◽  
José Medina-Franco

Compound databases of natural products have a major impact on drug discovery projects and other areas of research. The number of public-domain databases of compounds of natural origin is increasing. Several countries (Brazil, France, Panama and, recently, Vietnam) have initiatives in place to construct and maintain compound databases that are representative of their diversity. In this proof-of-concept study, we discuss the first version of BIOFACQUIM, a novel compound database with natural products isolated and characterized in Mexico. We discuss its construction, curation, and a complete chemoinformatic characterization of the content and coverage in chemical space. The profile of physicochemical properties, scaffold content, and diversity, as well as structural diversity based on molecular fingerprints, are reported. BIOFACQUIM is available for free.


2020 ◽  
Author(s):  
Suhad A.A. Al-Salihi ◽  
Ian Bull ◽  
Raghad A. Al-Salhi ◽  
Paul J. Gates ◽  
Kifah Salih ◽  
...  

Abstract There is a pressing need to continue the search for natural products with novel mechanisms to combat the constant increase in microbial drug resistance. Mushroom-forming fungi were previously neglected as a source of novel antibiotics because of the difficulties associated with their culture preparation and genetic tractability. However, modern fungal molecular and synthetic biology tools have renewed interest in exploring mushroom fungi for novel therapeutics. The aim of this study was to obtain a comprehensive picture of the secondary metabolites (SM) of nine basidiomycetes and to screen their biological and chemical properties in order to describe the genetic pathways associated with their production. H. fasciculare proved to be a highly active antagonistic species, with antimicrobial activity against three different microorganisms: Bacillus subtilis, Escherichia coli and Saccharomyces cerevisiae. Extensive genomic comparison and chemical analysis using analytical chromatography led to the characterisation of more than 15 variant biosynthetic gene clusters and the first identification in this species of a potent antibacterial metabolite, 3,5-dichloromethoxy benzoic acid (3,5-D), for which a biosynthetic gene cluster was predicted. This work demonstrates the great potential of mushroom-forming fungi as a currently unexplored reservoir of bioactive natural products. It also shows that access to their genomic data and structurally diverse natural products, through modern computational analysis and efficient chemical methods, could accelerate the development and application of such distinct molecules in both the pharmaceutical and agrochemical industries.


2018 ◽  
Vol 20 (47) ◽  
pp. 29661-29668 ◽  
Author(s):  
Michael J. Willatt ◽  
Félix Musil ◽  
Michele Ceriotti

By representing elements as points in a low-dimensional chemical space it is possible to improve the performance of a machine-learning model for a chemically-diverse dataset. The resulting coordinates are reminiscent of the main groups of the periodic table.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
K. T. Schütt ◽  
M. Gastegger ◽  
A. Tkatchenko ◽  
K.-R. Müller ◽  
R. J. Maurer

Abstract Machine learning advances chemistry and materials science by enabling large-scale exploration of chemical space based on quantum chemical calculations. While these models supply fast and accurate predictions of atomistic chemical properties, they do not explicitly capture the electronic degrees of freedom of a molecule, which limits their applicability for reactive chemistry and chemical analysis. Here we present a deep learning framework for the prediction of the quantum mechanical wavefunction in a local basis of atomic orbitals from which all other ground-state properties can be derived. This approach retains full access to the electronic structure via the wavefunction at force-field-like efficiency and captures quantum mechanics in an analytically differentiable representation. On several examples, we demonstrate that this opens promising avenues to perform inverse design of molecular structures for targeting electronic property optimisation and a clear path towards increased synergy of machine learning and quantum chemistry.


2019 ◽  
Vol 11 (1) ◽  
Author(s):  
Marta Glavatskikh ◽  
Jules Leguy ◽  
Gilles Hunault ◽  
Thomas Cauchy ◽  
Benoit Da Mota

Abstract The QM9 dataset has become the gold standard for Machine Learning (ML) predictions of various chemical properties. QM9 is based on the GDB, which is a combinatorial exploration of the chemical space. ML molecular predictions have recently been published with an accuracy on par with Density Functional Theory calculations. Such ML models need to be tested and generalized on real data. PC9, a new QM9-equivalent dataset (only H, C, N, O and F and up to 9 “heavy” atoms) from the PubChemQC project, is presented in this article. A statistical study of bonding distances and chemical functions shows that this new dataset encompasses more chemical diversity. Kernel Ridge Regression, Elastic Net and the Neural Network model provided by SchNet have been used on both datasets. The overall accuracy in energy prediction is higher for the QM9 subset. However, a model trained on PC9 shows a stronger ability to predict energies of the other dataset.
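
As a rough illustration of one of the methods named above, here is a minimal pure-Python sketch of Kernel Ridge Regression with a Gaussian (RBF) kernel. It is not the setup used in the article: the one-dimensional toy data, the function names, and the values of `lam` and `gamma` are all assumptions, and a real application would use molecular descriptors and a numerical library.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def fit_krr(x_train, y_train, lam=1e-6, gamma=1.0):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(x_train)
    gram = [
        [rbf_kernel(x_train[i], x_train[j], gamma) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve_linear_system(gram, y_train)


def predict_krr(x_train, alpha, x_new, gamma=1.0):
    """Predict as a kernel-weighted sum over the training points."""
    return sum(a * rbf_kernel(xi, x_new, gamma) for a, xi in zip(alpha, x_train))


# Toy 1D data (hypothetical, standing in for descriptor/energy pairs).
x_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 1.0, 4.0]
alpha = fit_krr(x_train, y_train)
print(predict_krr(x_train, alpha, [1.5]))
```

With a very small regularization `lam` the model nearly interpolates the training points; larger values trade training accuracy for smoother predictions.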

