Analysis of a large food chemical database: chemical space, diversity, and complexity

F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 993 ◽  
Author(s):  
J. Jesús Naveja ◽  
Mariel P. Rico-Hidalgo ◽  
José L. Medina-Franco

Background: Food chemicals are a cornerstone of the food industry. However, their chemical diversity has been explored only to a limited extent; for instance, previous analyses of food-related databases covered at most 2,200 molecules. The goal of this work was to quantify the chemical diversity of the compounds stored in FooDB, a database of nearly 24,000 food chemicals. Methods: The chemical space of FooDB was visualized with ChemMaps, a novel approach based on the concept of chemical satellites. The database was profiled in terms of physicochemical properties, molecular complexity, and scaffold content. The global diversity of FooDB was characterized using Consensus Diversity Plots. Results: Compounds in FooDB were found to be very diverse in terms of properties and structure, and to have high structural complexity. One third of the food chemicals are acyclic, and the ring-containing molecules are mostly monocyclic, with several scaffolds common to natural products in other databases. Conclusions: To the best of our knowledge, this is the first analysis of the chemical diversity and complexity of FooDB. This study is a step forward in the emerging field of “Food Informatics”. Future studies should directly compare the chemical structures of the molecules in FooDB with those in other compound databases, for instance, drug-like databases and natural product collections. An additional future direction of this work is to use the list of 3,228 polyphenolic compounds identified here to enhance ongoing polyphenol-protein interactome studies.
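
The abstract names the diversity measures without showing how they are computed. As a minimal sketch, assuming fingerprints are already available as sets of "on" bit indices and each compound's scaffold as a precomputed string label (None for acyclic compounds), the snippet below computes two quantities of the kind commonly paired in Consensus Diversity Plots: the median pairwise Tanimoto similarity (fingerprint diversity) and the area under a scaffold recovery curve (scaffold diversity), plus the acyclic fraction reported in the Results. The function names are hypothetical, and the fingerprint type and scaffold definition actually used for FooDB are not specified here.

```python
from collections import Counter
from itertools import combinations
from statistics import median
from typing import FrozenSet, Optional, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def fingerprint_diversity(fps: Sequence[FrozenSet[int]]) -> float:
    """Median pairwise Tanimoto similarity; lower values mean a more diverse set."""
    return median(tanimoto(a, b) for a, b in combinations(fps, 2))


def acyclic_fraction(scaffolds: Sequence[Optional[str]]) -> float:
    """Fraction of compounds without a ring system (scaffold given as None)."""
    return sum(s is None for s in scaffolds) / len(scaffolds)


def scaffold_recovery_auc(scaffolds: Sequence[Optional[str]]) -> float:
    """Area under the scaffold recovery curve for ring-containing compounds.

    Scaffolds are sorted from most to least frequent; x is the fraction of
    scaffolds considered, y the fraction of compounds they cover. An AUC near
    0.5 means compounds are spread evenly over scaffolds (high diversity).
    """
    counts = sorted(Counter(s for s in scaffolds if s is not None).values(), reverse=True)
    n_scaf, n_cpd = len(counts), sum(counts)
    area, covered, prev_y = 0.0, 0, 0.0
    for c in counts:
        covered += c
        y = covered / n_cpd
        area += (prev_y + y) / 2 / n_scaf  # trapezoid of width 1/n_scaf
        prev_y = y
    return area


fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
scaffolds = ["A", "A", "A", "B", None, None]  # toy scaffold labels
print(fingerprint_diversity(fps))        # 0.0
print(acyclic_fraction(scaffolds))       # 0.3333333333333333
print(scaffold_recovery_auc(scaffolds))  # 0.625
```

In practice, fingerprints and scaffolds would come from a cheminformatics toolkit; plain Python sets keep the sketch dependency-free. The pairwise step is quadratic, so for a database of about 24,000 compounds one would typically sample pairs.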


Author(s):  
Sandra Kim Tiam ◽  
Muriel Gugger ◽  
Justine Demay ◽  
Severine Le Manach ◽  
Charlotte Duval ◽  
...  

Cyanobacteria are an ancient lineage of slow-growing photosynthetic bacteria and a prolific source of natural products with diverse chemical structures, potent biological activities, and toxicities. The chemical identification of these compounds remains a major bottleneck, so strategies that can prioritize the most prolific strains and novel compounds are of great interest. Here, we combine chemical analysis and genomics to investigate the chemodiversity of secondary metabolites based on their pattern of distribution within some cyanobacteria. Because Planktothrix is a cyanobacterial genus known to form blooms worldwide and to produce a broad spectrum of toxins and other bioactive compounds, we applied this combined approach to four closely related Planktothrix strains. The chemical diversity of the metabolites produced by the four strains was evaluated with an untargeted metabolomics strategy using high-resolution LC-MS. Metabolite profiles were correlated with the metabolite production potential identified by genomics for each strain. Although the Planktothrix strains share broadly similar biosynthetic gene clusters, for example those for microcystin, aeruginosin, and prenylagaramide, we found remarkable strain-specific chemodiversity: only a few of the chemical features were common to all four strains. Additionally, the MS/MS data were analyzed using Global Natural Products Social Molecular Networking (GNPS) to identify molecular families of the same biosynthetic origin. In conclusion, we present an efficient integrative strategy for elucidating the chemical diversity of a given genus and for linking analytical chemistry data to the biosynthetic genes of cyanobacteria.
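
Molecular networking groups MS/MS spectra into molecular families by connecting spectra whose similarity exceeds a threshold. The sketch below is a simplified illustration, not the GNPS implementation: it uses a plain cosine score with greedy one-to-one peak matching within an m/z tolerance (GNPS uses a modified cosine that also accounts for precursor mass differences, plus further filters), and treats connected components of the resulting network as families. The tolerance of 0.02 and threshold of 0.7 are illustrative assumptions, and `cosine_score` and `molecular_families` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs


def cosine_score(a: Spectrum, b: Spectrum, tol: float = 0.02) -> float:
    """Cosine similarity with greedy one-to-one peak matching within tol."""
    norm_a = math.sqrt(sum(i * i for _, i in a))
    norm_b = math.sqrt(sum(i * i for _, i in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # Every peak pair within tolerance, largest intensity product first.
    pairs = sorted(
        ((int_a * int_b, x, y)
         for x, (mz_a, int_a) in enumerate(a)
         for y, (mz_b, int_b) in enumerate(b)
         if abs(mz_a - mz_b) <= tol),
        reverse=True,
    )
    used_a, used_b, total = set(), set(), 0.0
    for product, x, y in pairs:
        if x not in used_a and y not in used_b:  # each peak is matched at most once
            used_a.add(x)
            used_b.add(y)
            total += product
    return total / (norm_a * norm_b)


def molecular_families(spectra: List[Spectrum], threshold: float = 0.7,
                       tol: float = 0.02) -> List[List[int]]:
    """Group spectra into connected components of the similarity network."""
    parent = list(range(len(spectra)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            if cosine_score(spectra[i], spectra[j], tol) >= threshold:
                parent[find(i)] = find(j)  # edge found: merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(len(spectra)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


spectra = [
    [(100.0, 1.0), (150.0, 2.0)],
    [(100.01, 1.0), (150.01, 2.0)],
    [(300.0, 1.0)],
]
print(molecular_families(spectra))  # [[0, 1], [2]]
```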


2020 ◽  
Author(s):  
Thomas Blaschke ◽  
Ola Engkvist ◽  
Jürgen Bajorath ◽  
Hongming Chen

In de novo molecular design, recurrent neural networks (RNNs) have been shown to be effective for sampling and generating novel chemical structures. Using reinforcement learning (RL), an RNN can be tuned with a scoring function to target a particular region of chemical space with optimized desirable properties. However, ligands generated by current RL methods tend to have relatively low diversity, and optimizing towards desired properties sometimes even yields duplicate structures. Here, we propose a new method to address the low-diversity issue in RL for molecular design. Memory-assisted RL extends established RL with a so-called memory unit. As a proof of concept, we applied our method to generate structures with a desired AlogP value. In a second case study, we applied it to design ligands for the dopamine type 2 receptor and the 5-hydroxytryptamine type 1A receptor. For both receptors, a machine learning model was developed to predict whether generated molecules were active against the receptor. In both case studies, memory-assisted RL generated more compounds predicted to be active, with higher chemical diversity, and thus achieved better coverage of the chemical space of known ligands than established RL methods.
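
The abstract does not say how the memory unit works. One plausible design, sketched here purely as an assumption, stores high-scoring molecules in buckets keyed by a scaffold label and sets the reward to zero once a bucket is full, which pushes the generator away from regions of chemical space it has already exploited. `MemoryUnit`, `scaffold_of`, `bucket_size`, and `min_score` are hypothetical; in practice the scaffold function would come from a cheminformatics toolkit.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class MemoryUnit:
    """Hypothetical memory unit that withdraws reward from over-sampled scaffolds."""

    def __init__(self, scaffold_of: Callable[[str], str],
                 bucket_size: int = 25, min_score: float = 0.6) -> None:
        self.scaffold_of = scaffold_of  # maps a SMILES string to a scaffold label
        self.bucket_size = bucket_size  # max molecules remembered per scaffold
        self.min_score = min_score      # only molecules at or above this are stored
        self.buckets: Dict[str, List[str]] = defaultdict(list)

    def adjust(self, smiles: str, score: float) -> float:
        """Return the reward to use for the RL update after consulting the memory."""
        if score < self.min_score:
            return score  # not a hit: reward unchanged, molecule not stored
        bucket = self.buckets[self.scaffold_of(smiles)]
        if len(bucket) >= self.bucket_size:
            return 0.0  # scaffold already well covered: remove the reward
        bucket.append(smiles)
        return score


# Toy scaffold function: the first character stands in for the scaffold.
memory = MemoryUnit(scaffold_of=lambda s: s[0], bucket_size=2, min_score=0.5)
rewards = [memory.adjust(s, 0.9) for s in ["CCO", "CCN", "CCC", "NCC"]]
print(rewards)  # [0.9, 0.9, 0.0, 0.9]
```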


2018 ◽  
Vol 4 (5) ◽  
Author(s):  
Fernanda I. Saldívar-González ◽  
B. Angélica Pilón-Jiménez ◽  
José L. Medina-Franco

The chemical space of naturally occurring compounds is vast and diverse. Besides biologics, naturally occurring small molecules include a large variety of compounds, covering natural products from sources such as plants, marine organisms, and fungi, to name a few, as well as several food chemicals. The systematic exploration of the chemical space of naturally occurring compounds has significant implications in many areas of research, including but not limited to drug discovery, nutrition, and bio- and chemical diversity analysis. The coverage and diversity of the chemical space of compound databases can be explored in different ways. The approach largely depends on the criteria used to define the chemical space, which are commonly selected based on the goals of the study. This chapter discusses major compound databases of natural products and the cheminformatics strategies that have been used to characterize their chemical space. Recent exemplary studies of the chemical space of natural products from different sources, and of their relationships with other compounds, are also discussed. We also present novel chemical descriptors and data mining approaches that are emerging to characterize the chemical space of naturally occurring compounds.


2020 ◽  
Vol 2020 ◽  
pp. 1-27 ◽  
Author(s):  
Antonio Francioso ◽  
Alessia Baseggio Conrado ◽  
Luciana Mosca ◽  
Mario Fontana

Sulfur contributes significantly to nature's chemical diversity and, thanks to its particular features, enables fundamental biological reactions that no other element can. Sulfur-containing natural compounds are used by all living beings and, depending on their function, are distributed across the different kingdoms. It is no coincidence that marine organisms are one of the most important sources of sulfur natural products, since most inorganic sulfur is metabolized in ocean environments, where this element is abundant. Terrestrial organisms such as plants and microorganisms are also able to incorporate sulfur into organic molecules to produce primary metabolites (e.g., methionine, cysteine) and more complex, unique chemical structures with diverse biological roles. Animals are not able to fix inorganic sulfur into biomolecules and depend completely on preformed organic sulfur compounds to satisfy their sulfur needs. However, some higher species such as humans are able to build new sulfur-containing chemical entities, starting especially from plant organosulfur precursors. Sulfur metabolism in humans is highly complex and plays a central role in redox biochemistry. Its chemical properties, large number of oxidation states, and versatile reactivity make sulfur, a chalcogen of the oxygen family, ideal for biological redox reactions and electron transfer processes. This review will explore sulfur metabolism related to redox biochemistry and will describe the various classes of sulfur-containing compounds found across the natural kingdoms. We will describe the chemistry and biochemistry of well-known metabolites, as well as of unknown and poorly studied sulfur natural products that are still in search of a biological role.


2016 ◽  
Vol 113 (42) ◽  
pp. E6343-E6351 ◽  
Author(s):  
Michael A. Skinnider ◽  
Chad W. Johnston ◽  
Robyn E. Edgar ◽  
Chris A. Dejong ◽  
Nishanth J. Merwin ◽  
...  

Microbial natural products are an evolved resource of bioactive small molecules that form the foundation of many modern therapeutic regimens. Ribosomally synthesized and posttranslationally modified peptides (RiPPs) are a class of natural products that have attracted extensive interest for their diverse chemical structures and potent biological activities. Genome sequencing has revealed that the vast majority of genetically encoded natural products remain unknown. Many bioinformatic resources have therefore been developed to predict the chemical structures of natural products, particularly nonribosomal peptides and polyketides, from sequence data. However, the diversity and complexity of RiPPs have hindered systematic investigation of RiPP diversity, and consequently the vast majority of genetically encoded RiPPs remain chemical “dark matter.” Here, we introduce an algorithm to catalog RiPP biosynthetic gene clusters and chart genetically encoded RiPP chemical space. A global analysis of 65,421 prokaryotic genomes revealed 30,261 RiPP clusters, encoding 2,231 unique products. We further leverage the structure predictions generated by our algorithm to facilitate the genome-guided discovery of a molecule from a rare family of RiPPs. Our results provide a systematic investigation of RiPP genetic and chemical space, revealing the widespread distribution of RiPP biosynthesis throughout the prokaryotic tree of life, and provide a platform for the targeted discovery of RiPPs based on genome sequencing.


2020 ◽  
Vol 5 (10) ◽  
Author(s):  
Conrad V. Simoben ◽  
Fidele Ntie-Kang ◽  
Dina Robaa ◽  
Wolfgang Sippl

The development and application of computer-aided drug design/discovery (CADD) techniques (such as structure-based virtual screening, ligand-based virtual screening, and neural network approaches) are on the verge of disintermediating parts of the pharmaceutical drug discovery process. These CADD methods compare favorably with experimental approaches for hit identification. To venture into new chemical spaces, research groups are exploring natural products (NPs) to search for and identify new hits and more efficient leads, as well as to repurpose approved NPs. The chemical space of NPs, shaped by millions of years of species evolution, keeps expanding, and these data are mainly stored in databases that give scientists around the world access to them. Investigating these NP databases with CADD methodologies, in combination with experimental validation techniques, is essential to identify and propose new drug molecules. In this chapter, we highlight the importance of the chemical diversity of NPs as a source of potential drugs, as well as some success stories of NP-derived candidates against important therapeutic targets. The focus is on studies that made substantial use of emerging CADD methodologies (structure-based, ligand-based, and machine learning).
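
As an illustration of the ligand-based virtual screening mentioned above, a minimal similarity search ranks database compounds (for example, from an NP database) by their highest Tanimoto similarity to a set of known actives. The fingerprints are assumed to be precomputed sets of "on" bits, and `similarity_screen` and the compound names are hypothetical; real workflows would use toolkit-generated fingerprints and much larger libraries.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared bits divided by the union of bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_screen(actives: Sequence[Fingerprint],
                      library: Dict[str, Fingerprint],
                      top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank library compounds by their maximum similarity to any known active."""
    scored = [(name, max(tanimoto(fp, q) for q in actives))
              for name, fp in library.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # best first, ties by name
    return scored[:top_k]


actives = [frozenset({1, 2, 3, 4})]
library = {
    "np1": frozenset({1, 2, 3, 4}),
    "np2": frozenset({1, 2, 9}),
    "np3": frozenset({7, 8}),
}
print(similarity_screen(actives, library, top_k=2))  # [('np1', 1.0), ('np2', 0.4)]
```

Taking the maximum over actives (rather than the mean) favors compounds close to any single known ligand, which suits hit finding when the actives are structurally varied.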


2019 ◽  
Author(s):  
Xuhan Liu ◽  
Kai Ye ◽  
Herman Van Vlijmen ◽  
Adriaan P. IJzerman ◽  
Gerard JP Van westen

Over the last five years, deep learning has progressed tremendously in both image recognition and natural language processing, and it is now increasingly applied to other data-rich fields. In drug discovery, recurrent neural networks (RNNs) have been shown to be an effective method for generating novel chemical structures in the form of SMILES. However, ligands generated by current methods have so far shown relatively low diversity and do not fully cover the chemical space occupied by known ligands. Here, we propose a new method (DrugEx) to discover de novo drug-like molecules. DrugEx is an RNN model (generator) trained through reinforcement learning combined with a special exploration strategy. As a case study, we applied our method to design ligands against the adenosine A2A receptor. From ChEMBL data, a machine learning model (predictor) was created to predict whether generated molecules are active or not. With this predictor as the reward function, the generator was trained by reinforcement learning without any further data. We then compared the performance of our method with two previously published methods, REINVENT and ORGANIC. We found that the candidate molecules designed by our model and predicted to be active had greater chemical diversity and better covered the chemical space of known ligands than the state-of-the-art methods.
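
The abstract mentions a "special exploration strategy" without detail. A minimal sketch, assuming the strategy mixes two token-level policies, draws each token from an exploration policy with probability `epsilon` and from the exploitation (generator) policy otherwise. The policy functions, the `$` end token, and all function names are hypothetical stand-ins for trained RNNs.

```python
import random
from typing import Callable, Dict, List, Optional

# A policy maps the tokens generated so far to a probability for each next token.
Policy = Callable[[List[str]], Dict[str, float]]
END = "$"  # hypothetical end-of-sequence token


def sample_token(dist: Dict[str, float], rng: random.Random) -> str:
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def generate(exploit: Policy, explore: Policy, epsilon: float,
             max_len: int = 100, rng: Optional[random.Random] = None) -> str:
    """Sample one sequence, drawing each token from `explore` with probability epsilon."""
    rng = rng or random.Random(0)
    seq: List[str] = []
    while len(seq) < max_len:
        policy = explore if rng.random() < epsilon else exploit
        token = sample_token(policy(seq), rng)
        if token == END:
            break
        seq.append(token)
    return "".join(seq)


# Toy policies: each emits three atom tokens, then stops.
def exploit_policy(seq: List[str]) -> Dict[str, float]:
    return {"C": 1.0} if len(seq) < 3 else {END: 1.0}


def explore_policy(seq: List[str]) -> Dict[str, float]:
    return {"N": 1.0} if len(seq) < 3 else {END: 1.0}


print(generate(exploit_policy, explore_policy, epsilon=0.0))  # CCC
print(generate(exploit_policy, explore_policy, epsilon=1.0))  # NNN
```

Setting `epsilon` to 0 recovers plain sampling from the generator; larger values trade expected reward for broader coverage of chemical space.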


Biomolecules ◽  
2020 ◽  
Vol 10 (11) ◽  
pp. 1566 ◽  
Author(s):  
José L. Medina-Franco ◽  
Fernanda I. Saldívar-González

Natural products play a significant role in drug discovery. Their distinctive chemical structures have contributed to the identification and development of drugs for different therapeutic areas. Moreover, natural products are significant sources of inspiration or starting points for developing new therapeutic agents. Natural products such as peptides and macrocycles, and other compounds with unique features, are attractive sources for addressing complex diseases. Computational approaches that use chemoinformatics and molecular modeling methods help speed up natural product-based drug discovery. Several research groups have recently used computational methodologies to organize data, interpret results, generate and test hypotheses, filter large chemical databases before experimental screening, and design experiments. This review discusses a broad range of chemoinformatics applications that support natural product-based drug discovery. We emphasize profiling natural product data sets in terms of diversity; complexity; acid/base properties; absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties; and fragments. Novel techniques for the visual representation of chemical space are also discussed.


Marine Drugs ◽  
2018 ◽  
Vol 16 (12) ◽  
pp. 485 ◽  
Author(s):  
Inês Raimundo ◽  
Sandra Silva ◽  
Rodrigo Costa ◽  
Tina Keller-Costa

Octocorals (Cnidaria, Anthozoa, Octocorallia) are magnificent repositories of natural products with fascinating and unusual chemical structures and bioactivities of interest to medicine and biotechnology. However, a mechanistic understanding of how microbial symbionts contribute to the chemical diversity of octocorals is yet to be achieved. This review inventories the natural products described so far for octocoral-derived bacteria and fungi, uncovering a true chemical arsenal of terpenes, steroids, alkaloids, and polyketides with antibacterial, antifungal, antiviral, antifouling, anticancer, anti-inflammatory, and antimalarial activities of enormous potential for blue growth. Genome mining of 15 bacterial associates (spanning 12 genera) cultivated from Eunicella spp. identified 440 putative and classifiable secondary metabolite biosynthetic gene clusters (BGCs), encompassing various terpene, polyketide, bacteriocin, and nonribosomal peptide synthetase BGCs. This points towards a widespread yet uncharted capacity of octocoral-associated bacteria to synthesize a broad range of natural products. However, to extend our knowledge and foster the near-future laboratory production of bioactive compounds from (cultivatable and currently uncultivatable) octocoral symbionts, an optimal blend of targeted metagenomics, recombinant DNA technologies, improved symbiont cultivation, functional genomics, and analytical chemistry is required. Such a multidisciplinary undertaking is key to achieving a sustainable response to the urgent industrial demand for novel drugs and enzyme varieties.

