Cell2Chem: mining explored and unexplored biosynthetic chemical spaces

Author(s):  
Dongliang Liu ◽  
Mengying Han ◽  
Yu Tian ◽  
Linlin Gong ◽  
Cancan Jia ◽  
...  

Abstract. Summary: Living cell strains have important applications in synthesizing their native compounds and potential for use in studies exploring the universal chemical space. Here, we present a web server named Cell2Chem, which accelerates the search for explored compounds in organisms and facilitates investigations of biosynthesis in unexplored chemical spaces. Cell2Chem uses co-occurrence networks and natural language processing to provide a systematic method for linking living organisms to the compounds they biosynthesize and to the processes that produce these compounds. The Cell2Chem platform comprises 40 370 species and 125 212 compounds. Using in silico methods for predicting reaction pathways and enzyme functions, Cell2Chem reveals possible biosynthetic pathways of compounds and catalytic functions of proteins to expand unexplored biosynthetic chemical spaces. Cell2Chem can help improve biosynthesis research and enhance the efficiency of synthetic biology. Availability and implementation: Cell2Chem is available at: http://www.rxnfinder.org/cell2chem/.
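As a rough illustration of the co-occurrence idea mentioned in this abstract, the sketch below counts how often a species name and a compound name appear in the same text. It is not Cell2Chem's implementation: the function name `cooccurrence_counts`, the plain substring matching, and the three toy sentences are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(documents, species_terms, compound_terms):
    """Count the documents in which each (species, compound) pair co-occurs.

    Hypothetical helper: matching is a naive case-insensitive substring test,
    which a real text-mining pipeline would replace with entity recognition.
    """
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        found_species = [s for s in species_terms if s.lower() in lowered]
        found_compounds = [c for c in compound_terms if c.lower() in lowered]
        # Every species/compound pair found in the same document gets one count.
        for pair in product(found_species, found_compounds):
            counts[pair] += 1
    return counts


# Toy corpus, invented for this example only.
documents = [
    "Escherichia coli was engineered to produce lycopene.",
    "Lycopene accumulates in Escherichia coli under these conditions.",
    "Saccharomyces cerevisiae produces ethanol.",
]
species_terms = ["Escherichia coli", "Saccharomyces cerevisiae"]
compound_terms = ["lycopene", "ethanol"]

edges = cooccurrence_counts(documents, species_terms, compound_terms)
for (species, compound), weight in sorted(edges.items()):
    print(species, "--", compound, "weight:", weight)
```

The resulting counts can be read as weighted edges of a bipartite species-compound network; pairs that never co-occur simply have no edge.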

Química Nova ◽  
2020 ◽  
Author(s):  
Daiana Franco ◽  
Thiago Pereira ◽  
Felipe Vitorio ◽  
Nathalia Nadur ◽  
Renata Lacerda ◽  
...  

Coumarins are natural products characterized as 2H-chromen-2-one, according to IUPAC nomenclature, and are widely distributed in plants as well as in species of fungi and bacteria. Nowadays, many synthetic procedures allow the discovery of coumarins with expanded chemical space. Their ability to form non-covalent interactions with many enzymes and receptors in living organisms leads coumarins to exhibit a wide range of biological activities and applications. This manuscript therefore provides an overview of the use of coumarin compounds in medicinal chemistry for treating many diseases. Important examples from recent years have been selected concerning the activities of coumarins as anticoagulant, anticancer, antioxidant, antiviral, antidiabetic, anti-inflammatory, antibacterial, antifungal, and anti-neurodegenerative agents. This work thus aims to contribute to the development of new rational research projects searching for new treatments and bioactive compounds for many pathologies using coumarin derivatives.


Author(s):  
Yu Tian ◽  
Ling Wu ◽  
Le Yuan ◽  
Shaozhen Ding ◽  
Fu Chen ◽  
...  

Abstract. Summary: The biosynthetic ability of living organisms has important applications in producing bulk chemicals, biofuels and natural products. Based on the most comprehensive biosynthesis knowledgebase, a computational system, BCSExplorer, is proposed to discover the unexplored chemical space using nature’s biosynthetic potential. BCSExplorer first integrates the most comprehensive biosynthetic reaction database with 280 000 biochemical reactions and 60 000 chemicals biosynthesized globally over the past 130 years. Second, in this study, a biosynthesis tree is computed for a starting chemical molecule based on a comprehensive biotransformation rule library covering almost all biosynthetic possibilities, in which redundant rules are removed using a new algorithm. Moreover, biosynthesis feasibility, drug-likeness and toxicity analysis of a new generation of compounds will be pursued in further studies to meet various needs. BCSExplorer represents a novel method to explore biosynthetically available chemical space. Availability and implementation: BCSExplorer is available at: http://www.rxnfinder.org/bcsexplorer/. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Robin Winter ◽  
Floriane Montanari ◽  
Andreas Steffen ◽  
Hans Briem ◽  
Frank Noé ◽  
...  

In this work, we propose a novel method that combines in silico prediction of molecular properties such as biological activity or pharmacokinetics with an in silico optimization algorithm, namely Particle Swarm Optimization. Our method takes a starting compound as input and proposes new molecules with more desirable (predicted) properties. It navigates a machine-learned continuous representation of a drug-like chemical space guided by a defined objective function. The objective function combines multiple in silico prediction models, defined desirability ranges and substructure constraints. We demonstrate that our proposed method is able to consistently find more desirable molecules for the studied tasks in relatively short time.
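To make the optimization step concrete, here is a minimal, generic Particle Swarm Optimization sketch over a continuous vector space. It is not the authors' code: the function `particle_swarm_optimize`, its parameter values, and the quadratic `toy_desirability` objective are assumptions standing in for the learned molecular representation and the combined prediction models described in the abstract.

```python
import random


def particle_swarm_optimize(objective, dim, n_particles=20, n_steps=100,
                            inertia=0.7, cognitive=1.5, social=1.5,
                            bounds=(-5.0, 5.0), seed=0):
    """Maximize `objective` over a box in R^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_best_score = [objective(p) for p in positions]
    best_index = max(range(n_particles), key=lambda i: personal_best_score[i])
    global_best = personal_best[best_index][:]
    global_best_score = personal_best_score[best_index]

    for _ in range(n_steps):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own best,
                # and pull toward the best position found by the whole swarm.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                # Clamp the new position to the search box.
                positions[i][d] = min(high, max(low, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score > personal_best_score[i]:
                personal_best[i] = positions[i][:]
                personal_best_score[i] = score
                if score > global_best_score:
                    global_best = positions[i][:]
                    global_best_score = score
    return global_best, global_best_score


def toy_desirability(x):
    """Placeholder objective (an assumption): highest at an arbitrary target point."""
    target = [1.0, -2.0, 0.5]
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


best, score = particle_swarm_optimize(toy_desirability, dim=3)
print("best point:", best, "score:", score)
```

In the setting the abstract describes, each position would presumably be a point in the learned continuous representation and the objective would score the decoded molecule; the swarm update itself stays the same.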


2021 ◽  
Vol 8 (2) ◽  
Author(s):  
Peter Dekker ◽  
Willem Zuidema

In this paper, we investigate how the prediction paradigm from machine learning and Natural Language Processing (NLP) can be put to use in computational historical linguistics. We propose word prediction as an intermediate task, where the forms of unseen words in some target language are predicted from the forms of the corresponding words in a source language. Word prediction allows us to develop algorithms for phylogenetic tree reconstruction, sound correspondence identification and cognate detection, in ways close to attested methods for linguistic reconstruction. We will discuss different factors, such as data representation and the choice of machine learning model, that have to be taken into account when applying prediction methods in historical linguistics. We present our own implementations and evaluate them on different tasks in historical linguistics.


2018 ◽  
Author(s):  
Arpit Jain ◽  
Arndt von Haeseler ◽  
Ingo Ebersberger

Abstract: Orthologs document the evolution of genes and metabolic capacities encoded in extant and ancient genomes. Orthologous genes that are detected across the full diversity of contemporary life allow reconstructing the gene set of LUCA, the last universal common ancestor. These genes presumably represent the functional repertoire common to, and necessary for, all living organisms. Design of artificial life has the potential to test this. Recently, a minimal gene (MG) set for a self-replicating cell was determined experimentally, and a surprisingly high number of genes have unknown functions and are not represented in LUCA. However, as similarity between orthologs decays with time, it becomes insufficient to infer common ancestry, leaving ancient gene set reconstructions incomplete and distorted to an unknown extent. Here we introduce the evolutionary traceability, together with the software protTrace, that quantifies, for each protein, the evolutionary distance beyond which the sensitivity of the ortholog search becomes limiting. We show that the LUCA set comprises only high-traceable proteins, most of which have catalytic functions. We further show that proteins in the MG set lacking orthologs outside bacteria mostly have low traceability, leaving open whether their eukaryotic orthologs have just been overlooked. Using the example of REC8, a protein essential for chromosome cohesion, we demonstrate how a traceability-informed adjustment of the search sensitivity identifies hitherto missed orthologs in the fast-evolving microsporidia. Taken together, the evolutionary traceability helps to differentiate between true absence and non-detection of orthologs, and thus improves our understanding of the evolutionary conservation of functional protein networks.


Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1878
Author(s):  
Rui Niu ◽  
Jiajie Peng ◽  
Zhipeng Zhang ◽  
Xuequn Shang

The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated protein 9 (Cas9) system is a groundbreaking gene-editing tool, which has been widely adopted in biomedical research. However, the guide RNAs in the CRISPR-Cas9 system may induce unwanted off-target activities and further affect the practical application of the technique. Most existing in silico prediction methods that focus on off-target activities have limited predictive precision and remain to be improved. Hence, it is necessary to propose a new in silico prediction method to address this problem. In this work, a deep learning framework named R-CRISPR is presented, which devises an encoding scheme to encode gRNA-target sequences into binary matrices, a convolutional neural network as feature extractor, and a recurrent neural network to predict off-target activities with mismatch, insertion, or deletion. It is demonstrated that R-CRISPR surpasses six mainstream prediction methods with a significant improvement on mismatch-only datasets verified by GUIDE-seq. Compared with the state-of-the-art prediction methods, R-CRISPR also achieves competitive performance on datasets with mismatch, insertion, and deletion. Furthermore, experiments show that data concatenation could influence the quality of training data, and the optimal combination of datasets is investigated.
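The abstract does not spell out how gRNA-target pairs are turned into binary matrices, so the following is only one plausible sketch, not R-CRISPR's actual scheme. It assumes the two sequences are already aligned to equal length, uses `-` as a gap symbol so that insertions and deletions can be represented, and concatenates a one-hot vector for the guide base with one for the target base at each position. The names `ALPHABET`, `one_hot`, and `encode_pair` are hypothetical.

```python
# Assumed alphabet: four DNA letters plus "-" for an alignment gap (indel).
ALPHABET = "ACGT-"


def one_hot(symbol):
    """Return a 5-element 0/1 list; symbols outside ALPHABET become all zeros."""
    return [1 if symbol == letter else 0 for letter in ALPHABET]


def encode_pair(guide, target):
    """Encode an aligned guide/target pair as a binary matrix of shape (L, 10).

    Columns 0-4 one-hot encode the guide symbol, columns 5-9 the target symbol.
    """
    if len(guide) != len(target):
        raise ValueError("sequences must be aligned to equal length")
    return [one_hot(g) + one_hot(t)
            for g, t in zip(guide.upper(), target.upper())]


# Toy aligned pair (invented): a mismatch at position 2, a gap at position 3.
matrix = encode_pair("ACG-T", "ACTAT")
for row in matrix:
    print(row)
```

A matrix of this shape could then be fed to a convolutional feature extractor; whether the published model stacks, ORs, or otherwise combines the two one-hot channels is not stated in the abstract.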


2013 ◽  
pp. 986-1009
Author(s):  
Masahiro Hattori ◽  
Masaaki Kotera

Chemical genomics is one of the cutting-edge research areas in the post-genomic era, and it requires a sophisticated integration of heterogeneous information, i.e., genomic and chemical information. Enzymes play key roles in the dynamic behavior of living organisms, linking information in the chemical space and genomic space. In this chapter, the authors report their recent efforts in this area, including the development of a similarity measure between two chemical compounds, a system for predicting a plausible enzyme for a given substrate and product pair, and two different approaches to predicting the fate of a given compound in a metabolic pathway. General problems and possible future directions are also discussed, in the hope of attracting more activity from researchers in this area.

