A deep learning genome-mining strategy for biosynthetic gene cluster prediction

2019, Vol 47 (18), p. e110
Author(s): Geoffrey D Hannigan, David Prihoda, Andrej Palicka, Jindrich Soukup, Ondrej Klempir, ...

Abstract Natural products represent a rich reservoir of small-molecule drug candidates used as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The growing number of complete microbial genomes and similar resources has led to the development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represent a major addition to in silico BGC identification.
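DeepBGC scores a genome as a sequence of protein family (Pfam) domains, producing a BGC probability for each domain. The abstract does not say how those scores become cluster calls; as a minimal sketch, assuming a hypothetical helper `call_regions`, a 0.5 score threshold, and a two-domain minimum run length (none of which are taken from DeepBGC), consecutive high-scoring domains can be merged into candidate regions:

```python
from typing import List, Tuple

def call_regions(scores: List[float], threshold: float = 0.5,
                 min_domains: int = 2) -> List[Tuple[int, int]]:
    """Merge consecutive domains scoring >= threshold into (start, end) regions.

    `end` is exclusive; runs shorter than `min_domains` are discarded.
    Threshold and minimum length are illustrative assumptions.
    """
    regions: List[Tuple[int, int]] = []
    start = None  # index where the current high-scoring run began
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # open a new run
        elif start is not None:
            if i - start >= min_domains:
                regions.append((start, i))
            start = None  # close the run
    # A run may extend to the last domain of the sequence.
    if start is not None and len(scores) - start >= min_domains:
        regions.append((start, len(scores)))
    return regions
```

Raising `threshold` or `min_domains` trades sensitivity for fewer false positives, the error mode this abstract emphasizes.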

2018
Author(s): Geoffrey D. Hannigan, David Prihoda, Andrej Palicka, Jindrich Soukup, Ondrej Klempir, ...

Abstract Natural products represent a rich reservoir of small-molecule drug candidates used as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The growing number of complete microbial genomes and similar resources has led to the development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers more accurate BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing tools. We supplemented this with downstream random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represent a significant step forward for in silico BGC identification.


2021
Author(s): Ziyi Yang, Benben Liao, Changyu Hsieh, Chao Han, Liang Fang, ...

Natural products produced by microorganisms constitute an important source of essential pharmaceuticals, including antimicrobial and anti-tumor drugs. These bioactive molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The rapid growth of microbial genomics resources, driven by high-throughput sequencing technologies, has spurred the development of computational methods for mining microbial genomes for BGCs. Current machine learning methods, however, have had limited success in uncovering novel BGCs because their predictions contain an excessive number of false positives. To address this, we propose Deep-BGCpred, a framework that improves on a deep learning model termed DeepBGC. The new model embeds multi-source protein family domains and employs a stacked Bidirectional Long Short-Term Memory model to boost the accuracy of BGC identification. In particular, it integrates two customized strategies, a sliding window strategy and dual-model serial screening, to improve the stability of the model's performance and reduce the number of false positives in BGC predictions. We compared the proposed model against other well-established methods on common benchmarks and achieved new state-of-the-art results. We expect that researchers working on genome mining for natural products may benefit greatly from our newly proposed method, Deep-BGCpred.
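The two Deep-BGCpred strategies named above can be illustrated in isolation. The sketch below assumes a centered moving average as the "sliding window" and a two-stage filter as "dual-model serial screening"; the function names, window size, and thresholds are hypothetical and are not taken from Deep-BGCpred itself.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def sliding_mean(scores: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at the sequence edges."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

def serial_screen(candidates: Sequence[T],
                  first_model: Callable[[T], float],
                  second_model: Callable[[T], float],
                  t1: float = 0.5, t2: float = 0.5) -> List[T]:
    """Keep candidates accepted by the first model, then re-score only those
    survivors with the second model and keep the ones it also accepts."""
    survivors = [c for c in candidates if first_model(c) >= t1]
    return [c for c in survivors if second_model(c) >= t2]
```

Because the second model only sees candidates the first one accepted, a candidate must pass both to survive, which is how serial screening can cut false positives at some cost in recall.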


2015
Author(s): Pablo Cruz-Morales, Christian E. Martínez-Guerrero, Marco A. Morales-Escalante, Luis Yáñez-Guerra, Johannes Florian Kopp, ...

Abstract Natural products have provided humans with antibiotics for millennia. However, a decline in the pace of chemical discovery exerts pressure on human health as antibiotic resistance spreads. The empirical nature of current genome mining approaches used in natural products research limits the chemical space that is explored. By integrating evolutionary concepts related to the emergence of metabolism, we gained fundamental insights that we translated into an alternative genome mining approach, termed EvoMining. Because the founding assumption of EvoMining is the evolution of enzymes, we used it to solve two milestone problems, revealing unprecedented conversions. First, we report the biosynthetic gene cluster of the ‘orphan’ metabolite leupeptin in Streptomyces roseus. Second, we discover an enzyme involved in the formation of an arsenic-carbon bond in Streptomyces coelicolor and Streptomyces lividans. This work provides evidence that the bacterial chemical repertoire is underexploited, and offers an approach to accelerate the discovery of novel antibiotics from bacterial genomes.


2020, Vol 8 (12), p. 2034
Author(s): Nils Gummerlich, Yuriy Rebets, Constanze Paulus, Josef Zapp, Andriy Luzhetskyy

Natural products are an important source of novel investigational compounds in drug discovery. Especially in the field of antibiotics, Actinobacteria have proven to be a reliable source of lead structures. The discovery of these natural products through activity- and structure-guided screening has been impeded by the constant rediscovery of previously identified compounds. Additionally, genome sequencing has revealed a large discrepancy between the natural products Actinobacteria produce and their biosynthetic potential, including in representatives of the order Pseudonocardiales. To turn this genomic potential into novel natural products, we used an approach combining the in silico pre-selection of unique biosynthetic gene clusters with their systematic heterologous expression. As a proof of concept, fifteen Saccharothrix espanaensis genomic library clones covering predicted biosynthetic gene clusters were chosen for expression in two heterologous hosts, Streptomyces lividans and Streptomyces albus. As a result, two novel natural products, an unusual angucyclinone, pentangumycin, and a new type II polyketide synthase shunt product, SEK90, were identified. After purification and structure elucidation, the biosynthetic pathways leading to the formation of pentangumycin and SEK90 were deduced using mutational analysis of the biosynthetic gene cluster and feeding experiments with ¹³C-labelled precursors.


Author(s): Patrick Videau, Kaitlyn Wells, Arun Singh, Jessie Eiting, Philip Proteau, ...

Cyanobacteria are prolific producers of natural products, and genome mining has shown that many orphan biosynthetic gene clusters can be found in sequenced cyanobacterial genomes. New tools and methodologies are required to investigate these biosynthetic gene clusters, and here we present the use of Anabaena sp. strain PCC 7120 as a host for combinatorial biosynthesis of natural products, using the indolactam natural products (lyngbyatoxin A, pendolmycin, and teleocidin B-4) as a test case. We were able to successfully produce all three compounds using codon-optimized genes from Actinobacteria. We also introduce a new plasmid backbone based on the native Anabaena 7120 plasmid pCC7120ζ and show that production of teleocidin B-4 can be accomplished using a two-plasmid system, which can be introduced by co-conjugation.


2019, Vol 116 (40), pp. 19805-19814
Author(s): Zachary L. Reitz, Clifford D. Hardy, Jaewon Suk, Jean Bouvet, Alison Butler

Genome mining of biosynthetic pathways streamlines discovery of secondary metabolites but can leave ambiguities in the predicted structures, which must be rectified experimentally. Through coupling the reactivity predicted by biosynthetic gene clusters with verified structures, the origin of the β-hydroxyaspartic acid diastereomers in siderophores is reported herein. Two functional subtypes of nonheme Fe(II)/α-ketoglutarate–dependent aspartyl β-hydroxylases are identified in siderophore biosynthetic gene clusters, which differ in genomic organization (existing either as fused domains (IβHAsp) at the carboxyl terminus of a nonribosomal peptide synthetase (NRPS) or as stand-alone enzymes (TβHAsp)), and each directs opposite stereoselectivity of Asp β-hydroxylation. The predictive power of this subtype delineation is confirmed by the stereochemical characterization of β-OHAsp residues in pyoverdine GB-1, delftibactin, histicorrugatin, and cupriachelin. The L-threo (2S, 3S) β-OHAsp residues of alterobactin arise from hydroxylation by the β-hydroxylase domain integrated into NRPS AltH, while L-erythro (2S, 3R) β-OHAsp in delftibactin arises from the stand-alone β-hydroxylase DelD. Cupriachelin contains both L-threo and L-erythro β-OHAsp, consistent with the presence of both types of β-hydroxylases in the biosynthetic gene cluster. A third subtype of nonheme Fe(II)/α-ketoglutarate–dependent enzymes (IβHHis) hydroxylates histidyl residues with L-threo stereospecificity. A previously undescribed, noncanonical member of the NRPS condensation domain superfamily is identified, named the interface domain, which is proposed to position the β-hydroxylase and the NRPS-bound amino acid prior to hydroxylation. Through mapping characterized β-OHAsp diastereomers to the phylogenetic tree of siderophore β-hydroxylases, methods to predict β-OHAsp stereochemistry in silico are realized.
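The subtype-to-stereochemistry correspondence reported above can be written as a lookup table. This is a minimal sketch that assumes each hydroxylase in a cluster has already been assigned to a subtype (the paper does this through genomic organization and phylogeny, which the sketch does not model); the enum, dictionary, and function names are hypothetical, and "β-OHHis" is shorthand for the hydroxylated histidyl residue.

```python
from enum import Enum
from typing import Iterable, Set, Tuple

class Hydroxylase(Enum):
    """Subtypes named in the abstract; subtype assignment is assumed done upstream."""
    I_BETA_H_ASP = "IβHAsp"  # domain fused to the C-terminus of an NRPS
    T_BETA_H_ASP = "TβHAsp"  # stand-alone enzyme
    I_BETA_H_HIS = "IβHHis"  # acts on histidyl residues

# Residue and stereochemistry each subtype is reported to produce.
PREDICTED_PRODUCT = {
    Hydroxylase.I_BETA_H_ASP: ("β-OHAsp", "L-threo (2S,3S)"),
    Hydroxylase.T_BETA_H_ASP: ("β-OHAsp", "L-erythro (2S,3R)"),
    Hydroxylase.I_BETA_H_HIS: ("β-OHHis", "L-threo"),
}

def predict(subtypes: Iterable[Hydroxylase]) -> Set[Tuple[str, str]]:
    """Return the hydroxylated residues expected from a cluster's hydroxylases."""
    return {PREDICTED_PRODUCT[s] for s in subtypes}
```

For a cluster that, like cupriachelin's, contains both Asp β-hydroxylase subtypes, `predict` returns both diastereomers.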


2020
Author(s): Suhad A.A. Al-Salihi, Ian Bull, Raghad A. Al-Salhi, Paul J. Gates, Kifah Salih, ...

Abstract There is a pressing need to continue the search for natural products with novel mechanisms of action to combat the constant increase in microbial drug resistance. Mushroom-forming fungi were previously neglected as a source of novel antibiotics because of the difficulties associated with their culture preparation and genetic tractability. However, modern fungal molecular and synthetic biology tools have renewed interest in exploring mushroom fungi for novel therapeutics. The aim of this study was to build a comprehensive picture of the secondary metabolites (SMs) of nine basidiomycetes, screening their biological and chemical properties to describe the genetic pathways associated with their production. H. fasciculare proved to be a highly active antagonistic species, with antimicrobial activity against three different microorganisms: Bacillus subtilis, Escherichia coli and Saccharomyces cerevisiae. Extensive genomic comparison and chemical analysis using analytical chromatography led to the characterisation of more than 15 variant biosynthetic gene clusters and the first identification in this species of a potent antibacterial metabolite, 3,5-dichloromethoxy benzoic acid (3,5-D), for which a biosynthetic gene cluster was predicted. This work demonstrates the great potential of mushroom-forming fungi as a currently unexplored reservoir of bioactive natural products, and shows that accessing their genomic data and structurally diverse natural products through modern computational analysis and efficient chemical methods could accelerate the development and application of such distinct molecules in both the pharmaceutical and agrochemical industries.


2020
Author(s): Yunchang Xie, Jiawen Chen, Bo Wang, Tai Chen, Junyu Chen, ...

Abstract Background: Activation of silent biosynthetic gene clusters (BGCs) in marine-derived actinomycete strains is a feasible strategy for discovering bioactive natural products. Actinoalloteichus sp. AHMU CJ021, isolated from the seashore, was shown to contain an intact but silent caerulomycin A (CRM A) BGC, cam, in its genome. Genome mining work was therefore performed to activate the strain's production of CRM A, an immunosuppressive drug lead with diverse bioactivities.
Results: To activate the expression of cam, ribosomal engineering was applied to the wild-type Actinoalloteichus sp. AHMU CJ021. The initial gentamycin-resistant mutant strain XC-11G, with a CRM A titer of 42.51 ± 4.22 mg/L, was selected from all generated mutant strains by comparing the expression of the essential biosynthetic gene camE. The CRM A titer was then improved by two strain breeding methods, UV mutagenesis and cofactor engineering to increase intracellular riboflavin, which finally yielded the optimal mutant strain XC-11GUR with a CRM A titer of 113.91 ± 7.58 mg/L. The titer of strain XC-11GUR was subsequently raised to 618.61 ± 16.29 mg/L through medium optimization together with further adjustment by response surface methodology. Given this 14.7-fold increase in CRM A titer over the initial value, strain XC-11GUR could be a good alternative strain for CRM A development.
Conclusions: We constructed an ideal CRM A producer. More importantly, our efforts also demonstrate the effectiveness of the combinatorial strategies described above, which are applicable to the genome mining of bioactive natural products from the abundant actinomycete strains.


Marine Drugs, 2019, Vol 17 (7), p. 388
Author(s): Li Liao, Shiyuan Su, Bin Zhao, Chengqi Fan, Jin Zhang, ...

Rare actinobacterial species are considered potential sources of new natural products. Marisediminicola antarctica ZS314T is the only type strain of the novel actinobacterial genus Marisediminicola, isolated from intertidal sediments in East Antarctica. Strain ZS314T produced reddish-orange pigments at low temperatures, showing characteristics of carotenoids. To understand the biosynthetic potential of this strain, its genome was completely sequenced for data mining. The complete genome had 3,352,609 base pairs (bp), much smaller than most actinomycete genomes. Five biosynthetic gene clusters (BGCs) were predicted in the genome, including a gene cluster responsible for the biosynthesis of a C50 carotenoid and four additional BGCs for an unknown oligosaccharide, salinixanthin, alkylresorcinol derivatives, and NRPS (non-ribosomal peptide synthetase) or amino acid-derived compounds. Further experimental characterization indicated that the strain may produce C.p.450-like carotenoids, supporting the genomic analysis. A new xanthorhodopsin gene was discovered during analysis of the salinixanthin biosynthetic gene cluster. Since little is known about this genus, this work improves our understanding of its biosynthetic potential and provides opportunities for further investigation of its natural products and strategies for adaptation to the extreme Antarctic environment.


2017, Vol 20 (4), pp. 1103-1113
Author(s): Kai Blin, Hyun Uk Kim, Marnix H Medema, Tilmann Weber

Abstract Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows core biosynthetic enzymes to be identified using genome mining strategies based on the sequence similarity of the involved enzymes/genes. However, mining for a variety of BGCs quickly reaches a level of complexity at which manual analysis is no longer possible, requiring automated genome mining pipelines such as the antiSMASH software. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats, such as rule-based BGC detection, sequence and annotation quality and cluster boundary prediction, which all have to be considered while planning for, performing and analyzing the results of genome mining studies.
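To make the rule-based detection caveat concrete, here is a toy detector in the same spirit: a cluster type is called when all of its required core domains occur in genes lying within a fixed distance of one another. The two rules, the 20 kb cutoff, and the function name `detect` are hypothetical simplifications, not antiSMASH's actual rule definitions; the domain labels are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified rules: a cluster type is called when every
# required core domain is found in genes lying close enough together.
RULES: Dict[str, Set[str]] = {
    "toy_T1PKS": {"PKS_KS", "PKS_AT"},
    "toy_NRPS": {"Condensation", "AMP-binding"},
}

def detect(genes: List[Tuple[int, Set[str]]], cutoff: int = 20_000) -> List[str]:
    """genes: (start coordinate in bp, domains hit in that gene).
    Returns the sorted names of rules satisfied in any gene neighbourhood."""
    found: Set[str] = set()
    for anchor, _ in genes:
        # Pool the domains of every gene whose start is within `cutoff` bp of the anchor.
        nearby: Set[str] = set()
        for pos, domains in genes:
            if abs(pos - anchor) <= cutoff:
                nearby |= domains
        for name, required in RULES.items():
            if required <= nearby:  # all required core domains present
                found.add(name)
    return sorted(found)
```

A cluster built from core enzymes that no rule describes returns no hit at all, which illustrates why rule-based tools can miss novel BGC classes, and the distance cutoff shows how cluster boundaries depend on an arbitrary parameter.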

