Targeted Genome Mining—From Compound Discovery to Biosynthetic Pathway Elucidation

Natural products are an important source of novel investigational compounds in drug discovery. Especially in the field of antibiotics, Actinobacteria have proven to be a reliable source of lead structures. The discovery of these natural products with activity- and structure-guided screenings has been impeded by the constant rediscovery of previously identified compounds. Additionally, genome sequencing has revealed a large discrepancy between produced natural products and biosynthetic potential in Actinobacteria, including representatives of the order Pseudonocardiales. To turn this genomic potential into novel natural products, we used an approach including the in-silico pre-selection of unique biosynthetic gene clusters followed by their systematic heterologous expression. As a proof of concept, fifteen Saccharothrix espanaensis genomic library clones covering predicted biosynthetic gene clusters were chosen for expression in two heterologous hosts, Streptomyces lividans and Streptomyces albus. As a result, two novel natural products, an unusual angucyclinone pentangumycin and a new type II polyketide synthase shunt product SEK90, were identified. After purification and structure elucidation, the biosynthetic pathways leading to the formation of pentangumycin and SEK90 were deduced using mutational analysis of the biosynthetic gene cluster and feeding experiments with 13C-labelled precursors.

Recapitulation of the evolution of biosynthetic gene clusters reveals hidden chemical diversity on bacterial genomes

DOI: 10.1101/020503; 2015; Cited By ~ 6

Author(s): Pablo Cruz-Morales, Christian E. Martínez-Guerrero, Marco A. Morales-Escalante, Luis Yáñez-Guerra, Johannes Florian Kopp, ...

Keyword(s): Natural Products, Chemical Space, Streptomyces Coelicolor, Genome Mining, Gene Clusters, Biosynthetic Gene Cluster, Chemical Diversity, Biosynthetic Gene, Bacterial Genomes, Biosynthetic Gene Clusters

Abstract Natural products have provided humans with antibiotics for millennia. However, a decline in the pace of chemical discovery exerts pressure on human health as antibiotic resistance spreads. The empirical nature of current genome mining approaches used for natural products research limits the chemical space that is explored. By integration of evolutionary concepts related to emergence of metabolism, we have gained fundamental insights that are translated into an alternative genome mining approach, termed EvoMining. As the founding assumption of EvoMining is the evolution of enzymes, we solved two milestone problems revealing unprecedented conversions. First, we report the biosynthetic gene cluster of the ‘orphan’ metabolite leupeptin in Streptomyces roseus. Second, we discover an enzyme involved in formation of an arsenic-carbon bond in Streptomyces coelicolor and Streptomyces lividans. This work provides evidence that bacterial chemical repertoire is underexploited, as well as an approach to accelerate the discovery of novel antibiotics from bacterial genomes.

Expanding the Natural Products Heterologous Expression Repertoire in the Model Cyanobacterium Anabaena sp. Strain PCC 7120: Production of Pendolmycin and Teleocidin B-4

DOI: 10.26434/chemrxiv.11316098.v1; 2019; Cited By ~ 1

Author(s): Patrick Videau, Kaitlyn Wells, Arun Singh, Jessie Eiting, Philip Proteau, ...

Keyword(s): Natural Products, Genome Mining, Gene Clusters, Combinatorial Biosynthesis, Test Case, Biosynthetic Gene, Biosynthetic Gene Clusters, Cyanobacterium Anabaena, Anabaena Sp, Pcc 7120

Cyanobacteria are prolific producers of natural products and genome mining has shown that many orphan biosynthetic gene clusters can be found in sequenced cyanobacterial genomes. New tools and methodologies are required to investigate these biosynthetic gene clusters and here we present the use of Anabaena sp. strain PCC 7120 as a host for combinatorial biosynthesis of natural products using the indolactam natural products (lyngbyatoxin A, pendolmycin, and teleocidin B-4) as a test case. We were able to successfully produce all three compounds using codon optimized genes from Actinobacteria. We also introduce a new plasmid backbone based on the native Anabaena 7120 plasmid pCC7120ζ and show that production of teleocidin B-4 can be accomplished using a two-plasmid system, which can be introduced by co-conjugation.

Identification and distribution of gene clusters required for synthesis of sphingolipid metabolism inhibitors in diverse species of the filamentous fungus Fusarium

BMC Genomics; DOI: 10.1186/s12864-020-06896-1; 2020; Vol 21 (1); Cited By ~ 1

Author(s): Hye-Seon Kim, Jessica M. Lohmar, Mark Busman, Daren W. Brown, Todd A. Naumann, ...

Keyword(s): Polyketide Synthase, Gene Clusters, Biosynthetic Gene Cluster, Sphingolipid Metabolism, Biosynthetic Gene, Biosynthetic Gene Clusters, Dehydrogenase Gene, Food And Feed, Feed Safety

Abstract

Background: Sphingolipids are structural components and signaling molecules in eukaryotic membranes, and many organisms produce compounds that inhibit sphingolipid metabolism. Some of the inhibitors are structurally similar to the sphingolipid biosynthetic intermediate sphinganine and are referred to as sphinganine-analog metabolites (SAMs). The mycotoxins fumonisins, which are frequent contaminants in maize, are one family of SAMs. Due to food and feed safety concerns, fumonisin biosynthesis has been investigated extensively, including characterization of the fumonisin biosynthetic gene cluster in the agriculturally important fungi Aspergillus and Fusarium. Production of several other SAMs has also been reported in fungi, but there is almost no information on their biosynthesis. There is also little information on how widely SAM production occurs in fungi or on the extent of structural variation of fungal SAMs.

Results: Using fumonisin biosynthesis as a model, we predicted that SAM biosynthetic gene clusters in fungi should include a polyketide synthase (PKS), an aminotransferase and a dehydrogenase gene. Surveys of genome sequences identified five putative clusters with this three-gene combination in 92 of 186 Fusarium species examined. Collectively, the putative SAM clusters were distributed widely but discontinuously among the species. We propose that the SAM5 cluster confers production of a previously reported Fusarium SAM, 2-amino-14,16-dimethyloctadecan-3-ol (AOD), based on the occurrence of AOD production only in species with the cluster and on deletion analysis of the SAM5 cluster PKS gene. We also identified SAM clusters in 24 species of other fungal genera, and propose that one of the clusters confers production of sphingofungin, a previously reported Aspergillus SAM.

Conclusion: Our results provide a genomics approach to identify novel SAM biosynthetic gene clusters in fungi, which should in turn contribute to identification of novel SAMs with applications in medicine and other fields. Information about novel SAMs could also provide insights into the role of SAMs in the ecology of fungi. Such insights have potential to contribute to strategies to reduce fumonisin contamination in crops and to control crop diseases caused by SAM-producing fungi.
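
The three-gene criterion described in this abstract (a PKS, an aminotransferase and a dehydrogenase gene occurring together) can be illustrated with a small co-occurrence scan. The following Python sketch is only an illustration and is not the authors' pipeline: the `Gene` record, the function labels, and the `window` parameter (how many gene positions away from the PKS still count as "clustered") are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    contig: str    # contig or chromosome name
    index: int     # order of the gene along the contig
    function: str  # predicted function label (hypothetical vocabulary)

# The three-gene combination named in the abstract.
REQUIRED = {"PKS", "aminotransferase", "dehydrogenase"}

def find_candidate_clusters(genes, window=10):
    """Return (contig, first_index, last_index) for every PKS gene that has
    all REQUIRED functions within `window` gene positions of it."""
    by_contig = {}
    for g in genes:
        by_contig.setdefault(g.contig, []).append(g)

    hits = []
    for contig, members in by_contig.items():
        members.sort(key=lambda g: g.index)
        for anchor in members:
            if anchor.function != "PKS":
                continue
            nearby = [g for g in members if abs(g.index - anchor.index) <= window]
            if REQUIRED <= {g.function for g in nearby}:
                core = [g for g in nearby if g.function in REQUIRED]
                hits.append((contig,
                             min(g.index for g in core),
                             max(g.index for g in core)))
    return hits

# Toy annotation: contig c1 has the full combination, c2 has it too far apart.
toy = [
    Gene("c1", 0, "transporter"),
    Gene("c1", 1, "PKS"),
    Gene("c1", 2, "aminotransferase"),
    Gene("c1", 3, "dehydrogenase"),
    Gene("c2", 0, "PKS"),
    Gene("c2", 50, "aminotransferase"),
    Gene("c2", 51, "dehydrogenase"),
]
print(find_candidate_clusters(toy, window=5))  # [('c1', 1, 3)]
```

A positional window is the simplest stand-in for "clustered"; a real survey would work from curated annotations and inspect the candidate regions by hand.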

Genomic analysis of siderophore β-hydroxylases reveals divergent stereocontrol and expands the condensation domain family

Proceedings of the National Academy of Sciences; DOI: 10.1073/pnas.1903161116; 2019; Vol 116 (40); pp. 19805-19814; Cited By ~ 8

Author(s): Zachary L. Reitz, Clifford D. Hardy, Jaewon Suk, Jean Bouvet, Alison Butler

Keyword(s): Predictive Power, Genome Mining, Genomic Analysis, Gene Clusters, Biosynthetic Gene Cluster, Biosynthetic Gene, Biosynthetic Gene Clusters, Peptide Synthetase, Condensation Domain

Genome mining of biosynthetic pathways streamlines discovery of secondary metabolites but can leave ambiguities in the predicted structures, which must be rectified experimentally. Through coupling the reactivity predicted by biosynthetic gene clusters with verified structures, the origin of the β-hydroxyaspartic acid diastereomers in siderophores is reported herein. Two functional subtypes of nonheme Fe(II)/α-ketoglutarate–dependent aspartyl β-hydroxylases are identified in siderophore biosynthetic gene clusters, which differ in genomic organization—existing either as fused domains (IβHAsp) at the carboxyl terminus of a nonribosomal peptide synthetase (NRPS) or as stand-alone enzymes (TβHAsp)—and each directs opposite stereoselectivity of Asp β-hydroxylation. The predictive power of this subtype delineation is confirmed by the stereochemical characterization of β-OHAsp residues in pyoverdine GB-1, delftibactin, histicorrugatin, and cupriachelin. The l-threo (2S, 3S) β-OHAsp residues of alterobactin arise from hydroxylation by the β-hydroxylase domain integrated into NRPS AltH, while l-erythro (2S, 3R) β-OHAsp in delftibactin arises from the stand-alone β-hydroxylase DelD. Cupriachelin contains both l-threo and l-erythro β-OHAsp, consistent with the presence of both types of β-hydroxylases in the biosynthetic gene cluster. A third subtype of nonheme Fe(II)/α-ketoglutarate–dependent enzymes (IβHHis) hydroxylates histidyl residues with l-threo stereospecificity. A previously undescribed, noncanonical member of the NRPS condensation domain superfamily is identified, named the interface domain, which is proposed to position the β-hydroxylase and the NRPS-bound amino acid prior to hydroxylation. Through mapping characterized β-OHAsp diastereomers to the phylogenetic tree of siderophore β-hydroxylases, methods to predict β-OHAsp stereochemistry in silico are realized.
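
The subtype-to-stereochemistry relationship stated in this abstract can be written down as a simple lookup. The sketch below encodes only what the abstract reports (fused IβHAsp domains give l-threo β-OHAsp, stand-alone TβHAsp enzymes give l-erythro β-OHAsp, and IβHHis hydroxylates histidyl residues with l-threo stereospecificity). The function name and data layout are hypothetical, and the sketch assumes the hydroxylase subtype has already been assigned by some other means.

```python
# Mapping taken from the abstract; keys are the subtype names used there.
PREDICTED_STEREO = {
    "IβHAsp": ("Asp", "L-threo (2S,3S)"),    # domain fused to the NRPS
    "TβHAsp": ("Asp", "L-erythro (2S,3R)"),  # stand-alone enzyme
    "IβHHis": ("His", "L-threo"),            # histidyl β-hydroxylase
}

def predict_hydroxy_residues(subtypes_in_cluster):
    """Return the sorted, de-duplicated (residue, configuration) pairs
    expected for the hydroxylase subtypes found in one gene cluster.
    Unknown subtype names are ignored."""
    found = {PREDICTED_STEREO[s] for s in subtypes_in_cluster if s in PREDICTED_STEREO}
    return sorted(found)

# A cluster carrying both Asp hydroxylase subtypes, as described for cupriachelin.
print(predict_hydroxy_residues(["IβHAsp", "TβHAsp"]))
```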

A Deep Learning Genome-Mining Strategy Improves Biosynthetic Gene Cluster Prediction

DOI: 10.1101/500694; 2018

Author(s): Geoffrey D. Hannigan, David Prihoda, Andrej Palicka, Jindrich Soukup, Ondrej Klempir, ...

Keyword(s): Natural Products, Deep Learning, Learning Strategy, Genome Mining, Gene Clusters, Biosynthetic Gene Cluster, Biosynthetic Gene, Antimicrobial Drugs, Drug Candidates, Significant Step

Abstract Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers more accurate BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing tools. We supplemented this with downstream random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represent a significant step forward for in-silico BGC identification.

Recent development of antiSMASH and other computational approaches to mine secondary metabolite biosynthetic gene clusters

Briefings in Bioinformatics; DOI: 10.1093/bib/bbx146; 2017; Vol 20 (4); pp. 1103-1113; Cited By ~ 37

Author(s): Kai Blin, Hyun Uk Kim, Marnix H Medema, Tilmann Weber

Keyword(s): Natural Products, Small Molecules, Sequence Similarity, Genome Mining, Gene Clusters, Biosynthetic Gene, Biosynthetic Gene Clusters, Rule Based, Chemical Structures, Annotation Quality

Abstract Many drugs are derived from small molecules produced by microorganisms and plants, so-called natural products. Natural products have diverse chemical structures, but the biosynthetic pathways producing those compounds are often organized as biosynthetic gene clusters (BGCs) and follow a highly conserved biosynthetic logic. This allows for the identification of core biosynthetic enzymes using genome mining strategies that are based on the sequence similarity of the involved enzymes/genes. However, mining for a variety of BGCs quickly approaches a complexity level where manual analyses are no longer possible and automated genome mining pipelines, such as the antiSMASH software, are required. In this review, we discuss the principles underlying the predictions of antiSMASH and other tools and provide practical advice for their application. Furthermore, we discuss important caveats such as rule-based BGC detection, sequence and annotation quality and cluster boundary prediction, which all have to be considered while planning for, performing and analyzing the results of genome mining studies.

MIBiG 2.0: a repository for biosynthetic gene clusters of known function

Nucleic Acids Research; DOI: 10.1093/nar/gkz882; 2019; Cited By ~ 31

Author(s): Satria A Kautsar, Kai Blin, Simon Shaw, Jorge C Navarro-Muñoz, Barbara R Terlouw, ...

Keyword(s): Genome Mining, Gene Clusters, Biosynthetic Gene Cluster, Biosynthetic Gene, Biosynthetic Gene Clusters, Data Schema, Cluster Data, Structure Databases, And Storage

Abstract Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and studying microbiome ecology. In these efforts, computational tools like antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual data curation of all entries to improve the annotation quality of our repository. We also redesigned the data schema to ensure the compliance of future annotations. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/.

Further Biochemical Profiling of Hypholoma fasciculare Metabolome Reveals Its Chemogenetic Diversity

Frontiers in Bioengineering and Biotechnology; DOI: 10.3389/fbioe.2021.567384; 2021; Vol 9

Author(s): Suhad A. A. Al-Salihi, Ian D. Bull, Raghad Al-Salhi, Paul J. Gates, Kifah S. M. Salih, ...

Keyword(s): Natural Products, Chemical Properties, Gene Clusters, Biosynthetic Gene Cluster, Biosynthetic Gene, Biosynthetic Gene Clusters, Microbial Drug Resistance, Bioactive Natural Products, Highly Active, Hypholoma Fasciculare

Natural products with novel chemistry are urgently needed to battle the continued increase in microbial drug resistance. Mushroom-forming fungi are underutilized as a source of novel antibiotics in the literature due to their challenging culture preparation and genetic intractability. However, modern fungal molecular and synthetic biology tools have renewed interest in exploring mushroom fungi for novel therapeutic agents. The aims of this study were to investigate the secondary metabolites of nine basidiomycetes, screen their biological and chemical properties, and then investigate the genetic pathways associated with their production. Of the nine fungi selected, Hypholoma fasciculare was revealed to be a highly active antagonistic species, with antimicrobial activity against three different microorganisms: Bacillus subtilis, Escherichia coli, and Saccharomyces cerevisiae. Genomic comparisons and chromatographic studies were employed to characterize more than 15 biosynthetic gene clusters and resulted in the identification of 3,5-dichloromethoxy benzoic acid as a potential antibacterial compound. The biosynthetic gene cluster for this product is also predicted. This study reinforces the potential of mushroom-forming fungi as an underexplored reservoir of bioactive natural products. Access to genomic data, and chemical-based frameworks, will assist the development and application of novel molecules with applications in both the pharmaceutical and agrochemical industries.

Synergistic activity of cosecreted natural products from amoebae-associated bacteria

Proceedings of the National Academy of Sciences; DOI: 10.1073/pnas.1721790115; 2018; Vol 115 (15); pp. 3758-3763; Cited By ~ 27

Author(s): Johannes Arp, Sebastian Götze, Ruchira Mukherji, Derek J. Mattern, María García-Altares, ...

Keyword(s): Natural Products, Polyketide Synthase, Bacterial Genome, Gene Clusters, Homoserine Lactone, Microbial Interactions, Signal Molecules, Synergistic Activity, Biosynthetic Gene, Biosynthetic Gene Clusters

Investigating microbial interactions from an ecological perspective is a particularly fruitful approach to unveil both new chemistry and bioactivity. Microbial predator–prey interactions in particular rely on natural products as signal or defense molecules. In this context, we identified a grazing-resistant Pseudomonas strain, isolated from the bacterivorous amoeba Dictyostelium discoideum. Genome analysis of this bacterium revealed the presence of two biosynthetic gene clusters that were found adjacent to each other on a contiguous stretch of the bacterial genome. Although one cluster codes for the polyketide synthase producing the known antibiotic mupirocin, the other cluster encodes a nonribosomal peptide synthetase leading to the unreported cyclic lipopeptide jessenipeptin. We describe its complete structure elucidation, as well as its synergistic activity against methicillin-resistant Staphylococcus aureus, when in combination with mupirocin. Both biosynthetic gene clusters are regulated by quorum-sensing systems, with 3-oxo-decanoyl homoserine lactone (3-oxo-C10-AHL) and hexanoyl homoserine lactone (C6-AHL) being the respective signal molecules. This study highlights the regulation, richness, and complex interplay of bacterial natural products that emerge in the context of microbial competition.

Deep-BGCpred: A unified deep learning genome-mining framework for biosynthetic gene cluster prediction

DOI: 10.1101/2021.11.15.468547; 2021

Author(s): Ziyi Yang, Benben Liao, Changyu Hsieh, Chao Han, Liang Fang, ...

Keyword(s): Natural Products, Deep Learning, High Throughput Sequencing, Short Term Memory, Genome Mining, Gene Clusters, Biosynthetic Gene Cluster, Bioactive Molecules, Dual Model, Biosynthetic Gene

Natural products produced by microorganisms constitute an important source of essential pharmaceuticals, including antimicrobial and anti-tumor drugs. These bioactive molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The rapid increase of microbial genomics resources, due to the availability of high-throughput sequencing technologies, has spurred the development of computational methods for microbial genome mining for BGC discovery. Current machine learning methods, however, have had limited success in uncovering novel BGCs due to an excessive number of false positives in their predictions. To this end, we propose Deep-BGCpred, a framework that effectively addresses the aforementioned issue by improving a deep learning model termed DeepBGC. The new model embeds multi-source protein family domains and employs a stacked Bidirectional Long Short-Term Memory model to boost accuracy for BGC identification. In particular, it integrates two customized strategies, a sliding window strategy and dual-model serial screening, to improve the model's performance stability and reduce the number of false positives in BGC predictions. We compare the proposed model against other well-established methods on common benchmarks and achieve new state-of-the-art results with convincing evidence. We expect that researchers working on genome mining for natural products may benefit greatly from our newly proposed method, Deep-BGCpred.
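
The abstract names a sliding window strategy and dual-model serial screening but does not describe how either is implemented. The Python sketch below is therefore a generic illustration of the two ideas, not Deep-BGCpred's code: per-gene scores from a first model are smoothed with a moving average, regions above a threshold are called, and a region is kept only if a second model's mean score over it also passes. All function names, thresholds, and the choice of a mean as the aggregate are assumptions made for this example.

```python
def sliding_window_mean(scores, width=3):
    """Smooth per-gene scores with a centered moving average
    (the window is truncated at the sequence ends)."""
    half = width // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

def call_regions(scores, threshold=0.5):
    """Return [start, end) index spans of consecutive scores >= threshold."""
    regions, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions

def serial_screen(first_scores, second_scores, width=3, t1=0.5, t2=0.5):
    """Two-stage filter: call regions from the smoothed first-model scores,
    then keep a region only if the second model's mean score over it >= t2."""
    kept = []
    for start, end in call_regions(sliding_window_mean(first_scores, width), t1):
        mean2 = sum(second_scores[start:end]) / (end - start)
        if mean2 >= t2:
            kept.append((start, end))
    return kept

# Toy per-gene scores (made up): two candidate regions, one rejected in stage 2.
first = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.8, 0.9, 0.8]
second = [0.8, 0.7, 0.9, 0.0, 0.0, 0.0, 0.2, 0.1, 0.3]
print(serial_screen(first, second))  # [(0, 3)]
```

Running the models in series trades some recall for fewer false positives, since a region must pass both stages to be reported.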
