The antiSMASH database version 3: increased taxonomic coverage and new query features for modular enzymes

Abstract Microorganisms produce natural products that are frequently used in the development of antibacterial, antiviral, and anticancer drugs, pesticides, herbicides, or fungicides. In recent years, genome mining has evolved into a prominent method to access this potential. antiSMASH is one of the most popular tools for this task. Here, we present version 3 of the antiSMASH database, providing a means to access and query precomputed antiSMASH-5.2-detected biosynthetic gene clusters from representative, publicly available, high-quality microbial genomes via an interactive graphical user interface. In version 3, the database contains 147 517 high-quality BGC regions from 388 archaeal, 25 236 bacterial and 177 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/.

Panning for gold in mould: can we increase the odds for fungal genome mining?

Organic & Biomolecular Chemistry, 2018, Vol 16 (10), pp. 1620-1626, doi: 10.1039/c7ob03127k

Cited By ~ 10

Author(s): Cameron L. M. Gilchrist, Hang Li, Yit-Heng Chooi

Keyword(s): Secondary Metabolite, Genome Mining, Gene Clusters, Fungal Genome, Biosynthetic Gene, Biosynthetic Gene Clusters, Fungal Genomes

A perspective on existing and emerging strategies for the prioritisation of secondary metabolite biosynthetic gene clusters (BGCs) to increase the odds of fruitful mining of fungal genomes.

BiG-SLiCE: A Highly Scalable Tool Maps the Diversity of 1.2 Million Biosynthetic Gene Clusters

2020, doi: 10.1101/2020.08.17.240838

Cited By ~ 3

Author(s): Satria A. Kautsar, Justin J. J. van der Hooft, Dick de Ridder, Marnix H. Medema

Keyword(s): Natural Product, Biological Activities, Genome Mining, Gene Clusters, Genomic Diversity, Biosynthetic Gene, Biosynthetic Gene Clusters, Microbial Genomes, Natural Product Discovery, User Friendly

Abstract. Background: Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools suffer from a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). Results: Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes (MAGs) within ten days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a "query mode" that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. Conclusions: BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global, searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.
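The non-pairwise grouping idea can be illustrated with a small sketch. It assumes each BGC has already been converted into a fixed-length numeric feature vector (for example, one value per protein-domain profile) and uses a simple single-pass leader clustering as a stand-in; this is not BiG-SLiCE's own clustering algorithm, and the vectors and threshold below are hypothetical. Each BGC is compared only with the current family centroids, never with every other BGC.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(vectors: List[List[float]], threshold: float) -> List[int]:
    """Assign each BGC vector to a family (GCF) in a single pass.

    Simplified leader clustering (hypothetical stand-in, not BiG-SLiCE's
    algorithm): a vector joins the nearest existing centroid if it lies
    within `threshold`, otherwise it founds a new family.
    Returns one family index per input vector.
    """
    centroids: List[List[float]] = []
    sizes: List[int] = []
    labels: List[int] = []
    for vec in vectors:
        best, best_dist = -1, float("inf")
        for i, c in enumerate(centroids):
            d = euclidean(vec, c)
            if d < best_dist:
                best, best_dist = i, d
        if best >= 0 and best_dist <= threshold:
            # Update the running mean of the chosen family's centroid.
            n = sizes[best] + 1
            centroids[best] = [c + (v - c) / n for c, v in zip(centroids[best], vec)]
            sizes[best] = n
            labels.append(best)
        else:
            # Too far from every existing family: start a new one.
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels


# Hypothetical 3-feature vectors for four BGCs.
bgcs = [[10.0, 0.0, 5.0], [11.0, 0.5, 5.0], [0.0, 20.0, 0.0], [0.5, 19.0, 0.0]]
print(leader_cluster(bgcs, threshold=3.0))  # [0, 0, 1, 1]
```

Because each BGC is compared with K family centroids rather than with all N − 1 other BGCs, the cost scales with N × K instead of N²; the trade-off of this simplified scheme is that its result depends on input order and on the chosen threshold.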

An interpreted atlas of biosynthetic gene clusters from 1,000 fungal genomes

Proceedings of the National Academy of Sciences, 2021, Vol 118 (19), pp. e2020230118, doi: 10.1073/pnas.2020230118

Author(s): Matthew T. Robey, Lindsay K. Caesar, Milton T. Drott, Nancy P. Keller, Neil L. Kelleher

Keyword(s): Natural Products, Chemical Space, Genome Mining, Fold Increase, Gene Clusters, Biosynthetic Gene, Biosynthetic Gene Clusters, Automated Annotation, Fungal Genomes, Species Specific

Fungi are prolific producers of natural products, compounds which have had a large societal impact as pharmaceuticals, mycotoxins, and agrochemicals. Despite the availability of over 1,000 fungal genomes and several decades of compound discovery efforts from fungi, the biosynthetic gene clusters (BGCs) encoded by these genomes and the associated chemical space have yet to be analyzed systematically. Here, we provide detailed annotation and analyses of fungal biosynthetic and chemical space to enable genome mining and discovery of fungal natural products. Using 1,037 genomes from species across the fungal kingdom (e.g., Ascomycota, Basidiomycota, and non-Dikarya taxa), 36,399 predicted BGCs were organized into a network of 12,067 gene cluster families (GCFs). Anchoring these GCFs with reference BGCs enabled automated annotation of 2,026 BGCs with predicted metabolite scaffolds. We performed parallel analyses of the chemical repertoire of fungi, organizing 15,213 fungal compounds into 2,945 molecular families (MFs). The taxonomic landscape of fungal GCFs is largely species specific, though select families such as the equisetin GCF are present across vast phylogenetic distances with parallel diversifications in the GCF and MF. We compare these fungal datasets with a set of 5,453 bacterial genomes and their BGCs and 9,382 bacterial compounds, revealing dramatic differences between bacterial and fungal biosynthetic logic and chemical space. These genomics and cheminformatics analyses reveal the large extent to which fungal and bacterial sources represent distinct compound reservoirs. With a >10-fold increase in the number of interpreted strains and annotated BGCs, this work better regularizes the biosynthetic potential of fungi for rational compound discovery.
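Two steps in this abstract lend themselves to a sketch: grouping predicted BGCs into a network of GCFs, and annotating every BGC in a family that also contains a characterized reference BGC. The sketch below assumes precomputed pairwise similarity scores in [0, 1], treats the connected components of the thresholded similarity network as families, and labels all members of a family with the scaffold of its reference BGC(s). The BGC IDs, scores, cutoff and labels are hypothetical; the study's actual similarity metric and cutoffs are not reproduced here.

```python
from typing import Dict, List, Tuple


def find(parent: Dict[str, str], x: str) -> str:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def gcfs_from_network(
    bgcs: List[str],
    similarities: List[Tuple[str, str, float]],
    cutoff: float,
) -> List[List[str]]:
    """Families = connected components of the network whose edges are
    BGC pairs with similarity >= cutoff."""
    parent = {b: b for b in bgcs}
    for a, b, score in similarities:
        if score >= cutoff:
            parent[find(parent, a)] = find(parent, b)
    groups: Dict[str, List[str]] = {}
    for b in bgcs:
        groups.setdefault(find(parent, b), []).append(b)
    return list(groups.values())


def annotate_from_references(
    families: List[List[str]], references: Dict[str, str]
) -> Dict[str, str]:
    """Give every BGC in a family the scaffold label(s) of its reference BGCs."""
    annotations: Dict[str, str] = {}
    for family in families:
        labels = sorted({references[b] for b in family if b in references})
        if labels:
            for b in family:
                annotations[b] = "/".join(labels)
    return annotations


# Hypothetical network: one reference BGC and three predicted BGCs.
bgcs = ["ref_equisetin", "bgc1", "bgc2", "bgc3"]
sims = [("ref_equisetin", "bgc1", 0.8), ("bgc1", "bgc2", 0.7), ("bgc2", "bgc3", 0.1)]
families = gcfs_from_network(bgcs, sims, cutoff=0.5)
print(annotate_from_references(families, {"ref_equisetin": "equisetin"}))
```

In this toy network, bgc1 and bgc2 inherit the equisetin label through their family, while bgc3 stays unannotated because its only edge falls below the cutoff.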

An Interpreted Atlas of Biosynthetic Gene Clusters from 1000 Fungal Genomes

2020, doi: 10.1101/2020.09.21.307157

Author(s): Matthew T. Robey, Lindsay K. Caesar, Milton T. Drott, Nancy P. Keller, Neil L. Kelleher

Keyword(s): Natural Products, Large Scale, Ad Hoc, Chemical Space, Genome Mining, Fold Increase, Gene Clusters, Biosynthetic Gene, Biosynthetic Gene Clusters, Fungal Genomes

Abstract: Fungi are prolific producers of natural products, compounds which have had a large societal impact as pharmaceuticals, mycotoxins, and agrochemicals. Despite the availability of over 1000 fungal genomes and several decades of compound discovery efforts from fungi, the biosynthetic gene clusters (BGCs) encoded by these genomes and the associated chemical space have yet to be analyzed systematically. Here we provide detailed annotation and analyses of fungal biosynthetic and chemical space to enable genome mining and discovery of fungal natural products. Using 1037 genomes from species across the fungal kingdom (e.g., Ascomycota, Basidiomycota, and non-Dikarya taxa), 36,399 predicted BGCs were organized into a network of 12,067 gene cluster families (GCFs). Anchoring these GCFs with reference BGCs enabled automated annotation of 2,026 BGCs with predicted metabolite scaffolds. We performed parallel analyses of the chemical repertoire of fungi, organizing 15,213 fungal compounds into 2,945 molecular families (MFs). The taxonomic landscape of fungal GCFs is largely species-specific, though select families such as the equisetin GCF are present across vast phylogenetic distances with parallel diversifications in the GCF and MF. We compare these fungal datasets with a set of 5,453 bacterial genomes and their BGCs and 9,382 bacterial compounds, revealing dramatic differences between bacterial and fungal biosynthetic logic and chemical space. These genomics and cheminformatics analyses reveal the large extent to which fungal and bacterial sources represent distinct compound reservoirs. With a >10-fold increase in the number of interpreted strains and annotated BGCs, this work better regularizes the biosynthetic potential of fungi for rational compound discovery.

Significance Statement: Fungi represent an underexploited resource for new compounds with applications in the pharmaceutical and agriscience industries. Despite the availability of >1000 fungal genomes, our knowledge of the biosynthetic space encoded by these genomes is limited and ad hoc. We present results from systematically organizing the biosynthetic content of 1037 fungal genomes, providing a resource for data-driven genome mining and for large-scale comparison of the genetic and molecular repertoires of fungi with those of bacteria.
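The abstract also describes organizing compounds into molecular families (MFs). A common way to compare chemical structures is the Tanimoto coefficient on binary fingerprints, |A ∩ B| / (|A| + |B| − |A ∩ B|). The sketch below assumes each compound has already been reduced to a set of "on" fingerprint bit positions and computes the similarity edges from which families can then be formed (for instance, as connected components). The compound names, bit sets and cutoff are hypothetical, and the abstract does not state which similarity measure the study used.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_edges(
    fingerprints: Dict[str, FrozenSet[int]], cutoff: float
) -> List[Tuple[str, str, float]]:
    """All compound pairs whose Tanimoto similarity reaches the cutoff."""
    edges: List[Tuple[str, str, float]] = []
    for (n1, f1), (n2, f2) in combinations(sorted(fingerprints.items()), 2):
        score = tanimoto(f1, f2)
        if score >= cutoff:
            edges.append((n1, n2, score))
    return edges


# Hypothetical fingerprints for three compounds.
fps = {
    "cpd_A": frozenset({1, 2, 3, 4}),
    "cpd_B": frozenset({1, 2, 3, 5}),
    "cpd_C": frozenset({10, 11}),
}
print(similarity_edges(fps, cutoff=0.5))  # cpd_A and cpd_B share 3 of 5 distinct bits
```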

BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters

GigaScience, 2021, Vol 10 (1), doi: 10.1093/gigascience/giaa154

Cited By ~ 1

Author(s): Satria A Kautsar, Justin J J van der Hooft, Dick de Ridder, Marnix H Medema

Keyword(s): Natural Product, Biological Activities, Genome Mining, Gene Clusters, Genomic Diversity, Biosynthetic Gene, Biosynthetic Gene Clusters, Microbial Genomes, Natural Product Discovery, User Friendly

Abstract. Background: Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). Results: Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a “query mode” that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. Conclusions: BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.
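The “query mode” places new BGCs into previously computed GCFs. Assuming each GCF is summarized by a centroid vector in the same numeric feature space as the query BGC, that placement can be sketched as a nearest-centroid lookup with a distance cutoff. The function name, GCF IDs, vectors and threshold are hypothetical; this is not BiG-SLiCE's code or command-line interface.

```python
import math
from typing import Dict, Optional, Sequence


def assign_to_gcf(
    query: Sequence[float],
    gcf_centroids: Dict[str, Sequence[float]],
    threshold: float,
) -> Optional[str]:
    """Return the ID of the closest precomputed GCF centroid, or None when no
    centroid lies within `threshold` (Euclidean distance) of the query."""
    best_id: Optional[str] = None
    best_dist = float("inf")
    for gcf_id, centroid in gcf_centroids.items():
        dist = math.dist(query, centroid)  # Python 3.8+
        if dist < best_dist:
            best_id, best_dist = gcf_id, dist
    return best_id if best_dist <= threshold else None


# Hypothetical precomputed family centroids.
centroids = {"GCF_A": [10.0, 0.0, 5.0], "GCF_B": [0.0, 20.0, 0.0]}
print(assign_to_gcf([9.0, 1.0, 5.0], centroids, threshold=3.0))     # GCF_A
print(assign_to_gcf([50.0, 50.0, 50.0], centroids, threshold=3.0))  # None
```

Returning None rather than forcing an assignment lets a query BGC that fits no known family be flagged as potentially novel.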

Expanding the Natural Products Heterologous Expression Repertoire in the Model Cyanobacterium Anabaena sp. Strain PCC 7120: Production of Pendolmycin and Teleocidin B-4

2019, doi: 10.26434/chemrxiv.11316098.v1

Cited By ~ 1

Author(s): Patrick Videau, Kaitlyn Wells, Arun Singh, Jessie Eiting, Philip Proteau, ...

Keyword(s): Natural Products, Genome Mining, Gene Clusters, Combinatorial Biosynthesis, Test Case, Biosynthetic Gene, Biosynthetic Gene Clusters, Cyanobacterium Anabaena, Anabaena sp., PCC 7120

Cyanobacteria are prolific producers of natural products, and genome mining has shown that many orphan biosynthetic gene clusters can be found in sequenced cyanobacterial genomes. New tools and methodologies are required to investigate these biosynthetic gene clusters, and here we present the use of Anabaena sp. strain PCC 7120 as a host for combinatorial biosynthesis of natural products, using the indolactam natural products (lyngbyatoxin A, pendolmycin, and teleocidin B-4) as a test case. We were able to successfully produce all three compounds using codon-optimized genes from Actinobacteria. We also introduce a new plasmid backbone based on the native Anabaena 7120 plasmid pCC7120ζ and show that production of teleocidin B-4 can be accomplished using a two-plasmid system, which can be introduced by co-conjugation.
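In its simplest form, codon optimization back-translates a protein by choosing a host-preferred codon for each amino acid. The sketch below assumes a hypothetical preference table covering only a handful of amino acids: the synonymous codons are real, but which one is "preferred" was picked arbitrarily for illustration and is not Anabaena codon usage. Practical codon-optimization tools also weigh factors such as GC content, mRNA structure and repeats.

```python
from typing import Dict

# Hypothetical host preferences: one real synonymous codon per amino acid,
# chosen arbitrarily for illustration (NOT measured Anabaena codon usage).
PREFERRED_CODON: Dict[str, str] = {
    "M": "ATG", "K": "AAA", "L": "CTT", "G": "GGT", "S": "AGC",
    "W": "TGG", "E": "GAA", "D": "GAT", "A": "GCT", "V": "GTT",
    "*": "TAA",  # stop codon
}


def back_translate(protein: str, table: Dict[str, str] = PREFERRED_CODON) -> str:
    """Return a DNA sequence encoding `protein`, one preferred codon per residue."""
    residues = protein.upper()
    missing = sorted({aa for aa in residues if aa not in table})
    if missing:
        raise ValueError(f"no preferred codon for: {', '.join(missing)}")
    return "".join(table[aa] for aa in residues)


print(back_translate("MKLV*"))  # ATGAAACTTGTTTAA
```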

Discovery of Unusual Cyanobacterial Tryptophan-containing Anabaenopeptins by MS/MS Based Molecular Networking

2020, doi: 10.20944/preprints202007.0562.v1

Author(s): Subhasish Saha, Germana Esposito, Petra Urajova, Jan Mareš, Daniela Ewe, ...

Keyword(s): Multidisciplinary Approach, Spectroscopic Analysis, Chemical Space, Genome Mining, GC Content, Gene Clusters, Biosynthetic Gene, Biosynthetic Gene Clusters, HeLa Cell Lines, Bioactive Secondary Metabolites

Heterocytous cyanobacteria are among the most prolific sources of bioactive secondary metabolites, including anabaenopeptins (APTs). A terrestrial filamentous Brasilonema sp. CT11, collected as a black mat in a bamboo forest in Costa Rica, was studied using a multidisciplinary approach: genome mining and HPLC-HRMS/MS coupled with bioinformatic analyses. Herein, we report the nearly complete genome, consisting of 8.79 Mbp with a GC content of 42.4%. Moreover, we report three novel tryptophan-containing APTs: anabaenopeptin 788 (1), anabaenopeptin 802 (2) and anabaenopeptin 816 (3). Further, the structures of two homologues, i.e., anabaenopeptin 802 (2a) and anabaenopeptin 802 (2b), were determined by spectroscopic analysis (NMR and MS). Both compounds were shown to exert weak to moderate antiproliferative activity against HeLa cells. This study also highlights the unique and diverse potential of the biosynthetic gene clusters of this genus and assesses the predicted chemical space yet to be discovered from it.
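GC content, as reported for the Brasilonema assembly, is the fraction of G and C among the nucleotides of the genome. A minimal sketch, assuming contigs are given as plain nucleotide strings; here ambiguous bases such as N are excluded from the denominator, which is one common convention (tools differ on this point). The example sequences are hypothetical.

```python
from typing import Iterable


def gc_content(contigs: Iterable[str]) -> float:
    """Percentage of G+C among unambiguous A/C/G/T bases across all contigs."""
    gc = at = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / total


# Hypothetical contigs: 4 G/C out of 6 unambiguous bases (the Ns are ignored).
print(round(gc_content(["ATGC", "GGNN"]), 1))  # 66.7
```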

An Update on Molecular Tools for Genetic Engineering of Actinomycetes—The Source of Important Antibiotics and Other Valuable Compounds

Antibiotics, 2020, Vol 9 (8), pp. 494, doi: 10.3390/antibiotics9080494

Author(s): Lena Mitousis, Yvonne Thoma, Ewa M. Musiol-Kroll

Keyword(s): Genetic Engineering, Genome Mining, Gene Clusters, Biosynthetic Gene, Molecular Tools, Biosynthetic Gene Clusters, Streptomyces Antibioticus, The Past, Bioactive Agents, Valuable Compounds

The first antibiotic-producing actinomycete (Streptomyces antibioticus) was described by Waksman and Woodruff in 1940. This discovery initiated the “actinomycetes era”, in which several species were identified and shown to be a great source of bioactive compounds. However, this remarkable group of microorganisms and its potential for producing bioactive agents have been only partially exploited. This is because many actinomycetes cannot be cultivated on artificial media under laboratory conditions. In addition, sequencing, genome mining and bioactivity screening have revealed that numerous biosynthetic gene clusters (BGCs) encoded in actinomycete genomes are not expressed, so their potential products remain uncharacterized. Therefore, considerable effort has gone into developing technologies that facilitate access to actinomycete genomes and the activation of their biosynthetic pathways. In this review, we mainly focus on molecular tools and methods for genetic engineering of actinomycetes that have emerged in the field in the past five years (2015–2020). In addition, we highlight examples of successful applications of these recently developed technologies to activate and/or improve the biosynthesis of secondary metabolites in actinomycetes.

Genomic Assemblies of Members of Burkholderia and Related Genera as a Resource for Natural Product Discovery

Microbiology Resource Announcements, 2020, Vol 9 (42), doi: 10.1128/mra.00485-20

Author(s): Alex J. Mullins, Cerith Jones, Matthew J. Bull, Gordon Webster, Julian Parkhill, ...

Keyword(s): Natural Product, Genome Mining, Genomic Analysis, Gene Clusters, Biosynthetic Gene, Biosynthetic Gene Clusters, Natural Product Discovery

ABSTRACT The genomes of 450 members of Burkholderiaceae, isolated from clinical and environmental sources, were sequenced and assembled as a resource for genome mining. Genomic analysis of the collection has enabled the identification of multiple metabolites and their biosynthetic gene clusters, including the antibiotics gladiolin, icosalide A, enacyloxin, and cepacin A.

Genomic analysis of siderophore β-hydroxylases reveals divergent stereocontrol and expands the condensation domain family

Proceedings of the National Academy of Sciences, 2019, Vol 116 (40), pp. 19805-19814, doi: 10.1073/pnas.1903161116

Cited By ~ 8

Author(s): Zachary L. Reitz, Clifford D. Hardy, Jaewon Suk, Jean Bouvet, Alison Butler

Keyword(s): Predictive Power, Genome Mining, Genomic Analysis, Gene Clusters, Biosynthetic Gene Cluster, Biosynthetic Gene, Biosynthetic Gene Clusters, Peptide Synthetase, Condensation Domain

Genome mining of biosynthetic pathways streamlines discovery of secondary metabolites but can leave ambiguities in the predicted structures, which must be rectified experimentally. Through coupling the reactivity predicted by biosynthetic gene clusters with verified structures, the origin of the β-hydroxyaspartic acid diastereomers in siderophores is reported herein. Two functional subtypes of nonheme Fe(II)/α-ketoglutarate–dependent aspartyl β-hydroxylases are identified in siderophore biosynthetic gene clusters, which differ in genomic organization, existing either as fused domains (IβHAsp) at the carboxyl terminus of a nonribosomal peptide synthetase (NRPS) or as stand-alone enzymes (TβHAsp), and each directs opposite stereoselectivity of Asp β-hydroxylation. The predictive power of this subtype delineation is confirmed by the stereochemical characterization of β-OHAsp residues in pyoverdine GB-1, delftibactin, histicorrugatin, and cupriachelin. The L-threo (2S, 3S) β-OHAsp residues of alterobactin arise from hydroxylation by the β-hydroxylase domain integrated into NRPS AltH, while L-erythro (2S, 3R) β-OHAsp in delftibactin arises from the stand-alone β-hydroxylase DelD. Cupriachelin contains both L-threo and L-erythro β-OHAsp, consistent with the presence of both types of β-hydroxylases in the biosynthetic gene cluster. A third subtype of nonheme Fe(II)/α-ketoglutarate–dependent enzymes (IβHHis) hydroxylates histidyl residues with L-threo stereospecificity. A previously undescribed, noncanonical member of the NRPS condensation domain superfamily is identified, named the interface domain, which is proposed to position the β-hydroxylase and the NRPS-bound amino acid prior to hydroxylation. Through mapping characterized β-OHAsp diastereomers to the phylogenetic tree of siderophore β-hydroxylases, methods to predict β-OHAsp stereochemistry in silico are realized.
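The subtype-to-stereochemistry rule summarized in this abstract can be written as a small lookup, which makes the prediction logic explicit. The sketch assumes the hydroxylases in a cluster have already been classified into the three subtypes named above (IβHAsp, TβHAsp, IβHHis); the table, the function name and the string labels are hypothetical and do not correspond to any published tool.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical lookup built from the abstract: integrated (fused) Asp
# hydroxylases give L-threo (2S, 3S) β-OHAsp, stand-alone Asp hydroxylases
# give L-erythro (2S, 3R) β-OHAsp, and the integrated His hydroxylase
# subtype acts with L-threo stereospecificity.
SUBTYPE_OUTCOME: Dict[str, Tuple[str, str]] = {
    "IβHAsp": ("β-OHAsp", "L-threo (2S, 3S)"),
    "TβHAsp": ("β-OHAsp", "L-erythro (2S, 3R)"),
    "IβHHis": ("β-OHHis", "L-threo"),
}


def predict_hydroxylated_residues(subtypes: Iterable[str]) -> Set[Tuple[str, str]]:
    """Predicted (residue, configuration) pairs for a cluster's hydroxylase subtypes.

    Unknown subtype labels raise KeyError instead of being silently ignored.
    """
    return {SUBTYPE_OUTCOME[s] for s in subtypes}


# A delftibactin-like cluster (stand-alone hydroxylase only) and a
# cupriachelin-like cluster (both Asp hydroxylase types).
print(predict_hydroxylated_residues(["TβHAsp"]))
print(predict_hydroxylated_residues(["IβHAsp", "TβHAsp"]))
```

The second call mirrors the cupriachelin case described above: a cluster encoding both Asp hydroxylase types is predicted to yield both β-OHAsp diastereomers.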
