BiG-SLiCE: A Highly Scalable Tool Maps the Diversity of 1.2 Million Biosynthetic Gene Clusters

Abstract Background: Genome mining for Biosynthetic Gene Clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools suffer from a bottleneck caused by the expensive network-based approach used to group these BGCs into Gene Cluster Families (GCFs). Results: Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes (MAGs) within ten days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a "query mode" that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. Conclusions: BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global, searchable interconnected network of BGCs. As more genomes get sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.


BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters

GigaScience ◽

10.1093/gigascience/giaa154 ◽

2021 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Satria A Kautsar ◽

Justin J J van der Hooft ◽

Dick de Ridder ◽

Marnix H Medema

Keyword(s):

Natural Product ◽

Biological Activities ◽

Genome Mining ◽

Gene Clusters ◽

Genomic Diversity ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Microbial Genomes ◽

Natural Product Discovery ◽

User Friendly

Abstract Background: Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). Results: Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a “query mode” that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. Conclusions: BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.
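
The abstract above describes grouping BGCs in Euclidean space without pairwise comparison. As a rough illustration only, the Python sketch below shows one way a non-pairwise scheme can work: each feature vector is compared against a running set of family centroids rather than against every other vector. This is a simple single-pass "leader" clustering chosen for illustration and is not presented as BiG-SLiCE's actual algorithm. The function names (`assign_to_families`, `query_family`), the toy two-dimensional vectors, and the distance threshold are all assumptions made here; real BGC feature vectors would have far more dimensions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(vector, centroids):
    """Return (index, distance) of the closest centroid, or (None, None) if there are none."""
    best_i, best_d = None, None
    for i, c in enumerate(centroids):
        d = euclidean(vector, c)
        if best_d is None or d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def assign_to_families(vectors, threshold):
    """Single-pass clustering (hypothetical helper): each vector joins the nearest
    centroid if it lies within `threshold`, otherwise it starts a new family.
    Vectors are never compared with each other, only with centroids."""
    centroids, counts, labels = [], [], []
    for v in vectors:
        i, d = nearest(v, centroids)
        if i is not None and d <= threshold:
            n = counts[i]
            # Update the running mean of the family centroid.
            centroids[i] = [(c * n + x) / (n + 1) for c, x in zip(centroids[i], v)]
            counts[i] = n + 1
            labels.append(i)
        else:
            centroids.append(list(v))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


def query_family(vector, centroids, threshold):
    """Place a new vector into an existing family, or return None if none is close enough."""
    i, d = nearest(vector, centroids)
    return i if i is not None and d <= threshold else None


# Toy data: two well-separated groups of 2-D points (illustrative values only).
toy_vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
labels, centroids = assign_to_families(toy_vectors, threshold=1.0)
print(labels)
print(query_family([5.0, 5.2], centroids, threshold=1.0))
```

In this sketch the work grows with the number of vectors times the number of families, rather than with the square of the number of vectors, which is the general reason a centroid-based approach can scale better than an all-against-all network. The `query_family` helper mirrors the idea of a query mode in the same simplified way: a new vector is matched against stored centroids without re-clustering.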


Genomic Assemblies of Members of Burkholderia and Related Genera as a Resource for Natural Product Discovery

Microbiology Resource Announcements ◽

10.1128/mra.00485-20 ◽

2020 ◽

Vol 9 (42) ◽

Author(s):

Alex J. Mullins ◽

Cerith Jones ◽

Matthew J. Bull ◽

Gordon Webster ◽

Julian Parkhill ◽

...

Keyword(s):

Natural Product ◽

Genome Mining ◽

Genomic Analysis ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Natural Product Discovery

ABSTRACT The genomes of 450 members of Burkholderiaceae, isolated from clinical and environmental sources, were sequenced and assembled as a resource for genome mining. Genomic analysis of the collection has enabled the identification of multiple metabolites and their biosynthetic gene clusters, including the antibiotics gladiolin, icosalide A, enacyloxin, and cepacin A.


Synthetic Biology Advanced Natural Product Discovery

Metabolites ◽

10.3390/metabo11110785 ◽

2021 ◽

Vol 11 (11) ◽

pp. 785

Author(s):

Junyang Wang ◽

Jens Nielsen ◽

Zihe Liu

Keyword(s):

Natural Products ◽

Synthetic Biology ◽

Natural Product ◽

Rapid Development ◽

Genome Mining ◽

Gene Clusters ◽

Future Research ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Natural Product Discovery

A wide variety of bacteria, fungi and plants can produce bioactive secondary metabolites, which are often referred to as natural products. With the rapid development of DNA sequencing technology and bioinformatics, a large number of putative biosynthetic gene clusters have been reported. However, only a limited number of natural products have been discovered, as most biosynthetic gene clusters are not expressed or are expressed at extremely low levels under conventional laboratory conditions. With the rapid development of synthetic biology, advanced genome mining and engineering strategies have been reported, and they provide new opportunities for the discovery of natural products. This review discusses recent advances that can accelerate the design, build, test, and learn (DBTL) cycle of natural product discovery, and outlines trends and key challenges for future research.


The antiSMASH database version 3: increased taxonomic coverage and new query features for modular enzymes

Nucleic Acids Research ◽

10.1093/nar/gkaa978 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D639-D643 ◽

Cited By ~ 1

Author(s):

Kai Blin ◽

Simon Shaw ◽

Satria A Kautsar ◽

Marnix H Medema ◽

Tilmann Weber

Keyword(s):

User Interface ◽

Graphical User Interface ◽

Genome Mining ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

High Quality ◽

Microbial Genomes ◽

Fungal Genomes ◽

Interactive Graphical User Interface

Abstract Microorganisms produce natural products that are frequently used in the development of antibacterial, antiviral, and anticancer drugs, pesticides, herbicides, or fungicides. In recent years, genome mining has evolved into a prominent method to access this potential. antiSMASH is one of the most popular tools for this task. Here, we present version 3 of the antiSMASH database, providing a means to access and query precomputed antiSMASH-5.2-detected biosynthetic gene clusters from representative, publicly available, high-quality microbial genomes via an interactive graphical user interface. In version 3, the database contains 147 517 high quality BGC regions from 388 archaeal, 25 236 bacterial and 177 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/.


Large-Scale Metagenome Assembly Reveals Novel Animal-Associated Microbial Genomes, Biosynthetic Gene Clusters, and Other Genetic Diversity

mSystems ◽

10.1128/msystems.01045-20 ◽

2020 ◽

Vol 5 (6) ◽

Author(s):

Nicholas D. Youngblut ◽

Jacobo de la Cuesta-Zuluaga ◽

Georg H. Reischer ◽

Silke Dauser ◽

Nathalie Schuster ◽

...

Keyword(s):

Large Scale ◽

Animal Species ◽

Gene Clusters ◽

Genomic Diversity ◽

Data Sets ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Microbial Genomes ◽

Metagenome Assembly ◽

Gut Metagenome

ABSTRACT Large-scale metagenome assemblies of human microbiomes have produced a vast catalogue of previously unseen microbial genomes; however, comparatively few microbial genomes derive from other vertebrates. Here, we generated 5,596 metagenome-assembled genomes (MAGs) from the gut metagenomes of 180 predominantly wild animal species representing 5 classes, in addition to 14 existing animal gut metagenome data sets. The MAGs comprised 1,522 species-level genome bins (SGBs), most of which were novel at the species, genus, or family level, and the majority were enriched in host versus environment metagenomes. Many traits distinguished SGBs enriched in host or environmental biomes, including the number of antimicrobial resistance genes. We identified 1,986 diverse biosynthetic gene clusters; only 23 clustered with any MIBiG database references. Gene-based assembly revealed tremendous gene diversity, much of it host or environment specific. Our MAG and gene data sets greatly expand the microbial genome repertoire and provide a broad view of microbial adaptations to the vertebrate gut. IMPORTANCE Microbiome studies on a select few mammalian species (e.g., humans, mice, and cattle) have revealed a great deal of novel genomic diversity in the gut microbiome. However, little is known of the microbial diversity in the gut of other vertebrates. We studied the gut microbiomes of a large set of mostly wild animal species consisting of mammals, birds, reptiles, amphibians, and fish. Unfortunately, we found that existing reference databases commonly used for metagenomic analyses failed to capture the microbiome diversity among vertebrates. To increase database representation, we applied advanced metagenome assembly methods to our animal gut data and to many public gut metagenome data sets that had not been used to obtain microbial genomes. Our resulting genome and gene cluster collections comprised a great deal of novel taxonomic and genomic diversity, which we extensively characterized. Our findings substantially expand what is known of microbial genomic diversity in the vertebrate gut.


Native and engineered promoters in natural product discovery

Natural Product Reports ◽

10.1039/c6np00002a ◽

2016 ◽

Vol 33 (8) ◽

pp. 1006-1019 ◽

Cited By ~ 45

Author(s):

Maksym Myronovskyi ◽

Andriy Luzhetskyy

Keyword(s):

Natural Product ◽

Transcriptional Activation ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Natural Product Discovery

Transcriptional activation of biosynthetic gene clusters.


Discovery and characterisation of an amidine-containing ribosomally-synthesised peptide that is widely distributed in nature

Chemical Science ◽

10.1039/d1sc01456k ◽

2021 ◽

Author(s):

Alicia H Russell ◽

Natalia Miguel Vior ◽

Edward Steven Hems ◽

Rodney Lacret ◽

Andrew William Truman

Keyword(s):

Natural Product ◽

Genome Mining ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Wide Range ◽

Modified Peptides

Ribosomally synthesised and post-translationally modified peptides (RiPPs) are a structurally diverse class of natural product with a wide range of bioactivities. Genome mining for RiPP biosynthetic gene clusters (BGCs) is...


Expression of fungal biosynthetic gene clusters in S. cerevisiae for natural product discovery

Synthetic and Systems Biotechnology ◽

10.1016/j.synbio.2021.01.003 ◽

2021 ◽

Vol 6 (1) ◽

pp. 20-22

Author(s):

Zihe Liu ◽

Zhenquan Lin ◽

Jens Nielsen

Keyword(s):

Natural Product ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Natural Product Discovery


Discovery and characterisation of an amidine-containing ribosomally-synthesised peptide that is widely distributed in nature

10.1101/2020.05.04.076059 ◽

2020 ◽

Author(s):

Alicia H. Russell ◽

Natalia M. Vior ◽

Edward S. Hems ◽

Rodney Lacret ◽

Andrew W. Truman

Keyword(s):

Natural Product ◽

Genome Mining ◽

Gene Clusters ◽

Model Organisms ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Post Translational Modifications ◽

Streptomyces Albus ◽

Modified Peptides ◽

Mining Tool

ABSTRACT Ribosomally synthesised and post-translationally modified peptides (RiPPs) are a structurally diverse class of natural product with a range of bioactivities. Genome mining for RiPP biosynthetic gene clusters (BGCs) is often hampered by poor detection of the short precursor peptides that are ultimately modified into the final molecule. Here, we utilise a previously described genome mining tool, RiPPER, to identify novel RiPP precursor peptides near YcaO-domain proteins, enzymes that catalyse various RiPP post-translational modifications including heterocyclisation and thioamidation. Using this dataset, we identified a novel, diverse and highly conserved family of RiPP BGCs spanning over 230 species of Actinobacteria and Firmicutes. A representative BGC from Streptomyces albus J1074 was characterised, leading to the discovery of streptamidine, a novel amidine-containing RiPP. This highlights the breadth of unexplored natural products with structurally rare features, even in model organisms.


Understanding and manipulating antibiotic production in actinomycetes

Biochemical Society Transactions ◽

10.1042/bst20130214 ◽

2013 ◽

Vol 41 (6) ◽

pp. 1355-1364 ◽

Cited By ~ 29

Author(s):

Mervyn J. Bibb

Keyword(s):

Natural Product ◽

Biological Activities ◽

Antibiotic Production ◽

Gene Clusters ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Natural Product Biosynthesis ◽

Genomic Technologies ◽

Wide Range ◽

Recent Developments

Actinomycetes are prolific producers of natural products with a wide range of biological activities. Many of the compounds that they make (and derivatives thereof) are used extensively in medicine, most notably as clinically important antibiotics, and in agriculture. Moreover, these organisms remain a source of novel and potentially useful molecules, but maximizing their biosynthetic potential requires a better understanding of natural product biosynthesis. Recent developments in genome sequencing have greatly facilitated the identification of natural product biosynthetic gene clusters. In the present article, I summarize the recent contributions of our laboratory in applying genomic technologies to better understand and manipulate natural product biosynthesis in a range of different actinomycetes.
