BiG-SLiCE: A Highly Scalable Tool Maps the Diversity of 1.2 Million Biosynthetic Gene Clusters

Author(s):  
Satria A. Kautsar ◽  
Justin J. J. van der Hooft ◽  
Dick de Ridder ◽  
Marnix H. Medema

Abstract Background Genome mining for Biosynthetic Gene Clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools suffer from a bottleneck caused by the expensive network-based approach used to group these BGCs into Gene Cluster Families (GCFs). Results Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes (MAGs) within ten days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a "query mode" that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. Conclusions BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global, searchable interconnected network of BGCs. As more genomes get sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.

GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Satria A Kautsar ◽  
Justin J J van der Hooft ◽  
Dick de Ridder ◽  
Marnix H Medema

Abstract Background Genome mining for biosynthetic gene clusters (BGCs) has become an integral part of natural product discovery. The >200,000 microbial genomes now publicly available hold information on abundant novel chemistry. One way to navigate this vast genomic diversity is through comparative analysis of homologous BGCs, which allows identification of cross-species patterns that can be matched to the presence of metabolites or biological activities. However, current tools are hindered by a bottleneck caused by the expensive network-based approach used to group these BGCs into gene cluster families (GCFs). Results Here, we introduce BiG-SLiCE, a tool designed to cluster massive numbers of BGCs. By representing them in Euclidean space, BiG-SLiCE can group BGCs into GCFs in a non-pairwise, near-linear fashion. We used BiG-SLiCE to analyze 1,225,071 BGCs collected from 209,206 publicly available microbial genomes and metagenome-assembled genomes within 10 days on a typical 36-core CPU server. We demonstrate the utility of such analyses by reconstructing a global map of secondary metabolic diversity across taxonomy to identify uncharted biosynthetic potential. BiG-SLiCE also provides a “query mode” that can efficiently place newly sequenced BGCs into previously computed GCFs, plus a powerful output visualization engine that facilitates user-friendly data exploration. Conclusions BiG-SLiCE opens up new possibilities to accelerate natural product discovery and offers a first step towards constructing a global and searchable interconnected network of BGCs. As more genomes are sequenced from understudied taxa, more information can be mined to highlight their potentially novel chemistry. BiG-SLiCE is available via https://github.com/medema-group/bigslice.


2020 ◽  
Vol 9 (42) ◽  
Author(s):  
Alex J. Mullins ◽  
Cerith Jones ◽  
Matthew J. Bull ◽  
Gordon Webster ◽  
Julian Parkhill ◽  
...  

ABSTRACT The genomes of 450 members of Burkholderiaceae, isolated from clinical and environmental sources, were sequenced and assembled as a resource for genome mining. Genomic analysis of the collection has enabled the identification of multiple metabolites and their biosynthetic gene clusters, including the antibiotics gladiolin, icosalide A, enacyloxin, and cepacin A.


Metabolites ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 785
Author(s):  
Junyang Wang ◽  
Jens Nielsen ◽  
Zihe Liu

A wide variety of bacteria, fungi and plants can produce bioactive secondary metabolites, which are often referred to as natural products. With the rapid development of DNA sequencing technology and bioinformatics, a large number of putative biosynthetic gene clusters have been reported. However, only a limited number of natural products have been discovered, as most biosynthetic gene clusters are not expressed or are expressed at extremely low levels under conventional laboratory conditions. With the rapid development of synthetic biology, advanced genome mining and engineering strategies have been reported and they provide new opportunities for discovery of natural products. This review discusses advances in recent years that can accelerate the design, build, test, and learn (DBTL) cycle of natural product discovery, and outlines trends and key challenges for future research.


2020 ◽  
Vol 49 (D1) ◽  
pp. D639-D643 ◽  
Author(s):  
Kai Blin ◽  
Simon Shaw ◽  
Satria A Kautsar ◽  
Marnix H Medema ◽  
Tilmann Weber

Abstract Microorganisms produce natural products that are frequently used in the development of antibacterial, antiviral, and anticancer drugs, pesticides, herbicides, or fungicides. In recent years, genome mining has evolved into a prominent method to access this potential. antiSMASH is one of the most popular tools for this task. Here, we present version 3 of the antiSMASH database, providing a means to access and query precomputed antiSMASH-5.2-detected biosynthetic gene clusters from representative, publicly available, high-quality microbial genomes via an interactive graphical user interface. In version 3, the database contains 147 517 high quality BGC regions from 388 archaeal, 25 236 bacterial and 177 fungal genomes and is available at https://antismash-db.secondarymetabolites.org/.


mSystems ◽  
2020 ◽  
Vol 5 (6) ◽  
Author(s):  
Nicholas D. Youngblut ◽  
Jacobo de la Cuesta-Zuluaga ◽  
Georg H. Reischer ◽  
Silke Dauser ◽  
Nathalie Schuster ◽  
...  

ABSTRACT Large-scale metagenome assemblies of human microbiomes have produced a vast catalogue of previously unseen microbial genomes; however, comparatively few microbial genomes derive from other vertebrates. Here, we generated 5,596 metagenome-assembled genomes (MAGs) from the gut metagenomes of 180 predominantly wild animal species representing 5 classes, in addition to 14 existing animal gut metagenome data sets. The MAGs comprised 1,522 species-level genome bins (SGBs), most of which were novel at the species, genus, or family level, and the majority were enriched in host versus environment metagenomes. Many traits distinguished SGBs enriched in host or environmental biomes, including the number of antimicrobial resistance genes. We identified 1,986 diverse biosynthetic gene clusters; only 23 clustered with any MIBiG database references. Gene-based assembly revealed tremendous gene diversity, much of it host or environment specific. Our MAG and gene data sets greatly expand the microbial genome repertoire and provide a broad view of microbial adaptations to the vertebrate gut. IMPORTANCE Microbiome studies on a select few mammalian species (e.g., humans, mice, and cattle) have revealed a great deal of novel genomic diversity in the gut microbiome. However, little is known of the microbial diversity in the gut of other vertebrates. We studied the gut microbiomes of a large set of mostly wild animal species consisting of mammals, birds, reptiles, amphibians, and fish. Unfortunately, we found that existing reference databases commonly used for metagenomic analyses failed to capture the microbiome diversity among vertebrates. To increase database representation, we applied advanced metagenome assembly methods to our animal gut data and to many public gut metagenome data sets that had not been used to obtain microbial genomes. Our resulting genome and gene cluster collections comprised a great deal of novel taxonomic and genomic diversity, which we extensively characterized. Our findings substantially expand what is known of microbial genomic diversity in the vertebrate gut.


2016 ◽  
Vol 33 (8) ◽  
pp. 1006-1019 ◽  
Author(s):  
Maksym Myronovskyi ◽  
Andriy Luzhetskyy

Transcriptional activation of biosynthetic gene clusters.


2021 ◽  
Author(s):  
Alicia H Russell ◽  
Natalia Miguel Vior ◽  
Edward Steven Hems ◽  
Rodney Lacret ◽  
Andrew William Truman

Ribosomally synthesised and post-translationally modified peptides (RiPPs) are a structurally diverse class of natural product with a wide range of bioactivities. Genome mining for RiPP biosynthetic gene clusters (BGCs) is...


2020 ◽  
Author(s):  
Alicia H. Russell ◽  
Natalia M. Vior ◽  
Edward S. Hems ◽  
Rodney Lacret ◽  
Andrew W. Truman

ABSTRACT Ribosomally synthesised and post-translationally modified peptides (RiPPs) are a structurally diverse class of natural product with a range of bioactivities. Genome mining for RiPP biosynthetic gene clusters (BGCs) is often hampered by poor detection of the short precursor peptides that are ultimately modified into the final molecule. Here, we utilise a previously described genome mining tool, RiPPER, to identify novel RiPP precursor peptides near YcaO-domain proteins, enzymes that catalyse various RiPP post-translational modifications including heterocyclisation and thioamidation. Using this dataset, we identified a novel, diverse and highly conserved family of RiPP BGCs spanning over 230 species of Actinobacteria and Firmicutes. A representative BGC from Streptomyces albus J1074 was characterised, leading to the discovery of streptamidine, a novel amidine-containing RiPP. This highlights the breadth of unexplored natural products with structurally rare features, even in model organisms.


2013 ◽  
Vol 41 (6) ◽  
pp. 1355-1364 ◽  
Author(s):  
Mervyn J. Bibb

Actinomycetes are prolific producers of natural products with a wide range of biological activities. Many of the compounds that they make (and derivatives thereof) are used extensively in medicine, most notably as clinically important antibiotics, and in agriculture. Moreover, these organisms remain a source of novel and potentially useful molecules, but maximizing their biosynthetic potential requires a better understanding of natural product biosynthesis. Recent developments in genome sequencing have greatly facilitated the identification of natural product biosynthetic gene clusters. In the present article, I summarize the recent contributions of our laboratory in applying genomic technologies to better understand and manipulate natural product biosynthesis in a range of different actinomycetes.

