clusterTools: proximity searches for functional elements to identify putative biosynthetic gene clusters

2017 ◽  
Author(s):  
Emmanuel LC de los Santos ◽  
Gregory L. Challis

Abstract Motivation: The low cost of DNA sequencing has accelerated research in natural product biosynthesis, allowing us to rapidly link small molecules to the clusters that produce them. However, the large amount of data means that the number of putative biosynthetic gene clusters (BGCs) far exceeds our ability to experimentally characterize them. This necessitates the development of further tools to analyze putative BGCs and flag those of interest for further characterization. Results: clusterTools implements a framework to aid in the characterization of putative BGCs. It does this by organizing genomic information on coding sequences in a way that enables directed, hypothesis-driven queries for functional elements in close physical proximity to each other. Genomic sequence databases can be constructed in clusterTools with an interface to the NCBI GenBank and Genomes databases, or from private sequence databases. clusterTools can be used either to identify interesting BGCs from a database of putative BGCs, or on databases of genomic sequences to identify and download regions of interest in the DNA for further processing and annotation in programs such as antiSMASH. We have used clusterTools to identify putative and known biosynthetic gene clusters involved in bacterial polyketide alkaloid and tetronate biosynthesis. Availability and Implementation: clusterTools is implemented in Python and is available via the AGPL. Stand-alone versions of clusterTools are available for Macintosh, Windows, and Linux upon registration (https://goo.gl/forms/QRKTkpqiA0g31IWp1). The source code is available at https://www.github.com/emzodls/clusterArch. Supplementary information: A manual describing the Python toolkit that powers clusterTools, as well as the HMMs constructed for the tetronate search, is available online.
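The idea of a proximity query for functional elements can be illustrated with a minimal Python sketch. This is not clusterTools' code or API: the `CDS` record layout, the `proximity_hits` function, the `max_span` parameter and the element names `hmm_A` and `hmm_B` are all hypothetical, chosen only to show how coding sequences carrying a required set of elements might be grouped when they lie within a fixed distance on the same contig.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CDS:
    """One annotated coding sequence (hypothetical record layout)."""
    contig: str
    start: int
    end: int
    domains: frozenset  # names of functional elements detected in this CDS


def proximity_hits(cds_records, required, max_span=20_000):
    """Return groups of CDSs on one contig that together carry every
    element in `required` and fit within `max_span` base pairs."""
    required = frozenset(required)
    by_contig = defaultdict(list)
    for cds in cds_records:
        if cds.domains & required:  # skip CDSs with no element of interest
            by_contig[cds.contig].append(cds)

    hits = []
    for members in by_contig.values():
        members.sort(key=lambda c: c.start)
        for i, first in enumerate(members):
            found, group = set(), []
            for cds in members[i:]:
                if cds.start - first.start > max_span:
                    break  # every later CDS starts even further away
                if cds.end - first.start > max_span:
                    continue  # this CDS extends past the window
                group.append(cds)
                found |= cds.domains & required
                if found == required:
                    hits.append(group)  # smallest group anchored at `first`
                    break
    return hits


# Illustrative data only; coordinates and element names are made up.
records = [
    CDS("contig1", 100, 1000, frozenset({"hmm_A"})),
    CDS("contig1", 50_000, 51_000, frozenset({"hmm_B"})),
    CDS("contig2", 10, 900, frozenset({"hmm_A"})),
    CDS("contig2", 1200, 2000, frozenset({"hmm_B"})),
]
for group in proximity_hits(records, {"hmm_A", "hmm_B"}, max_span=5000):
    print(group[0].contig, group[0].start, group[-1].end)
```

Groups anchored at different starting CDSs may overlap; a real tool would likely merge or rank them, which this sketch leaves out.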

2017 ◽  
Author(s):  
Gong Cheng ◽  
Quan Lu ◽  
Zongshan Zhou ◽  
Ling Ma ◽  
Guocai Zhang ◽  
...  

Abstract Motivation: Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation details have not yet been mastered by most biologists, and it has not been widely applied in biological research. To popularize this technology in bioinformatics and make full use of the many public bioinformatics resources (community, official and private Dockerfiles and images) in the Docker Hub Registry and other Docker sources, we present a complete worked instance of a Docker-based bioinformatics workflow for analysing and visualizing the pan-genome and biosynthetic gene clusters of a bacterium, and provide solutions for mining bioinformatics big data from various public biology databases. Readers are guided step by step through the workflow, from the Dockerfile to building their own images and running a container to quickly create a workflow. Results: We present BGDMdocker (bacterial genome data mining docker-based), a workflow based on Docker. The workflow consists of three integrated toolkits: Prokka v1.11, panX and antiSMASH 3.0. All dependencies are written in a Dockerfile, which is used to build the Docker image and run a container for analysing the pan-genome of 44 Bacillus amyloliquefaciens strains retrieved from a public database. The pan-genome comprises 172,432 genes and 2,306 core gene clusters. Visualized pan-genomic data are presented for each gene cluster, including alignments, phylogenetic trees, mutations within the cluster mapped to the branches of the tree, and inferred gene loss and gain on the core-genome phylogeny. In addition, 997 known (MIBiG database) and 553 unknown (antiSMASH-predicted clusters and Pfam database) genes of biosynthetic gene cluster types and orthologous groups were mined across all strains. This workflow can also be used for pan-genome analysis and visualization of other species.
The visualizations can be fully reproduced as done in this paper. All result data and relevant tools and files can be downloaded from our website without registration. The pan-genome and biosynthetic gene cluster analysis and visualization can be reused immediately on different computing platforms (Linux, Windows, Mac and cloud deployments), providing cross-platform deployment flexibility and rapid development of integrated software packages. Availability and implementation: BGDMdocker is available at http://42.96.173.25/bapgd/ and the source code, under the GPL license, is available at https://github.com/cgwyx/debian_prokka_panx_antismash_biodocker. Contact: chenggongwyx@foxmail.com Supplementary information: Supplementary data are available at bioRxiv online.


mSystems ◽  
2021 ◽  
Author(s):  
Rahim Rajwani ◽  
Shannon I. Ohlemacher ◽  
Gengxiang Zhao ◽  
Hong-Bing Liu ◽  
Carole A. Bewley

Short-read sequencing of GC-rich genomes such as those from actinomycetes results in a fragmented genome assembly and truncated biosynthetic gene clusters (often 10 to >100 kb long), which hinders our ability to understand the biosynthetic potential of a given strain and predict the molecules that can be produced. The current study demonstrates that contiguous DNA assemblies, suitable for analysis of BGCs, can be obtained through low-coverage, multiplexed sequencing on Flongle, which provides a new low-cost workflow ($30 to 40 per strain) for sequencing actinomycete strain libraries.


2019 ◽  
Vol 35 (19) ◽  
pp. 3584-3591 ◽  
Author(s):  
Sherif Farag ◽  
Rachel M Bleich ◽  
Elizabeth A Shank ◽  
Olexandr Isayev ◽  
Albert A Bowers ◽  
...  

Abstract Motivation Non-ribosomal peptide synthetases (NRPSs) are modular enzymatic machines that catalyze the ribosome-independent production of structurally complex small peptides, many of which have important clinical applications as antibiotics, antifungals and anti-cancer agents. Several groups have tried to expand natural product diversity by intermixing different NRPS modules to create synthetic peptides. This approach has not been as successful as anticipated, suggesting that these modules are not fully interchangeable. Results We explored whether Inter-Modular Linkers (IMLs) impact the ability of NRPS modules to communicate during the synthesis of NRPs. We developed a parser to extract 39 804 IMLs from both well-annotated and putative NRPS biosynthetic gene clusters from 39 232 bacterial genomes and established the first IML database. We analyzed these IMLs and identified a striking relationship between IMLs and the amino acid substrates of their adjacent modules. More than 92% of the identified IMLs connect modules that activate a particular pair of substrates, suggesting that significant specificity is embedded within these sequences. We therefore propose that incorporating the correct IML is critical when attempting combinatorial biosynthesis of novel NRPSs. Availability and implementation The IML database as well as the NRPS-Parser have been made available on the web at https://nrps-linker.unc.edu. The entire source code of the project is hosted in a GitHub repository (https://github.com/SWFarag/nrps-linker). Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Hisayuki Komaki

This study aimed to clarify the taxonomic relationships among Streptomyces costaricanus, Streptomyces graminearus, Streptomyces murinus and Streptomyces phaeogriseichromatogenes. These strains share the same 16S rRNA gene sequence. Multilocus sequence analysis revealed that S. costaricanus, S. murinus and S. phaeogriseichromatogenes belong to the same species, but S. graminearus does not. Digital DNA–DNA relatedness and average nucleotide identity among S. costaricanus, S. murinus and S. phaeogriseichromatogenes were 70.9–74.6% and 96.5–97.0%, respectively. In addition to the previously reported phenotypic data, the presence of a similar set of secondary metabolite-biosynthetic gene clusters for polyketides and nonribosomal peptides supported the similarity among the three species. Therefore, S. costaricanus and S. phaeogriseichromatogenes should be reclassified as later heterotypic synonyms of S. murinus.


Author(s):  
Patrick Videau ◽  
Kaitlyn Wells ◽  
Arun Singh ◽  
Jessie Eiting ◽  
Philip Proteau ◽  
...  

Cyanobacteria are prolific producers of natural products, and genome mining has shown that many orphan biosynthetic gene clusters can be found in sequenced cyanobacterial genomes. New tools and methodologies are required to investigate these biosynthetic gene clusters, and here we present the use of Anabaena sp. strain PCC 7120 as a host for combinatorial biosynthesis of natural products, using the indolactam natural products (lyngbyatoxin A, pendolmycin, and teleocidin B-4) as a test case. We were able to successfully produce all three compounds using codon-optimized genes from Actinobacteria. We also introduce a new plasmid backbone based on the native Anabaena 7120 plasmid pCC7120ζ and show that production of teleocidin B-4 can be accomplished using a two-plasmid system, which can be introduced by co-conjugation.


eLife ◽  
2015 ◽  
Vol 4 ◽  
Author(s):  
Zachary Charlop-Powers ◽  
Jeremy G Owen ◽  
Boojala Vijay B Reddy ◽  
Melinda A Ternei ◽  
Denise O Guimarães ◽  
...  

Recent bacterial (meta)genome sequencing efforts suggest the existence of an enormous untapped reservoir of natural-product-encoding biosynthetic gene clusters in the environment. Here we use the pyro-sequencing of PCR amplicons derived from both nonribosomal peptide adenylation domains and polyketide ketosynthase domains to compare biosynthetic diversity in soil microbiomes from around the globe. We see large differences in domain populations from all except the most proximal and biome-similar samples, suggesting that most microbiomes will encode largely distinct collections of bacterial secondary metabolites. Our data indicate a correlation between two factors, geographic distance and biome-type, and the biosynthetic diversity found in soil environments. By assigning reads to known gene clusters we identify hotspots of biomedically relevant biosynthetic diversity. These observations not only provide new insights into the natural world, they also provide a road map for guiding future natural products discovery efforts.


2021 ◽  
Author(s):  
Xuhua Mo ◽  
Tobias A. M. Gulder

Over 30 biosynthetic gene clusters for natural tetramates have been identified. This highlight reviews, for the first time, the biosynthetic strategies for formation of the tetramic acid unit, discussing the individual molecular mechanisms in detail.

