BGDMdocker: a Docker workflow for analysis and visualization pan-genome and biosynthetic gene clusters of bacterial

2017 ◽  
Author(s):  
Gong Cheng ◽  
Quan Lu ◽  
Zongshan Zhou ◽  
Ling Ma ◽  
Guocai Zhang ◽  
...  

ABSTRACT Motivation: Docker technology has received increasing attention throughout the bioinformatics community. However, most biologists have not yet mastered its implementation details, and it has not been widely applied in biological research. To popularize this technology in bioinformatics and make full use of the many public bioinformatics resources (Dockerfiles and images from community, official, and private sources) in the Docker Hub Registry and other Docker sources, we introduce a complete and accurate instance of a Docker-based bioinformatics workflow for analysing and visualizing the pan-genome and biosynthetic gene clusters of a bacterium, and provide solutions for mining bioinformatics big data from various public biology databases. Readers are guided step by step through the workflow, from the Dockerfile to building their own images and running a container, so that a workflow can be created quickly. Results: We present BGDMdocker (bacterial genome data mining docker-based), a workflow based on Docker. The workflow consists of three integrated toolkits: Prokka v1.11, panX, and antiSMASH 3.0. All dependencies are written in a Dockerfile, which is used to build the Docker image and run a container for analysing the pan-genome of 44 Bacillus amyloliquefaciens strains retrieved from a public database. The pan-genome comprises 172,432 genes in total and 2,306 core gene clusters. The visualized pan-genomic data are presented for each gene cluster, including alignments, phylogenetic trees, mutations mapped to the branches of the tree, and inferred gene loss and gain on the core-genome phylogeny. In addition, 997 known (MIBiG database) and 553 unknown (antiSMASH-predicted clusters and Pfam database) genes of biosynthetic gene cluster types and orthologous groups were mined across all strains. This workflow can also be used for pan-genome analysis and visualization of other species. The visual data displays can be reproduced exactly as presented in this paper. All result data and relevant tools and files can be downloaded from our website without registration. The pan-genome and biosynthetic gene cluster analysis and visualization are immediately reusable on different computing platforms (Linux, Windows, Mac, and cloud deployments), providing flexible cross-platform deployment and rapid development of an integrated software package. Availability and implementation: BGDMdocker is available at http://42.96.173.25/bapgd/ and the source code, under the GPL license, is available at https://github.com/cgwyx/debian_prokka_panx_antismash_biodocker. Contact: chenggongwyx@foxmail.com Supplementary information: Supplementary data are available at bioRxiv online.
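As a rough illustration of what a "core gene cluster" means in a pan-genome analysis such as the one summarized above, the following minimal Python sketch classifies gene clusters by how many strains carry them. The strain names, cluster identifiers, and the `classify_clusters` helper are hypothetical and are not taken from panX or BGDMdocker, whose clustering is considerably more involved.

```python
from typing import Dict, List, Set


def classify_clusters(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, List[str]]:
    """Split gene clusters into core, accessory, and unique sets.

    `presence` maps a cluster id to the set of strains that carry at least
    one member gene (a hypothetical input format, assumed for this sketch).
    """
    classes: Dict[str, List[str]] = {"core": [], "accessory": [], "unique": []}
    for cluster_id, strains in sorted(presence.items()):
        if len(strains) == n_strains:
            classes["core"].append(cluster_id)       # present in every strain
        elif len(strains) == 1:
            classes["unique"].append(cluster_id)     # present in a single strain
        else:
            classes["accessory"].append(cluster_id)  # present in some strains
    return classes


# Toy data: three made-up strains and three made-up clusters.
toy_presence = {
    "cluster_1": {"strain_A", "strain_B", "strain_C"},
    "cluster_2": {"strain_A", "strain_C"},
    "cluster_3": {"strain_B"},
}
toy_classes = classify_clusters(toy_presence, n_strains=3)
```

In this toy input, `cluster_1` is the only cluster present in all three strains, so it is the only one counted as core.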

2017 ◽  
Author(s):  
Emmanuel LC de los Santos ◽  
Gregory L. Challis

Abstract Motivation: The low cost of DNA sequencing has accelerated research in natural product biosynthesis, allowing us to rapidly link small molecules to the clusters that produce them. However, the large amount of data means that the number of putative biosynthetic gene clusters (BGCs) far exceeds our ability to experimentally characterize them. This necessitates the development of further tools to analyze putative BGCs and flag those of interest for further characterization. Results: clusterTools implements a framework to aid in the characterization of putative BGCs. It does this by organizing genomic information on coding sequences in a way that enables directed, hypothesis-driven queries for functional elements in close physical proximity to each other. Genomic sequence databases can be constructed in clusterTools with an interface to the NCBI GenBank and Genomes databases, or from private sequence databases. clusterTools can be used either to identify interesting BGCs from a database of putative BGCs, or on databases of genomic sequences to identify and download regions of interest in the DNA for further processing and annotation in programs such as antiSMASH. We have used clusterTools to identify putative and known biosynthetic gene clusters involved in bacterial polyketide alkaloid and tetronate biosynthesis. Availability and Implementation: clusterTools is implemented in Python and is available via the AGPL. Stand-alone versions of clusterTools are available for Macintosh, Windows, and Linux upon registration (https://goo.gl/forms/QRKTkpqiA0g31IWp1). The source code is available at https://www.github.com/emzodls/clusterArch. Supplementary information: A manual describing the Python toolkit that powers clusterTools, as well as the HMMs constructed for the tetronate search, is available online.
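The idea of querying for functional elements in close physical proximity can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CDS` record, the `find_colocalized` helper, the domain labels, and the coordinates are all hypothetical and do not reflect the clusterTools API or its HMM-based searches.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class CDS:
    """A coding sequence with coordinates and annotated domain labels (hypothetical)."""
    name: str
    start: int
    end: int
    domains: FrozenSet[str]


def find_colocalized(cds_list: Iterable[CDS], required: Set[str], window: int) -> List[List[str]]:
    """Return groups of CDS names that lie within `window` bp of an anchor CDS
    and that together carry every domain in `required`."""
    required = set(required)
    ordered = sorted(cds_list, key=lambda c: c.start)
    hits: List[List[str]] = []
    for i, anchor in enumerate(ordered):
        if not (anchor.domains & required):
            continue  # only anchor on a CDS that carries a domain of interest
        limit = anchor.start + window
        group: List[str] = []
        found: Set[str] = set()
        for other in ordered[i:]:
            if other.start > limit:
                break      # sorted by start, so nothing later can fit
            if other.end > limit:
                continue   # starts inside the window but extends past it
            group.append(other.name)
            found |= other.domains & required
        if required <= found:
            hits.append(group)
    return hits


# Toy contig with made-up genes and domain labels.
toy_genes = [
    CDS("geneA", 0, 1000, frozenset({"KS"})),
    CDS("geneB", 1200, 2000, frozenset()),
    CDS("geneC", 2500, 3500, frozenset({"AT"})),
    CDS("geneD", 50000, 51000, frozenset({"KS"})),
]
toy_hits = find_colocalized(toy_genes, required={"KS", "AT"}, window=5000)
```

Here only the region starting at `geneA` contains both hypothetical domains within 5 kb; `geneD` carries one of them but has no partner nearby, so it is not reported.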


Life ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 758
Author(s):  
Xiaohe Jin ◽  
Yunlong Zhang ◽  
Ran Zhang ◽  
Kathy-Uyen Nguyen ◽  
Jonathan S. Lindsey ◽  
...  

Tolyporphins A–R are unusual tetrapyrrole macrocycles produced by the non-axenic filamentous cyanobacterium HT-58-2. A putative biosynthetic gene cluster for biosynthesis of tolyporphins (here termed BGC-1) was previously identified in the genome of HT-58-2. Here, homology searching of BGC-1 in HT-58-2 led to identification of similar BGCs in seven other filamentous cyanobacteria, including strains Nostoc sp. 106C, Nostoc sp. RF31YmG, Nostoc sp. FACHB-892, Brasilonema octagenarum UFV-OR1, Brasilonema octagenarum UFV-E1, Brasilonema sennae CENA114 and Oculatella sp. LEGE 06141, suggesting their potential for tolyporphins production. A similar gene cluster (BGC-2) also was identified unexpectedly in HT-58-2. Tolyporphins BGCs were not identified in unicellular cyanobacteria. Phylogenetic analysis based on 16S rRNA and a common component of the BGCs, TolD, points to a close evolutionary history between each strain and their respective tolyporphins BGC. Though identified with putative tolyporphins BGCs, examination of pigments extracted from three cyanobacteria has not revealed the presence of tolyporphins. Overall, the identification of BGCs and potential producers of tolyporphins presents a collection of candidate cyanobacteria for genetic and biochemical analysis pertaining to these unusual tetrapyrrole macrocycles.


2021 ◽  
Vol 12 ◽  
Author(s):  
Carlos Caicedo-Montoya ◽  
Monserrat Manzo-Ruiz ◽  
Rigoberto Ríos-Estepa

Species of the genus Streptomyces are known for their ability to produce multiple secondary metabolites; their genomes have been extensively explored to discover new bioactive compounds. The richness of genomic data currently available allows filtering for high-quality genomes, which in turn permits reliable comparative genomics studies and improved prediction of biosynthetic gene clusters (BGCs) through genome mining approaches. In this work, we used 121 genome sequences of the genus Streptomyces in a comparative genomics study with the aim of estimating genomic diversity by protein domain content, sequence similarity of proteins, and conservation of intergenic regions (IGRs). We also searched for BGCs, prioritizing those with potential antibiotic activity. Our analysis revealed that the pan-genome of the genus Streptomyces is clearly open, with a high number of unique gene families across the different species, and that the IGRs are rarely conserved. We also described the phylogenetic relationships of the analyzed genomes using multiple markers, obtaining a trustworthy tree whose relationships were further validated by Average Nucleotide Identity (ANI) calculations. Finally, 33 biosynthetic gene clusters were found to have potential antibiotic activity and a predicted mode of action, which might serve as a guide for formulating related experimental studies.


2007 ◽  
Vol 52 (2) ◽  
pp. 574-585 ◽  
Author(s):  
Xiujun Zhang ◽  
Lawrence B. Alemany ◽  
Hans-Peter Fiedler ◽  
Michael Goodfellow ◽  
Ronald J. Parry

ABSTRACT The antibiotics lactonamycin and lactonamycin Z provide attractive leads for antibacterial drug development. Both antibiotics contain a novel aglycone core called lactonamycinone. To gain insight into lactonamycinone biosynthesis, cloning and precursor incorporation experiments were undertaken. The lactonamycin gene cluster was initially cloned from Streptomyces rishiriensis. Sequencing of ca. 61 kb of S. rishiriensis DNA revealed the presence of 57 open reading frames. These included genes coding for the biosynthesis of l-rhodinose, the sugar found in lactonamycin, and genes similar to those in the tetracenomycin biosynthetic gene cluster. Since lactonamycin production by S. rishiriensis could not be sustained, additional proof for the identity of the S. rishiriensis cluster was obtained by cloning the lactonamycin Z gene cluster from Streptomyces sanglieri. Partial sequencing of the S. sanglieri cluster revealed 15 genes that exhibited a very high degree of similarity to genes within the lactonamycin cluster, as well as an identical organization. Double-crossover disruption of one gene in the S. sanglieri cluster abolished lactonamycin Z production, and production was restored by complementation. These results confirm the identity of the genetic locus cloned from S. sanglieri and indicate that the highly similar locus in S. rishiriensis encodes lactonamycin biosynthetic genes. Precursor incorporation experiments with S. sanglieri revealed that lactonamycinone is biosynthesized in an unusual manner whereby glycine or a glycine derivative serves as a starter unit that is extended by nine acetate units. Analysis of the gene clusters and of the precursor incorporation data suggested a hypothetical scheme for lactonamycinone biosynthesis.


2009 ◽  
Vol 76 (1) ◽  
pp. 283-293 ◽  
Author(s):  
Hanne Jørgensen ◽  
Kristin F. Degnes ◽  
Alexander Dikiy ◽  
Espen Fjærvik ◽  
Geir Klinkenberg ◽  
...  

ABSTRACT A new compound, designated ML-449, structurally similar to the known 20-membered macrolactam BE-14106, was isolated from a marine sediment-derived Streptomyces sp. Cloning and sequencing of the 83-kb ML-449 biosynthetic gene cluster revealed its high level of similarity to the BE-14106 gene cluster. Comparison of the respective biosynthetic pathways indicated that the difference in the compounds' structures stems from the incorporation of one extra acetate unit during the synthesis of the acyl side chain. A phylogenetic analysis of the β-ketosynthase (KS) domains from polyketide synthases involved in the biosynthesis of macrolactams pointed to a common ancestry for the two clusters. Furthermore, the analysis demonstrated the formation of a macrolactam-specific subclade for the majority of the KS domains from several macrolactam-biosynthetic gene clusters, indicating a closer relationship between macrolactam clusters than with the macrolactone clusters included in the analysis. Some KS domains from the ML-449, BE-14106, and salinilactam gene clusters did, however, show a closer relationship with KS domains from the polyene macrolide clusters, suggesting potential acquisition rather than duplication of certain PKS genes. Comparison of the ML-449, BE-14106, vicenistatin, and salinilactam biosynthetic gene clusters indicated an evolutionary relationship between them and provided new insights into the processes governing the evolution of small-ring macrolactam biosynthesis.


2014 ◽  
Vol 80 (16) ◽  
pp. 5028-5036 ◽  
Author(s):  
Kiyoko T. Miyamoto ◽  
Mamoru Komatsu ◽  
Haruo Ikeda

ABSTRACT Mycosporines and mycosporine-like amino acids (MAAs), including shinorine (mycosporine-glycine-serine) and porphyra-334 (mycosporine-glycine-threonine), are UV-absorbing compounds produced by cyanobacteria, fungi, and marine micro- and macroalgae. These MAAs have the ability to protect these organisms from damage by environmental UV radiation. Although no reports have described the production of MAAs and the corresponding genes involved in MAA biosynthesis from Gram-positive bacteria to date, genome mining of the Gram-positive bacterial database revealed that two microorganisms belonging to the order Actinomycetales, Actinosynnema mirum DSM 43827 and Pseudonocardia sp. strain P1, possess a gene cluster homologous to the biosynthetic gene clusters identified from cyanobacteria. When the two strains were grown in liquid culture, Pseudonocardia sp. accumulated a very small amount of MAA-like compound in a medium-dependent manner, whereas A. mirum did not produce MAAs under any culture conditions, indicating that the biosynthetic gene cluster of A. mirum was in a cryptic state in this microorganism. In order to characterize these biosynthetic gene clusters, each biosynthetic gene cluster was heterologously expressed in an engineered host, Streptomyces avermitilis SUKA22. Since the resultant transformants carrying the entire biosynthetic gene cluster controlled by an alternative promoter produced mainly shinorine, this is the first confirmation of a biosynthetic gene cluster for MAA from Gram-positive bacteria. Furthermore, S. avermitilis SUKA22 transformants carrying the biosynthetic gene cluster for MAA of A. mirum accumulated not only shinorine and porphyra-334 but also a novel MAA. Structure elucidation revealed that the novel MAA is mycosporine-glycine-alanine, which substitutes L-alanine for the L-serine of shinorine.


Biology ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 482
Author(s):  
Catarina Marques-Pereira ◽  
Diogo Neves Proença ◽  
Paula V. Morais

Serratia strains are ubiquitous microorganisms with the ability to produce serratomolides, such as serrawettins. These extracellular lipopeptides are described as biocides against many bacteria and fungi and may have nematicidal activity against phytopathogenic nematodes. Serrawettins W1 and W2 from different strains have different structures that might be correlated with distinct genomic organizations. This work used comparative genomics to determine the distribution and organization of the serrawettin biosynthetic gene clusters in all 84 publicly available genomes of the Serratia genus. The organization of the serrawettin W1 and W2 gene clusters was established using antiSMASH software and compared with the single and short data previously described for Serratia YD25T. Here, the organization of the serrawettin W1 gene clusters is reported for the first time. The serrawettin W1 biosynthetic gene swrW was present in 17 Serratia genomes. Eighty different coding sequences (CDSs) were assigned to the W1 gene cluster, 13 being common to all clusters. The serrawettin W2 swrA gene was present in 11 Serratia genomes. The W2 gene clusters included 68 CDSs, with 24 present in all the clusters. The genomic analysis showed that the swrA gene comprises five modules, four with three domains and one with four domains, while the swrW gene comprises one module with four domains. This work identified four genes common to all serrawettin gene clusters, highlighting their potentially essential role in the serrawettin biosynthetic process.


2018 ◽  
Vol 85 (4) ◽  
Author(s):  
Jan Mareš ◽  
Jan Hájek ◽  
Petra Urajová ◽  
Andreja Kust ◽  
Jouni Jokela ◽  
...  

ABSTRACT Puwainaphycins (PUWs) and minutissamides (MINs) are structurally analogous cyclic lipopeptides possessing cytotoxic activity. Both types of compound exhibit high structural variability, particularly in the fatty acid (FA) moiety. Although a biosynthetic gene cluster responsible for synthesis of several PUW variants has been proposed in a cyanobacterial strain, the genetic background for MINs remains unexplored. Herein, we report PUW/MIN biosynthetic gene clusters and structural variants from six cyanobacterial strains. Comparison of biosynthetic gene clusters indicates a common origin of the PUW/MIN hybrid nonribosomal peptide synthetase and polyketide synthase. Surprisingly, the biosynthetic gene clusters encode two alternative biosynthetic starter modules, and analysis of structural variants suggests that initiation by each of the starter modules results in lipopeptides of differing lengths and FA substitutions. Among additional modifications of the FA chain, chlorination of minutissamide D was explained by the presence of a putative halogenase gene in the PUW/MIN gene cluster of Anabaena minutissima strain UTEX B 1613. We detected PUW variants bearing an acetyl substitution in Symplocastrum muelleri strain NIVA-CYA 644, consistent with an O-acetyltransferase gene in its biosynthetic gene cluster. The major lipopeptide variants did not exhibit any significant antibacterial activity, and only the PUW F variant was moderately active against yeast, consistent with previously published data suggesting that PUWs/MINs interact preferentially with eukaryotic plasma membranes. IMPORTANCE Herein, we deciphered the most important biosynthetic traits of a prominent group of bioactive lipopeptides. We reveal evidence for initiation of biosynthesis by two alternative starter units hardwired directly in the same gene cluster, eventually resulting in the production of a remarkable range of lipopeptide variants. 
We identified several unusual tailoring genes potentially involved in modifying the fatty acid chain. Careful characterization of these biosynthetic gene clusters and their diverse products could provide important insight into lipopeptide biosynthesis in prokaryotes. Some of the variants identified exhibit cytotoxic and antifungal properties, and some are associated with a toxigenic biofilm-forming strain. The findings may prove valuable to researchers in the fields of natural product discovery and toxicology.


2019 ◽  
Author(s):  
Jintao Cheng ◽  
Fei Cao ◽  
Xinai Chen ◽  
Yongquan Li ◽  
Xuming Mao

Abstract Endophytic fungi can produce many active secondary metabolites, which are important resources of natural medicines. However, there is currently little understanding of endophytic fungi at the omics level. Calcarisporium arbuscula, an endophytic fungus from the healthy fruit of Russulaceae, can produce a variety of secondary metabolites with anti-cancer, anti-nematode and antibiotic bioactivities. A comprehensive survey of the genome and transcriptome of endophytic fungi will help to understand their capacity to biosynthesize secondary metabolites and lay the foundation for the development of these precious resources. In this study, we report the high-quality genome sequence of the strain C. arbuscula NRRL 3705 based on Single Molecule Real-Time sequencing technology. The genome of this fungus is over 45 Mb in size, larger than that of other typical filamentous fungi, and comprises 10,001 predicted genes, encoding at least 762 secretory proteins, 386 carbohydrate-active enzymes and 177 P450 enzymes. In addition, 398 virulence factors and 228 genes related to pathogen-host interactions were predicted in this fungus. Moreover, 65 secondary metabolite biosynthetic gene clusters were revealed, including the gene cluster for the mycotoxins aurovertins. In addition, several gene clusters were predicted to produce various mycotoxins, including aflatoxin, alternariol, destruxin, citrinin and isoflavipucine. Notably, two independent gene clusters were shown to be possibly involved in the biosynthesis of alternariol. Furthermore, an RNA-Seq assay showed that only the aurovertin gene cluster is expressed much more strongly than the housekeeping genes under laboratory conditions, consistent with aurovertins being the predominant metabolites. The expression of the genes for compound backbone biosynthesis in the remaining 64 gene clusters was lower than that of the housekeeping genes in every case, which might partially explain the poor production of other secondary metabolites in this fungus. Our omics data, along with bioinformatics analysis, indicate that C. arbuscula NRRL 3705 contains a large number of biosynthetic gene clusters and has great potential to produce diverse secondary metabolites. This work also provides the basis for the development of endophytic fungi as a new resource of natural products with promising biological activities.


mBio ◽  
2021 ◽  
Author(s):  
Wenjie Wang ◽  
Milton Drott ◽  
Claudio Greco ◽  
Dianiris Luciano-Rosario ◽  
Pinmei Wang ◽  
...  

Fungal secondary metabolites (SMs) are an important source of pharmaceuticals on one hand and toxins on the other. Efforts to identify the biosynthetic gene clusters (BGCs) that synthesize SMs have yielded significant insights into how variation in the genes that compose BGCs may impact subsequent metabolite production within and between species.

