Density-based binning of gene clusters to infer function or evolutionary history using GeneGrouper

2021 ◽  
Author(s):  
Alexander G McFarland ◽  
Nolan W Kennedy ◽  
Carolyn E Mills ◽  
Danielle Tullman-Ercek ◽  
Curtis Huttenhower ◽  
...  

Motivation: Identifying gene clusters of interest in phylogenetically proximate and distant taxa can help to infer phenotypes of interest. Conserved gene clusters may differ by only a few genes, and these differences can be biologically meaningful, such as the formation of pseudogenes or insertions interrupting regulation. These qualities may allow for unsupervised clustering of similar gene clusters into bins that provide a population-level understanding of the genetic variation in similar gene clusters. Results: We developed GeneGrouper, a command-line tool that uses a density-based clustering method to group gene clusters into bins. GeneGrouper demonstrated high recall and precision in benchmarks for the detection of the 23-gene Salmonella enterica LT2 Pdu gene cluster and the four-gene Pseudomonas aeruginosa PAO1 Mex gene cluster in 435 genomes containing mixed taxa. In a subsequent application investigating the diversity and impact of complete and incomplete LT2 Pdu gene clusters in 1130 S. enterica genomes, GeneGrouper identified a novel, frequently occurring pduN pseudogene. When this disruption was replicated in vivo, a frameshift mutation in pduN impaired microcompartment formation. We next demonstrated the versatility of GeneGrouper by clustering both distant homologous gene clusters and variable gene clusters found in integrative and conjugative elements.
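The abstract does not spell out GeneGrouper's features or parameters, so the following is only a minimal sketch of what density-based binning of gene clusters can look like. It assumes each gene cluster has been reduced to a set of homolog-group labels and runs a from-scratch DBSCAN over Jaccard distances. The function names, the `eps` and `min_samples` values, and the toy gene sets are hypothetical; they are not GeneGrouper's API or data.

```python
from typing import FrozenSet, List, Optional

def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty gene sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def dbscan_bins(clusters: List[FrozenSet[str]], eps: float, min_samples: int) -> List[int]:
    """Assign a bin label to each gene cluster with DBSCAN; -1 marks noise (unbinned)."""
    n = len(clusters)
    # eps-neighbourhood of every gene cluster (each one counts as its own neighbour).
    neighbours = [
        [j for j in range(n) if jaccard_distance(clusters[i], clusters[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"
    next_bin = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; a later core point may still claim it
            continue
        labels[i] = next_bin
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = next_bin  # border point: joins the bin but does not expand it
                continue
            if labels[j] is not None:
                continue
            labels[j] = next_bin
            if len(neighbours[j]) >= min_samples:
                frontier.extend(neighbours[j])  # only core points grow the bin
        next_bin += 1
    return [label if label is not None else -1 for label in labels]

# Hypothetical gene clusters, each reduced to a set of homolog-group labels.
clusters = [
    frozenset({"pduA", "pduB", "pduC", "pduN"}),
    frozenset({"pduA", "pduB", "pduC"}),  # same locus lacking pduN
    frozenset({"pduA", "pduB", "pduC", "pduN", "pduX"}),
    frozenset({"mexA", "mexB", "oprM"}),
    frozenset({"mexA", "mexB", "oprM", "mexR"}),
    frozenset({"tetA"}),
]
print(dbscan_bins(clusters, eps=0.3, min_samples=2))  # [0, 0, 0, 1, 1, -1]
```

At `eps=0.3` the locus lacking pduN still shares a bin with the complete locus; a smaller `eps` (for example 0.2) would leave it unbinned, because its nearest neighbour is at Jaccard distance 0.25. Gene clusters in sparse regions, such as the lone `tetA` set, receive -1 instead of being forced into a bin.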

Life ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 758
Author(s):  
Xiaohe Jin ◽  
Yunlong Zhang ◽  
Ran Zhang ◽  
Kathy-Uyen Nguyen ◽  
Jonathan S. Lindsey ◽  
...  

Tolyporphins A–R are unusual tetrapyrrole macrocycles produced by the non-axenic filamentous cyanobacterium HT-58-2. A putative biosynthetic gene cluster for tolyporphins (here termed BGC-1) was previously identified in the genome of HT-58-2. Here, homology searching with BGC-1 from HT-58-2 led to the identification of similar BGCs in seven other filamentous cyanobacteria, including strains Nostoc sp. 106C, Nostoc sp. RF31YmG, Nostoc sp. FACHB-892, Brasilonema octagenarum UFV-OR1, Brasilonema octagenarum UFV-E1, Brasilonema sennae CENA114 and Oculatella sp. LEGE 06141, suggesting their potential for tolyporphins production. A second, similar gene cluster (BGC-2) was also unexpectedly identified in HT-58-2. No tolyporphins BGCs were identified in unicellular cyanobacteria. Phylogenetic analysis based on 16S rRNA and TolD, a common component of the BGCs, points to a close evolutionary history between each strain and its respective tolyporphins BGC. Although three of these cyanobacteria carry putative tolyporphins BGCs, examination of their extracted pigments has not revealed the presence of tolyporphins. Overall, the identification of BGCs and potential producers of tolyporphins presents a collection of candidate cyanobacteria for genetic and biochemical analysis pertaining to these unusual tetrapyrrole macrocycles.


Author(s):  
Rebecca Devine ◽  
Hannah McDonald ◽  
Zhiwei Qin ◽  
Corinne Arnold ◽  
Katie Noble ◽  
...  

Abstract The formicamycins are promising antibiotics with potent activity against Gram-positive pathogens, including VRE and MRSA, and display a high barrier to the selection of resistant isolates. They were first identified in Streptomyces formicae KY5, which produces the formicamycins at low levels on solid agar but not in liquid culture, hindering further investigation of these promising antibacterial compounds. We hypothesised that by understanding the organisation and regulation of the for biosynthetic gene cluster, we could rationally refactor the cluster to increase production levels. Here we report that the for biosynthetic gene cluster consists of 24 genes expressed on nine transcripts. Seven of these transcripts, including those containing all the major biosynthetic genes, are repressed by the MarR-family regulator ForJ, which also controls the expression of the ForGF two-component system that initiates biosynthesis. A third cluster-situated regulator, ForZ, autoregulates and controls production of the putative MFS transporter ForAA. Consistent with these findings, deletion of forJ increased formicamycin biosynthesis 5-fold, while over-expression of forGF in the ΔforJ background increased production 10-fold compared to the wild type. De-repression by deleting forJ also switched on biosynthesis in liquid culture and induced the production of two novel formicamycin congeners. By combining mutations in regulatory and biosynthetic genes, six new biosynthetic precursors with antibacterial activity were also isolated. This work demonstrates the power of synthetic biology for the rational redesign of antibiotic biosynthetic gene clusters, both to engineer strains suitable for fermentation in large-scale bioreactors and to generate new molecules.
Importance Antimicrobial resistance is a growing threat as existing antibiotics become increasingly ineffective against drug-resistant pathogens. Here we determine the transcriptional organisation and regulation of the gene cluster encoding biosynthesis of the formicamycins, promising new antibiotics with activity against drug-resistant bacteria. By exploiting this knowledge, we construct stable mutant strains that overproduce these molecules in both liquid and solid culture and also make some new compound variants. This will facilitate large-scale purification of these molecules for further study, including in vivo experiments and the elucidation of their mechanism of action. Our work demonstrates that understanding the regulation of natural product biosynthetic pathways can enable rational improvement of the producing strains.


2005 ◽  
Vol 187 (23) ◽  
pp. 8164-8171 ◽  
Author(s):  
Diana Ideses ◽  
Uri Gophna ◽  
Yossi Paitan ◽  
Roy R. Chaudhuri ◽  
Mark J. Pallen ◽  
...  

ABSTRACT The type III secretion system (T3SS) is an important virulence factor used by several gram-negative bacteria to deliver effector proteins which subvert host cellular processes. Enterohemorrhagic Escherichia coli O157 has a well-defined T3SS involved in attachment and effacement (ETT1) and critical for virulence. A gene cluster potentially encoding an additional T3SS (ETT2), which resembles the SPI-1 system in Salmonella enterica, was found in its genome sequence. The ETT2 gene cluster has since been found in many E. coli strains, but its in vivo role is not known. Many of the ETT2 gene clusters carry mutations and deletions, raising the possibility that they are not functional. Here we show the existence in septicemic E. coli strains of an ETT2 gene cluster, ETT2sepsis, which, although degenerate, contributes to pathogenesis. ETT2sepsis has several premature stop codons and a large (5 kb) deletion, which is conserved in 11 E. coli strains from cases of septicemia and newborn meningitis. A null mutant constructed to remove genes coding for the putative inner membrane ring of the secretion complex exhibited significantly reduced virulence. These results are the first demonstration of the importance of ETT2 for pathogenesis.


2010 ◽  
Vol 77 (4) ◽  
pp. 1214-1220 ◽  
Author(s):  
Toshiki Furuya ◽  
Satomi Hirose ◽  
Hisashi Osanai ◽  
Hisashi Semba ◽  
Kuniki Kino

ABSTRACT Mycobacterium goodii strain 12523 is an actinomycete that is able to oxidize phenol regioselectively at the para position to produce hydroquinone. In this study, we investigated the genes responsible for this unique regioselective oxidation. Because the oxidation activity of M. goodii strain 12523 toward phenol is induced in the presence of acetone, we first identified acetone-induced proteins in this microorganism by two-dimensional electrophoretic analysis. The N-terminal amino acid sequence of one of these acetone-induced proteins shares 100% identity with that of the protein encoded by the open reading frame Msmeg_1971 in Mycobacterium smegmatis strain mc²155, whose genome sequence has been determined. Since Msmeg_1971, Msmeg_1972, Msmeg_1973, and Msmeg_1974 constitute a putative binuclear iron monooxygenase gene cluster, we cloned this gene cluster of M. smegmatis strain mc²155 and its homologous gene cluster found in M. goodii strain 12523. Sequence analysis of these binuclear iron monooxygenase gene clusters revealed the presence of four genes, designated mimABCD, which encode an oxygenase large subunit, a reductase, an oxygenase small subunit, and a coupling protein, respectively. M. smegmatis strain mc²155 was also found to oxidize phenol to hydroquinone; when its mimA gene (Msmeg_1971) was deleted, the mutant lost this oxidation ability. The ability was restored by introducing the mimA gene of M. smegmatis strain mc²155 or of M. goodii strain 12523 into the mutant. Interestingly, we found that these gene clusters also play essential roles in propane and acetone metabolism in these mycobacteria.


2020 ◽  
Author(s):  
Dina Kačar ◽  
Librada M Cañedo ◽  
Pilar Rodríguez ◽  
Elena Gonzalez ◽  
Beatriz Galán ◽  
...  

Abstract Glutaramide-containing polyketides are known as potent antitumoral and antimetastatic agents. However, the associated gene clusters have only been identified and studied in a few Streptomyces producers and a single Burkholderia gladioli symbiont. New glutaramide-family polyketides, designated sesbanimides D, E and F, along with the previously known sesbanimides A and C, were isolated from two marine alphaproteobacteria, Stappia indica PHM037 and Labrenzia aggregata PHM038. Structures of the isolated compounds were elucidated based on 1D and 2D homo- and heteronuclear NMR analyses and ESI-MS. All compounds exhibited strong antitumor activity in lung, breast and colorectal cancer cell lines. Subsequent whole-genome sequencing and genome mining revealed the trans-AT PKS gene cluster responsible for sesbanimide biosynthesis, designated the sbn cluster, and a modular assembly of sesbanimide is proposed. Interestingly, numerous homologous orphan gene clusters were found in distantly related bacteria and used as comparative genomic assets for a more global characterization of sbn-like clusters. Strikingly, the modular architecture of the downstream mixed-type PKS/NRPS SbnQ showed high similarity to PedH in the pederin gene cluster and Lab13 in the labrenzin gene cluster, although those clusters are responsible for the production of structurally completely different molecules. The unexpected presence of SbnQ homologs in unrelated polyketide gene clusters across phylogenetically distant bacteria raises intriguing questions about the evolutionary relationship between glutaramide-like and pederin-like pathways, as well as the functionality of their synthetic products.
Significance Glutaramide-containing polyketides are still a largely understudied group of polyketides, produced mainly by the genus Streptomyces, with great potential for antitumor drug production. Here, we describe the genomes of two cultivable marine bacteria, Stappia indica PHM037 and Labrenzia aggregata PHM038, producers of the cytotoxic glutaramide-family polyketides sesbanimides A and C, together with the chemical elucidation of the newly identified analogs D, E and F. Genome mining revealed the trans-AT PKS gene cluster responsible for sesbanimide biosynthesis. Although numerous homologous gene clusters are present in remarkably different bacteria, this is the first time that their biosynthetic product has been reported. The comparative genome analysis reveals a striking, cryptic evolutionary relationship among sesbanimides, the glutaramides from Streptomyces spp., and the pederin-family gene clusters.


2021 ◽  
Author(s):  
Robert W. Murdoch ◽  
Gao Chen ◽  
Fadime Kara Murdoch ◽  
E. Erin Mack ◽  
Manuel I. Villalobos Solis ◽  
...  

Abstract Anthropogenic activities and natural processes release dichloromethane (DCM), a toxic chemical with substantial ozone-depleting capacity. Specialized anaerobic bacteria metabolize DCM; however, the genetic basis for this process has remained elusive. Comparative genomics of the three known anaerobic DCM-degrading bacterial species revealed a homologous gene cluster, designated the methylene chloride catabolism (mec) gene cassette, comprising eight to ten genes with 79.6–99.7% predicted amino acid identity. Functional annotation identified genes encoding a corrinoid-dependent methyltransferase system, and shotgun proteomics applied to two DCM-catabolizing cultures revealed high expression of proteins encoded on the mec gene cluster during anaerobic growth with DCM. In a DCM-contaminated groundwater plume, the abundance of mec genes strongly correlated with DCM concentrations (R² = 0.71–0.85), indicating their value as process-specific bioremediation biomarkers. mec gene clusters were identified in metagenomes representing peat bogs, the deep subsurface, and marine ecosystems including oxygen minimum zones (OMZs), suggesting DCM turnover in diverse habitats. The broad distribution of anaerobic DCM catabolic potential suggests a relevant control function on DCM emissions to the atmosphere and a role for DCM as a microbial energy source in critical zone environments. The findings imply that the global DCM flux might be far greater than emission measurements suggest.
Importance Dichloromethane (DCM) is an increasing threat to stratospheric ozone with both anthropogenic and natural emission sources. Anaerobic bacterial metabolism of DCM has not yet been taken into consideration as a factor in the global DCM cycle. The discovery of the mec gene cassette associated with anaerobic bacterial DCM metabolism, and its widespread distribution in environmental systems, highlight a strong attenuation potential for DCM. Knowledge of the mec cassette offers new opportunities to delineate DCM sources, enables more robust estimates of DCM fluxes, supports refined DCM emission modeling and simulation of the stratospheric ozone layer, reveals a novel, ubiquitous C1 carbon metabolic system, and provides prognostic and diagnostic tools supporting bioremediation of groundwater aquifers impacted by DCM.
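As a hedged illustration of the correlation statistic quoted in this abstract, the sketch below computes the coefficient of determination for an ordinary least-squares line fitted to paired measurements. The abstract does not say whether gene abundances or DCM concentrations were log-transformed, nor exactly which regression was used, so the `r_squared` helper and the example values are hypothetical and are not data from the study.

```python
from math import fsum
from typing import Sequence

def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination for an ordinary least-squares fit y ≈ a + b·x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(x)
    mean_x, mean_y = fsum(x) / n, fsum(y) / n
    sxx = fsum((xi - mean_x) ** 2 for xi in x)
    syy = fsum((yi - mean_y) ** 2 for yi in y)
    sxy = fsum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R² is undefined when either variable is constant")
    # With a single predictor, R² equals the squared Pearson correlation coefficient.
    return (sxy * sxy) / (sxx * syy)

# Hypothetical paired observations from groundwater wells (illustrative values only).
log10_mec_copies = [3.1, 4.0, 4.8, 5.5, 6.2]
log10_dcm_conc = [0.5, 1.4, 1.9, 2.8, 3.0]
print(f"R² = {r_squared(log10_mec_copies, log10_dcm_conc):.2f}")
```

Read this way, an R² between 0.71 and 0.85 would mean that a linear fit on the chosen scale accounts for roughly 71–85% of the variance in one variable given the other.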


2016 ◽  
Author(s):  
Bogdan Tokovenko ◽  
Yuriy Rebets ◽  
Andriy Luzhetskyy

Background. The biosynthetic potential of Actinobacteria has long been the subject of theoretical estimates. Such an estimate is important as a test of whether a taxon or group of taxa can be further exploited for new therapeutics. Because neither the set of available genomes nor the set of bacterial cultivation methods is static, it makes sense to simplify estimations of biosynthetic gene cluster similarity, diversity, and abundance as much as possible and to improve their reproducibility. Results. We have developed a command-line computational pipeline (available at https://bitbucket.org/qmentis/clusterscluster/) that assists in performing empirical (genome-based) assessments of microbial secondary metabolite gene cluster similarity and abundance, and applied it to a set of 208 complete and de-duplicated Actinobacteria genomes. After a brief overview of Actinobacteria biosynthetic potential as compared to other bacterial taxa, we use similarity thresholds derived from 4 pairs of known similar gene clusters to identify up to 40–48% of the 3247 gene clusters in our set of genomes as unique. The cumulative curve of unique gene clusters does not saturate within the examined dataset, and Heaps' alpha is 0.129, suggesting an open pan-clustome. We identify and highlight pitfalls of, and possible improvements to, genome-based gene cluster similarity measurements.
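The abstract reports a Heaps' alpha but not how it was fitted. One form common in pan-genome analyses models the number of new families contributed by the N-th added genome as n(N) = κN^(−α), with values of α below 1 usually read as an open pan-genome; the sketch below assumes that form and estimates α by least squares on log n versus log N. Real analyses typically average the new-family counts over many random genome orderings, whereas this sketch uses a single ordering. The helper names and the toy genomes are hypothetical.

```python
import math
from typing import List, Sequence, Set

def new_items_per_genome(genomes: Sequence[Set[str]]) -> List[int]:
    """Count the previously unseen cluster families contributed by each genome, in order."""
    seen: Set[str] = set()
    counts: List[int] = []
    for families in genomes:
        counts.append(len(families - seen))
        seen |= families
    return counts

def heaps_alpha(new_counts: Sequence[float]) -> float:
    """Fit n(N) = kappa * N**(-alpha) by least squares on log n versus log N.

    The first genome is skipped (everything in it is trivially new), and genomes that
    add nothing (n == 0) are skipped because log(0) is undefined.
    """
    points = [
        (math.log(N), math.log(n))
        for N, n in enumerate(new_counts, start=1)
        if N > 1 and n > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two genomes (after the first) that add new families")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx  # log n = log kappa - alpha * log N, so alpha is minus the slope

# Hypothetical toy pan-clustome: four genomes, each a set of cluster-family IDs.
genomes = [{"c1", "c2", "c3"}, {"c2", "c4"}, {"c1", "c5", "c6"}, {"c6", "c7"}]
counts = new_items_per_genome(genomes)
print(counts)  # [3, 1, 2, 1]
print(f"alpha = {heaps_alpha(counts):.3f}")
```

Because the abstract does not state which form was fitted (new families per genome, or cumulative families against genome count), treat this as one plausible reading rather than the pipeline's actual computation.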


2009 ◽  
Vol 71-73 ◽  
pp. 207-210 ◽  
Author(s):  
M. Esparza ◽  
B. Bowien ◽  
Eugenia Jedlicki ◽  
David S. Holmes

Acidithiobacillus ferrooxidans is an obligately chemolithoautotrophic γ-proteobacterium that fixes CO2 via the Calvin-Benson-Bassham (CBB) reductive pentose phosphate cycle. Our objective is to identify genes potentially involved in CO2 fixation and to advance our understanding of how they might be regulated in response to environmental signals. Bioinformatic analyses, based on the complete genome sequence of the type strain ATCC 23270, identified five cbb gene clusters, four of which we show experimentally to be operons. These clusters are predicted to encode: (i) the components of the carboxysome and one copy of form I RubisCO (cbb1 operon), (ii) a second copy of form I RubisCO (cbb2 operon), (iii) enzymes of central carbon metabolism (cbb3 operon), (iv) a phosphoribulokinase and enzymes of sulfur metabolism (cbb4 operon), and (v) RubisCO form II (cbb5 gene cluster). In addition, a gene for the LysR-type transcriptional regulator CbbR was identified immediately upstream of, and divergently oriented to, the cbb1 operon, and another was found associated with the cbb5 gene cluster. A. ferrooxidans was grown under different concentrations of CO2 (2.5 to 20% [v/v]), and levels of mRNA and protein were evaluated by qPCR and Western blotting, respectively. CbbR binding to predicted promoter regions of operons cbb1-4 was assayed by EMSA. This information permitted the formulation of models explaining how these operons might be regulated by environmental CO2 concentrations. These models were evaluated in vivo in a heterologous host, using cloned A. ferrooxidans cbbR to complement a mutant of the facultative chemoautotroph Ralstonia eutropha H16 lacking a functional cbbR. Cloned copies of A. ferrooxidans promoter regions were also introduced into R. eutropha to evaluate their ability to drive reporter gene expression. This work lays the framework for further studies that should result in a more comprehensive picture of how CO2 fixation is regulated in A. ferrooxidans.


2017 ◽  
Vol 114 (27) ◽  
pp. 7025-7030 ◽  
Author(s):  
Nicholas C. Harris ◽  
Michio Sato ◽  
Nicolaus A. Herman ◽  
Frederick Twigg ◽  
Wenlong Cai ◽  
...  

A putative lipopeptide biosynthetic gene cluster is conserved in many species of Actinobacteria, including Mycobacterium tuberculosis and M. marinum, but the specific function of the encoded proteins has been elusive. Using both in vivo heterologous reconstitution and in vitro biochemical analyses, we have revealed that the five encoded biosynthetic enzymes are capable of synthesizing a family of isonitrile lipopeptides (INLPs) through a thio-template mechanism. The biosynthesis features the generation of isonitrile from a single Gly precursor, promoted by a thioesterase and a nonheme iron(II)-dependent oxidase homolog, and the acylation of both amino groups of Lys by the same isonitrile acyl chain, facilitated by a single condensation domain of a nonribosomal peptide synthetase. In addition, deletion of INLP biosynthetic genes in M. marinum decreased the intracellular metal concentration, suggesting a role for this biosynthetic gene cluster in metal transport.


1998 ◽  
Vol 180 (8) ◽  
pp. 2005-2013 ◽  
Author(s):  
Lu-Shu Yeh ◽  
Tien Hsu ◽  
Jim D. Karam

ABSTRACT The genomes of bacteriophages T4 and RB69 are phylogenetically related but diverge in nucleotide sequence at many loci and are incompatible with each other in vivo. We describe here the biological implications of divergence in a genomic segment that encodes four essential DNA replication proteins: gp45 (sliding clamp), the gp44 and gp62 subunits of the clamp loader complex, and gp46 (a recombination protein). We have cloned, sequenced, and expressed several overlapping segments of the RB69 gene 46-45.2-(rpbA)-45-44-62 cluster and compared its features to those of the homologous gene cluster from T4. The deduced primary structures of all four RB69 replication proteins and of gp45.2 from this cluster are very similar (80 to 95% similarity) to those of their respective T4 homologs. In contrast, the rpbA region (which encodes a nonessential protein in T4) is highly diverged (∼49% similarity) between the two phage genomes and does not encode a protein in RB69. Expression studies and patterns of high divergence of intercistronic nucleotide sequences of this cluster suggest that T4 and RB69 evolved similar transcriptional and translational control strategies for the cistrons contained therein, but with different specificities. In plasmid-phage complementation assays, we show that posttranslationally, RB69 and T4 homologs of gp45 and the gp44/62 complex can be effectively exchanged between the two phage replicase assemblies; however, our results also suggest that mixed clamp loader complexes consisting of T4 gp62 and RB69 gp44 subunits are not active for phage DNA replication. Thus, the specificity of the gp44-gp62 interaction in the clamp loader marks a point of departure between the T4 and RB69 replication systems.

