Sequence conservation, domain architectures, and phylogenetic distribution of the HD-GYP type c-di-GMP phosphodiesterases

2021 ◽  
Author(s):  
Michael Y. Galperin ◽  
Shan-Ho Chou

The HD-GYP domain, named after two of its conserved sequence motifs, was first described in 1999 as a specialized version of the widespread HD phosphohydrolase domain that had additional highly conserved amino acid residues. Domain associations of HD-GYP indicated its involvement in bacterial signal transduction and distribution patterns of this domain suggested that it could serve as a hydrolase of the bacterial second messenger c-di-GMP, in addition to or instead of the EAL domain. Subsequent studies confirmed the ability of various HD-GYP domains to hydrolyze c-di-GMP to linear pGpG and/or GMP. Certain HD-GYP-containing proteins hydrolyze another second messenger, cGAMP, and some HD-GYP domains participate in regulatory protein-protein interactions. The recently solved structures of HD-GYP domains from four distinct organisms clarified the mechanisms of c-di-GMP binding and metal-assisted hydrolysis. However, the HD-GYP domain is poorly represented in public domain databases, which causes certain confusion about its phylogenetic distribution, functions, and domain architectures. Here, we present a refined sequence model for the HD-GYP domain and describe the roles of its most conserved residues in metal and/or substrate binding. We also calculate the numbers of HD-GYPs encoded in various genomes and list the most common domain combinations involving HD-GYP, such as the RpfG (REC–HD-GYP), Bd1817 (DUF3391–HD-GYP), and PmGH (GAF–HD-GYP) protein families. We also provide the descriptions of six HD-GYP–associated domains, including four novel integral membrane sensor domains. This work is expected to stimulate studies of diverse HD-GYP-containing proteins, their N-terminal sensor domains, and the signals to which they respond.
IMPORTANCE The HD-GYP domain forms class II of c-di-GMP phosphodiesterases that control the cellular levels of the universal bacterial second messenger c-di-GMP and therefore affect flagellar and/or twitching motility, cell development, biofilm formation, and, often, virulence. Despite more than 20 years of research, HD-GYP domains are insufficiently characterized; they are often confused with ‘classical’ HD domains that are involved in various housekeeping activities and may participate in signaling, hydrolyzing (p)ppGpp and c-di-AMP. This work provides an updated description of the HD-GYP domain, including its sequence conservation, phylogenetic distribution, domain architectures, and the most widespread HD-GYP-containing protein families. This work shows that HD-GYP domains are widespread in many environmental bacteria and are predominant c-di-GMP hydrolases in many lineages, including clostridia and deltaproteobacteria.


2018 ◽  
Author(s):  
Naomi Yamada ◽  
William K.M. Lai ◽  
Nina Farrell ◽  
B. Franklin Pugh ◽  
Shaun Mahony

Motivation: Regulatory proteins associate with the genome either by directly binding cognate DNA motifs or via protein-protein interactions with other regulators. Each recruitment mechanism may be associated with distinct motifs and may also result in distinct characteristic patterns in high-resolution protein-DNA binding assays. For example, the ChIP-exo protocol precisely characterizes protein-DNA crosslinking patterns by combining chromatin immunoprecipitation (ChIP) with 5’ → 3’ exonuclease digestion. Since different regulatory complexes will result in different protein-DNA crosslinking signatures, analysis of ChIP-exo tag enrichment patterns should enable detection of multiple protein-DNA binding modes for a given regulatory protein. However, current ChIP-exo analysis methods either treat all binding events as being of a uniform type or rely on motifs to cluster binding events into subtypes. Results: To systematically detect multiple protein-DNA interaction modes in a single ChIP-exo experiment, we introduce the ChIP-exo mixture model (ChExMix). ChExMix probabilistically models the genomic locations and subtype memberships of binding events using both ChIP-exo tag distribution patterns and DNA motifs. We demonstrate that ChExMix achieves accurate detection and classification of binding event subtypes using in silico mixed ChIP-exo data. We further demonstrate the unique analysis abilities of ChExMix using a collection of ChIP-exo experiments that profile the binding of key transcription factors in MCF-7 cells. In these data, ChExMix identifies possible recruitment mechanisms of FoxA1 and ERα, thus demonstrating that ChExMix can effectively stratify ChIP-exo binding events into biologically meaningful subtypes. Availability: ChExMix is available from https://github.com/seqcode/[email protected]


1989 ◽  
Vol 9 (2) ◽  
pp. 747-756 ◽  
Author(s):  
L Poellinger ◽  
R G Roeder

Immunoglobulin heavy-chain genes contain two conserved sequence elements 5' to the site of transcription initiation: the octamer ATGCAAAT and the heptamer CTCATGA. Both of these elements are required for normal cell-specific promoter function. The present study demonstrates that both the ubiquitous and lymphoid-cell-specific octamer transcription factors (OTF-1 and OTF-2, respectively) interact specifically with each of the two conserved sequence elements, forming either homo- or heterodimeric complexes. This was surprising, since the heptamer and octamer sequence motifs bear no obvious similarity to each other. Binding of either factor to the octamer element occurred independently. However, OTF interaction with the heptamer sequence appeared to require the presence of an intact octamer motif and occurred with a spacing of either 2 or 14 base pairs between the two elements, suggesting coordinate binding resulting from protein-protein interactions. The degeneracy in sequences recognized by the OTFs may be important in widening the range over which gene expression can be modulated and in establishing cell type specificity.


2011 ◽  
Vol 286 (41) ◽  
pp. 35418-35429 ◽  
Author(s):  
Trine Kjaersgaard ◽  
Michael K. Jensen ◽  
Michael W. Christiansen ◽  
Per Gregersen ◽  
Birthe B. Kragelund ◽  
...  

Senescence in plants involves massive nutrient relocation and age-related cell death. Characterization of the molecular components, such as transcription factors (TFs), involved in these processes is required to understand senescence. We found that HvNAC005 and HvNAC013 of the plant-specific NAC (NAM, ATAF1,2, CUC) TF family are up-regulated during senescence in barley (Hordeum vulgare). Both HvNAC005 and HvNAC013 bound the conserved NAC DNA target sequence. Computational and biophysical analyses showed that both proteins are intrinsically disordered in their large C-terminal domains, which are transcription regulatory domains (TRDs) in many NAC TFs. Using motif searches and interaction studies in yeast we identified an evolutionarily conserved sequence, the LP motif, in the TRD of HvNAC013. This motif was sufficient for transcriptional activity. In contrast, HvNAC005 did not function as a transcriptional activator suggesting that an involvement of HvNAC013 and HvNAC005 in senescence will be different. HvNAC013 interacted with barley radical-induced cell death 1 (RCD1) via the very C-terminal part of its TRD, outside of the region containing the LP motif. No significant secondary structure was induced in the HvNAC013 TRD upon interaction with RCD1. RCD1 also interacted with regions dominated by intrinsic disorder in TFs of the MYB and basic helix-loop-helix families. We propose that RCD1 is a regulatory protein capable of interacting with many different TFs by exploiting their intrinsic disorder. In addition, we present the first structural characterization of NAC C-terminal domains and relate intrinsic disorder and sequence motifs to activity and protein-protein interactions.


Author(s):  
Yanrong Ji ◽  
Zhihan Zhou ◽  
Han Liu ◽  
Ramana V Davuluri

Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationship, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture global and transferable understanding of genomic DNA sequences based on up and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory elements prediction and demonstrate its ease of use, accuracy and efficiency. We show that the single pre-trained transformers model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationship within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that pre-trained DNABERT with human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned to many other sequence analysis tasks. Availability and implementation: The source code, pretrained and finetuned model for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Ami Shah ◽  
Madison Ratkowski ◽  
Alessandro Rosa ◽  
Paul Feinstein ◽  
Thomas Bozza

Olfactory sensory neurons express a large family of odorant receptors (ORs) and a small family of trace amine-associated receptors (TAARs). While both families are subject to so-called singular expression (expression of one allele of one gene), the mechanisms underlying TAAR gene choice remain obscure. Here, we report the identification of two conserved sequence elements in the mouse TAAR cluster (T-elements) that are required for TAAR gene expression. We observed that cell-type-specific expression of a TAAR-derived transgene required either T-element. Moreover, deleting either element reduced or abolished expression of a subset of TAAR genes, while deleting both elements abolished olfactory expression of all TAARs in cis with the mutation. The T-elements exhibit several features of known OR enhancers but also contain highly conserved, unique sequence motifs. Our data demonstrate that TAAR gene expression requires two cooperative cis-acting enhancers and suggest that ORs and TAARs share similar mechanisms of singular expression.


2020 ◽  
Vol 401 (12) ◽  
pp. 1323-1334
Author(s):  
Sandra Kunz ◽  
Peter L. Graumann

The second messenger cyclic di-GMP regulates a variety of processes in bacteria, many of which are centered around the decision whether to adopt a sessile or a motile life style. Regulatory circuits include pathogenicity, biofilm formation, and motility in a wide variety of bacteria, and play a key role in cell cycle progression in Caulobacter crescentus. Interestingly, multiple, seemingly independent c-di-GMP pathways have been found in several species, where deletions of individual c-di-GMP synthetases (DGCs) or hydrolases (PDEs) have resulted in distinct phenotypes that would not be expected based on a freely diffusible second messenger. Several recent studies have shown that individual signaling nodes exist, and additionally, that protein/protein interactions between DGCs, PDEs and c-di-GMP receptors play an important role in signaling specificity. Additionally, subcellular clustering has been shown to be employed by bacteria to likely generate local signaling of second messenger, and/or to increase signaling specificity. This review highlights recent findings that reveal how bacteria employ spatial cues to increase the versatility of second messenger signaling.

