cg dinucleotide
Recently Published Documents


TOTAL DOCUMENTS

38
(FIVE YEARS 14)

H-INDEX

13
(FIVE YEARS 3)

2021 ◽  
Vol 22 (20) ◽  
pp. 11025
Author(s):  
Nadine Müller ◽  
Eveliina Ponkkonen ◽  
Thomas Carell ◽  
Andriy Khobta

Stepwise oxidation of the epigenetic mark 5-methylcytosine and base excision repair (BER) of the resulting 5-formylcytosine (5-fC) and 5-carboxycytosine (5-caC) may provide a mechanism for reactivation of epigenetically silenced genes; however, the functions of 5-fC and 5-caC at defined gene elements are scarcely explored. We analyzed the expression of reporter constructs containing either 2′-deoxy-(5-fC/5-caC) or their BER-resistant 2′-fluorinated analogs, asymmetrically incorporated into CG-dinucleotide of the GC box cis-element (5′-TGGGCGGAGC) upstream from the RNA polymerase II core promoter. In the absence of BER, 5-caC caused a strong inhibition of the promoter activity, whereas 5-fC had almost no effect, similar to 5-methylcytosine or 5-hydroxymethylcytosine. BER of 5-caC caused a transient but significant promoter reactivation, succeeded by silencing during the following hours. Both responses strictly required thymine DNA glycosylase (TDG); however, the silencing phase additionally demanded a 5′-endonuclease (likely APE1) activity and was also induced by 5-fC or an apurinic/apyrimidinic site. We propose that 5-caC may act as a repressory mark to prevent premature activation of promoters undergoing the final stages of DNA demethylation, when the symmetric CpG methylation has already been lost. Remarkably, the downstream promoter activation or repression responses are regulated by two separate BER steps, where TDG and APE1 act as potential switches.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yong Wang ◽  
Jun‑Ming Mao ◽  
Guang‑Dong Wang ◽  
Zhi‑Peng Luo ◽  
Liu Yang ◽  
...  

An amendment to this paper has been published and can be accessed via a link at the top of the paper.


Genetics ◽  
2021 ◽  
Author(s):  
Aline Muyle ◽  
Jeffrey Ross-Ibarra ◽  
Danelle K Seymour ◽  
Brandon S Gaut

Abstract In plants, mammals and insects, some genes are methylated in the CG dinucleotide context, a phenomenon called gene body methylation (gbM). It has been controversial whether this phenomenon has any functional role. Here we took advantage of the availability of 876 leaf methylomes in Arabidopsis thaliana to characterize the population frequency of methylation at the gene level and to estimate the site-frequency spectrum of allelic states. Using a population genetics model specifically designed for epigenetic data, we found that genes with ancestral gbM are under significant selection to remain methylated. Conversely, ancestrally unmethylated genes were under selection to remain unmethylated. Repeating the analyses at the level of individual cytosines confirmed these results. Estimated selection coefficients were small, on the order of 4Nes = 1.4, which is similar to the magnitude of selection acting on codon usage. We also estimated that A. thaliana is losing gbM three-fold more rapidly than gaining it, which could be due to a recent reduction in the efficacy of selection after a switch to selfing. Finally, we investigated the potential function of gbM through its link with gene expression. Across genes with polymorphic methylation states, the expression of gene body methylated alleles was consistently and significantly higher than unmethylated alleles. Although it is difficult to disentangle genetic from epigenetic effects, our work suggests that gbM has a small but measurable effect on fitness, perhaps due to its association to a phenotype like gene expression.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10063
Author(s):  
Sam Humphrey ◽  
Alastair Kerr ◽  
Magnus Rattray ◽  
Caroline Dive ◽  
Crispin J. Miller

Molecular sequences carry information. Analysis of sequence conservation between homologous loci is a proven approach with which to explore the information content of molecular sequences. This is often done using multiple sequence alignments to support comparisons between homologous loci. These methods therefore rely on sufficient underlying sequence similarity with which to construct a representative alignment. Here we describe a method using a formal metric of information, surprisal, to analyse biological sub-sequences without alignment constraints. We applied our model to the genomes of five different species to reveal similar patterns across a panel of eukaryotes. As the surprisal of a sub-sequence is inversely proportional to its occurrence within the genome, the optimal size of the sub-sequences was selected for each species under consideration. With the model optimized, we found a strong correlation between surprisal and CG dinucleotide usage. The utility of our model was tested by examining the sequences of genes known to undergo splicing. We demonstrate that our model can identify biological features of interest such as known donor and acceptor sites. Analysis across all annotated coding exon junctions in Homo sapiens reveals the information content of coding exons to be greater than the surrounding intron regions, a consequence of increased suppression of the CG dinucleotide in intronic space. Sequences within coding regions proximal to exon junctions exhibited novel patterns within DNA and coding mRNA that are not a function of the encoded amino acid sequence. Our findings are consistent with the presence of secondary information encoding features such as DNA and RNA binding sites, multiplexed through the coding sequence and independent of the information required to define the corresponding amino-acid sequence. 
We conclude that surprisal provides a complementary methodology with which to locate regions of interest in the genome, particularly in situations that lack an appropriate multiple sequence alignment.
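
As a rough illustration of the surprisal metric described in this abstract, the sketch below scores every sub-sequence (k-mer) of a genome by the negative log of its relative frequency, so rarer sub-sequences score higher. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `kmer_surprisal`, the use of overlapping k-mers, and the base-2 logarithm are all choices made here for illustration.

```python
import math
from collections import Counter


def kmer_surprisal(genome: str, k: int) -> dict:
    """Map each k-mer in `genome` to its surprisal in bits.

    Assumption (not from the source): surprisal is -log2(p), where p is
    the k-mer's share of all overlapping k-mers in the sequence.
    """
    genome = genome.upper()
    # Count every overlapping window of length k.
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    # Rare k-mers have small p and therefore high surprisal.
    return {kmer: -math.log2(n / total) for kmer, n in counts.items()}


if __name__ == "__main__":
    scores = kmer_surprisal("AAAC", 1)
    for kmer, bits in sorted(scores.items()):
        print(kmer, round(bits, 3))
```

The abstract notes that the sub-sequence size was chosen per species; in this sketch that corresponds to the `k` argument, which the caller would need to tune.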


2020 ◽  
Author(s):  
Aline Muyle ◽  
Jeffrey Ross-Ibarra ◽  
Danelle K. Seymour ◽  
Brandon S. Gaut

Abstract In plants, mammals and insects, some genes are methylated in the CG dinucleotide context, a phenomenon called gene body methylation. It has been controversial whether this phenomenon has any functional role. Here, we took advantage of the availability of 876 leaf methylomes in Arabidopsis thaliana to characterize the population frequency of methylation at the gene level and estimated the site-frequency spectrum of allelic states (epialleles). Using a population genetics model specifically designed for epigenetic data, we found that genes with ancestral gene body methylation are under significant selection to remain methylated. Conversely, all genes taken together were inferred to be under selection to be unmethylated. The estimated selection coefficients were small, similar to the magnitude of selection acting on codon usage. We also estimated that A. thaliana is losing gene body methylation three-fold more rapidly than gaining it, which could be due to a recent reduction in the efficacy of selection after a switch to selfing. Finally, we investigated the potential function of gene body methylation through its link with gene expression level. Across genes with polymorphic methylation states, the expression of gene body methylated alleles was consistently and significantly higher than unmethylated alleles. Although it is difficult to disentangle genetic from epigenetic effects, our work suggests that gene body methylation has a small but measurable effect on fitness, perhaps due to its association to a phenotype like gene expression.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Yong Wang ◽  
Jun-Ming Mao ◽  
Guang-Dong Wang ◽  
Zhi-Peng Luo ◽  
Liu Yang ◽  
...  

2020 ◽  
Author(s):  
Rebekah Tillotson ◽  
Justyna Cholewa-Waclaw ◽  
Kashyap Chhatbar ◽  
John Connelly ◽  
Sophie A. Kirschner ◽  
...  

SUMMARY DNA methylation is implicated in neuronal biology via the protein MeCP2, mutation of which causes Rett syndrome. MeCP2 recruits the NCOR1/2 corepressor complexes to methylated cytosine in the CG dinucleotide, but also to non-CG methylation, which is abundant specifically in neuronal genomes. To test the biological significance of its dual binding specificity, we replaced the MeCP2 DNA binding domain with an orthologous domain whose specificity is restricted to mCG motifs. Knock-in mice expressing the domain-swap protein displayed severe Rett syndrome-like phenotypes, demonstrating that interaction with sites of non-CG methylation, specifically the mCAC trinucleotide, is critical for normal brain function. The results support the notion that the delayed onset of Rett syndrome is due to the late accumulation of both mCAC and its reader MeCP2. Intriguingly, genes dysregulated in both Mecp2-null and domain-swap mice are implicated in other neurological disorders, potentially highlighting targets of particular relevance to the Rett syndrome phenotype.


2020 ◽  
Author(s):  
Anna S. Ershova ◽  
Irina A. Eliseeva ◽  
Oleg S. Nikonov ◽  
Alla D. Fedorova ◽  
Ilya E. Vorontsov ◽  
...  

Abstract Knowledge of mechanisms responsible for mutagenesis of adult stem cells is crucial to track genomic alterations that may affect cell renovation and provoke malignant cell transformation. Mutations in regulatory regions are widely studied nowadays, though mostly in cancer. In this study, we decomposed the mutation signature of adult stem cells, mapped the corresponding mutations into transcription factor binding regions, and assessed mutation frequency in sequence motif occurrences. We found binding sites of C/EBP transcription factors strongly enriched with [C>T]G mutations within the core CG dinucleotide related to deamination of the methylated cytosine. This effect was also exhibited in related cancer samples. Structural modeling predicted enhanced CEBPB binding to the consensus sequence with the [C>T]G mismatch, which was then confirmed in the direct experiment. We propose that it is the enhanced binding of C/EBPs that shields C>T transitions from DNA repair and leads to selective accumulation of the [C>T]G mutations within binding sites.


Author(s):  
Yong Wang ◽  
Jun-Ming Mao ◽  
Guang-Dong Wang ◽  
Ze Qiu ◽  
Qin Yao ◽  
...  

Abstract The causative agent of COVID-19 is a severe acute respiratory syndrome-related coronavirus which has been officially named SARS-CoV-2. Here we report the discovery of extremely low CG abundance in its open reading frames. We found that CG reduction in SARS-CoV-2 is achieved mainly through mutating C/G into A/T, and CG is the best target for mutation. In view of energy usage, a coronavirus with low CG abundance has higher efficiency in translating its RNA, because the secondary structure formed by the viral genome is less stable. The 5′-untranslated region of SARS-CoV-2 has many more CGs and is capable of recruiting host ribosomes to initiate translation. Notably, genomes of cellular organisms also have very low CG abundance, suggesting that mutating C/G into A/T occurs universally in all life forms. Moreover, CG is related to mutational hotspots and CpG islands in cellular organisms. The relationship between them is worthy of further investigation.
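
One common way to quantify the "CG abundance" discussed in this abstract is the observed/expected CG ratio: the number of CG dinucleotides divided by the number expected from the C and G base frequencies alone. The abstract does not say which measure the authors used, so the sketch below is an assumption chosen for illustration, and the function name `cg_obs_exp` is hypothetical.

```python
def cg_obs_exp(seq: str) -> float:
    """Observed/expected CG dinucleotide ratio for a DNA or RNA sequence.

    Assumption (not from the source): expected CG count is
    (#C * #G) / length, so the ratio is (#CG * length) / (#C * #G).
    Values well below 1 indicate CG depletion.
    """
    # Treat RNA like DNA so viral genomes given with U still work.
    seq = seq.upper().replace("U", "T")
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # no CG is possible without both bases
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    n_cg = seq.count("CG")
    return n_cg * len(seq) / (n_c * n_g)


if __name__ == "__main__":
    for s in ("ACGT", "CCGG", "GGCC"):
        print(s, cg_obs_exp(s))
```

Applied per open reading frame or in sliding windows, a ratio like this would let one compare coding regions with the 5′-untranslated region, as the abstract does qualitatively.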

