DNA tandem repeats
Recently Published Documents


TOTAL DOCUMENTS: 12 (FIVE YEARS 7)

H-INDEX: 4 (FIVE YEARS 1)

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Mathys Grapotte ◽  
Manu Saraswat ◽  
Chloé Bessière ◽  
Christophe Menichelli ◽  
Jordan A. Ramilowski ◽  
...  

Abstract: Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.


2021 ◽  
Vol 22 (10) ◽  
pp. 5373
Author(s):  
Juan A. Subirana ◽  
Xavier Messeguer

Little is known about DNA tandem repeats across prokaryotes. We have recently described an enigmatic group of tandem repeats in bacterial genomes with a constant repeat size but variable sequence. These findings strongly suggest that tandem repeat size in some bacteria is under strong selective constraints. Here, we extend these studies and describe tandem repeats in a large set of Bacillus. Some species have very few repeats, while other species have a large number. Most tandem repeats have repeat units of a constant size (either 52 or 20–21 nt) but a variable sequence. We characterize these intriguing tandem repeats in detail. Individual species have several families of tandem repeats with the same repeat length and different sequences. This result is in strong contrast with eukaryotes, where tandem repeats of many sizes are found in any species. We discuss the possibility that they are transcribed as small RNA molecules. They may also be involved in the stabilization of the nucleoid through interaction with proteins. We also show that the distribution of tandem repeats in different species has a taxonomic significance. The data we present for all tandem repeats and their families in these bacterial species will be useful for further genomic studies.


2020 ◽  
Vol 202 (21) ◽  
Author(s):  
Juan A. Subirana ◽  
Xavier Messeguer

ABSTRACT DNA tandem repeats, or satellites, are well described in eukaryotic species, but little is known about their prevalence across prokaryotes. Here, we performed the most complete characterization to date of satellites in bacteria. We identified 121,638 satellites from 12,233 fully sequenced and assembled bacterial genomes with a very uneven distribution. We also determined the families of satellites which have a related sequence. There are 85 genomes that are particularly satellite rich and contain several families of satellites of yet unknown function. Interestingly, we only found two main types of noncoding satellites, depending on their repeat sizes, 22/44 or 52 nucleotides (nt). An intriguing feature is the constant size of the repeats in the genomes of different species, whereas their sequences show no conservation. Individual species also have several families of satellites with the same repeat length and different sequences. This result is in marked contrast with previous findings in eukaryotes, where noncoding satellites of many sizes are found in any species investigated. We describe in greater detail these noncoding satellites in the spirochete Leptospira interrogans and in several bacilli. These satellites undoubtedly play a specific role in the species which have acquired them. We discuss the possibility that they represent binding sites for transcription factors not previously described or that they are involved in the stabilization of the nucleoid through interaction with proteins.

IMPORTANCE We found an enigmatic group of noncoding satellites in 85 bacterial genomes with a constant repeat size but variable sequence. This pattern of DNA organization is unique and had not been previously described in bacteria. These findings strongly suggest that satellite size in some bacteria is under strong selective constraints and thus that satellites are very likely to play a fundamental role. We also provide a list and properties of all satellites in 12,233 genomes, which may be used for further genomic analysis.


2020 ◽  
Author(s):  
Mathys Grapotte ◽  
Manu Saraswat ◽  
Chloé Bessière ◽  
Christophe Menichelli ◽  
Jordan A. Ramilowski ◽  
...  

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probed these unassigned TSSs and showed that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we developed Cap Trap RNA-seq, a technology which combines cap trapping and long-read MinION sequencing. We trained sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveiled the importance of STR surrounding sequences not only to distinguish STR classes, as defined by the repeated DNA motif, from one another, but also to predict their transcription. Excitingly, our models predicted that genetic variants linked to human diseases affect STR-associated transcription and correspond precisely to the key positions identified by our models to predict transcription. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.


2020 ◽  
Vol 22 (8) ◽  
pp. 3413-3428
Author(s):  
Alessandra G. Melo ◽  
Geneviève M. Rousseau ◽  
Denise M. Tremblay ◽  
Simon J. Labrie ◽  
Sylvain Moineau

2019 ◽  
Author(s):  
Chloé Bessière ◽  
Manu Saraswat ◽  
Mathys Grapotte ◽  
Christophe Menichelli ◽  
Jordan A. Ramilowski ◽  
...  

Abstract. Background: Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Results: Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at short tandem repeats (STRs) corresponding to homopolymers of thymidines (T). Additional analyses confirm that these CAGE peaks are truly associated with transcriptionally active chromatin marks. Furthermore, we train a sequence-based deep learning model able to predict CAGE signal at T STRs with high accuracy (~81%). Extracting features learned by this model reveals that transcription at T STRs is mostly directed by STR length but also by instructions lying in the downstream sequence. Excitingly, our model also predicts that genetic variants linked to human diseases affect this STR-associated transcription. Conclusions: Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism. We also provide a new metric that can be considered in future studies of STR-related complex traits.


2015 ◽  
Vol 123 (1) ◽  
pp. 183-192 ◽  
Author(s):  
Shu Wei ◽  
Yu-Zhen Xi ◽  
Da-Peng Song ◽  
Horace Wei ◽  
Margaret Y. Gruber ◽  
...  

2014 ◽  
Vol 38 (1) ◽  
pp. 119-141 ◽  
Author(s):  
Kai Zhou ◽  
Abram Aertsen ◽  
Chris W. Michiels

2013 ◽  
Vol 12 (6) ◽  
pp. 794-803 ◽  
Author(s):  
Fredj Tekaia ◽  
Bernard Dujon ◽  
Guy-Franck Richard

ABSTRACT Megasatellites are large DNA tandem repeats, originally described in Candida glabrata, in protein-coding genes. Most of the genes in which megasatellites are found are of unknown function. In this work, we extended the search for megasatellites to 20 additional completely sequenced fungal genomes and extracted 216 megasatellites in 203 out of 142,121 genes, corresponding to the most exhaustive description of such genetic elements available today. We show that half of the megasatellites detected encode threonine-rich peptides predicted to be intrinsically disordered, suggesting that they may interact with several partners or serve as flexible linkers. Megasatellite motifs were clustered into several families. Their distribution in fungal genes shows that different motifs are found in orthologous genes and similar motifs are found in unrelated genes, suggesting that megasatellite formation or spreading does not necessarily track the evolution of their host genes. Altogether, these results suggest that megasatellites are created and lost during evolution of fungal genomes, probably sharing similar functions, although their primary sequences are not necessarily conserved.

