Eukaryotic Genomes Show Strong Evolutionary Conservation of k-mer Composition and Correlation Contributions between Introns and Intergenic Regions

Genes, 2021, Vol 12 (10), pp. 1571
Author(s): Aaron Sievers, Liane Sauer, Michael Hausmann, Georg Hildenbrand

Several strongly conserved DNA sequence patterns in and between introns and intergenic regions (IIRs) consisting of short tandem repeats (STRs) with repeat lengths <3 bp have already been described in the kingdom of Animalia. In this work, we expanded the search and analysis of conserved DNA sequence patterns to a wider range of eukaryotic genomes. Our aims were to confirm the conservation of these patterns, to support the hypothesis of functional constraints on them, and/or to identify previously unknown patterns. We pairwise compared genomic DNA sequences of genes, exons, CDS, introns and intergenic regions of 34 Embryophyta (land plants), 30 Protista and 29 Fungi using established k-mer-based (alignment-free) comparison methods. Additionally, the results were compared with values derived for Animalia in former studies. We confirmed strong correlations between the sequence structures of IIRs spanning the entire domain of Eukaryotes. We found that the high correlations within introns, intergenic regions and between the two are a result of conserved abundances of STRs with repeat units ≤2 bp (e.g., (AT)n). For some sequence patterns and their inverse complementary sequences, we found a violation of equal distribution on complementary DNA strands in a subset of genomes. Looking at mismatches within the identified STR patterns, we found specific preferences for certain nucleotides that were stable over all four phylogenetic kingdoms. We conclude that all of these conserved patterns between IIRs indicate a shared function of these sequence structures related to STRs.
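
The strand-symmetry check mentioned in this abstract can be illustrated with a minimal sketch. It counts a DNA word and its inverse (reverse) complement on a single strand and reports a normalised difference. The function names and the asymmetry score are assumptions made for illustration; the abstract does not state which statistic the authors used.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the inverse complementary sequence of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count overlapping k-mers that consist only of A, C, G and T."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def strand_asymmetry(seq, word):
    """Hypothetical score in [-1, 1]: 0 means the word and its reverse
    complement are equally frequent on the given strand."""
    counts = kmer_counts(seq, len(word))
    forward = counts[word]
    reverse = counts[reverse_complement(word)]
    total = forward + reverse
    return 0.0 if total == 0 else (forward - reverse) / total


if __name__ == "__main__":
    # Toy sequence, not real genomic data: AAC occurs twice, GTT once.
    print(strand_asymmetry("AACAACGTT", "AAC"))
```

A score near zero across a whole genome would correspond to equal distribution on both strands; a palindromic word such as AT always scores zero because it is its own reverse complement.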

Genes, 2018, Vol 9 (10), pp. 482
Author(s): Aaron Sievers, Frederik Wenz, Michael Hausmann, Georg Hildenbrand

In this study, we pairwise-compared multiple genome regions, including genes, exons, coding DNA sequences (CDS), introns, and intergenic regions of 39 Animalia genomes, including Deuterostomia (27 species) and Protostomia (12 species), by applying established k-mer-based (alignment-free) comparison methods. We found strong correlations between the sequence structure of introns and intergenic regions, both within individual organisms and across wider phylogenetic ranges, indicating the conservation of certain structures over the full range of analyzed organisms. We analyzed these sequence structures by quantifying the contribution of different sets of DNA words to the average correlation value by decomposing the correlation coefficients with respect to these word sets. We found that the conserved structures within introns, intergenic regions, and between the two were mainly a result of conserved tandem repeats with repeat units ≤ 2 bp (e.g., (AT)n), while other conserved sequence structures, such as those found between exons and CDS, were dominated by tandem repeats with repeat unit sizes of 3 bp in length and more complex DNA word patterns. We conclude that the conservation between intron and intergenic regions indicates a shared function of these sequence structures. Also, the differences from conserved structures of known origin, especially the conservation between exons and CDS resulting from DNA codons, indicate that k-mer composition-based functional properties of introns and intergenic regions may differ from those of exons and CDS.
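
The decomposition of a correlation coefficient with respect to word sets can be sketched as follows. The sketch assumes a plain Pearson correlation between two k-mer frequency vectors, which splits exactly into one additive term per DNA word; the normalisation used in the study may differ, and all function names here are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k words over A, C, G, T."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skips words containing N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return {word: c / total for word, c in counts.items()}


def correlation_contributions(freq_a, freq_b):
    """Per-word terms whose sum is the Pearson correlation of the two
    frequency vectors (assumed decomposition, for illustration only)."""
    words = sorted(freq_a)
    n = len(words)
    mean_a = sum(freq_a[w] for w in words) / n
    mean_b = sum(freq_b[w] for w in words) / n
    dev_a = {w: freq_a[w] - mean_a for w in words}
    dev_b = {w: freq_b[w] - mean_b for w in words}
    norm = sqrt(sum(d * d for d in dev_a.values())
                * sum(d * d for d in dev_b.values()))
    if norm == 0:
        raise ValueError("correlation undefined for a constant vector")
    return {w: dev_a[w] * dev_b[w] / norm for w in words}


def set_contribution(contributions, word_set):
    """Share of the correlation coefficient carried by one word set."""
    return sum(v for w, v in contributions.items() if w in word_set)


if __name__ == "__main__":
    # Toy sequences standing in for an intron and an intergenic region.
    intron = "ATATATATATGCGTACATAT"
    intergenic = "ATATATATCCGATATATATA"
    contrib = correlation_contributions(kmer_frequencies(intron, 2),
                                        kmer_frequencies(intergenic, 2))
    print("r =", sum(contrib.values()))
    print("(AT)n words:", set_contribution(contrib, {"AT", "TA"}))
```

Because the terms are additive, the contribution of any word set (for example the words that make up (AT)n repeats) and the contribution of its complement set always sum to the full coefficient.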


Genes, 2019, Vol 10 (12), pp. 1014
Author(s): Ana Paço, Renata Freitas, Ana Vieira-da-Silva

Eukaryotic genomes are rich in repetitive DNA sequences grouped in two classes regarding their genomic organization: tandem repeats and dispersed repeats. In tandem repeats, copies of a short DNA sequence are positioned one after another within the genome, while in dispersed repeats, these copies are randomly distributed. In this review we provide evidence that both tandem and dispersed repeats can have a similar organization, which leads us to suggest an update to their classification based on the sequence features, concretely regarding the presence or absence of retrotransposons/transposon specific domains. In addition, we analyze several studies that show that a repetitive element can be remodeled into repetitive non-coding or coding sequences, suggesting (1) an evolutionary relationship among DNA sequences, and (2) that the evolution of the genomes involved frequent repetitive sequence reshuffling, a process that we have designated as a “DNA remodeling mechanism”. The alternative classification of the repetitive DNA sequences here proposed will provide a novel theoretical framework that recognizes the importance of DNA remodeling for the evolution and plasticity of eukaryotic genomes.


2021, Vol 22 (1)
Author(s): Hayam Alamro, Mai Alzamel, Costas S. Iliopoulos, Solon P. Pissis, Steven Watts

Abstract. Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. Results: We present IUPACpal, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats. Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.
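
The definition of an inverted repeat given above (a sequence followed downstream by its reverse complement, possibly with a central gap) can be made concrete with a naive scan. This is only a sketch and not the IUPACpal algorithm. It assumes that two IUPAC symbols pair when at least one base encoded by the first is complementary to a base encoded by the second, and it uses a fixed arm length; the function names and parameters are hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one encodes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def can_pair(x, y):
    """True if some base encoded by x is complementary to one encoded by y
    (assumed matching rule for ambiguous symbols)."""
    return any(COMPLEMENT[base] in IUPAC[y] for base in IUPAC[x])


def inverted_repeats(seq, arm, max_gap=0, max_mismatches=0):
    """Brute-force list of (start, gap) pairs where seq[start:start+arm]
    is followed, after `gap` bases, by its reverse complement."""
    hits = []
    n = len(seq)
    for start in range(n - 2 * arm + 1):
        for gap in range(max_gap + 1):
            right_end = start + 2 * arm + gap  # exclusive end of right arm
            if right_end > n:
                break
            # Compare the left arm outward-in against the right arm.
            mismatches = sum(
                not can_pair(seq[start + i], seq[right_end - 1 - i])
                for i in range(arm)
            )
            if mismatches <= max_mismatches:
                hits.append((start, gap))
    return hits


if __name__ == "__main__":
    print(inverted_repeats("GAATTC", arm=3))              # [(0, 0)]
    print(inverted_repeats("ACGTTCGT", arm=3, max_gap=2))  # [(0, 2)]
```

The scan costs time proportional to sequence length × gap range × arm length, which is acceptable for short examples only; an exact tool for whole genomes needs a more efficient approach.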


Genes, 2019, Vol 10 (7), pp. 542
Author(s): Kim, Song, Ha, Moon, Kim, ...

Variable number tandem repeats (VNTRs) in mitochondrial DNA (mtDNA) of Lentinula edodes are of interest for their role in mtDNA variation and their application as genetic markers. Sequence analysis of three L. edodes mtDNAs revealed the presence of VNTRs of two categories. Type I VNTRs consist of two types of repeat units in a symmetric distribution, whereas Type II VNTRs contain tandemly arrayed repeats of 7- or 17-bp DNA sequences. The number of repeat units was variable depending on the mtDNA of different strains. Using the variations in VNTRs as a mitochondrial marker and the A mating type as a nuclear type marker, we demonstrated that one of the two nuclei in the donor dikaryon preferentially enters into the monokaryotic cytoplasm to establish a new dikaryon which still retains the mitochondria of the monokaryon in the individual mating. Interestingly, we found 6 VNTRs with newly added repeat units from the 22 mates, indicating that elongation of VNTRs occurs during replication of mtDNA. This, together with comparative analysis of the repeating pattern, enables us to propose a mechanistic model that explains the elongation of Type I VNTRs through reciprocal incorporation of basic repeat units, 5’-TCCCTTTAGGG-3’ and its complementary sequence (5’-CCCTAAAGGGA-3’).
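
Two points in this abstract are easy to check computationally: that the two basic repeat units are reverse complements of each other, and how many tandem copies of a unit a given sequence carries. The sketch below does both. The test sequences are invented for illustration and are not L. edodes mtDNA; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def max_tandem_copies(seq, unit):
    """Largest number of back-to-back copies of `unit` found anywhere in
    `seq`. Every start position is tried, so units that overlap
    themselves (e.g. ATA) are handled correctly."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0
    for start in range(len(seq) - len(unit) + 1):
        copies = 0
        pos = start
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


if __name__ == "__main__":
    unit = "TCCCTTTAGGG"
    print(reverse_complement(unit))  # CCCTAAAGGGA
    # Invented sequence with three tandem copies of the unit.
    toy = "GG" + unit * 3 + "TT"
    print(max_tandem_copies(toy, unit))  # 3
```

Comparing `max_tandem_copies` between a parental and a progeny sequence is one simple way to flag a locus where repeat units were added, although it says nothing about the mechanism.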


Author(s): Pierre Murat, Guillaume Guilbaud, Julian E. Sale

Abstract. Background: Short tandem repeats (STRs) contribute significantly to de novo mutagenesis, driving phenotypic diversity and genetic disease. Although highly diverse, their repetitive sequences induce DNA polymerase slippage and stalling, leading to length and sequence variation. However, current studies of DNA synthesis through STRs are restricted to a handful of selected sequences, limiting our broader understanding of their evolutionary behaviour and hampering the characterisation of the determinants of their abundance and stability in eukaryotic genomes. Results: We perform a comprehensive analysis of DNA synthesis at all STR permutations and interrogate the impact of STR sequence and secondary structure on their genomic representation and mutability. To do so, we developed a high-throughput primer extension assay that allows monitoring of the kinetics and fidelity of DNA synthesis through 20,000 sequences comprising all STR permutations in different lengths. By combining these measurements with population-scale genomic data, we show that the response of a model replicative DNA polymerase to variously structured DNA is sufficient to predict the complex genomic behaviour of STRs, including abundance and mutational constraints. We demonstrate that DNA polymerase stalling at DNA structures induces error-prone DNA synthesis, which constrains STR expansion. Conclusions: Our data support a model in which STR length in eukaryotic genomes results from a balance between expansion due to polymerase slippage at repeated DNA sequences and point mutations caused by error-prone DNA synthesis at DNA structures.


Genetics, 2003, Vol 164 (3), pp. 1087-1097
Author(s): F C Hsu, C J Wang, C M Chen, H Y Hu, C C Chen

Abstract Two families of tandem repeats, 180-bp and TR-1, have been found in the knobs of maize. In this study, we isolated 59 clones belonging to the TR-1 family from maize and teosinte. Southern hybridization and sequence analysis revealed that members of this family are composed of three basic sequences, A (67 bp); B (184 bp) or its variants B′ (184 bp), 2/3B (115 bp), 2/3B′ (115 bp); and C (108 bp), which are arranged in various combinations to produce repeat units that are multiples of ∼180 bp. The molecular structure of TR-1 elements suggests that: (1) the B component may evolve from the 180-bp knob repeat as a result of mutations during evolution; (2) B′ may originate from B through lateral amplification accompanied by base-pair changes; (3) C plus A may be a single sequence that is added to B and B′, probably via nonhomologous recombination; and (4) 69 bp at the 3′ end of B or B′, and the entire sequence of C can be removed from the elements by an unknown mechanism. Sequence comparisons showed partial homologies between TR-1 elements and two centromeric sequences (B repeats) of the supernumerary B chromosome. This result, together with the finding of other investigators that the B repeat is also fragmentarily homologous to the 180-bp repeat, suggests that the B repeat is derived from knob repeats in A chromosomes, which subsequently become structurally modified. Fluorescence in situ hybridization localized the B repeat to the B centromere and the 180-bp and TR-1 repeats to the proximal heterochromatin knob on the B chromosome.


2007, Vol 15 (03), pp. 299-312
Author(s): SU-LONG NYEO, JUI-PING YU

The length distributions of simple tandem repeats in the genomes of several organisms are evaluated and found to exhibit long-range correlations in repeats of the A and T nucleotide bases for most eukaryotes. In particular, the length distributions of the mononucleotide A/T repeat units have longer tails than those of the C/G repeat units. Also, the length distributions of the dinucleotide repeat unit CG show a simple, rapid monotonic decrease, while those of repeat units AT, AG and AC have complicated structures at larger repeat lengths, especially for human, mouse and rat chromosomes. These distribution behaviors are due to the CpG deficiency in different genomes with different methylation activities. In particular, methyltransferases in vertebrates appear to methylate specifically the cytosine in CpG dinucleotides, and methylated cytosine is prone to mutate to thymine by spontaneous deamination. The dinucleotide CpG would thus gradually decay into TpG and CpA. In addition, there is a peak in the distributions of repeat unit A at repeat-repeat separation 153 nt for humans and chimpanzees. We show that the long-tail behavior of mononucleotide repeat unit A and the peak at repeat separation 153 nt are due to the interspersed repetitive DNA sequences in humans and chimpanzees.
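
A length distribution of the kind described here can be tabulated with a short sketch: for a given repeat unit, find every maximal run of tandem copies and count how many runs have each copy number. The sketch is intended for mononucleotide units and for dinucleotide units made of two different bases (A, AT, CG and so on); the function names and the tail measure are assumptions for illustration, and the test sequence is invented.

```python
import re
from collections import Counter


def repeat_length_distribution(seq, unit):
    """Map copy number -> number of maximal tandem runs of `unit`.
    Intended for units such as A, C, AT or CG."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    return Counter(len(m.group(0)) // len(unit)
                   for m in pattern.finditer(seq))


def tail_fraction(distribution, min_copies):
    """Fraction of runs with at least `min_copies` copies, a crude
    measure of how heavy the tail of the distribution is."""
    total = sum(distribution.values())
    if total == 0:
        return 0.0
    long_runs = sum(n for copies, n in distribution.items()
                    if copies >= min_copies)
    return long_runs / total


if __name__ == "__main__":
    toy = "AAATTAAAAACGCGATATAT"  # invented sequence, not genomic data
    for unit in ("A", "AT", "CG"):
        print(unit, dict(repeat_length_distribution(toy, unit)))
```

On real chromosomes, comparing `tail_fraction` for A/T runs against C/G runs at increasing thresholds would give a simple view of the longer A/T tails that the abstract reports.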


Genetics, 1988, Vol 120 (1), pp. 267-278
Author(s): K M Lyons, J H Stein, O Smithies

Abstract Southern blot hybridization analysis of genomic DNAs from 44 unrelated individuals revealed extensive insertion/deletion polymorphisms within the BstNI-type loci (PRB1, PRB2, PRB3 and PRB4) of the human proline-rich protein (PRP) multigene family. Ten length variants were cloned, including alleles at each of the four PRB loci, and in every case the region of length difference was localized to the tandemly repetitious third exon. DNA sequences covering the region of length variation were determined for seven of the alleles. The data indicate (1) that the PRB loci can be divided into two subtypes, PRB1 plus PRB2, and PRB3 plus PRB4, and (2) that the length differences result from different numbers of tandem repeats in the third exons. Variant chromosomes were also identified with different numbers of PRP loci resulting from homologous but unequal exchange between the PRB1 and PRB2 loci. The overall data are compatible with the observed length variants having been generated via homologous but unequal intragenic exchange. The results also indicate that these crossover events are sensitive to the amount of homology shared between the interacting DNA strands. Allelic length variants have arisen independently at least 20 times at the PRB loci, but only one has been detected at a PRH locus. Comparison of the detailed structures of the repetitious regions in PRB and PRH loci shows that the repeats in PRB genes are very similar to each other in sequence and in length. The PRH genes contain fewer repeats, which differ considerably in their individual lengths. These differences suggest that the larger number of length variants in PRB genes is related to their greater ease of homologous but unequal pairing compared to PRH genes.


Author(s): Barbara Trask, Susan Allen, Anne Bergmann, Mari Christensen, Anne Fertitta, ...

Using fluorescence in situ hybridization (FISH), the positions of DNA sequences can be discretely marked with a fluorescent spot. The efficiency of marking DNA sequences of the size cloned in cosmids is 90-95%, and the fluorescent spots produced after FISH are ≈0.3 μm in diameter. Sites of two sequences can be distinguished using two-color FISH. Different reporter molecules, such as biotin or digoxigenin, are incorporated into DNA sequence probes by nick translation. These reporter molecules are labeled after hybridization with different fluorochromes, e.g., FITC and Texas Red. The development of dual band pass filters (Chromatechnology) allows these fluorochromes to be photographed simultaneously without registration shift.


2013, Vol 41 (2), pp. 548-553
Author(s): Andrew A. Travers, Georgi Muskhelishvili

How much information is encoded in the DNA sequence of an organism? We argue that the informational, mechanical and topological properties of DNA are interdependent and act together to specify the primary characteristics of genetic organization and chromatin structures. Superhelicity generated in vivo, in part by the action of DNA translocases, can be transmitted to topologically sensitive regions encoded by less stable DNA sequences.

