dispersed repeats
Recently Published Documents



2021 ◽  
Vol 119 (1) ◽  
pp. e2113075119
Author(s):  
Baoxing Song ◽  
Santiago Marco-Sola ◽  
Miquel Moreto ◽  
Lynn Johnson ◽  
Edward S. Buckler ◽  
...  

Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication–informed collinear anchor identification between genomes and performs base pair–resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor–binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation.
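
The two-piece affine gap cost mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the usual formulation, in which a gap of a given length is charged the cheaper of two affine penalties. The function names and every parameter value are hypothetical placeholders chosen for illustration, not AnchorWave's settings.

```python
def single_affine_gap_cost(length, gap_open=4, gap_extend=2):
    """Classic affine penalty: one opening cost plus a per-base extension cost."""
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


def two_piece_gap_cost(length, open1=4, ext1=2, open2=80, ext2=1):
    """Two-piece affine penalty: the cheaper of two affine lines.

    All four parameter values are illustrative assumptions.
    """
    if length <= 0:
        return 0
    short_piece = open1 + ext1 * length  # cheap to open, costly to extend
    long_piece = open2 + ext2 * length   # costly to open, cheap to extend
    return min(short_piece, long_piece)


if __name__ == "__main__":
    for gap_length in (10, 76, 5000):
        print(gap_length,
              single_affine_gap_cost(gap_length),
              two_piece_gap_cost(gap_length))
```

With these placeholder values the two lines cross at a gap length of 76 bases. Shorter gaps are scored by the first piece. Longer gaps, such as a multikilobase TE insertion, are scored by the second piece and so cost roughly half as much as under the single affine model.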


2021 ◽  
Author(s):  
Baoxing Song ◽  
Santiago Marco-Sola ◽  
Miquel Moreto ◽  
Lynn Johnson ◽  
Edward S. Buckler ◽  
...  

Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication or polyploidy levels. Here we introduce AnchorWave, which performs whole-genome duplication-informed collinear anchor identification between genomes and performs base-pair-resolution global alignment of collinear blocks using the wavefront algorithm and a 2-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multi-kilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed almost zero power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor binding sites (TFBSs) at a rate 1.05- to 74.85-fold higher than other tools, with significantly fewer false-positive alignments. AnchorWave shows obvious improvement when applied to genomes with dispersed repeats, active transposable elements, high sequence diversity, and whole-genome duplication variation.


Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 473
Author(s):  
Eugene V. Korotkov ◽  
Anastasiya M. Kamionskya ◽  
Maria A. Korotkova

Currently, there is a lack of bioinformatics approaches to identify highly divergent tandem repeats (TRs) in eukaryotic genomes. Here, we developed a new mathematical method to search for TRs, which uses a novel algorithm for constructing multiple alignments based on the generation of random position weight matrices (RPWMs), and applied it to detect TRs 2 to 50 nucleotides long in the rice genome. The RPWM method could find highly divergent TRs in the presence of insertions or deletions. Comparison of the RPWM algorithm with other methods of TR identification showed that RPWM could detect TRs in which the average number of base substitutions per nucleotide (x) was between 1.5 and 3.2, whereas the T-REKS and TRF methods could not detect divergent TRs with x > 1.5. Applied to the search for TRs in the rice genome, the RPWM method revealed that TRs occupied 5% of the genome and that most of them were 2 or 3 bases long. Using RPWM, we also revealed a correlation of TRs with dispersed repeats and transposons, suggesting that some transposons originated from TRs. Thus, the novel RPWM algorithm is an effective tool to search for highly divergent TRs in genomes.
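
For readers unfamiliar with position weight matrices, the following Python sketch shows the general idea of building a log-odds matrix from aligned repeat copies and scoring a candidate window. It is not the RPWM algorithm from the abstract, which generates random matrices and handles insertions and deletions. The function names, pseudocount, and uniform background frequency are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(aligned_copies, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length repeat copies.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(aligned_copies[0])
    pwm = []
    for i in range(length):
        column = [copy[i] for copy in aligned_copies]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores of a window as long as the matrix."""
    return sum(column[base] for column, base in zip(pwm, window))


if __name__ == "__main__":
    copies = ["ACG", "ACG", "ATG", "ACG"]  # toy copies of a 3-base repeat unit
    pwm = build_pwm(copies)
    print(score_window(pwm, "ACG"), score_window(pwm, "TTT"))
```

A window resembling the repeat unit scores higher than an unrelated one, so a matrix can still recognise copies that have accumulated substitutions where exact string matching fails.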


2021 ◽  
Vol 11 ◽  
Author(s):  
Xiaodong Xu ◽  
Dong Wang

The chloroplast genome (plastome) of angiosperms (particularly photosynthetic members) is generally highly conserved, although structural rearrangements have been reported in a few lineages. In this study, we revealed Corydalis to be another unusual lineage with extensive large-scale plastome rearrangements. In the four newly sequenced Corydalis plastomes that represent all three subgenera of Corydalis, we detected (1) two independent relocations of the same five genes (trnV-UAC-rbcL) from the typically posterior part of the large single-copy (LSC) region to the front, downstream of either the atpH gene in Corydalis saxicola or the trnK-UUU gene in both Corydalis davidii and Corydalis hsiaowutaishanensis; (2) relocation of the rps16 gene from the LSC region to the inverted repeat (IR) region in Corydalis adunca; (3) uniform inversion of an 11–14 kb segment (ndhB-trnR-ACG) in the IR region of all four Corydalis species (the same below); (4) expansions (>10 kb) of the IR into the small single-copy (SSC) region and corresponding contractions of the SSC region; and (5) extensive pseudogenization or loss of 13 genes (accD, clpP, and 11 ndh genes). In addition, we found that the four Corydalis plastomes exhibited elevated GC content in both genic and intergenic regions and a high number of dispersed repeats. Phylogenomic analyses generated a well-supported topology that was consistent with the results of previous studies based on a few DNA markers but contradicted the morphological character-based taxonomy to some extent. This study provides insights into the evolution of plastomes throughout the three Corydalis subgenera and will be of value for further study of the taxonomy, phylogeny, and evolution of Corydalis.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Shanshan Liu ◽  
Zhen Wang ◽  
Yingjuan Su ◽  
Ting Wang

Background: Comparative chloroplast genomics could shed light on the major evolutionary events that established plastomic diversity among closely related species. The Polypodiaceae family is one of the most species-rich and underexplored groups of extant ferns. It is generally recognized that the plastomes of Polypodiaceae are highly notable in terms of their organizational stability. Hence, no research has yet been conducted on genomic structural variation in the Polypodiaceae.
Results: The complete plastome sequences of Neolepisorus fortunei, Neolepisorus ovatus, and Phymatosorus cuspidatus were determined based on next-generation sequencing. Together with published plastomes, a comparative analysis of the fine structure of Polypodiaceae plastomes was carried out. The results indicated that the plastomes of Polypodiaceae are not as conservative as previously assumed. The size of the plastomes varies greatly in the Polypodiaceae, and the large insertion fragments present in the genome could be the main factor affecting the genome length. The plastome of Selliguea yakushimensis exhibits prominent features including not only a large-scale IR expansion exceeding several kb but also a unique inversion. Furthermore, gene contents, SSRs, dispersed repeats, and mutational hotspot regions were identified in the plastomes of the Polypodiaceae. Although dispersed repeats are not abundant in the plastomes of Polypodiaceae, we found that the large insertions that occur in different species are mobile and are always adjacent to repeated hotspot regions.
Conclusions: Our results reveal that the plastomes of Polypodiaceae are dynamic molecules, rather than constituting static genomes as previously thought. The dispersed repeats flanking insertion sequences contribute to the repair mechanism induced by double-strand breaks and are probably a major driver of structural evolution in the plastomes of Polypodiaceae.


Genes ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 1014 ◽  
Author(s):  
Ana Paço ◽  
Renata Freitas ◽  
Ana Vieira-da-Silva

Eukaryotic genomes are rich in repetitive DNA sequences grouped in two classes regarding their genomic organization: tandem repeats and dispersed repeats. In tandem repeats, copies of a short DNA sequence are positioned one after another within the genome, while in dispersed repeats, these copies are randomly distributed. In this review we provide evidence that both tandem and dispersed repeats can have a similar organization, which leads us to suggest an update to their classification based on the sequence features, concretely regarding the presence or absence of retrotransposons/transposon specific domains. In addition, we analyze several studies that show that a repetitive element can be remodeled into repetitive non-coding or coding sequences, suggesting (1) an evolutionary relationship among DNA sequences, and (2) that the evolution of the genomes involved frequent repetitive sequence reshuffling, a process that we have designated as a “DNA remodeling mechanism”. The alternative classification of the repetitive DNA sequences here proposed will provide a novel theoretical framework that recognizes the importance of DNA remodeling for the evolution and plasticity of eukaryotic genomes.
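
The organizational distinction drawn in this abstract, between copies placed one after another and copies scattered through the genome, can be shown with a toy Python sketch. It uses exact, non-overlapping string matching on a made-up sequence, which is a deliberate simplification because real repeat copies diverge. The function name and example sequence are hypothetical.

```python
def classify_repeat_copies(sequence, motif):
    """Split exact motif copies into tandem arrays and dispersed single copies.

    Copies are found by a left-to-right, non-overlapping exact search
    (a simplifying assumption). Copies that directly follow one another
    form a tandem array; isolated copies are reported as dispersed.
    """
    k = len(motif)
    starts = []
    pos = sequence.find(motif)
    while pos != -1:
        starts.append(pos)
        pos = sequence.find(motif, pos + k)  # skip past the copy just found

    runs = []
    for start in starts:
        if runs and start == runs[-1][-1] + k:
            runs[-1].append(start)  # head-to-tail with the previous copy
        else:
            runs.append([start])    # begins a new run

    tandem_arrays = [run for run in runs if len(run) > 1]
    dispersed_copies = [run[0] for run in runs if len(run) == 1]
    return tandem_arrays, dispersed_copies


if __name__ == "__main__":
    toy = "ACGACGACG" + "TTTT" + "ACG" + "TTGG" + "ACG"
    print(classify_repeat_copies(toy, "ACG"))
```

In the toy sequence, the three head-to-tail copies at the start form one tandem array, while the two isolated copies further along count as dispersed.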


2019 ◽  
Vol 47 (W1) ◽  
pp. W65-W73 ◽  
Author(s):  
Linchun Shi ◽  
Haimei Chen ◽  
Mei Jiang ◽  
Liqiang Wang ◽  
Xi Wu ◽  
...  

We previously developed a web server, CPGAVAS, for annotation, visualization and GenBank submission of plastome sequences. Here, we upgrade the server into CPGAVAS2 to address the following challenges: (i) inaccurate annotation in the reference sequence likely causing the propagation of errors; (ii) difficulty in the annotation of small exons of genes petB, petD and rps16 and trans-splicing gene rps12; (iii) lack of annotation for other genome features and their visualization, such as repeat elements; and (iv) lack of modules for diversity analysis of plastomes. In particular, CPGAVAS2 provides two reference datasets for plastome annotation. The first dataset contains 43 plastomes whose annotations have been validated or corrected by RNA-seq data. The second one contains 2544 plastomes curated with sequence alignment. Two new algorithms are also implemented to correctly annotate small exons and trans-splicing genes. Tandem and dispersed repeats are identified, and the results are displayed on a circular map together with the annotated genes. DNA-seq and RNA-seq data can be uploaded for identification of single-nucleotide polymorphism sites and RNA-editing sites. The results of two case studies show that CPGAVAS2 annotates better than several other servers. CPGAVAS2 will likely become an indispensable tool for plastome research and can be accessed from http://www.herbalgenomics.org/cpgavas2.


2018 ◽  
Author(s):  
Hong-Rui Zhang ◽  
Xian-Chun Zhang ◽  
Qiao-Ping Xiang

Background: It is hypothesized that the highly conserved inverted repeat (IR) structure of land plant plastid genomes (plastomes) is beneficial for stabilizing plastome organization, whereas the mechanism behind the occurrence and stable maintenance of the newly reported direct repeat (DR) structure awaits further exploration. Here we describe the DR structure of the plastome of Selaginella vardei (Selaginellaceae, Lycophyta) and try to elucidate the mechanism of DR occurrence and stability maintenance.
Results: The plastome of S. vardei is 121,254 bp in length and encodes 76 different genes, of which 62 encode proteins, 10 encode tRNAs, and four encode rRNAs. Unexpectedly, the two identical rRNA gene regions (13,893 bp) are arranged as DRs, and a ca. 50-kb trnN-trnF inversion spanning one DR copy exists in S. vardei, compared with the typical IR organization of Isoetes flaccida (Isoetaceae, Lycophyta). We find that short dispersed repeats (SDRs) are extremely rare in the plastome of S. vardei, a finding confirmed in its closely related species S. indica. The occurrence of DR in Selaginellaceae is estimated to date to the late Triassic (ca. 215 Ma) based on the phylogenetic framework of land plants.
Conclusions: We propose that the unconventional DR structure, co-occurring with extremely few SDRs, plays a key role in maintaining the stability of the plastome and reflects a relic of the environmental upheaval during an extinction event. We suggest that the ca. 50-kb inversion resulted in the DR structure, and recombination between DR regions is confirmed to generate multipartite subgenomes and diverse multimers, which sheds light on the diverse structures in plastomes of land plants.

