RepeatFiller newly identifies megabases of aligning repetitive sequences and improves annotations of conserved non-exonic elements

2019 ◽  
Author(s):  
Ekaterina Osipova ◽  
Nikolai Hecker ◽  
Michael Hiller

Abstract. Transposons and other repetitive sequences make up a large part of complex genomes. Repetitive sequences can be co-opted into a variety of functions and thus provide a source for evolutionary novelty. However, comprehensively detecting ancestral repeats that align between species is difficult since considering all repeat-overlapping seeds in alignment methods that rely on the seed-and-extend heuristic results in prohibitively high runtimes. Here, we show that ignoring repeat-overlapping alignment seeds when aligning entire genomes misses numerous alignments between repetitive elements. We present a tool, RepeatFiller, that improves genome alignments by incorporating previously undetected local alignments between repetitive sequences. By applying RepeatFiller to genome alignments between human and 20 other representative mammals, we uncover between 22 and 84 megabases of previously undetected alignments that mostly overlap transposable elements. We further show that the increased alignment coverage improves the annotation of conserved non-exonic elements, both by discovering numerous novel transposon-derived elements that evolve under constraint and by removing thousands of elements that are not under constraint in placental mammals. In conclusion, RepeatFiller contributes to comprehensively aligning repetitive genomic regions, which facilitates studying transposon co-option and genome evolution. Source code: https://github.com/hillerlab/GenomeAlignmentTools

GigaScience ◽  
2019 ◽  
Vol 8 (11) ◽  
Author(s):  
Ekaterina Osipova ◽  
Nikolai Hecker ◽  
Michael Hiller

Abstract. Background: Transposons and other repetitive sequences make up a large part of complex genomes. Repetitive sequences can be co-opted into a variety of functions and thus provide a source for evolutionary novelty. However, comprehensively detecting ancestral repeats that align between species is difficult because considering all repeat-overlapping seeds in alignment methods that rely on the seed-and-extend heuristic results in prohibitively high runtimes. Results: Here, we show that ignoring repeat-overlapping alignment seeds when aligning entire genomes misses numerous alignments between repetitive elements. We present a tool, RepeatFiller, that improves genome alignments by incorporating previously undetected local alignments between repetitive sequences. By applying RepeatFiller to genome alignments between human and 20 other representative mammals, we uncover between 22 and 84 Mb of previously undetected alignments that mostly overlap transposable elements. We further show that the increased alignment coverage improves the annotation of conserved non-exonic elements, both by discovering numerous novel transposon-derived elements that evolve under constraint and by removing thousands of elements that are not under constraint in placental mammals. Conclusions: RepeatFiller contributes to comprehensively aligning repetitive genomic regions, which facilitates studying transposon co-option and genome evolution. Source code: https://github.com/hillerlab/GenomeAlignmentTools
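The effect described in this abstract, that skipping repeat-overlapping seeds can hide alignable repeats, can be illustrated with a toy example. The sketch below is not RepeatFiller's implementation: it assumes exact k-mer matches as seeds and lowercase letters as soft-masked repeats, and the function name `shared_seeds` and both sequences are hypothetical.

```python
def shared_seeds(ref, qry, k, skip_masked):
    """Return (ref_pos, qry_pos) pairs where ref and qry share an exact k-mer.

    Assumption: lowercase letters mark soft-masked repeats. With
    skip_masked=True, any k-mer containing a lowercase base is not
    used as a seed on either sequence.
    """
    # Index every usable k-mer of the reference by its uppercase form.
    index = {}
    for i in range(len(ref) - k + 1):
        kmer = ref[i:i + k]
        if skip_masked and not kmer.isupper():
            continue
        index.setdefault(kmer.upper(), []).append(i)

    # Look up every usable k-mer of the query in that index.
    seeds = []
    for j in range(len(qry) - k + 1):
        kmer = qry[j:j + k]
        if skip_masked and not kmer.isupper():
            continue
        for i in index.get(kmer.upper(), []):
            seeds.append((i, j))
    return seeds


# Hypothetical sequences: unrelated unique flanks around a shared, masked repeat.
ref = "ACGTTGCAggattacaggattacaTTGACCAT"
qry = "CCCCCCCCggattacaggattacaGGGGGGGG"

seeds_masked = shared_seeds(ref, qry, k=8, skip_masked=True)
seeds_all = shared_seeds(ref, qry, k=8, skip_masked=False)
print(len(seeds_masked), len(seeds_all))
```

In this toy case the only similarity between the two sequences lies inside the masked repeat, so the masked run finds no seed and a seed-and-extend aligner would never attempt an alignment there. Allowing masked seeds recovers the match, at the cost of many more seeds to extend.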


2021 ◽  
Vol 1 (2) ◽  
pp. 1-9
Author(s):  
Ayan Mukherjee

The evolution of vertebrate species took shape over millions of years, during which sex played an important role in the maintenance of lineages, genetic diversification and reproductive isolation. In the course of sexual evolution, sex determination strategies have been proposed to shift from temperature-dependent sex determination to genetic sex determination, as exemplified by the XY system in mammals and the ZW system in birds. Contrary to this established view, different lineages have been shown to have overlapping sex-determining strategies. While searching for possible reasons for these phenomena, researchers observed that the gene content of sex chromosomes is highly variable in terms of location and prevalence, which suggests an autosomal origin of sex chromosomes. Although the exact mechanisms of gene transfer, and thereby the origin of sex chromosomes, are yet to be unveiled, chromosomal rearrangement and introgression have been hypothesized as the possible effectors. Transposable elements (TEs) have long been considered ‘selfish’ or ‘junk’ DNA, as most non-coding genomic regions are composed of TEs, whose presence in a species' genome seemed to make no sense. Recently, however, TEs have come to be regarded as nature’s tool for biological innovation, creating new regulatory elements and new coding sequences and causing genetic disruption and chromosomal remodelling. It has therefore been postulated that TEs could facilitate rearrangement and introgression, ultimately leading to the evolution of sex chromosomes and sex-determining genes through positive selection. The prevalence of highly repetitive sequences in sex chromosomes, particularly in the Y chromosome, makes them a hotbed for TE-mediated rearrangement and introgression. In this review, I discuss whether it makes sense to focus on the role of TEs in the sexual evolution of animals.


2001 ◽  
Vol 11 (4) ◽  
pp. 585-594
Author(s):  
Gernot Glöckner ◽  
Karol Szafranski ◽  
Thomas Winckler ◽  
Theodor Dingermann ◽  
Michael A. Quail ◽  
...  

In the course of determining the sequence of the Dictyostelium discoideum genome we have characterized in detail the quantity and nature of interspersed repetitive elements present in this species. Several of the most abundant small complex repeats and transposons (DIRS-1; TRE3-A,B; TRE5-A; skipper; Tdd-4; H3R) have been described previously. In our analysis we have identified additional elements. Thus, we can now present a complete list of complex repetitive elements in D. discoideum. All elements add up to 10% of the genome. Some of the newly described elements belong to established classes (TRE3-C,D; TRE5-B,C; DGLT-A,P; Tdd-5). However, we have also defined two new classes of DNA transposable elements (DDT and thug) that have not been described thus far. Based on the nucleotide amount, we calculated the minimum copy number in each family. These range from <10 to >200 copies. Unique sequences adjacent to the element ends and truncation points in elements gave a measure for the fragmentation of the elements. Furthermore, we describe the diversity of single elements with regard to polymorphisms and conserved structures. All elements show insertion preference into loci in which other elements of the same family reside. The analysis of the complex repeats is a valuable data resource for the ongoing assembly of whole D. discoideum chromosomes. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AF135841, AF298201, AF298202, AF298203, AF298204, AF298205, AF298206, AF298207, AF298208, AF298209, AF298210 and AF298624.]


Genetics ◽  
2002 ◽  
Vol 161 (4) ◽  
pp. 1661-1672 ◽  
Author(s):  
Andrea Pedrosa ◽  
Niels Sandal ◽  
Jens Stougaard ◽  
Dieter Schweizer ◽  
Andreas Bachmair

Abstract. Lotus japonicus is a model plant for the legume family. To facilitate map-based cloning approaches and genome analysis, we performed an extensive characterization of the chromosome complement of the species. A detailed karyotype of L. japonicus Gifu was built, and plasmid and BAC clones, corresponding to genetically mapped markers (see the accompanying article by Sandal et al. 2002, this issue), were used for FISH to correlate genetic and chromosomal maps. Hybridization of DNA clones from 32 different genomic regions enabled the assignment of linkage groups to chromosomes, the comparison between genetic and physical distances throughout the genome, and the partial characterization of different repetitive sequences, including telomeric and centromeric repeats. Additional analysis of L. filicaulis and its F1 hybrid with L. japonicus demonstrated the occurrence of inversions between these closely related species, suggesting that these chromosome rearrangements are early events in speciation of this group.


2021 ◽  
Vol 22 (1) ◽  
pp. 468
Author(s):  
Klára Konečná ◽  
Pavla Polanská Sováková ◽  
Karin Anteková ◽  
Jiří Fajkus ◽  
Miloslava Fojtová

Involvement of epigenetic mechanisms in the regulation of telomeres and transposable elements (TEs), genomic regions with the protective and potentially detrimental function, respectively, has been frequently studied. Here, we analyzed telomere lengths in Arabidopsis thaliana plants of Columbia, Landsberg erecta and Wassilevskija ecotypes exposed repeatedly to the hypomethylation drug zebularine during germination. Shorter telomeres were detected in plants growing from seedlings germinated in the presence of zebularine with a progression in telomeric phenotype across generations, relatively high inter-individual variability, and diverse responses among ecotypes. Interestingly, the extent of telomere shortening in zebularine Columbia and Wassilevskija plants corresponded to the transcriptional activation of TEs, suggesting a correlated response of these genomic elements to the zebularine treatment. Changes in lengths of telomeres and levels of TE transcripts in leaves were not always correlated with a hypomethylation of cytosines located in these regions, indicating a cytosine methylation-independent level of their regulation. These observations, including differences among ecotypes together with distinct dynamics of the reversal of the disruption of telomere homeostasis and TEs transcriptional activation, reflect a complex involvement of epigenetic processes in the regulation of crucial genomic regions. Our results further demonstrate the ability of plant cells to cope with these changes without a critical loss of the genome stability.


2021 ◽  
pp. gr.275658.121
Author(s):  
Yuyun Zhang ◽  
Zijuan Li ◽  
Yu'e Zhang ◽  
Kande Lin ◽  
Yuan Peng ◽  
...  

More than 80% of the wheat genome consists of transposable elements (TEs), which act as one major driver of wheat genome evolution. However, their contributions to the regulatory evolution of wheat adaptations remain largely unclear. Here, we created genome-binding maps for 53 transcription factors (TFs) underlying environmental responses by leveraging DAP-seq in Triticum urartu, together with epigenomic profiles. Most TF-binding sites (TFBS) located distally from genes are embedded in TEs, whose functional relevance is supported by purifying selection and active epigenomic features. About 24% of the non-TE TFBS share significantly high sequence similarity with TE-embedded TFBS. These non-TE TFBS have almost no homologous sequences in non-Triticeae species and are potentially derived from Triticeae-specific TEs. The expansion of TE-derived TFBS is linked to wheat-specific gene responses, suggesting that TEs are an important driving force for regulatory innovations. Altogether, TEs have been significantly and continuously shaping regulatory networks related to wheat genome evolution and adaptation.


2020 ◽  
Author(s):  
Xun Zhu ◽  
Ti-Cheng Chang ◽  
Richard Webby ◽  
Gang Wu

Abstract. idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV makes calls equivalent to those annotated by Nextstrain.org on all three common clade systems, working directly from user-uploaded FastQ files. Web and equivalent command-line interfaces are available. It can be deployed in any Linux environment, including personal computers, HPC and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.


2018 ◽  
Author(s):  
Mehran Karimzadeh ◽  
Michael M. Hoffman

Abstract. Motivation: Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is the most common method for identifying binding sites, but performing it on patient samples is hampered by the amount of available biological material and the cost of the experiment. Existing methods for computational prediction of regulatory elements primarily predict binding in genomic regions with sequence similarity to known transcription factor sequence preferences. This has limited efficacy since most binding sites do not resemble known transcription factor sequence motifs, and many transcription factors are not even sequence-specific. Results: We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new cell types using an artificial neural network that integrates ChIP-seq results from other cell types and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and transcription factor binding at specific genomic regions. This approach outperforms methods that predict TF binding solely based on sequence preference, predicting binding for 36 transcription factors (Matthews correlation coefficient > 0.3). Availability: The datasets we used for training and validation are available at https://virchip.hoffmanlab.org. We have deposited in Zenodo the current version of our software (http://doi.org/10.5281/zenodo.1066928), datasets (http://doi.org/10.5281/zenodo.823297), predictions for 36 transcription factors on Roadmap Epigenomics cell types (http://doi.org/10.5281/zenodo.1455759), and predictions in Cistrome as well as ENCODE-DREAM in vivo TF Binding Site Prediction Challenge (http://doi.org/10.5281/zenodo.1209308).
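For readers unfamiliar with the metric quoted in this abstract, the Matthews correlation coefficient (MCC) can be computed from the four confusion-matrix counts of a binary predictor. The sketch below uses the standard definition. The function name and the example counts are hypothetical and are not taken from the Virtual ChIP-seq paper or its software.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).
    It ranges from -1 (total disagreement) to +1 (perfect prediction);
    0 corresponds to chance-level prediction.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for genomic bins: bound sites are rare, unbound bins dominate.
mcc = matthews_corrcoef(tp=30, fp=20, tn=900, fn=50)
print(round(mcc, 3))
```

Because binding sites are typically a small fraction of the genome, accuracy alone would be dominated by the unbound class. MCC uses all four counts, which is one reason it is a common choice for imbalanced problems like this one.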


2021 ◽  
Author(s):  
Matias Rodriguez ◽  
Wojciech Makałowski

Abstract. Transposable elements (TEs) are major genomic components in most eukaryotic genomes and play an important role in genome evolution. However, despite their relevance, identifying TEs is not an easy task, and a number of tools have been developed to tackle this problem. To better understand how they perform, we tested several widely used tools for de novo TE detection and compared their performance on both simulated data and well-curated genomic sequences. The results will be helpful for identifying common issues associated with TE annotation and for evaluating how comparable the results obtained with different tools are.


Author(s):  
Frédéric Lemoine ◽  
Luc Blassel ◽  
Jakub Voznica ◽  
Olivier Gascuel

Abstract. Motivation: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1,000, and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. Results: hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2,500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1,000 genomes requires less than 20 min on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). Availability: https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/[email protected], [email protected] Supplementary information: Supplementary information is available at Bioinformatics online.

