Reactivation of transposable elements following hybridization in fission yeast

2021 ◽  
pp. gr.276056.121
Author(s):  
Sergio Tusso ◽  
Fang Suo ◽  
Yue Liang ◽  
Li-Lin Du ◽  
Jochen B.W Wolf

Hybridization is thought to reactivate transposable elements (TEs) that were efficiently suppressed in the genomes of the parental hosts. Here, we provide evidence for this 'genomic shock hypothesis' in the fission yeast Schizosaccharomyces pombe. The species is characterized by divergence of two ancestral lineages (Sp and Sk) which have experienced recent, likely human-induced, hybridization. We used long-read sequencing data to assemble genomes of 37 samples derived from 31 S. pombe strains spanning a wide range of ancestral admixture proportions. A comprehensive TE inventory revealed exclusive presence of long terminal repeat (LTR) retrotransposons. In-depth sequence analyses of active full-length elements, as well as solo-LTRs, revealed a complex history of homologous recombination. Population genetic analyses of syntenic sequences placed insertion of many solo-LTRs prior to the split of the Sp and Sk lineages. Most full-length elements were inserted more recently, after hybridization. With the exception of a single full-length element with signs of positive selection, both solo-LTRs and, in particular, full-length elements carried signatures of purifying selection, indicating effective removal by the host. Consistent with reactivation upon hybridization, the number of full-length LTR retrotransposons, varying extensively from zero to 87 among strains, significantly increased with the degree of genomic admixture. This study gives a detailed account of global TE diversity in S. pombe, documents complex recombination histories within TEs and provides evidence for the 'genomic shock hypothesis', with implications for the role of TEs in adaptation and speciation.

2021 ◽  
Author(s):  
Sergio Tusso ◽  
Fang Suo ◽  
Yue Liang ◽  
Li-Lin Du ◽  
Jochen B.W Wolf

Hybridization is thought to reactivate transposable elements (TEs) that were efficiently suppressed in the genomes of the parental hosts. Here, we provide evidence for this 'genomic shock hypothesis' in the fission yeast Schizosaccharomyces pombe. The species is characterized by divergence of two ancestral lineages (Sp and Sk) which have experienced recent, likely human-induced, hybridization. We used long-read sequencing data to assemble genomes of 37 samples derived from 31 S. pombe strains spanning a wide range of ancestral admixture proportions. A comprehensive TE inventory revealed exclusive presence of long terminal repeat (LTR) retrotransposons. In-depth sequence analyses of active full-length elements, as well as solo-LTRs, revealed a complex history of homologous recombination. Population genetic analyses of syntenic sequences placed insertion of many solo-LTRs prior to the split of the Sp and Sk lineages. Most full-length elements were inserted more recently, after hybridization. With the exception of a single full-length element with signs of positive selection, both solo-LTRs and, in particular, full-length elements carried signatures of purifying selection, indicating effective removal by the host. Consistent with reactivation upon hybridization, the number of full-length LTR retrotransposons, varying extensively from zero to 87 among strains, significantly increased with the degree of genomic admixture. This study provides a detailed account of global TE diversity in S. pombe, documents complex recombination histories within TEs and offers the first evidence for the 'genomic shock hypothesis' in fungi, with implications for the role of TEs in adaptation and speciation.


Viruses ◽  
2019 ◽  
Vol 11 (5) ◽  
pp. 421 ◽  
Author(s):  
Min Feng ◽  
Feifei Ren ◽  
Yaohong Zhou ◽  
Nan Zhang ◽  
Qiuyuan Lu ◽  
...  

The published genome sequence of Antheraea yamamai (Saturniidae) was used to construct a library of long terminal repeat (LTR)-retrotransposons that is representative of the wild silkmoth (Antheraea) genus, and that includes 22,666 solo LTRs and 541 full-length LTRs. The LTR retrotransposons of Antheraea yamamai (AyLTRs) could be classified into the three canonical groups of Gypsy, Copia and Belpao. Eleven AyLTRs contained the env gene element, but the relationship with the env element of baculovirus, particularly A. yamamai and A. pernyi nucleopolyhedrovirus (AyNPV and ApNPV), was distant. A total of 251 “independent” full-length AyLTRs were identified that were located within 100 kb (downstream or upstream) of 406 neighboring genes in A. yamamai. Regulation of these genes might occur in cis by the AyLTRs, and the neighboring genes were found to be enriched in GO terms such as “response to stimulus”, and KEGG terms such as “mTOR signaling pathway”, among others. Furthermore, the library of LTR-retrotransposons and the A. yamamai genome were used to identify and analyze the expression of LTR-retrotransposons and genes in ApNPV-infected and non-infected A. pernyi larval midguts, using raw data of a published transcriptome study. Our analysis demonstrates that 93 full-length LTR-retrotransposons are transcribed in the midgut of A. pernyi, of which 12 significantly change their expression after ApNPV infection (differentially expressed LTR-retrotransposons or DELs). In addition, the expression of differentially expressed genes (DEGs) and neighboring DELs on the chromosome following ApNPV infection suggests the possibility of regulation of expression of DEGs by DELs through a cis mechanism, which will require experimental verification. When examined in more detail, it was found that genes involved in Notch signaling and stress granule (SG) formation were significantly up-regulated in ApNPV-infected A. pernyi larval midgut. Moreover, several DEGs in the Notch and SG pathways were found to be located in the neighborhood of particular DELs, indicating the possibility of DEG-DEL cross-regulation in cis for these two pathways.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Kie Kyon Huang ◽  
Jiawen Huang ◽  
Jeanie Kar Leng Wu ◽  
Minghui Lee ◽  
Su Ting Tay ◽  
...  

Abstract. Background: Deregulated gene expression is a hallmark of cancer; however, most studies to date have analyzed short-read RNA sequencing data with inherent limitations. Here, we combine PacBio long-read isoform sequencing (Iso-Seq) and Illumina paired-end short-read RNA sequencing to comprehensively survey the transcriptome of gastric cancer (GC), a leading cause of global cancer mortality. Results: We performed full-length transcriptome analysis across 10 GC cell lines covering four major GC molecular subtypes (chromosomal unstable, Epstein-Barr positive, genome stable and microsatellite unstable). We identify 60,239 non-redundant full-length transcripts, of which >66% are novel compared to current transcriptome databases. Novel isoforms are more likely to be cell line and subtype specific, expressed at lower levels with larger number of exons, with longer isoform/coding sequence lengths. Most novel isoforms utilize an alternate first exon, and compared to other alternative splicing categories, are expressed at higher levels and exhibit higher variability. Collectively, we observe alternate promoter usage in 25% of detected genes, with the majority (84.2%) of known/novel promoter pairs exhibiting potential changes in their coding sequences. Mapping these alternate promoters to TCGA GC samples, we identify several cancer-associated isoforms, including novel variants of oncogenes. Tumor-specific transcript isoforms tend to alter protein coding sequences to a larger extent than other isoforms. Analysis of outcome data suggests that novel isoforms may impart additional prognostic information. Conclusions: Our results provide a rich resource of full-length transcriptome data for deeper studies of GC and other gastrointestinal malignancies.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Ludwig Mann ◽  
Kathrin M. Seibt ◽  
Beatrice Weber ◽  
Tony Heitkam

Abstract. Background: Extrachromosomal circular DNAs (eccDNAs) are ring-like DNA structures physically separated from the chromosomes, ranging in size from 100 bp to several megabase pairs. Apart from carrying tandemly repeated DNA, eccDNAs may also harbor extra copies of genes or recently activated transposable elements. As eccDNAs occur in all eukaryotes investigated so far and likely play roles in stress, cancer, and aging, they have been prime targets in recent research, although their investigation has been limited by the scarcity of computational tools. Results: Here, we present the ECCsplorer, a bioinformatics pipeline to detect eccDNAs in any kind of organism or tissue using next-generation sequencing techniques. Following Illumina sequencing of amplified circular DNA (circSeq), the ECCsplorer enables an easy and automated discovery of eccDNA candidates. The data analysis encompasses two major procedures: first, read mapping to the reference genome allows the detection of informative read distributions including high coverage, discordant mapping, and split reads. Second, reference-free comparison of read clusters from amplified eccDNA against control sample data reveals specifically enriched DNA circles. Both software parts can be run separately or jointly, depending on the individual aim or data availability. To illustrate the wide applicability of our approach, we analyzed semi-artificial and published circSeq data from the model organisms Homo sapiens and Arabidopsis thaliana, and generated circSeq reads from the non-model crop plant Beta vulgaris. We clearly identified eccDNA candidates from all datasets, with and without reference genomes. The ECCsplorer pipeline specifically detected mitochondrial mini-circles and retrotransposon activation, showcasing the ECCsplorer’s sensitivity and specificity. Conclusion: The ECCsplorer (available online at https://github.com/crimBubble/ECCsplorer) is a bioinformatics pipeline to detect eccDNAs in any kind of organism or tissue using next-generation sequencing data. The derived eccDNA targets are valuable for a wide range of downstream investigations, from analysis of cancer-related eccDNAs through organelle genomics to identification of active transposable elements.


2019 ◽  
Author(s):  
Maximilian Krause ◽  
Adnan M. Niazi ◽  
Kornel Labun ◽  
Yamila N. Torres Cleuren ◽  
Florian S. Müller ◽  
...  

Polyadenylation at the 3’-end is a major regulator of messenger RNA, and poly(A) tail length is known to affect nuclear export, stability and translation, among others. Only recently, strategies have emerged that allow for genome-wide poly(A) length assessment. These methods identify genes connected to poly(A) tail measurements indirectly by short-read alignment to genetic 3’-ends. Concurrently, Oxford Nanopore Technologies (ONT) established full-length isoform RNA sequencing containing the entire poly(A) tail. However, assessing poly(A) length through basecalling has so far not been possible due to the inability to resolve long homopolymeric stretches in ONT sequencing. Here we present tailfindr, an R package to estimate poly(A) tail length on ONT long-read sequencing data. tailfindr operates on unaligned, basecalled data. It measures poly(A) tail length from both native RNA and DNA sequencing, which makes poly(A) tail studies by full-length cDNA approaches possible for the first time. We assess tailfindr’s performance across different poly(A) lengths, demonstrating that tailfindr is a versatile tool providing poly(A) tail estimates across a wide range of sequencing conditions.


2020 ◽  
Vol 10 (6) ◽  
pp. 1829-1836 ◽  
Author(s):  
Graham Wiley ◽  
Matthew J. Miller

Woodpeckers are found in nearly every part of the world and have been important for studies of biogeography, phylogeography, and macroecology. Woodpecker hybrid zones are often studied to understand the dynamics of introgression between bird species. Notably, woodpeckers are gaining attention for their enriched levels of transposable elements (TEs) relative to most other birds. This enrichment of TEs may have substantial effects on molecular evolution. However, comparative studies of woodpecker genomes are hindered by the fact that no high-contiguity genome exists for any woodpecker species. Using hybrid assembly methods combining long-read Oxford Nanopore and short-read Illumina sequencing data, we generated a highly contiguous genome assembly for the Golden-fronted Woodpecker (Melanerpes aurifrons). The final assembly is 1.31 Gb and comprises 441 contigs plus a full mitochondrial genome. Half of the assembly is represented by 28 contigs (contig L50), each of which is at least 16 Mb in size (contig N50). High recovery (92.6%) of bird-specific BUSCO genes suggests our assembly is both relatively complete and relatively accurate. Over a quarter (25.8%) of the genome consists of repetitive elements, with 287 Mb (21.9%) of those elements assignable to the CR1 superfamily of transposable elements, the highest proportion of CR1 repeats reported for any bird genome to date. Our assembly should improve comparative studies of molecular evolution and genomics in woodpeckers and allies. Additionally, the sequencing and bioinformatic resources used to generate this assembly were relatively low-cost and should provide a direction for development of high-quality genomes for studies of animal biodiversity.
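
The contig N50 and L50 statistics quoted in this abstract can be illustrated with a minimal Python sketch. The function name `n50_l50` and the toy contig lengths are assumptions made for illustration; they are not taken from the paper, whose own assembly statistics were presumably produced by standard assembly tooling.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    L50 is the smallest number of contigs, taken from longest to
    shortest, whose summed length reaches half the assembly size.
    N50 is the length of the shortest contig in that set.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_assembly = sum(lengths) / 2
    running_total = 0
    for count, length in enumerate(lengths, start=1):
        running_total += length
        if running_total >= half_assembly:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical toy assembly (lengths in arbitrary units), not real data.
toy_contigs = [50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_contigs)
print(f"N50 = {n50}, L50 = {l50}")
```

Read this way, the abstract's figures mean that the 28 longest contigs together cover at least half of the 1.31 Gb assembly, and the shortest of those 28 is at least 16 Mb long.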


2019 ◽  
Author(s):  
Thu-Phuong Nguyen ◽  
Cornelia Mühlich ◽  
Setareh Mohammadin ◽  
Erik van den Bergh ◽  
Adrian E. Platts ◽  
...  

Abstract. Background: The genus Aethionema is a sister-group to the core-group of the Brassicaceae family that includes Arabidopsis thaliana and the Brassica crops. Thus, Aethionema is phylogenetically well-placed for the investigation and understanding of genome and trait evolution across the family. We aimed to improve the quality of the draft reference genome of the annual species Aethionema arabicum. Secondly, we constructed the first Ae. arabicum genetic map. The improved reference genome and genetic map enabled the development of each other. Results: We started with the initially published genome (version 2.5). PacBio and MinION sequencing together with genetic map v2.5 were incorporated to produce the new reference genome v3.0. The improved genome contains 203 Mb of sequence, with approximately 94% of the assembly made up of called bases, assembled into 2,883 scaffolds. The N50 (10.3 Mb) represents an 80-fold increase over the initial genome release. We generated a Recombinant Inbred Line (RIL) population that was derived from two ecotypes: Cyprus and Turkey (the reference genotype). Using a Genotyping by Sequencing (GBS) approach, we generated a high-density genetic map with 749 (v2.5) and then 632 SNPs (v3.0). The genetic map and reference genome were integrated, thus greatly improving the scaffolding of the reference genome into 11 linkage groups. Conclusions: We show that long-read sequencing data and genetics are complementary, resulting in an improved genome assembly in Ae. arabicum. They will facilitate comparative genetic mapping work for the Brassicaceae family and are also valuable resources to investigate a wide range of life history traits in Aethionema.


2020 ◽  
Author(s):  
Danilo Pereira ◽  
Ursula Oggenfuss ◽  
Bruce A. McDonald ◽  
Daniel Croll

Abstract. The activity of transposable elements (TEs) can be an important driver of genetic diversity, with TE-mediated mutations having a wide range of fitness consequences. To avoid deleterious effects of TE activity, some fungi evolved highly sophisticated genomic defences to reduce TE proliferation across the genome. Repeat-induced point (RIP) mutation is a fungal-specific TE defence mechanism efficiently targeting duplicated sequences. The rapid accumulation of RIP mutations is expected to deactivate TEs over the course of a few generations. The evolutionary dynamics of TEs at the population level in a species with highly repressive genome defences is poorly understood. Here, we analyze 366 whole-genome sequences of Parastagonospora nodorum, a fungal pathogen of wheat with efficient RIP. A global population genomics analysis revealed high levels of genetic diversity and signs of frequent sexual recombination. Contrary to expectations for a species with RIP, we identified recent TE activity in multiple populations. The TE composition and copy numbers showed little divergence among global populations regardless of the demographic history. Miniature inverted-repeat transposable elements (MITEs) and terminal repeat retrotransposons in miniature (TRIMs) were largely underlying recent intra-species TE expansions. We inferred RIP footprints in individual TE families and found that recently active, high-copy TEs have possibly evaded genomic defences. We find no evidence that recent positive selection acted on TE-mediated mutations; rather, purifying selection maintained new TE insertions at low insertion frequencies in populations. Our findings highlight the complex evolutionary equilibria established by the joint action of TE activity, selection and genomic repression. Data Summary: All Illumina sequence data is available from the NCBI SRA BioProject numbers PRJNA606320, PRJNA398070 and PRJNA476481 (https://www.ncbi.nlm.nih.gov/bioproject). The Methods and Supplementary Figures S1-S11 and Supplementary Tables S1-S4 provide all information on strain locations and outcomes of genome analyses.


2017 ◽  
Author(s):  
Kemal Eren ◽  
Steven Weaver ◽  
Robert Ketteringham ◽  
Morné Valentyn ◽  
Melissa Laird Smith ◽  
...  

Abstract. Next-generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV-1 env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster), and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV-1 env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV-1 env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.


2021 ◽  
Author(s):  
Ning Wang ◽  
Vladislav Lysenkov ◽  
Katri Orte ◽  
Veli Kairisto ◽  
Juhani Aakko ◽  
...  

Insertions and deletions (indels) in human genomes are associated with a wide range of phenotypes, including various clinical disorders. High-throughput, next generation sequencing (NGS) technologies enable detection of short genetic variants, such as single nucleotide variants (SNVs) and indels. However, the variant calling accuracy for indels remains considerably lower than for SNVs. Here we present a comparative study of the performance of variant calling tools on indel calling, evaluated with a wide repertoire of NGS datasets. While there is no single optimal tool to suit all circumstances, our results demonstrate that the choice of variant calling tool greatly impacts the precision and recall of indel calling. Furthermore, to reliably detect indels, it is essential to choose NGS technologies that offer a long read length and high coverage, coupled with specific variant calling tools.
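
The precision and recall figures that this abstract uses to compare indel callers can be sketched in Python as below. This is only a rough illustration under stated assumptions: the function `precision_recall`, the tuple layout `(chrom, pos, ref, alt)`, and the toy variant sets are all hypothetical and not from the study. Exact matching of records is also a simplification, since the same indel can be written in more than one equivalent way and real benchmarks typically normalize representations before comparing.

```python
def precision_recall(called, truth):
    """Compute (precision, recall) of a call set against a truth set.

    Each variant is a hashable record such as (chrom, pos, ref, alt).
    A call counts as a true positive only if the identical record is
    present in the truth set (assumption: exact-match comparison).
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: four true indels, three calls, two of them correct.
truth_set = {
    ("chr1", 100, "A", "AT"),
    ("chr1", 200, "GC", "G"),
    ("chr2", 50, "T", "TAA"),
    ("chr2", 80, "CAG", "C"),
}
call_set = {
    ("chr1", 100, "A", "AT"),
    ("chr1", 200, "GC", "G"),
    ("chr2", 60, "T", "TA"),  # false positive: not in the truth set
}
precision, recall = precision_recall(call_set, truth_set)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Precision falls when a caller reports indels that are not real, and recall falls when it misses real ones, which is why the choice of caller can move the two in different directions.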

