Involving repetitive regions in scaffolding improvement

Author(s):  
Quentin Delorme ◽  
Rémy Costa ◽  
Yasmine Mansour ◽  
Anna-Sophie Fiston-Lavier ◽  
Annie Chateau

In this paper, we investigate, through a preliminary study, the influence of repeat elements on the assembly process. We analyze the link between the presence and nature of one type of repeat element, the transposable element (TE), and misassembly events in genome assemblies. We propose to improve assemblies by taking into account the presence of repeat elements, including TEs, during the scaffolding step. We analyze the results and relate the misassemblies to TEs before and after correction.
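Relating misassemblies to TEs can be illustrated with a small interval query. The sketch below is a hypothetical illustration, not the authors' pipeline. It assumes that misassembly breakpoints (for example, from a reference-based assembly evaluation) and a TE annotation are both available as contig coordinates. The function names and the `window` parameter are invented for the example.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def breakpoints_near_tes(breakpoints, te_intervals, window=0):
    """Return the breakpoints that fall inside, or within `window` bp of, a TE.

    breakpoints:  list of (contig, position) tuples, 0-based positions
    te_intervals: dict mapping contig -> list of half-open (start, end) TE intervals
    (both input layouts are assumptions made for this sketch)
    """
    # After merging, starts and ends are both strictly increasing, so only the
    # last interval starting at or before pos + window can contain the breakpoint.
    index = {}
    for contig, ivs in te_intervals.items():
        merged = merge_intervals(ivs)
        index[contig] = ([s for s, _ in merged], merged)

    hits = []
    for contig, pos in breakpoints:
        if contig not in index:
            continue
        starts, merged = index[contig]
        i = bisect_right(starts, pos + window) - 1
        if i >= 0 and pos < merged[i][1] + window:
            hits.append((contig, pos))
    return hits


tes = {"ctg1": [(100, 200), (500, 600)], "ctg2": [(1000, 1200)]}
bps = [("ctg1", 150), ("ctg1", 620), ("ctg2", 40)]
print(breakpoints_near_tes(bps, tes))             # [('ctg1', 150)]
print(breakpoints_near_tes(bps, tes, window=50))  # [('ctg1', 150), ('ctg1', 620)]
```

The sketch merges overlapping TE annotations first so that both starts and ends stay sorted. This lets a single binary search per breakpoint decide whether the breakpoint overlaps a TE.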

2021 ◽  
Author(s):  
Megumi Onishi-Seebacher ◽  
Galina Erikson ◽  
Zoe Sawitzki ◽  
Devon Ryan ◽  
Gabriele Greve ◽  
...  

Background: Repeat elements constitute a large proportion of the human genome, and recent evidence indicates that repeat element expression has functional roles in both physiological and pathological states. In cancer specifically, transcription of endogenous retrotransposons is often suppressed in order to attenuate an anti-tumor immune response, whereas aberrant expression of heterochromatin-derived satellite RNA has been identified as a tumor driver. These insights demonstrate separate functions for the dysregulation of distinct repeat subclasses in either the attenuation or the progression of human solid tumors. For hematopoietic malignancies such as Acute Myeloid Leukemia (AML), very few studies on the expression or dysregulation of repeat elements have been done. Methods: To study the expression of repeat elements in AML, we performed total-RNA sequencing of healthy CD34+ cells and of leukemic blast cells from primary AML patient material. We also developed an integrative bioinformatic approach that can quantify the expression of repeat transcripts from all repeat subclasses (SINE/ALU, LINE and ERV elements and satellite repeats) in relation to the expression of genes and other non-repeat transcripts. This novel approach can be used as an instructive signature (R/G ratio) for repeat element expression and has been extended to the analysis of poly(A)-RNA sequencing datasets from the Blueprint and TCGA consortia, which together comprise 120 AML patient samples. Results: We found that repeat element expression is generally down-regulated during hematopoietic differentiation and that relative changes in repeat to gene expression (i.e. R/G ratios) can stratify risk prediction of AML patients and correlate with overall survival probabilities. A high repeat to gene expression ratio identifies AML patient subgroups with a favorable prognosis, whereas a low repeat to gene expression ratio is prevalent in AML patient subgroups with a poor prognosis. 
Conclusions: We developed an integrative bioinformatic approach that defines a general model for the analysis of repeat element dysregulation in physiological and pathological development. We find that changes in repeat to gene expression (R/G ratios) correlate with hematopoietic differentiation and can sub-stratify AML patients into low-risk and high-risk subgroups. Thus, the R/G ratio can serve as a valuable biomarker for AML and could also provide insights into differential patient response to epigenetic drug treatment.
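One way to picture the R/G ratio is as the number of repeat-derived reads divided by the number of reads from genes and other non-repeat transcripts in the same sample. The sketch below assumes that per-feature read counts are already available and that each feature is flagged as repeat or non-repeat. The abstract does not specify the normalization or the risk cut-off. The median split and the `rep:` naming convention used here are therefore hypothetical.

```python
from statistics import median


def rg_ratio(counts, is_repeat):
    """Repeat-to-gene (R/G) ratio for one sample.

    counts:    dict mapping feature id -> read count
    is_repeat: callable returning True for repeat features (e.g. SINE/ALU,
               LINE, ERV, satellite) and False for genes and other
               non-repeat transcripts
    """
    repeat = sum(c for f, c in counts.items() if is_repeat(f))
    other = sum(c for f, c in counts.items() if not is_repeat(f))
    if other == 0:
        raise ValueError("sample has no non-repeat reads")
    return repeat / other


def split_at_median(ratios):
    """Label each sample 'high' or 'low' relative to the cohort median (hypothetical cut-off)."""
    cut = median(ratios.values())
    return {sample: ("high" if r > cut else "low") for sample, r in ratios.items()}


def is_rep(feature_id):
    # Made-up convention for this toy example: repeat features start with "rep:".
    return feature_id.startswith("rep:")


samples = {
    "P1": {"rep:AluY": 300, "rep:L1HS": 100, "gene:GAPDH": 800},
    "P2": {"rep:AluY": 50, "gene:GAPDH": 1000},
    "P3": {"rep:HSATII": 300, "gene:GAPDH": 400},
}
ratios = {s: rg_ratio(c, is_rep) for s, c in samples.items()}
print(ratios)                   # {'P1': 0.5, 'P2': 0.05, 'P3': 0.75}
print(split_at_median(ratios))  # {'P1': 'low', 'P2': 'low', 'P3': 'high'}
```

In the abstract's terms, samples labelled `high` would point in the favorable-prognosis direction. Any real stratification would depend on the study's own normalization and thresholds.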


2014 ◽  
Author(s):  
Darshan S Chandrashekar ◽  
Poulami Dey ◽  
Kshitish K Acharya

Background: Understanding the mechanisms behind the transcriptional regulation of genes is still a challenge. Recent findings indicate that genomic repeat elements (such as LINEs, SINEs and LTRs) could play an important role in transcription control. Hence, it is important to further explore the role of genomic repeat elements in gene expression regulation, and perhaps in other molecular processes. Although many computational tools exist for repeat element analysis, almost all of them simply identify and/or classify the genomic repeat elements within query sequence(s); none of them facilitates identification of repeat elements that are likely to have functional significance, particularly in the context of transcriptional regulation. Result: We developed the 'Genomic Repeat Element Analyzer for Mammals' (GREAM) to allow gene-centric analysis of genomic repeat elements in 17 mammalian species, and validated it by comparison with some existing experimental data. The output provides a categorized list of the specific types of transposons, retrotransposons and other genome-wide repeat elements that are statistically over-represented in specific neighborhood regions of query genes. The position and frequency of these elements within the specified regions are displayed as well. The tool also supports queries on the position-specific distribution of repeat elements within chromosomes. In addition, GREAM facilitates the analysis of repeat element distribution across the neighborhoods of orthologous genes. Conclusion: GREAM allows researchers to short-list potentially important repeat elements from the genomic neighborhood of genes for further experimental analysis. GREAM is freely available at http://resource.ibab.ac.in/GREAM/
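GREAM reports repeat elements that are statistically over-represented around query genes, but the abstract does not name the statistical test. A one-sided hypergeometric test is a common choice for this kind of enrichment question. The sketch below assumes the genome is divided into bins and counts how many bins carry a given repeat family. It illustrates the general idea only and is not GREAM's implementation.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genomic bins in the background
    K: background bins containing the repeat family
    n: bins covered by the query genes' neighborhoods
    k: neighborhood bins containing the repeat family (observed)
    """
    hi = min(K, n)
    # math.comb returns 0 when asked for more items than available,
    # so impossible terms simply contribute nothing to the sum.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return numerator / comb(N, n)


p = hypergeom_sf(10, 1000, 50, 40)
print(f"expected by chance: {40 * 50 / 1000:.1f} bins, observed: 10, p = {p:.2e}")
```

With 1,000 background bins, 50 of which carry the family, a 40-bin neighborhood would be expected to contain about 2 such bins by chance. Observing 10 therefore gives a small p-value. A real analysis would also correct for testing many repeat families at once.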


2016 ◽  
Vol 8 (2) ◽  
pp. 403-410 ◽  
Author(s):  
Roy N. Platt ◽  
Laura Blanco-Berdugo ◽  
David A. Ray

Author(s):  
Dhawal Jain ◽  
Chong Chu ◽  
Burak Han Alver ◽  
Soohyun Lee ◽  
Eunjung Alice Lee ◽  
...  

ABSTRACT Hi-C is a common technique for assessing 3D chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline, Hi-C-based TE analyzer (HiTea), capitalizes on clipped Hi-C reads and is aided by the high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with existing callers based on whole-genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE-insertion landscape. We employ the pipeline to identify TE insertions from human cell-line Hi-C samples. Availability and implementation: HiTea is available at https://github.com/parklab/HiTea and as a Docker image. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Dhawal Jain ◽  
Chong Chu ◽  
Burak Han Alver ◽  
Soohyun Lee ◽  
Eunjung Alice Lee ◽  
...  

Abstract: Hi-C is a common technique for assessing three-dimensional chromatin conformation. Recent studies have shown that long-range interaction information in Hi-C data can be used to generate chromosome-length genome assemblies and identify large-scale structural variations. Here, we demonstrate the use of Hi-C data in detecting mobile transposable element (TE) insertions genome-wide. Our pipeline, HiTea (Hi-C based Transposable element analyzer), capitalizes on clipped Hi-C reads and is aided by the high proportion of discordant read pairs in Hi-C data to detect insertions of three major families of active human TEs. Despite the uneven genome coverage in Hi-C data, HiTea is competitive with existing callers based on whole genome sequencing (WGS) data and can supplement the WGS-based characterization of the TE insertion landscape. We employ the pipeline to identify TE insertions from human cell-line Hi-C samples. HiTea is available at https://github.com/parklab/HiTea and as a Docker image.
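HiTea's key signal is the clipped read. The part of a read that does not align to the reference may instead come from an inserted TE. The sketch below is a hedged illustration only, not HiTea's code. It extracts soft-clipped sequence from a read's SEQ and CIGAR string, following the CIGAR operation codes of the SAM format. The `min_clip` threshold is an assumed parameter. In Hi-C data, clipping also arises at ligation junctions, so a real caller needs further filtering, for example comparing the clipped bases against TE consensus sequences.

```python
import re

# CIGAR operations as defined in the SAM format specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(seq, cigar, min_clip=20):
    """Return [(side, clipped_bases)] for soft clips of at least min_clip bases.

    Hard clips (H) are dropped first: they can only sit at the ends of a CIGAR
    and their bases are absent from SEQ, so they do not shift the offsets.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    clips = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        clips.append(("left", seq[: ops[0][0]]))
    # len(ops) > 1 avoids reporting the same operation as both left and right.
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        clips.append(("right", seq[len(seq) - ops[-1][0]:]))
    return clips


read = "A" * 25 + "C" * 50 + "G" * 30
print([(side, len(b)) for side, b in soft_clips(read, "5H25S50M30S")])  # [('left', 25), ('right', 30)]
```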


2021 ◽  
Author(s):  
Igor Filipović ◽  
Gordana Rašić ◽  
James Hereward ◽  
Maria Gharuka ◽  
Gregor J Devine ◽  
...  

Background: An optimal starting point for relating genome function to organismal biology is a high-quality nuclear genome assembly, and long-read sequencing is revolutionizing the production of this genomic resource in insects. Despite this, nuclear genome assemblies have been under-represented for agricultural insect pests, particularly from the order Coleoptera. Here we present a de novo genome assembly and structural annotation for the coconut rhinoceros beetle, Oryctes rhinoceros (Coleoptera: Scarabaeidae), based on Oxford Nanopore Technologies (ONT) long-read data generated from a wild-caught female. We also describe the assembly process, which additionally recovered the complete circular genome assemblies of the beetle's mitochondrial genome and of the biocontrol agent Oryctes rhinoceros nudivirus (OrNV). As an invasive pest of palm trees, O. rhinoceros is expanding its range across the Pacific Islands, requiring new approaches to management that may include strategies facilitated by genome assembly and annotation. Results: High-quality DNA isolated from an adult female was used to create four ONT libraries that were sequenced on four MinION flow cells, producing a total of 27.2 Gb of high-quality long-read sequences. We employed an iterative assembly process and polishing with one lane of high-accuracy Illumina reads, obtaining a final assembly of 377.36 Mb with high contiguity (fragment N50 length = 12 Mb) and accuracy, as evidenced by the exceptionally high completeness of the benchmarked set of conserved single-copy orthologous genes (BUSCO completeness = 99.11%). These quality metrics make our assembly the most complete of the published Coleopteran genomes. 
The structural annotation of the nuclear genome assembly contained a highly accurate set of 16,371 protein-coding genes with a BUSCO completeness of 92.09%, as well as the expected number of non-coding RNAs and the expected number and structure of paralogous genes in a gene family such as Sigma GST. Conclusions: The genomic resources produced in this study form a foundation for further functional genetic research and management programs that may inform the control and surveillance of O. rhinoceros populations, and we demonstrate the efficacy of de novo genome assembly using long-read ONT data from a single field-caught insect.
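The fragment N50 quoted in this abstract is the standard contiguity statistic. It is the largest length L such that fragments of length L or longer contain at least half of the assembled bases. The following minimal computation from a list of fragment lengths is written for illustration and is not taken from the authors' tooling:

```python
def n50(lengths):
    """N50: largest length L such that fragments of length >= L hold at least
    half of the total assembled bases."""
    total = sum(lengths)
    covered = 0
    # Walk fragments from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise ValueError("n50() needs at least one fragment length")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Comparing `2 * covered` with `total` keeps the test in integers, which avoids rounding the halfway point when the total length is odd.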


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jiacheng Zhou ◽  
Chuzhe Zhang ◽  
Ziqiu Wang ◽  
Kuanmin Mao ◽  
Xiaoyu Wang

In this work, the influence of the constraint mode and of the number of disc springs on the dynamic characteristics of a disc spring system is studied by simulation and experiment. The amplitudes and amplification factors of the disc spring system are obtained for different constraint modes and different numbers of disc springs. The results show that the maximum amplitude and the maximum amplification factor both occur under the constraint mode of locking without preloading, which indicates that locking without preloading is the best of the four constraint modes studied. Moreover, as the number of disc springs increases, the amplitude of the system first increases and then decreases, while the amplification factor increases. The maximum amplification factor (10.21 in experiment) appears at 10 disc springs. By studying the relationship between the number of disc springs, the amplification factor and the damping, we find that increasing the number of disc springs reduces the damping of the system and thus raises the amplification factor. Furthermore, as the number of disc springs increases, the height difference of the disc springs before and after locking stays close to 3 mm, which indicates that the amount of locking compression during assembly remains consistent as the number of disc springs changes. These findings can provide guidance for industrial production involving screen vibration.
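The reported trade-off (less damping, larger amplification factor) matches textbook vibration theory. The sketch below idealizes the disc spring system as a linear, viscously damped single-degree-of-freedom oscillator under harmonic force. This is an assumption for illustration, not the authors' model, and it does not attempt to reproduce their experimental values. It compares a numerically scanned resonant peak of the amplification factor with the closed form 1/(2ζ√(1 − ζ²)), where ζ is the damping ratio.

```python
import math


def amplification(r, zeta):
    """Steady-state amplitude ratio |X| / (F0 / k) of a viscously damped,
    force-excited single-degree-of-freedom oscillator at r = omega / omega_n."""
    return 1.0 / math.sqrt((1.0 - r * r) ** 2 + (2.0 * zeta * r) ** 2)


def peak_amplification(zeta, r_max=3.0, steps=200_000):
    """Numerically scan r in (0, r_max] for the largest amplification."""
    return max(amplification(r_max * i / steps, zeta) for i in range(1, steps + 1))


def peak_closed_form(zeta):
    """Textbook resonant peak 1 / (2 zeta sqrt(1 - zeta^2)), valid for zeta < 1/sqrt(2)."""
    return 1.0 / (2.0 * zeta * math.sqrt(1.0 - zeta * zeta))


for zeta in (0.20, 0.10, 0.05):
    print(f"zeta={zeta:.2f}  scanned={peak_amplification(zeta):.3f}  "
          f"closed form={peak_closed_form(zeta):.3f}")
```

For light damping, the peak is close to 1/(2ζ), so halving the damping ratio roughly doubles the amplification factor in this idealized model.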


2016 ◽  
Author(s):  
Daniel Mapleson ◽  
Gonzalo Garcia Accinelli ◽  
George Kettleborough ◽  
Jonathan Wright ◽  
Bernardo J. Clavijo

ABSTRACT Motivation: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and of how that data is utilised by assemblers, provides useful insights that can inform the assembly process and result in better assemblies. Results: We present the K-mer Analysis Toolkit (KAT): a multi-purpose software toolkit for reference-free quality control (QC) of WGS reads and de novo genome assemblies, primarily via their k-mer frequencies and GC composition. KAT enables users to assess levels of errors, bias and contamination at various stages of the assembly process. In this paper we highlight KAT’s ability to provide valuable insights into the composition and quality of genome assemblies through pairwise comparison of the k-mers present in both the input reads and the assemblies. Availability: KAT is available under the GPLv3 license at: https://github.com/TGAC/. Supplementary information: Supplementary Information (SI) is available at Bioinformatics online. In addition, the software documentation is available online at: http://kat.readthedocs.io/en/latest/.
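A toy version can illustrate KAT's core idea of comparing the k-mer content of reads with that of an assembly. The sketch below performs four steps:

- It counts canonical k-mers, treating a k-mer and its reverse complement as one.
- It builds the k-mer frequency spectrum.
- It computes the GC fraction.
- It lists frequent read k-mers that are absent from the assembly.

It is a pure-Python illustration under those assumptions, not KAT's implementation or command-line interface.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(seqs, k):
    """Count canonical k-mers, skipping any k-mer containing N or other symbols."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _BASES:
                counts[canonical(kmer)] += 1
    return counts


def spectrum(counts):
    """k-mer spectrum: multiplicity -> number of distinct k-mers with that multiplicity."""
    return Counter(counts.values())


def gc_fraction(seq):
    """Fraction of G+C among unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def missing_from_assembly(read_counts, asm_counts, min_mult=2):
    """Read k-mers seen at least min_mult times that never occur in the assembly."""
    return {km for km, c in read_counts.items() if c >= min_mult and km not in asm_counts}


reads = ["ACGTAC", "ACGTAC", "TTTTT"]
assembly = ["ACGTAC"]
read_counts = count_kmers(reads, 3)
asm_counts = count_kmers(assembly, 3)
print(sorted(spectrum(read_counts).items()))           # [(3, 1), (4, 2)]
print(missing_from_assembly(read_counts, asm_counts))  # {'AAA'}
print(gc_fraction(assembly[0]))                        # 0.5
```

Read k-mers that occur several times but are missing from the assembly suggest sequence that the assembler dropped. K-mers seen only once in the reads are more likely to be sequencing errors, which is why the sketch uses a `min_mult` threshold.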


2020 ◽  
Author(s):  
Megumi Onishi-Seebacher ◽  
Zoe Sawitzki ◽  
Devon Ryan ◽  
Galina Erikson ◽  
Gabriele Greve ◽  
...  

Background: Repeat elements constitute a large proportion of the human genome, and recent evidence indicates that repeat element expression has functional roles in both physiological and pathological states. In cancer specifically, transcription of endogenous retrotransposons is often suppressed in order to attenuate an anti-tumor immune response, whereas aberrant expression of heterochromatin-derived satellite RNA has been identified as a tumor driver. These insights demonstrate separate functions for the dysregulation of distinct repeat subclasses in either the attenuation or the progression of human solid tumors. For hematopoietic malignancies such as acute myeloid leukemia (AML), very few studies on the expression or dysregulation of repeat elements have been done. Methods: To study the expression of repeat elements in AML, we performed total-RNA sequencing of healthy CD34+ cells and of leukemic blast cells from primary AML patient material. We also developed an integrative bioinformatic approach that can quantify the expression of repeat transcripts from all repeat subclasses (SINE/ALU, LINE and ERV elements and satellite repeats) in relation to the expression of genes and other non-repeat transcripts. This novel approach can be used as an instructive signature (‘rep/gene’ ratio) for repeat element expression and has been extended to the analysis of poly(A)-RNA sequencing datasets from the Blueprint and TCGA consortia, which together comprise 120 AML patient samples. Results: We found that repeat element expression is generally down-regulated during hematopoietic differentiation and that relative changes in repeat to gene expression (i.e. ‘rep/gene’ ratios) can stratify risk prediction of AML patients and correlate with overall survival probabilities. A high repeat to gene expression ratio identifies AML patient subgroups with a favorable prognosis, whereas a low repeat to gene expression ratio is prevalent in AML patient subgroups with a poor prognosis. 
Conclusions: We developed an integrative bioinformatic approach that defines a general model for the analysis of repeat element dysregulation in physiological and pathological development. We find that changes in repeat to gene expression (‘rep/gene’ ratios) correlate with hematopoietic differentiation and can sub-stratify AML patients into low-risk and high-risk subgroups. Thus, the ‘rep/gene’ expression ratio can serve as a valuable biomarker for AML and could also provide insights into differential patient response to epigenetic drug treatment.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
M. Onishi-Seebacher ◽  
G. Erikson ◽  
Z. Sawitzki ◽  
D. Ryan ◽  
G. Greve ◽  
...  

Background: Repeat elements constitute a large proportion of the human genome, and recent evidence indicates that repeat element expression has functional roles in both physiological and pathological states. In cancer specifically, transcription of endogenous retrotransposons is often suppressed to attenuate an anti-tumor immune response, whereas aberrant expression of heterochromatin-derived satellite RNA has been identified as a tumor driver. These insights demonstrate separate functions for the dysregulation of distinct repeat subclasses in either the attenuation or the progression of human solid tumors. For hematopoietic malignancies such as Acute Myeloid Leukemia (AML), very few studies on the expression or dysregulation of repeat elements have been done. Methods: To study the expression of repeat elements in AML, we performed total-RNA sequencing of healthy CD34+ cells and of leukemic blast cells from primary AML patient material. We also developed an integrative bioinformatic approach that can quantify the expression of repeat transcripts from all repeat subclasses (SINE/ALU, LINE, ERV and satellites) in relation to the expression of genes and other non-repeat transcripts (i.e. the R/G ratio). This novel approach can be used as an instructive signature for repeat element expression and has been extended to the analysis of poly(A)-RNA sequencing datasets from the Blueprint and TCGA consortia, which together comprise 120 AML patient samples. Results: We found that repeat element expression is generally down-regulated during hematopoietic differentiation and that relative changes in repeat to gene expression can stratify risk prediction of AML patients and correlate with overall survival probabilities. A high R/G ratio identifies AML patient subgroups with a favorable prognosis, whereas a low R/G ratio is prevalent in AML patient subgroups with a poor prognosis. 
Conclusions: We developed an integrative bioinformatic approach that defines a general model for the analysis of repeat element dysregulation in physiological and pathological development. We find that changes in repeat to gene expression (i.e. R/G ratios) correlate with hematopoietic differentiation and can sub-stratify AML patients into low-risk and high-risk subgroups. Thus, the R/G ratio can serve as a valuable biomarker for AML and could also provide insights into differential patient response to epigenetic drug treatment.

