T-lex3: an accurate tool to genotype and estimate population frequencies of transposable elements using the latest short-read whole genome sequencing data

Author(s):  
María Bogaerts-Márquez ◽  
Maite G Barrón ◽  
Anna-Sophie Fiston-Lavier ◽  
Pol Vendrell-Mir ◽  
Raúl Castanera ◽  
...  

Abstract Motivation Transposable elements (TEs) constitute a significant proportion of the majority of genomes sequenced to date. TEs are responsible for a considerable fraction of the genetic variation within and among species. Accurate genotyping of TEs in genomes is therefore crucial for a complete identification of the genetic differences among individuals, populations and species. Results In this work, we present a new version of T-lex, a computational pipeline that accurately genotypes and estimates the population frequencies of reference TE insertions using short-read high-throughput sequencing data. In this new version, we have re-designed the T-lex algorithm to integrate the BWA-MEM short-read aligner, which is one of the most accurate short-read mappers and can be launched on longer short-reads (e.g. reads >150 bp). We have added new filtering steps to increase the accuracy of the genotyping, and new parameters that allow the user to control both the minimum and maximum number of reads, and the minimum number of strains to genotype a TE insertion. We also showed for the first time that T-lex3 provides accurate TE calls in a plant genome. Availability and implementation To test the accuracy of T-lex3, we called 1630 individual TE insertions in Drosophila melanogaster, 1600 individual TE insertions in humans, and 3067 individual TE insertions in the rice genome. We showed that this new version of T-lex is a broadly applicable and accurate tool for genotyping and estimating TE frequencies in organisms with different genome sizes and different TE contents. T-lex3 is available on GitHub: https://github.com/GonzalezLab/T-lex3. Supplementary information Supplementary data are available at Bioinformatics online.
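To make the idea of estimating a population frequency from per-strain genotype calls concrete, here is a minimal Python sketch. It is not T-lex3's implementation: the call labels (`present`, `absent`, `polymorphic`, `no_data`), the function name `te_frequency`, the `min_strains` argument and the half-weight given to polymorphic calls are all assumptions made for illustration.

```python
from collections import Counter


def te_frequency(calls, min_strains=1):
    """Estimate the population frequency of one TE insertion.

    calls: iterable of per-strain labels. The labels 'present', 'absent',
    'polymorphic' and 'no_data' are hypothetical, chosen for this sketch.
    min_strains: hypothetical threshold on the number of strains with a call.
    Returns None when too few strains carry an informative call.
    """
    counts = Counter(calls)
    informative = counts["present"] + counts["absent"] + counts["polymorphic"]
    if informative == 0 or informative < min_strains:
        return None
    # Assumption: a polymorphic (heterozygous-like) call counts as half an insertion.
    return (counts["present"] + 0.5 * counts["polymorphic"]) / informative


if __name__ == "__main__":
    print(te_frequency(["present", "present", "absent", "no_data"]))
```

Strains labelled `no_data` are left out of the denominator, so a missing call does not pull the estimate towards zero.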

2020 ◽  
Vol 36 (12) ◽  
pp. 3874-3876 ◽  
Author(s):  
Sergio Arredondo-Alonso ◽  
Martin Bootsma ◽  
Yaïr Hein ◽  
Malbert R C Rogers ◽  
Jukka Corander ◽  
...  

Abstract Summary Plasmids can horizontally transmit genetic traits, enabling rapid bacterial adaptation to new environments and hosts. Short-read whole-genome sequencing data are often applied to large-scale bacterial comparative genomics projects but the reconstruction of plasmids from these data faces severe limitations, such as the inability to distinguish plasmids from each other in a bacterial genome. We developed gplas, a new approach to reliably separate plasmid contigs into discrete components using sequence composition, coverage, assembly graph information and network partitioning based on a pruned network of plasmid unitigs. Gplas facilitates the analysis of large numbers of bacterial isolates and allows a detailed analysis of plasmid epidemiology based solely on short-read sequence data. Availability and implementation Gplas is written in R and Bash and uses a Snakemake pipeline as a workflow management system. Gplas is available under the GNU General Public License v3.0 at https://gitlab.com/sirarredondo/gplas.git. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Anna E Sheppard ◽  
Nicole Stoesser ◽  
Ian German-Mesner ◽  
Kasi Vegesana ◽  
A Sarah Walker ◽  
...  

Abstract Much of the worldwide dissemination of antibiotic resistance has been driven by resistance gene associations with mobile genetic elements (MGEs), such as plasmids and transposons. Although increasing, our understanding of resistance spread remains relatively limited, as methods for tracking mobile resistance genes through multiple species, strains and plasmids are lacking. We have developed a bioinformatic pipeline for tracking variation within, and mobility of, specific transposable elements (TEs), such as transposons carrying antibiotic resistance genes. TETyper takes short-read whole-genome sequencing data as input and identifies single-nucleotide mutations and deletions within the TE of interest, to enable tracking of specific sequence variants, as well as the surrounding genetic context(s), to enable identification of transposition events. To investigate global dissemination of Klebsiella pneumoniae carbapenemase (KPC) and its associated transposon Tn4401, we applied TETyper to a collection of >3000 publicly available Illumina datasets containing blaKPC. This revealed surprising diversity, with >200 distinct flanking genetic contexts for Tn4401, indicating high levels of transposition. Integration of sample metadata revealed insights into associations between geographic locations, host species, Tn4401 sequence variants and flanking genetic contexts. To demonstrate the ability of TETyper to cope with high copy number TEs and to track specific short-term evolutionary changes, we also applied it to the insertion sequence IS26 within a defined K. pneumoniae outbreak. TETyper is implemented in Python and is freely available at https://github.com/aesheppard/TETyper.


Rice ◽  
2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Veronica Roman-Reyna ◽  
Dale Pinili ◽  
Frances N. Borja ◽  
Ian L. Quibod ◽  
Simon C. Groen ◽  
...  

Abstract Background The crop microbial communities are shaped by interactions between the host, microbes and the environment; however, their relative contributions are only beginning to be understood. Here, we explore these interactions in the leaf bacterial community across 3024 rice accessions. Findings By using unmapped DNA sequencing reads as microbial reads, we characterized the structure of the rice bacterial microbiome. We identified central bacterial taxa that emerge as microbial “hubs” and may have an influence on the network of host-microbe interactions. We found regions in the rice genome that might control the assembly of these microbial hubs. To our knowledge this is one of the first studies that uses raw data from plant genome sequencing projects to characterize the leaf bacterial communities. Conclusion We showed that the structure of the rice leaf microbiome is modulated by multiple interactions among host, microbes, and environment. Our data provide insight into the factors influencing microbial assemblage in the rice leaf and also open the door for future initiatives to modulate rice consortia for crop improvement efforts.
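The step of treating unmapped reads as candidate microbial reads can be illustrated with a short Python sketch. It is not the authors' pipeline: it only relies on the SAM format convention that bit 0x4 of the FLAG field marks an unmapped segment, and the function name `unmapped_read_names` and the toy records are assumptions made for illustration.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_read_names(sam_lines):
    """Return the names of reads whose SAM FLAG has the unmapped bit set.

    sam_lines: iterable of text lines in SAM format (header lines start with '@').
    """
    names = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record (11 mandatory fields)
        flag = int(fields[1])
        if flag & UNMAPPED_FLAG:
            names.append(fields[0])
    return names


if __name__ == "__main__":
    # Toy records, invented for this example.
    example = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCCA\tIIIII",
    ]
    print(unmapped_read_names(example))
```

In practice the selected reads would then be passed to a taxonomic classifier; that step is outside the scope of this sketch.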


2020 ◽  
Author(s):  
Veronica Roman-Reyna ◽  
Dale Pinili ◽  
Frances Nikki Borja ◽  
Ian Lorenzo Quibod ◽  
Simon C. Groen ◽  
...  

Abstract Background: The crop microbial communities are shaped by interactions between the host, microbes and the environment; however, their relative contributions are only beginning to be understood. Here, we explore these interactions in the leaf bacterial community across 3,024 rice accessions. Findings: By using unmapped DNA sequencing reads as microbial reads, we characterized the structure of the rice bacterial microbiome. We identified central bacterial taxa that emerge as microbial “hubs” and may have an influence on the network of host-microbe interactions. We found regions in the rice genome that might control the assembly of these microbial hubs. To our knowledge this is one of the first studies that uses raw data from plant genome sequencing projects to characterize the leaf bacterial communities. Conclusion: We showed that the structure of the rice leaf microbiome is modulated by multiple interactions among host, microbes, and environment. Our data provide insight into the factors influencing microbial assemblage in the rice leaf and also open the door for future initiatives to modulate rice consortia for crop improvement efforts.


2017 ◽  
Author(s):  
Harriet Dashnow ◽  
Monkol Lek ◽  
Belinda Phipson ◽  
Andreas Halman ◽  
Simon Sadedin ◽  
...  

Abstract Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Historically, pathogenic STR expansions could only be detected by single locus techniques, such as PCR and electrophoresis. The ability to use short read sequencing data to screen for STR expansions has the potential to reduce both the time and cost to reaching diagnosis and enable the discovery of new causal STR loci. Most existing tools detect STR variation within the read length, and so are unable to detect the majority of pathogenic expansions. Those tools that can detect large expansions are limited to a set of known disease loci and as yet no new disease causing STR expansions have been identified with high-throughput sequencing technologies. Here we address this by presenting STRetch, a new genome-wide method to detect STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting pathogenic STR expansions in short-read whole genome sequencing data with a very low false discovery rate. We further demonstrate the application of STRetch to solve cases of patients with undiagnosed disease and apply STRetch to the analysis of 97 whole genomes to reveal variation at STR loci. STRetch assesses expansions at all STR loci in the genome and allows screening for novel disease-causing STRs. STRetch is open source software, available from github.com/Oshlack/STRetch.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Chong Chu ◽  
Rebeca Borges-Monroy ◽  
Vinayak V. Viswanadham ◽  
Soohyun Lee ◽  
Heng Li ◽  
...  

Abstract Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.


2010 ◽  
Vol 26 (17) ◽  
pp. 2101-2108 ◽  
Author(s):  
Jiří Macas ◽  
Pavel Neumann ◽  
Petr Novák ◽  
Jiming Jiang

Abstract Motivation: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next generation sequencing technologies. Results: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences, including 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and the whole genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. Reconstructed consensus sequences as well as the assessment of overall divergence of k-mer spectra revealed high similarity of the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used for comparison of CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats. Supplementary information: Supplementary data are available at Bioinformatics online.
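As a rough illustration of what an alignment-free k-mer frequency spectrum looks like, the following Python sketch counts every k-mer in a set of reads and converts the counts to relative frequencies. It is a generic sketch, not the authors' software: the function names `kmer_counts` and `kmer_frequencies` and the choice to skip k-mers containing ambiguous bases are assumptions made for illustration.

```python
from collections import Counter

DNA_BASES = set("ACGT")


def kmer_counts(sequences, k):
    """Count all overlapping k-mers across a collection of sequences.

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped;
    this filtering rule is an assumption of the sketch.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= DNA_BASES:
                counts[kmer] += 1
    return counts


def kmer_frequencies(counts):
    """Convert raw k-mer counts to relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    spectrum = kmer_frequencies(kmer_counts(["ACGTACGT"], 4))
    for kmer, freq in sorted(spectrum.items()):
        print(kmer, freq)
```

Two spectra computed this way for different read sets can be compared directly, without aligning the underlying monomer sequences.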

