TEfinder: A Bioinformatics Pipeline for Detecting New Transposable Element Insertion Events in Next-Generation Sequencing Data

Transposable elements (TEs) are mobile elements capable of introducing genetic changes rapidly. Their importance has been documented in many biological processes, such as introducing genetic instability, altering patterns of gene expression, and accelerating genome evolution. Increasing appreciation of TEs has resulted in a growing number of bioinformatics tools for identifying insertion events. However, the application of existing tools is limited by the narrowly focused design of the package, by too many dependencies on other tools, or by prior knowledge required as input files that may not be readily available to all users. Here, we report a simple pipeline, TEfinder, developed for the detection of new TE insertions with minimal software and input file dependencies. The external software requirements are BEDTools, SAMtools, and Picard. Necessary input files include the reference genome sequence in FASTA format, an alignment file from paired-end reads, existing TEs in GTF format, and a text file of TE names. We tested TEfinder on several evolving populations of Fusarium oxysporum generated through a short-term adaptation study. Our results demonstrate that this easy-to-use tool can effectively detect new TE insertion events, making it accessible and practical for TE analysis.

TEfinder: A Bioinformatics Pipeline for Detecting New Transposable Element Insertion Events in Next-Generation Sequencing Data

10.20944/preprints202012.0473.v1 ◽

2020 ◽

Author(s):

Vista Sohrab ◽

Cristina López-Díaz ◽

Antonio Di Pietro ◽

Li-Jun Ma ◽

Dilay Hazal Ayhan

Keyword(s):

Reference Genome ◽

Variant Calling ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Short Term ◽

Bioinformatics Pipeline ◽

Bioinformatics Software ◽

External Software ◽

Short Term Adaptation ◽

Generation Sequencing

Transposable elements (TEs) are mobile genetic elements capable of rapidly altering the genome through their movements. The importance of TE activity has been documented in many biological processes, such as introducing genetic instability, altering patterns of gene expression, and accelerating genome evolution. Increasing appreciation of TEs has resulted in a growing number of bioinformatics tools for identifying insertion events. However, the application of existing TE-finding tools is limited by the narrowly focused design of the package, by too many dependencies on other tools, or by prior knowledge required as input files that may not be readily available to all users. Here, we report a simple pipeline, TEfinder, developed for the detection of new TE insertions with minimal software dependencies, using four inputs that can be easily generated with popular variant calling pipelines. The external software requirements are BEDTools, SAMtools, and Picard. Necessary inputs include TEs present in the reference genome, a binary paired-end alignment, a reference genome index, and a list of TE names. We tested the TEfinder pipeline on several evolving populations of Fusarium oxysporum generated through a short-term adaptation study. Our results demonstrate that this easy-to-use tool can effectively detect new TE insertion events, making it accessible and practical for TE analysis.

Towards Multi-approaches Bioinformatics Pipeline Based on Big Data and Cloud Computing for Next Generation Sequencing Data Analysis

Advances in Intelligent Systems and Computing - Advanced Intelligent Systems for Sustainable Development (AI2SD’2019) ◽

10.1007/978-3-030-36664-3_43 ◽

2020 ◽

pp. 385-394

Author(s):

Razika Driouche

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Analysis ◽

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Generation Sequencing ◽

Sequencing Data Analysis

A bioinformatics pipeline for rare genetic diseases in South African patients

South African Journal of Science ◽

10.17159/sajs.2019/4876 ◽

2019 ◽

Vol 115 (3/4) ◽

Author(s):

Maryke Schoonen ◽

Albertus S. Seyffert ◽

Francois H. van Der Westhuizen ◽

Izelle Smuts

Keyword(s):

South Africa ◽

Next Generation Sequencing ◽

South African ◽

Rare Diseases ◽

Next Generation Sequencing Data ◽

Common Disease ◽

Next Generation ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Generation Sequencing

The research fields of bioinformatics and computational biology are growing rapidly in South Africa. Bioinformatics pipelines play an integral part in handling sequencing data, which are used to investigate the aetiology of common and rare diseases. Bioinformatics platforms for common disease aetiology are well supported and continuously being developed in South Africa. However, the same is not the case for research into rare disease aetiology. Investigations into the latter rely on international cloud-based tools for data analyses and ultimately confirmation of a genetic disease. However, these tools are not necessarily optimised for ethnically diverse population groups. We present a bioinformatics pipeline, developed in-house, that enables researchers to annotate and filter variants in either exome or amplicon next-generation sequencing data. This pipeline was developed using next-generation sequencing data of a predominantly African cohort of patients diagnosed with rare diseases. Significance: We demonstrate the feasibility of in-country development of ethnicity-sensitive, automated bioinformatics pipelines using free software in a South African context. We provide a roadmap for development of similarly ethnicity-sensitive bioinformatics pipelines.

ECCsplorer: a pipeline to detect extrachromosomal circular DNA (eccDNA) from next-generation sequencing data

BMC Bioinformatics ◽

10.1186/s12859-021-04545-2 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Ludwig Mann ◽

Kathrin M. Seibt ◽

Beatrice Weber ◽

Tony Heitkam

Keyword(s):

Next Generation Sequencing ◽

Transposable Elements ◽

Data Availability ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Circular DNA ◽

Wide Range ◽

Generation Sequencing

Background: Extrachromosomal circular DNAs (eccDNAs) are ring-like DNA structures physically separated from the chromosomes, ranging in size from 100 bp to several megabase pairs. Apart from carrying tandemly repeated DNA, eccDNAs may also harbor extra copies of genes or recently activated transposable elements. As eccDNAs occur in all eukaryotes investigated so far and likely play roles in stress, cancer, and aging, they have been prime targets in recent research, although their investigation has been limited by the scarcity of computational tools. Results: Here, we present the ECCsplorer, a bioinformatics pipeline to detect eccDNAs in any kind of organism or tissue using next-generation sequencing techniques. Following Illumina sequencing of amplified circular DNA (circSeq), the ECCsplorer enables an easy and automated discovery of eccDNA candidates. The data analysis encompasses two major procedures. First, read mapping to the reference genome allows the detection of informative read distributions, including high coverage, discordant mapping, and split reads. Second, reference-free comparison of read clusters from amplified eccDNA against control sample data reveals specifically enriched DNA circles. Both software parts can be run separately or jointly, depending on the individual aim or data availability. To illustrate the wide applicability of our approach, we analyzed semi-artificial and published circSeq data from the model organisms Homo sapiens and Arabidopsis thaliana, and generated circSeq reads from the non-model crop plant Beta vulgaris. We clearly identified eccDNA candidates from all datasets, with and without reference genomes. The ECCsplorer pipeline specifically detected mitochondrial mini-circles and retrotransposon activation, showcasing the ECCsplorer’s sensitivity and specificity.
Conclusion: The ECCsplorer (available online at https://github.com/crimBubble/ECCsplorer) is a bioinformatics pipeline to detect eccDNAs in any kind of organism or tissue using next-generation sequencing data. The derived eccDNA targets are valuable for a wide range of downstream investigations, from analysis of cancer-related eccDNAs, through organelle genomics, to identification of active transposable elements.

A Galaxy-based bioinformatics pipeline for optimised, streamlined microsatellite development from Illumina next-generation sequencing data

Conservation Genetics Resources ◽

10.1007/s12686-016-0570-7 ◽

2016 ◽

Vol 8 (4) ◽

pp. 481-486 ◽

Cited By ~ 14

Author(s):

Sarah M. Griffiths ◽

Graeme Fox ◽

Peter J. Briggs ◽

Ian J. Donaldson ◽

Simon Hood ◽

...

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Microsatellite Development ◽

Generation Sequencing

Development and validation of Houston Methodist Variant Viewer version 3: updates to our application for interpretation of next-generation sequencing data

JAMIA Open ◽

10.1093/jamiaopen/ooaa004 ◽

2020 ◽

Vol 3 (2) ◽

pp. 299-305

Author(s):

Paul A Christensen ◽

Sishir Subedi ◽

Kristi Pepper ◽

Heather L Hendrickson ◽

Zejuan Li ◽

...

Keyword(s):

Next Generation Sequencing ◽

Data Entry ◽

Computation Time ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

System Maintenance ◽

Development And Validation ◽

Generation Sequencing

Objectives: Informatics tools that support next-generation sequencing workflows are essential to deliver timely interpretation of somatic variants in cancer. Here, we describe significant updates to our laboratory-developed bioinformatics pipelines and data management application termed Houston Methodist Variant Viewer (HMVV). Materials and Methods: We collected feature requests and workflow improvement suggestions from the end-users of HMVV version 1. Over 1.5 years, we iteratively implemented these features in five sequential updates to HMVV version 3. Results: We improved the performance and data throughput of the application while reducing the opportunity for manual data entry errors. We enabled end-user workflows for pipeline monitoring, variant interpretation and annotation, and integration with our laboratory information system. System maintenance was improved through enhanced defect reporting, heightened data security, and improved modularity in the code and system environments. Discussion and Conclusion: Validation of each HMVV update was performed according to expert guidelines. We enabled an 8× reduction in the bioinformatics pipeline computation time for our longest running assay. Our molecular pathologists can interpret the assay results at least 2 days sooner than was previously possible. The application and pipeline code are publicly available at https://github.com/hmvv.

DNAscan: a fast, computationally and memory efficient bioinformatics pipeline for the analysis of DNA next-generation-sequencing data

10.1101/267195 ◽

2018 ◽

Cited By ~ 1

Author(s):

A Iacoangeli ◽

A Al Khleifat ◽

W Sproviero ◽

A Shatunov ◽

AR Jones ◽

...

Keyword(s):

Next Generation Sequencing ◽

High Performance ◽

Genetic Material ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Bioinformatics Tools ◽

NGS Data ◽

Generation Sequencing

The generation of DNA Next Generation Sequencing (NGS) data is a commonly applied approach for studying the genetic basis of biological processes, including diseases, and underpins the aspirations of precision medicine. However, there are significant challenges when dealing with NGS data. A huge number of bioinformatics tools exist, and it is therefore challenging to design an analysis pipeline. NGS analysis is also computationally intensive and requires expensive infrastructure, which can be problematic given that many medical and research centres do not have adequate high performance computing facilities and that the use of cloud computing facilities is not always possible due to privacy and ownership issues. We have therefore developed a fast and efficient bioinformatics pipeline that allows for the analysis of DNA sequencing data, while requiring little computational effort and memory usage. We achieved this by exploiting state-of-the-art bioinformatics tools. DNAscan can analyse raw, 40x whole genome NGS data in 8 hours, using as little as 8 threads and 16 GB of RAM, while guaranteeing a high performance. DNAscan can look for SNVs, small indels, SVs, repeat expansions and viral genetic material (or that of any other organism). Its results are annotated using a customisable variety of databases including ClinVar, ExAC and dbSNP, and a local deployment of the gene.iobio platform is available for on-the-fly result visualisation.

Faculty Opinions recommendation of VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718272765.793499663 ◽

2014 ◽

Author(s):

Gary Bader ◽

Mohamed Helmy

Keyword(s):

Next Generation Sequencing ◽

Network Analysis ◽

Next Generation Sequencing Data ◽

Cancer Genes ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Faculty Opinions recommendation of Bioinformatory-assisted analysis of next-generation sequencing data for precision medicine in pancreatic cancer.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.727775566.793536095 ◽

2017 ◽

Author(s):

Steve Pereira

Keyword(s):

Pancreatic Cancer ◽

Next Generation Sequencing ◽

Precision Medicine ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Assisted Analysis ◽

Generation Sequencing

NGSremix: A software tool for estimating pairwise relatedness between admixed individuals from next-generation sequencing data

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab174 ◽

2021 ◽

Author(s):

Anne Krogh Nøhr ◽

Kristian Hanghøj ◽

Genis Garcia Erill ◽

Zilong Li ◽

Ida Moltke ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genetic Research ◽

Likelihood Estimation ◽

Software Tool ◽

Estimation Methods ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

NGS Data ◽

Generation Sequencing

Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if this is present. However, the methods that can account for admixture are all based on genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data, from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.
