MitoFlex: an efficient, high-performance toolkit for animal mitogenome assembly, annotation, and visualization

Bioinformatics ◽

10.1093/bioinformatics/btab111 ◽

2021 ◽

Author(s):

Jun-Yu Li ◽

Wei-Xuan Li ◽

An-Tai Wang ◽

Zhang Yu

Keyword(s):

Mitochondrial Genome ◽

High Performance ◽

High Throughput Sequencing ◽

De Novo ◽

Supplementary Information ◽

Sequencing Data ◽

Protein Coding ◽

High Throughput Sequencing Data ◽

Genome Analysis Toolkit ◽

Overall Performance

Abstract. Summary: MitoFlex is a Linux-based mitochondrial genome analysis toolkit that provides a complete workflow of raw data filtering, de novo assembly, mitochondrial genome identification and annotation for animal high-throughput sequencing data. The overall performance of MitoFlex was compared with that of its analogue MitoZ in terms of protein-coding gene recovery, memory consumption and processing speed. Availability: MitoFlex is available at https://github.com/Prunoideae/MitoFlex under the GPLv3 license. Supplementary information: Supplementary data are available at Bioinformatics online.

Inference of viral quasispecies with a paired de Bruijn graph

Bioinformatics ◽

10.1093/bioinformatics/btaa782 ◽

2020 ◽

Author(s):

Borja Freire ◽

Susana Ladra ◽

Jose R Paramá ◽

Leena Salmela

Keyword(s):

High Throughput Sequencing ◽

De Novo ◽

Supplementary Information ◽

De Bruijn Graph ◽

Viral Quasispecies ◽

Sequencing Data ◽

De Bruijn Graphs ◽

Sequencing Errors ◽

High Throughput Sequencing Data ◽

De Bruijn

Abstract. Motivation: RNA viruses exhibit a high mutation rate and thus they exist in infected cells as a population of closely related strains called viral quasispecies. The viral quasispecies assembly problem asks to characterize the quasispecies present in a sample from high-throughput sequencing data. We study the de novo version of the problem, where reference sequences of the quasispecies are not available. Current methods for assembling viral quasispecies are either based on overlap graphs or on de Bruijn graphs. Overlap graph-based methods tend to be accurate but slow, whereas de Bruijn graph-based methods are fast but less accurate. Results: We present viaDBG, which is a fast and accurate de Bruijn graph-based tool for de novo assembly of viral quasispecies. We first iteratively correct sequencing errors in the reads, which allows us to use large k-mers in the de Bruijn graph. To incorporate the paired-end information in the graph, we also adapt the paired de Bruijn graph for viral quasispecies assembly. These features enable the use of long-range information in contig construction without compromising the speed of de Bruijn graph-based approaches. Our experimental results show that viaDBG is both accurate and fast, whereas previous methods are either fast or accurate but not both. In particular, viaDBG has comparable or better accuracy than SAVAGE, while being at least nine times faster. Furthermore, the speed of viaDBG is comparable to PEHaplo, but viaDBG can also retrieve low-abundance quasispecies, which are often missed by PEHaplo. Availability and implementation: viaDBG is implemented in C++ and it is publicly available at https://bitbucket.org/bfreirec1/viadbg. All datasets used in this article are publicly available at https://bitbucket.org/bfreirec1/data-viadbg/. Supplementary information: Supplementary data are available at Bioinformatics online.
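As a rough illustration of the data structure this abstract refers to, the following minimal sketch builds a plain de Bruijn graph from reads. It is not viaDBG's implementation: the function names, the toy reads, and the `min_count` threshold (a crude stand-in for error correction that simply drops rare k-mers) are all assumptions made for this example, and the paired de Bruijn graph is not modelled.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn(reads, k, min_count=1):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each retained
    k-mer contributes one edge from its prefix to its suffix.

    k-mers seen fewer than min_count times are dropped; this is a
    simplistic filter assumed here, not the iterative correction
    described in the abstract.
    """
    graph = defaultdict(set)
    for kmer, n in kmer_counts(reads, k).items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical toy reads; the third read carries a variant (or an error).
reads = ["ACGTAC", "CGTACG", "ACGTTC"]
graph = build_de_bruijn(reads, k=4, min_count=2)
unfiltered = build_de_bruijn(reads, k=4)
```

With `min_count=2` only the path ACG -> CGT -> GTA -> TAC survives, while the unfiltered graph branches at node CGT. Deciding whether such a branch is a sequencing error or a genuine low-abundance strain is the difficulty the abstract describes.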

Characterization of the mitochondrial genome of Arge bella Wei & Du sp. nov. (Hymenoptera: Argidae)

PeerJ ◽

10.7717/peerj.6131 ◽

2018 ◽

Vol 6 ◽

pp. e6131 ◽

Cited By ~ 3

Author(s):

Shiyu Du ◽

Gengyun Niu ◽

Tommi Nyman ◽

Meicai Wei

Keyword(s):

Mitochondrial Genome ◽

High Throughput Sequencing ◽

Complete Mitochondrial Genome ◽

Nucleotide Composition ◽

Sequencing Data ◽

Protein Coding ◽

High Throughput Sequencing Data ◽

Rna Genes ◽

Ancestral Type

We describe Arge bella Wei & Du sp. nov., a large and beautiful species of Argidae from south China, and report its mitochondrial genome based on high-throughput sequencing data. We present the gene order, nucleotide composition of protein-coding genes (PCGs), and the secondary structures of RNA genes. The nearly complete mitochondrial genome of A. bella has a length of 15,576 bp and a typical set of 37 genes (22 tRNAs, 13 PCGs, and 2 rRNAs). Three tRNAs are rearranged in the A. bella mitochondrial genome as compared to the ancestral type in insects: trnM and trnQ are shuffled, while trnW is translocated from the trnW-trnC-trnY cluster to a location downstream of trnI. All PCGs are initiated by ATN codons, and terminated with TAA, TA or T as stop codons. All tRNAs have a typical cloverleaf secondary structure, except for trnS1. H821 of rrnS and H976 of rrnL are redundant. A phylogenetic analysis based on mitochondrial genome sequences of A. bella, 21 other symphytan species, two apocritan representatives, and four outgroup taxa supports the placement of Argidae as sister to the Pergidae within the symphytan superfamily Tenthredinoidea.
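The start and stop codon pattern reported above (ATN starts; TAA, TA or T stops) can be checked mechanically. The sketch below is illustrative only: the function names and the toy sequences are hypothetical, not taken from the A. bella genome, and it assumes that an incomplete stop (TA or T) shows up as a coding sequence whose length is not a multiple of three.

```python
def starts_with_atn(cds):
    """Return True if the coding sequence begins with an ATN codon."""
    cds = cds.upper()
    return len(cds) >= 3 and cds.startswith("AT") and cds[2] in "ACGT"


def stop_signal(cds):
    """Classify the stop signal as 'TAA', 'TA' or 'T', or return None.

    Assumption: a complete stop codon leaves the length divisible by
    three, while truncated stops leave a remainder of 2 (TA) or 1 (T).
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        return "TAA" if cds.endswith("TAA") else None
    if remainder == 2:
        return "TA" if cds.endswith("TA") else None
    return "T" if cds.endswith("T") else None


# Hypothetical toy sequences covering each of the three stop signals.
examples = ["ATGAAATAA", "ATAAAATA", "ATTAAAT"]
signals = [stop_signal(seq) for seq in examples]
```

Truncated stops of this kind are usually interpreted as being completed to TAA by polyadenylation of the transcript, which is why a length check is needed in addition to a suffix check.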

circtools—a one-stop software solution for circular RNA research

Bioinformatics ◽

10.1093/bioinformatics/bty948 ◽

2018 ◽

Vol 35 (13) ◽

pp. 2326-2328 ◽

Cited By ~ 13

Author(s):

Tobias Jakobi ◽

Alexey Uvarovskii ◽

Christoph Dieterich

Keyword(s):

High Throughput Sequencing ◽

Circular Rna ◽

Statistical Testing ◽

Supplementary Information ◽

Circular Rnas ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Multi Stage ◽

Sequence Reconstruction ◽

One Stop

Abstract. Motivation: Circular RNAs (circRNAs) originate through back-splicing events from linear primary transcripts, are resistant to exonucleases, are not polyadenylated and have been shown to be highly specific for cell type and developmental stage. CircRNA detection starts from high-throughput sequencing data and is a multi-stage bioinformatics process yielding sets of potential circRNA candidates that require further analyses. While a number of tools for the prediction process already exist, publicly available analysis tools for further characterization are rare. Our work provides researchers with a harmonized workflow that covers different stages of in silico circRNA analyses, from prediction to first functional insights. Results: Here, we present circtools, a modular, Python-based framework for computational circRNA analyses. The software includes modules for circRNA detection, internal sequence reconstruction, quality checking, statistical testing, screening for enrichment of RBP binding sites, differential exon RNase R resistance and circRNA-specific primer design. circtools supports researchers with visualization options and data export into commonly used formats. Availability and implementation: circtools is available via https://github.com/dieterich-lab/circtools and http://circ.tools under GPLv3.0. Supplementary information: Supplementary data are available at Bioinformatics online.

hypeR: An R Package for Geneset Enrichment Workflows

10.1101/656637 ◽

2019 ◽

Cited By ~ 1

Author(s):

Anthony Federico ◽

Stefano Monti

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

Supplementary Information ◽

Sequencing Data ◽

Wide Audience ◽

Popular Method ◽

Link Type ◽

High Throughput Sequencing Data ◽

One Stop ◽

Recent Version

Abstract. Summary: Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation: The most recent version of the package is available at https://github.com/montilab/hypeR. Supplementary information: Comprehensive documentation and tutorials are available at https://montilab.github.io/hypeR-docs.
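For readers unfamiliar with geneset enrichment, one common formulation is an over-representation test based on the hypergeometric distribution. The sketch below shows that general idea in Python; it is not hypeR's API (hypeR is an R package), and the abstract does not say which tests the package uses. The function name, the gene labels, and the background size are assumptions for this example.

```python
from math import comb


def hypergeom_enrichment_p(signature, geneset, background_size):
    """Upper-tail hypergeometric p-value for over-representation.

    Assumes both the signature and the geneset are drawn from a
    background of `background_size` genes. Returns the probability of
    seeing at least the observed overlap by chance.
    """
    signature, geneset = set(signature), set(geneset)
    overlap = len(signature & geneset)
    n = len(signature)        # number of genes drawn (the signature)
    K = len(geneset)          # number of "successes" in the background
    N = background_size
    upper = min(n, K)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(overlap, upper + 1)
    )
    return favourable / comb(N, n)


# Hypothetical example: 10 background genes, a 3-gene set, a 3-gene signature.
p_value = hypergeom_enrichment_p({"a", "b", "d"}, {"a", "b", "c"}, 10)
```

Here two of the three signature genes fall in the geneset, giving a p-value of 22/120, or about 0.18. In practice such p-values are computed for many genesets and then corrected for multiple testing.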

ADFinder: accurate detection of programmed DNA elimination using NGS high-throughput sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa226 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3632-3636 ◽

Cited By ~ 2

Author(s):

Weibo Zheng ◽

Jing Chen ◽

Thomas G Doak ◽

Weibo Song ◽

Ying Yan

Keyword(s):

High Throughput ◽

Large Scale ◽

High Throughput Sequencing ◽

Supplementary Information ◽

Sequencing Data ◽

Source Codes ◽

High Throughput Sequencing Data ◽

Dna Elimination ◽

Multiple Alternative ◽

Dna Splicing

Abstract. Motivation: Programmed DNA elimination (PDE) plays a crucial role in the transitions between germline and somatic genomes in diverse organisms ranging from unicellular ciliates to multicellular nematodes. However, software specific for the detection of DNA splicing events is scarce. In this paper, we describe Accurate Deletion Finder (ADFinder), an efficient detector of PDEs using high-throughput sequencing data. ADFinder can predict PDEs with relatively low sequencing coverage, detect multiple alternative splicing forms in the same genomic location and calculate the frequency for each splicing event. This software will facilitate research of PDEs and all downstream analyses. Results: By analyzing genome-wide DNA splicing events in two micronuclear genomes of Oxytricha trifallax and Tetrahymena thermophila, we prove that ADFinder is effective in predicting large-scale PDEs. Availability and implementation: The source code and manual of ADFinder are available on our GitHub site: https://github.com/weibozheng/ADFinder. Supplementary information: Supplementary data are available at Bioinformatics online.

Complete mitochondrial genome sequence of a Hungarian red deer (Cervus elaphus hippelaphus) from high-throughput sequencing data and its phylogenetic position within the family Cervidae

Acta Biologica Hungarica ◽

10.1556/018.67.2016.2.2 ◽

2016 ◽

Vol 67 (2) ◽

pp. 133-147 ◽

Cited By ~ 5

Author(s):

Krisztián Frank ◽

Endre Barta ◽

Nóra Á. Bana ◽

János Nagy ◽

Péter Horn ◽

...

Keyword(s):

Mitochondrial Genome ◽

High Throughput ◽

Cervus Elaphus ◽

Red Deer ◽

High Throughput Sequencing ◽

Complete Mitochondrial Genome ◽

Phylogenetic Position ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

The Family

PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cancer Informatics ◽

10.4137/cin.s13890 ◽

2014 ◽

Vol 13s1 ◽

pp. CIN.S13890 ◽

Cited By ~ 1

Author(s):

Changjin Hong ◽

Solaiappan Manimaran ◽

William Evan Johnson

Keyword(s):

Quality Control ◽

High Throughput ◽

High Performance ◽

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Sequencing Data ◽

Computationally Efficient ◽

High Throughput Sequencing Data ◽

Downstream Analysis

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .

QUARTIC: QUick pArallel algoRithms for high-Throughput sequencIng data proCessing

F1000Research ◽

10.12688/f1000research.22954.3 ◽

2020 ◽

Vol 9 ◽

pp. 240

Author(s):

Frédéric Jarlier ◽

Nicolas Joly ◽

Nicolas Fedy ◽

Thomas Magalhaes ◽

Leonor Sirotti ◽

...

Keyword(s):

High Throughput ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

High Throughput Sequencing ◽

Genome Structure ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Speed Up ◽

Time To Delivery

Life science has entered the so-called 'big data era' where biologists, clinicians and bioinformaticians are overwhelmed with high-throughput sequencing data. While these data offer new insights into genome structure, their ever-growing size raises major challenges for their use in daily clinical care and diagnosis. Therefore, we implemented software to reduce the time to delivery for the alignment and the sorting of high-throughput sequencing data. Our solution is implemented using Message Passing Interface and is intended for high-performance computing architecture. The software scales linearly with respect to the size of the data and ensures total reproducibility with the traditional tools. For example, a 300X whole genome can be aligned and sorted in less than 9 hours with 128 cores. The software offers significant speed-up using multi-core and multi-node parallelization.

De Novo Assembly of High-Throughput Sequencing Data with Cloud Computing and New Operations on String Graphs

2012 IEEE Fifth International Conference on Cloud Computing ◽

10.1109/cloud.2012.123 ◽

2012 ◽

Cited By ~ 5

Author(s):

Yu-Jung Chang ◽

Chien-Chih Chen ◽

Jan-Ming Ho ◽

Chuen-Liang Chen

Keyword(s):

Cloud Computing ◽

High Throughput ◽

De Novo Assembly ◽

High Throughput Sequencing ◽

De Novo ◽

Sequencing Data ◽

String Graphs ◽

High Throughput Sequencing Data

The Identification and Interpretation of cis-Regulatory Noncoding Mutations in Cancer

High-Throughput ◽

10.3390/ht8010001 ◽

2018 ◽

Vol 8 (1) ◽

pp. 1

Author(s):

Minal Patel ◽

Jun Wang

Keyword(s):

High Throughput Sequencing ◽

Regulatory Element ◽

Molecular Techniques ◽

Driver Mutations ◽

Regulatory Sequences ◽

Sequencing Data ◽

Protein Coding ◽

High Throughput Sequencing Data ◽

Cancer Genomes ◽

Novel Biomarkers

In the need to characterise the genomic landscape of cancers and to establish novel biomarkers and therapeutic targets, studies have largely focused on the identification of driver mutations within the protein-coding gene regions, where the most pathogenic alterations are known to occur. However, the noncoding genome is significantly larger than its protein-coding counterpart, and evidence reveals that regulatory sequences also harbour functional mutations that significantly affect the regulation of genes and pathways implicated in cancer. Due to the sheer number of noncoding mutations (NCMs) and the limited knowledge of regulatory element functionality in cancer genomes, differentiating pathogenic mutations from background passenger noise is particularly challenging technically and computationally. Here we review various up-to-date high-throughput sequencing data/studies and in silico methods that can be employed to interrogate the noncoding genome. We aim to provide an overview of available data resources as well as computational and molecular techniques that can help and guide the search for functional NCMs in cancer genomes.
