MitoFlex: an efficient, high-performance toolkit for animal mitogenome assembly, annotation, and visualization

Author(s):  
Jun-Yu Li ◽  
Wei-Xuan Li ◽  
An-Tai Wang ◽  
Zhang Yu

Abstract Summary MitoFlex is a linux-based mitochondrial genome analysis toolkit, which provides a complete workflow of raw data filtering, de novo assembly, mitochondrial genome identification and annotation for animal high throughput sequencing data. The overall performance was compared between MitoFlex and its analogue MitoZ, in terms of protein coding gene recovery, memory consumption and processing speed. Availability MitoFlex is available at https://github.com/Prunoideae/MitoFlex under GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Borja Freire ◽  
Susana Ladra ◽  
Jose R Paramá ◽  
Leena Salmela

Abstract Motivation RNA viruses exhibit a high mutation rate and thus they exist in infected cells as a population of closely related strains called viral quasispecies. The viral quasispecies assembly problem asks to characterize the quasispecies present in a sample from high-throughput sequencing data. We study the de novo version of the problem, where reference sequences of the quasispecies are not available. Current methods for assembling viral quasispecies are either based on overlap graphs or on de Bruijn graphs. Overlap graph-based methods tend to be accurate but slow, whereas de Bruijn graph-based methods are fast but less accurate. Results We present viaDBG, which is a fast and accurate de Bruijn graph-based tool for de novo assembly of viral quasispecies. We first iteratively correct sequencing errors in the reads, which allows us to use large k-mers in the de Bruijn graph. To incorporate the paired-end information in the graph, we also adapt the paired de Bruijn graph for viral quasispecies assembly. These features enable the use of long-range information in contig construction without compromising the speed of de Bruijn graph-based approaches. Our experimental results show that viaDBG is both accurate and fast, whereas previous methods are either fast or accurate but not both. In particular, viaDBG has comparable or better accuracy than SAVAGE, while being at least nine times faster. Furthermore, the speed of viaDBG is comparable to PEHaplo but viaDBG is able to retrieve also low abundance quasispecies, which are often missed by PEHaplo. Availability and implementation viaDBG is implemented in C++ and it is publicly available at https://bitbucket.org/bfreirec1/viadbg. All datasets used in this article are publicly available at https://bitbucket.org/bfreirec1/data-viadbg/. Supplementary information Supplementary data are available at Bioinformatics online.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e6131 ◽  
Author(s):  
Shiyu Du ◽  
Gengyun Niu ◽  
Tommi Nyman ◽  
Meicai Wei

We describeArge bellaWei & Du sp. nov., a large and beautiful species of Argidae from south China, and report its mitochondrial genome based on high-throughput sequencing data. We present the gene order, nucleotide composition of protein-coding genes (PCGs), and the secondary structures of RNA genes. The nearly complete mitochondrial genome ofA. bellahas a length of 15,576 bp and a typical set of 37 genes (22 tRNAs, 13 PCGs, and 2 rRNAs). Three tRNAs are rearranged in theA. bellamitochondrial genome as compared to the ancestral type in insects:trnMandtrnQare shuffled, whiletrnWis translocated from thetrnW-trnC-trnYcluster to a location downstream oftrnI. All PCGs are initiated by ATN codons, and terminated with TAA, TA or T as stop codons. All tRNAs have a typical cloverleaf secondary structure, except fortrnS1. H821 ofrrnSand H976 ofrrnLare redundant. A phylogenetic analysis based on mitochondrial genome sequences ofA. bella, 21 other symphytan species, two apocritan representatives, and four outgroup taxa supports the placement of Argidae as sister to the Pergidae within the symphytan superfamily Tenthredinoidea.


2018 ◽  
Vol 35 (13) ◽  
pp. 2326-2328 ◽  
Author(s):  
Tobias Jakobi ◽  
Alexey Uvarovskii ◽  
Christoph Dieterich

Abstract Motivation Circular RNAs (circRNAs) originate through back-splicing events from linear primary transcripts, are resistant to exonucleases, are not polyadenylated and have been shown to be highly specific for cell type and developmental stage. CircRNA detection starts from high-throughput sequencing data and is a multi-stage bioinformatics process yielding sets of potential circRNA candidates that require further analyses. While a number of tools for the prediction process already exist, publicly available analysis tools for further characterization are rare. Our work provides researchers with a harmonized workflow that covers different stages of in silico circRNA analyses, from prediction to first functional insights. Results Here, we present circtools, a modular, Python-based framework for computational circRNA analyses. The software includes modules for circRNA detection, internal sequence reconstruction, quality checking, statistical testing, screening for enrichment of RBP binding sites, differential exon RNase R resistance and circRNA-specific primer design. circtools supports researchers with visualization options and data export into commonly used formats. Availability and implementation circtools is available via https://github.com/dieterich-lab/circtools and http://circ.tools under GPLv3.0. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Anthony Federico ◽  
Stefano Monti

ABSTRACTSummaryGeneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases.Availability and implementationThe most recent version of the package is available at https://github.com/montilab/hypeR.Supplementary informationComprehensive documentation and tutorials, are available at https://montilab.github.io/hypeR-docs.


2020 ◽  
Vol 36 (12) ◽  
pp. 3632-3636 ◽  
Author(s):  
Weibo Zheng ◽  
Jing Chen ◽  
Thomas G Doak ◽  
Weibo Song ◽  
Ying Yan

Abstract Motivation Programmed DNA elimination (PDE) plays a crucial role in the transitions between germline and somatic genomes in diverse organisms ranging from unicellular ciliates to multicellular nematodes. However, software specific for the detection of DNA splicing events is scarce. In this paper, we describe Accurate Deletion Finder (ADFinder), an efficient detector of PDEs using high-throughput sequencing data. ADFinder can predict PDEs with relatively low sequencing coverage, detect multiple alternative splicing forms in the same genomic location and calculate the frequency for each splicing event. This software will facilitate research of PDEs and all down-stream analyses. Results By analyzing genome-wide DNA splicing events in two micronuclear genomes of Oxytricha trifallax and Tetrahymena thermophila, we prove that ADFinder is effective in predicting large scale PDEs. Availability and implementation The source codes and manual of ADFinder are available in our GitHub website: https://github.com/weibozheng/ADFinder. Supplementary information Supplementary data are available at Bioinformatics online.


2014 ◽  
Vol 13s1 ◽  
pp. CIN.S13890 ◽  
Author(s):  
Changjin Hong ◽  
Solaiappan Manimaran ◽  
William Evan Johnson

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 240
Author(s):  
Frédéric Jarlier ◽  
Nicolas Joly ◽  
Nicolas Fedy ◽  
Thomas Magalhaes ◽  
Leonor Sirotti ◽  
...  

Life science has entered the so-called 'big data era' where biologists, clinicians and bioinformaticians are overwhelmed with high-throughput sequencing data. While they offer new insights to decipher the genome structure they also raise major challenges to use them for daily clinical practice care and diagnosis purposes as they are bigger and bigger. Therefore, we implemented a software to reduce the time to delivery for the alignment and the sorting of high-throughput sequencing data.  Our solution is implemented using Message Passing Interface and is intended for high-performance computing architecture. The software scales linearly with respect to the size of the data and ensures a total reproducibility with the traditional tools. For example, a 300X whole genome can be aligned and sorted within less than 9 hours with 128 cores. The software offers significant speed-up using multi-cores and multi-nodes parallelization.


2018 ◽  
Vol 8 (1) ◽  
pp. 1
Author(s):  
Minal Patel ◽  
Jun Wang

In the need to characterise the genomic landscape of cancers and to establish novel biomarkers and therapeutic targets, studies have largely focused on the identification of driver mutations within the protein-coding gene regions, where the most pathogenic alterations are known to occur. However, the noncoding genome is significantly larger than its protein-coding counterpart, and evidence reveals that regulatory sequences also harbour functional mutations that significantly affect the regulation of genes and pathways implicated in cancer. Due to the sheer number of noncoding mutations (NCMs) and the limited knowledge of regulatory element functionality in cancer genomes, differentiating pathogenic mutations from background passenger noise is particularly challenging technically and computationally. Here we review various up-to-date high-throughput sequencing data/studies and in silico methods that can be employed to interrogate the noncoding genome. We aim to provide an overview of available data resources as well as computational and molecular techniques that can help and guide the search for functional NCMs in cancer genomes.


Sign in / Sign up

Export Citation Format

Share Document