mitoMaker: A Pipeline for Automatic Assembly and Annotation of Animal Mitochondria Using Raw NGS Data

Author(s):  
Alex Schomaker-Bastos ◽  
Francisco Prosdocimi

Next-generation sequencing is now a mature technology, allowing partial animal genomes to be produced for many clades. Although many software tools exist for genome assembly and annotation, a simple pipeline that takes raw sequencing reads in FASTQ format as input and returns a completely assembled and annotated mitochondrial genome is still missing. mitoMaker 1.0 is a pipeline developed in Python that (i) performs recursive de novo assembly of mitochondrial genomes using a set of increasing k-mers; (ii) searches for the result that best matches a target mitogenome; and (iii) applies iterative reference-based strategies to optimize the assembly. After (iv) checking for circularization and (v) positioning tRNA-Phe at the beginning, (vi) the geneChecker.py module performs a complete annotation of the mitochondrial genome and provides a GenBank-formatted file as output.
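Steps (iv) and (v) can be illustrated with a minimal Python sketch. This is not mitoMaker's actual code: the function names, the `min_overlap` threshold, and the exact-match overlap test are all assumptions made for illustration. The sketch treats a linear contig as circular when its end repeats its beginning, trims the redundant copy, and then rotates the sequence so that it starts at a chosen coordinate (for example, the start of tRNA-Phe, which would have to be located by a separate annotation step).

```python
def find_circular_overlap(seq, min_overlap=10):
    """Return the length of the longest suffix of seq that exactly equals
    its prefix (at least min_overlap long), or 0 if there is none."""
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_circular_overlap(seq, min_overlap=10):
    """Remove the duplicated end of a contig that overlaps its own start."""
    k = find_circular_overlap(seq, min_overlap)
    # Guard against k == 0, because seq[:-0] would be an empty string.
    return seq[:-k] if k > 0 else seq


def rotate_to_start(seq, start):
    """Rotate a circular sequence so that position `start` becomes index 0."""
    start %= len(seq)
    return seq[start:] + seq[:start]


if __name__ == "__main__":
    # Toy 30 bp "genome"; the contig carries 12 extra bases that wrap around.
    genome = "GTTTAACGCATGCCGATTACGGACTTCAGA"
    contig = genome + genome[:12]
    circular = trim_circular_overlap(contig)
    print(len(contig), len(circular))
    # Hypothetical coordinate of the tRNA-Phe start in the trimmed contig.
    print(rotate_to_start(circular, 7))
```

Real assemblies contain sequencing errors, so a practical check would likely tolerate mismatches in the overlap (for instance via a local alignment) rather than require an exact match as this sketch does.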

PLoS ONE ◽  
2013 ◽  
Vol 8 (2) ◽  
pp. e56301 ◽  
Author(s):  
Chih-Ming Hung ◽  
Rong-Chien Lin ◽  
Jui-Hua Chu ◽  
Chia-Fen Yeh ◽  
Chiou-Ju Yao ◽  
...  

2020 ◽  
Author(s):  
Graham Etherington

De novo assembly of 49 mustelid whole mitochondrial genomes


2012 ◽  
Vol 40 (22) ◽  
pp. e171-e171 ◽  
Author(s):  
Daniel C. Jones ◽  
Walter L. Ruzzo ◽  
Xinxia Peng ◽  
Michael G. Katze

2015 ◽  
Vol 43 (7) ◽  
pp. e46-e46 ◽  
Author(s):  
Xutao Deng ◽  
Samia N. Naccache ◽  
Terry Ng ◽  
Scot Federman ◽  
Linlin Li ◽  
...  

Abstract: Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarity to annotated genes to confidently predict gene function or homology. Recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also propose new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy, tested on in silico spike-in, clinical and environmental NGS datasets, produces significantly better contigs than current approaches.
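For readers unfamiliar with de Bruijn graph assembly, the idea of merging short overlapping reads into a longer contig can be sketched in a few lines of Python. This is a toy illustration only, not the ensemble strategy described in the abstract; the function names and the greedy walk along unambiguous edges are assumptions chosen for brevity. Real assemblers also handle sequencing errors, reverse complements, and branching paths.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each k-mer found
    in a read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one
    successor, stopping at a branch, a dead end, or a revisited node."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    reads = ["ACGTT", "GTTCA", "TTCAG"]
    graph = build_de_bruijn(reads, k=4)
    print(walk_unambiguous(graph, "ACG"))
```

The three 5 bp reads overlap one another, so the walk yields a single contig longer than any individual read, which is the property that makes divergent homologues easier to recognize.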


2011 ◽  
Vol 27 (15) ◽  
pp. 2031-2037 ◽  
Author(s):  
Yong Lin ◽  
Jian Li ◽  
Hui Shen ◽  
Lei Zhang ◽  
Christopher J. Papasian ◽  
...  

Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 3547-3547
Author(s):  
David H Spencer ◽  
Haley Abel ◽  
Philippe Szankasi ◽  
Todd W. Kelley ◽  
Shashikant Kulkarni ◽  
...  

Abstract 3547

Introduction: Recurrent somatic mutations are valuable prognostic markers in cytogenetically normal Acute Myeloid Leukemia (AML). The most common of these mutations is a 9 to ∼150 bp internal tandem duplication (ITD) in the fms-related tyrosine kinase 3 (FLT3) gene, which is typically identified via PCR amplification and capillary electrophoresis. Since testing for individual mutations in this manner will become laborious and expensive as the number of clinically relevant mutations increases, we and others have proposed using targeted next-generation sequencing (NGS) for comprehensive detection of somatic mutations in multiple genes simultaneously. Successful application of this approach will require automated analysis methods capable of sensitive detection of a variety of mutation types, including single-base substitutions and insertions/deletions, with a low false-positive rate. However, the accuracy of current methods for identifying medium-sized insertions such as the FLT3 ITDs has not been established. Therefore, we sought to determine the ability of several common analysis tools to identify FLT3 ITDs from Illumina NGS sequence data.

Methods: We performed targeted sequencing of 10 samples with known FLT3 ITDs ranging between 17 and 93 base-pairs (bp) as part of a larger test panel of 28 genes commonly mutated in AML and other malignancies. Nine of the FLT3 ITD-positive samples were from patients with newly diagnosed AML and were confirmed by PCR and capillary electrophoresis. A cancer cell line known to be heterozygous for a 30 bp FLT3 ITD, MV4-11, was also included. Indexed Illumina sequencing libraries were generated using automated library preparation and enriched for target regions using solution-phase hybridization-capture with biotinylated cRNA probes targeting exons +/− 200 bp plus 1 kb flanking the FLT3 gene and the 27 other genes in the panel. Enriched libraries were sequenced in multiplex on an Illumina HiSeq instrument using 2 × 101 bp reads. Demultiplexed reads were mapped to the hg19 reference sequence with novoalign, and indels were called in a 1 kilobase-pair region surrounding the FLT3 ITD with samtools, GATK, maq, CLC Genomics Workbench, PINDEL, and DINDEL using default parameters, in addition to de novo assembly of reads with partial similarity to the region using phrap. Insertion calls were then compared to results from PCR and capillary electrophoresis.

Results: Multiplex sequencing resulted in 585 to 1,000-fold raw coverage of the FLT3 gene for the 10 study samples (Table 1). No FLT3 ITD insertions were detected in any sample using the common NGS analysis tools samtools, GATK, maq, DINDEL, and CLC Genomics Workbench. However, PINDEL identified insertions between 17 and 72 bp in 9 of 10 FLT3 ITD-positive samples. PINDEL failed to detect a 93 bp ITD insertion (the largest insertion in this set) in one patient sample, as well as an 84 bp insertion in a patient with two insertions (81 and 54 bp) detected by standard methods. De novo assembly of the FLT3 ITD region also resulted in detection of insertions in 9 of the 10 cases. No insertions were called in an additional set of 15 samples without known FLT3 ITDs.

Conclusions: We evaluated the ability of several NGS analysis tools to detect previously known FLT3 ITDs in multi-gene targeted NGS data. Most of the general-purpose analysis tools we tested were unable to detect FLT3 ITD insertions. However, two approaches detected known FLT3 ITD insertions in 90% of the samples tested in this study, including the program PINDEL and de novo assembly of the FLT3 ITD region using phrap. These results demonstrate that medium-sized FLT3 ITD insertions can be detected in clinical samples by high coverage NGS sequencing with the appropriate analysis pipeline. However, further methods for reliable detection of larger (>70 bp) insertions must be developed before clinical NGS-based methods can be applied to the detection of the full spectrum of somatic mutations present in leukemias and other malignancies.

Disclosures: No relevant conflicts of interest to declare.

