Mitochondrial genome assembly v1

Next-generation sequencing is now a mature technology, allowing partial animal genomes to be produced for many clades. Although many software tools exist for genome assembly and annotation, a simple pipeline that takes raw sequencing reads in FASTQ format and returns a completely assembled and annotated mitochondrial genome is still missing. mitoMaker 1.0 is a pipeline developed in Python that (i) performs recursive de novo assembly of mitochondrial genomes using a set of increasing k-mer sizes; (ii) searches for the result that best matches a target mitogenome; and (iii) applies iterative reference-based strategies to optimize the assembly. After (iv) checking for circularization and (v) positioning tRNA-Phe at the beginning, (vi) the geneChecker.py module performs a complete annotation of the mitochondrial genome and provides a GenBank-formatted file as output.
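
Steps (iv) and (v) can be illustrated with a minimal sketch. It is not mitoMaker's code: the function names, the exact-match overlap test, and the `min_overlap` default are all assumptions made for illustration. The idea is that a contig covering a circular molecule tends to repeat its first bases at its end, so the redundant copy can be trimmed and the sequence rotated to start at a chosen coordinate (for example, the start of tRNA-Phe, which is taken here as a known position).

```python
def find_terminal_overlap(seq, min_overlap=10):
    """Length of the longest suffix of seq identical to a prefix of seq.

    Only overlaps up to half the contig length are considered. Exact
    matching is a simplification; real reads contain errors.
    """
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def circularize(seq, min_overlap=10):
    """Trim the duplicated end if the contig overlaps itself.

    Returns (sequence, is_circular).
    """
    k = find_terminal_overlap(seq, min_overlap)
    if k:
        return seq[:-k], True
    return seq, False


def rotate_to(seq, start):
    """Rotate a circular sequence so that index `start` becomes position 0."""
    return seq[start:] + seq[:start]


# Toy contig: a 24 bp "genome" whose first 12 bases are repeated at the end.
contig = "ACGTACGGTTCA" + "GGGCCCAAATTT" + "ACGTACGGTTCA"
genome, is_circular = circularize(contig)
# Hypothetical gene start at index 12 of the trimmed sequence.
rotated = rotate_to(genome, 12)
```

Trimming before rotating matters: rotating the untrimmed contig would leave the duplicated bases inside the sequence.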

A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals chromosomal rearrangements in rainbow trout

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab052 ◽

2021 ◽

Author(s):

Guangtu Gao ◽

Susana Magadan ◽

Geoffrey C Waldbieser ◽

Ramey C Youngblood ◽

Paul A Wheeler ◽

...

Keyword(s):

Rainbow Trout ◽

Chromosome Number ◽

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Sequence Data ◽

Structural Variations ◽

High Coverage ◽

Haploid Chromosome Number ◽

Long Reads

Abstract Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de novo assembly from high-coverage PacBio long-read sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with an N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is demonstrated through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.
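
The N50 quoted above is a standard contiguity metric: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative and are not taken from the Arlee assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest, and the length at
    which the running total first reaches half of the grand total is
    returned. An empty input yields 0.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy example: the total is 54, and 10 + 9 + 8 = 27 reaches half of it.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```

Comparing `running * 2` with `total` keeps the arithmetic in integers and avoids rounding issues when the total is odd.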

Accurate long-read de novo assembly evaluation with Inspector

Genome Biology ◽

10.1186/s13059-021-02527-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yu Chen ◽

Yixin Zhang ◽

Amy Y. Wang ◽

Min Gao ◽

Zechen Chong

Keyword(s):

Genome Assembly ◽

De Novo Assembly ◽

In Silico ◽

Large Scale ◽

De Novo ◽

Small Scale ◽

De Novo Genome Assembly ◽

Consensus Sequences ◽

Assembly Evaluation ◽

Long Read

Abstract Long-read de novo genome assembly continues to advance rapidly. However, there is a lack of effective tools to accurately evaluate the assembly results, especially for structural errors. We present Inspector, a reference-free long-read de novo assembly evaluator which faithfully reports types of errors and their precise locations. Notably, Inspector can correct the assembly errors based on consensus sequences derived from raw reads covering erroneous regions. Based on in silico and long-read assembly results from multiple long-read data and assemblers, we demonstrate that in addition to providing generic metrics, Inspector can accurately identify both large-scale and small-scale assembly errors.

Optimizing de novo genome assembly from PCR-amplified metagenomes

10.7287/peerj.preprints.27453 ◽

2018 ◽

Author(s):

Simon Roux ◽

Gareth Trubl ◽

Danielle Goudeau ◽

Nandita Nath ◽

Estelle Couradeau ◽

...

Keyword(s):

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Pcr Amplification ◽

Error Rates ◽

De Novo Genome Assembly ◽

Low Input ◽

Assembly Algorithm ◽

Coverage Bias ◽

Assembly Pipeline

Background. Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes. Methods. Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes. Results. Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10 kb by 10 to 100-fold for low input metagenomes. Conclusions. PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low input metagenomes.
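
The read deduplication step can be pictured with a small sketch. The abstract does not say which tool or matching criterion was used, so the exact-sequence comparison of read pairs below is an assumption, and `deduplicate_pairs` is a hypothetical helper rather than part of any published pipeline.

```python
def deduplicate_pairs(pairs):
    """Keep the first occurrence of each (read1, read2) sequence pair.

    `pairs` is an iterable of (read1_sequence, read2_sequence) tuples.
    Pairs are treated as duplicates only when both mates are identical,
    which is a deliberately strict, simplified criterion.
    """
    seen = set()
    unique = []
    for read1, read2 in pairs:
        key = (read1, read2)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Toy input: the first pair appears three times, as over-amplified
# short inserts might in a PCR-amplified library.
toy_pairs = [
    ("ACGTAC", "TTGACA"),
    ("ACGTAC", "TTGACA"),
    ("GGCATT", "CCATGA"),
    ("ACGTAC", "TTGACA"),
    ("ACGTAC", "CCATGA"),
]
unique_pairs = deduplicate_pairs(toy_pairs)
```

Collapsing identical pairs flattens the artificial coverage peaks created by preferential amplification, which is the unevenness the abstract describes as confounding standard assemblers.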

Index-Free De Novo Assembly and Deconvolution of Mixed Mitochondrial Genomes

Genome Biology and Evolution ◽

10.1093/gbe/evq029 ◽

2010 ◽

Vol 2 (0) ◽

pp. 410-424 ◽

Cited By ~ 18

Author(s):

B. J. McComish ◽

S. F. K. Hills ◽

P. J. Biggs ◽

D. Penny

Keyword(s):

De Novo Assembly ◽

De Novo ◽

Mitochondrial Genomes

trio-sga: facilitating de novo assembly of highly heterozygous genomes with parent-child trios

10.1101/051516 ◽

2016 ◽

Cited By ~ 9

Author(s):

Milan Malinsky ◽

Jared T. Simpson ◽

Richard Durbin

Keyword(s):

Dna Sequence ◽

Genome Assembly ◽

De Novo Assembly ◽

De Novo ◽

Genomic Data ◽

The Other ◽

Phase Information ◽

The Third ◽

Cross Platform ◽

Haplotype Phase

Abstract Motivation. Most DNA sequence in diploid organisms is found in two copies, one contributed by the mother and the other by the father. The high density of differences between the maternally and paternally contributed sequences (heterozygous sites) in some organisms makes de novo genome assembly very challenging, even for algorithms specifically designed to deal with these cases. Therefore, various approaches, most commonly inbreeding in the laboratory, are used to reduce heterozygosity in genomic data prior to assembly. However, many species are not amenable to these techniques. Results. We introduce trio-sga, a set of three algorithms designed to take advantage of mother-father-offspring trio sequencing to facilitate better quality genome assembly in organisms with moderate to high levels of heterozygosity. Two of the algorithms use haplotype phase information present in the trio data to eliminate the majority of heterozygous sites before the assembly commences. The third algorithm is designed to reduce sequencing costs by enabling the use of parents’ reads in the assembly of the genome of the offspring. We test these algorithms on a ‘simulated trio’ from four haploid datasets, and further demonstrate their performance by assembling three highly heterozygous Heliconius butterfly genomes. While the implementation of trio-sga is tuned towards Illumina-generated data, we note that the trio approach to reducing heterozygosity is likely to have cross-platform utility for de novo assembly.

Long-read assemblies reveal structural diversity in genomes of organelles - an example with Acacia pycnantha

10.1101/2020.12.22.423164 ◽

2020 ◽

Author(s):

Anna E. Syme ◽

Todd G.B. McLay ◽

Frank Udovicic ◽

David J. Cantrill ◽

Daniel J. Murphy

Keyword(s):

Mitochondrial Genome ◽

Chloroplast Genome ◽

De Novo ◽

Genomic Structure ◽

Structural Diversity ◽

Mitochondrial Genomes ◽

Long Reads ◽

Organelle Genomes ◽

Long Read ◽

Assembly Algorithms

Abstract Although organelle genomes are typically represented as single, static, circular molecules, there is evidence that the chloroplast genome exists in two structural haplotypes and that the mitochondrial genome can display multiple circular, linear or branching forms. We sequenced and assembled chloroplast and mitochondrial genomes of the Golden Wattle, Acacia pycnantha, using long reads, iterative baiting to extract organelle-only reads, and several assembly algorithms to explore genomic structure. Using a de novo assembly approach agnostic to previous hypotheses about structure, we found different assemblies revealed contrasting arrangements of genomic segments, a hypothesis supported by mapped reads spanning alternate paths.

De novo assembly of the carrot mitochondrial genome using next generation sequencing of whole genomic DNA provides first evidence of DNA transfer into an angiosperm plastid genome

BMC Plant Biology ◽

10.1186/1471-2229-12-61 ◽

2012 ◽

Vol 12 (1) ◽

pp. 61 ◽

Cited By ~ 77

Author(s):

Massimo Iorizzo ◽

Douglas Senalik ◽

Marek Szklarczyk ◽

Dariusz Grzebelus ◽

David Spooner ◽

...

Keyword(s):

Next Generation Sequencing ◽

Mitochondrial Genome ◽

De Novo Assembly ◽

Genomic Dna ◽

Plastid Genome ◽

De Novo ◽

Dna Transfer ◽

Next Generation ◽

Generation Sequencing

Assembly by Reduced Complexity (ARC): a hybrid approach for targeted assembly of homologous sequences.

10.1101/014662 ◽

2015 ◽

Cited By ~ 17

Author(s):

Samual S Hunter ◽

Robert T Lyon ◽

Brice A.J. Sarver ◽

Kayla Hardwick ◽

Larry J Forney ◽

...

Keyword(s):

De Novo Assembly ◽

High Throughput Sequencing ◽

De Novo ◽

Hybrid Approach ◽

Reference Sequence ◽

Model Organisms ◽

Exome Capture ◽

Mitochondrial Genomes ◽

Homologous Sequences ◽

Reduced Complexity

Analysis of high-throughput sequencing (HTS) data is a difficult problem, especially in the context of non-model organisms where comparison of homologous sequences may be hindered by the lack of a close reference genome. Current mapping-based methods rely on the availability of a highly similar reference sequence, whereas de novo assemblies produce anonymous (unannotated) contigs that are not easily compared across samples. Here, we present Assembly by Reduced Complexity (ARC), a hybrid mapping and assembly approach for targeted assembly of homologous sequences. ARC is an open-source project (http://ibest.github.io/ARC/) implemented in the Python language and consists of the following stages: 1) align sequence reads to reference targets, 2) use alignment results to distribute reads into target-specific bins, 3) perform assemblies for each bin (target) to produce contigs, and 4) replace previous reference targets with assembled contigs and iterate. We show that ARC is able to assemble high-quality, unbiased mitochondrial genomes seeded from 11 progressively divergent references, and is able to assemble full mitochondrial genomes starting from short, poor-quality ancient DNA reads. We also show that ARC compares favorably to de novo assembly of a large exome capture dataset for CPU and memory requirements, assembling 7,627 individual targets across 55 samples and completing over 1.3 million assemblies in less than 78 hours while using under 32 Gb of system memory. ARC breaks the assembly problem down into many smaller problems, addressing the anonymous-contig problem and poor scaling inherent in some de novo assembly methods, as well as the reference bias inherent in traditional read mapping.
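
Stages 1 and 2 above (matching reads to targets and distributing them into target-specific bins) can be sketched in a few lines. This is not ARC's implementation: the sketch swaps a real read aligner for exact k-mer sharing, and the names `kmers` and `bin_reads`, the value of `k`, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reads(reads, targets, k=8):
    """Assign each read to the target sharing the most k-mers with it.

    `targets` maps target names to reference sequences. Reads sharing
    no k-mer with any target are left out. Exact k-mer matching stands
    in for the alignment step and is far less tolerant of divergence.
    """
    # Index: k-mer -> names of the targets that contain it.
    index = defaultdict(set)
    for name, seq in targets.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)

    bins = defaultdict(list)
    for read in reads:
        hits = defaultdict(int)
        for kmer in kmers(read, k):
            for name in index.get(kmer, ()):
                hits[name] += 1
        if hits:
            best = max(hits, key=hits.get)
            bins[best].append(read)
    return dict(bins)


# Hypothetical targets and reads; the last read matches neither target.
toy_targets = {
    "targetA": "ACGTACGTTAGCCGATAGGCT",
    "targetB": "TTGGCCAATTGGCCAAGGTTC",
}
toy_reads = ["CGTTAGCCGATA", "CCAATTGGCCAA", "GGGGGGGGGGGG"]
toy_bins = bin_reads(toy_reads, toy_targets)
```

In the iterative scheme the abstract describes, each bin would then be assembled on its own and the resulting contigs would replace the targets for the next round, so that reads from divergent regions can be recruited progressively.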

Comparison of the mitochondrial genomes of the Old and New World strains of the legume pod borer, Maruca vitrata (Lepidoptera: Crambidae)

International Journal of Tropical Insect Science ◽

10.1017/s1742758417000157 ◽

2017 ◽

Vol 37 (03) ◽

pp. 125-136

Author(s):

Tolulope A. Agunbiade ◽

Brad S. Coates ◽

Weilin Sun ◽

Mu-Rou Tsai ◽

Maria Carmen Valero ◽

...

Keyword(s):

Mitochondrial Genome ◽

New World ◽

De Novo ◽

Sequence Data ◽

Complete Mitochondrial Genome ◽

Maruca Vitrata ◽

Mitochondrial Genomes ◽

Old World ◽

Pod Borer ◽

Legume Pod Borer

Abstract Maruca vitrata (Fabricius, 1787) is a cryptic pantropical species of Lepidoptera that comprises two unique strains, one inhabiting the American continents (New World strain) and the other inhabiting regions spanning from Africa through to Southeast Asia and Northern Australia (Old World strain). In this study, we de novo assembled the complete mitochondrial genome sequence of the New World legume pod borer, M. vitrata, from shotgun sequence data generated on an Illumina HiSeq 2000. Phylogenomic comparisons were made with other previously published mitochondrial genome sequences from crambid moths, including the Old World strain of M. vitrata. The 15,385 bp M. vitrata (New World) sequence has an 80.7% A+T content and encodes the 13 protein-coding, 2 ribosomal RNA and 22 transfer RNA genes in the typical orientation and arrangement of lepidopteran mitochondrial DNAs. Mitochondrial genome-wide comparison between the New and Old World strains of M. vitrata detected 476 polymorphic sites (4.23% nucleotide divergence) with an excess of synonymous substitution as a result of purifying selection. Furthermore, this level of sequence variation suggests that these strains diverged ~1.83 to 2.12 million years ago, assuming a linear rate of short-term substitution. The de novo assemblies of mitochondrial genomes from next-generation sequencing (NGS) reads provide readily available data for similar comparative studies.
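
The dating statement rests on a linear molecular clock, in which divergence time is the observed pairwise divergence divided by the rate at which pairwise divergence accumulates. The abstract does not state the calibration used, so the rates of 2.0% and 2.3% per million years in the sketch below are assumptions chosen only to show the arithmetic; the function name is likewise hypothetical.

```python
def divergence_time_myr(pairwise_divergence_pct, rate_pct_per_myr):
    """Time since divergence, in million years, under a linear clock.

    `rate_pct_per_myr` is the pairwise sequence divergence accumulated
    per million years. It is an assumed calibration, not a value taken
    from the abstract.
    """
    if rate_pct_per_myr <= 0:
        raise ValueError("rate must be positive")
    return pairwise_divergence_pct / rate_pct_per_myr


reported_divergence_pct = 4.23  # nucleotide divergence given in the abstract
assumed_rates = (2.0, 2.3)      # hypothetical calibrations, % per Myr

for rate in assumed_rates:
    age = divergence_time_myr(reported_divergence_pct, rate)
    print(f"{rate}% per Myr gives about {age:.2f} Myr")
```

Under these assumed rates the estimates are of the same order as the range reported in the abstract, but the authors' actual calibration may differ.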
