Long-read assemblies reveal structural diversity in genomes of organelles - an example with Acacia pycnantha

2020 ◽  
Author(s):  
Anna E. Syme ◽  
Todd G.B. McLay ◽  
Frank Udovicic ◽  
David J. Cantrill ◽  
Daniel J. Murphy

Abstract Although organelle genomes are typically represented as single, static, circular molecules, there is evidence that the chloroplast genome exists in two structural haplotypes and that the mitochondrial genome can display multiple circular, linear or branching forms. We sequenced and assembled chloroplast and mitochondrial genomes of the Golden Wattle, Acacia pycnantha, using long reads, iterative baiting to extract organelle-only reads, and several assembly algorithms to explore genomic structure. Using a de novo assembly approach agnostic to previous hypotheses about structure, we found different assemblies revealed contrasting arrangements of genomic segments; a hypothesis supported by mapped reads spanning alternate paths.

2015 ◽  
Author(s):  
Concita Cantarella ◽  
Rachele Tamburino ◽  
Nunzia Scotti ◽  
Teodoro Cardi ◽  
Nunzio D'Agostino

Mitochondrial genomes in plants are larger and more complex than in other eukaryotes due to their recombinogenic nature as widely demonstrated. The mitochondrial DNA (mtDNA) is usually represented as a single circular map, the so-called master molecule. This molecule includes repeated sequences, some of which are able to recombine, generating sub-genomic molecules in various amounts, depending on the balance between their recombination and replication rates. Recent advances in DNA sequencing technology gave a huge boost to plant mitochondrial genome projects. Conventional approaches to mitochondrial genome sequencing involve extraction and enrichment of mitochondrial DNA, cloning, and sequencing. Large repeats and the dynamic mitochondrial genome organization complicate de novo sequence assembly from short reads. The PacBio RS long-read sequencing platform offers the promise of increased read length and unbiased genome coverage and thus the potential to produce genome sequence data of a finished quality (fewer gaps and longer contigs). However, recently published articles revealed that PacBio sequencing is still not sufficient to address mtDNA assembly-related issues. Here we present a preliminary hybrid assembly of a potato mtDNA based on both PacBio and Illumina reads and debate the strategies and obstacles in assembling genomes containing repeated sequences that are recombinationally active and serve as a constant source of rearrangements.


Author(s):  
Shannon J Sibbald ◽  
Maggie Lawton ◽  
John M Archibald

Abstract The Pelagophyceae are marine stramenopile algae that include Aureoumbra lagunensis and Aureococcus anophagefferens, two microbial species notorious for causing harmful algal blooms. Despite their ecological significance, relatively few genomic studies of pelagophytes have been carried out. To improve understanding of the biology and evolution of pelagophyte algae, we sequenced complete mitochondrial genomes for A. lagunensis (CCMP1510), Pelagomonas calceolata (CCMP1756) and five strains of A. anophagefferens (CCMP1707, CCMP1708, CCMP1850, CCMP1984 and CCMP3368) using Nanopore long-read sequencing. All pelagophyte mitochondrial genomes assembled into single, circular mapping contigs between 39,376 base-pairs (bp) (P. calceolata) and 55,968 bp (A. lagunensis) in size. Mitochondrial genomes for the five A. anophagefferens strains varied slightly in length (42,401 bp—42,621 bp) and were 99.4%-100.0% identical. Gene content and order was highly conserved between the A. anophagefferens and P. calceolata genomes, with the only major difference being a unique region in A. anophagefferens containing DNA adenine and cytosine methyltransferase (dam/dcm) genes that appear to be the product of lateral gene transfer from a prokaryotic or viral donor. While the A. lagunensis mitochondrial genome shares seven distinct syntenic blocks with the other pelagophyte genomes, it has a tandem repeat expansion comprising ∼40% of its length, and lacks identifiable rps19 and glycine tRNA genes. Laterally acquired self-splicing introns were also found in the 23S rRNA (rnl) gene of P. calceolata and the coxI gene of the five A. anophagefferens genomes. Overall, these data provide baseline knowledge about the genetic diversity of bloom-forming pelagophytes relative to non-bloom-forming species.


2020 ◽  
Author(s):  
Graham Etherington

De novo assembly of 49 mustelid whole mitochondrial genomes


DNA Research ◽  
2020 ◽  
Vol 27 (3) ◽  
Author(s):  
Rei Kajitani ◽  
Dai Yoshimura ◽  
Yoshitoshi Ogura ◽  
Yasuhiro Gotoh ◽  
Tetsuya Hayashi ◽  
...  

Abstract De novo assembly of short DNA reads remains an essential technology, especially for large-scale projects and high-resolution variant analyses in epidemiology. However, the existing tools often lack sufficient accuracy required to compare closely related strains. To facilitate such studies on bacterial genomes, we developed Platanus_B, a de novo assembler that employs iterations of multiple error-removal algorithms. The benchmarks demonstrated the superior accuracy and high contiguity of Platanus_B, in addition to its ability to enhance the hybrid assembly of both short and nanopore long reads. Although the hybrid strategies for short and long reads were effective in achieving near full-length genomes, we found that short-read-only assemblies generated with Platanus_B were sufficient to obtain ≥90% of exact coding sequences in most cases. In addition, while nanopore long-read-only assemblies lacked fine-scale accuracies, inclusion of short reads was effective in improving the accuracies. Platanus_B can, therefore, be used for comprehensive genomic surveillances of bacterial pathogens and high-resolution phylogenomic analyses of a wide range of bacteria.


2020 ◽  
Author(s):  
Yuya Kiguchi ◽  
Suguru Nishijima ◽  
Naveen Kumar ◽  
Masahira Hattori ◽  
Wataru Suda

Abstract Background: The ecological and biological features of the indigenous phage community (virome) in the human gut microbiome are poorly understood, possibly due to many fragmented contigs and fewer complete genomes based on conventional short-read metagenomics. Long-read sequencing technologies have attracted attention as an alternative approach to reconstruct long and accurate contigs from microbial communities. However, the impact of long-read metagenomics on human gut virome analysis has not been well evaluated. Results: Here we present chimera-less PacBio long-read metagenomics of multiple displacement amplification (MDA)-treated human gut virome DNA. The method included the development of a novel bioinformatics tool, SACRA (Split Amplified Chimeric Read Algorithm), which efficiently detects and splits numerous chimeric reads in PacBio reads from the MDA-treated virome samples. SACRA treatment of PacBio reads from five samples markedly reduced the average chimera ratio from 72 to 1.5%, generating chimera-less PacBio reads with an average read-length of 1.8 kb. De novo assembly of the chimera-less long reads generated contigs with an average N50 length of 11.1 kb, whereas those of MiSeq short reads from the same samples were 0.7 kb, dramatically improving contig extension. Alignment of both contig sets generated 378 high-quality merged contigs (MCs) composed of the minimum scaffolds of 434 MiSeq and 637 PacBio contigs, respectively, and also identified numerous MiSeq short fragmented contigs ≤500 bp additionally aligned to MCs, which possibly originated from a small fraction of MiSeq chimeric reads. The alignment also revealed that fragmentations of the scaffolded MiSeq contigs were caused primarily by genomic complexity of the community, including local repeats, hypervariable regions, and highly conserved sequences in and between the phage genomes. 
We identified 142 complete and near-complete phage genomes including 108 novel genomes, varying from 5 to 185 kb in length, the majority of which were predicted to be Microviridae phages including several variants with homologous but distinct genomes, which were fragmented in MiSeq contigs. Conclusions: Long-read metagenomics coupled with SACRA provides an improved method to reconstruct accurate and extended phage genomes from MDA-treated virome samples of the human gut, and potentially from other environmental virome samples.
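The contig N50 reported in the abstract above is a standard contiguity statistic: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not the authors' code; the function name `n50` and the toy contig lengths are assumptions made purely for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are taken from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half the assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (kb), not data from any study listed here.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # total is 54 kb; 10 + 9 + 8 = 27 reaches half, so N50 = 8
```

Because N50 depends only on the sorted length distribution, a longer N50 indicates that more of the assembly sits in long contigs, which is the sense in which the long-read and short-read contig sets are compared above.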


2019 ◽  
Author(s):  
Shaun D. Jackman ◽  
Lauren Coombe ◽  
René L. Warren ◽  
Heather Kirk ◽  
Eva Trinh ◽  
...  

Abstract Plant mitochondrial genomes vary widely in size. Although many plant mitochondrial genomes have been sequenced and assembled, the vast majority are of angiosperms, and few are of gymnosperms. Most plant mitochondrial genomes are smaller than a megabase, with a few notable exceptions. We have sequenced and assembled the 5.5 Mbp mitochondrial genome of Sitka spruce (Picea sitchensis), the largest complete mitochondrial genome of a gymnosperm. We sequenced the whole genome using Oxford Nanopore MinION, and then identified contigs of mitochondrial origin assembled from these long reads. The assembly graph shows a multipartite genome structure, composed of one smaller 168 kbp circular segment of DNA, and a larger 5.4 Mbp component with a branching structure. The assembly graph gives insight into a putative complex physical genome structure, and its branching points may represent active sites of recombination.


2021 ◽  
Vol 12 ◽  
Author(s):  
Konstantin A. Shestibratov ◽  
Oleg Yu. Baranov ◽  
Eugenia N. Mescherova ◽  
Pavel S. Kiryanov ◽  
Stanislav V. Panteleev ◽  
...  

Curly birch [Betula pendula var. carelica (Merckl.) Hämet-Ahti] is a relatively rare variety of silver birch (B. pendula Roth) that occurs mainly in Northern Europe and the northwestern part of Russia (Karelia). It is famous for the beautiful decorative texture of its wood. Abnormal xylogenesis underlying this trait is heritable, but its genetic mechanism has not yet been fully understood. A high number of potentially informative genetic markers can be identified through sequencing nuclear and organelle genomes. Here, the de novo assembly, complete nucleotide sequence, and annotation of the chloroplast genome (plastome) of curly birch are presented for the first time. The complete plastome length is 160,523 bp. It contains 82 genes encoding structural and enzymatic proteins, 37 transfer RNAs (tRNAs), and eight ribosomal RNAs (rRNAs). The chloroplast DNA (cpDNA) is AT-rich, containing 31.5% of A and 32.5% of T nucleotides. The GC-rich regions represent inverted repeats IR1 and IR2 containing genes of rRNAs (5S, 4.5S, 23S, and 16S) and tRNAs (trnV, trnI, and trnA). A high content of GC was found in rRNA (55.2%) and tRNA (53.2%) genes, but only 37.0% in protein-coding genes. In total, 384 microsatellite or simple sequence repeat (SSR) loci were found, mostly with mononucleotide motifs (92% of all loci) and predominantly A or T motifs (94% of all mononucleotide motifs). Comparative analysis of cpDNA in different plant species revealed high structural and functional conservatism in organization of the angiosperm plastomes, while the level of differences depends on the phylogenetic relationship. The structural and functional organization of the plastome in curly birch was similar to cpDNA in other species of woody plants. Finally, the identified cpDNA sequence variation will allow the development of useful genetic markers.


2017 ◽  
Author(s):  
Alex Di Genova ◽  
Gonzalo A. Ruz ◽  
Marie-France Sagot ◽  
Alejandro Maass

Abstract Long read sequencing technologies are the ultimate solution for genome repeats, allowing near reference level reconstructions of large genomes. However, long read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods which combine short and long read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. In this paper, we propose a new method, called FAST-SG, which uses a new ultra-fast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. FAST-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how FAST-SG outperforms the state-of-the-art short read aligners when building the scaffolding graph, and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using FAST-SG with shallow long read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878).


2018 ◽  
Author(s):  
Haig Djambazian ◽  
Anthony Bayega ◽  
Konstantina T. Tsoumani ◽  
Efthimia Sagri ◽  
Maria-Eleni Gregoriou ◽  
...  

Abstract Long-read sequencing has greatly contributed to the generation of high quality assemblies, albeit at a high cost. It is also not always clear how to combine sequencing platforms. We sequenced the genome of the olive fruit fly (Bactrocera oleae), the most important pest in the olive fruits agribusiness industry, using Illumina short-reads, mate-pairs, 10x Genomics linked-reads, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT). The 10x linked-reads assembly gave the most contiguous assembly with an N50 of 2.16 Mb. Scaffolding the linked-reads assembly using long-reads from ONT gave a more contiguous assembly with scaffold N50 of 4.59 Mb. We also present the most extensive transcriptome datasets of the olive fly derived from different tissues and stages of development. Finally, we used the Chromosome Quotient method to identify Y-chromosome scaffolds and show that the long-reads based assembly generates very highly contiguous Y-chromosome assembly. JR is a member of the MinION Access Program (MAP) and has received free-of-charge flow cells and sequencing kits from Oxford Nanopore Technologies for other projects. JR has had no other financial support from ONT. AB has received reimbursement for travel costs associated with attending Nanopore Community meeting 2018, a meeting organized by Oxford Nanopore Technologies.

