Generating long-read sequences using Oxford Nanopore Technology from Diospyros celebica genomic DNA

Abstract Objectives Development of sequencing technology has opened up vast opportunities for tree genomic research in the tropics. One such technology, Oxford Nanopore Technology (ONT), has attracted researchers to run tests and experiments because of its affordability and accessibility. To the best of our knowledge, there have been no published reports on the use of ONT for genomic analysis of Indonesian tree species. This progress is promising for further improvement in order to acquire more genomic data for research purposes. Therefore, the present study was carried out to determine the effectiveness of ONT in generating long-read DNA sequences using DNA isolated from leaves and wood cores of Macassar ebony (Diospyros celebica Bakh.). Data description Long-read sequence data from leaves and wood cores of Macassar ebony were generated using the MinION device and MinKNOW v3.6.5 (ONT). The obtained data, as the first long-read sequence dataset for Macassar ebony, are of great importance for conserving genetic diversity, understanding molecular mechanisms, and sustainably using plant genetic resources in downstream applications.


Halcyon: an accurate basecaller exploiting an encoder–decoder model with monotonic attention

Bioinformatics ◽

10.1093/bioinformatics/btaa953 ◽

2020 ◽

Author(s):

Hiroki Konishi ◽

Rui Yamaguchi ◽

Kiyoshi Yamaguchi ◽

Yoichi Furukawa ◽

Seiya Imoto

Keyword(s):

Dna Sequences ◽

Third Party ◽

Nanopore Sequencing ◽

Sequencing Technology ◽

Structural Variations ◽

Haplotype Phasing ◽

Oxford Nanopore ◽

Input Signals ◽

Long Read ◽

Oxford Nanopore Technologies

Abstract Motivation In recent years, nanopore sequencing technology has enabled inexpensive long-read sequencing, which promises reads longer than a few thousand bases. Such long-read sequences contribute to the precise detection of structural variations and accurate haplotype phasing. However, deciphering precise DNA sequences from noisy and complicated nanopore raw signals remains a crucial demand for downstream analyses based on higher-quality nanopore sequencing, although various basecallers have been introduced to date. Results To address this need, we developed a novel basecaller, Halcyon, that incorporates neural-network techniques frequently used in the field of machine translation. Our model employs monotonic-attention mechanisms to learn semantic correspondences between nucleotides and signal levels without any pre-segmentation against input signals. We evaluated performance with a human whole-genome sequencing dataset and demonstrated that Halcyon outperformed existing third-party basecallers and achieved competitive performance against the latest Oxford Nanopore Technologies’ basecallers. Availability and implementation The source code (halcyon) can be found at https://github.com/relastle/halcyon.


QAlign: Aligning nanopore reads accurately using current-level modeling

10.1101/862813 ◽

2019 ◽

Author(s):

Dhaivat Joshi ◽

Shunfu Mao ◽

Sreeram Kannan ◽

Suhas Diggavi

Keyword(s):

Reference Genome ◽

Genomic Analysis ◽

Vital Role ◽

High Error Rate ◽

Sequencing Technology ◽

Long Reads ◽

A Genome ◽

Long Read ◽

Nanopore Sequencer ◽

Sequencing Process

Abstract Motivation Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate makes accurate and efficient alignment difficult. Utilizing the noise and error characteristics inherent in the sequencing process properly can play a vital role in constructing a robust aligner. In this paper, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running it through a sequence aligner. Results We show that QAlign is able to improve alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2%, 2.5% and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and 82.6% to 90% in two real datasets. Availability https://github.com/joshidhaivat/QAlign.git
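The conversion step described in this abstract can be illustrated with a minimal sketch. It is not the QAlign implementation: the k-mer length, the current-level table, and the bin thresholds below are all made-up values chosen to keep the example small, and the function names are hypothetical. The sketch only shows the general idea of replacing each k-mer in a read with a coarse, discretized current level before alignment.

```python
from itertools import product

BASES = "ACGT"


def toy_current_model(k=3):
    """Build a HYPOTHETICAL k-mer -> current level table.

    Real pore models are measured empirically; these evenly spaced
    values (60.0, 61.0, ...) are invented for illustration only.
    """
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    return {kmer: 60.0 + i for i, kmer in enumerate(kmers)}


def quantize_read(seq, model, thresholds):
    """Convert a nucleotide read into a list of discrete level symbols.

    Each overlapping k-mer is looked up in the model, and its current
    level is binned by counting how many thresholds it meets or exceeds.
    Two thresholds therefore give three symbols: 0, 1 and 2.
    """
    k = len(next(iter(model)))
    symbols = []
    for i in range(len(seq) - k + 1):
        level = model[seq[i:i + k]]
        symbols.append(sum(level >= t for t in thresholds))
    return symbols


if __name__ == "__main__":
    model = toy_current_model()
    thresholds = [81.0, 102.0]  # assumed bin edges, not from the paper
    print(quantize_read("ACGTACGT", model, thresholds))
```

In a workflow of this kind, both the reads and the reference would be converted with the same table, so that an ordinary sequence aligner can then be run on the quantized symbols.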


BleTIES: Annotation of natural genome editing in ciliates using long read sequencing

10.1101/2021.05.18.444610 ◽

2021 ◽

Author(s):

Brandon K. B. Seah ◽

Estienne C. Swart

Keyword(s):

Dna Sequences ◽

Sequence Data ◽

Low Complexity ◽

Supplementary Information ◽

Neighboring Element ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Element Elimination

Ciliates are single-celled eukaryotes that eliminate specific, interspersed DNA sequences (internally eliminated sequences, IESs) from their genomes during development. These are challenging to annotate and assemble because IES-containing sequences are much less abundant in the cell than those without, and IES sequences themselves often contain repetitive and low-complexity sequences. Long read sequencing technologies from Pacific Biosciences and Oxford Nanopore have the potential to reconstruct longer IESs than has been possible with short reads, and also the ability to detect correlations of neighboring element elimination. Here we present BleTIES, a software toolkit for detecting, assembling, and analyzing IESs using mapped long reads. Availability and implementation: BleTIES is implemented in Python 3. Source code is available at https://github.com/Swart-lab/bleties (MIT license), and also distributed via Bioconda. Supplementary information: Benchmarking of BleTIES with published sequence data.


Effective Potentials and Orbits in Weyl Conformastatic Slender Disk

10.20944/preprints202111.0319.v1 ◽

2021 ◽

Author(s):

Paul Talbert ◽

Steven Henikoff

Keyword(s):

Cell Division ◽

Dna Sequences ◽

Histone H3 ◽

Repeated Dna ◽

Sequencing Technology ◽

Repeated Dna Sequences ◽

Effective Potentials ◽

Histone H3 Variant ◽

Long Read ◽

Chromosomal Loci

Centromeres, the chromosomal loci where spindle fibers attach during cell division to segregate chromosomes, are typically found within satellite arrays in plants and animals. Satellite arrays have been difficult to analyze because they comprise megabases of tandem head-to-tail highly repeated DNA sequences. Much evidence suggests that centromeres are epigenetically defined by the location of nucleosomes containing the centromere-specific histone H3 variant cenH3, independently of the DNA sequences where they are located; however, the reason that cenH3 nucleosomes are generally found on rapidly evolving satellite arrays has remained unclear. Recently, long read sequencing technology has clarified the structures of satellite arrays and sparked rethinking of how they evolve, while new experiments and analyses have helped bring both understanding and further speculation about the role these highly repeated sequences play in centromere identification.


Improvements in the sequencing and assembly of plant genomes

Gigabyte ◽

10.46471/gigabyte.24 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Priyanka Sharma ◽

Othman Al-Dossary ◽

Bader Alsubaie ◽

Ibrahim Al-Mssallem ◽

Onkar Nath ◽

...

Keyword(s):

Sequence Data ◽

Linear Increase ◽

Persea Americana ◽

Plant Genome ◽

Sequencing Technology ◽

Genome Coverage ◽

Plant Genomes ◽

Oxford Nanopore ◽

Long Read ◽

Using Data

Advances in DNA sequencing have made it easier to sequence and assemble plant genomes. Here, we extend an earlier study, and compare recent methods for long read sequencing and assembly. Updated Oxford Nanopore Technology software improved assemblies. Using more accurate sequences produced by repeated sequencing of the same molecule (Pacific Biosciences HiFi) resulted in less fragmented assembly of sequencing reads. Using data for increased genome coverage resulted in longer contigs, but reduced total assembly length and improved genome completeness. The original model species, Macadamia jansenii, was also compared with three other Macadamia species, as well as avocado (Persea americana) and jojoba (Simmondsia chinensis). In these angiosperms, increasing sequence data volumes caused a linear increase in contig size, decreased assembly length and further improved already high completeness. Differences in genome size and sequence complexity influenced the success of assembly. Advances in long read sequencing technology continue to improve plant genome sequencing and assembly. However, results were improved by greater genome coverage, with the amount needed to achieve a particular level of assembly being species dependent.


Chloroplast Genome Draft of Dryobalanops aromatica Generated Using Oxford Nanopore Technology and Its Potential Application for Phylogenetic Study

Forests ◽

10.3390/f12111515 ◽

2021 ◽

Vol 12 (11) ◽

pp. 1515

Author(s):

Dwi Wahyuni ◽

Fifi Gus Dwiyanti ◽

Rahadian Pratama ◽

Muhammad Majiidu ◽

Henti Hendalastuti Rachmat ◽

...

Keyword(s):

Dna Sequencing ◽

Phylogenetic Tree ◽

Chloroplast Genome ◽

Dna Sequences ◽

Genetic Relationships ◽

Phylogenetic Study ◽

Phylogenomic Analysis ◽

Rbcl Gene ◽

Oxford Nanopore ◽

Long Read

Kapur (Dryobalanops aromatica) is an important dipterocarp species currently classified as vulnerable on the IUCN Red List of Threatened Species. Science-based conservation and restoration efforts are needed, which can be supported by new genomic data generated from new technologies, including MinION Oxford Nanopore Technology (ONT). ONT allows affordable long-read DNA sequencing, but this technology is still rarely applied to native Indonesian forest trees. Therefore, this study aimed to generate whole genome datasets through ONT and use part of these data to construct the draft of the chloroplast genome and analyze the universal DNA barcode-based genetic relationships for D. aromatica. The method included DNA isolation, library preparation, sequencing, bioinformatics analysis, and phylogenetic tree construction. Results showed that the DNA sequencing of D. aromatica resulted in 1.55 Gb of long-read DNA sequences from which a partial chloroplast genome (148,856 bp) was successfully constructed. The genetic relationship was analyzed using two selected DNA barcodes (rbcL and matK) and their combination, which showed that species of the genus Dryobalanops had a close relationship as indicated by adjacent branches between species. The phylogenetic tree of matK and the combination of the matK and rbcL genes showed that D. aromatica was closely related to Dryobalanops rappa, whereas the rbcL gene showed group separation between D. aromatica and D. rappa. Therefore, a combination of the matK and rbcL genes is recommended for future use in the phylogenetic or phylogenomic analysis of D. aromatica.


QAlign: aligning nanopore reads accurately using current-level modeling

Bioinformatics ◽

10.1093/bioinformatics/btaa875 ◽

2020 ◽

Author(s):

Dhaivat Joshi ◽

Shunfu Mao ◽

Sreeram Kannan ◽

Suhas Diggavi

Keyword(s):

Reference Genome ◽

Genomic Analysis ◽

Vital Role ◽

Supplementary Information ◽

Sequencing Technology ◽

Long Reads ◽

A Genome ◽

Long Read ◽

Nanopore Sequencer ◽

Sequencing Process

Abstract Motivation Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate makes accurate and efficient alignment difficult. Utilizing the noise and error characteristics inherent in the sequencing process properly can play a vital role in constructing a robust aligner. In this article, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running it through a sequence aligner. Results We show that QAlign is able to improve alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2, 2.5 and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and 82.6% to 90% in two real datasets. Availability and implementation https://github.com/joshidhaivat/QAlign.git. Supplementary information Supplementary data are available at Bioinformatics online.


A Peek into the Plasmidome of Global Sewage

10.1101/2021.03.08.434362 ◽

2021 ◽

Author(s):

Philipp Kirstahler ◽

Frederik Teudt ◽

Saria Otani ◽

Frank M. Aarestrup ◽

Sünje Johanna Pamp

Keyword(s):

Dna Sequences ◽

Selective Advantage ◽

Global Scale ◽

Natural Environments ◽

Valuable Insight ◽

Oxford Nanopore ◽

Long Read ◽

Range Of Functions ◽

Dna Elements ◽

Related Proteins

Abstract Plasmids can provide a selective advantage for microorganisms to survive and adapt to new environmental conditions. Plasmid-encoded traits, such as antimicrobial resistance (AMR) or virulence, impact on the ecology and evolution of bacteria and can significantly influence the burden of infectious diseases. Insight about the identity and functions encoded on plasmids on the global scale is largely lacking. Here we investigate the plasmidome of 24 samples (22 countries, 5 continents) from the global sewage surveillance project. We obtained 105 Gbp Oxford Nanopore and 167 Gbp Illumina DNA sequences from plasmid DNA preparations and assembled 165,302 contigs (159,322 circular). Of these, 58,429 encoded for genes with plasmid-related and 11,222 with virus/phage-related proteins. About 90% of the circular DNA elements did not have any similarity to known plasmids. Those that exhibited similarity had similarity to plasmids whose hosts were previously detected in these sewage samples (e.g. Acinetobacter, Escherichia, Moraxella, Enterobacter, Bacteroides, and Klebsiella). Some AMR classes were detected at a higher abundance in plasmidomes (e.g. macrolide-lincosamide-streptogramin B, macrolide, and quinolone), as compared to the respective complex sewage samples. In addition to AMR genes, a range of functions were encoded on the candidate plasmids, including plasmid replication and maintenance, mobilization, and conjugation. In summary, we describe a laboratory and bioinformatics workflow for the recovery of plasmids and other potential extrachromosomal DNA elements from complex microbiomes. Moreover, the obtained data could provide further valuable insight into the ecology and evolution of microbiomes, knowledge about AMR transmission, and the discovery of novel functions. Importance This is, to the best of our knowledge, the first study to investigate plasmidomes at a global scale using long read sequencing from complex untreated domestic sewage. Previous metagenomic surveys have detected AMR genes in a variety of environments, including sewage. However, it is unknown whether the AMR genes were encoded on the microbial chromosome or were located on extrachromosomal elements, such as plasmids. Using our approach, we recovered a large number of plasmids, of which most appear novel. We identified distinct AMR genes that were preferentially located on plasmids, potentially contributing to their transmissibility. Overall, plasmids are of great importance for the biology of microorganisms in their natural environments (free-living and host-associated), as well as molecular biology, and biotechnology. Plasmidome collections may therefore be valuable resources for the discovery of fundamental biological mechanisms and novel functions useful in a variety of contexts.


BleTIES: Annotation of natural genome editing in ciliates using long read sequencing

Bioinformatics ◽

10.1093/bioinformatics/btab613 ◽

2021 ◽

Author(s):

Brandon K B Seah ◽

Estienne C Swart

Keyword(s):

Dna Sequences ◽

Sequence Data ◽

Low Complexity ◽

Supplementary Information ◽

Software Toolkit ◽

Assembly Strategy ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read

Abstract Summary Ciliates are single-celled eukaryotes that eliminate specific, interspersed DNA sequences (internally eliminated sequences, IESs) from their genomes during development. These are challenging to annotate and assemble because IES-containing sequences are typically much less abundant in the cell than those without, and IES sequences themselves often contain repetitive and low-complexity sequences. Long read sequencing technologies from Pacific Biosciences and Oxford Nanopore have the potential to reconstruct longer IESs than has been possible with short reads, but require a different assembly strategy. Here we present BleTIES, a software toolkit for detecting, assembling, and analyzing IESs using mapped long reads. Availability and implementation BleTIES is implemented in Python 3. Source code is available at https://github.com/Swart-lab/bleties (MIT license), and also distributed via Bioconda. Supplementary information Benchmarking of BleTIES with published sequence data.


Improvements in the Sequencing and Assembly of Plant Genomes

10.1101/2021.01.22.427724 ◽

2021 ◽

Author(s):

Priyanka Sharma ◽

Othman Aldossary ◽

Bader Alsubaie ◽

Ibrahim Al-Mssallem ◽

Onkar Nath ◽

...

Keyword(s):

Sequence Data ◽

Linear Increase ◽

Persea Americana ◽

Sequencing Technology ◽

Model Species ◽

Genome Coverage ◽

Plant Genomes ◽

Sequence Complexity ◽

Oxford Nanopore ◽

Long Read

Abstract Background Advances in DNA sequencing have reduced the difficulty of sequencing and assembling plant genomes. A range of methods for long read sequencing and assembly have been recently compared and we now extend the earlier study and report a comparison with more recent methods. Results Updated Oxford Nanopore Technology software supported improved assemblies. The use of more accurate sequences produced by repeated sequencing of the same molecule (PacBio HiFi) resulted in much less fragmented assembly of sequencing reads. The use of more data to give increased genome coverage resulted in longer contigs (higher N50) but reduced the total length of the assemblies and improved genome completeness (BUSCO). The original model species, Macadamia jansenii, a basal eudicot, was also compared with the 3 other Macadamia species and with avocado (Persea americana), a magnoliid, and jojoba (Simmondsia chinensis), a core eudicot. In these phylogenetically diverse angiosperms, increasing sequence data volumes also caused a highly linear increase in contig size, decreased assembly length and further improved already high completeness. Differences in genome size and sequence complexity apparently influenced the success of assembly from these different species. Conclusions Advances in long read sequencing technology have continued to significantly improve the results of sequencing and assembly of plant genomes. However, results were consistently improved by greater genome coverage (using an increased number of reads) with the amount needed to achieve a particular level of assembly being species dependent.
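For readers unfamiliar with the N50 statistic mentioned in this abstract, the short sketch below shows how it is commonly computed: N50 is the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length. The contig lengths are invented for illustration and are not data from the study; the function name is an assumption.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running sum first reaches at least half
    of the total assembly length. Returns 0 for an empty input.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    # Made-up contig lengths: same total length, different fragmentation.
    fragmented = [5, 5, 5, 5]
    contiguous = [10, 10]
    print(n50(fragmented), n50(contiguous))
```

With the total length held fixed, the less fragmented set gives the higher N50, which is why the statistic is used as a rough measure of assembly contiguity.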
