A MinION-based pipeline for fast and cost-effective DNA barcoding

2018 ◽  
Author(s):  
Amrita Srivathsan ◽  
Bilgenur Baloğlu ◽  
Wendy Wang ◽  
Wei Xin Tan ◽  
Denis Bertrand ◽  
...  

Abstract: DNA barcodes are useful for species discovery and species identification, but obtaining barcodes currently requires a well-equipped molecular laboratory and is time-consuming and/or expensive. We here address these issues by developing a barcoding pipeline for the Oxford Nanopore MinION™ and demonstrate that one flowcell can generate barcodes for ∼500 specimens despite the high base-call error rates of the MinION™. The pipeline overcomes the errors by first summarizing all reads for the same tagged amplicon as a consensus barcode. These barcodes are overall mismatch-free but retain indel errors that are concentrated in homopolymeric regions. We thus complement the barcode caller with an optional error-correction pipeline that uses conserved amino-acid motifs from publicly available barcodes to correct the indel errors. The effectiveness of this pipeline is documented by analysing reads from three MinION™ runs that represent three different stages of MinION™ development. They generated data for (1) 511 specimens of a mixed Diptera sample, (2) 575 specimens of ants, and (3) 50 specimens of Chironomidae. The run based on the latest chemistry yielded MinION barcodes for 490 specimens, which were assessed against reference Sanger barcodes (N=471). Overall, the MinION barcodes have an accuracy of 99.3–100%, and the number of ambiguities ranges from <0.01% to 1.5% depending on which correction pipeline is used. We demonstrate that only 2 hours of sequencing are required to gather all the information needed to obtain reliable barcodes for most specimens (>90%). We estimate that up to 1000 barcodes can be generated in one flowcell and that the cost of a MinION barcode can be <USD 2.
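The consensus step described above can be illustrated with a minimal sketch. The function name and the plain majority-vote scheme below are illustrative assumptions; the published pipeline builds its consensus from aligned reads per tagged amplicon and then applies amino-acid-motif-based indel correction, which is not reproduced here. Assuming the reads for one amplicon have already been aligned to equal length:

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority-rule consensus across pre-aligned reads of equal length.

    Gap characters ('-') participate in the vote; a winning gap is
    dropped from the output, which is how a random indel error at a
    column gets voted out when coverage is sufficient.
    """
    out = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != '-':
            out.append(base)
    return ''.join(out)

# Three reads of the same tagged amplicon; one carries a spurious insertion.
reads = ["ACGT-ACGT", "ACGTTACGT", "ACGT-ACGT"]
print(consensus(reads))  # ACGTACGT
```

The majority vote suppresses mismatches well, but a systematic homopolymer deletion shared by most reads would survive it, which is exactly why the abstract pairs the caller with motif-based indel correction.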


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Kristoffer Sahlin ◽  
Botond Sipos ◽  
Phillip L. James ◽  
Paul Medvedev

Abstract: Oxford Nanopore (ONT) is a leading long-read technology which has been revolutionizing transcriptome analysis through its capacity to sequence the majority of transcripts from end-to-end. This has greatly increased our ability to study the diversity of transcription mechanisms such as transcription initiation, termination, and alternative splicing. However, ONT still suffers from high error rates which have thus far limited its scope to reference-based analyses. When a reference is not available or is not a viable option due to reference-bias, error correction is a crucial step towards the reconstruction of the sequenced transcripts and downstream sequence analysis of transcripts. In this paper, we present a novel computational method to error correct ONT cDNA sequencing data, called isONcorrect. IsONcorrect is able to jointly use all isoforms from a gene during error correction, thereby allowing it to correct reads at low sequencing depths. We are able to obtain a median accuracy of 98.9–99.6%, demonstrating the feasibility of applying cost-effective cDNA full transcript length sequencing for reference-free transcriptome analysis.
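The median accuracy quoted here is an alignment-identity measure. As a rough sketch (the function names are our assumptions, and isONcorrect itself derives accuracy from proper alignments rather than a plain edit distance), per-read accuracy can be approximated as one minus the edit distance to the true transcript over the alignment length:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, O(len(a) * len(b)).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # (mis)match
        prev = cur
    return prev[-1]

def accuracy(read, truth):
    # Crude identity: errors spread over the longer of the two sequences.
    return 1 - edit_distance(read, truth) / max(len(read), len(truth))

print(accuracy("ACGTACGA", "ACGTACGT"))  # 0.875
```

A corrected read set's median of these per-read values is the kind of summary the 98.9–99.6% figure reports.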


2017 ◽  
Author(s):  
Nathan B. Lubock ◽  
Di Zhang ◽  
George M. Church ◽  
Sriram Kosuri

Abstract: Gene synthesis, the process of assembling gene-length fragments from shorter groups of oligonucleotides (oligos), is becoming an increasingly important tool in molecular and synthetic biology. The length, quality, and cost of gene synthesis are limited by errors produced during oligo synthesis and subsequent assembly. Enzymatic error-correction methods are a cost-effective means to ameliorate errors in gene synthesis. Previous analyses of these methods relied on cloning and Sanger sequencing to evaluate their efficiencies, limiting quantitative assessment and throughput. Here we develop a method to quantify errors in synthetic DNA by next-generation sequencing. We analyzed errors in a model gene assembly and systematically compared six different error-correction enzymes across 11 conditions. We find that ErrASE and T7 Endonuclease I are the most effective at decreasing average error rates (up to 5.8-fold relative to the input), whereas MutS is the best at increasing the number of perfect assemblies (up to 25.2-fold). We are also able to quantify differential specificities: for example, ErrASE preferentially corrects C/G → G/C transversions, whereas T7 Endonuclease I preferentially corrects A/T → T/A transversions. More generally, this experimental and computational pipeline is a fast, scalable, and extensible way to analyze errors in gene assemblies, to profile error-correction methods, and to benchmark DNA synthesis methods.


2005 ◽  
Vol 360 (1462) ◽  
pp. 1959-1967 ◽  
Author(s):  
Mehrdad Hajibabaei ◽  
Jeremy R deWaard ◽  
Natalia V Ivanova ◽  
Sujeevan Ratnasingham ◽  
Robert T Dooh ◽  
...  

Large-scale DNA barcoding projects are now moving toward activation, and the creation of a comprehensive barcode library for eukaryotes will ultimately require the acquisition of some 100 million barcodes. To satisfy this need, analytical facilities must adopt protocols that can support the rapid, cost-effective assembly of barcodes. In this paper we discuss the prospects for establishing high-volume DNA barcoding facilities by evaluating key steps in the analytical chain from specimens to barcodes. Alliances with members of the taxonomic community represent the most effective strategy for provisioning the analytical chain with specimens. The optimal protocols for DNA extraction and subsequent PCR amplification of the barcode region depend strongly on the condition of the specimens, but production targets of 100K barcode records per year are now feasible for facilities working with compliant specimens. The analysis of museum collections is currently challenging, but PCR cocktails that combine polymerases with repair enzyme(s) promise future success. Barcode analysis is already a cost-effective option for species identification in some situations, and this will increasingly be the case as reference libraries are assembled and analytical protocols are simplified.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Nathan LaPierre ◽  
Rob Egan ◽  
Wei Wang ◽  
Zhong Wang

Abstract Background Long-read sequencing technologies such as Oxford Nanopore can greatly decrease the complexity of de novo genome assembly and large structural variation identification. Currently, Nanopore reads have high error rates, and the errors often cluster into low-quality segments within the reads. The limited sensitivity of existing read-based error-correction methods can cause large-scale mis-assemblies in the assembled genomes, motivating further innovation in this area. Results Here we developed a Convolutional Neural Network (CNN) based method, called MiniScrub, for the identification and subsequent “scrubbing” (removal) of low-quality Nanopore read segments to minimize their interference in the downstream assembly process. MiniScrub first generates read-to-read overlaps via minimap2, then encodes the overlaps into images, and finally builds CNN models to predict low-quality segments. Applying MiniScrub to real-world control datasets under several different parameters, we show that it robustly improves read quality and improves read error correction in the metagenome setting. Compared to raw reads, de novo genome assembly with scrubbed reads produces many fewer mis-assemblies and large indel errors. Conclusions MiniScrub is able to robustly improve the read quality of Oxford Nanopore reads, especially in the metagenome setting, making it useful for downstream applications such as de novo assembly. We propose MiniScrub as a tool for preprocessing Nanopore reads for downstream analyses. MiniScrub is open-source software and is available at https://bitbucket.org/berkeleylab/jgi-miniscrub.
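The scrubbing step itself is simple once per-base quality predictions exist. The sketch below assumes some upstream model (in MiniScrub's case, the CNN over overlap images) has already produced a per-base quality score in [0, 1]; the function name, threshold, and minimum segment length are illustrative assumptions, not MiniScrub's actual parameters:

```python
def scrub(read, qualities, threshold=0.5, min_len=20):
    """Split a read at predicted low-quality bases, keeping only the
    high-quality segments that are at least min_len bases long."""
    segments, start = [], None
    for i, q in enumerate(qualities):
        if q >= threshold:
            if start is None:
                start = i          # open a high-quality segment
        elif start is not None:
            if i - start >= min_len:
                segments.append(read[start:i])
            start = None           # close segment at a low-quality base
    if start is not None and len(read) - start >= min_len:
        segments.append(read[start:])
    return segments

read = "A" * 10 + "C" * 30 + "G" * 5
quals = [0.2] * 10 + [0.9] * 30 + [0.3] * 5
print(scrub(read, quals))  # keeps only the 30-base high-quality middle
```

Discarding short surviving fragments matters because very short segments add little to assembly while inflating the read count.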


Genome ◽  
2017 ◽  
Vol 60 (5) ◽  
pp. 414-430 ◽  
Author(s):  
Laurence Packer ◽  
Luisa Ruz

We compare the diversity of bees in the Chilean fauna as understood from traditional taxonomy-based catalogues with that currently known from DNA barcodes using the BIN system informed by ongoing morphology-based taxonomic research. While DNA barcode surveys of the Chilean bee fauna remain incomplete, it is clear that new species can readily be distinguished using this method and that morphological differentiation of distinct barcode clusters is sometimes very easy. We assess the situation in two genera in some detail. In Lonchopria Vachal, one “species” is readily separable into two BINs that are easily differentiated based upon male mandibular and genitalic morphology (characters generally used in this group) as well as female hair patterns. Consequently, we describe Lonchopria (Lonchopria) heberti Packer and Ruz, new species. For Liphanthus Reed, a large number of new species have been detected using DNA barcoding, and considerable additional traditional morphological work will be required to describe them. When we add the number of BINs (whether identified to named species or not) to the number of Chilean bee species that we know have not been barcoded (both described and new species under study in our laboratories), we conclude that the bee fauna of Chile is substantially greater than the 436 species currently known. Spanish language abstract available as supplementary data 1.


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Valentine Murigneux ◽  
Subash Kumar Rai ◽  
Agnelo Furtado ◽  
Timothy J C Bruxner ◽  
Wei Tian ◽  
...  

Abstract Background Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available, and a growing number of algorithms have been developed to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology, as well as the most appropriate software for assembly and polishing. It is thus important to benchmark different approaches applied to the same sample. Results Here, we report a comparison of 3 long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements. Conclusions The 3 long-read technologies produced highly contiguous and complete genome assemblies of M. jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput, and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.
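Contiguity comparisons like the one above typically rest on the N50 statistic. A minimal, self-contained sketch of that metric (the function name is ours; the paper's benchmark uses established tools to compute it):

```python
def n50(contig_lengths):
    """Length of the shortest contig in the smallest set of longest
    contigs that together cover at least half the total assembly."""
    total = sum(contig_lengths)
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length

print(n50([100, 200, 300, 400]))  # 300
```

N50 alone rewards long contigs regardless of correctness, which is why the comparison also scores base accuracy and completeness alongside contiguity.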


2017 ◽  
Author(s):  
Fritz J. Sedlazeck ◽  
Philipp Rescheneder ◽  
Moritz Smolka ◽  
Han Fang ◽  
Maria Nattestad ◽  
...  

Abstract: Structural variations (SVs) are the largest source of genetic variation but remain poorly understood because of limited genomics technology. Single-molecule long-read sequencing from Pacific Biosciences and Oxford Nanopore has the potential to dramatically advance the field, although their high error rates challenge existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR, https://github.com/philres/ngmlr) and SV identification (Sniffles, https://github.com/fritzsedlazeck/Sniffles) that enable unprecedented SV sensitivity and precision, including within repeat-rich regions and of complex nested events that can have a significant impact on human disorders. Examining several datasets, including healthy and cancerous human genomes, we discover thousands of novel variants using long reads and categorize systematic errors in short-read approaches. NGMLR and Sniffles are further able to automatically filter false events and operate on low amounts of coverage to address the cost factor that has hindered the application of long reads in clinical and research settings.
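One of the signals an SV caller can read directly from a long-read alignment is a large insertion or deletion recorded inside the CIGAR string. The sketch below illustrates only that intra-alignment signal; Sniffles additionally combines split reads and other evidence, and the function name and 50 bp cutoff here are illustrative assumptions:

```python
def large_indels(cigar, min_size=50):
    """Scan CIGAR operations (op, length) for insertions or deletions
    of at least min_size bp; returns (type, ref_offset, length) calls."""
    calls, ref_pos = [], 0
    for op, length in cigar:
        if op == 'I':                      # insertion: consumes no reference
            if length >= min_size:
                calls.append(('INS', ref_pos, length))
        elif op == 'D':                    # deletion: consumes reference
            if length >= min_size:
                calls.append(('DEL', ref_pos, length))
            ref_pos += length
        elif op in ('M', '=', 'X'):        # aligned bases consume reference
            ref_pos += length
    return calls

cigar = [('M', 1000), ('D', 120), ('M', 500), ('I', 75), ('M', 400)]
print(large_indels(cigar))  # [('DEL', 1000, 120), ('INS', 1620, 75)]
```

The reason the aligner matters so much (hence NGMLR) is that a conventional aligner may scatter a true 120 bp deletion across many small gaps or clip it away entirely, leaving no single CIGAR operation for a caller to find.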



Author(s):  
James F. Mancuso

IBM PC compatible computers are widely used in microscopy for applications ranging from instrument control to image acquisition and analysis. The choice of IBM-PC based systems over competing computer platforms can be based on technical merit alone or on a number of factors relating to economics, availability of peripherals, management dictum, or simple personal preference. The IBM-PC got a strong “head start” by first dominating clerical, document-processing, and financial applications. The use of these computers spilled into the laboratory, where the DOS-based IBM-PC replaced mini-computers. Compared to a minicomputer, the PC provided a more cost-effective platform for applications in numerical analysis, engineering and design, instrument control, image acquisition, and image processing. In addition, the sitewide use of a common PC platform could reduce the cost of training and support services relative to cases where many different computer platforms were used. This could be especially true for microscopists who must use computers in both the laboratory and the office.

