A comprehensive toolkit to enable MinION sequencing in any laboratory

2018
Author(s): Miriam Schalamun, David Kainer, Eleanor Beavan, Ramawatar Nagar, David Eccles, ...

Abstract
Long-read sequencing technologies are transforming our ability to assemble highly complex genomes. Realising their full potential relies crucially on extracting high-quality, high molecular weight (HMW) DNA from the organisms of interest. This is especially the case for the portable MinION sequencer, whose low entry cost and minimal spatial footprint put genome sequencing projects within reach of any laboratory. One challenge of the MinION is that each group has to independently establish effective protocols for using the instrument, which can be time-consuming and costly. Here we present a workflow and protocols that enabled us to establish MinION sequencing in our own laboratories, using the optimisation of DNA extraction from a challenging plant tissue as a case study. Following this workflow, we reliably and repeatedly obtained > 8.5 Gb of long-read sequencing data with a mean read length of 13 kb and an N50 of 26 kb. Our protocols are open-source and can be performed in any laboratory without special equipment. We also illustrate some more elaborate workflows that can further increase read lengths if desired. We envision that our workflow for establishing MinION sequencing, including the illustration of potential pitfalls, will be useful to others who plan to establish long-read sequencing in their own laboratories.
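Mean read length and N50 summarise a read set differently. The mean is pulled down by many short reads, whereas N50 is the length L such that reads of length L or longer together contain at least half of all sequenced bases. This is why an N50 (26 kb) can sit well above the mean (13 kb). The minimal Python sketch below computes both from a list of read lengths; the function name and toy read lengths are illustrative assumptions, not part of the authors' published protocols.

```python
from typing import List, Tuple


def read_length_stats(lengths: List[int]) -> Tuple[float, int]:
    """Return (mean read length, N50) for read lengths given in bases.

    N50 is the length L such that reads of length >= L together contain
    at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("no reads supplied")
    total = sum(lengths)
    mean = total / len(lengths)
    running = 0
    # Walk from the longest to the shortest read, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return mean, length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    reads = [2_000, 5_000, 8_000, 13_000, 26_000, 30_000]  # hypothetical toy data
    mean, n50 = read_length_stats(reads)
    print(f"mean={mean:.0f} bp, N50={n50} bp")  # mean=14000 bp, N50=26000 bp
```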

Author(s): Clément Schneider, Christian Woehle, Carola Greve, Cyrille A. D’Haese, Magnus Wolf, ...

Abstract
Genome sequencing of all known eukaryotes on Earth promises unprecedented advances in evolutionary sciences, ecology, systematics and biodiversity-related applied fields such as environmental management and natural product research. Advances in DNA sequencing technologies make genome sequencing feasible for many species that are not genetic model organisms. However, genome sequencing today relies on large quantities of high-quality, high molecular weight (HMW) DNA, which is mostly obtained from fresh tissue. This is problematic for the biodiversity genomics of Metazoa, as most species are small and yield minute amounts of DNA. Furthermore, bringing living specimens to the lab bench is not realistic for the majority of species. Here we overcome these difficulties by sequencing two species of springtails (Collembola) from single specimens preserved in ethanol. We used a newly developed, genome-wide amplification-based protocol to generate PacBio libraries for HiFi long-read sequencing. The assembled genomes were highly contiguous and can be considered complete, as we recovered over 95% of BUSCOs. Genome-wide amplification does not seem to bias genome recovery. Nearly complete copies of the mitochondrial genome present in the nuclear genome were a pitfall for automatic assemblers. The genomes fit well into an existing phylogeny of springtails. A neotype is designated for one of the species, combining genome sequencing with the creation of taxonomic references. Our study shows that it is possible to obtain high-quality genomes from small, field-preserved, sub-millimeter metazoans, thus making their vast diversity accessible to genomics.


2020
Author(s): Steffen Klasberg, Alexander H. Schmidt, Vinzenz Lange, Gerhard Schöfl

Abstract
Background: High-resolution HLA genotyping of donors and recipients is a crucially important prerequisite for haematopoietic stem-cell transplantation and relies heavily on the quality and completeness of immunogenetic reference sequence databases of allelic variation.
Results: Here, we report on DR2S, an R package that leverages the strengths of two sequencing technologies (the accuracy of next-generation sequencing and the read length of third-generation technologies such as PacBio’s SMRT sequencing or ONT’s nanopore sequencing) to reconstruct fully phased, high-quality, full-length haplotype sequences. Although optimised for HLA and KIR genes, DR2S is applicable to all loci with known reference sequences, provided that full-length sequencing data are available for analysis. In addition, DR2S integrates supporting tools for easy visualisation and quality control of the reconstructed haplotypes to ensure suitability for submission to public allele databases.
Conclusions: DR2S is a largely automated workflow designed to create high-quality, fully phased reference allele sequences for highly polymorphic gene regions such as HLA or KIR. It has been used by biologists to successfully characterise and submit more than 500 HLA alleles and more than 500 KIR alleles to the IPD-IMGT/HLA and IPD-KIR databases.


Author(s): Ann McCartney, Elena Hilario, Seung-Sub Choi, Joseph Guhlin, Jessie Prebble, ...

We used long-read sequencing data generated from Knightia excelsa R.Br., a nectar-producing Proteaceae tree endemic to Aotearoa New Zealand, to explore how sequencing data type, volume and workflows can affect final assembly accuracy and chromosome construction. Establishing a high-quality genome for this species has specific cultural importance to Māori, the indigenous people, as well as commercial importance to honey producers in Aotearoa New Zealand. Assemblies were produced by five long-read assemblers using data subsampled by read length, two polishing strategies, and two Hi-C mapping methods. Our results from subsampling the data by read length showed that each assembler performed differently depending on the coverage and read length of the data. Assemblies that used longer reads (>30 kb) at lower coverage were the most contiguous and the most k-mer and gene complete. The final genome assembly was constructed into pseudo-chromosomes using all available data assembled with FLYE, polished using Racon/Medaka/Pilon combined, scaffolded using SALSA2 and AllHiC, curated using Juicebox, and validated by synteny with Macadamia. We highlight the importance of developing assembly workflows based on the volume and type of sequencing data, and of establishing a set of robust quality metrics for generating high-quality assemblies. Scaffolding analyses showed that problems in the initial assemblies could not be resolved accurately using Hi-C data, and that scaffolded assemblies were more accurate when the underlying contig assembly was more accurate. These findings provide insight into what is required for future high-quality de novo assemblies of non-model organisms.
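The abstract does not state exactly how reads were subsampled by length. One plausible scheme, shown here as a hypothetical Python sketch, discards reads below a minimum length and then keeps the longest remaining reads until a target coverage (retained bases divided by an assumed genome size) is reached. The function name, parameters and toy values are assumptions for illustration only.

```python
from typing import List


def subsample_by_length(lengths: List[int], genome_size: int,
                        min_length: int, target_coverage: float) -> List[int]:
    """Hypothetical length-based subsampling.

    Keeps the longest reads that are >= min_length, stopping once the
    retained bases reach target_coverage * genome_size.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    target_bases = target_coverage * genome_size
    kept: List[int] = []
    bases = 0
    for length in sorted(lengths, reverse=True):
        # Stop when the remaining reads are too short or the target is met.
        if length < min_length or bases >= target_bases:
            break
        kept.append(length)
        bases += length
    return kept


if __name__ == "__main__":
    reads = [50_000, 45_000, 40_000, 35_000, 31_000, 20_000, 10_000]
    kept = subsample_by_length(reads, genome_size=100_000,
                               min_length=30_000, target_coverage=1.0)
    print(kept, sum(kept) / 100_000)  # [50000, 45000, 40000] 1.35
```

Because whole reads are kept, retained coverage can overshoot the target: in the example, a 1x target on an assumed 100 kb genome keeps 135 kb of reads.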


2019
Author(s): Alex Di Genova, Elena Buena-Atienza, Stephan Ossowski, Marie-France Sagot

The continuous improvement of long-read sequencing technologies, along with the development of ad hoc algorithms, has launched a new de novo assembly era that promises high-quality genomes. However, it has proven difficult to generate accurate assemblies of large, repeat-rich human genomes from long reads alone. To date, most human genomes assembled from long, error-prone reads have added accurate short reads to further polish consensus quality. Here, we report the development of a novel algorithm for hybrid assembly, WENGAN, and the de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology. WENGAN implements efficient algorithms that exploit the sequence information of short and long reads to tackle both assembly contiguity and consensus quality. The resulting genome assemblies have high contiguity (contig NG50: 16.67-62.06 Mb), few assembly errors (contig NGA50: 10.9-45.91 Mb), good consensus quality (QV: 27.79-33.61) and high gene completeness (BUSCO complete: 94.6-95.1%), while consuming few computational resources (CPU hours: 153-1027). In particular, the WENGAN assembly of the haploid CHM13 sample achieved a contig NG50 of 62.06 Mb (NGA50: 45.91 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb). Providing the highest quality at low computational cost, WENGAN is an important step towards the democratization of the de novo assembly of human genomes. The WENGAN assembler is available at https://github.com/adigenova/wengan
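The WENGAN abstract reports contiguity as NG50 and consensus accuracy as QV. NG50 is computed like N50 but against an estimated genome size rather than the total assembly length (NGA50 applies the same idea after contigs are broken at misassemblies detected against a reference), and QV is the Phred-scaled per-base error rate, QV = -10 log10(error rate). A minimal Python sketch of both metrics follows; the function names and toy contig lengths are hypothetical.

```python
import math
from typing import List


def ng50(contig_lengths: List[int], genome_size: int) -> int:
    """Length L such that contigs >= L cover at least half the genome size.

    Unlike N50, which uses the total assembly length, NG50 uses an external
    genome size estimate; returns 0 if the assembly covers < 50% of it.
    """
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= genome_size:
            return length
    return 0


def qv_from_error_rate(error_rate: float) -> float:
    """Phred-scaled consensus quality: QV = -10 * log10(error rate)."""
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must be in (0, 1)")
    return -10 * math.log10(error_rate)


def error_rate_from_qv(qv: float) -> float:
    """Inverse of qv_from_error_rate."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    print(ng50([60, 30, 10], genome_size=200))   # 10
    print(ng50([60, 30, 10], genome_size=100))   # 60
    print(round(1 / error_rate_from_qv(27.79)))  # 601
    print(round(1 / error_rate_from_qv(33.61)))  # 2296
```

Read this way, the reported QV range of 27.79 to 33.61 corresponds to roughly one consensus error per 600 bases at the low end and one per 2,300 bases at the high end.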


2020
Author(s): Ann McCartney, Elena Hilario, Seung-Sub Choi, Joseph Guhlin, Jessica M. Prebble, ...

Abstract
Background: We used long-read sequencing data generated from Knightia excelsa R.Br., a nectar-producing Proteaceae tree endemic to Aotearoa New Zealand, to explore how sequencing data type, volume and workflows can affect final assembly accuracy and chromosome construction. Establishing a high-quality genome for this species has specific cultural importance to Māori, the indigenous people, as well as commercial importance to honey producers in Aotearoa New Zealand.
Results: Assemblies were produced by five long-read assemblers using data subsampled by read length, two polishing strategies, and two Hi-C mapping methods. Our results from subsampling the data by read length showed that each assembler performed differently depending on the coverage and read length of the data. Assemblies that used longer reads (>30 kb) at lower coverage were the most contiguous and the most k-mer and gene complete. The final genome assembly was constructed into pseudochromosomes using all available data assembled with FLYE, polished using Racon/Medaka/Pilon combined, scaffolded using SALSA2 and AllHiC, curated using Juicebox, and validated by synteny with Macadamia.
Conclusions: We highlight the importance of developing assembly workflows based on the volume and type of sequencing data, and of establishing a set of robust quality metrics for generating high-quality assemblies. Scaffolding analyses showed that problems in the initial assemblies could not be resolved accurately using Hi-C data, and that scaffolded assemblies were more accurate when the underlying contig assembly was more accurate. These findings provide insight into what is required for future high-quality de novo assemblies of non-model organisms.


2021
Vol 22 (1)
Author(s): Steffen Klasberg, Alexander H. Schmidt, Vinzenz Lange, Gerhard Schöfl

Abstract
Background: High-resolution HLA genotyping of donors and recipients is a crucially important prerequisite for haematopoietic stem-cell transplantation and relies heavily on the quality and completeness of immunogenetic reference sequence databases of allelic variation.
Results: Here, we report on DR2S, an R package that leverages the strengths of two sequencing technologies (the accuracy of next-generation sequencing and the read length of third-generation technologies such as PacBio’s SMRT sequencing or ONT’s nanopore sequencing) to reconstruct fully phased, high-quality, full-length haplotype sequences. Although optimised for HLA and KIR genes, DR2S is applicable to all loci with known reference sequences, provided that full-length sequencing data are available for analysis. In addition, DR2S integrates supporting tools for easy visualisation and quality control of the reconstructed haplotypes to ensure suitability for submission to public allele databases.
Conclusions: DR2S is a largely automated workflow designed to create high-quality, fully phased reference allele sequences for highly polymorphic gene regions such as HLA or KIR. It has been used by biologists to successfully characterise and submit more than 500 HLA alleles and more than 500 KIR alleles to the IPD-IMGT/HLA and IPD-KIR databases.


2021
Vol 12 (1)
Author(s): Caitlin M. Singleton, Francesca Petriglieri, Jannie M. Kristensen, Rasmus H. Kirkegaard, Thomas Y. Michaelsen, ...

Abstract
Microorganisms play crucial roles in water recycling, pollution removal and resource recovery in the wastewater industry. The structure of these microbial communities is increasingly understood based on 16S rRNA amplicon sequencing data. However, such data cannot be linked to functional potential in the absence of high-quality metagenome-assembled genomes (MAGs) for nearly all species. Here, we use long-read and short-read sequencing to recover 1083 high-quality MAGs, including 57 closed circular genomes, from 23 Danish full-scale wastewater treatment plants. The MAGs account for ~30% of the community based on relative abundance, and meet the stringent MIMAG high-quality draft requirements including full-length rRNA genes. We use the information provided by these MAGs in combination with >13 years of 16S rRNA amplicon sequencing data, as well as Raman microspectroscopy and fluorescence in situ hybridisation, to uncover abundant undescribed lineages belonging to important functional groups.
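The abstract states that the MAGs meet the MIMAG high-quality draft requirements. As commonly summarised from the MIMAG standard, a high-quality draft needs more than 90% completeness, less than 5% contamination, the 23S, 16S and 5S rRNA genes, and at least 18 tRNAs. The Python check below encodes those thresholds as a sketch; the record type and field names are hypothetical, and completeness and contamination are assumed to come from an external marker-gene tool.

```python
from dataclasses import dataclass


@dataclass
class MagQc:
    """Hypothetical per-MAG quality summary."""
    completeness: float   # percent, assumed from a marker-gene tool
    contamination: float  # percent, assumed from a marker-gene tool
    has_23s: bool
    has_16s: bool
    has_5s: bool
    trna_count: int       # tRNAs detected (MIMAG high quality needs >= 18)


def is_mimag_high_quality(mag: MagQc) -> bool:
    """MIMAG high-quality draft: >90% complete, <5% contamination,
    23S, 16S and 5S rRNA genes present, and at least 18 tRNAs."""
    return (mag.completeness > 90.0
            and mag.contamination < 5.0
            and mag.has_23s and mag.has_16s and mag.has_5s
            and mag.trna_count >= 18)


if __name__ == "__main__":
    mag = MagQc(completeness=96.2, contamination=1.4,
                has_23s=True, has_16s=True, has_5s=True, trna_count=20)
    print(is_mimag_high_quality(mag))  # True
```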


2021
Vol 12 (1)
Author(s): Chong Chu, Rebeca Borges-Monroy, Vinayak V. Viswanadham, Soohyun Lee, Heng Li, ...

Abstract
Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short-read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s. xTea is available at https://github.com/parklab/xTea.


2021
Vol 3 (2)
Author(s): Jean-Marc Aury, Benjamin Istace

Abstract
Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore, with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This directly affects the base quality of genome assemblies, particularly in coding regions, where sequencing errors can disrupt the reading frame of genes. In diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes, which can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; they generally inspect the assembly one nucleotide at a time and provide a correction for each position. As a result, these algorithms cannot properly process diploid genomes and typically switch from one haplotype to the other. Here we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm that incorporates phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.
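The haplotype-switching problem described in this abstract can be illustrated with a toy example. In the Python sketch below, reads are pre-aligned strings ('-' marks uncovered columns). A naive polisher lets every read vote at every column, so the consensus can take one allele from each haplotype. The phased variant assumes each read already carries a haplotype label and lets only that haplotype's reads vote. This is a simplified illustration of the general idea, not Hapo-G's actual algorithm; all names and sequences are hypothetical.

```python
from collections import Counter
from typing import List

GAP = "-"  # marks columns a read does not cover


def column_consensus(reads: List[str]) -> str:
    """Pick the most frequent base at each column, ignoring uncovered columns."""
    out = []
    for column in zip(*reads):
        counts = Counter(base for base in column if base != GAP)
        out.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(out)


def naive_polish(reads: List[str]) -> str:
    """Haplotype-unaware: every read votes independently at every column."""
    return column_consensus(reads)


def phased_polish(reads: List[str], labels: List[str], haplotype: str) -> str:
    """Toy haplotype-aware variant: only reads labelled `haplotype` vote."""
    chosen = [read for read, label in zip(reads, labels) if label == haplotype]
    if not chosen:
        raise ValueError(f"no reads labelled {haplotype!r}")
    return column_consensus(chosen)


if __name__ == "__main__":
    hap_a, hap_b = "ACGTAC", "ATGTGC"  # heterozygous at columns 1 and 4
    reads = ["ACGT--", "ACGT--", "ATGTGC", "--GTGC", "---TAC"]
    labels = ["A", "A", "B", "B", "A"]
    print(naive_polish(reads))                # ACGTGC: matches neither haplotype
    print(phased_polish(reads, labels, "A"))  # ACGTAC, identical to hap_a
```

In the naive result, haplotype A wins column 1 and haplotype B wins column 4 because local read support differs, producing exactly the kind of haplotype switch the abstract describes.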


2021
Author(s): Androniki C. Bibi, Anastasios Kollias, Maria Astrinaki, Despoina Vassou, Dimitris Kafetzopoulos, ...

Abstract
Background: There have been several attempts to sequence the genome of Vitis vinifera L. (grapevine) using low-resolution second-generation platforms. Nevertheless, the characterization of grapevine genetic resources and of the plant's adaptation to vulnerable conditions could be better addressed through extensive, high-resolution genome sequencing. MinION is a third-generation sequencer preferred by many laboratories due to its relatively low cost, ease of use and small size. Even though this long-read technology has been rapidly improving, reaching its full potential requires high-quality DNA.
Results: Here we establish a DNA extraction workflow suitable for MinION long-read sequencing of grapevine. The protocol was tested with leaf samples from different positions on annual growing branches of grapevine. Purified nuclei from fresh young leaves yielded high-quality, long DNA fragments suitable for long-read sequencing. Longer reads in grapevine were associated with both fresh tissue and the adjusted conditions used for nuclei purification.
Conclusions: We propose that this workflow represents a significant advance in isolating long-read-quality DNA from grapevine and likely other plant species.

