Constructing a Reference Genome in a Single Lab: The Possibility to Use Oxford Nanopore Technology

Author(s):  
Yun Gyeong Lee ◽  
Sang Chul Choi ◽  
Yuna Kang ◽  
Kyeong Min Kim ◽  
Chon-Sik Kang ◽  
...  

Whole-genome sequencing (WGS) has become a crucial tool for understanding genome structure and genetic variation. MinION sequencing from Oxford Nanopore Technologies (ONT) is an excellent approach for performing WGS and has advantages over other next-generation sequencing (NGS) platforms: it is relatively inexpensive and portable, has simple library preparation, can be monitored in real time, and has no theoretical limit on read length. Sorghum bicolor (L.) Moench is diploid (2n = 2x = 20) with a genome size of about 730 Mb, and its genome sequence is available in the Phytozome database. Therefore, sorghum can be used as a good reference. However, plant species have complex and large genomes compared to animals or microorganisms. As a result, complete genome sequencing is difficult for plant species. Because it produces long reads, MinION sequencing can be an excellent tool for overcoming the weak assembly of short reads generated by NGS, minimizing gaps and spanning the repetitive sequences that appear in plant genomes. Here, we sequenced the genome of S. bicolor cv. BTx623 using the MinION platform and obtained 895,678 reads totaling 17.9 gigabases (Gb) of long-read sequence data (ca. 25× coverage of the reference). We performed de novo assembly with two different tools and mapped the assembled contigs against the sorghum reference genome: Canu generated a total of 6,124 contigs (covering 45.9%), and Minimap and Miniasm with a Racon pipeline generated a total of 2,661 contigs (covering 50%). Our results provide a pipeline for long-read sequencing analysis of plant species using the MinION platform and a clue for determining the total sequencing scale needed for optimal coverage based on genome size.
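
The coverage figure quoted in this abstract follows from simple arithmetic: total sequenced bases divided by genome size. The same relation, run in reverse, gives the sequencing yield needed for a target depth on a genome of a different size. A minimal sketch, assuming the abstract's rounded figures (17.9 Gb of reads, a genome of about 730 Mb); the function names are illustrative and are not part of the authors' pipeline:

```python
def depth_of_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_needed(target_depth: float, genome_size: float) -> float:
    """Sequencing yield (in bases) required to reach a target average depth."""
    return target_depth * genome_size


# Rounded figures taken from the abstract.
SORGHUM_GENOME_BP = 730e6   # about 730 Mb
TOTAL_BASES = 17.9e9        # 17.9 Gb of long reads
READ_COUNT = 895_678

depth = depth_of_coverage(TOTAL_BASES, SORGHUM_GENOME_BP)
mean_read_length = TOTAL_BASES / READ_COUNT

print(f"average depth: {depth:.1f}x")
print(f"mean read length: {mean_read_length:,.0f} bp")
# Hypothetical planning case: 25x depth on a 1 Gb genome.
print(f"bases needed: {bases_needed(25, 1e9):.2e}")
```

Because the inputs are rounded, the computed depth is only approximate; it is consistent with the "ca. 25×" stated in the abstract.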

Plants ◽  
2019 ◽  
Vol 8 (8) ◽  
pp. 270 ◽  


2021 ◽  
Vol 12 ◽  
Author(s):  
Annika Brinkmann ◽  
Sophie-Luisa Ulm ◽  
Steven Uddin ◽  
Sophie Förster ◽  
Dominique Seifert ◽  
...  

Since the emergence of the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) in December 2019, the scientific community has been sharing data on epidemiology, diagnostic methods, and whole-genome sequences almost in real time. The latter have already facilitated phylogenetic analyses, transmission chain tracking, protein modeling, the identification of possible therapeutic targets, timely risk assessment, and identification of novel variants. We have established and evaluated an amplification-based approach, AmpliCoV, for whole-genome sequencing of SARS-CoV-2. It can be used on the miniature-sized and field-deployable sequencing device Oxford Nanopore MinION, with a sequencing library preparation time of 10 min. We show that the generation of 50,000 total reads per sample is sufficient for near-complete coverage (>90%) of the SARS-CoV-2 genome directly from patient samples even if the virus concentration is low (Ct 35, corresponding to approximately 5 genome copies per reaction). For patient samples with high viral load (Ct 18–24), generation of 50,000 reads in 1–2 h was shown to be sufficient for a genome coverage of >90%. Comparison to Illumina data reveals an accuracy that suffices to identify virus mutants. AmpliCoV can be applied whenever sequence information on SARS-CoV-2 is required rapidly, for instance for the identification of circulating virus mutants.
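
"Genome coverage of >90%" in this abstract refers to breadth of coverage: the fraction of reference positions covered by reads. A minimal sketch of that calculation, assuming per-position read depths are already available; the minimum-depth threshold is an assumption, since the abstract does not state one, and the depths below are toy values:

```python
from typing import Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Toy per-position depths for a 10-base reference (illustrative values only).
toy_depths = [0, 3, 5, 5, 8, 12, 12, 7, 2, 1]

breadth_any = breadth_of_coverage(toy_depths)                  # depth >= 1
breadth_strict = breadth_of_coverage(toy_depths, min_depth=5)  # depth >= 5
print(breadth_any, breadth_strict)
```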


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 23-24
Author(s):  
Kimberly M Davenport ◽  
Derek M Bickhart ◽  
Kim Worley ◽  
Shwetha C Murali ◽  
Noelle Cockett ◽  
...  

Abstract Sheep are an important agricultural species used for both food and fiber in the United States and globally. A high-quality reference genome enhances the ability to discover genetic and biological mechanisms influencing important traits, such as meat and wool quality. The rapid advances in genome assembly algorithms and emergence of increasingly long sequence read length provide the opportunity for an improved de novo assembly of the sheep reference genome. Tissue was collected postmortem from an adult Rambouillet ewe selected by USDA-ARS for the Ovine Functional Annotation of Animal Genomes project. Short-read (55x coverage), long-read PacBio (75x coverage), and Hi-C data from this ewe were retrieved from public databases. We generated an additional 50x coverage of Oxford Nanopore data and assembled the combined long-read data with canu v1.9. The assembled contigs were polished with Nanopolish v0.12.5 and scaffolded using Hi-C data with Salsa v2.2. Gaps were filled with PBsuite v15.8.24 and polished with Nanopolish v0.12.5 followed by removal of duplicate contigs with PurgeDups v1.0.1. Chromosomes were oriented by identifying centromeres and telomeres with RepeatMasker v4.1.1, indicating a need to reverse the orientation of chromosome 11 relative to Oar_rambouillet_v1.0. Final polishing was performed with two rounds of a pipeline which consisted of freebayes v1.3.1 to call variants, Merfin to validate them, and BCFtools to generate the consensus fasta. The ARS-UI_Ramb_v2.0 assembly has improved continuity (contig N50 of 43.19 Mb) with a 19-fold and 38-fold decrease in the number of scaffolds compared with Oar_rambouillet_v1.0 and Oar_v4.0. ARS-UI_Ramb_v2.0 has greater per-base accuracy and fewer insertions and deletions identified from mapped RNA sequence than previous assemblies. 
This significantly improved reference assembly, publicly available at NCBI GenBank under accession number GCA_016772045, will optimize the functional annotation of the sheep genome and facilitate improved mapping accuracy of genetic variant and expression data for traits relevant to the sheep industry.
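
The continuity metric quoted in this abstract, contig N50, is the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of the standard definition, using toy contig lengths rather than the assembly's actual contigs:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy contig lengths (illustrative values only): total 290, half 145.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

In the toy case the two longest contigs (80 + 70 = 150) are the first to reach half of the 290 total bases, so the N50 is 70.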


2020 ◽  
Vol 9 (37) ◽  
Author(s):  
Samuel O’Donnell ◽  
Frederic Chaux ◽  
Gilles Fischer

ABSTRACT The current Chlamydomonas reinhardtii reference genome remains fragmented due to gaps stemming from large repetitive regions. To overcome the vast majority of these gaps, publicly available Oxford Nanopore Technology data were used to create a new reference-quality de novo genome assembly containing only 21 contigs, 30/34 telomeric ends, and a genome size of 111 Mb.


2020 ◽  
Vol 22 (11) ◽  
pp. 1892-1897 ◽  
Author(s):  
My Linh Thibodeau ◽  
Kieran O’Neill ◽  
Katherine Dixon ◽  
Caralyn Reisle ◽  
Karen L. Mungall ◽  
...  

Abstract Purpose: Structural variants (SVs) may be an underestimated cause of hereditary cancer syndromes given the current limitations of short-read next-generation sequencing. Here we investigated the utility of long-read sequencing in resolving germline SVs in cancer susceptibility genes detected through short-read genome sequencing. Methods: Known or suspected deleterious germline SVs were identified using Illumina genome sequencing across a cohort of 669 advanced cancer patients with paired tumor genome and transcriptome sequencing. Candidate SVs were subsequently assessed by Oxford Nanopore long-read sequencing. Results: Nanopore sequencing confirmed eight simple pathogenic or likely pathogenic SVs, resolving three additional variants whose impact could not be fully elucidated through short-read sequencing. A recurrent sequencing artifact on chromosome 16p13 and one complex rearrangement on chromosome 5q35 were subsequently classified as likely benign, obviating the need for further clinical assessment. Variant configuration was further resolved in one case with a complex pathogenic rearrangement affecting TSC2. Conclusion: Our findings demonstrate that long-read sequencing can improve the validation, resolution, and classification of germline SVs. This has important implications for return of results, cascade carrier testing, cancer screening, and prophylactic interventions.


2017 ◽  
Vol 5 (42) ◽  
Author(s):  
S. Wesley Long ◽  
Sarah E. Linson ◽  
Matthew Ojeda Saavedra ◽  
Concepcion Cantu ◽  
James J. Davis ◽  
...  

ABSTRACT In a study of 1,777 Klebsiella strains, we discovered KPN1705, which was distinct from all recognized Klebsiella spp. We closed the genome of strain KPN1705 using a hybrid of Illumina short-read and Oxford Nanopore long-read technologies. For this novel species, we propose the name Klebsiella quasivariicola sp. nov.


2017 ◽  
Author(s):  
Tslil Gabrieli ◽  
Hila Sharim ◽  
Yael Michaeli ◽  
Yuval Ebenstein

ABSTRACT Variations in the genetic code, from single point mutations to large structural or copy number alterations, influence susceptibility, onset, and progression of genetic diseases and tumor transformation. Next-generation sequencing analysis is unable to reliably capture aberrations larger than the typical sequencing read length of several hundred bases. Long-read, single-molecule sequencing methods such as SMRT and nanopore sequencing can address larger variations, but require costly whole genome analysis. Here we describe a method for isolation and enrichment of a large genomic region of interest for targeted analysis based on Cas9 excision of two sites flanking the target region and isolation of the excised DNA segment by pulsed field gel electrophoresis. The isolated target remains intact and is ideally suited for optical genome mapping and long-read sequencing at high coverage. In addition, analysis is performed directly on native genomic DNA that retains genetic and epigenetic composition without amplification bias. This method enables detection of mutations and structural variants as well as detailed analysis by generation of hybrid scaffolds composed of optical maps and sequencing data at a fraction of the cost of whole genome sequencing.


2019 ◽  
Author(s):  
Dhaivat Joshi ◽  
Shunfu Mao ◽  
Sreeram Kannan ◽  
Suhas Diggavi

Abstract Motivation: Efficient and accurate alignment of DNA / RNA sequence reads to each other or to a reference genome / transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate makes accurate and efficient alignment difficult. Utilizing the noise and error characteristics inherent in the sequencing process properly can play a vital role in constructing a robust aligner. In this paper, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome / transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running them through a sequence aligner. Results: We show that QAlign is able to improve alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2%, 2.5% and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and 82.6% to 90% in two real datasets. Availability: https://github.com/joshidhaivat/QAlign.git
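
The key idea stated in this abstract, converting nucleotide reads into discretized current levels, can be illustrated with a small sketch. This is not QAlign's implementation: the k-mer length, the current table, and the threshold below are all made-up placeholders, whereas a real pore model uses measured currents for longer k-mers. The sketch only shows the general shape of the transformation, in which each k-mer is looked up in a current table and the value is binned into a level:

```python
from bisect import bisect_right
from typing import List, Mapping, Sequence


def quantize_read(
    read: str,
    kmer_current: Mapping[str, float],
    thresholds: Sequence[float],
    k: int,
) -> List[int]:
    """Map each k-mer of the read to a discrete level via its expected current.

    thresholds must be sorted ascending; n thresholds give n + 1 levels.
    """
    levels = []
    for i in range(len(read) - k + 1):
        current = kmer_current[read[i:i + k]]
        levels.append(bisect_right(thresholds, current))
    return levels


# Hypothetical 2-mer "currents" in arbitrary units, NOT real pore-model data.
BASES = "ACGT"
TOY_MODEL = {
    a + b: 10.0 * BASES.index(a) + 2.5 * BASES.index(b)
    for a in BASES
    for b in BASES
}
TOY_THRESHOLDS = [15.0]  # one threshold, so two levels (0 and 1)

toy_levels = quantize_read("ACGTTGCA", TOY_MODEL, TOY_THRESHOLDS, k=2)
print(toy_levels)
```

Both the reads and the reference would be passed through the same mapping before being handed to an ordinary aligner, so that sequences whose current signatures are hard to tell apart collapse to similar level strings.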


2019 ◽  
Vol 23 (1) ◽  
pp. 38-48 ◽  
Author(s):  
M. K. Bragina ◽  
D. A. Afonnikov ◽  
E. A. Salina

Since the first plant genome, that of Arabidopsis thaliana, was sequenced and published, genome sequencing technologies have undergone significant changes. New algorithms, sequencing technologies and bioinformatic approaches have been adopted to obtain genome, transcriptome and exome sequences for model and crop species, which have permitted deep inferences into plant biology. As a result of improved genome assembly and analysis methods, genome sequencing costs have plummeted and the number of high-quality plant genome sequences is constantly growing. Consequently, more than 300 plant genome sequences have been published over the past twenty years. Although many of the published genomes are considered incomplete, they have proved to be a valuable tool for identifying genes involved in the formation of economically valuable plant traits, for marker-assisted and genomic selection, and for comparative analysis of plant genomes in order to determine the basic patterns of origin of various plant species. Since high coverage and resolution of a genome sequence are not enough to detect all changes in complex samples, targeted sequencing, which consists of the isolation and sequencing of a specific region of the genome, has begun to develop. Targeted sequencing has a higher detection power (the ability to identify new differences/variants) and resolution (down to a single base). In addition, exome sequencing (sequencing of only the protein-coding regions of genes) is being actively developed, which allows for the sequencing of non-expressed alleles and genes that cannot be found with RNA-seq. This review analyzes the development of sequencing technologies and the construction of “reference” plant genomes, and compares targeted sequencing methods that rely on a reference DNA sequence.


2021 ◽  
Author(s):  
Simon Lee ◽  
Loan T. Nguyen ◽  
Ben J. Hayes ◽  
Elizabeth M Ross

Motivation: Quality control (QC) tools are critical in DNA sequencing analysis because they increase the accuracy of sequence alignments and thus the reliability of results. Oxford Nanopore Technologies (ONT) QC is currently rudimentary, generally based on whole-read average quality. This results in discarding reads that contain regions of high-quality sequence. Here we propose Prowler, a multi-window approach inspired by algorithms used to QC short-read data. Importantly, we retain the phase and read length information by optionally replacing trimmed sections with Ns. Results: Prowler was applied to mammalian and bacterial datasets to assess effects on alignment and assembly, respectively. Compared to Nanofilt, alignments of data QCed with Prowler had lower error rates and more mapped reads. Assemblies of Prowler QCed data had a lower error rate than Nanofilt QCed data; however, this came at some cost to assembly contiguity. Availability and implementation: Prowler is implemented in Python and is available at: https://github.com/ProwlerForNanopore/ProwlerTrimmer
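
The contrast drawn in this abstract between whole-read average quality and a multi-window approach can be sketched as follows. This is a generic fixed-window illustration, not Prowler's actual algorithm (the abstract does not specify it); the window size, threshold, and function name are assumptions. Low-quality windows are replaced with Ns so that read length is preserved:

```python
from typing import List, Tuple


def mask_low_quality_windows(
    seq: str, quals: List[int], window: int, min_mean_q: float
) -> Tuple[str, List[bool]]:
    """Replace each fixed-size window whose mean Phred score is below
    min_mean_q with Ns; return the masked read and a keep flag per window."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if window <= 0:
        raise ValueError("window must be positive")
    masked = []
    kept = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        chunk_q = quals[start:start + window]
        ok = sum(chunk_q) / len(chunk_q) >= min_mean_q
        kept.append(ok)
        masked.append(chunk if ok else "N" * len(chunk))
    return "".join(masked), kept


# Toy read: good ends, poor middle (illustrative Phred scores only).
toy_seq = "ACGTACGTACGT"
toy_quals = [20] * 4 + [2] * 4 + [20] * 4

whole_read_mean = sum(toy_quals) / len(toy_quals)
masked_seq, kept_flags = mask_low_quality_windows(
    toy_seq, toy_quals, window=4, min_mean_q=15
)
print(whole_read_mean, masked_seq, kept_flags)
```

With a threshold of Q15, a whole-read filter would discard this toy read outright (its mean is Q14), while the windowed version keeps both high-quality ends and masks only the middle.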

