Draft Assembly of Phytophthora capsici from Long-Read Sequencing Uncovers Complexity

2019 ◽  
Vol 32 (12) ◽  
pp. 1559-1563 ◽  
Author(s):  
Chenming Cui ◽  
John H. Herlihy ◽  
Aureliano Bombarely ◽  
John M. McDowell ◽  
David C. Haak

Resolving complex plant pathogen genomes is important for identifying the genomic shifts associated with rapid adaptation to selective agents such as hosts and fungicides, yet assembling these genomes remains challenging and expensive. Phytophthora capsici is an important, globally distributed plant pathogen that exhibits widespread fungicide resistance and a broad host range. As with other pathogenic oomycetes, P. capsici has a complex life history and a complex genome. Here, we leverage Oxford Nanopore Technologies and existing short-read resources to rapidly generate a low-cost, improved assembly. We generated 10 Gbp from a single MinION flow cell, resulting in >1.25 million reads with a read N50 of 13 kb. The resulting assembly is 95.2 Mbp in 424 scaffolds with an N50 length of 313 kb, approximately 30 Mbp larger than the current 64 Mbp reference genome. We confirmed this larger genome size using flow cytometry, which gave an estimated size of 110 Mbp. BUSCO analysis identified 97.4% complete orthologs (19.2% duplicated). Evolutionary analysis supports a recent whole-genome duplication in this group. Our work provides a blueprint for rapidly integrating benchtop long-read sequencing with existing short-read data to dramatically improve the quality and integrity of complex genome assemblies and to offer novel insights into pathogen genome function and evolution.
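The yield and genome-size figures above can be related with simple arithmetic: expected mean depth is total sequenced bases divided by genome size, and mean read length is total bases divided by read count. The Python sketch below applies this to the numbers in the abstract. The function names are hypothetical, and the sketch ignores read filtering and uneven coverage, so real per-base depth will differ.

```python
GBP = 1e9  # bases per gigabase pair
MBP = 1e6  # bases per megabase pair


def mean_depth(total_bases: float, genome_size: float) -> float:
    """Expected mean fold coverage: total sequenced bases / genome size."""
    return total_bases / genome_size


def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length: total sequenced bases / number of reads."""
    return total_bases / n_reads


# Figures quoted in the abstract: 10 Gbp of MinION output, >1.25 million reads,
# a 110 Mbp flow-cytometry estimate and a 95.2 Mbp assembly.
yield_bases = 10 * GBP
print(f"depth vs flow-cytometry estimate: {mean_depth(yield_bases, 110 * MBP):.0f}x")
print(f"depth vs assembly size: {mean_depth(yield_bases, 95.2 * MBP):.0f}x")
# Because the read count is a lower bound (>1.25 M), this is an upper bound.
print(f"mean read length (upper bound): {mean_read_length(yield_bases, 1_250_000):.0f} bp")
```

Under these assumptions the run corresponds to roughly 91x depth against the flow-cytometry estimate and about 105x against the assembly. The mean read length of at most 8 kb sits below the 13 kb read N50 because N50 weights each read by its length.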

2020 ◽  
Vol 10 (6) ◽  
pp. 1829-1836 ◽  
Author(s):  
Graham Wiley ◽  
Matthew J. Miller

Woodpeckers are found in nearly every part of the world and have been important for studies of biogeography, phylogeography, and macroecology. Woodpecker hybrid zones are often studied to understand the dynamics of introgression between bird species. Notably, woodpeckers are gaining attention for their enriched levels of transposable elements (TEs) relative to most other birds. This enrichment of TEs may have substantial effects on molecular evolution. However, comparative studies of woodpecker genomes are hindered by the fact that no high-contiguity genome exists for any woodpecker species. Using hybrid assembly methods combining long-read Oxford Nanopore and short-read Illumina sequencing data, we generated a highly contiguous genome assembly for the Golden-fronted Woodpecker (Melanerpes aurifrons). The final assembly is 1.31 Gb and comprises 441 contigs plus a full mitochondrial genome. Half of the assembly is represented by 28 contigs (contig L50), each at least 16 Mb in size (contig N50). High recovery (92.6%) of bird-specific BUSCO genes suggests our assembly is both relatively complete and relatively accurate. Over a quarter (25.8%) of the genome consists of repetitive elements, with 287 Mb (21.9% of the genome) assignable to the CR1 superfamily of transposable elements, the highest proportion of CR1 repeats reported for any bird genome to date. Our assembly should improve comparative studies of molecular evolution and genomics in woodpeckers and allies. Additionally, the sequencing and bioinformatic resources used to generate this assembly were relatively low-cost and should provide a direction for development of high-quality genomes for studies of animal biodiversity.
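Contig N50 and L50, as used above, come from the same cumulative sum over contigs sorted longest first. A minimal Python sketch, assuming the contig lengths are already in memory; the helper name `n50_l50` is hypothetical and not taken from any assembly tool.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig (or read) lengths.

    Lengths are sorted longest first and summed. L50 is the number of contigs
    needed to reach at least half of the total length; N50 is the length of
    the contig at which that threshold is crossed.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")
```

For example, `n50_l50([10, 8, 6, 4, 2])` returns `(8, 2)`, because the two longest contigs already cover 18 of the 30 units. Some tools also report NG50, which uses an estimated genome size rather than the assembly total as the denominator.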


2020 ◽  
Vol 22 (11) ◽  
pp. 1892-1897 ◽  
Author(s):  
My Linh Thibodeau ◽  
Kieran O’Neill ◽  
Katherine Dixon ◽  
Caralyn Reisle ◽  
Karen L. Mungall ◽  
...  

Abstract Purpose Structural variants (SVs) may be an underestimated cause of hereditary cancer syndromes given the current limitations of short-read next-generation sequencing. Here we investigated the utility of long-read sequencing in resolving germline SVs in cancer susceptibility genes detected through short-read genome sequencing. Methods Known or suspected deleterious germline SVs were identified using Illumina genome sequencing across a cohort of 669 advanced cancer patients with paired tumor genome and transcriptome sequencing. Candidate SVs were subsequently assessed by Oxford Nanopore long-read sequencing. Results Nanopore sequencing confirmed eight simple pathogenic or likely pathogenic SVs and resolved three additional variants whose impact could not be fully elucidated through short-read sequencing. A recurrent sequencing artifact on chromosome 16p13 and one complex rearrangement on chromosome 5q35 were subsequently classified as likely benign, obviating the need for further clinical assessment. Variant configuration was further resolved in one case with a complex pathogenic rearrangement affecting TSC2. Conclusion Our findings demonstrate that long-read sequencing can improve the validation, resolution, and classification of germline SVs. This has important implications for return of results, cascade carrier testing, cancer screening, and prophylactic interventions.


2019 ◽  
Vol 8 (34) ◽  
Author(s):  
Natsuki Tomariguchi ◽  
Kentaro Miyazaki

Rubrobacter xylanophilus strain AA3-22, belonging to the phylum Actinobacteria, was isolated from nonvolcanic Arima Onsen (hot spring) in Japan. Here, we report the complete genome sequence of this organism, which was obtained by combining Oxford Nanopore long-read and Illumina short-read sequencing data.


2017 ◽  
Vol 5 (42) ◽  
Author(s):  
S. Wesley Long ◽  
Sarah E. Linson ◽  
Matthew Ojeda Saavedra ◽  
Concepcion Cantu ◽  
James J. Davis ◽  
...  

ABSTRACT In a study of 1,777 Klebsiella strains, we discovered KPN1705, which was distinct from all recognized Klebsiella spp. We closed the genome of strain KPN1705 using a hybrid of Illumina short-read and Oxford Nanopore long-read technologies. For this novel species, we propose the name Klebsiella quasivariicola sp. nov.


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 1847-1847 ◽  
Author(s):  
Adam Burns ◽  
David Robert Bruce ◽  
Pauline Robbe ◽  
Adele Timbs ◽  
Basile Stamatopoulos ◽  
...  

Abstract Introduction Chronic Lymphocytic Leukaemia (CLL) is the most prevalent leukaemia in the Western world and is characterised by clinical heterogeneity. IgHV mutation status, mutations in the TP53 gene and deletions of the short arm (p-arm) of chromosome 17 are currently used to predict an individual patient's response to therapy and to indicate their long-term prognosis. Current clinical guidelines recommend screening patients prior to initial, and any subsequent, treatment. Routine clinical laboratory practice for CLL involves three separate assays, each of which is time-consuming and requires significant investment in equipment. Nanopore sequencing offers a rapid, low-cost alternative, generating a full prognostic dataset on a single platform. In addition, Nanopore sequencing promises low failure rates on degraded material such as FFPE and excellent detection of structural variants owing to its long read lengths. Importantly, Nanopore technology does not require expensive equipment and is low-maintenance and ideal for patient-near testing, making it an attractive DNA sequencing device for low-to-middle-income countries. Methods Eleven untreated CLL samples were selected for the analysis, harbouring both mutated (n=5) and unmutated (n=6) IgHV genes, seven TP53 mutations (five missense, one stop gain and one frameshift) and two del(17p) events. Primers were designed to amplify all exons of TP53, along with the IgHV locus, and each primer included universal tails for individual sample barcoding. The resulting PCR amplicons were prepared for sequencing using a ligation sequencing kit (SQK-LSK108, Oxford Nanopore Technologies, Oxford, UK). All IgHV libraries were pooled and sequenced on one R9.4 flow cell, and the TP53 libraries were pooled and sequenced on a second R9.4 flow cell. 
Whole genome libraries were prepared from 400 ng genomic DNA for each sample using a rapid sequencing kit (SQK-RAD004, Oxford Nanopore Technologies, Oxford, UK), and each sample was sequenced on an individual flow cell on a MinION Mk1B instrument (Oxford Nanopore Technologies, Oxford, UK). We developed a bespoke bioinformatics pipeline to detect copy-number changes, TP53 mutations and IgHV mutation status from the Nanopore sequencing data. Results were compared to short-read sequencing data obtained earlier by targeted deep sequencing (MiSeq, Illumina Inc, San Diego, CA, USA) and whole genome sequencing (HiSeq 2500, Illumina Inc, San Diego, CA, USA). Results Following basecalling and adaptor trimming, the raw data were submitted to the IMGT database. In the absence of error correction, it was possible to identify the correct VH family for each sample; however, the germline homology was not sufficient to differentiate between IgHVmut and IgHVunmut CLL cases. Following bioinformatic error correction and consensus building, the percentage homology to germline was the same as that obtained from short-read sequencing, and nanopore sequencing also called the same productive rearrangements in all cases. A total of 77 TP53 variants were identified, including 68 in non-coding regions and three synonymous SNVs. The remaining six were predicted to be functional variants (five missense and one stop-gain) and had all been identified in earlier MiSeq targeted sequencing. However, the frameshift mutation was not called by the analysis pipeline, although it is present in the aligned reads. Using the low-coverage WGS data, we were able to identify del(17p) events of 19 Mb and 20 Mb in length in both patients with high confidence. Conclusions Here we demonstrate that characterization of the IgHV locus in CLL cases is possible using the MinION platform, provided sufficient downstream analysis, including error correction, is applied. 
Furthermore, somatic SNVs in TP53 can be identified, although, as with second-generation sequencing, variant calling of small insertions and deletions is more problematic. Identification of del(17p) is possible from low-coverage WGS on the MinION and is inexpensive. Our data demonstrate that Nanopore sequencing can be a viable, patient-near, low-cost alternative to established screening methods, with the potential for diagnostic implementation in resource-poor regions of the world. Disclosures Schuh: Giles, Roche, Janssen, AbbVie: Honoraria.
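The IgHV result hinges on a percent-identity cut-off. By the widely used CLL convention, a rearranged IgHV sequence with less than 98% identity to its closest germline gene is classed as mutated, and one with 98% or more as unmutated, so read-level errors of a few percent can flip the call unless they are corrected first. The Python sketch below illustrates that logic on sequences assumed to be already aligned. It is not the authors' bespoke pipeline or IMGT's method; the function names are hypothetical, and skipping gap columns is a simplifying assumption.

```python
def percent_identity(query: str, germline: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns with a gap ('-') in either sequence are skipped (a simplifying
    assumption), so identity is computed over aligned bases only.
    """
    if len(query) != len(germline):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for q, g in zip(query.upper(), germline.upper()):
        if q == "-" or g == "-":
            continue
        compared += 1
        matches += q == g  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


def ighv_status(identity: float, cutoff: float = 98.0) -> str:
    """Conventional CLL call: identity below the cut-off means mutated."""
    return "IgHV-mutated" if identity < cutoff else "IgHV-unmutated"
```

For instance, `percent_identity("AC-T", "ACGT")` returns 100.0 because the gapped column is ignored, while an unmutated sequence read at 95% identity because of uncorrected errors would wrongly be reported as `IgHV-mutated`.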


Plant Disease ◽  
2021 ◽  
Author(s):  
Zhixin Wang ◽  
Jiandong Bao ◽  
Lin Lv ◽  
Lianyu Lin ◽  
Zhiting Li ◽  
...  

Phytophthora colocasiae is a destructive oomycete pathogen of taro (Colocasia esculenta) that causes taro leaf blight. To date, only one highly fragmented Illumina short-read-based genome assembly is available for this species. To address this problem, we sequenced strain Lyd2019 from China using Oxford Nanopore Technologies (ONT) long-read sequencing and Illumina short-read sequencing. We generated a 92.51-Mb genome assembly consisting of 105 contigs with an N50 of 1.70 Mb and a maximum length of 4.17 Mb. In the genome assembly, we identified 52.78% repeats and 18,322 protein-coding genes, of which 12,782 genes were annotated. We also identified 191 candidate RXLR effectors and 1 candidate CRN effector. The updated near-chromosome-level genome assembly and annotation resources will provide a better understanding of the infection mechanisms of P. colocasiae.


2019 ◽  
Author(s):  
Zhoutao Chen ◽  
Long Pham ◽  
Tsai-Chin Wu ◽  
Guoya Mo ◽  
Yu Xia ◽  
...  

Abstract Long-range sequencing information is required for haplotype phasing, de novo assembly and structural variation detection. Current long-read sequencing technologies can provide valuable long-range information, but at high cost, with low accuracy and high DNA input requirements. We have developed a single-tube Transposase Enzyme Linked Long-read Sequencing (TELL-Seq™) technology, which enables a low-cost, high-accuracy and high-throughput short-read next-generation sequencer to routinely generate over 100 kb of long-range sequencing information from as little as 0.1 ng of input material. In a PCR tube, millions of clonally barcoded beads are used to uniquely barcode long DNA molecules in an open bulk reaction without dilution or compartmentalization. The barcode-linked reads are used to successfully assemble genomes ranging from microbes to human. These linked reads also generate megabase-long phased blocks and provide a cost-effective tool for detecting structural variants in a genome, which are important for identifying compound heterozygosity in recessive Mendelian diseases and discovering genetic drivers and diagnostic biomarkers in cancers.
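Linked reads carry long-range information because reads sharing a barcode are assumed to derive from the same long DNA molecule. The Python sketch below shows one simple way such reads could be grouped into putative molecules after alignment: split each barcode's reads wherever consecutive alignments on a chromosome are farther apart than a gap threshold. This is an illustration under stated assumptions, not the TELL-Seq software; the `Read` and `Molecule` types and the 50 kb default gap are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    barcode: str
    chrom: str
    pos: int  # alignment start on the reference


class Molecule(NamedTuple):
    barcode: str
    chrom: str
    start: int  # start of the first read
    end: int    # start of the last read (read length ignored for simplicity)
    n_reads: int


def infer_molecules(reads: List[Read], max_gap: int = 50_000) -> List[Molecule]:
    """Group same-barcode reads on the same chromosome into putative molecules.

    A new molecule begins whenever two consecutive reads (sorted by position)
    are more than max_gap bases apart; the threshold is an assumed parameter.
    """
    by_key: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for r in reads:
        by_key[(r.barcode, r.chrom)].append(r.pos)

    molecules: List[Molecule] = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for p in positions[1:]:
            if p - prev > max_gap:
                # Gap too large: close the current molecule and open a new one.
                molecules.append(Molecule(barcode, chrom, start, prev, count))
                start, count = p, 0
            prev = p
            count += 1
        molecules.append(Molecule(barcode, chrom, start, prev, count))
    return molecules
```

A larger `max_gap` merges more reads into longer molecules but risks joining two distinct molecules that happened to receive the same barcode in the bulk reaction.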


GigaScience ◽  
2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Lisa K Johnson ◽  
Ruta Sahasrabudhe ◽  
James Anthony Gill ◽  
Jennifer L Roach ◽  
Lutz Froenicke ◽  
...  

Abstract Background Whole-genome sequencing data from wild-caught individuals of closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using long-read Oxford Nanopore Technologies (ONT) PromethION and short-read Illumina platforms. Findings Draft de novo reference genome assemblies were generated using a combination of long and short sequencing reads. For each species, the PromethION platform was used to generate 30–45× sequence coverage, and the Illumina platform was used to generate 50–160× sequence coverage. Illumina-only assemblies were fragmented, with high numbers of contigs, while ONT-only assemblies were error prone, with low BUSCO scores. The highest N50 values, ranging from 0.4 to 2.7 Mb, came from assemblies generated using a combination of short- and long-read data. BUSCO scores were consistently >90% complete using the Eukaryota database. Conclusions High-quality genomes can be obtained by using short-read Illumina data to polish assemblies generated with long-read ONT data. Draft assemblies and raw sequencing data are available for public use. We encourage use and reuse of these data for assembly benchmarking and other analyses.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Trent M. Prall ◽  
Emma K. Neumann ◽  
Julie A. Karl ◽  
Cecilia G. Shortreed ◽  
David A. Baker ◽  
...  

Abstract Background Oxford Nanopore Technologies’ instruments can sequence reads of great length. Long reads improve sequence assemblies by unambiguously spanning repetitive elements of the genome. Sequencing reads of significant length requires preserving long DNA template molecules through library preparation by pipetting reagents as slowly as possible to minimize shearing. This process is time-consuming and inconsistent at preserving read length, as even small changes in volumetric flow rate can result in template shearing. Results We have designed SNAILS (Slow Nucleic Acid Instrument for Long Sequences), a 3D-printable instrument that automates slow pipetting of reagents used in long-read library preparation for Oxford Nanopore sequencing. Across six sequencing libraries, SNAILS preserved more reads exceeding 100 kilobases in length and increased the average read length of its libraries compared with manual slow pipetting. Conclusions SNAILS is a low-cost, easily deployable solution for improving sequencing projects that require reads of significant length. By automating the slow pipetting of library preparation reagents, SNAILS increases the consistency and throughput of long-read Nanopore sequencing.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6800 ◽  
Author(s):  
Joanna Warwick-Dugdale ◽  
Natalie Solonenko ◽  
Karen Moore ◽  
Lauren Chittick ◽  
Ann C. Gregory ◽  
...  

Marine viruses impact global biogeochemical cycles via their influence on host community structure and function, yet our understanding of viral ecology is constrained by limitations in host culturing and a lack of reference genomes and ‘universal’ gene markers to facilitate community surveys. Short-read viral metagenomic studies have provided clues to viral function and first estimates of global viral gene abundance and distribution, but their assemblies are confounded by populations with high levels of strain evenness and nucleotide diversity (microdiversity), limiting assembly of some of the most abundant viruses on Earth. Such features also challenge assembly across genomic islands containing niche-defining genes that drive ecological speciation. These populations and features may be successfully captured by single-virus genomics and fosmid-based approaches, at least in abundant taxa, but at considerable cost and technical expertise. Here we established a low-cost, low-input, high-throughput alternative sequencing and informatics workflow to improve viral metagenomic assemblies using short-read and long-read technology. The ‘VirION’ (Viral, long-read metagenomics via MinION sequencing) approach was first validated using mock communities, where it was found to be comparably quantitative to short-read methods and provided significant improvements in recovery of viral genomes. We then applied VirION to the first metagenome from a natural viral community from the Western English Channel. In comparison to a short-read-only approach, VirION: (i) increased the number and completeness of assembled viral genomes; (ii) captured abundant, highly microdiverse virus populations; and (iii) captured more and longer genomic islands. Together, these findings suggest that VirION provides a high-throughput and cost-effective alternative to fosmid and single-virus genomic approaches to more comprehensively explore viral communities in nature.

