Long-fragment targeted capture for long read sequencing of plastomes

2018 ◽  
Author(s):  
Kevin Bethune ◽  
Cédric Mariac ◽  
Marie Couderc ◽  
Nora Scarcelli ◽  
Sylvian Santoni ◽  
...  

Third-generation sequencing methods generate significantly longer reads than other sequencing methods, which opens new possibilities for studying biodiversity, phylogeography and population genetics. We developed a protocol for in-solution hybridization capture enrichment of long DNA fragments, applicable to complete chloroplast genomes. The protocol uses cost-effective in-house probes produced by long-range PCR and was applied to six non-model monocot species (Poaceae: African rice, pearl millet, fonio; and three palm species). DNA was extracted from fresh and silica gel-dried leaves. The protocol successfully captured long chloroplast fragments (median read length up to 4264 bp), with enrichment rates ranging from 15% to 98%. DNA extracted from silica gel-dried leaves yielded lower-quality plastome assemblies than DNA extracted from fresh leaves. The protocol could also be generalized to capture long sequences from specific nuclear fragments.


2021 ◽  
Author(s):  
Zhe Weng ◽  
Fengying Ruan ◽  
Weitian Chen ◽  
Zhe Xie ◽  
Yeming Xie ◽  
...  

Epigenetic modifications of histones are essential marks related to development and disease pathogenesis, including human cancers. Mapping histone modifications has emerged as a widely used tool for studying epigenetic regulation. However, existing approaches are limited by fragmentation and short-read sequencing: they cannot provide information about long-range chromatin states, and they report only the average chromatin status across a sample. We leveraged long read sequencing to develop a method, "BIND&MODIFY", for profiling histone modifications on individual DNA fibers. Our approach is based on the recombinant fused protein A-EcoGII, which tethers the methyltransferase EcoGII to protein binding sites and locally labels the neighboring DNA regions through artificial methylation. We demonstrate that the aggregated BIND&MODIFY signal matches bulk-level ChIP-seq and CUT&TAG, observe heterogeneous histone modification status at the single-molecule level, and quantify the correlation between distal elements. This method could become an essential tool in the coming era of third-generation sequencing.


2021 ◽  
Author(s):  
Jean-Pierre Kocher ◽  
Zachary Stephens ◽  
Daniel O'Brien ◽  
Mrunal Dehankar ◽  
Lewis Roberts ◽  
...  

The integration of viruses into the human genome is known to be associated with tumorigenesis in many cancers, but the accurate detection of integration breakpoints from short read sequencing data is made difficult by human-viral homologies, viral genome heterogeneity, coverage limitations, and other factors. To address this, we present Exogene, a sensitive and efficient workflow for detecting viral integrations from paired-end next generation sequencing data. Exogene's read filtering and breakpoint detection strategies yield integration coordinates that are highly concordant with those found in long read validation sets. We demonstrate this concordance across 6 TCGA Hepatocellular carcinoma (HCC) tumor samples, identifying integrations of hepatitis B virus that are validated by long reads. Additionally, we applied Exogene to targeted capture data from 426 previously studied HCC samples, achieving 98.9% concordance with existing methods and identifying 238 high-confidence integrations that were not previously reported. Exogene is applicable to multiple types of paired-end sequence data, including genome, exome, RNA-Seq or targeted capture.


2020 ◽  
Vol 71 (18) ◽  
pp. 5313-5322 ◽  
Author(s):  
Kathryn Dumschott ◽  
Maximilian H-W Schmidt ◽  
Harmeet Singh Chawla ◽  
Rod Snowdon ◽  
Björn Usadel

DNA sequencing was dominated by Sanger’s chain termination method until the mid-2000s, when it was progressively supplanted by new sequencing technologies that can generate much larger quantities of data in a shorter time. At the forefront of these developments, long-read sequencing technologies (third-generation sequencing) can produce reads that are several kilobases in length. This greatly improves the accuracy of genome assemblies by spanning the highly repetitive segments that cause difficulty for second-generation short-read technologies. Third-generation sequencing is especially appealing for plant genomes, which can be extremely large with long stretches of highly repetitive DNA. Until recently, the low basecalling accuracy of third-generation technologies meant that accurate genome assembly required expensive, high-coverage sequencing followed by computational analysis to correct for errors. However, today’s long-read technologies are more accurate and less expensive, making them the method of choice for the assembly of complex genomes. Oxford Nanopore Technologies (ONT), a third-generation platform for the sequencing of native DNA strands, is particularly suitable for the generation of high-quality assemblies of highly repetitive plant genomes. Here we discuss the benefits of ONT, especially for the plant science community, and describe the issues that remain to be addressed when using ONT for plant genome sequencing.


2020 ◽  
Vol 10 (4) ◽  
pp. 1193-1196
Author(s):  
Yoshinori Fukasawa ◽  
Luca Ermini ◽  
Hai Wang ◽  
Karen Carty ◽  
Min-Sin Cheung

We propose LongQC as an easy and automated quality control tool for genomic datasets generated by third generation sequencing (TGS) technologies such as Oxford Nanopore Technologies (ONT) and SMRT sequencing from Pacific Biosciences (PacBio). Key statistics were optimized for long read data, and LongQC covers all major TGS platforms. LongQC processes and visualizes those statistics automatically and quickly.


2020 ◽  
Author(s):  
Luise Schulte ◽  
Nadine Bernhardt ◽  
Kathleen Stoof-Leichsenring ◽  
Heike Zimmermann ◽  
Luidmila Pestryakova ◽  
...  

Siberian larch (Larix Mill.) forests dominate vast areas of northern Russia and contribute important ecosystem services to the earth. To predict future responses of these forests to a changing climate, it is also important to understand the past dynamics of larch populations. One well-preserved archive for studying past vegetation changes is sedimentary ancient DNA (sedaDNA) extracted from lake sediment cores. We studied a lake sediment core covering 6700 calibrated years BP, from the Taymyr region in northern Siberia. To enrich the sedaDNA for DNA of our focal species Larix, we combine shotgun sequencing and hybridization capture with long-range PCR-generated baits covering the complete Larix chloroplast genome. In comparison to shotgun sequencing, hybridization capture results in an increase of taxonomically classified reads by several orders of magnitude and the recovery of near-complete chloroplast genomes of Larix. Variation in the chloroplast reads confirms an invasion of Larix gmelinii into the range of Larix sibirica before 6700 years ago. In this time span, both species can be detected at the site, although larch populations have decreased from a forested area to a single-tree tundra at present. This study demonstrates for the first time that hybridization capture applied to ancient DNA from lake sediments can provide genome-scale information and is a viable tool for studying past changes of a specific taxon.


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0250915
Author(s):  
Zachary Stephens ◽  
Daniel O’Brien ◽  
Mrunal Dehankar ◽  
Lewis R. Roberts ◽  
Ravishankar K. Iyer ◽  
...  

The integration of viruses into the human genome is known to be associated with tumorigenesis in many cancers, but the accurate detection of integration breakpoints from short read sequencing data is made difficult by human-viral homologies, viral genome heterogeneity, coverage limitations, and other factors. To address this, we present Exogene, a sensitive and efficient workflow for detecting viral integrations from paired-end next generation sequencing data. Exogene’s read filtering and breakpoint detection strategies yield integration coordinates that are highly concordant with long read validation. We demonstrate this concordance across 6 TCGA Hepatocellular carcinoma (HCC) tumor samples, identifying integrations of hepatitis B virus that are also supported by long reads. Additionally, we applied Exogene to targeted capture data from 426 previously studied HCC samples, achieving 98.9% concordance with existing methods and identifying 238 high-confidence integrations that were not previously reported. Exogene is applicable to multiple types of paired-end sequence data, including genome, exome, RNA-Seq and targeted capture.


2019 ◽  
Author(s):  
Laura H. Tung ◽  
Mingfu Shao ◽  
Carl Kingsford

Third-generation sequencing technologies benefit transcriptome analysis by generating longer sequencing reads. However, not all single-molecule long reads represent full transcripts due to incomplete cDNA synthesis and the sequencing length limit of the platform. This drives a need for long read transcript assembly. We quantify the benefit that can be achieved by using a transcript assembler on long reads. Adding long-read-specific algorithms, we evolved Scallop to make Scallop-LR, a long-read transcript assembler, to handle the computational challenges arising from long read lengths and high error rates. Analyzing 26 SRA PacBio datasets using Scallop-LR, Iso-Seq Analysis, and StringTie, we quantified the amount by which assembly improved Iso-Seq results. Through combined evaluation methods, we found that Scallop-LR identifies 2100–4000 more (for 18 human datasets) or 1100–2200 more (for eight mouse datasets) known transcripts than Iso-Seq Analysis, which does not do assembly. Further, Scallop-LR finds 2.4–4.4 times more potentially novel isoforms than Iso-Seq Analysis for the human and mouse datasets. StringTie also identifies more transcripts than Iso-Seq Analysis. Adding long-read-specific optimizations in Scallop-LR increases the numbers of predicted known transcripts and potentially novel isoforms for the human transcriptome compared to several recent short-read assemblers (e.g. StringTie). Our findings indicate that transcript assembly by Scallop-LR can reveal a more complete human transcriptome.


2020 ◽  
Author(s):  
Jose M. Haro-Moreno ◽  
Mario López-Pérez ◽  
Francisco Rodríguez-Valera

Background: Third-generation sequencing has made little headway in metagenomics because of its high error rate and the dependence of assembly on bioinformatics tools designed for short reads. However, second-generation sequencing metagenomics (mostly Illumina) suffers from limitations, particularly in assembling microbes with high microdiversity and in retrieving the flexible (adaptive) compartment of prokaryotic genomes. Results: Here we have used different third-generation techniques to study the metagenome of a well-known marine sample from the mixed epipelagic water column of the winter Mediterranean. We have compared the latest-generation Oxford Nanopore and PacBio technologies with the classical approach using Illumina short reads followed by assembly. PacBio Sequel II CCS appears particularly suitable for cellular metagenomics due to its low error rate. Long reads allow efficient direct retrieval of complete genes (473M/Tb) and operons before assembly, facilitating annotation and compensating for the limitations of short reads or short-read assemblies. MetaSPAdes was the most appropriate assembly program when used in combination with short reads. The assemblies of the long reads also allow the reconstruction of much more complete metagenome-assembled genomes (MAGs), even from microbes with high microdiversity. The flexible genome of reconstructed MAGs is much more complete and allows rescuing more adaptive genes. Conclusions: For most applications of metagenomics, from community structure analysis to ecosystem functioning, long reads should be applied whenever possible. Particularly for in-silico screening of biotechnologically useful genes, or population genomics, long-read metagenomics currently appears to be a very fruitful approach and can be used from raw reads, before a computing-demanding (and potentially artefactual) assembly step.


2020 ◽  
Vol 103 (1) ◽  
pp. 273-293
Author(s):  
Leho Tedersoo ◽  
Sten Anslan ◽  
Mohammad Bahram ◽  
Urmas Kõljalg ◽  
Kessy Abarenkov
