CODEC enables 'single duplex' sequencing

2021 ◽  
Author(s):  
Jin H. Bae ◽  
Ruolin Liu ◽  
Erica Nguyen ◽  
Justin Rhoades ◽  
Timothy Blewett ◽  
...  

Detecting mutations as rare as a single molecule is crucial in many fields, such as cancer diagnostics and aging research, but remains challenging. Third generation sequencers can read a double-stranded DNA molecule (a 'single duplex') in whole to distinguish true mutations present on both strands from false mutations on either strand, but with limited accuracy and throughput. Although next generation sequencing (NGS) can track dissociated strands with Duplex Sequencing, the need to sequence each strand independently severely diminishes its throughput. Here, we developed a hybrid method called Concatenating Original Duplex for Error Correction (CODEC) that combines the massively parallel nature of NGS with the single-molecule capability of third generation sequencing. CODEC physically links both strands to enable NGS to sequence a single duplex with a single read pair. By comparing CODEC and Duplex Sequencing, we showed that CODEC achieved a similar error rate (10⁻⁶) with 100 times fewer reads and conferred 'single duplex' resolution to most major NGS workflows.
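The duplex principle described above (a true mutation appears on both strands, an error on only one) can be illustrated with a minimal Python sketch. This is not the CODEC pipeline: the function names are hypothetical, and the sketch assumes the two strand reads are already the same length and aligned, with no indels.

```python
# Illustrative sketch only: names and the no-indel assumption are ours,
# not taken from the CODEC paper.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def duplex_consensus(top: str, bottom: str) -> str:
    """Combine reads of the two strands of one duplex.

    `top` and `bottom` are each given 5'->3' on their own strand.
    Positions where the strands disagree are masked with 'N'.
    """
    bottom_as_top = reverse_complement(bottom)
    if len(top) != len(bottom_as_top):
        raise ValueError("strand reads must have equal length in this sketch")
    return "".join(t if t == b else "N" for t, b in zip(top, bottom_as_top))


def call_mutations(reference: str, consensus: str) -> list[tuple[int, str, str]]:
    """List (position, ref_base, alt_base) where both strands support a change."""
    return [
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, consensus))
        if alt != "N" and alt != ref
    ]


reference = "ACGTAC"
top_read = "ACTTGC"      # G>T at index 2 (real), A>G at index 4 (top-strand error)
bottom_read = "GTAAGT"   # reverse complement of ACTTAC: carries only the real change
consensus = duplex_consensus(top_read, bottom_read)
mutations = call_mutations(reference, consensus)
```

Here the single-strand error at index 4 is masked in the consensus, while the change at index 2, supported by both strands, is reported.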

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xiaoying Fan ◽  
Cheng Yang ◽  
Wen Li ◽  
Xiuzhen Bai ◽  
Xin Zhou ◽  
...  

Abstract: There is no effective way to detect structural variations (SVs) and extra-chromosomal circular DNAs (ecDNAs) at the single-cell whole-genome level. Here, we develop a novel third-generation sequencing platform-based single-cell whole-genome sequencing (scWGS) method named SMOOTH-seq (single-molecule real-time sequencing of long fragments amplified through transposon insertion). We evaluate the method for detecting CNVs, SVs, and SNVs in human cancer cell lines and a colorectal cancer sample and show that SMOOTH-seq reliably and effectively detects SVs and ecDNAs in individual cells, but shows relatively limited accuracy in detection of CNVs and SNVs. SMOOTH-seq opens a new chapter in scWGS as it generates high-fidelity reads that are kilobases long.


2016 ◽  
Author(s):  
Hayan Lee ◽  
James Gurtowski ◽  
Shinjae Yoo ◽  
Maria Nattestad ◽  
Shoshana Marcus ◽  
...  

Abstract: Third-generation long-range DNA sequencing and mapping technologies are creating a renaissance in high-quality genome sequencing. Unlike second-generation sequencing, which produces short reads a few hundred base-pairs long, third-generation single-molecule technologies generate over 10,000 bp reads or map over 100,000 bp molecules. We analyze how increased read lengths can be used to address longstanding problems in de novo genome assembly, structural variation analysis and haplotype phasing.


Author(s):  
P.D.N. HEBERT ◽  
T.W.A. BRAUKMANN ◽  
S.W.J. PROSSER ◽  
S. RATNASINGHAM ◽  
...  

2020 ◽  
Vol 15 ◽  
Author(s):  
Hongdong Li ◽  
Wenjing Zhang ◽  
Yuwen Luo ◽  
Jianxin Wang

Aims: Accurately detect isoforms from third generation sequencing data. Background: Transcriptome annotation is the basis for the analysis of gene expression and regulation. The transcriptome annotation of many organisms, including humans, is far from complete, due partly to the challenge of identifying isoforms that are produced from the same gene through alternative splicing. Third generation sequencing (TGS) reads provide an unprecedented opportunity for detecting isoforms because their length exceeds that of most isoforms. One limitation of current TGS read-based isoform detection methods is that they rely exclusively on sequence reads, without incorporating the sequence information of known isoforms. Objective: Develop an efficient method for isoform detection. Method: Based on annotated isoforms, we propose a splice isoform detection method called IsoDetect. First, the sequence at each exon-exon junction is extracted from annotated isoforms as the “short feature sequence”, which is used to distinguish different splice isoforms. Second, we align these feature sequences to long reads and divide the long reads into groups that contain the same set of feature sequences, thereby avoiding pair-wise comparison among the large number of long reads. Third, clustering and consensus generation are carried out based on sequence similarity. For long reads that do not contain any short feature sequence, clustering analysis based on sequence similarity is performed to identify isoforms. Result: Tested on two datasets from Calypte anna and zebra finch, IsoDetect showed higher speed and compelling accuracy compared with four existing methods. Conclusion: IsoDetect is a promising method for isoform detection. Other: This paper was accepted by the CBC2019 conference.
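The first two steps of the method (extracting junction feature sequences, then grouping long reads by the set of features they contain) can be sketched roughly as follows. This is an illustration under loud assumptions, not IsoDetect's implementation: exact substring matching stands in for the alignment step, and the function names, the `flank` parameter, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def junction_features(exons_by_isoform: dict[str, list[str]], flank: int = 3) -> dict[str, set[str]]:
    """Map each exon-exon junction sequence to the isoforms that contain it.

    A junction feature is the last `flank` bases of one exon joined to the
    first `flank` bases of the next exon (hypothetical definition).
    """
    features: dict[str, set[str]] = {}
    for isoform, exons in exons_by_isoform.items():
        for left, right in zip(exons, exons[1:]):
            feature = left[-flank:] + right[:flank]
            features.setdefault(feature, set()).add(isoform)
    return features


def group_reads(reads: list[str], features: dict[str, set[str]]) -> dict[frozenset, list[str]]:
    """Group reads by the exact set of junction features each one contains.

    Exact substring search replaces alignment here (an assumption). Reads
    with no feature fall under the empty set and would be clustered
    separately by sequence similarity.
    """
    groups: dict[frozenset, list[str]] = defaultdict(list)
    for read in reads:
        key = frozenset(f for f in features if f in read)
        groups[key].append(read)
    return dict(groups)


# Toy annotation: iso2 skips the middle exon of iso1.
annotation = {
    "iso1": ["AAACCC", "GGGTTT", "ACACAC"],
    "iso2": ["AAACCC", "ACACAC"],
}
features = junction_features(annotation)
reads = ["AAACCCGGGTTTACACAC", "AAACCCACACAC", "ACCCGGGTTTACAC", "GGGGGG"]
groups = group_reads(reads, features)
```

Because reads are bucketed by feature set, similarity-based clustering only needs to run within each bucket rather than across all read pairs.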


Cancers ◽  
2021 ◽  
Vol 13 (15) ◽  
pp. 3827
Author(s):  
Jae Young Hur ◽  
Kye Young Lee

Extracellular vesicles (EVs) carry RNA, proteins, lipids, and diverse biomolecules for intercellular communication. Recent studies have reported that EVs contain double-stranded DNA (dsDNA) and oncogenic mutant DNA. The advantage of EV-derived DNA (EV DNA) over cell-free DNA (cfDNA) is the stability achieved through the encapsulation in the lipid bilayer of EVs, which protects EV DNA from degradation by external factors. The existence of DNA and its stability make EVs a useful source of biomarkers. However, fundamental research on EV DNA remains limited, and many aspects of EV DNA are poorly understood. This review examines the known characteristics of EV DNA, biogenesis of DNA-containing EVs, methylation, and next-generation sequencing (NGS) analysis using EV DNA for biomarker detection. On the basis of this knowledge, this review explores how EV DNA can be incorporated into diagnosis and prognosis in clinical settings, as well as gene transfer of EV DNA and its therapeutic potential.


2020 ◽  
Vol 36 (12) ◽  
pp. 3669-3679 ◽  
Author(s):  
Can Firtina ◽  
Jeremie S Kim ◽  
Mohammed Alser ◽  
Damla Senol Cali ◽  
A Ercument Cicek ◽  
...  

Abstract. Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information: Supplementary data are available at Bioinformatics online.
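To make the decoding step concrete, the following is a minimal sketch of Viterbi decoding on a toy two-state HMM. It is not Apollo's profile HMM, which has per-position match, insertion and deletion states and is trained with Forward–Backward. The state names, the probabilities, and the `viterbi` function are assumptions chosen only to illustrate how the most likely state path is recovered.

```python
import math

# Toy model (made-up numbers): a GC-rich and an AT-rich hidden state.
STATES = ("GC", "AT")
START_P = {"GC": 0.5, "AT": 0.5}
TRANS_P = {
    "GC": {"GC": 0.9, "AT": 0.1},
    "AT": {"GC": 0.1, "AT": 0.9},
}
EMIT_P = {
    "GC": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for `obs` (log space)."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    # Score of the best path ending in each state after the first symbol.
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor state for arriving in state s.
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            score[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][symbol])
            pointers[s] = best
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


decoded = viterbi("GGCAATT", STATES, START_P, TRANS_P, EMIT_P)
```

With these toy parameters the decoded path switches once, from the GC-rich state to the AT-rich state after the third base. In a polishing setting the decoded path through a trained pHMM would instead spell out the corrected assembly sequence.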

