Long single-molecule reads can resolve the complexity of the Influenza virus composed of rare, closely related mutant variants

2016 ◽  
Author(s):  
Alexander Artyomenko ◽  
Nicholas C Wu ◽  
Serghei Mangul ◽  
Eleazar Eskin ◽  
Ren Sun ◽  
...  

Abstract As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous “swarm” of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct a heterogeneous viral population composed of rare, related mutant variants. In this paper, we present 2SNV, a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. 2SNV uses linkage between single nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing a titrated level of known viral mutant variants. Our method is able to accurately reconstruct a clone with a frequency of 0.2% and distinguish clones that differ in only two nucleotides distantly located on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. The open source implementation of 2SNV is freely available for download at http://alan.cs.gsu.edu/NGS/?q=content/2snv
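The idea of using linkage between variant positions to separate true variants from read errors can be illustrated with a minimal sketch. This is not the 2SNV implementation: the function name `linked_pairs`, the thresholds `min_ratio` and `min_count`, and the simple observed-versus-expected comparison are all assumptions made for illustration. The sketch assumes reads are already aligned to a consensus of equal length, and it flags position pairs whose non-consensus alleles co-occur far more often than independent errors would predict.

```python
from itertools import combinations


def linked_pairs(reads, consensus, min_ratio=5.0, min_count=3):
    """Return (i, j, count) for position pairs with linked non-consensus alleles.

    reads: list of strings, each aligned to and as long as `consensus`.
    min_ratio, min_count: hypothetical thresholds, not taken from 2SNV.
    """
    n = len(reads)
    length = len(consensus)
    # For every position, remember which reads differ from the consensus there.
    mismatching = [set() for _ in range(length)]
    for idx, read in enumerate(reads):
        for pos, (base, ref) in enumerate(zip(read, consensus)):
            if base != ref:
                mismatching[pos].add(idx)

    linked = []
    for i, j in combinations(range(length), 2):
        both = len(mismatching[i] & mismatching[j])
        if both < min_count:
            continue
        # Expected co-occurrence if mismatches at i and j were independent errors.
        expected = len(mismatching[i]) * len(mismatching[j]) / n
        if both >= min_ratio * expected:
            linked.append((i, j, both))
    return linked
```

Independent errors at two positions rarely fall on the same read, so a pair that co-occurs well above the independence expectation is a candidate for a real variant carrying both changes. A real tool would replace the fixed ratio with a proper statistical test.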

2016 ◽  
Author(s):  
Serghei Mangul ◽  
Harry (Taegyun) Yang ◽  
Farhad Hormozdiari ◽  
Elizabeth Tseng ◽  
Alex Zelikovsky ◽  
...  

Abstract Sequencing of RNA provides the possibility to study an individual’s transcriptome landscape and determine allelic expression ratios. Single-molecule protocols generate multi-kilobase reads longer than most transcripts, allowing sequencing of complete haplotype isoforms. This allows partitioning the reads into two parental haplotypes. While the read length of the single-molecule protocols is long, the relatively high error rate limits the ability to accurately detect the genetic variants and assemble them into the haplotype-specific isoforms. In this paper, we present HapIso (Haplotype-specific Isoform Reconstruction), a method able to tolerate the relatively high error rate of the single-molecule platform and partition the isoform reads into the parental alleles. Phasing the reads according to the allele of origin allows our method to efficiently distinguish between the read errors and the true biological mutations. HapIso uses a k-means clustering algorithm aiming to group the reads into two meaningful clusters, maximizing the similarity of the reads within a cluster and minimizing the similarity of the reads from different clusters. Each cluster corresponds to a parental haplotype. We use family pedigree information to evaluate our approach. Experimental validation suggests that HapIso is able to tolerate the relatively high error rate and accurately partition the reads into the parental alleles of the isoform transcripts. Furthermore, our method is the first method able to reconstruct the haplotype-specific isoforms from long single-molecule reads. The open source Python implementation of HapIso is freely available for download at https://github.com/smangul1/HapIso/
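The clustering step described above can be sketched as a plain two-cluster k-means over reads encoded as 0/1 allele vectors. This is an illustrative sketch only, not the HapIso code: the function name `two_means`, the 0/1 encoding (0 for the reference allele, 1 for the alternate allele at each heterozygous site), and the deterministic initialization are assumptions.

```python
def two_means(reads, max_iter=50):
    """Partition 0/1 allele vectors into two clusters (candidate haplotypes).

    reads: list of equal-length lists of 0/1 values (hypothetical encoding).
    Returns (labels, haplotypes), where haplotypes are the rounded centers.
    """
    def dist(a, b):
        # Squared Euclidean distance; equals Hamming distance for 0/1 vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Deterministic start: the first read and the read farthest from it.
    first = list(reads[0])
    farthest = list(max(reads, key=lambda r: dist(r, first)))
    centers = [first, farthest]

    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if dist(r, centers[0]) <= dist(r, centers[1]) else 1
            for r in reads
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [r for r, lab in zip(reads, labels) if lab == k]
            if members:
                # New center: per-site mean of the cluster's reads.
                centers[k] = [sum(col) / len(members) for col in zip(*members)]

    # Majority allele per site gives each cluster's consensus haplotype.
    haplotypes = [[int(v >= 0.5) for v in c] for c in centers]
    return labels, haplotypes
```

Because each site's center value is the fraction of reads in the cluster carrying the alternate allele, isolated read errors are outvoted when the center is rounded to a consensus haplotype.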


2019 ◽  
Vol 21 (6) ◽  
pp. 1971-1986 ◽  
Author(s):  
Matteo Chiara ◽  
Federico Zambelli ◽  
Ernesto Picardi ◽  
David S Horner ◽  
Graziano Pesole

Abstract A number of studies have reported the successful application of single-molecule sequencing technologies to the determination of the size and sequence of pathological expanded microsatellite repeats over the last 5 years. However, different custom bioinformatics pipelines were employed in each study, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for the determination of repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest, but statistically significant, increase in the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods tested here, irrespective of the strategy used for the analysis of the data (either based on the alignment or assembly of the reads), show high levels of sensitivity in both the detection of expanded tandem repeats and the estimation of the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.


2002 ◽  
Vol 35 (2) ◽  
pp. 169-200 ◽  
Author(s):  
Lilian T. C. França ◽  
Emanuel Carrilho ◽  
Tarso B. L. Kist

1. Summary
2. Introduction
3. Sanger's method and other enzymic methods
3.1 Random approach
3.2 Direct approach
3.3 Enzyme technology
3.4 Sample preparation
3.5 Labels and DNA labelling
3.5.1 Radioisotopes
3.5.2 Chemiluminescent detection
3.5.3 Fluorescent dyes
3.6 Fragment separation and analysis
3.6.1 Electrophoresis
3.6.2 Mass spectrometry – an alternative
4. Maxam & Gilbert and other chemical methods
5. Pyrosequencing – DNA sequencing in real time by the detection of released PPi
6. Single molecule sequencing with exonuclease
7. Conclusion
8. Acknowledgements
9. References

The four best known DNA sequencing techniques are reviewed. Important practical issues covered are read-length, speed, accuracy, throughput, cost, as well as the automation of sample handling and preparation. The methods reviewed are: (i) the Sanger method and its most important variants (enzymic methods); (ii) the Maxam & Gilbert method and other chemical methods; (iii) the Pyrosequencing™ method – DNA sequencing in real time by the detection of released pyrophosphate (PPi); and (iv) single molecule sequencing with exonuclease (exonuclease digestion of a single molecule composed of a single strand of fluorescently labelled deoxynucleotides). Each method is briefly described, the current literature is covered, and the advantages, disadvantages, and most suitable applications of each method are discussed.


2018 ◽  
Vol 1 (4) ◽  
pp. e00086
Author(s):  
S.P. Radko ◽  
L.K. Kurbatov ◽  
K.G. Ptitsyn ◽  
Y.Y. Kiseleva ◽  
E.A. Ponomarenko ◽  
...  

Transcriptome profiling is widely employed to analyze transcriptome dynamics when studying various biological processes at the cell and tissue levels. Unlike the second generation sequencers, which sequence relatively short fragments of nucleic acids, the third generation DNA/RNA sequencers developed by the biotechnology companies “PacBio” and “Oxford Nanopore Technologies” allow one to sequence transcripts as single molecules and may be considered as potential molecular counters capable of measuring the number of copies of each transcript with high throughput, sensitivity, and specificity. In the present review, the features of single molecule sequencing technologies offered by “PacBio” and “Oxford Nanopore Technologies” are considered alongside their utility for transcriptome analysis, including the analysis of transcript isoforms. The prospects and limitations of the single molecule sequencing technology in application to quantitative transcriptome profiling are also discussed.


2019 ◽  
Author(s):  
Kuo-Ping Chiu ◽  
Alice L. Yu

Background. Whether and how microorganisms can live harmoniously with normal cells in the circulatory system is an important question. Answers to it will have an enormous impact on medical microbiology. To address it, it is essential to identify and characterize blood-borne microbes in an efficient and comprehensive manner. Methodology. Traditional approaches using PCR or microarrays are not suitable for this purpose because of the complexity of the circulatory system's microbial content and the large number of unknown microbial species in it. Recent reports indicate that cell-free DNA (cfDNA) sequencing using advanced sequencing technologies, including next-generation sequencing (NGS) and single-molecule sequencing (SMS), together with associated bioinformatics approaches, has strong potential to address these issues at the molecular level. Results. Multiple studies using microbial cfDNA sequencing to identify microbes in septic patients have shown strong agreement with cell culture. Similar approaches have also been applied to reveal previously unidentified microorganisms or to demonstrate the feasibility of comprehensive assessment of blood-borne microorganisms in healthy and/or diseased individuals. Single-molecule sequencing (SMS) using either SMRT (single-molecule real-time) sequencing or Nanopore sequencing is providing new momentum to this line of investigation. Conclusions. Microbial cfDNA sequencing provides a novel opportunity to further understand the involvement of blood-borne microbes in the development of diseases. Similar approaches should also be applicable to metagenomics studies for comprehensive analysis of microbial species isolated from various environments.
This article reviews this line of research and discusses the methodological approaches that have been developed, or are likely to be developed in the future, which may have strong potential to facilitate cfDNA- and cfRNA-based studies of cancer and chronic diseases, in the hope that a better understanding of the hidden microbes in the circulatory system would improve the accuracy of diagnosis, prevention, and treatment of problematic diseases.


2019 ◽  
Author(s):  
Sam Kovaka ◽  
Aleksey V. Zimin ◽  
Geo M. Pertea ◽  
Roham Razaghi ◽  
Steven L. Salzberg ◽  
...  

Abstract RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate. It also offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of assemblies. On 33 short-read datasets from humans and two plant species, StringTie2 is 47.3% more precise and 3.9% more sensitive than Scallop. On multiple long read datasets, StringTie2 on average correctly assembles 8.3 and 2.6 times as many transcripts as FLAIR and Traphlor, respectively, with substantially higher precision. StringTie2 is also faster and has a smaller memory footprint than all comparable tools.


2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Juliane C Dohm ◽  
Philipp Peters ◽  
Nancy Stralis-Pavese ◽  
Heinz Himmelbauer

Abstract Third-generation sequencing technologies provided by Pacific Biosciences and Oxford Nanopore Technologies generate read lengths in the scale of kilobasepairs. However, these reads display high error rates, and correction steps are necessary to realize their great potential in genomics and transcriptomics. Here, we compare properties of PacBio and Nanopore data and assess correction methods by Canu, MARVEL and proovread in various combinations. We found total error rates of around 13% in the raw datasets. PacBio reads showed a high rate of insertions (around 8%) whereas Nanopore reads showed similar rates for substitutions, insertions and deletions of around 4% each. In data from both technologies the errors were uniformly distributed along reads apart from noisy 5′ ends, and homopolymers appeared among the most over-represented kmers relative to a reference. Consensus correction using read overlaps reduced error rates to about 1% when using Canu or MARVEL after patching. The lowest error rate in Nanopore data (0.45%) was achieved by applying proovread on MARVEL-patched data including Illumina short-reads, and the lowest error rate in PacBio data (0.42%) was the result of Canu correction with minimap2 alignment after patching. Our study provides valuable insights and benchmarks regarding long-read data and correction methods.
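Per-type error rates such as the substitution, insertion and deletion figures quoted above are typically derived from read-to-reference alignments. The following minimal sketch shows one way to do the arithmetic. It is not the procedure used in the study: the function name `error_rates`, the gapped-string input format (`-` marks a gap), and the choice of alignment columns as the denominator are assumptions, and other conventions (for example, dividing by read length) give different numbers.

```python
def error_rates(aligned_read, aligned_ref):
    """Compute substitution, insertion and deletion rates from one alignment.

    aligned_read, aligned_ref: equal-length strings where '-' marks a gap.
    Rates are fractions of alignment columns (an assumed convention).
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")

    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    columns = 0
    for read_base, ref_base in zip(aligned_read, aligned_ref):
        if read_base == "-" and ref_base == "-":
            continue  # empty column, carries no information
        columns += 1
        if ref_base == "-":
            counts["insertion"] += 1      # extra base in the read
        elif read_base == "-":
            counts["deletion"] += 1       # base missing from the read
        elif read_base != ref_base:
            counts["substitution"] += 1   # mismatching base

    if columns == 0:
        raise ValueError("alignment has no informative columns")
    rates = {kind: n / columns for kind, n in counts.items()}
    rates["total"] = sum(counts.values()) / columns
    return rates
```

For example, `error_rates("ACGTTA-CGA", "ACG-TAGCTA")` counts one insertion, one deletion and one substitution over ten alignment columns.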


2020 ◽  
Vol 15 (2) ◽  
pp. 165-172
Author(s):  
Chaithra Pradeep ◽  
Dharam Nandan ◽  
Arya A. Das ◽  
Dinesh Velayutham

Background: The standard approach for transcriptomic profiling involves high throughput short-read sequencing technology, mainly dominated by Illumina. However, short reads have limitations in transcriptome assembly and in obtaining full-length transcripts due to the complex nature of transcriptomes with variable length and multiple alternatively spliced isoforms. Recent advances in long read sequencing by Oxford Nanopore Technologies (ONT) offer both cDNA and direct RNA sequencing and have brought a paradigm change in sequencing technology, greatly improving assembly and expression estimates. ONT enables molecules to be sequenced without fragmentation, resulting in ultra-long read lengths that enable entire genes and transcripts to be fully characterized. The direct RNA sequencing method, in addition, circumvents the reverse transcription and amplification steps. Objective: In this study, RNA sequencing methods were assessed by comparing data from Illumina (ILM), ONT cDNA (OCD) and ONT direct RNA (ODR). Methods: The sensitivity and specificity of isoform detection were determined from the data generated by Illumina, ONT cDNA and ONT direct RNA sequencing technologies using Saccharomyces cerevisiae as a model. Comparative studies were conducted with two pipelines to detect the isoforms, novel genes and variable gene length. Results: Mapping metrics and qualitative profiles for different pipelines are presented to understand these disruptive technologies. The variability in sequencing technology and the analysis pipeline was studied.


2019 ◽  
Author(s):  
Ruibang Luo ◽  
Chak-Lim Wong ◽  
Yat-Sing Wong ◽  
Chi-Ian Tang ◽  
Chi-Man Liu ◽  
...  

Abstract Single-molecule sequencing technologies have emerged in recent years and revolutionized structural variant calling, complex genome assembly, and epigenetic mark detection. However, the lack of a highly accurate small variant caller has limited the new technologies from being more widely used. In this study, we present Clair, the successor to Clairvoyante, a program for fast and accurate germline small variant calling, using single molecule sequencing data. For ONT data, Clair achieves the best precision, recall and speed as compared to several competing programs, including Clairvoyante, Longshot and Medaka. Through studying the missed variants and benchmarking intentionally overfitted models, we found that Clair may be approaching the limit of possible accuracy for germline small variant calling using pileup data and deep neural networks. Clair requires only a conventional CPU for variant calling and is an open source project available at https://github.com/HKU-BAL/Clair.

