Discriminating Clonotypes of Influenza A Virus Genes by Nanopore Sequencing

2021 ◽  
Vol 22 (18) ◽  
pp. 10069
Author(s):  
Ying Cao ◽  
Haizhou Liu ◽  
Yi Yan ◽  
Wenjun Liu ◽  
Di Liu ◽  
...  

Influenza viruses still pose a serious threat to humans, and we are not yet able to effectively predict future pandemic strains and prepare vaccines in advance. One of the main reasons is the high genetic diversity of influenza viruses. The individual clonotypes in a virus population remain unknown because some clonotypes make up the majority of the population while others make up only a small fraction of it. First-generation sequencing (FGS) and next-generation sequencing (NGS) technologies have inherent limitations that prevent them from resolving the information of minority clonotypes in the virus population. Third-generation sequencing (TGS) technologies with ultra-long reads have the potential to solve this problem but have a high error rate. Here, we evaluated emerging direct RNA sequencing and cDNA sequencing on the MinION platform and established a novel approach that combines the high accuracy of Illumina sequencing with the long reads of nanopore sequencing to resolve both the variants and the clonotypes of influenza virus. Furthermore, we wrote a new program to eliminate the effect of nanopore sequencing errors on the analysis of the results. Using this pipeline, we identified 47 clonotypes in our experiment. We conclude that this approach can quickly discriminate the clonotypes of virus genes, allowing researchers to understand virus adaptation and evolution at the population level.

2020 ◽  
Vol 36 (12) ◽  
pp. 3669-3679 ◽  
Author(s):  
Can Firtina ◽  
Jeremie S Kim ◽  
Mohammed Alser ◽  
Damla Senol Cali ◽  
A Ercument Cicek ◽  
...  

Abstract Motivation Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates, and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing, or fixing, errors in the assembly using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can either polish an assembly only with reads from a certain sequencing technology or polish only a small assembly. Technology dependency requires researchers to run multiple polishing algorithms to use all available readsets, and assembly-size dependency requires them to split a large genome into small chunks to polish it. Results We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses readsets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly.
Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information Supplementary data are available at Bioinformatics online.
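Step (iii) relies on Viterbi decoding. The sketch below shows only the generic Viterbi recurrence, in log space, on a hypothetical two-state toy HMM ("AT"-rich versus "GC"-rich regions emitting bases). It is not Apollo's pHMM, which has a much larger graph of match, insertion, and deletion states built from the assembly and trained with Forward–Backward; all states and probabilities here are made up for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for obs, computed in log space.

    All probabilities must be nonzero, since math.log(0) raises ValueError.
    """
    # best[t][s]: best log-probability of any path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor state of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model.
states = ("AT", "GC")
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {
    "AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
print(viterbi("AATGGCC", states, start_p, trans_p, emit_p))
```

Working in log space avoids numerical underflow when paths are long, which matters for assembly-length sequences.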


2015 ◽  
Author(s):  
Alexis L. Norris ◽  
Rachael E. Workman ◽  
Yunfan Fan ◽  
James R. Eshleman ◽  
Winston Timp

Despite advances in sequencing, structural variants (SVs) remain difficult to detect reliably due to the short read length (<300 bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long reads (up to 20 kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a concept similar to that of a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes the detection of base substitutions and small frameshift mutations, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring.
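Why a read must straddle a breakpoint can be made concrete with a minimal sketch. When a long read spans an SV, the aligner typically reports it in pieces (for example, a primary plus supplementary alignments), and the geometry of two consecutive pieces reveals the event type. This is not the authors' pipeline: the `Segment` representation, the `classify_junction` function, and the 1,000 bp `min_size` cutoff are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned piece of a long read (hypothetical representation)."""
    chrom: str
    ref_start: int   # 0-based, half-open interval on the reference
    ref_end: int
    read_start: int  # interval on the read, in its sequenced orientation
    read_end: int
    strand: str      # "+" or "-"


def classify_junction(a: Segment, b: Segment, min_size: int = 1000) -> str:
    """Classify the junction between two pieces of one read, where a precedes b on the read."""
    if a.read_start > b.read_start:
        raise ValueError("segments must be ordered by position on the read")
    if a.chrom != b.chrom:
        return "translocation"
    if a.strand != b.strand:
        return "inversion"
    read_gap = b.read_start - a.read_end
    # On the minus strand, moving forward along the read moves backward along the reference.
    ref_gap = b.ref_start - a.ref_end if a.strand == "+" else a.ref_start - b.ref_end
    if ref_gap - read_gap >= min_size:
        return "deletion"   # reference sequence skipped by the read
    if read_gap - ref_gap >= min_size:
        return "insertion"  # extra read sequence (tandem duplications also land here)
    return "none"


# Made-up coordinates: the read jumps over 20 kb of reference.
left = Segment("chrA", 1000, 5000, 0, 4000, "+")
right = Segment("chrA", 25000, 29000, 4000, 8000, "+")
print(classify_junction(left, right))
```

Because each segment here is 4 kb long, both sides of the junction can be anchored uniquely even near repeats, which is the advantage of long reads that the abstract describes.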


Author(s):  
E. S. Gribchenko

The transcriptome profiles of cv. Frisson mycorrhizal roots and inoculated nitrogen-fixing nodules were investigated using Oxford Nanopore sequencing technology. A database of gene isoforms and their expression levels has been created.


2020 ◽  
Author(s):  
Li Hou ◽  
Yadong Wang

Abstract Background In recent years, owing to developments in sequencing technology, long reads have been widely used in many studies, including transcriptomics studies. Long reads have several advantages over short reads, and aligning long reads also differs from aligning short reads. Although many tools can already process long RNA-Seq reads, some problems remain unsolved. Results We developed Deep-Long, a fast and accurate tool for processing long RNA-Seq reads. Deep-Long handles the difficulties that arise from complicated gene structures and sequencing errors, and performs especially well on alternative splicing and small exons. When the sequencing error rate is low, Deep-Long rapidly produces more accurate results. As the sequencing error rate rises, Deep-Long takes more time but remains faster and more accurate than most other tools. Conclusions Deep-Long is a useful tool for aligning long RNA-Seq reads to a genome, and it can find more exons and splice junctions.
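Once a long RNA-Seq read is aligned to a genome, exons and introns can be read off its alignment record. A minimal sketch, assuming the alignment is reported in SAM style, where the CIGAR operation `N` marks a reference region skipped by the read (an intron) and `M`, `D`, `N`, `=`, `X` consume reference bases. This is not Deep-Long's method; the function name `exons_and_introns` is hypothetical, and coordinates are 0-based and half-open.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def exons_and_introns(pos: int, cigar: str) -> tuple[list[tuple[int, int]], list[tuple[int, int]]]:
    """Split a spliced alignment starting at reference position pos into exon and intron intervals."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    exons, introns = [], []
    ref = block_start = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            # A skipped region closes the current exon block and opens an intron.
            if ref > block_start:
                exons.append((block_start, ref))
            introns.append((ref, ref + length))
            ref += length
            block_start = ref
        elif op in REF_CONSUMING:
            ref += length  # deletions stay inside the exon block
    if ref > block_start:
        exons.append((block_start, ref))
    return exons, introns


print(exons_and_introns(100, "5S50M1000N30M2D20M"))
```

Small exons show up here as short blocks between two `N` operations; a single misaligned base near a splice site can shift such a block, which is one reason splice-aware aligners put effort into exact junction placement.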


2019 ◽  
Vol 24 (3) ◽  
Author(s):  
Larisa V Gubareva ◽  
Vasiliy P Mishin ◽  
Mira C Patel ◽  
Anton Chesnokov ◽  
Ha T Nguyen ◽  
...  

The anti-influenza therapeutic baloxavir targets the cap-dependent endonuclease activity of the polymerase acidic (PA) protein. We monitored baloxavir susceptibility in the United States with next-generation sequencing analysis supplemented by a phenotypic one-cycle infection assay. Analysis of PA sequences of 6,891 influenza A and B viruses collected during the 2016/17 and 2017/18 seasons revealed the following amino acid substitutions, which conferred 4–10-fold reduced susceptibility to baloxavir: I38L (two A(H1N1)pdm09 viruses), E23G (two A(H1N1)pdm09 viruses) and I38M (one A(H3N2) virus).
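The substitution notation used above (reference residue, position, new residue, as in I38L) and the fold-change figures can be illustrated with a small sketch. It assumes pre-aligned, equal-length protein sequences whose reference has no alignment gaps, and it assumes susceptibility is summarized as a 50% inhibitory concentration (IC50); the function names and the toy sequence are hypothetical, not the surveillance pipeline used in the study.

```python
def find_substitutions(reference: str, query: str, positions: list[int]) -> list[str]:
    """Report substitutions such as 'I38L' at 1-based positions of two aligned protein sequences.

    Assumes the reference contains no gaps, so alignment columns equal residue numbers.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    found = []
    for pos in positions:
        ref_aa, query_aa = reference[pos - 1], query[pos - 1]
        if query_aa != ref_aa and query_aa != "-":
            found.append(f"{ref_aa}{pos}{query_aa}")
    return found


def fold_change(ic50_test: float, ic50_reference: float) -> float:
    """Fold-reduced susceptibility, assumed here to be test IC50 / reference IC50."""
    return ic50_test / ic50_reference


# Toy 40-residue sequence (not real PA): E at position 23, I at position 38.
reference = "M" * 22 + "E" + "M" * 14 + "I" + "MM"
query = reference[:37] + "L" + reference[38:]
print(find_substitutions(reference, query, [23, 38]))
```

Under this assumption, a virus whose IC50 is 4 to 10 times that of a reference virus would fall in the 4–10-fold range reported above.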


2021 ◽  
Author(s):  
Yelena Chernyavskaya ◽  
Xiaofei Zhang ◽  
Jinze Liu ◽  
Jessica S. Blackburn

Nanopore sequencing technology has revolutionized the field of genome biology with its ability to generate extra-long reads that can resolve regions of the genome that were previously inaccessible to short-read sequencing platforms. Although long-read sequencing has been used to resolve several vertebrate genomes, a nanopore-based zebrafish assembly has not yet been released. Over 50% of the zebrafish genome consists of difficult-to-map, highly repetitive, low-complexity elements that pose inherent problems for short-read sequencers and assemblers. We used nanopore sequencing to improve upon and resolve the issues plaguing the current zebrafish reference assembly (GRCz11). Our long-read assembly improved the current resolution of the reference genome by identifying 1,697 novel insertions and deletions over 1 kb in length and placing 106 previously unlocalized scaffolds. We also discovered additional sites of retrotransposon integration previously unreported in GRCz11 and observed their expression in adult zebrafish under physiologic conditions, implying that they are actively mobile in the zebrafish genome and contribute to the ever-changing genomic landscape.


2019 ◽  
Vol 48 (1) ◽  
pp. 290-303 ◽  
Author(s):  
Christopher E Ellison ◽  
Weihuan Cao

Abstract Illumina sequencing has allowed for population-level surveys of transposable element (TE) polymorphism via split alignment approaches, which has provided important insight into the population dynamics of TEs. However, such approaches are not able to identify insertions of uncharacterized TEs, nor can they assemble the full sequence of inserted elements. Here, we use nanopore sequencing and Hi-C scaffolding to produce de novo genome assemblies for two wild strains of Drosophila melanogaster from the Drosophila Genetic Reference Panel (DGRP). Ovarian piRNA populations and Illumina split-read TE insertion profiles have been previously produced for both strains. We find that nanopore sequencing with Hi-C scaffolding produces highly contiguous, chromosome-length scaffolds, and we identify hundreds of TE insertions that were missed by Illumina-based methods, including a novel micropia-like element that has recently invaded the DGRP population. We also find hundreds of piRNA-producing loci that are specific to each strain. Some of these loci are created by strain-specific TE insertions, while others appear to be epigenetically controlled. Our results suggest that Illumina approaches reveal only a portion of the repetitive sequence landscape of eukaryotic genomes and that population-level resequencing using long reads is likely to provide novel insight into the evolutionary dynamics of repetitive elements.


Biosensors ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 47 ◽  
Author(s):  
Samantha J. Courtney ◽  
Zachary R. Stromberg ◽  
Jessica Z. Kubicek-Sutherland

Influenza virus poses a threat to global health by causing seasonal outbreaks as well as three pandemics in the 20th century. In humans, disease is primarily caused by influenza A and B viruses, while influenza C virus causes mild disease mostly in children. Influenza D is an emerging virus found in cattle and pigs. To mitigate the morbidity and mortality associated with influenza, rapid and accurate diagnostic tests need to be deployed. However, the high genetic diversity displayed by influenza viruses presents a challenge to the development of a robust diagnostic test. Nucleic acid-based tests are more accurate than rapid antigen tests for influenza and are therefore better candidates for both diagnostic and surveillance applications. Here, we review various nucleic acid-based techniques that have been applied to the detection of influenza viruses in order to evaluate their utility as both diagnostic and surveillance tools. We discuss both traditional and novel methods for detecting influenza viruses, covering techniques that require nucleic acid amplification as well as those that detect viral RNA directly, and we compare the advantages and limitations of each method. There has been substantial progress in the development of nucleic acid-based sensing techniques for the detection of influenza virus. However, there is still an urgent need for a rapid and reliable influenza diagnostic test that can be used at the point of care in order to enhance responsiveness to both seasonal and pandemic influenza outbreaks.


2022 ◽  
Author(s):  
David Pellow ◽  
Abhinav Dutta ◽  
Ron Shamir

As sequencing datasets keep growing larger, the time and memory efficiency of read mapping are becoming more critical. Many clever algorithms and data structures have been used to develop mapping tools for next-generation sequencing and, in the last few years, also for third-generation long reads. A key idea in mapping algorithms is to sketch sequences with their minimizers. Recently, syncmers were introduced as an alternative sketching method that is more robust to mutations and sequencing errors. Here we introduce parameterized syncmer schemes and provide a theoretical analysis of multi-parameter schemes. By combining these schemes with downsampling or minimizers, we can achieve any desired compression and window guarantee. We incorporated syncmer schemes into the popular minimap2 and Winnowmap2 mappers. In tests on simulated and real long-read data from a variety of genomes, the syncmer-based algorithms reduced unmapped reads by 20-60% at high compression while using less memory. The advantage of syncmer-based mapping was even more pronounced at lower sequence identity. At a sequence identity of 65-75% and medium compression, syncmer mappers had 50-60% fewer unmapped reads, and about 10% fewer of the mapped reads were incorrectly mapped. We conclude that syncmer schemes improve mapping under higher error and mutation rates. This situation arises, for example, when the high error rate of long reads is compounded by a high mutation rate in a cancer tumor, or by differences between strains of viruses or bacteria.
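The contrast between minimizers and syncmers can be sketched in a few lines. This is a simplified illustration, not the implementation added to minimap2 or Winnowmap2: it orders s-mers and k-mers lexicographically, whereas real mappers use a pseudo-random hash order, and the function names and the `offsets` parameter are hypothetical. A k-mer is selected as a parameterized syncmer when the position of its smallest s-mer falls in a chosen set of offsets; offsets {0, k - s} give closed syncmers and {0} gives open syncmers.

```python
def smallest_smer_offset(kmer: str, s: int) -> int:
    """Offset of the smallest s-mer inside a k-mer (leftmost on ties, lexicographic order)."""
    return min(range(len(kmer) - s + 1), key=lambda i: kmer[i:i + s])


def syncmer_positions(seq: str, k: int, s: int, offsets: set[int]) -> list[int]:
    """Start positions of k-mers whose smallest s-mer lies at one of the given offsets.

    The decision depends only on the k-mer itself, not on neighbouring k-mers.
    """
    return [
        i for i in range(len(seq) - k + 1)
        if smallest_smer_offset(seq[i:i + k], s) in offsets
    ]


def minimizer_positions(seq: str, k: int, w: int) -> list[int]:
    """Start positions of the smallest k-mer in every window of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        chosen.add(min(range(start, start + w), key=lambda i: kmers[i]))
    return sorted(chosen)


print(syncmer_positions("GATTACA", k=4, s=2, offsets={0, 2}))  # closed syncmers
print(minimizer_positions("CATG", k=2, w=2))
```

The design difference is visible in the code: a minimizer is chosen relative to its window, so a mutation can change selections for k-mers that do not contain it, while syncmer selection is context-free and only k-mers overlapping the mutation can change. Minimizers, in turn, guarantee at least one selected k-mer per window, which plain syncmers do not; combining the two, as the abstract describes, is one way to obtain both properties.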

