Targeted Long-Read RNA Sequencing Demonstrates Transcriptional Diversity Driven by Splice-Site Variation in MYBPC3

2019 ◽  
Author(s):  
Alexandra Dainis ◽  
Elizabeth Tseng ◽  
Tyson A. Clark ◽  
Ting Hon ◽  
Matthew Wheeler ◽  
...  

Abstract Background Clinical sequencing has traditionally focused on genomic DNA through the use of targeted panels and exome sequencing, rather than investigating the potential transcriptomic consequences of disease-associated variants. RNA sequencing has recently been shown to be an effective additional tool for identifying disease-causing variants. We here use targeted long-read genome and transcriptome sequencing to efficiently and economically identify molecular consequences of a rare, disease-associated variant in hypertrophic cardiomyopathy (HCM). Methods and Results Our study, which employed both Pacific Biosciences SMRT sequencing and Oxford Nanopore Technologies MinION sequencing, as well as two RNA targeting strategies, identified alternatively-spliced isoforms that resulted from a splice-site variant containing allele in HCM. These included a predicted in-frame exon-skipping event, as well as an abundance of additional isoforms with unexpected intron-inclusion, exon-extension, and pseudo-exon events. The use of long-read RNA sequencing allowed us to not only investigate full length alternatively-spliced transcripts but also to phase them back to the variant-containing allele. Conclusions We suggest that targeted, long-read RNA sequencing in conjunction with genome sequencing may provide additional molecular evidence of disease for rare or de novo variants in cardiovascular disease, as well as providing new information about the consequence of these variants on downstream RNA and protein expression.

Author(s):  
Fairlie Reese ◽  
Ali Mortazavi

Abstract Motivation Long-read RNA-sequencing technologies such as PacBio and Oxford Nanopore have discovered an explosion of new transcript isoforms that are difficult to visually analyze using currently available tools. We introduce the Swan Python library, which is designed to analyze and visualize transcript models. Results Swan finds 4909 differentially expressed transcripts between cell lines HepG2 and HFFc6, including 279 that are differentially expressed even though the parent gene is not. Additionally, Swan discovers 285 reproducible exon skipping and 47 intron retention events not recorded in the GENCODE v29 annotation. Availability and implementation The Swan library for Python 3 is available on PyPi at https://pypi.org/project/swan-vis/ and on GitHub at https://github.com/mortazavilab/swan_vis.


2020 ◽  
Author(s):  
Fairlie Reese ◽  
Ali Mortazavi

Abstract Motivation Long-read RNA-sequencing technologies such as PacBio and Oxford Nanopore have discovered an explosion of new transcript isoforms that are difficult to visually analyze using currently available tools. We introduce the Swan Python library, which is designed to analyze and visualize transcript models. Results Swan finds 4,909 differentially expressed transcripts between cell lines HepG2 and HFFc6, including 279 that are differentially expressed even though the parent gene is not. Additionally, Swan discovers 1,021 reproducible exon skipping and 73 intron retention events not recorded in the GENCODE v29 annotation. Availability The Swan library for Python 3 is available on PyPi and on GitHub at https://pypi.org/project/swan-vis/1.0/ and https://github.com/mortazavilab/swan_paper.


Author(s):  
Huan Zhong ◽  
Zongwei Cai ◽  
Zhu Yang ◽  
Yiji Xia

Abstract NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method uses chemo-enzymatic reactions to label the NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools as a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules: TagSeek, for differentiating tagged and untagged reads, and TagSeqQuant, for quantification and further characterization of genes and isoforms. The pipeline also integrates advanced functions to identify antisense or splicing events, and supports data reformatting for visualization. TagSeqTools therefore provides a convenient and comprehensive workflow for researchers to analyze the data produced by the NAD tagSeq method or other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools, under Apache License 2.0.
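
The idea of separating tagged from untagged reads can be illustrated with a minimal Python sketch. This is not the TagSeek implementation: the tag sequence `TAG`, the search window, the mismatch allowance, and the function names are all hypothetical choices made for illustration, and only substitutions (not indels) are tolerated.

```python
# Hypothetical tag sequence; the real NAD tagSeq tag is not given in the abstract.
TAG = "CCTGAACCTGAA"


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_tagged(read: str, tag: str = TAG, window: int = 40, max_mismatches: int = 2) -> bool:
    """Return True if `tag` occurs within the first `window` bases of `read`,
    allowing up to `max_mismatches` substitutions."""
    head = read[:window]
    for start in range(len(head) - len(tag) + 1):
        if hamming(head[start:start + len(tag)], tag) <= max_mismatches:
            return True
    return False


def split_reads(reads: dict) -> tuple:
    """Partition reads (name -> sequence) into lists of tagged and untagged names."""
    tagged, untagged = [], []
    for name, seq in reads.items():
        (tagged if is_tagged(seq) else untagged).append(name)
    return tagged, untagged
```

A real pipeline would also need to tolerate insertions and deletions, which are common in nanopore reads; an alignment-based search would be a more robust choice than the Hamming distance used here.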


2015 ◽  
Author(s):  
Sara Goodwin ◽  
James Gurtowski ◽  
Scott Ethe-Sayers ◽  
Panchajanya Deshpande ◽  
Michael Schatz ◽  
...  

Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used it to sequence the S. cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr (https://github.com/jgurtowski/nanocorr), specifically for Oxford Nanopore reads, as existing packages were incapable of assembling the long read lengths (5–50 kbp) at such high error rates (between ~5 and 40% error). With this new method we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the assembly has greater than 99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.
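
Several of the abstracts listed here report contig or scaffold N50 values. As a reminder of what that statistic measures, the following Python sketch computes N50 from a list of contig lengths. The function name and the toy lengths are illustrative and are not taken from any of the studies.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length


# Toy example: total length is 290, and the two longest contigs (80 + 70)
# already cover at least half of it.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about correctness, which is why the abstracts pair it with identity or BUSCO figures.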


2018 ◽  
Author(s):  
Stéphane Deschamps ◽  
Yun Zhang ◽  
Victor Llaca ◽  
Liang Ye ◽  
Gregory May ◽  
...  

The advent of long-read sequencing technologies has greatly facilitated assemblies of large eukaryotic genomes. In this paper, Oxford Nanopore sequences generated on a MinION sequencer were combined with BioNano Genomics Direct Label and Stain (DLS) optical maps to generate a chromosome-scale de novo assembly of the repeat-rich Sorghum bicolor Tx430 genome. The final hybrid assembly consists of 29 scaffolds, encompassing in most cases entire chromosome arms. It has a scaffold N50 value of 33.28 Mbp and covers >90% of the expected Sorghum bicolor genome length. A sequence accuracy of 99.67% was obtained in unique regions after aligning contigs against Illumina Tx430 data. Alignments showed that 99.4% of the 34,211 public gene models are present in the assembly, including 94.2% mapping end-to-end. Comparisons of the DLS optical maps against the public Sorghum bicolor v3.0.1 BTx623 genome assembly suggest the presence of substantial genomic rearrangements whose origin remains to be determined.


2018 ◽  
Author(s):  
Haig Djambazian ◽  
Anthony Bayega ◽  
Konstantina T. Tsoumani ◽  
Efthimia Sagri ◽  
Maria-Eleni Gregoriou ◽  
...  

Abstract Long-read sequencing has greatly contributed to the generation of high-quality assemblies, albeit at a high cost. It is also not always clear how to combine sequencing platforms. We sequenced the genome of the olive fruit fly (Bactrocera oleae), the most important pest in the olive fruit agribusiness industry, using Illumina short-reads, mate-pairs, 10x Genomics linked-reads, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT). The 10x linked-reads assembly gave the most contiguous assembly with an N50 of 2.16 Mb. Scaffolding the linked-reads assembly using long-reads from ONT gave a more contiguous assembly with scaffold N50 of 4.59 Mb. We also present the most extensive transcriptome datasets of the olive fly derived from different tissues and stages of development. Finally, we used the Chromosome Quotient method to identify Y-chromosome scaffolds and show that the long-reads based assembly generates a highly contiguous Y-chromosome assembly.

JR is a member of the MinION Access Program (MAP) and has received free-of-charge flow cells and sequencing kits from Oxford Nanopore Technologies for other projects. JR has had no other financial support from ONT. AB has received reimbursement for travel costs associated with attending Nanopore Community meeting 2018, a meeting organized by Oxford Nanopore Technologies.
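
The Chromosome Quotient approach mentioned above is commonly described as comparing, for each sequence, how many reads from females align to it relative to reads from males, with Y-linked sequences expected to show a ratio near zero. The Python sketch below illustrates that idea only. The cutoffs `max_cq` and `min_male_hits`, the function names, and the toy counts are assumptions for illustration and are not values or code from the study.

```python
def chromosome_quotient(female_hits: int, male_hits: int):
    """Female-to-male ratio of read alignments for one scaffold.
    Returns None when there are no male alignments (ratio undefined)."""
    if male_hits == 0:
        return None
    return female_hits / male_hits


def candidate_y_scaffolds(counts, max_cq=0.3, min_male_hits=20):
    """Select scaffolds whose chromosome quotient is at most `max_cq`.

    counts maps scaffold name -> (female_hits, male_hits).
    max_cq and min_male_hits are illustrative cutoffs, not values from the study.
    """
    selected = []
    for name, (female_hits, male_hits) in counts.items():
        if male_hits < min_male_hits:
            continue  # too little male coverage to judge this scaffold
        cq = chromosome_quotient(female_hits, male_hits)
        if cq is not None and cq <= max_cq:
            selected.append(name)
    return selected
```

In practice the alignment counts would need to be normalized for sequencing depth in each sex before the ratio is meaningful; that step is omitted here.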


GigaScience ◽  
2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Lisa K Johnson ◽  
Ruta Sahasrabudhe ◽  
James Anthony Gill ◽  
Jennifer L Roach ◽  
Lutz Froenicke ◽  
...  

Abstract Background Whole-genome sequencing data from wild-caught individuals of closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using long-read Oxford Nanopore Technologies (ONT) PromethION and short-read Illumina platforms. Findings Draft de novo reference genome assemblies were generated using a combination of long and short sequencing reads. For each species, the PromethION platform was used to generate 30–45× sequence coverage, and the Illumina platform was used to generate 50–160× sequence coverage. Illumina-only assemblies were fragmented with high numbers of contigs, while ONT-only assemblies were error prone with low BUSCO scores. The highest N50 values, ranging from 0.4 to 2.7 Mb, were from assemblies generated using a combination of short- and long-read data. BUSCO scores were consistently >90% complete using the Eukaryota database. Conclusions High-quality genomes can be obtained by using short-read Illumina data to polish assemblies generated with long-read ONT data. Draft assemblies and raw sequencing data are available for public use. We encourage use and reuse of these data for assembly benchmarking and other analyses.


2018 ◽  
Author(s):  
Kristoffer Sahlin ◽  
Paul Medvedev

Abstract Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (in order to scale) and makes use of quality values (in order to handle variable error rates). We test isONclust on three simulated and five biological datasets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in terms of overall accuracy, scalability to large datasets, or both. Our tool is available at https://github.com/ksahlin/isONclust.
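
To make the notion of a greedy, quality-aware clustering concrete, here is a deliberately simplified Python sketch. It is not the isONclust algorithm: the k-mer sharing criterion, the ordering by mean base quality, the threshold `min_shared`, and all function names are assumptions chosen to show the general shape of such a method.

```python
def kmers(seq: str, k: int = 5) -> set:
    """Return the set of all k-mers in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mean_quality(quals) -> float:
    """Average per-base Phred quality of a read."""
    return sum(quals) / len(quals)


def greedy_cluster(reads, k: int = 5, min_shared: float = 0.5):
    """Greedily cluster reads given as (name, sequence, per-base qualities).

    Reads are visited from highest to lowest mean quality. Each read joins the
    first cluster whose representative shares at least `min_shared` of the
    read's k-mers; otherwise it founds a new cluster and becomes its
    representative. Each read is handled once and never reassigned.
    """
    ordered = sorted(reads, key=lambda r: mean_quality(r[2]), reverse=True)
    clusters = []  # list of (representative k-mer set, member names)
    for name, seq, _ in ordered:
        read_kmers = kmers(seq, k)
        for rep_kmers, members in clusters:
            if read_kmers and len(read_kmers & rep_kmers) / len(read_kmers) >= min_shared:
                members.append(name)
                break
        else:
            # No existing representative was similar enough.
            clusters.append((read_kmers, [name]))
    return [members for _, members in clusters]
```

Visiting high-quality reads first means cluster representatives tend to be the least error-prone reads, which is one simple way quality values can inform a greedy scheme. Comparing each read against every representative is quadratic in the worst case, so a scalable tool would need an index rather than this linear scan.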

