Sequoia: an interactive visual analytics platform for interpretation and feature extraction from nanopore sequencing datasets

Background: Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that distinguish these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions: Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The findings gleaned through visual analysis should help users develop rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.
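To make the idea of clustering reads by electric-current similarity concrete, here is a minimal sketch of one possible similarity measure. The abstract does not say which measure Sequoia uses; dynamic time warping (DTW) is an assumption chosen here because it tolerates differences in how long a signal dwells at each current level. The function names and the toy signal values are hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D signals.

    Uses the absolute difference as the local cost and the classic
    O(len(a) * len(b)) dynamic program.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def pairwise_distances(signals):
    """Symmetric matrix of DTW distances between all pairs of signals."""
    k = len(signals)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = dtw_distance(signals[i], signals[j])
    return dist


# Toy current traces (made-up values, not real nanopore data).
signals = [
    [80.0, 82.0, 95.0, 94.0, 70.0],
    [80.0, 82.0, 82.0, 95.0, 94.0, 70.0],  # same shape, one level held longer
    [60.0, 61.0, 110.0, 108.0, 90.0],      # different shape
]
dist = pairwise_distances(signals)
```

A distance matrix of this kind could then be passed to a dimensionality reduction or clustering method of the user's choice; which method and parameters work best is exactly what the interactive tuning described above is meant to explore.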

TV-MV Analytics: A visual analytics framework to explore time-varying multivariate data

Information Visualization ◽

10.1177/1473871619858937 ◽

2019 ◽

Vol 19 (1) ◽

pp. 3-23

Author(s):

Aurea Soriano-Vargas ◽

Bernd Hamann ◽

Maria Cristina F de Oliveira

Keyword(s):

Visual Analytics ◽

Visual Analysis ◽

Multivariate Data ◽

Visual Exploration ◽

Data Sets ◽

Time Varying ◽

Domain Experts ◽

Data Mining Algorithms ◽

Temporal Relationships ◽

Visualization Techniques

We present an integrated interactive framework for the visual analysis of time-varying multivariate data sets. As part of our research, we performed in-depth studies concerning the applicability of visualization techniques to obtain valuable insights. We consolidated the considered analysis and visualization methods in one framework, called TV-MV Analytics. TV-MV Analytics effectively combines visualization and data mining algorithms, providing the following capabilities: (1) visual exploration of multivariate data at different temporal scales, and (2) a hierarchical small multiples visualization combined with interactive clustering and multidimensional projection to detect temporal relationships in the data. We demonstrate the value of our framework for specific scenarios by studying three use cases that were validated and discussed with domain experts.

A Transposon Story: From TE Content to TE Dynamic Invasion of Drosophila Genomes Using the Single-Molecule Sequencing Technology from Oxford Nanopore

Cells ◽

10.3390/cells9081776 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1776

Author(s):

Mourdas Mohamed ◽

Nguyet Thi-Minh Dang ◽

Yuki Ogyama ◽

Nelly Burlet ◽

Bruno Mugat ◽

...

Keyword(s):

Single Molecule ◽

Wild Type ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

In The Wild ◽

Successive Generations ◽

Type Strains

Transposable elements (TEs) are the main components of genomes. However, due to their repetitive nature, they are very difficult to study using data obtained with short-read sequencing technologies. Here, we describe an efficient pipeline to accurately recover TE insertion (TEI) sites and sequences from long reads obtained by Oxford Nanopore Technology (ONT) sequencing. With this pipeline, we could precisely describe the landscapes of the most recent TEIs in wild-type strains of Drosophila melanogaster and Drosophila simulans. Their comparison suggests that this subset of TE sequences is more similar than previously thought in these two species. The chromosome assemblies obtained using this pipeline also allowed recovering piRNA cluster sequences, which was impossible using short-read sequencing. Finally, we used our pipeline to analyze ONT sequencing data from a D. melanogaster unstable line in which LTR transposition was derepressed for 73 successive generations. We could rely on single reads to identify new insertions with intact target site duplications. Moreover, the detailed analysis of TEIs in the wild-type strains and the unstable line did not support the trap model claiming that piRNA clusters are hotspots of TE insertions.

Evaluation of Germline Structural Variant Calling Methods for Nanopore Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.761791 ◽

2021 ◽

Vol 12 ◽

Author(s):

Davide Bolognini ◽

Alberto Magi

Keyword(s):

Variant Calling ◽

Research Report ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Factors Affecting ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Sequencing Studies ◽

Long Read

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.
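As an illustration of how precision and recall of an SV callset can be computed against a truth set, the following is a minimal sketch. It is not the EViNCe pipeline: the matching rule (same chromosome, same SV type, breakpoint within a fixed tolerance), the function names, the default tolerance, and the toy records are all assumptions made for this example.

```python
def count_true_positives(truth, calls, tolerance=50):
    """Count calls that match a truth SV.

    Each record is a (chromosome, position, sv_type) tuple. A call matches
    a truth SV when chromosome and type agree and the positions differ by
    at most `tolerance` bases. Each truth SV can be matched only once.
    """
    unmatched = list(truth)
    true_positives = 0
    for chrom, pos, sv_type in calls:
        for candidate in unmatched:
            same_site = candidate[0] == chrom and candidate[2] == sv_type
            if same_site and abs(candidate[1] - pos) <= tolerance:
                unmatched.remove(candidate)  # safe: we stop iterating right after
                true_positives += 1
                break
    return true_positives


def precision_recall_f1(truth, calls, tolerance=50):
    """Return (precision, recall, F1) of `calls` with respect to `truth`."""
    tp = count_true_positives(truth, calls, tolerance)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical coordinates).
truth = [("chr1", 1000, "DEL"), ("chr1", 5000, "INS"),
         ("chr2", 300, "DEL"), ("chr2", 9000, "DUP")]
calls = [("chr1", 1010, "DEL"), ("chr1", 5200, "INS"), ("chr2", 290, "DEL")]

precision, recall, f1 = precision_recall_f1(truth, calls)
```

In this toy case two of the three calls fall within the tolerance of a truth SV, so precision is 2/3 and recall is 2/4. Real benchmarks typically also compare SV length and genotype, which this sketch omits.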

Cas9-Assisted Targeting of CHromosome segments (CATCH) for targeted nanopore sequencing and optical genome mapping

10.1101/110163 ◽

2017 ◽

Cited By ~ 5

Author(s):

Tslil Gabrieli ◽

Hila Sharim ◽

Yael Michaeli ◽

Yuval Ebenstein

Keyword(s):

Single Molecule ◽

Genome Mapping ◽

Single Point ◽

Read Length ◽

Whole Genome ◽

Sequencing Analysis ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Whole Genome Analysis ◽

Long Read

Variations in the genetic code, from single point mutations to large structural or copy number alterations, influence susceptibility, onset, and progression of genetic diseases and tumor transformation. Next-generation sequencing analysis is unable to reliably capture aberrations larger than the typical sequencing read length of several hundred bases. Long-read, single-molecule sequencing methods such as SMRT and nanopore sequencing can address larger variations, but require costly whole genome analysis. Here we describe a method for isolation and enrichment of a large genomic region of interest for targeted analysis based on Cas9 excision of two sites flanking the target region and isolation of the excised DNA segment by pulsed field gel electrophoresis. The isolated target remains intact and is ideally suited for optical genome mapping and long-read sequencing at high coverage. In addition, analysis is performed directly on native genomic DNA that retains genetic and epigenetic composition without amplification bias. This method enables detection of mutations and structural variants as well as detailed analysis by generation of hybrid scaffolds composed of optical maps and sequencing data at a fraction of the cost of whole genome sequencing.

Critical assessment of bioinformatics methods for the characterization of pathological repeat expansions with single-molecule sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbz099 ◽

2019 ◽

Vol 21 (6) ◽

pp. 1971-1986 ◽

Cited By ~ 1

Author(s):

Matteo Chiara ◽

Federico Zambelli ◽

Ernesto Picardi ◽

David S Horner ◽

Graziano Pesole

Keyword(s):

Single Molecule ◽

Tandem Repeats ◽

Simulated Data ◽

Detailed Comparison ◽

Sequencing Data ◽

Single Molecule Sequencing ◽

Sequencing Technologies ◽

Repeat Expansions

Over the last 5 years, a number of studies have reported the successful application of single-molecule sequencing technologies to the determination of the size and sequence of pathological expanded microsatellite repeats. However, different custom bioinformatics pipelines were employed in each study, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for the determination of repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest, but statistically significant, increase of the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods tested here, irrespective of the strategy used for the analysis of the data (either based on the alignment or assembly of the reads), show high levels of sensitivity in both the detection of expanded tandem repeats and the estimation of the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.

Perspectives and benefits of high-throughput long-read sequencing in microbial ecology

Applied and Environmental Microbiology ◽

10.1128/aem.00626-21 ◽

2021 ◽

Author(s):

Leho Tedersoo ◽

Mads Albertsen ◽

Sten Anslan ◽

Benjamin Callahan

Keyword(s):

Microbial Ecology ◽

High Throughput ◽

Single Molecule ◽

High Throughput Sequencing ◽

Environmental DNA ◽

Nanopore Sequencing ◽

High Quality ◽

Short Read ◽

Sequencing Technologies ◽

Long Read

Short-read, high-throughput sequencing (HTS) methods have yielded numerous important insights into microbial ecology and function. Yet, in many instances short-read HTS techniques are suboptimal, for example by providing insufficient phylogenetic resolution or low integrity of assembled genomes. Single-molecule and synthetic long-read (SLR) HTS methods have successfully ameliorated these limitations. In addition, nanopore sequencing has generated a number of unique analysis opportunities such as rapid molecular diagnostics and direct RNA sequencing, and both PacBio and nanopore sequencing support detection of epigenetic modifications. Although long-read sequencing technologies initially suffered from relatively low sequence quality, recent advances have greatly improved their accuracy. In spite of great technological progress in recent years, the long-read HTS methods (PacBio and nanopore sequencing) are still relatively costly, require large amounts of high-quality starting material, and commonly need specific solutions in various analysis steps. Despite these challenges, long-read sequencing technologies offer high-quality, cutting-edge alternatives for testing hypotheses about microbiome structure and functioning as well as assembly of eukaryote genomes from complex environmental DNA samples.

Super resolution imaging of a distinct chromatin loop in human lymphoblastoid cells

10.1101/621920 ◽

2019 ◽

Author(s):

Jacqueline Jufen Zhu ◽

Zofia Parteka ◽

Byoungkoo Lee ◽

Przemyslaw Szalaj ◽

Ping Wang ◽

...

Keyword(s):

Single Molecule ◽

Genome Structure ◽

Three Dimensional ◽

Super Resolution ◽

Chromatin Loop ◽

Imaging Data ◽

Sequencing Data ◽

Cellular Functions ◽

Sequencing Technologies ◽

Chromatin Folding

The three-dimensional genome structure plays a fundamental role in gene regulation and cellular functions. Recent genomics studies based on sequencing technologies inferred the most basic functional chromatin folding structures of the genome, known as chromatin loops: long-range chromatin interactions that are often mediated by protein factors. To visualize the looping structure of chromatin, we applied super-resolution microscopy (iPALM) to image a specific chromatin loop in GM12878 cells. In total, we generated six images of the target chromatin region at single-molecule resolution. To infer the chromatin structures from the captured images, we modeled them as looping conformations using different computational algorithms and then evaluated the models by comparing them with Hi-C data to examine the concordance. The results showed a good correlation between the imaging data and sequencing data, suggesting that higher-order chromatin structures of very short genomic segments can be visualized by microscopic imaging.

DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation

Genome Biology ◽

10.1186/s13059-021-02510-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yang Liu ◽

Wojciech Rosikiewicz ◽

Ziwei Pan ◽

Nathaniel Jillette ◽

Ping Wang ◽

...

Keyword(s):

DNA Methylation ◽

Single Molecule ◽

Evaluation Criteria ◽

Systematic Evaluation ◽

Whole Genome ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Long Read ◽

Genome Scale ◽

Analytical Tools

Background: Nanopore long-read sequencing technology greatly expands the capacity of long-range, single-molecule DNA-modification detection. A growing number of analytical tools have been developed to detect DNA methylation from nanopore sequencing reads. Here, we assess the performance of different methylation-calling tools to provide a systematic evaluation to guide researchers performing human epigenome-wide studies. Results: We compare seven analytic tools for detecting DNA methylation from nanopore long-read sequencing data generated from human natural DNA at a whole-genome scale. We evaluate the per-read and per-site performance of CpG methylation prediction across different genomic contexts, CpG site coverage, and computational resources consumed by each tool. The seven tools exhibit different performances across the evaluation criteria. We show that methylation prediction at regions with discordant DNA methylation patterns, intergenic regions, low CG density regions, and repetitive regions shows room for improvement across all tools. Furthermore, we demonstrate that 5hmC levels at least partly contribute to the discrepancy between bisulfite and nanopore sequencing. Lastly, we provide an online DNA methylation database (https://nanome.jax.org) to display the DNA methylation levels detected by nanopore sequencing and bisulfite sequencing data across different genomic contexts. Conclusions: Our study is the first systematic benchmark of computational methods for detection of mammalian whole-genome DNA modifications in nanopore sequencing. We provide a broad foundation for cross-platform standardization and an evaluation of analytical tools designed for genome-scale modified base detection using nanopore sequencing.
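The distinction between per-read and per-site evaluation can be illustrated with a small sketch that aggregates per-read methylation calls into per-site methylation frequencies. This is not taken from any of the benchmarked tools: the record layout, the function name, and the minimum-coverage default are assumptions made for this example.

```python
from collections import defaultdict


def per_site_frequency(read_calls, min_coverage=3):
    """Aggregate per-read calls into per-site methylation frequencies.

    `read_calls` is an iterable of (read_id, chromosome, position,
    is_methylated) tuples. Sites covered by fewer than `min_coverage`
    reads are dropped, since their frequency estimate is unreliable.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, total reads]
    for _read_id, chrom, pos, is_methylated in read_calls:
        entry = counts[(chrom, pos)]
        entry[0] += int(is_methylated)
        entry[1] += 1
    return {
        site: methylated / total
        for site, (methylated, total) in counts.items()
        if total >= min_coverage
    }


# Toy per-read calls (hypothetical reads and coordinates).
read_calls = [
    ("read1", "chr1", 100, True),
    ("read2", "chr1", 100, True),
    ("read3", "chr1", 100, False),
    ("read4", "chr1", 100, True),
    ("read1", "chr1", 250, False),
    ("read2", "chr1", 250, True),
]
site_freq = per_site_frequency(read_calls)
```

Here the site at position 100 is covered by four reads, three of them methylated, giving a frequency of 0.75; the site at position 250 has only two reads and is filtered out. Per-site frequencies of this kind are what would be compared against bisulfite sequencing levels, while per-read evaluation scores each individual call.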

A single chromosome assembly of Bacteroides fragilis strain BE1 from Illumina and MinION nanopore sequencing data

10.1101/024323 ◽

2015 ◽

Cited By ~ 2

Author(s):

Judith Risse ◽

Marian Thomson ◽

Garry Blakely ◽

Georgios Koutsovoulos ◽

Mark Blaxter ◽

...

Keyword(s):

Single Molecule ◽

Bacteroides Fragilis ◽

Illumina MiSeq ◽

Sequencing Data ◽

Single Chromosome ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Oxford Nanopore ◽

Bacterial Genomics ◽

Commodity Computing

Background: Second and third generation sequencing technologies have revolutionised bacterial genomics. Short Illumina reads result in cheap but fragmented assemblies, whereas longer reads are more expensive but result in more complete genomes. The Oxford Nanopore MinION device is a revolutionary mobile sequencer that can produce thousands of long, single molecule reads. Results: We sequenced Bacteroides fragilis strain BE1 using both the Illumina MiSeq and Oxford Nanopore MinION platforms. We were able to assemble a single chromosome of 5.18 Mb, with no gaps, using publicly available software and commodity computing hardware. We identified gene rearrangements and the state of invertible promoters in the strain. Conclusions: The single chromosome assembly of Bacteroides fragilis strain BE1 was achieved using only modest amounts of data, publicly available software and commodity computing hardware. This combination of technologies offers the possibility of ultra-cheap, high quality, finished bacterial genomes.

Visual analysis of contagion in networks

Information Visualization ◽

10.1177/1473871613487087 ◽

2013 ◽

Vol 14 (2) ◽

pp. 93-110 ◽

Cited By ~ 12

Author(s):

Tatiana von Landesberger ◽

Simon Diel ◽

Sebastian Bremm ◽

Dieter W Fellner

Keyword(s):

Visual Analytics ◽

Visual Analysis ◽

Visual Exploration ◽

Financial Networks ◽

Analysis Techniques ◽

Contagion Effects ◽

Diffusion Prediction ◽

Contagion Process

Contagion is a process whereby the collapse of a node in a network leads to the collapse of neighboring nodes and thereby sets off a chain reaction in the network. It thus creates a special type of time-dependent network. Such processes are studied in various applications, for example, in financial network analysis, infection diffusion prediction, supply-chain management, or gene regulation. Visual analytics methods can help analysts examine contagion effects. For this purpose, network visualizations need to be complemented with specific features to illustrate the contagion process. Moreover, new visual analysis techniques for comparison of contagion need to be developed. In this paper, we propose a system geared to the visual analysis of contagion. It includes the simulation of contagion effects as well as their visual exploration. We present new tools able to compare the evolution of the different contagion processes. In this way, propagation of disturbances can be effectively analyzed. We focus on financial networks; however, our system can be applied to other use cases as well.
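The contagion process described above can be sketched as a simple round-based simulation. The abstract does not specify the collapse rule used by the system; the fractional-threshold rule below (a node collapses once a given fraction of its neighbors has collapsed), the function name, and the toy network are assumptions made for illustration.

```python
def simulate_contagion(neighbors, seed, threshold=0.5):
    """Simulate a threshold contagion on an undirected network.

    `neighbors` maps each node to a list of adjacent nodes. Starting from
    the collapsed `seed` node, a node collapses in a given round when the
    fraction of its neighbors that had collapsed by the end of the previous
    round is at least `threshold`. Returns {node: round_of_collapse}.
    """
    collapsed = {seed: 0}
    round_no = 0
    while True:
        round_no += 1
        # Decide all collapses for this round from the previous state only.
        newly_collapsed = [
            node
            for node, nbrs in neighbors.items()
            if node not in collapsed
            and nbrs
            and sum(n in collapsed for n in nbrs) / len(nbrs) >= threshold
        ]
        if not newly_collapsed:
            return collapsed
        for node in newly_collapsed:
            collapsed[node] = round_no


# Toy network (hypothetical): A is connected to B and C, which both connect
# to D; D leads on to E, and E to F.
network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}
cascade = simulate_contagion(network, seed="A", threshold=0.5)
contained = simulate_contagion(network, seed="A", threshold=0.6)
```

Recording the round in which each node collapses yields the time-dependent view of the network that the abstract refers to. With a threshold of 0.5 the collapse of A spreads through the whole toy network, whereas with 0.6 it stays confined to A, which is the kind of difference between contagion processes that a comparison view would need to show.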
