Image-based representation of massive spatial transcriptomics datasets

We present STIM, an imaging-based computational framework for exploring, visualizing, and processing high-throughput spatial sequencing datasets. STIM is built on the ImgLib2, N5, and BigDataViewer (BDV) frameworks, enabling the transfer of computer vision techniques to datasets with irregular measurement spacing and arbitrary spatial resolution, such as spatial transcriptomics data generated by multiplexed targeted hybridization or spatial sequencing technologies. We illustrate STIM's capabilities by representing, visualizing, and automatically registering publicly available spatial sequencing data from 14 serial sections of mouse brain tissue.
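
STIM itself is built on the Java libraries named above, and the sketch below neither uses nor reproduces its API. It is a minimal Python illustration, under our own assumptions, of the core idea of treating irregularly spaced measurements as an image: each pixel takes a Gaussian-weighted average of the expression values measured at nearby spots. The function name `rasterize`, the `sigma` parameter, and the 3-sigma cutoff window are hypothetical choices, and spot coordinates are assumed to be expressed in pixel units already.

```python
import math


def rasterize(points, values, width, height, sigma=1.0):
    """Render irregularly spaced (x, y) measurements onto a height x width grid.

    Each pixel receives the Gaussian-weighted average of the values measured
    at spots inside a square window of half-width ceil(3 * sigma) around it;
    pixels with no spot in that window are left at 0.0.
    Coordinates are assumed to be in pixel units already.
    """
    radius = math.ceil(3 * sigma)  # ignore spots farther than ~3 sigma
    grid = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            num = den = 0.0
            for (x, y), value in zip(points, values):
                dx, dy = col - x, row - y
                if abs(dx) > radius or abs(dy) > radius:
                    continue
                weight = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
                num += weight * value
                den += weight
            if den > 0:
                grid[row][col] = num / den
    return grid
```

Because each pixel is a weighted average rather than a sum, rendered intensity does not depend on how densely spots were sampled. Changing `sigma` trades spatial detail for smoothness, which is one way to render the same measurements at an arbitrary resolution.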

Advancing clinical genomics and precision medicine with GVViZ: FAIR bioinformatics platform for variable gene-disease annotation, visualization, and expression analysis

Human Genomics

10.1186/s40246-021-00336-1

2021

Vol 15 (1)

Author(s):

Zeeshan Ahmed

Eduard Gibert Renart

Saman Zeeshan

XinQi Dong

Keyword(s):

Data Analysis

Patient Care

Expression Analysis

High Throughput

Gene Annotation

Next Generation Sequencing Data

RNA-Seq

Sequencing Data

Complex Disorders

Transcriptomics Data

Abstract. Background: Genetic predisposition is considered critical for identifying subjects at high risk of developing disease. Investigating disease-causing genes, as well as highly and lowly expressed genes, can help uncover the root causes of uncertainties in patient care. However, independent and timely analysis of high-throughput next-generation sequencing data is still a challenge for non-computational biologists and geneticists. Results: In this manuscript, we present a findable, accessible, interoperable, and reusable (FAIR) bioinformatics platform, GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform database application for RNA-seq-driven annotation and expression analysis of variable and complex gene-disease data, with dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. GVViZ is operated through a set of simple instructions that users without a computational background can follow to design and perform customized data analyses. It can integrate patients’ transcriptomics data with public, proprietary, and our in-house gene-disease databases to query, explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analyses for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders.
The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions: We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. Clinical experts and life-science researchers can use GVViZ to visualize and interpret transcriptomics data, making it a powerful tool for studying the dynamics of gene expression and regulation. Furthermore, if successfully deployed in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.
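
The abstract does not describe how GVViZ joins expression data with its gene-disease databases. As a rough, hypothetical sketch of that kind of integration, the Python function below merges a per-gene expression table keyed by Ensembl ID with a gene-to-disease lookup and writes CSV rows whose columns are modeled on the exported fields listed in the abstract. The function name, column names, and input layout are assumptions for illustration, not GVViZ's schema.

```python
import csv
import io


def annotate_expression(expression, gene_disease):
    """Join per-gene abundances with gene-disease annotations; return CSV text.

    expression: dict mapping Ensembl ID -> (gene name, abundance, transcript length)
    gene_disease: dict mapping gene name -> list of disease labels
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["gene_name", "ensembl_id", "abundance", "transcript_length", "diseases"])
    # Highest abundance first, so strongly expressed genes lead the table.
    ranked = sorted(expression.items(), key=lambda item: item[1][1], reverse=True)
    for ensg, (name, abundance, length) in ranked:
        diseases = gene_disease.get(name, [])
        if not diseases:
            continue  # keep only genes with at least one annotated disease
        writer.writerow([name, ensg, abundance, length, ";".join(diseases)])
    return out.getvalue()
```

A heat map would then be drawn from the abundance column of the genes that survive the join.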

Gigantic Montages with a Fully Automated FE-SEM (Serial Sections of a Mouse Brain Tissue)

Microscopy and Microanalysis

10.1017/s143192761005628x

2010

Vol 16 (S2)

pp. 52-53

Cited By ~ 5

Author(s):

K Ogura

M Yamada

O Hirahara

M Mita

N Erdman

...

Keyword(s):

Brain Tissue

Mouse Brain

Serial Sections

Extended abstract of a paper presented at Microscopy and Microanalysis 2010 in Portland, Oregon, USA, August 1 – August 5, 2010.

ChiTaH: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data

NAR Genomics and Bioinformatics

10.1093/nargab/lqab112

2021

Vol 3 (4)

Author(s):

Rajesh Detroja

Alessandro Gorohovski

Olawumi Giwa

Gideon Baum

Milana Frenkel-Morgenstern

Keyword(s):

High Throughput

High Throughput Sequencing

Complex Disease

Single Cells

Reference Database

Sequencing Data

Sequencing Technologies

High Throughput Sequencing Data

Chimeric RNAs

Sensitivity Specificity

Abstract: Fusion genes, or chimeras, typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, testing of over 20 computational methods showed them to be limited in chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first ‘reference-based’ approach, termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods for identifying human chimeras, using both simulated and real sequencing datasets. ChiTaH was the most accurate and fastest method for identifying known human chimeras in both simulated and real sequencing datasets. Moreover, ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single-cell samples of the K-562 cell line, which was confirmed experimentally.
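
The abstract states only that ChiTaH maps reads against a database of known chimeras. The Python sketch below illustrates, under simplifying assumptions, what makes a read a junction read in that reference-based setting: it must match the chimeric reference sequence and cover at least `min_overhang` bases on each side of the fusion junction. Exact substring search stands in for a real read aligner (no mismatches, no reverse complements), and the names `junction_reads` and `min_overhang` are hypothetical.

```python
def junction_reads(reads, chimera_seq, junction, min_overhang=10):
    """Return (read, start) for reads that match a chimera and span its junction.

    chimera_seq: 5' partner sequence followed by 3' partner sequence
    junction: index in chimera_seq where the 3' partner begins
    min_overhang: minimum bases the read must cover on each side of the junction
    """
    hits = []
    for read in reads:
        start = chimera_seq.find(read)
        while start != -1:
            end = start + len(read)
            # A read supports the fusion only if it reaches far enough into both partners.
            if junction - start >= min_overhang and end - junction >= min_overhang:
                hits.append((read, start))
                break
            start = chimera_seq.find(read, start + 1)
    return hits
```

Reads lying entirely within one partner gene are ignored, since they are equally consistent with the unfused transcript.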

Fast Assembling of Neuron Fragments in Serial 3D Sections

10.1101/097253

2016

Cited By ~ 1

Author(s):

Hanbo Chen

Daniel M. Iascone

Nuno Macarico da Costa

Ed S. Lein

Tianming Liu

...

Keyword(s):

Brain Tissue

High Throughput

Brain Mapping

Efficient Algorithm

Serial Sections

3D Image

Software Suite

User Friendly

Human And Mouse

Abstract: Reconstructing neurons from 3D image stacks of serial sections of thick brain tissue is very time-consuming and often becomes a bottleneck in high-throughput brain mapping projects. We developed NeuronStitcher, a software suite for stitching non-overlapping neuron fragments reconstructed in serial 3D image sections. With its efficient algorithm and user-friendly interface, NeuronStitcher has been used successfully to reconstruct very large and complex human and mouse neurons.
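
The abstract does not specify NeuronStitcher's algorithm. As a hypothetical illustration of the underlying problem, the Python sketch below pairs fragment endpoints on either side of the cut between two adjacent sections using greedy nearest-neighbor matching within a distance threshold. It is a generic technique chosen for clarity, not NeuronStitcher's method; the function name, the 2D endpoint representation, and `max_dist` are assumptions.

```python
import math


def match_endpoints(upper, lower, max_dist):
    """Greedily pair fragment endpoints across a section boundary.

    upper, lower: dicts mapping fragment id -> (x, y) endpoint on the shared cut surface
    max_dist: largest gap (same units as x, y) allowed for a pairing
    Returns a list of (upper_id, lower_id) pairs, closest pairs first.
    """
    candidates = []
    for uid, (ux, uy) in upper.items():
        for lid, (lx, ly) in lower.items():
            d = math.hypot(ux - lx, uy - ly)
            if d <= max_dist:
                candidates.append((d, uid, lid))
    candidates.sort()  # shortest gaps first; ties broken by fragment id
    used_upper, used_lower, pairs = set(), set(), []
    for _, uid, lid in candidates:
        # Each endpoint may be joined at most once.
        if uid in used_upper or lid in used_lower:
            continue
        used_upper.add(uid)
        used_lower.add(lid)
        pairs.append((uid, lid))
    return pairs
```

Greedy matching is simple but not globally optimal; an assignment solver that minimizes total gap length is a common alternative when endpoints are crowded.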

PgRC: Pseudogenome based Read Compressor

10.1101/710822

2019

Author(s):

Tomasz Kowalski

Szymon Grabowski

Keyword(s):

High Throughput

Compression Ratio

High Throughput Sequencing

Sequencing Data

High Quality

Link Type

Sequencing Technologies

Significant Interest

The One

Shortest Common Superstring

Abstract. Motivation: The amount of sequencing data from high-throughput sequencing technologies grows at a pace exceeding the one predicted by Moore’s law. One of the basic requirements is to store and transmit such huge collections of data efficiently. Despite significant interest in designing FASTQ compressors, they are still imperfect in terms of compression ratio or decompression resources. Results: We present Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm for compressing the DNA stream, based on the idea of building an approximation of the shortest common superstring over high-quality reads. Experiments show that PgRC wins in compression ratio over its main competitors, SPRING and Minicom, by up to 18 and 21 percent on average, respectively, while being at least comparably fast in decompression. Availability: PgRC can be downloaded from https://github.com/kowallus/…
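
PgRC is described as building an approximation of the shortest common superstring over high-quality reads. The classic textbook approximation is greedy merging: repeatedly join the two strings with the largest suffix-prefix overlap. The Python sketch below shows that technique on toy reads; it is not PgRC's actual construction, it ignores reverse complements and sequencing errors, and its all-pairs overlap search is far too slow for real read collections.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads):
    """Greedy approximation of the shortest common superstring of reads."""
    # Drop duplicates and reads contained in other reads; they add nothing.
    unique = sorted(set(reads))
    strings = [r for r in unique if not any(r != o and r in o for o in unique)]
    while len(strings) > 1:
        best = (-1, 0, 1)  # (overlap length, left index, right index)
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = strings[i] + strings[j][k:]
        # Replace the two merged strings with their merge.
        strings = [s for idx, s in enumerate(strings) if idx not in (i, j)]
        strings.append(merged)
    return strings[0] if strings else ""
```

Storing the superstring plus each read's offset into it, rather than the reads themselves, is what makes a pseudogenome useful for compression: overlapping bases are stored only once.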

Dual indexed design of in-Drop single-cell RNA-seq libraries improves sequencing quality and throughput

10.1101/835488

2019

Author(s):

Austin N. Southard Smith

Alan J. Simmons

Bob Chen

Angela L. Jones

Marisol A. Ramirez Solano

...

Keyword(s):

Single Cell

High Throughput

Cost Effective

Quality Data

Sequencing Data

High Quality

High Data

Sequencing Technologies

Effective Manner

Sequencing Quality

Abstract: The growing demand for single-cell RNA-sequencing (scRNA-seq) experiments, in both the number of experiments and the number of cells queried per experiment, necessitates higher sequencing depth coupled with high data quality. New high-throughput sequencers, such as the Illumina NovaSeq 6000, enable this demand to be met in a cost-effective manner. However, current scRNA-seq library designs present compatibility challenges with newer sequencing technologies, such as susceptibility to index hopping, and their ability to generate high-quality data has yet to be systematically evaluated. Here, we engineered a new dual-indexed library structure, called TruDrop, on top of the inDrop scRNA-seq platform to solve these compatibility challenges, such that TruDrop libraries and standard Illumina libraries can be sequenced alongside each other on the NovaSeq. We overcame the index-hopping issue, demonstrated significant improvements in base-calling accuracy, and provided an example of multiplexing twenty-four scRNA-seq libraries simultaneously. TruDrop also compared favorably with prior library structures in transcriptional diversity. Our approach enables cost-effective, high-throughput generation of high-quality sequencing data, which should enable more routine use of scRNA-seq technologies.
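
To see why a dual-indexed library structure addresses index hopping, consider demultiplexing with both index reads. In the hypothetical Python sketch below, a read is assigned to a sample only if its exact (i7, i5) pair was used for that sample; a hopped read that combines valid indices from two different samples matches no pair and is set aside instead of being misassigned. Exact matching and the toy index sequences are simplifying assumptions; production demultiplexers typically tolerate a small number of index mismatches.

```python
def demultiplex(reads, samples):
    """Assign reads to samples using both index reads (dual indexing).

    reads: iterable of (read_id, i7, i5) tuples
    samples: dict mapping (i7, i5) -> sample name
    Returns (assignments, rejected). A read is kept only if its exact
    (i7, i5) combination belongs to a sample; index-hopped reads that pair
    valid indices from different samples are rejected.
    """
    assignments, rejected = {}, []
    for read_id, i7, i5 in reads:
        sample = samples.get((i7, i5))
        if sample is None:
            rejected.append(read_id)
        else:
            assignments.setdefault(sample, []).append(read_id)
    return assignments, rejected
```

With only one index checked, a hopped read would be silently attributed to whichever sample owns that index.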

High-Throughput Identification of Adapters in Single-Read Sequencing Data

Biomolecules

10.3390/biom10060878

2020

Vol 10 (6)

pp. 878

Author(s):

Asan M.S.H. Mohideen

Steinar D. Johansen

Igor Babiak

Keyword(s):

High Throughput

Automated Detection

Research Articles

Substantial Portion

Sequencing Data

Essential Step

Sequencing Technologies

Data Files

Public Repositories

Tool Set

Sequencing datasets available in public repositories are already numerous, and their number is growing exponentially. Raw sequencing data files constitute a substantial portion of these data, and they must be pre-processed before any downstream analysis. The removal of adapter sequences is the first essential step. Tools available for the automated detection of adapters in datasets generated with single-read sequencing protocols have certain limitations. To explore these datasets, one needs to retrieve adapter sequence information from the methods sections of the corresponding research articles. This can be time-consuming in metadata analyses. Moreover, not all research articles report their adapter sequences. We have developed adapt_find, a tool that automates the identification of adapter sequences in raw single-read sequencing datasets. We have verified adapt_find by testing it on a number of publicly available datasets. adapt_find provides a robust, reliable, and high-throughput process across different sequencing technologies and various adapter designs. It does not need prior knowledge of the adapter sequences. We also produced associated tools: random_mer, for detecting random N bases at one or both termini of the reads, and fastqc_parser, for consolidating the results from FastQC outputs. Together, these form a valuable tool set for metadata analyses on multiple sequencing datasets.
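
The abstract does not explain how adapt_find identifies adapters without prior knowledge of their sequences. One generic idea, sketched hypothetically in Python below, is that adapter fragments recur across many reads and therefore dominate k-mer counts, whereas insert sequences are more diverse. This is not adapt_find's documented algorithm; the function name and the choice of `k` are assumptions, and a practical tool would extend the top k-mer into a full adapter sequence and restrict the search to read ends.

```python
from collections import Counter


def most_common_kmer(reads, k=8):
    """Return (k-mer, number of reads containing it) for the most widespread k-mer.

    Adapter fragments recur in many reads, so they tend to dominate k-mer
    counts in raw data, whereas genuine insert sequence is more diverse.
    """
    counts = Counter()
    for read in reads:
        # Count each distinct k-mer once per read so a repetitive read cannot dominate.
        counts.update({read[i:i + k] for i in range(len(read) - k + 1)})
    if not counts:
        return None, 0
    return counts.most_common(1)[0]
```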

Advances in high throughput DNA sequence data compression

Journal of Bioinformatics and Computational Biology

10.1142/s0219720016300021

2016

Vol 14 (03)

pp. 1630002

Cited By ~ 13

Author(s):

Muhammad Sardaraz

Muhammad Tahir

Ataul Aziz Ikram

Keyword(s):

Data Compression

DNA Sequence

High Throughput

High Throughput Sequencing

Sequence Data

Sequencing Data

Research Directions

Compression Algorithms

DNA Sequence Data

Sequencing Technologies

Advances in high-throughput sequencing technologies and reductions in sequencing cost have led to exponential growth in high-throughput DNA sequence data. This growth poses challenges for the storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges, and various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of methods for genome and read compression. Algorithms are categorized as referential or reference-free. Experimental results and a comparative analysis of various data compression methods are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
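
The review groups compression algorithms into reference-free and referential methods. The toy Python functions below, written for illustration only, show the simplest instance of each: packing bases at 2 bits each (four per byte) with no reference, and storing only the substitutions between a sequence and an equal-length reference. Real compressors go much further (entropy coding, handling of N and other IUPAC symbols, insertions and deletions); here `pack` raises `KeyError` on any base other than A, C, G, or T.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Reference-free: pack an A/C/G/T string into bytes, 2 bits per base."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data, length):
    """Inverse of pack; length is needed because the last byte may be partly used."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length))


def diff_against_reference(seq, ref):
    """Referential (substitutions only): store (position, base) where seq differs from ref."""
    if len(seq) != len(ref):
        raise ValueError("this toy encoder handles equal-length sequences only")
    return [(i, b) for i, (b, r) in enumerate(zip(seq, ref)) if b != r]


def apply_diff(ref, diffs):
    """Rebuild the sequence from the reference and its stored substitutions."""
    out = list(ref)
    for i, b in diffs:
        out[i] = b
    return "".join(out)
```

For a sequence that closely resembles its reference, the referential list of differences is far smaller than even the 2-bit packed form, which is why referential methods dominate when a good reference genome is available.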
