Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database

Abstract: As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread, the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. To facilitate selection of appropriate analysis tools we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records the growth of the field over time.

Author summary: In recent years single-cell RNA-sequencing technologies have emerged that allow scientists to measure the activity of genes in thousands of individual cells simultaneously. This means we can start to look at what each cell in a sample is doing instead of considering an average across all cells in a sample, as was the case with older technologies. However, while access to this kind of data presents a wealth of opportunities, it comes with a new set of challenges. Researchers across the world have developed new methods and software tools to make the most of these datasets, but the field is moving at such a rapid pace that it is difficult to keep up with what is currently available. To make this easier we have developed the scRNA-tools database and website (www.scRNA-tools.org). Our database catalogues analysis tools, recording the tasks they can be used for, where they can be downloaded from and the publications that describe how they work. By looking at this database we can see that developers have focused on methods specific to single-cell data and that they embrace an open-source approach with permissive licensing, sharing of code and preprint publications.

scDIOR: single cell RNA-seq data IO software

BMC Bioinformatics ◽

10.1186/s12859-021-04528-3 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Huijian Feng ◽

Lihui Lin ◽

Jiekai Chen

Keyword(s):

Single Cell ◽

Programming Languages ◽

Large Scale ◽

Developmental Trajectories ◽

Rapid Development ◽

Data Transformation ◽

Rna Seq ◽

Data Types ◽

User Friendly ◽

Cell Data

Abstract: Background: Single-cell RNA sequencing is becoming a powerful tool to identify cell states, reconstruct developmental trajectories, and deconvolute spatial expression. The rapid development of computational methods promotes insight into heterogeneous single-cell data. An increasing number of tools have been provided for biological analysts, of which two programming languages, R and Python, are widely used among researchers. R and Python are complementary, as many methods are implemented specifically in R or Python. However, the different platforms immediately cause data sharing and transformation problems, especially for Scanpy, Seurat, and SingleCellExperiment. Currently, there is no efficient and user-friendly software to perform data transformation of single-cell omics between platforms, which makes users spend an unbearable amount of time on data Input and Output (IO), significantly reducing the efficiency of data analysis. Results: We developed scDIOR for single-cell data transformation between platforms of R and Python based on Hierarchical Data Format Version 5 (HDF5). We have created a data IO ecosystem between three R packages (Seurat, SingleCellExperiment, Monocle) and a Python package (Scanpy). Importantly, scDIOR accommodates a variety of data types across programming languages and platforms in an ultrafast way, including single-cell RNA-seq and spatially resolved transcriptomics data, using only a few lines of code in an IDE or command line interface. For large scale datasets, users can partially load the needed information, e.g., cell annotation without the gene expression matrices. scDIOR connects the analytical tasks of different platforms, which makes it easy to compare the performance of algorithms between them. Conclusions: scDIOR contains two modules, dior in R and diopy in Python. scDIOR is a versatile and user-friendly tool that implements single-cell data transformation between R and Python rapidly and stably. The software is freely accessible at https://github.com/JiekaiLab/scDIOR.

STACAS: Sub-Type Anchor Correction for Alignment in Seurat to integrate single-cell RNA-seq data

10.1101/2020.06.15.152306 ◽

2020 ◽

Cited By ~ 1

Author(s):

Massimo Andreatta ◽

Santiago J. Carmona

Keyword(s):

Single Cell ◽

Distance Measure ◽

Cell Types ◽

R Package ◽

Rna Seq ◽

Batch Effects ◽

Link Type ◽

Transcriptomics Data ◽

Public Repositories ◽

Cell Data

Abstract: Computational tools for the integration of single-cell transcriptomics data are designed to correct batch effects between technical replicates or different technologies applied to the same population of cells. However, they have inherent limitations when applied to heterogeneous sets of data with moderate overlap in cell states or sub-types. STACAS is a package for the identification of integration anchors in the Seurat environment, optimized for the integration of datasets that share only a subset of cell types. We demonstrate that by i) correcting batch effects while preserving relevant biological variability across datasets, ii) filtering aberrant integration anchors with a quantitative distance measure, and iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. We anticipate that the algorithm will be a useful tool for the construction of comprehensive single-cell atlases by integration of the growing amount of single-cell data becoming available in public repositories. Code availability: R package: https://github.com/carmonalab/STACAS; Docker image: https://hub.docker.com/repository/docker/mandrea1/stacas_demo

snakePipes enable flexible, scalable and integrative epigenomic analysis

10.1101/407312 ◽

2018 ◽

Cited By ~ 2

Author(s):

Vivek Bhardwaj ◽

Steffen Heyne ◽

Katarzyna Sikora ◽

Leily Rabbani ◽

Michael Rauer ◽

...

Keyword(s):

Single Cell ◽

Open Source ◽

Large Scale ◽

Integrative Analysis ◽

Exploratory Research ◽

Rna Seq ◽

Link Type ◽

Open Source License ◽

Fast Processing ◽

Downstream Analysis

Abstract: The scale and diversity of epigenomics data have been rapidly increasing, and ever more studies now present analyses of data from multiple epigenomic techniques. Performing such integrative analysis is time-consuming, especially for exploratory research, since there are currently no pipelines available that allow fast processing of datasets from multiple epigenomic assays while also allowing for flexibility in running or upgrading the workflows. Here we present a solution to this problem: snakePipes, which can process and perform downstream analysis of data from all common epigenomic techniques (ChIP-seq, RNA-seq, Bisulfite-seq, ATAC-seq, Hi-C and single-cell RNA-seq) in a single package. We demonstrate how snakePipes can simplify integrative analysis by reproducing and extending the results from a recently published large-scale epigenomics study with a few simple commands. snakePipes are available under an open-source license at https://github.com/maxplanck-ie/snakepipes.

Open Plot Project: an open-source toolkit for 3-D structural data analysis

Solid Earth ◽

10.5194/se-2-53-2011 ◽

2011 ◽

Vol 2 (1) ◽

pp. 53-63 ◽

Cited By ~ 18

Author(s):

S. Tavani ◽

P. Arbues ◽

M. Snidero ◽

N. Carrera ◽

J. A. Muñoz

Keyword(s):

Spatial Distribution ◽

Data Analysis ◽

Open Source ◽

Open Source Software ◽

Source Code ◽

Structural Data ◽

Geological Modelling ◽

Analysis Tools ◽

Transect Analysis ◽

Selection Of

Abstract. In this work we present the Open Plot Project, an open-source software package for structural data analysis, including a 3-D environment. The software includes many classical functionalities of structural data analysis tools, like stereoplot, contouring, tensorial regression, scatterplots, histograms and transect analysis. In addition, efficient filtering tools are present, allowing the selection of data according to their attributes, including spatial distribution and orientation. This first alpha release represents a stand-alone toolkit for structural data analysis. The presence of a 3-D environment with digitalising tools allows the integration of structural data with information extracted from georeferenced images to produce structurally validated dip domains. This, coupled with many import/export facilities, allows easy incorporation of structural analyses in workflows for 3-D geological modelling. Accordingly, Open Plot Project is also a candidate structural add-on for 3-D geological modelling software. The software (for both Windows and Linux O.S.), the User Manual, a set of example movies (complementary to the User Manual), and the source code are provided as Supplement. We intend the publication of the source code to set the foundation for free, public software that, hopefully, the structural geologists' community will use, modify, and implement. The creation of additional public controls/tools is strongly encouraged.

EpiScanpy: integrated single-cell epigenomic analysis

10.1101/648097 ◽

2019 ◽

Cited By ~ 4

Author(s):

Anna Danese ◽

Maria L. Richter ◽

David S. Fischer ◽

Fabian J. Theis ◽

Maria Colomé-Tatché

Keyword(s):

Dna Methylation ◽

Single Cell ◽

Large Scale ◽

Feature Space ◽

Rna Seq ◽

Computational Framework ◽

Learning Techniques ◽

Multiple Feature ◽

The Many ◽

Cell Data

Abstract: Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell -omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels by orthogonal epigenetic information.

Flexible comparison of batch correction methods for single-cell RNA-seq using BatchBench

10.1101/2020.05.22.111211 ◽

2020 ◽

Author(s):

Ruben Chazarra-Gil ◽

Stijn van Dongen ◽

Vladimir Yu Kiselev ◽

Martin Hemberg

Keyword(s):

Single Cell ◽

Computational Methods ◽

Rna Seq ◽

Batch Effects ◽

Systematic Comparison ◽

Batch Correction ◽

Link Type ◽

Biological Signals ◽

The Cost

Abstract: As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assessing their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.

The SZS is an efficient statistical method to identify regulated splicing events in droplet-based RNA sequencing

10.1101/2020.11.10.377572 ◽

2020 ◽

Author(s):

Julia Eve Olivieri ◽

Roozbeh Dehghannasiri ◽

Julia Salzman

Keyword(s):

Single Cell ◽

Statistical Method ◽

Rna Seq ◽

Computationally Efficient ◽

Small Set ◽

Biological Discovery ◽

Cell Type Specific ◽

Human Spermatogenesis ◽

Splicing Patterns ◽

Cell Data

Abstract: To date, the field of single-cell genomics has viewed robust splicing analysis as completely out of reach in droplet-based platforms, preventing biological discovery of single-cell regulated splicing. Here, we introduce a novel, robust, and computationally efficient statistical method, the Splicing Z Score (SZS), to detect differential alternative splicing in single cell RNA-Seq technologies including 10x Chromium. We applied the SZS to primary human cells to discover new regulated, cell type-specific splicing patterns. Illustrating the power of the SZS method, splicing of a small set of genes has high predictive power for tissue compartment in the human lung, and the SZS identifies un-annotated, conserved splicing regulation in human spermatogenesis. The SZS is a method that can rapidly identify regulated splicing events from single cell data and prioritize genes predicted to have functionally significant splicing programs.

Transcriptional network analysis of transcriptomic diversity in resident tissue macrophages and dendritic cells in the mouse mononuclear phagocyte system

10.1101/2020.03.24.002816 ◽

2020 ◽

Cited By ~ 1

Author(s):

Kim M. Summers ◽

Stephen J. Bush ◽

David A. Hume

Keyword(s):

Dendritic Cells ◽

Network Analysis ◽

Single Cell ◽

Cell Types ◽

Mononuclear Phagocyte ◽

The Body ◽

Mononuclear Phagocyte System ◽

Primary Data ◽

Analysis Tool ◽

Rna Seq

Abstract: The mononuclear phagocyte system (MPS) is a family of cells including progenitors, circulating blood monocytes, resident tissue macrophages and dendritic cells (DC) present in every tissue in the body. To test the relationships between markers and transcriptomic diversity in the MPS, we collected from NCBI-GEO >500 quality RNA-seq datasets generated from mouse MPS cells isolated from multiple tissues. The primary data were randomly down-sized to a depth of 10 million reads and requantified. The resulting dataset was clustered using the network analysis tool Graphia. A sample-to-sample matrix revealed that MPS populations could be separated based upon tissue of origin. Cells identified as classical DC subsets, cDC1 and cDC2, and lacking Fcgr1 (CD64), were centrally-located within the MPS cluster and no more distinct than other MPS cell types. A gene-to-gene correlation matrix identified large generic co-expression clusters associated with MPS maturation and innate immune function. Smaller co-expression gene clusters including the transcription factors that drive them showed higher expression within defined isolated cells, including macrophages and DC from specific tissues. They include a cluster containing Lyve1 that implies a function in endothelial cell homeostasis, a cluster of transcripts enriched in intestinal macrophages and a generic cDC cluster associated with Ccr7. However, transcripts encoding many other putative MPS subset markers including Adgre1, Itgax, Itgam, Clec9a, Cd163, Mertk, Retnla and H2-a/e (class II MHC) clustered idiosyncratically and were not correlated with underlying functions. The data provide no support for the concept of markers of M2 polarization or the specific adaptation of DC to present antigen to T cells. Co-expression of immediate early genes (e.g. Egr1, Fos, Dusp1) and inflammatory cytokines and chemokines (Tnf, Il1b, Ccl3/4) indicated that all tissue disaggregation protocols activate MPS cells. Tissue-specific expression clusters indicated that all cell isolation procedures also co-purify other unrelated cell types that may interact with MPS cells in vivo. Comparative analysis of public RNA-seq and single cell RNA-seq data from the same lung cell populations showed that the extensive heterogeneity implied by the global cluster analysis may be even greater at a single cell level with few markers strongly correlated with each other. This analysis highlights the power of large datasets to identify the diversity of MPS cellular phenotypes, and the limited predictive value of surface markers to define lineages, functions or subpopulations.
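
The abstract above mentions that the primary data were randomly down-sized to a fixed read depth before requantification. The authors' actual procedure is not described here (it presumably operated on raw reads rather than count tables), so the following is only a minimal sketch of the general idea, assuming a per-gene count table. The function name `downsample_counts`, the toy counts and the target depth are all hypothetical.

```python
import random
from collections import Counter


def downsample_counts(counts, target_depth, seed=0):
    """Randomly subsample reads, without replacement, to a fixed depth.

    counts: dict mapping gene name -> integer read count (hypothetical input).
    Returns a dict with the same genes whose counts sum to
    min(target_depth, total reads).
    """
    total = sum(counts.values())
    if total <= target_depth:
        # Library is already at or below the target depth: nothing to do.
        return dict(counts)
    rng = random.Random(seed)
    genes = list(counts)
    # random.sample with `counts=` (Python 3.9+) treats each gene as
    # repeated `count` times, i.e. it samples individual reads.
    drawn = rng.sample(genes, k=target_depth, counts=[counts[g] for g in genes])
    tally = Counter(drawn)
    return {g: tally.get(g, 0) for g in genes}


# Toy library of 100 reads (made-up numbers), down-sized to 40 reads.
example = {"Lyve1": 50, "Ccr7": 30, "Fcgr1": 20}
down = downsample_counts(example, target_depth=40)
```

Sampling without replacement keeps every down-sized count at or below its original value, which is the behaviour one would expect from discarding a random subset of reads.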

Single Cell Viewer (SCV): An interactive visualization data portal for single cell RNA sequence data

10.1101/664789 ◽

2019 ◽

Cited By ~ 2

Author(s):

Shuoguo Wang ◽

Constance Brett ◽

Mohan Bolisetty ◽

Ryan Golhar ◽

Isaac Neuhaus ◽

...

Keyword(s):

Single Cell ◽

Sequence Data ◽

Single Cells ◽

Link Type ◽

Technological Advances ◽

R Shiny ◽

Data Volume ◽

Exploratory Data ◽

Cell Data ◽

Shiny Application

Abstract: Motivation: Thanks to technological advances made in the last few years, we are now able to study transcriptomes from thousands of single cells. These have been applied widely to study various aspects of biology. Nevertheless, comprehending and inferring meaningful biological insights from these large datasets is still a challenge. Although tools are being developed to deal with the data complexity and data volume, we do not yet have effective visualization and comparative analysis tools to realize the full value of these datasets. Results: In order to address this gap, we implemented a single cell data visualization portal called Single Cell Viewer (SCV). SCV is an R shiny application that offers users rich visualization and exploratory data analysis options for single cell datasets. Availability: Source code for the application is available online at GitHub (http://www.github.com/neuhausi/single-cell-viewer) and there is a hosted exploration application using the same example dataset as this publication at http://periscopeapps.org/[email protected]; [email protected]

SPRING: a kinetic interface for visualizing high dimensional single-cell expression data

10.1101/090332 ◽

2016 ◽

Cited By ~ 10

Author(s):

Caleb Weinreb ◽

Samuel Wolock ◽

Allon Klein

Keyword(s):

Gene Expression ◽

Single Cell ◽

Nearest Neighbor ◽

High Dimensional ◽

K Nearest Neighbor ◽

Link Type ◽

Cell Gene Expression ◽

Graph Layouts ◽

Cell Expression ◽

Cell Data

Motivation: Single-cell gene expression profiling technologies can map the cell states in a tissue or organism. As these technologies become more common, there is a need for computational tools to explore the data they produce. In particular, existing data visualization approaches are imperfect for studying continuous gene expression topologies. Results: Force-directed layouts of k-nearest-neighbor graphs can visualize continuous gene expression topologies in a manner that preserves high-dimensional relationships and allows manual exploration of different stable two-dimensional representations of the same data. We implemented an interactive web-tool to visualize single-cell data using force-directed graph layouts, called SPRING. SPRING reveals more detailed biological relationships than existing approaches when applied to branching gene expression trajectories from hematopoietic progenitor cells. Visualizations from SPRING are also more reproducible than those of stochastic visualization methods such as tSNE, a state-of-the-art tool. Availability: https://kleintools.hms.harvard.edu/tools/spring.html, https://github.com/AllonKleinLab/SPRING/[email protected], [email protected]
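
The general technique named in the abstract above, a force-directed layout of a k-nearest-neighbor graph, can be sketched in a few lines. This is not SPRING's implementation: it is a generic Fruchterman-Reingold-style layout written with the Python standard library, and the function names (`knn_graph`, `force_layout`), the constants and the toy "cells" are assumptions made for illustration.

```python
import math
import random


def knn_graph(points, k):
    """Undirected edges (i, j), i < j, joining each point to its k nearest neighbours."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def force_layout(n, edges, iterations=200, seed=0):
    """Fruchterman-Reingold-style 2-D layout: all nodes repel, edges attract."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    ideal = math.sqrt(4.0 / n)  # ideal edge length for a 2 x 2 drawing area
    temp = 0.2                  # maximum step size, cooled every iteration
    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsive force between every pair of nodes.
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                f = ideal * ideal / d
                disp[i][0] += dx / d * f
                disp[i][1] += dy / d * f
                disp[j][0] -= dx / d * f
                disp[j][1] -= dy / d * f
        # Attractive force along each graph edge.
        for i, j in edges:
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / ideal
            disp[i][0] -= dx / d * f
            disp[i][1] -= dy / d * f
            disp[j][0] += dx / d * f
            disp[j][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for i in range(n):
            d = math.hypot(disp[i][0], disp[i][1]) or 1e-9
            step = min(d, temp)
            pos[i][0] += disp[i][0] / d * step
            pos[i][1] += disp[i][1] / d * step
        temp *= 0.98
    return pos


# Six made-up "cells" in a 3-gene expression space, forming two tight groups.
cells = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5)]
edges = knn_graph(cells, k=2)
layout = force_layout(len(cells), edges)
```

The all-pairs loops make this quadratic in the number of cells, so it is only suitable for toy inputs; it is meant to show how the graph, not the raw expression values, drives the two-dimensional positions.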
