Scedar: A scalable Python package for single-cell RNA-seq exploratory data analysis

Abstract: In single-cell RNA-seq (scRNA-seq) experiments, the number of individual cells has increased exponentially, and the sequencing depth of each cell has decreased significantly. As a result, analyzing scRNA-seq data requires extensive considerations of program efficiency and method selection. In order to reduce the complexity of scRNA-seq data analysis, we present scedar, a scalable Python package for scRNA-seq exploratory data analysis. The package provides a convenient and reliable interface for performing visualization, imputation of gene dropouts, detection of rare transcriptomic profiles, and clustering on large-scale scRNA-seq datasets. The analytical methods are efficient, and they also do not assume that the data follow certain statistical distributions. The package is extensible and modular, which would facilitate the further development of functionalities for future requirements with the open-source development community. The scedar package is distributed under the terms of the MIT license at https://pypi.org/project/scedar.


Abstract 2265: viSNE in Cytobank enables rapid exploratory data analysis for RNA-seq biomarker discovery

10.1158/1538-7445.am2018-2265 ◽

2018 ◽

Author(s):

Ashu Sethi ◽

Hannah Polikowsky ◽

Katherine A. Drake

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Biomarker Discovery ◽

RNA-seq ◽

Exploratory Data


Assessment of statistical methods from single cell, bulk RNA-seq and metagenomics applied to microbiome data

10.1101/2020.01.15.907964 ◽

2020 ◽

Author(s):

Matteo Calgaro ◽

Chiara Romualdi ◽

Levi Waldron ◽

Davide Risso ◽

Nicola Vitulo

Keyword(s):

Data Analysis ◽

Single Cell ◽

Differential Expression Analysis ◽

Informed Choice ◽

Correct Identification ◽

RNA-seq ◽

Experimental Conditions ◽

Exploratory Data ◽

Microbiome Data ◽

False Discoveries

Abstract: Background: The correct identification of differentially abundant microbial taxa between experimental conditions is a methodological and computational challenge. Recent work has produced methods to deal with the high sparsity and compositionality characteristic of microbiome data, but independent benchmarks comparing these to alternatives developed for RNA-seq data analysis are lacking. Results: Here, we compare methods developed for single cell, bulk RNA-seq, and microbiome data, in terms of suitability of distributional assumptions, ability to control false discoveries, concordance, and power. We benchmark these methods using 100 manually curated datasets from 16S and whole metagenome shotgun sequencing. Conclusions: The multivariate and compositional methods developed specifically for microbiome analysis did not outperform univariate methods developed for differential expression analysis of RNA-seq data. We recommend a careful exploratory data analysis prior to application of any inferential model and we present a framework to help scientists make an informed choice of analysis methods in a dataset-specific manner.


Comparing single cell datasets using DensityMorph

10.1101/2021.10.28.466371 ◽

2021 ◽

Author(s):

Kristen Feher

Keyword(s):

Data Analysis ◽

Single Cell ◽

Cell Population ◽

Exploratory Data Analysis ◽

Principal Component ◽

Data Visualisation ◽

Biological Interpretation ◽

Exploratory Data ◽

Visualisation Method ◽

Time Point

The proliferation of single cell datasets has brought a wealth of information, but also great challenges in data analysis. Obtaining a cohesive overview of multiple single cell samples is difficult and requires consideration of cell population structure - which may or may not be well defined - along with subtle shifts in expression within cell populations across samples, and changes in population frequency across samples. Ideally, all this would be integrated with the experimental design, e.g. time point, genotype, treatment etc. Data visualisation is the most effective way of communicating analysis but often this takes the form of a plethora of t-SNE plots, colour coded according to marker and sample. In this manuscript, I introduce a novel exploratory data analysis and visualisation method that is centred around a novel quasi-distance (DensityMorph) between single cell samples. DensityMorph makes it possible to plot single cell samples in a manner analogous to performing principal component analysis on microarray samples. Biological interpretation is ensured by the introduction of Explanatory Components, which show how marker expression and coexpression drive the differences between samples. This method is a breakthrough in terms of displaying the most pertinent biological changes across single cell samples in a compact plot. Finally, it can be used either as a stand-alone method or to structure other types of analysis such as manual flow cytometry gating or cell population clustering.


Covering all your bases: incorporating intron signal from RNA-seq data

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa073 ◽

2020 ◽

Vol 2 (3) ◽

Author(s):

Stuart Lee ◽

Albert Y Zhang ◽

Shian Su ◽

Ashley P Ng ◽

Aliaksei Z Holik ◽

...

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Intron Retention ◽

Differential Expression Analysis ◽

Read Coverage ◽

RNA-seq ◽

Index Method ◽

Methods Development ◽

Exploratory Data ◽

Downstream Analysis

Abstract: RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered to be informative since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative and, through exploratory data analysis of read coverage, that intron signal is representative of both pre-mRNAs and intron retention. We demonstrate how intron reads can be utilized in differential expression analysis using our index method, where a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software that quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns that had similar coverage levels to that of neighbouring exons. The work and ideas presented in this paper are the first of their kind to consider multiple biological sources for intron reads through exploratory data analysis, minimizing bias in discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and RNA-seq data in general.


John Tukey, Exploratory Data Analysis, and Its Possibilities for Participatory Action Research

PsycEXTRA Dataset ◽

10.1037/e567862014-001 ◽

2014 ◽

Author(s):

Brett Stoudt

Keyword(s):

Data Analysis ◽

Action Research ◽

Participatory Action Research ◽

Exploratory Data Analysis ◽

Participatory Action ◽

Exploratory Data


Graphical Exploratory Data Analysis for Categorical Longitudinal and Time Series Data

PsycEXTRA Dataset ◽

10.1037/e634372013-001 ◽

2013 ◽

Author(s):

Stephen J. Tueller ◽

Richard A. Van Dorn ◽

Georgiy Bobashev ◽

Barry Eggleston

Keyword(s):

Time Series ◽

Data Analysis ◽

Exploratory Data Analysis ◽

Time Series Data ◽

Series Data ◽

Exploratory Data


Covid-19 Cases in India: A Visual Exploratory Data Analysis Model (Preprint)

10.2196/preprints.24226 ◽

2020 ◽

Cited By ~ 2

Author(s):

Jayesh S

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Case Fatality ◽

Public Health Emergency ◽

Virus Spread ◽

Analysis Model ◽

Case Fatality Ratio ◽

First Case ◽

Exploratory Data ◽

The Government

The Covid-19 outbreak was first reported in Wuhan, China. The deadly virus spread not just the disease, but fear around the globe. In January 2020, WHO declared COVID-19 a Public Health Emergency of International Concern (PHEIC). The first case of Covid-19 in India was reported on January 30, 2020. By that time, India was prepared to fight the virus. India has taken various measures to tackle the situation. In this paper, an exploratory data analysis of Covid-19 cases in India is carried out. Data on the number of cases, testing done, case fatality ratio, number of deaths, change in visits, stringency index, and measures taken by the government are used for modelling and visual exploratory data analysis.
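
The abstract above lists the case fatality ratio among its variables without defining it. As a minimal sketch, assuming the common definition (deaths as a percentage of confirmed cases), it could be computed as follows; the function name and the counts are hypothetical and are not taken from the paper.

```python
def case_fatality_ratio(deaths: int, confirmed_cases: int) -> float:
    """Return deaths as a percentage of confirmed cases.

    Assumes the common definition CFR = 100 * deaths / confirmed cases;
    the paper may use a different convention.
    """
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    if deaths < 0 or deaths > confirmed_cases:
        raise ValueError("deaths must be between 0 and confirmed_cases")
    return 100.0 * deaths / confirmed_cases


# Hypothetical counts, for illustration only (not figures from the paper).
print(case_fatality_ratio(deaths=25, confirmed_cases=1000))  # 2.5
```

During an ongoing outbreak this ratio is sensitive to reporting delays and testing coverage, so it is better read as a descriptive statistic than as an estimate of true mortality risk.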


Follow The Clicks: Learning and Anticipating Mouse Interactions During Exploratory Data Analysis

Computer Graphics Forum ◽

10.1111/cgf.13670 ◽

2019 ◽

Vol 38 (3) ◽

pp. 41-52 ◽

Cited By ~ 4

Author(s):

Alvitta Ottley ◽

Roman Garnett ◽

Ran Wan

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Exploratory Data


Multivariate Statistical Approach for Nephrines in Women with Obesity

Molecules ◽

10.3390/molecules26051393 ◽

2021 ◽

Vol 26 (5) ◽

pp. 1393

Author(s):

Ralitsa Robeva ◽

Miroslava Nedyalkova ◽

Georgi Kirilov ◽

Atanaska Elenkova ◽

Sabina Zacharieva ◽

...

Keyword(s):

Type 2 Diabetes ◽

Metabolic Syndrome ◽

Data Analysis ◽

Stress Response ◽

Exploratory Data Analysis ◽

Principal Component ◽

Hierarchical Cluster ◽

Obese Women ◽

Exploratory Data

Catecholamines are physiological regulators of carbohydrate and lipid metabolism during stress, but their chronic influence on metabolic changes in obese patients is still not clarified. The present study aimed to establish the associations between the catecholamine metabolites and metabolic syndrome (MS) components in obese women as well as to reveal the possible hidden subgroups of patients through hierarchical cluster analysis and principal component analysis. The 24-h urine excretion of metanephrine and normetanephrine was investigated in 150 obese women (54 non-diabetic without MS, 70 non-diabetic with MS and 26 with type 2 diabetes). The interrelations between carbohydrate disturbances, metabolic syndrome components and stress response hormones were studied. Exploratory data analysis was used to determine different patterns of similarities among the patients. Normetanephrine concentrations were significantly increased in postmenopausal patients and in women with morbid obesity, type 2 diabetes, and hypertension but not with prediabetes. Both metanephrine and normetanephrine levels were positively associated with glucose concentrations one hour after glucose load, irrespective of the insulin levels. The exploratory data analysis showed different risk subgroups among the investigated obese women. The development of predictive tools that include not only traditional metabolic risk factors but also markers of stress response systems might help with specific risk estimation in obesity patients.
