Scedar: a scalable Python package for single-cell RNA-seq exploratory data analysis

2018 ◽  
Author(s):  
Yuanchao Zhang ◽  
Man S. Kim ◽  
Erin R. Reichenberger ◽  
Ben Stear ◽  
Deanne M. Taylor

Abstract: In single-cell RNA-seq (scRNA-seq) experiments, the number of individual cells has increased exponentially, and the sequencing depth of each cell has decreased significantly. As a result, analyzing scRNA-seq data requires extensive considerations of program efficiency and method selection. In order to reduce the complexity of scRNA-seq data analysis, we present scedar, a scalable Python package for scRNA-seq exploratory data analysis. The package provides a convenient and reliable interface for performing visualization, imputation of gene dropouts, detection of rare transcriptomic profiles, and clustering on large-scale scRNA-seq datasets. The analytical methods are efficient, and they also do not assume that the data follow certain statistical distributions. The package is extensible and modular, which would facilitate the further development of functionalities for future requirements with the open-source development community. The scedar package is distributed under the terms of the MIT license at https://pypi.org/project/scedar.

2020 ◽  
Vol 16 (4) ◽  
pp. e1007794
Author(s):  
Yuanchao Zhang ◽  
Man S. Kim ◽  
Erin R. Reichenberger ◽  
Ben Stear ◽  
Deanne M. Taylor

2020 ◽  
Author(s):  
Matteo Calgaro ◽  
Chiara Romualdi ◽  
Levi Waldron ◽  
Davide Risso ◽  
Nicola Vitulo

Abstract. Background: The correct identification of differentially abundant microbial taxa between experimental conditions is a methodological and computational challenge. Recent work has produced methods to deal with the high sparsity and compositionality characteristic of microbiome data, but independent benchmarks comparing these to alternatives developed for RNA-seq data analysis are lacking. Results: Here, we compare methods developed for single cell, bulk RNA-seq, and microbiome data, in terms of suitability of distributional assumptions, ability to control false discoveries, concordance, and power. We benchmark these methods using 100 manually curated datasets from 16S and whole metagenome shotgun sequencing. Conclusions: The multivariate and compositional methods developed specifically for microbiome analysis did not outperform univariate methods developed for differential expression analysis of RNA-seq data. We recommend a careful exploratory data analysis prior to application of any inferential model and we present a framework to help scientists make an informed choice of analysis methods in a dataset-specific manner.


Cell Systems ◽  
2019 ◽  
Vol 9 (6) ◽  
pp. 609-613.e3 ◽  
Author(s):  
Alyssa Kramer Morrow ◽  
George Zhixuan He ◽  
Frank Austin Nothaft ◽  
Eric Tongching Tu ◽  
Justin Paschall ◽  
...  

2020 ◽  
Vol 17 (8) ◽  
pp. 793-798 ◽  
Author(s):  
Bo Li ◽  
Joshua Gould ◽  
Yiming Yang ◽  
Siranush Sarkizova ◽  
Marcin Tabaka ◽  
...  

2021 ◽  
Author(s):  
Kristen Feher

The proliferation of single cell datasets has brought a wealth of information, but also great challenges in data analysis. Obtaining a cohesive overview of multiple single cell samples is difficult and requires consideration of cell population structure - which may or may not be well defined - along with subtle shifts in expression within cell populations across samples, and changes in population frequency across samples. Ideally, all this would be integrated with the experimental design, e.g. time point, genotype, treatment etc. Data visualisation is the most effective way of communicating analysis but often this takes the form of a plethora of t-SNE plots, colour coded according to marker and sample. In this manuscript, I introduce a novel exploratory data analysis and visualisation method that is centred around a novel quasi-distance (DensityMorph) between single cell samples. DensityMorph makes it possible to plot single cell samples in a manner analogous to performing principal component analysis on microarray samples. Biological interpretation is ensured by the introduction of Explanatory Components, which show how marker expression and coexpression drive the differences between samples. This method is a breakthrough in terms of displaying the most pertinent biological changes across single cell samples in a compact plot. Finally, it can be used either as a stand-alone method or to structure other types of analysis such as manual flow cytometry gating or cell population clustering.


2018 ◽  
Vol 46 (1) ◽  
pp. 3-17 ◽  
Author(s):  
Macauley S. Breault ◽  
Pierre Sacré ◽  
Jorge González-Martínez ◽  
John T. Gale ◽  
Sridevi V. Sarma

2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Stuart Lee ◽  
Albert Y Zhang ◽  
Shian Su ◽  
Ashley P Ng ◽  
Aliaksei Z Holik ◽  
...  

Abstract: RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered to be informative since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative and, through exploratory data analysis of read coverage, that intron signal is representative of both pre-mRNAs and intron retention. We demonstrate how intron reads can be utilized in differential expression analysis using our index method, where a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software that quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns that had similar coverage levels to that of neighbouring exons. The work and ideas presented in this paper are the first of their kind to consider multiple biological sources for intron reads through exploratory data analysis, minimizing bias in discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and RNA-seq data in general.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Ibrahim Muzaferija ◽  
Zerina Mašetić ◽  

While leveraging cloud computing for large-scale distributed applications allows seamless scaling, many companies struggle to keep up with the amount of data generated, both in processing it efficiently and in detecting anomalies, which is a necessary part of managing modern applications. As a record of user behavior, weblogs are a natural subject of research on anomaly detection. Many anomaly detection methods based on automated log analysis have been proposed, but not in the context of big data applications, where anomalous behavior needs to be detected in the understanding phases prior to modeling a system for such use. Big Data Analytics often ignores anomalous points due to the high volume of data. To address this problem, we propose a complementary methodology for Big Data Analytics, Exploratory Data Analysis, which assists in gaining insight into data relationships without classical hypothesis modeling. In that way, we can gain a better understanding of the patterns and spot anomalies. Results show that Exploratory Data Analysis facilitates anomaly detection and the CRISP-DM Business Understanding phase, making it one of the key steps in the Data Understanding phase.

