Comparing single cell datasets using DensityMorph

2021 ◽  
Author(s):  
Kristen Feher

The proliferation of single cell datasets has brought a wealth of information, but also great challenges in data analysis. Obtaining a cohesive overview of multiple single cell samples is difficult and requires consideration of cell population structure - which may or may not be well defined - along with subtle shifts in expression within cell populations across samples, and changes in population frequency across samples. Ideally, all this would be integrated with the experimental design, e.g. time point, genotype, treatment etc. Data visualisation is the most effective way of communicating analysis but often this takes the form of a plethora of t-SNE plots, colour coded according to marker and sample. In this manuscript, I introduce a novel exploratory data analysis and visualisation method that is centred around a novel quasi-distance (DensityMorph) between single cell samples. DensityMorph makes it possible to plot single cell samples in a manner analogous to performing principal component analysis on microarray samples. Biological interpretation is ensured by the introduction of Explanatory Components, which show how marker expression and coexpression drive the differences between samples. This method is a breakthrough in terms of displaying the most pertinent biological changes across single cell samples in a compact plot. Finally, it can be used either as a stand-alone method or to structure other types of analysis such as manual flow cytometry gating or cell population clustering.

Molecules ◽  
2021 ◽  
Vol 26 (5) ◽  
pp. 1393
Author(s):  
Ralitsa Robeva ◽  
Miroslava Nedyalkova ◽  
Georgi Kirilov ◽  
Atanaska Elenkova ◽  
Sabina Zacharieva ◽  
...  

Catecholamines are physiological regulators of carbohydrate and lipid metabolism during stress, but their chronic influence on metabolic changes in obese patients remains unclear. The present study aimed to establish the associations between catecholamine metabolites and metabolic syndrome (MS) components in obese women, and to reveal possible hidden subgroups of patients through hierarchical cluster analysis and principal component analysis. The 24-h urine excretion of metanephrine and normetanephrine was investigated in 150 obese women (54 non-diabetic without MS, 70 non-diabetic with MS and 26 with type 2 diabetes). The interrelations between carbohydrate disturbances, metabolic syndrome components and stress response hormones were studied. Exploratory data analysis was used to determine different patterns of similarity among the patients. Normetanephrine concentrations were significantly increased in postmenopausal patients and in women with morbid obesity, type 2 diabetes, and hypertension, but not in those with prediabetes. Both metanephrine and normetanephrine levels were positively associated with glucose concentrations one hour after glucose load, irrespective of insulin levels. The exploratory data analysis showed different risk subgroups among the investigated obese women. The development of predictive tools that include not only traditional metabolic risk factors but also markers of stress response systems might help with specific risk estimation in patients with obesity.
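As a minimal sketch of the hierarchical cluster analysis mentioned above, the following pure-Python example groups patients by average-linkage agglomerative clustering on standardised features. The six rows are made-up toy values, not study data, and the names `standardise` and `average_linkage` are illustrative. The abstract does not state which linkage or software the study used, so average linkage is an assumption.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def standardise(rows):
    """Scale every column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]


def average_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Cluster distance is the mean pairwise Euclidean distance between members.
    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        def linkage(pair):
            a, b = pair
            return mean(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )

        # combinations() yields a < b, so deleting index b keeps a valid.
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a] += clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy values only (hypothetical metanephrine, normetanephrine columns).
toy_rows = [
    (1.0, 2.0), (1.2, 1.9), (0.9, 2.2),
    (5.0, 8.0), (5.2, 7.7), (4.8, 8.3),
]
toy_clusters = average_linkage(standardise(toy_rows), n_clusters=2)
print(toy_clusters)
```

Stopping at a fixed number of clusters keeps the sketch short; a full analysis would normally record every merge and inspect the resulting dendrogram before choosing a cut.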


2018 ◽  
Author(s):  
Yuanchao Zhang ◽  
Man S. Kim ◽  
Erin R. Reichenberger ◽  
Ben Stear ◽  
Deanne M. Taylor

In single-cell RNA-seq (scRNA-seq) experiments, the number of individual cells has increased exponentially, and the sequencing depth of each cell has decreased significantly. As a result, analyzing scRNA-seq data requires extensive considerations of program efficiency and method selection. In order to reduce the complexity of scRNA-seq data analysis, we present scedar, a scalable Python package for scRNA-seq exploratory data analysis. The package provides a convenient and reliable interface for performing visualization, imputation of gene dropouts, detection of rare transcriptomic profiles, and clustering on large-scale scRNA-seq datasets. The analytical methods are efficient, and they also do not assume that the data follow certain statistical distributions. The package is extensible and modular, which would facilitate the further development of functionalities for future requirements with the open-source development community. The scedar package is distributed under the terms of the MIT license at https://pypi.org/project/scedar.


2020 ◽  
Vol 16 (4) ◽  
pp. e1007794
Author(s):  
Yuanchao Zhang ◽  
Man S. Kim ◽  
Erin R. Reichenberger ◽  
Ben Stear ◽  
Deanne M. Taylor

Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 594
Author(s):  
Fushing Hsieh ◽  
Elizabeth P. Chou ◽  
Ting-Li Chen

We develop Categorical Exploratory Data Analysis (CEDA) with mimicking to explore and exhibit the complexity of information content that is contained within any data matrix: categorical, discrete, or continuous. Such complexity is shown through visible and explainable serial multiscale structural dependency with heterogeneity. CEDA is developed upon all features’ categorical nature via histogram and it is guided by all features’ associative patterns (order-2 dependence) in a mutual conditional entropy matrix. Higher-order structural dependency of k(≥3) features is exhibited through block patterns within heatmaps that are constructed by permuting contingency-kD-lattices of counts. By growing k, the resultant heatmap series contains global and large scales of structural dependency that constitute the data matrix’s information content. When involving continuous features, the principal component analysis (PCA) extracts fine-scale information content from each block in the final heatmap. Our mimicking protocol coherently simulates this heatmap series by preserving global-to-fine-scale structural dependency. At every step of the mimicking process, each accepted simulated heatmap is subject to constraints with respect to all of the reliable observed categorical patterns. For reliability and robustness in sciences, CEDA with mimicking enhances data visualization by revealing deterministic and stochastic structures within each scale-specific structural dependency. For inferences in Machine Learning (ML) and Statistics, it clarifies on which scales which covariate feature-groups have major versus minor predictive power over response features. For the social justice of Artificial Intelligence (AI) products, it checks whether a data matrix incompletely prescribes the targeted system.
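To make the order-2 dependence measure concrete, here is a small sketch of a mutual conditional entropy between two categorical features. It assumes the symmetric form MCE(X, Y) = [H(X|Y)/H(X) + H(Y|X)/H(Y)] / 2, which the abstract itself does not spell out; the function names are illustrative and the inputs are toy labels. Under this assumed form, 0 indicates that each feature determines the other and 1 indicates empirical independence.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy (in nats) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y), computed as H(X, Y) - H(Y)."""
    return entropy(list(zip(x, y))) - entropy(y)


def mutual_conditional_entropy(x, y):
    """Average of the two normalised conditional entropies (assumed form)."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        raise ValueError("a constant feature has zero entropy; MCE is undefined")
    return 0.5 * (conditional_entropy(x, y) / hx + conditional_entropy(y, x) / hy)


# Toy categorical features.
dependent = mutual_conditional_entropy(
    ["a", "a", "b", "b"], ["u", "u", "v", "v"]
)
independent = mutual_conditional_entropy(
    ["a", "a", "b", "b"], ["u", "v", "u", "v"]
)
print(dependent, independent)
```

Evaluating this function for every pair of features would fill a symmetric matrix of the kind the abstract describes, which could then be reordered to expose groups of associated features.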


Author(s):  
Matteo Falasconi ◽  
Matteo Pardo ◽  
Giorgio Sberveglieri

Visualization and initial examination of Electronic Nose data are among the most important parts of the data analysis cycle. This aspect of data investigation should ideally be performed iteratively together with data collection in order to optimize experimental protocols and final results. Once exploration has been completed, a complete supervised data analysis on a full dataset can be run, leading to prediction and thereby to e-nose performance evaluation. Exploratory Data Analysis (EDA) comprises three tasks: checking the quality of the data, calculating summary statistics, and producing plots of the data to get a feel for their structure. Graphical visualization of data allows checking for instrumental malfunctioning, discovering human errors, removing outliers, understanding the influence of experimental parameters, verifying the ability of the machine to discriminate the examined samples, and eventually formulating new hypotheses. A number of different techniques have been developed for data visualization, including multivariate statistical analysis, non-linear mapping, and clustering techniques. This chapter will present an overview of methods, tools, and software for EDA of artificial olfaction experiments. These will cover visualization and data mining tools for both raw and preprocessed data, such as histograms, scatter plots, feature and box plots, Principal Component Analysis (PCA), Cluster Analysis (CA), and Cluster Validity (CV). Some case studies that demonstrate the application of the methods to specific chemical sensing problems will be illustrated.
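The first two EDA tasks named above (data quality checks and summary statistics) can be sketched with the Python standard library alone. The readings below are invented toy values for a single hypothetical sensor, `explore` is an illustrative name, and the 1.5 × IQR fence is a common box-plot convention assumed here rather than a rule taken from the chapter.

```python
from statistics import mean, median, quantiles, stdev


def explore(readings):
    """Report missing values, summary statistics and IQR-based outliers."""
    valid = [r for r in readings if r is not None]
    q1, _, q3 = quantiles(valid, n=4)  # quartiles of the non-missing readings
    spread = 1.5 * (q3 - q1)           # assumed box-plot fence width
    return {
        "missing": len(readings) - len(valid),
        "mean": mean(valid),
        "median": median(valid),
        "stdev": stdev(valid),
        "outliers": [r for r in valid if r < q1 - spread or r > q3 + spread],
    }


# Toy readings; None marks a dropped measurement.
toy_readings = [1.0, 1.1, 0.9, 1.2, 1.0, None, 5.0]
report = explore(toy_readings)
print(report)
```

A flagged value is a prompt to inspect the measurement (instrument fault, human error, or a genuine effect), not an automatic reason to discard it.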


2013 ◽  
Author(s):  
Stephen J. Tueller ◽  
Richard A. Van Dorn ◽  
Georgiy Bobashev ◽  
Barry Eggleston
