CYBERTRACK2.0: zero-inflated model-based cell clustering and population tracking method for longitudinal mass cytometry data

Author(s): Kodai Minoura, Ko Abe, Yuka Maeda, Hiroyoshi Nishikawa, Teppei Shimamura

Abstract
Summary: Recent advances in high-dimensional single-cell technologies, such as mass cytometry, enable longitudinal experiments that track the dynamics of cell populations and identify change points where population proportions vary significantly. However, research is currently limited by the lack of tools specialized for analyzing longitudinal mass cytometry data. To infer cell population dynamics from such data, we developed a statistical framework named CYBERTRACK2.0. The framework's analytic performance was validated on synthetic and real data, and its results are consistent with previous research.
Availability and implementation: CYBERTRACK2.0 is available at https://github.com/kodaim1115/CYBERTRACK2.
Supplementary information: Supplementary data are available at Bioinformatics online.

2021
Author(s): Lajos Horváth, Zhenya Liu, Gregory Rice, Yuqian Zhao

Abstract
We consider the problem of detecting change points in the mean of high-dimensional panel data with potentially strong cross-sectional dependence. Assuming that the cross-sectional dependence is captured by an unknown number of common factors, we propose a new CUSUM-type statistic. We derive its asymptotic properties under three scenarios, depending on the extent to which the common factors are asymptotically dominant. For panel data consisting of N cross-sectional time series of length T, the asymptotic results hold under the mild assumption that min{N, T} → ∞, with an otherwise arbitrary relationship between N and T, so the results apply to most panel data examples. Bootstrap procedures are proposed to approximate the sampling distribution of the test statistics. A Monte Carlo simulation study shows that our test outperforms several existing tests in finite samples in a number of cases, particularly when N is much larger than T. The practical application of the results is demonstrated by detecting and estimating change points in the high-dimensional FRED-MD macroeconomic data set.
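To make the CUSUM idea concrete, here is a minimal textbook sketch for a single change in the mean of one univariate series. It is not the factor-adjusted panel statistic proposed in the paper and it omits the bootstrap calibration; the function name `cusum_change_point` and the standardization by the sample standard deviation are illustrative assumptions.

```python
from statistics import fmean, stdev


def cusum_change_point(x):
    """Textbook CUSUM for one change in mean (illustrative, not the paper's statistic).

    Partial sums S_k = sum_{t<=k} (x_t - mean(x)) drift away from zero up to a
    mean shift and back towards zero afterwards, so the largest |S_k| marks the
    most likely change point. Returns (k_hat, stat): k_hat observations precede
    the estimated change, and stat = max_k |S_k| / (sd * sqrt(T)).
    """
    T = len(x)
    if T < 3:
        raise ValueError("need at least 3 observations")
    xbar = fmean(x)
    sd = stdev(x)
    if sd == 0:
        return 0, 0.0  # a constant series has no change in mean
    s, k_hat, best = 0.0, 0, 0.0
    for k in range(T - 1):  # S_T is always 0, so it is skipped
        s += x[k] - xbar
        if abs(s) > best:
            best, k_hat = abs(s), k + 1
    return k_hat, best / (sd * T ** 0.5)


# Usage: 50 observations at level 0 followed by 50 at level 2.
k_hat, stat = cusum_change_point([0.0] * 50 + [2.0] * 50)
print(k_hat, round(stat, 2))  # prints: 50 4.97
```

With independent noise and no change, the standardized statistic is asymptotically distributed as the supremum of a Brownian bridge; the cross-sectional dependence in panel data is exactly what breaks that simple calibration and motivates the factor-based statistic and bootstrap described above.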


2019, Vol 35 (20), pp. 4063-4071
Author(s): Tamim Abdelaal, Thomas Höllt, Vincent van Unen, Boudewijn P F Lelieveldt, Frits Koning, ...

Abstract
Motivation: High-dimensional mass cytometry (CyTOF) allows the simultaneous measurement of multiple cellular markers at the single-cell level, providing a comprehensive view of cell composition. However, the power of CyTOF to explore the full heterogeneity of a biological sample at the single-cell level is currently limited by the number of markers that can be measured simultaneously on a single panel.
Results: To extend the number of markers per cell, we propose an in silico method to integrate CyTOF datasets measured using multiple panels that share a set of markers. Additionally, we present an approach to select the most informative markers from an existing CyTOF dataset to be used as the shared marker set between panels. We demonstrate the feasibility of our methods by evaluating the clustering quality and neighborhood preservation of the integrated dataset on two public CyTOF datasets. We show that computationally extending the number of markers further untangles the heterogeneity of mass cytometry data, including the detection of rare cell populations.
Availability and implementation: The implementation is available on GitHub (https://github.com/tabdelaal/CyTOFmerge).
Supplementary information: Supplementary data are available at Bioinformatics online.


2019, Vol 36 (1), pp. 317-329
Author(s): Vincent Briane, Myriam Vimond, Cesar Augusto Valades-Cruz, Antoine Salomon, Christian Wunder, ...

Abstract
Motivation: Recent advances in molecular biology and fluorescence microscopy imaging have made it possible to infer the dynamics of single molecules in living cells. The dynamics can change along a trajectory, so a key problem is to estimate the temporal change-points, that is, the times at which a change of dynamics occurs. The number of points in a trajectory required to detect such changes depends on both the magnitude and the type of the motion changes. Here, the number of points per trajectory is of the order of 10², although in practice dramatic motion changes can be detected with fewer points.
Results: We propose a non-parametric procedure that detects change-points using test statistics computed on local windows along the trajectory. The algorithm controls the number of false change-point detections when the trajectory is fully Brownian. We also develop a strategy for aggregating the detections obtained with different window sizes, so that the window size is no longer a parameter to optimize. A Monte Carlo study demonstrates the performance of the method and compares it with two competing algorithms. Finally, we illustrate the efficacy of the method on real 2D and 3D data depicting the motion of mRNA complexes (called mRNA-binding proteins) in neuronal dendrites, and Galectin-3 endocytosis and trafficking within the cell.
Availability and implementation: A user-friendly Matlab package containing examples and the code of the simulations used in the paper is available at http://serpico.rennes.inria.fr/doku.php?id=software:cpanalysis:index.
Supplementary information: Supplementary data are available at Bioinformatics online.
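The local-window idea can be illustrated with a deliberately simple stand-in statistic. For a Brownian trajectory, the mean squared step in a window is proportional to the diffusion coefficient, so the log-ratio of mean squared steps in the windows just after and just before each time point should stay near zero unless the dynamics change there. This is an assumption for illustration only: the paper's actual test statistics, its threshold calibration for fully Brownian trajectories and its aggregation over window sizes are not reproduced, and the function names and the `threshold` value are hypothetical.

```python
import math


def window_log_ratios(steps, w):
    """Log-ratio of mean squared steps after vs. before each split point.

    `steps` are 1D increments x[t+1] - x[t] at a constant time step; for 2D or
    3D data one could use squared displacement norms instead of s * s below.
    Returns a list of (split_index, log_ratio) for every split with a full
    window of `w` steps on each side.
    """
    if w < 1 or len(steps) < 2 * w:
        raise ValueError("need w >= 1 and at least 2*w steps")
    sq = [s * s for s in steps]
    out = []
    for i in range(w, len(sq) - w + 1):
        before = sum(sq[i - w:i]) / w  # proportional to D before the split
        after = sum(sq[i:i + w]) / w   # proportional to D after the split
        if before > 0 and after > 0:   # skip motionless windows
            out.append((i, math.log(after / before)))
    return out


def detect_change(steps, w, threshold):
    """Return the split with the largest |log-ratio| if it exceeds `threshold`, else None."""
    scores = window_log_ratios(steps, w)
    if not scores:
        return None
    i, r = max(scores, key=lambda pair: abs(pair[1]))
    return i if abs(r) > threshold else None


# Toy trajectory whose step size triples after 40 steps (diffusion coefficient x9).
steps = [1.0, -1.0] * 20 + [3.0, -3.0] * 20
print(detect_change(steps, w=10, threshold=1.0))  # prints 40
```

A single fixed `w` trades sensitivity against temporal resolution, which is the trade-off the multi-window aggregation described above is designed to remove.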


2021
Author(s): Pratyaydipta Rudra, Ryan Baxter, Elena WY Hsieh, Debashis Ghosh

Motivation: Cell type abundance data arising from mass cytometry experiments are compositional in nature. Classical association tests do not apply to compositional data because of their non-Euclidean nature. Existing methods for analyzing cell type abundance data have several limitations for high-dimensional mass cytometry data, especially when the sample size is small.
Results: We propose a new multivariate statistical learning methodology, Compositional Data Analysis using Kernels (CODAK), based on the kernel distance covariance (KDC) framework, to test the association of cell type compositions with important predictors (categorical or continuous) such as disease status. CODAK scales well to high-dimensional data and performs satisfactorily for small sample sizes (n < 25). We conducted simulation studies to compare the performance of the method with existing methods for analyzing cell type abundance data from mass cytometry studies. The method is also applied to a high-dimensional dataset containing different subgroups of populations, including Systemic Lupus Erythematosus (SLE) patients and healthy control subjects.
Availability and implementation: CODAK is implemented in R. The code and the data used in this manuscript are available at http://github.com/GhoshLab/CODAK/.
Supplementary information: Supplementary Materials.pdf.
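The abstract does not specify CODAK's kernel, so the following is only a minimal sketch of the distance covariance idea it builds on, under explicit assumptions: compositions are mapped with the centered log-ratio (CLR) transform so that Euclidean distances become Aitchison distances, the plain (unkernelized) sample distance covariance is used, and significance comes from a label-permutation test. All function names and the toy data are hypothetical; this is not the CODAK implementation, which is written in R.

```python
import math
import random


def clr(p):
    """Centered log-ratio transform of one composition; every part must be > 0."""
    logs = [math.log(v) for v in p]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]


def double_centered_distances(rows):
    """Euclidean distance matrix of `rows` with row, column and grand means removed."""
    n = len(rows)
    d = [[math.dist(a, b) for b in rows] for a in rows]
    mean = [sum(r) / n for r in d]  # row means equal column means (d is symmetric)
    grand = sum(mean) / n
    return [[d[i][j] - mean[i] - mean[j] + grand for j in range(n)] for i in range(n)]


def dcov2(a, b):
    """Squared sample distance covariance from two double-centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


def dcov_permutation_test(compositions, y, n_perm=999, seed=0):
    """Return (observed dCov^2, permutation p-value) for compositions vs. predictor y."""
    a = double_centered_distances([clr(p) for p in compositions])
    observed = dcov2(a, double_centered_distances([[v] for v in y]))
    rng = random.Random(seed)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between y and the compositions
        b = double_centered_distances([[v] for v in shuffled])
        if dcov2(a, b) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 8 samples per group with mirrored 3-part compositions.
group0 = [[0.7 - 0.01 * i, 0.2, 0.1 + 0.01 * i] for i in range(8)]
group1 = [[0.1 + 0.01 * i, 0.2, 0.7 - 0.01 * i] for i in range(8)]
stat, p = dcov_permutation_test(group0 + group1, [0] * 8 + [1] * 8, n_perm=199)
print(stat, p)
```

A permutation test is used here because it needs no distributional assumptions, which matters for the small sample sizes the abstract targets; a kernelized version would replace the Euclidean distances with a kernel-induced distance.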


2020
Author(s): Gustavo de los Campos, Torsten Pook, Agustin Gonzalez-Raymundez, Henner Simianer, George Mias, ...

Abstract
Motivation: Modern genomic data sets often involve multiple data layers (e.g., DNA sequence, gene expression), each of which can itself be high-dimensional. The biological processes underlying these data layers can lead to intricate multivariate association patterns.
Results: We propose and evaluate two methods for analysis of variance when both the input and output sets are high-dimensional. Our approach uses random effects models to estimate the proportion of variance of vectors in the linear span of the output set that can be explained by regression on the input set. We consider one method based on an orthogonal basis (Eigen-ANOVA) and one that uses random vectors (Monte Carlo ANOVA, MC-ANOVA) in the linear span of the output set. We used simulations to assess the bias and variance of each method and to compare them with Partial Least Squares (PLS), an approach commonly used in multivariate high-dimensional regression. MC-ANOVA gave nearly unbiased estimates in all the simulation scenarios considered, whereas estimates produced by Eigen-ANOVA and PLS had noticeable biases. Finally, we demonstrate the insight that can be obtained with MC-ANOVA and Eigen-ANOVA by applying them to the study of multi-locus linkage disequilibrium in chicken genomes and to the assessment of inter-dependencies between gene expression, methylation and copy-number variants in data from breast cancer tumors.
Availability: The Supplementary data include an R implementation of each of the proposed methods, as well as the scripts used in the simulations and in the real-data analyses.
Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s): Yue Wang, Kunqi Chen, Zhen Wei, Frans Coenen, Jionglong Su, ...

Abstract
Motivation: The distribution of biological features strongly indicates their functional relevance. Compared with DNA-related features, deciphering the distribution of mRNA-related features is non-trivial because of isoform ambiguity and the compositional diversity of mRNAs.
Results: We propose a rigorous statistical framework, MetaTX, for deciphering the distribution of mRNA-related features. Using a standardized mRNA model, MetaTX first unifies mRNA transcripts of diverse compositions and then corrects for isoform ambiguity by incorporating the overall distribution pattern of the features through an EM algorithm. MetaTX was tested on both simulated and real data. The results suggest that MetaTX substantially outperforms existing direct methods on simulated datasets and produces a more informative distribution pattern for all three datasets tested, which contain N6-methyladenosine sites generated by different technologies. MetaTX should be a useful tool for studying the distribution and functions of mRNA-related biological features, especially mRNA modifications such as N6-methyladenosine.
Availability and implementation: The MetaTX R package is freely available on GitHub: https://github.com/yue-wang-biomath/MetaTX.1.0.
Supplementary information: Supplementary data are available at Bioinformatics online.


2019, Vol 35 (23), pp. 4962-4970
Author(s): Xiangqi Bai, Liang Ma, Lin Wan

Abstract
Motivation: Cell fate determination is a continuous process in which one cell type diversifies into other cell types following a hierarchical path. Advances in single-cell technologies provide the opportunity to reveal the continuum of cell progression, which forms a structured continuous tree (SCTree). Computational algorithms, usually based on a priori assumptions about the hidden structures, have previously been proposed to recover pseudo-trajectories along the cell differentiation process. However, a statistical framework for assessing the intrinsic structure embedded in high-dimensional gene expression profiles is still lacking. The inherent noise and cell-to-cell variation underlying single-cell data pose great challenges to testing even basic structures, such as linear versus bifurcating.
Results: In this study, we propose an adaptive statistical framework, termed SCTree, to test the intrinsic structure of a high-dimensional single-cell dataset. The SCTree test is based on tools derived from metric geometry and random matrix theory. In brief, by extending the Gromov–Farris transform and utilizing the semicircular law, we formulate the continuous tree structure testing problem as a signal matrix detection problem. We show that the SCTree test is most powerful when the signal-to-noise ratio exceeds a moderate value. We also demonstrate that SCTree robustly detects linear, single and multiple branching events on simulated datasets and real scRNA-seq datasets. Overall, the SCTree test provides a unified statistical assessment of the significance of the hidden structure of single-cell data.
Availability and implementation: SCTree software is available at https://github.com/XQBai/SCTree-test.
Supplementary information: Supplementary data are available at Bioinformatics online.
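For readers unfamiliar with the Gromov–Farris transform, the following sketch computes its standard, unextended form: given a distance matrix d and a chosen root r, F[i][j] = (d(i, r) + d(j, r) - d(i, j)) / 2, the Gromov product of i and j at r. For a tree metric, F[i][j] is the distance from the root to the point where the paths to i and j diverge, which is why branching shows up as block structure in F. SCTree's extension of the transform and its random-matrix test are not reproduced here; the function name and toy distances are illustrative.

```python
def gromov_farris(d, root):
    """Gromov-Farris transform of a symmetric distance matrix `d` about index `root`.

    F[i][j] = (d[i][root] + d[j][root] - d[i][j]) / 2 is the Gromov product of
    i and j at the root. On a tree, it is the depth at which the root-to-i and
    root-to-j paths split, and F[i][i] is the depth of i itself.
    """
    n = len(d)
    return [[(d[i][root] + d[j][root] - d[i][j]) / 2 for j in range(n)]
            for i in range(n)]


# Toy Y-shaped tree: root r (index 0), branch point b (1) at depth 1,
# leaf a (2) two units down one branch, leaf c (3) three units down the other.
d = [
    [0, 1, 3, 4],
    [1, 0, 2, 3],
    [3, 2, 0, 5],
    [4, 3, 5, 0],
]
F = gromov_farris(d, root=0)
print(F[2][3])  # paths to a and c split at b, so this prints 1.0
```

On a purely linear structure rooted at one end, F[i][j] reduces to the smaller of the two depths, so a bifurcation is what produces off-diagonal entries smaller than both depths.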


2020, Vol 36 (10), pp. 3288-3289
Author(s): Miroslav Kratochvíl, David Bednárek, Tomáš Sieger, Karel Fišer, Jiří Vondrášek

Abstract
Summary: ShinySOM offers a user-friendly interface for reproducible, high-throughput analysis of high-dimensional flow and mass cytometry data guided by self-organizing maps. The software implements a FlowSOM-style workflow with improvements in performance, visualization and data dissection capabilities. The outputs of the analysis include precise statistical information about the dissected samples, as well as R-compatible metadata useful for batch processing of large sample volumes.
Availability and implementation: ShinySOM is free and open-source, available online at gitlab.com/exaexa/ShinySOM.
Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s): Alma Andersson, Joakim Lundeberg

Abstract
Motivation: Collecting spatial signals in large numbers has become a routine task in multiple omics fields, but parsing these rich datasets still poses challenges. In whole or near-full transcriptome spatial techniques, spurious expression profiles are intermixed with those exhibiting an organized structure. To distinguish profiles with spatial patterns from background noise, a metric that quantifies spatial structure is desirable. Current methods designed for similar purposes tend to be built around a framework of statistical hypothesis testing, so we explored a fundamentally different strategy.
Results: We propose a previously unexplored approach to analyzing spatial transcriptomics data, which simulates the diffusion of individual transcripts to extract genes with spatial patterns. The method performed as expected on synthetic data. When applied to real data, it identified genes with distinct spatial profiles that are involved in key biological processes or characteristic of certain cell types. Compared with existing methods, ours seemed less informed by the genes' expression levels and showed better time performance when run on multiple cores.
Availability and implementation: Open-source Python package with a command line interface (CLI), freely available at https://github.com/almaan/sepal under an MIT licence. A mirror of the GitHub repository can be found at Zenodo, doi: 10.5281/zenodo.4573237.
Supplementary information: Supplementary data are available at Bioinformatics online.


2021, Vol 18 (1)
Author(s): Christos Nikolaou, Kerstin Muehle, Stephan Schlickeiser, Alberto Sada Japp, Nadine Matzmohr, ...

An amendment to this paper has been published and can be accessed via the original article.

