glmGamPoi: Fitting Gamma-Poisson Generalized Linear Models on Single Cell Count Data

bioRxiv ◽

10.1101/2020.08.13.249623 ◽

2020 ◽

Cited By ~ 1

Author(s):

Constantin Ahlmann-Eltze ◽

Wolfgang Huber

Keyword(s):

Principal Component Analysis ◽

Single Cell ◽

Poisson Distribution ◽

Generalized Linear Models ◽

Count Data ◽

Linear Models ◽

Differential Expression Analysis ◽

Source Code ◽

Principal Component ◽

Single Cell RNA Sequencing

Abstract. Motivation: The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts (Grün et al., 2014; Townes et al., 2019; Svensson, 2020; Silverman et al., 2018; Hafemeister and Satija, 2019) and an essential building block for analysis approaches including differential expression analysis (Robinson et al., 2010; McCarthy et al., 2012; Anders and Huber, 2010; Love et al., 2014), principal component analysis (Townes et al., 2019) and factor analysis (Risso et al., 2018). Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which typically comprise thousands or millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation. Results: We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously. Availability: The package glmGamPoi is available from Bioconductor (since release 3.11) for Windows, macOS, and Linux, and source code is available on GitHub under a GPL-3 license. The scripts to reproduce the results of this paper are available on GitHub.

glmGamPoi: Fitting Gamma-Poisson Generalized Linear Models on Single Cell Count Data

Bioinformatics ◽

10.1093/bioinformatics/btaa1009 ◽

2020 ◽

Author(s):

Constantin Ahlmann-Eltze ◽

Wolfgang Huber

Keyword(s):

Single Cell ◽

Poisson Distribution ◽

Generalized Linear Models ◽

Count Data ◽

Linear Models ◽

Differential Expression Analysis ◽

Source Code ◽

Principal Component ◽

R Package ◽

Single Cell RNA Sequencing

Abstract. Motivation: The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts (Grün et al., 2014; Svensson, 2020; Silverman et al., 2018; Hafemeister and Satija, 2019) and an essential building block for analysis approaches including differential expression analysis (Robinson et al., 2010; McCarthy et al., 2012; Anders and Huber, 2010; Love et al., 2014), principal component analysis (Townes et al., 2019) and factor analysis (Risso et al., 2018). Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which can comprise millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation. Results: We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously. Availability: The package glmGamPoi is available from Bioconductor for Windows, macOS, and Linux, and source code is available on github.com/const-ae/glmGamPoi under a GPL-3 license.

Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA Sequencing Data

Journal of Computational Biology ◽

10.1089/cmb.2018.0255 ◽

2019 ◽

Vol 26 (8) ◽

pp. 782-793 ◽

Cited By ~ 1

Author(s):

Krzysztof Gogolewski ◽

Maciej Sykulski ◽

Neo Christopher Chung ◽

Anna Gambin

Keyword(s):

Principal Component Analysis ◽

Noise Reduction ◽

Single Cell ◽

RNA Sequencing ◽

Principal Component ◽

Component Analysis ◽

Sequencing Data ◽

Robust Principal Component Analysis ◽

Single Cell RNA Sequencing

Benchmarking principal component analysis for large-scale single-cell RNA-sequencing

Genome Biology ◽

10.1186/s13059-019-1900-3 ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 8

Author(s):

Koki Tsuyuzaki ◽

Hiroyuki Sato ◽

Kenta Sato ◽

Itoshi Nikaido

Keyword(s):

Principal Component Analysis ◽

Single Cell ◽

RNA Sequencing ◽

Large Scale ◽

Principal Component ◽

Component Analysis ◽

Single Cell RNA Sequencing

Analysis of Single-Cell RNA-Sequencing Data: A Step-by-Step Guide

BioMedInformatics ◽

10.3390/biomedinformatics2010003 ◽

2021 ◽

Vol 2 (1) ◽

pp. 43-61

Author(s):

Aanchal Malhotra ◽

Samarendra Das ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

RNA Sequencing ◽

Count Data ◽

Negative Binomial ◽

Expression Profiles ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Zero Inflation ◽

Single Cell RNA Sequencing

Single-cell RNA-sequencing (scRNA-seq) technology provides an excellent platform for measuring the expression profiles of genes in heterogeneous cell populations. Multiple tools for the analysis of scRNA-seq data have been developed over the years. The tools require complicated commands and steps to analyze the underlying data, which are not easy to follow by genome researchers and experimental biologists. Therefore, we describe a step-by-step workflow for processing and analyzing the scRNA-seq unique molecular identifier (UMI) data from Human Lung Adenocarcinoma cell lines. We demonstrate the basic analyses including quality check, mapping and quantification of transcript abundance through suitable real data example to obtain UMI count data. Further, we performed basic statistical analyses, such as zero-inflation, differential expression and clustering analyses on the obtained count data. We studied the effects of excess zero-inflation present in scRNA-seq data on the downstream analyses. Our findings indicate that the zero-inflation associated with UMI data had no or minimal role in clustering, while it had significant effect on identifying differentially expressed genes. We also provide an insight into the comparative analysis for differential expression analysis tools based on zero-inflated negative binomial and negative binomial models on scRNA-seq data. The sensitivity analysis enhanced our findings in that the negative binomial model-based tool did not provide an accurate and efficient way to analyze the scRNA-seq data. This study provides a set of guidelines for the users to handle and analyze real scRNA-seq data more easily.

scAEspy: a unifying tool based on autoencoders for the analysis of single-cell RNA sequencing data

10.1101/727867 ◽

2019 ◽

Author(s):

Andrea Tangherloni ◽

Federico Ricciuti ◽

Daniela Besozzi ◽

Pietro Liò ◽

Ana Cvejic

Keyword(s):

Principal Component Analysis ◽

Single Cell ◽

RNA Sequencing ◽

Principal Component ◽

Loss Functions ◽

Gene Interactions ◽

Sequencing Data ◽

Single Cell RNA Sequencing ◽

Public Datasets

Autoencoders (AEs) have been effectively used to capture the non-linearities among gene interactions of single-cell RNA sequencing (scRNA-Seq) data. However, their integration with the common scRNA-Seq bioinformatics pipelines still poses a challenge. Here, we introduce scAEspy, a unifying tool that embodies five of the most advanced AEs and different loss functions, including two novel AEs that we developed. scAEspy allows the integration of data generated using different scRNA-Seq platforms. We benchmarked scAEspy against principal component analysis (PCA) on five public datasets, showing that our new AEs outperform the existing solutions, achieving more than 20% increase of the Rand Index in the identification of cell clusters.

Modeling dynamic correlation in zero‐inflated bivariate count data with applications to single‐cell RNA sequencing data

Biometrics ◽

10.1111/biom.13457 ◽

2021 ◽

Author(s):

Zhen Yang ◽

Yen‐Yi Ho

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Count Data ◽

Sequencing Data ◽

Dynamic Correlation ◽

Single Cell RNA Sequencing

Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkx754 ◽

2017 ◽

Vol 45 (19) ◽

pp. 10978-10988 ◽

Cited By ~ 26

Author(s):

Cheng Jia ◽

Yu Hu ◽

Derek Kelly ◽

Junhyong Kim ◽

Mingyao Li ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

RNA Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Technical Noise ◽

Single Cell RNA Sequencing

Evaluation and Comparison of Patterns of Maternal Complications Using Generalized Linear Models of Count Data Time Series

International Journal of Statistics in Medical Research ◽

10.6000/1929-6029.2019.08.05 ◽

2019 ◽

Vol 8 ◽

pp. 32-39

Author(s):

Collins Odhiambo ◽

Freda Kinoti

Keyword(s):

Time Series ◽

Generalized Linear Models ◽

Count Data ◽

Linear Models ◽

Maternal Complications ◽

Count Data Time Series

Analyzing over-dispersed count data in two-way cross-classification problems using generalized linear models

Journal of Statistical Computation and Simulation ◽

10.1080/00949659908811956 ◽

1999 ◽

Vol 63 (3) ◽

pp. 263-281 ◽

Cited By ~ 3

Author(s):

Nancy L. Campbell ◽

Linda J. Young ◽

George A. Capuano

Keyword(s):

Generalized Linear Models ◽

Count Data ◽

Linear Models ◽

Classification Problems ◽

Cross Classification

Truncated Robust Principal Component Analysis and Noise Reduction for Single Cell RNA-seq Data

Bioinformatics Research and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-319-94968-0_32 ◽

2018 ◽

pp. 335-346

Author(s):

Krzysztof Gogolewski ◽

Maciej Sykulski ◽

Neo Christopher Chung ◽

Anna Gambin

Keyword(s):

Principal Component Analysis ◽

Noise Reduction ◽

Single Cell ◽

Principal Component ◽

Component Analysis ◽

RNA Seq ◽

Robust Principal Component Analysis
