Application of the Power Series Probability Distributions for the Analysis of Zero-Inflated Insect Count Data

Abstract: Single-cell RNA-sequencing (scRNA-seq) analyses typically begin by clustering a gene-by-cell expression matrix to empirically define groups of cells with similar expression profiles. We describe new methods and a new open source library, minicore, for efficient k-means++ center finding and k-means clustering of scRNA-seq data. Minicore works with sparse count data, as it emerges from typical scRNA-seq experiments, as well as with dense data from after dimensionality reduction. Minicore’s novel vectorized weighted reservoir sampling algorithm allows it to find initial k-means++ centers for a 4-million cell dataset in 1.5 minutes using 20 threads. Minicore can cluster using Euclidean distance, but also supports a wider class of measures like Jensen-Shannon Divergence, Kullback-Leibler Divergence, and the Bhattacharyya distance, which can be directly applied to count data and probability distributions. Further, minicore produces lower-cost centerings more efficiently than scikit-learn for scRNA-seq datasets with millions of cells. With careful handling of priors, minicore implements these distance measures with only minor (<2-fold) speed differences among all distances. We show that a minicore pipeline consisting of k-means++, localsearch++ and minibatch k-means can cluster a 4-million cell dataset in minutes, using less than 10 GiB of RAM. This memory-efficiency enables atlas-scale clustering on laptops and other commodity hardware. Finally, we report findings on which distance measures give clusterings that are most consistent with known cell type labels.

Availability: The open source library is at https://github.com/dnbaker/minicore. Code used for experiments is at https://github.com/dnbaker/minicore-experiments.
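The k-means++ center finding mentioned in the abstract can be illustrated with a minimal sketch. This is plain sequential D² sampling in pure Python, not minicore's vectorized weighted reservoir sampling and not its API; the names `sq_euclidean` and `kmeanspp_centers` are hypothetical, and the pluggable `dist` argument only gestures at the idea of swapping in other measures.

```python
import random


def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_centers(points, k, dist=sq_euclidean, seed=0):
    """Return indices of k initial centers chosen by D^2 sampling.

    Hypothetical helper for illustration only: each new center is drawn
    with probability proportional to its distance to the nearest center
    chosen so far.
    """
    rng = random.Random(seed)
    # First center: uniform at random.
    centers = [rng.randrange(len(points))]
    # cost[i] is the distance from point i to its nearest chosen center.
    cost = [dist(p, points[centers[0]]) for p in points]
    while len(centers) < k:
        if sum(cost) == 0:
            break  # every point coincides with an already chosen center
        # Already chosen points have cost 0, so they cannot be drawn again.
        idx = rng.choices(range(len(points)), weights=cost, k=1)[0]
        centers.append(idx)
        cost = [min(c, dist(p, points[idx])) for c, p in zip(cost, points)]
    return centers
```

For example, `kmeanspp_centers([(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)], k=3)` returns three distinct point indices; which ones depends on the seed.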

Some families of generalized Mathieu-type power series, associated probability distributions and related inequalities involving complete monotonicity and log-convexity

Mathematical Inequalities & Applications ◽

10.7153/mia-2017-20-61 ◽

2017 ◽

pp. 973-986 ◽

Cited By ~ 1

Author(s):

Živorad Tomovski ◽

Khaled Mehrez

Keyword(s):

Power Series ◽

Probability Distributions ◽

Complete Monotonicity ◽

Type Power

Burmann-series distributions and approximation operators

Studia Scientiarum Mathematicarum Hungarica ◽

10.1556/sscmath.42.2005.4.2 ◽

2005 ◽

Vol 42 (4) ◽

pp. 355-369

Author(s):

J. P. King

Keyword(s):

Power Series ◽

Probability Distributions ◽

Linear Operators ◽

Positive Linear Operators ◽

Approximation Operators ◽

Real Functions

Burmann series are used to give probability distributions which generalize the known class of distributions given by power series. Positive linear operators associated with Burmann-series distribution are described. Convergence of these operators to continuous real functions is studied. Examples are discussed.

A Moment Preserving Finitization Across the Power Series Family of Probability Distributions

Communications in Statistics - Theory and Methods ◽

10.1080/03610926.2010.529536 ◽

2012 ◽

Vol 41 (4) ◽

pp. 653-664 ◽

Cited By ~ 3

Author(s):

Martin S. Levy ◽

James J. Cochran ◽

Saeed Golnabi

Keyword(s):

Power Series ◽

Probability Distributions

The Orthogonal Polynomials of Power Series Probability Distributions and Their Uses

Biometrika ◽

10.2307/2334058 ◽

1966 ◽

Vol 53 (1/2) ◽

pp. 121

Author(s):

D. F. I. van Heerden ◽

H. T. Gonin

Keyword(s):

Power Series ◽

Orthogonal Polynomials ◽

Probability Distributions

From Invariance Under Binomial Thinning to Unification of the Cauchy and the Gołąb–Schinzel-Type Equations

Results in Mathematics ◽

10.1007/s00025-021-01457-8 ◽

2021 ◽

Vol 76 (4) ◽

Author(s):

Karol Baron ◽

Jacek Wesolowski

Keyword(s):

Power Series ◽

Functional Equations ◽

Probability Distributions ◽

Binomial Thinning ◽

Additive Form

Abstract: We point out a connection between the problem of invariance of power series families of probability distributions under binomial thinning and functional equations that generalize both the Cauchy equation and an additive form of the Gołąb–Schinzel equation. We solve these equations in several settings, with no or only mild regularity assumptions imposed on the unknown functions.

The T–R {Y} power series family of probability distributions

Journal of the Egyptian Mathematical Society ◽

10.1186/s42787-020-00083-7 ◽

2020 ◽

Vol 28 (1) ◽

Cited By ~ 1

Author(s):

Patrick Osatohanmwen ◽

Francis O. Oyegue ◽

Sunday M. Ogbonmwan

Keyword(s):

Power Series ◽

Probability Distributions

On the Discretization of Continuous Probability Distributions Using a Probabilistic Rounding Mechanism

Mathematics ◽

10.3390/math9050555 ◽

2021 ◽

Vol 9 (5) ◽

pp. 555

Author(s):

Chénangnon Frédéric Tovissodé ◽

Sèwanou Hermann Honfo ◽

Jonas Têlé Doumatè ◽

Romain Glèlè Kakaï

Keyword(s):

Poisson Distribution ◽

Count Data ◽

Probability Distributions ◽

Continuous Distribution ◽

Simple Expression ◽

Distribution Model ◽

Stochastic Representation ◽

Gamma Distributions ◽

The Family ◽

Jensen Shannon Divergence

Most existing flexible count distributions allow only approximate inference when used in a regression context. This work proposes a new framework to provide an exact and flexible alternative for modeling and simulating count data with various types of dispersion (equi-, under-, and over-dispersion). The new method, referred to as “balanced discretization”, consists of discretizing continuous probability distributions while preserving expectations. It is easy to generate pseudo random variates from the resulting balanced discrete distribution since it has a simple stochastic representation (probabilistic rounding) in terms of the continuous distribution. For illustrative purposes, we develop the family of balanced discrete gamma distributions that can model equi-, under-, and over-dispersed count data. This family of count distributions is appropriate for building flexible count regression models because the expectation of the distribution has a simple expression in terms of the parameters of the distribution. Using the Jensen–Shannon divergence measure, we show that under the equidispersion restriction, the family of balanced discrete gamma distributions is similar to the Poisson distribution. Based on this, we conjecture that while covering all types of dispersions, a count regression model based on the balanced discrete gamma distribution will allow recovering a near Poisson distribution model fit when the data are Poisson distributed.
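The probabilistic rounding idea in this abstract can be sketched as follows. The sketch assumes the stochastic representation is Y = floor(X) + B with B drawn as Bernoulli(X - floor(X)), which gives E[Y | X] = X and therefore E[Y] = E[X]; that reading, and the helper names `probabilistic_round` and `balanced_discrete_gamma`, are assumptions made for illustration rather than the authors' code.

```python
import math
import random


def probabilistic_round(x, rng):
    """Round x down or up at random so that the expected result equals x.

    Assumed mechanism: round up with probability equal to the fractional part.
    """
    low = math.floor(x)
    frac = x - low
    return low + (1 if rng.random() < frac else 0)


def balanced_discrete_gamma(shape, scale, n, seed=0):
    """Draw n counts by probabilistically rounding gamma(shape, scale) variates."""
    rng = random.Random(seed)
    return [
        probabilistic_round(rng.gammavariate(shape, scale), rng)
        for _ in range(n)
    ]


if __name__ == "__main__":
    draws = balanced_discrete_gamma(shape=2.0, scale=1.5, n=20000)
    # The sample mean should be close to the continuous mean shape * scale = 3.0.
    print(sum(draws) / len(draws))
```

Because the rounding step is unbiased given X, the mean of the counts tracks the mean of the underlying gamma distribution, which is the property the abstract highlights as convenient for count regression.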
