Application of the Power Series Probability Distributions for the Analysis of Zero-Inflated Insect Count Data

OALib ◽  
2018 ◽  
Vol 05 (10) ◽  
pp. 1-11
Author(s):  
Remi Mrume Sakia
2021 ◽  
Author(s):  
Daniel N. Baker ◽  
Nathan Dyjack ◽  
Vladimir Braverman ◽  
Stephanie C. Hicks ◽  
Ben Langmead

Single-cell RNA-sequencing (scRNA-seq) analyses typically begin by clustering a gene-by-cell expression matrix to empirically define groups of cells with similar expression profiles. We describe new methods and a new open source library, minicore, for efficient k-means++ center finding and k-means clustering of scRNA-seq data. Minicore works with sparse count data, as it emerges from typical scRNA-seq experiments, as well as with dense data from after dimensionality reduction. Minicore’s novel vectorized weighted reservoir sampling algorithm allows it to find initial k-means++ centers for a 4-million cell dataset in 1.5 minutes using 20 threads. Minicore can cluster using Euclidean distance, but also supports a wider class of measures like Jensen-Shannon Divergence, Kullback-Leibler Divergence, and the Bhattacharyya distance, which can be directly applied to count data and probability distributions. Further, minicore produces lower-cost centerings more efficiently than scikit-learn for scRNA-seq datasets with millions of cells. With careful handling of priors, minicore implements these distance measures with only minor (<2-fold) speed differences among all distances. We show that a minicore pipeline consisting of k-means++, localsearch++ and minibatch k-means can cluster a 4-million cell dataset in minutes, using less than 10 GiB of RAM. This memory-efficiency enables atlas-scale clustering on laptops and other commodity hardware. Finally, we report findings on which distance measures give clusterings that are most consistent with known cell type labels. Availability: The open source library is at https://github.com/dnbaker/minicore. Code used for experiments is at https://github.com/dnbaker/minicore-experiments.
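The k-means++ center finding mentioned in the abstract can be illustrated with a small sketch. This is plain textbook k-means++ (D² sampling) in pure Python, where each new center is drawn by a one-pass weighted pick in the spirit of weighted reservoir sampling. It is not minicore's vectorized algorithm or its API; the function names `sq_dist`, `weighted_pick`, and `kmeanspp_centers` are hypothetical and chosen only for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_pick(weights, rng):
    """One-pass weighted sample: index i is returned with probability w_i / sum(w).

    Each positive-weight item draws an exponential key with rate w_i;
    the smallest key wins. Returns None if no weight is positive.
    """
    best_idx, best_key = None, float("inf")
    for i, w in enumerate(weights):
        if w <= 0:
            continue  # zero-weight items can never be chosen
        key = rng.expovariate(w)
        if key < best_key:
            best_idx, best_key = i, key
    return best_idx

def kmeanspp_centers(points, k, seed=0):
    """Pick up to k initial centers by D^2 sampling (k-means++ seeding)."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]  # first center: uniform
    d2 = [sq_dist(p, centers[0]) for p in points]   # distance to nearest center
    while len(centers) < k:
        idx = weighted_pick(d2, rng)
        if idx is None:  # every point already coincides with a center
            break
        centers.append(points[idx])
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers

# Toy data (an assumption for illustration): three distinct locations, repeated.
points = [(0.0, 0.0)] * 5 + [(10.0, 10.0)] * 5 + [(-10.0, 10.0)] * 5
centers = kmeanspp_centers(points, k=3)
```

Because a point that already coincides with a chosen center has weight zero, the three centers in this toy example are necessarily the three distinct locations. Swapping `sq_dist` for another dissimilarity would be one way to experiment with the non-Euclidean measures the abstract lists, though how minicore handles those is not shown here.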


2005 ◽  
Vol 42 (4) ◽  
pp. 355-369
Author(s):  
J. P. King

Burmann series are used to give probability distributions which generalize the known class of distributions given by power series. Positive linear operators associated with Burmann-series distribution are described. Convergence of these operators to continuous real functions is studied. Examples are discussed.


Biometrika ◽  
1966 ◽  
Vol 53 (1/2) ◽  
pp. 121
Author(s):  
D. F. I. van Heerden ◽  
H. T. Gonin

2021 ◽  
Vol 76 (4) ◽  
Author(s):  
Karol Baron ◽  
Jacek Wesolowski

We point out a connection between a problem of invariance of power series families of probability distributions under binomial thinning and functional equations which generalize both the Cauchy and an additive form of the Gołąb–Schinzel equation. We solve these equations in several settings with no or mild regularity assumptions imposed on unknown functions.
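For orientation, the two notions named in this abstract can be written out in their standard textbook form. The notation below ($a_k$, $A(\theta)$, $p \circ X$, $B_i$) is an assumption for illustration and need not match the paper's; the Poisson case is the classical example of a power series family that is closed under thinning.

```latex
% Power series family generated by coefficients a_k >= 0:
P_\theta(X = k) = \frac{a_k\,\theta^k}{A(\theta)}, \qquad
A(\theta) = \sum_{k \ge 0} a_k\,\theta^k .

% Binomial thinning with retention probability p in (0, 1),
% where the B_i are independent of X:
p \circ X = \sum_{i=1}^{X} B_i, \qquad
B_i \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(p) .

% Poisson example: a_k = 1/k!, A(\theta) = e^{\theta}
X \sim \mathrm{Poisson}(\theta) \;\Longrightarrow\; p \circ X \sim \mathrm{Poisson}(p\,\theta) .
```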


Author(s):  
Patrick Osatohanmwen ◽  
Francis O. Oyegue ◽  
Sunday M. Ogbonmwan

Mathematics ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 555
Author(s):  
Chénangnon Frédéric Tovissodé ◽  
Sèwanou Hermann Honfo ◽  
Jonas Têlé Doumatè ◽  
Romain Glèlè Kakaï

Most existing flexible count distributions allow only approximate inference when used in a regression context. This work proposes a new framework to provide an exact and flexible alternative for modeling and simulating count data with various types of dispersion (equi-, under-, and over-dispersion). The new method, referred to as “balanced discretization”, consists of discretizing continuous probability distributions while preserving expectations. It is easy to generate pseudo random variates from the resulting balanced discrete distribution since it has a simple stochastic representation (probabilistic rounding) in terms of the continuous distribution. For illustrative purposes, we develop the family of balanced discrete gamma distributions that can model equi-, under-, and over-dispersed count data. This family of count distributions is appropriate for building flexible count regression models because the expectation of the distribution has a simple expression in terms of the parameters of the distribution. Using the Jensen–Shannon divergence measure, we show that under the equidispersion restriction, the family of balanced discrete gamma distributions is similar to the Poisson distribution. Based on this, we conjecture that while covering all types of dispersions, a count regression model based on the balanced discrete gamma distribution will allow recovering a near Poisson distribution model fit when the data are Poisson distributed.
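The probabilistic rounding described in this abstract can be sketched in a few lines. The code below is one natural reading of it, not the authors' implementation: a continuous draw x is rounded down or up at random, with the probability of rounding up equal to the fractional part of x, so that the expectation is preserved. The function names `balanced_round` and `sample_balanced_discrete_gamma` and the parameter values are hypothetical.

```python
import math
import random

def balanced_round(x, rng):
    """Round x to floor(x) or floor(x) + 1 so that the expected result equals x."""
    lower = math.floor(x)
    frac = x - lower  # probability of rounding up
    return lower + (1 if rng.random() < frac else 0)

def sample_balanced_discrete_gamma(shape, scale, n, seed=0):
    """Draw n counts by probabilistically rounding gamma variates."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) has mean alpha * beta
    return [balanced_round(rng.gammavariate(shape, scale), rng) for _ in range(n)]

# Assumed illustrative parameters: the continuous mean is shape * scale = 3.0
draws = sample_balanced_discrete_gamma(shape=2.0, scale=1.5, n=100_000)
mean = sum(draws) / len(draws)  # should be close to 3.0
```

Since rounding preserves the expectation, the sample mean of the counts should track the mean of the underlying gamma distribution, which is the property the abstract highlights as convenient for regression.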

