A Stochastic Condensation Mechanism for Inducing Underdispersion in Count Models

Author(s):  
Chenangnon Frédéric Tovissodé ◽  
Romain Glele Kakai

It is quite easy to stochastically distort an original count variable to obtain a new count variable with relatively more variability than in the original variable. Many popular overdispersion models (variance greater than mean) can indeed be obtained by mixtures, compounding or randomly stopped sums. There is no analogous stochastic mechanism for the construction of underdispersed count variables (variance less than mean), starting from an original count distribution of interest. This work proposes a generic method to stochastically distort an original count variable to obtain a new count variable with relatively less variability than in the original variable. The proposed mechanism, termed condensation, attracts probability masses from the quantiles in the tails of the original distribution and redirects them toward quantiles around the expected value. If the original distribution can be simulated, then the simulation of variates from a condensed distribution is straightforward. Moreover, condensed distributions have a simple mean-parametrization, a characteristic useful in a count regression context. An application to the negative binomial distribution resulted in a distribution allowing under-, equi- and overdispersion. In addition to graphical insights, fields of application of special cases of condensed Poisson and condensed negative binomial distributions are pointed out as an indication of the potential of condensation for a flexible analysis of count data.
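The mixture route to overdispersion mentioned in this abstract can be illustrated with a small simulation. The sketch below is not the condensation mechanism itself, whose construction the abstract does not give; it only shows the familiar case of a Poisson count whose rate is gamma-distributed, which yields a negative binomial count with a variance-to-mean ratio of 1 + mean/shape. The function names, the parameter values and the use of Knuth's multiplication method for Poisson draws are illustrative assumptions.

```python
import math
import random
import statistics


def poisson_draw(rate, rng):
    """Draw one Poisson variate by Knuth's multiplication method (small rates only)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def gamma_mixed_poisson(mean, shape, size, rng):
    """Poisson counts whose rate is gamma-distributed with the given mean and shape."""
    scale = mean / shape  # gamma mean = shape * scale
    return [poisson_draw(rng.gammavariate(shape, scale), rng) for _ in range(size)]


def vmr(sample):
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 under overdispersion."""
    return statistics.pvariance(sample) / statistics.fmean(sample)


rng = random.Random(0)  # fixed seed so the run is reproducible
plain = [poisson_draw(4.0, rng) for _ in range(20000)]
mixed = gamma_mixed_poisson(4.0, 2.0, 20000, rng)

print("Poisson VMR:", round(vmr(plain), 2))
print("Gamma-mixed Poisson VMR:", round(vmr(mixed), 2))
```

With these illustrative values the theoretical ratios are 1 for the plain Poisson sample and 1 + 4/2 = 3 for the mixed sample, so the simulated ratios should land close to those figures.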

1996 ◽  
Vol 6 ◽  
pp. 175-212 ◽  
Author(s):  
Timothy W. Amato

In this article, the mathematical and probabilistic foundations of Gary King's “generalized event count” (GEC) model for dealing with unequally dispersed event count data are explored. It is shown that the GEC model is a probability model that joins together the binomial, negative binomial, and Poisson distributions. Some aspects of the GEC's reparameterization are described and extended and it is shown how different reparameterizations lead to different interpretations of the dispersion parameter. The common mathematical and statistical structure of “unequally dispersed” event count models as models that require estimation of the “number of trials” parameter along with the “probability” component is derived. Some questions pertaining to estimation of this class of models are raised for future discussion.


Author(s):  
Cindy Xin Feng

Abstract: Count data with excessive zeros are frequently encountered in practice. For example, the number of health services visits often includes many zeros representing the patients with no utilization during a follow-up time. A common feature of this type of data is that the count measure tends to have more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, there is still a lack of investigation of the fundamental differences between these two types of models. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data generating processes. We also conducted simulation studies to evaluate the performances of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.
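The difference between the two data generating processes can be made concrete with the probability mass functions of the usual Poisson-based versions. This is a minimal sketch under textbook definitions, not the specification used in the article: in the zero-inflated model a zero can come either from a structural-zero component or from the Poisson component, while in the hurdle model all zeros come from a single binary component and positive counts follow a zero-truncated Poisson. The function names and parameter values are assumptions chosen for illustration.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of observing the count k."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def zip_pmf(k, rate, pi_zero):
    """Zero-inflated Poisson: structural zero with probability pi_zero, else Poisson."""
    base = (1.0 - pi_zero) * poisson_pmf(k, rate)
    return pi_zero + base if k == 0 else base


def hurdle_pmf(k, rate, pi_zero):
    """Hurdle Poisson: zero with probability pi_zero, else zero-truncated Poisson."""
    if k == 0:
        return pi_zero
    return (1.0 - pi_zero) * poisson_pmf(k, rate) / (1.0 - poisson_pmf(0, rate))


# Same parameters, different zero probabilities under the two models.
rate, pi_zero = 2.0, 0.3
print("P(0) under zero inflation:", zip_pmf(0, rate, pi_zero))
print("P(0) under the hurdle model:", hurdle_pmf(0, rate, pi_zero))
```

Under the same parameter values the zero-inflated model always puts more mass on zero than the hurdle model, because the Poisson component adds sampling zeros on top of the structural ones.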


2019 ◽  
Vol 43 ◽  
Author(s):  
Thomas Bruno Michelon ◽  
Cesar Augusto Taconeli ◽  
Elisa Serra Negra Vieira ◽  
Maristela Panobianco

ABSTRACT Generalized linear models (GLMs) are an extension of the linear model and include the normal, Poisson, and negative binomial distributions. Although GLMs were introduced in 1972, most seed technology studies, especially those involving count data, such as germination tests of seeds from the genus Eucalyptus, still use the analysis of variance without assessing the fit of other models. Thus, this study aimed to evaluate the most appropriate model in the GLM class for seed count data of Eucalyptus cloeziana. Data were obtained from a germination test using seeds from three lots of E. cloeziana. Each lot was separated by sieving into three material fractions based on size: small (<0.84 mm), medium (from 1.18 to 1.00 mm), and large (>1.18 mm). The data analysis was based on GLMs fitted with the normal, Poisson, and negative binomial distributions, and the models were evaluated by the Akaike and Bayesian (Schwarz) information criteria, Cook’s distance, and half-normal diagnostic plots. Compared to the other fits, the normal distribution fit differed in the configuration of means submitted to the Tukey test, and although the data met all normality assumptions, the fit with the Poisson distribution was the most suitable for the count data from a germination test of E. cloeziana seeds.


1979 ◽  
Vol 16 (01) ◽  
pp. 154-162 ◽  
Author(s):  
Lars Holst

An urn contains A balls of each of N colours. At random n balls are drawn in succession without replacement, with replacement or with replacement together with S new balls of the same colour. Let X_k be the number of drawn balls having colour k, k = 1, …, N. For a given function f the characteristic function of the random variable Z_M = f(X_1) + … + f(X_M), M ≤ N, is derived. A limit theorem for Z_M when M, N, n → ∞ is proved by a general method. The theorem covers many special cases discussed separately in the literature. As applications of the theorem limit distributions are obtained for some occupancy problems and for dispersion statistics for the binomial, Poisson and negative-binomial distribution.


1986 ◽  
Vol 18 (03) ◽  
pp. 660-678 ◽  
Author(s):  
C. Radhakrishna Rao ◽  
D. N. Shanbhag

The problem of identifying solutions of general convolution equations relative to a group has been studied in two classical papers by Choquet and Deny (1960) and Deny (1961). Recently, Lau and Rao (1982) have considered the analogous problem relative to a certain semigroup of the real line, which extends the results of Marsaglia and Tubilla (1975) and a lemma of Shanbhag (1977). The extended versions of Deny's theorem contained in the papers by Lau and Rao, and Shanbhag (which we refer to as LRS theorems) yield as special cases improved versions of several characterizations of exponential, Weibull, stable, Pareto, geometric, Poisson and negative binomial distributions obtained by various authors during the last few years. In this paper we review some of the recent contributions to characterization of probability distributions (whose authors do not seem to be aware of LRS theorems or special cases existing earlier) and show how improved versions of these results follow as immediate corollaries to LRS theorems. We also give a short proof of Lau–Rao theorem based on Deny's theorem and thus establish a direct link between the results of Deny (1961) and those of Lau and Rao (1982). A variant of Lau–Rao theorem is proved and applied to some characterization problems.


2016 ◽  
Vol 23 (3) ◽  
pp. 359-385 ◽  
Author(s):  
Claude Manté ◽  
Saikou Oumar Kidé ◽  
Anne-Francoise Yao-Lafourcade ◽  
Bastien Mérigot

2015 ◽  
Vol 57 (1) ◽  
pp. 1-19
Author(s):  
Norbert Mielenz ◽  
Joachim Spilke ◽  
Eberhard von Borell

Population-averaged and subject-specific models are available to evaluate count data when repeated observations per subject are present. The latter are also known in the literature as generalised linear mixed models (GLMM). In GLMM repeated measures are taken into account explicitly through random animal effects in the linear predictor. In this paper the relevant GLMMs are presented based on conditional Poisson or negative binomial distribution of the response variable for given random animal effects. Equations for the repeatability of count data are derived assuming normal distribution and logarithmic gamma distribution for the random animal effects. Using count data on aggressive behaviour events of pigs (barrows, sows and boars) in mixed-sex housing, we demonstrate the use of the Poisson »log-gamma intercept«, the Poisson »normal intercept« and the »normal intercept« model with negative binomial distribution. Since not all count data can definitely be seen as Poisson or negative-binomially distributed, questions of model selection and model checking are examined. Emanating from the example, we also interpret the least squares means, estimated on the link as well as the response scale. Options provided by the SAS procedure NLMIXED for estimating model parameters and for estimating marginal expected values are presented.


2021 ◽  
Vol 14 ◽  
pp. 1-8
Author(s):  
Yook-Ngor Phang ◽  
Seng-Huat Ong ◽  
Yeh-Ching Low

The Poisson inverse Gaussian and generalized Poisson distributions are widely used in modelling overdispersed count data, which are commonly found in healthcare, insurance, engineering, econometrics and ecology. The inverse trinomial distribution is a relatively new count distribution arising from a one-dimensional random walk model (Shimizu & Yanagimoto, 1991). The Poisson inverse Gaussian distribution is a popular count model that has been proposed as an alternative to the negative binomial distribution. The inverse trinomial and generalized Poisson models possess a common characteristic of having a cubic variance function, while the Poisson inverse Gaussian has a quadratic variance function. The nature of the variance function seems to be an important property in modelling overdispersed count data. Hence it is of interest to be able to select among the three models in practical applications. This paper considers discrimination among the three models based on the likelihood ratio statistic and computes, via Monte Carlo simulation, the probability of correct selection.


2020 ◽  
Vol 4 (4) ◽  
pp. 615-626
Author(s):  
Choirun Nisa ◽  
Muhammad Nur Aidi ◽  
I Made Sumertajaya

The negative binomial distribution is a count distribution defined in terms of success and failure events. This study examined negative binomial data to characterize the distribution by the value of its Variance Mean Ratio (VMR). Simulation data were generated from the negative binomial distribution with a combination of p and n parameters. The results of the VMR study on the simulated negative binomial data show that the VMR becomes smaller as the value of p increases and more stable as the sample size increases. When p≥0.5, the simulated negative binomial data begin to resemble data from the Poisson and binomial distributions. The calculated VMR value can be used as a reference for detecting the distribution pattern of count data.
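The reported relation between p and the VMR can be checked numerically. The sketch below assumes the parameterization in which the negative binomial variable counts the failures observed before the n-th success, each trial succeeding with probability p; the abstract does not say which parameterization the study used. Under that assumption the mean is n(1 − p)/p, the variance is n(1 − p)/p², and the VMR is 1/p, so it shrinks toward the Poisson value of 1 as p grows. The function names and the truncation point of the sums are illustrative.

```python
import math


def nb_pmf(k, n, p):
    """P(k failures before the n-th success) with success probability p."""
    return math.comb(k + n - 1, k) * p**n * (1.0 - p) ** k


def nb_vmr(n, p, upper=1000):
    """Variance-to-mean ratio computed from the pmf, with sums truncated at `upper`."""
    mean = sum(k * nb_pmf(k, n, p) for k in range(upper))
    second_moment = sum(k * k * nb_pmf(k, n, p) for k in range(upper))
    return (second_moment - mean**2) / mean


for p in (0.2, 0.5, 0.9):
    print(f"p = {p}: numerical VMR = {nb_vmr(5, p):.4f}, 1/p = {1 / p:.4f}")
```

In this parameterization the ratio does not depend on n, which is why a single value of n suffices in the loop.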


Author(s):  
Anwer Khurshid ◽  
Ashit B Chakraborty

The negative binomial distribution (NBD) is extensively used for the description of data too heterogeneous to be fitted by Poisson distribution. Observed samples, however may be truncated, in the sense that the number of individuals falling into zero class cannot be determined, or the observational apparatus becomes active when at least one event occurs. Chakraborty and Kakoty (1987) and Chakraborty and Singh (1990) have constructed CUSUM and Shewhart charts for zero-truncated Poisson distribution respectively. Recently, Chakraborty and Khurshid (2011 a, b) have constructed CUSUM charts for zero-truncated binomial distribution and doubly truncated binomial distribution respectively. Apparently, very little work has specifically addressed control charts for the NBD (see, for example, Kaminsky et al., 1992; Ma and Zhang, 1995; Hoffman, 2003; Schwertman, 2005).

The purpose of this paper is to construct Shewhart control charts for zero-truncated negative binomial distribution (ZTNBD). Formulae for the Average run length (ARL) of the charts are derived and studied for different values of the parameters of the distribution. OC curves are also drawn.

