Distributions You Can Count On …But What’s the Point?

Econometrics ◽  
2020 ◽  
Vol 8 (1) ◽  
pp. 9 ◽  
Author(s):  
Brendan P. M. McCabe ◽  
Christopher L. Skeels

The Poisson regression model remains an important tool in the econometric analysis of count data. In a pioneering contribution to the econometric analysis of such models, Lung-Fei Lee presented a specification test for a Poisson model against a broad class of discrete distributions sometimes called the Katz family. Two members of this alternative class are the binomial and negative binomial distributions, which are commonly used with count data to allow for under- and over-dispersion, respectively. In this paper we explore the structure of other distributions within the class and their suitability as alternatives to the Poisson model. Potential difficulties with the Katz likelihood lead us to investigate a class of point optimal tests of the Poisson assumption against the alternative of over-dispersion in both the regression and intercept-only cases. In a simulation study, we compare score tests of ‘Poisson-ness’ with various point optimal tests, based on the Katz family, and conclude that it is possible to choose a point optimal test which is better in the intercept-only case, although the nuisance parameters arising in the regression case are problematic. One possible cause is poor choice of the point at which to optimize. Consequently, we explore the use of Hellinger distance to aid this choice. Ultimately we conclude that score tests remain the most practical approach to testing for over-dispersion in this context.
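
For orientation, the Katz family is usually defined through the recursion P(X = k+1) / P(X = k) = (a + b k) / (k + 1), where b = 0 gives the Poisson, b < 0 a binomial-type (under-dispersed) law, and 0 < b < 1 the negative binomial. The sketch below illustrates that textbook parameterization only; the names `katz_pmf`, `a`, `b`, and `kmax` are illustrative and are not taken from the paper, and truncating the support at `kmax` is a numerical shortcut.

```python
def katz_pmf(a, b, kmax=200):
    """Probabilities p[0..kmax] from the Katz recursion
    p[k+1] = p[k] * (a + b*k) / (k + 1), normalized to sum to 1.

    Assumes a > 0 and b < 1 (illustrative parameterization).
    """
    weights = [1.0]
    for k in range(kmax):
        ratio = (a + b * k) / (k + 1)
        # For b < 0 the support ends once the ratio is no longer positive.
        if ratio > 0 and weights[-1] > 0:
            weights.append(weights[-1] * ratio)
        else:
            weights.append(0.0)
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var


poisson_like = katz_pmf(a=2.0, b=0.0)      # b = 0: Poisson with mean 2
over_dispersed = katz_pmf(a=2.0, b=0.5)    # 0 < b < 1: negative binomial
under_dispersed = katz_pmf(a=2.0, b=-0.5)  # b < 0: binomial-type

for label, pmf in [("b = 0", poisson_like),
                   ("b = 0.5", over_dispersed),
                   ("b = -0.5", under_dispersed)]:
    print(label, mean_and_variance(pmf))
```

Under this parameterization the variance-to-mean ratio is 1 / (1 - b), so the sign of b alone determines whether the alternative is over- or under-dispersed relative to the Poisson.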

2012 ◽  
Vol 57 (1) ◽  
Author(s):  
SEYED EHSAN SAFFAR ◽  
ROBIAH ADNAN ◽  
WILLIAM GREENE

A Poisson model is typically assumed for count data. In many cases, however, the dependent variable contains many zeros, and as a result its mean and variance are no longer equal. In fact, the variance will be much larger than the mean, which is called over-dispersion. The Poisson model is therefore no longer suitable for this kind of data. Thus, it is suggested to use a hurdle Poisson regression model to overcome the over-dispersion problem. Furthermore, the response variable in such cases is censored for some values. In this paper, a censored hurdle Poisson regression model is introduced for count data with many zeros. In this model, we consider a response variable and one or more explanatory variables. The estimation of regression parameters using the maximum likelihood method is discussed and the goodness-of-fit of the regression model is examined. We study the effects of right censoring on the estimated parameters and their standard errors via an example.
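
As a rough illustration of the idea, one common way to write a hurdle Poisson model uses a probability of zero (`p_zero` below) and a zero-truncated Poisson with rate `lam` for the positive counts; a right-censored observation at threshold c then contributes P(Y >= c) to the likelihood. The abstract does not give the authors' parameterization, so every name and data value in this sketch is hypothetical, and covariates are omitted for brevity.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def hurdle_poisson_pmf(k, p_zero, lam):
    """P(Y = k): zeros occur with probability p_zero, positive counts follow
    a zero-truncated Poisson(lam) scaled by 1 - p_zero."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


def censored_loglik(observations, p_zero, lam):
    """Log-likelihood for (count, is_censored) pairs.

    A right-censored observation at c contributes log P(Y >= c).
    """
    total = 0.0
    for count, is_censored in observations:
        if is_censored:
            below = sum(hurdle_poisson_pmf(j, p_zero, lam) for j in range(count))
            total += math.log(1.0 - below)
        else:
            total += math.log(hurdle_poisson_pmf(count, p_zero, lam))
    return total


# Hypothetical data: many zeros, and counts of 5 or more recorded as "5+".
data = [(0, False)] * 6 + [(1, False), (2, False), (3, False), (5, True)]
print(censored_loglik(data, p_zero=0.6, lam=2.0))
```

In a regression setting, `p_zero` and `lam` would each depend on explanatory variables through a link function, and the log-likelihood would be maximized numerically.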


2016 ◽  
Vol 63 (1) ◽  
pp. 77-87 ◽  
Author(s):  
William H. Fisher ◽  
Stephanie W. Hartwell ◽  
Xiaogang Deng

Poisson and negative binomial regression procedures have proliferated, and now are available in virtually all statistical packages. Along with the regression procedures themselves are procedures for addressing issues related to the over-dispersion and excessive zeros commonly observed in count data. These approaches, zero-inflated Poisson and zero-inflated negative binomial models, use logit or probit models for the “excess” zeros and count regression models for the counted data. Although these models are often appropriate on statistical grounds, their interpretation may prove substantively difficult. This article explores this dilemma, using data from a study of individuals released from facilities maintained by the Massachusetts Department of Correction.
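
To make the two-part structure concrete, the sketch below writes out a zero-inflated Poisson probability function in which a logit model supplies the probability of an "excess" zero and a log-link Poisson supplies the counts. The linear-predictor values and the names `zip_pmf`, `pi_zero`, and `lam` are hypothetical and chosen only for illustration.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """P(Y = k) for a zero-inflated Poisson: with probability pi_zero the
    outcome is a structural ("excess") zero, otherwise it is Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * poisson
    return (1.0 - pi_zero) * poisson


def logistic(eta):
    """Inverse logit link, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


pi_zero = logistic(-0.5)  # hypothetical linear predictor for excess zeros
lam = math.exp(1.0)       # hypothetical log-link predictor for the count part

mean = sum(k * zip_pmf(k, pi_zero, lam) for k in range(100))
var = sum((k - mean) ** 2 * zip_pmf(k, pi_zero, lam) for k in range(100))
print(mean, var)  # the variance exceeds the mean whenever pi_zero > 0
```

The interpretive difficulty the abstract raises shows up here: a covariate can enter both the logit part and the count part, so a single predictor may carry two separate coefficients with different meanings.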


2021 ◽  
Vol 5 (1) ◽  
pp. 130-140
Author(s):  
Jajang Jajang ◽  
Budi Pratikno ◽  
Mashuri Mashuri

In 2019 the number of people with tuberculosis (TB) in Banyumas, Central Java, was high (1,910 people were detected with TB). The number of people infected with TB in Banyumas is count data and also area data. In modeling, the parameter estimation method and the characteristics of the data need to be considered. Here, we compare the generalized Poisson (GP), negative binomial (NB), Poisson, and CAR-BYM models for TB cases in Banyumas. We use two methods for parameter estimation: maximum likelihood estimation (MLE) and Bayes. MLE is used for the GP and NB models, whereas Bayes is used for the Poisson and CAR-BYM models. The results showed overdispersion in the Poisson model: the deviance is 67.38 on 22 degrees of freedom, so the ratio of deviance to degrees of freedom is 3.06 (>1). The GP, NB, Poisson-Bayes, and CAR-BYM models are then used to model the TB data in Banyumas, and we compare their RMSE. By the RMSE criterion, CAR-BYM is the best model for TB in Banyumas because its RMSE is the smallest.
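
The overdispersion check reported in this abstract is simple arithmetic, reproduced below with the deviance and degrees of freedom it quotes. The `rmse` helper shows the comparison criterion in its usual form; the function names are illustrative, and no fitted values from the study are reproduced.

```python
import math


def dispersion_ratio(deviance, df):
    """Deviance divided by residual degrees of freedom; values well above 1
    are a common informal signal of overdispersion in a Poisson fit."""
    return deviance / df


def rmse(observed, fitted):
    """Root mean squared error between observed and fitted counts."""
    squared = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(squared) / len(squared))


ratio = dispersion_ratio(deviance=67.38, df=22)
print(round(ratio, 2))  # 3.06
```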


2007 ◽  
Vol 34 (12) ◽  
pp. 1659-1674 ◽  
Author(s):  
Glenn D. Walters

The benchmark model for count data is the Poisson distribution, and the standard statistical procedure for analyzing count data is Poisson regression. However, highly restrictive assumptions lead to frequent misspecification of the Poisson model. Alternate approaches, such as negative binomial regression, zero modified procedures, and truncated and censored models are consequently required to handle count data in many social science contexts. Empirical examples from correctional and forensic psychology are provided to illustrate the importance of replacing ordinary least squares regression with Poisson class procedures in situations when count data are analyzed.


2021 ◽  
Vol 16 (2) ◽  
Author(s):  
Thabo Lephoto ◽  
Henry Mwambi ◽  
Oliver Bodhlyera ◽  
Holly Gaff

There is a vast amount of geo-referenced data in many fields of study including ecological studies. Geo-referencing is usually by point referencing, that is, latitudes and longitudes, or by areal referencing, which includes districts, counties, states, provinces and other administrative units. The availability of large geo-referenced datasets for modelling has necessitated the development and application of spatial statistical methods. However, spatially varying coefficient models exploring the abundance of tick counts remain limited. In this study we used data that were collected and prepared by researchers in the Department of Biological Sciences at Old Dominion University, Virginia, USA. We modelled tick life-stage counts and abundance variability at 12 sampling locations with five different habitats (numbered 1-5) and three habitat types (woods, edges and grass), using data collected monthly from May 2009 through December 2018. Spatio-temporal Poisson and spatio-temporal negative binomial (NB) count data models were fitted to the data and compared using the deviance information criterion (DIC). The NB model outperformed the Poisson model, with all its DIC values being smaller than those of the Poisson model. Results showed that the covariates varied spatially across counties. There was a decreasing time (in years) effect over the study period. However, even though the time effect was decreasing over the study period, space-time interaction effects were seen to be increasing over time in York County.


2011 ◽  
Vol 140 (6) ◽  
pp. 1087-1094 ◽  
Author(s):  
J.-H. LEE ◽  
G. HAN ◽  
W. J. FULP ◽  
A. R. GIULIANO

SUMMARY: The Poisson model can be applied to the count of events occurring within a specific time period. The main feature of the Poisson model is the assumption that the mean and variance of the count data are equal. However, this equal mean-variance relationship rarely occurs in observational data. In most cases, the observed variance is larger than the assumed variance, which is called overdispersion. Further, when the observed data involve excessive zero counts, the problem of overdispersion results in underestimating the variance of the estimated parameter, and thus produces a misleading conclusion. We illustrated the use of four models for overdispersed count data that may be attributed to excessive zeros. These are Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial models. The example data in this article deal with the number of incidents involving human papillomavirus infection. The four models resulted in differing statistical inferences. The Poisson model, which is widely used in epidemiology research, underestimated the standard errors and overstated the significance of some covariates.
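
A quick informal check of the equal mean-variance assumption is to compare the sample variance of the counts with their sample mean before fitting anything. The sketch below does this on made-up counts with many zeros; the data and the name `dispersion_index` are hypothetical and unrelated to the study's dataset.

```python
import statistics


def dispersion_index(counts):
    """Sample variance divided by sample mean; roughly 1 under a Poisson
    model, above 1 when the counts are overdispersed."""
    return statistics.variance(counts) / statistics.mean(counts)


# Hypothetical counts with an excess of zeros and one large value.
counts = [0, 0, 0, 0, 0, 1, 0, 2, 0, 7, 0, 3]
zero_share = counts.count(0) / len(counts)

print(dispersion_index(counts), zero_share)
```

An index well above 1 together with a large share of zeros is the pattern that motivates trying negative binomial and zero-inflated alternatives.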


2019 ◽  
Vol 49 (4) ◽  
Author(s):  
Edilson Marcelino Silva ◽  
Thais Destefani Ribeiro Furtado ◽  
Jaqueline Gonçalves Fernandes ◽  
Marcelo Ângelo Cirillo ◽  
Joel Augusto Muniz

ABSTRACT: Coffee crops play an important role in Brazilian agriculture, with a high level of social and economic participation resulting from the jobs created in the supply chain and from the income obtained by producers and the revenue generated for the country from coffee bean export. In coffee plant growth, leaves have a determinant role in higher production; therefore, the leaf count per plant provides relevant information to producers for adequate crop management, such as foliar fertilizer applications. To describe count data, the Poisson model is the most commonly employed model; when count data show overdispersion, the negative binomial model has been determined to be more adequate. The objective of this study was to compare the fitness of the Poisson and negative binomial models to data on the leaf count per plant in coffee seedlings. Data were collected from an experiment with a randomized block design with 30 treatments and three replicates and four plants per plot. Data from only one treatment, in which the number of leaves was counted over time, were employed. The first count was conducted on 8 April 2016, and the other counts were performed 18, 32, 47, 62, 76, 95, 116, 133, and 153 days after the first evaluation, for a total of ten measurements. The fitness of the models was assessed based on deviance values and simulated envelopes for residuals. Results of fitness assessment indicated that the Poisson model was inadequate for describing the data due to overdispersion. The negative binomial model adequately fitted the observations and was indicated to describe the number of leaves of coffee plants. Based on the negative binomial model, the expected relative increase in the number of leaves was 0.9768% per day.
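
The reported daily increase can be related to a regression coefficient under an assumption the abstract does not state explicitly: that the negative binomial model uses a log link with time in days as the covariate. Under that assumption a 0.9768% daily increase corresponds to a rate ratio of 1.009768 per day, as the sketch below works out; the variable names are illustrative.

```python
import math

daily_increase_pct = 0.9768  # reported in the abstract
rate_ratio = 1.0 + daily_increase_pct / 100.0

# Implied time coefficient, assuming a log link (an assumption, see above).
beta_per_day = math.log(rate_ratio)


def growth_factor(days):
    """Multiplicative change in the expected leaf count after `days` days,
    assuming the daily rate ratio is constant over the period."""
    return rate_ratio ** days


print(beta_per_day)
print(growth_factor(153))  # 153 days is the last measurement in the abstract
```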


1999 ◽  
Vol 29 (2) ◽  
pp. 327-337 ◽  
Author(s):  
Meng Shengwang ◽  
Yuan Wei ◽  
G.A. Whitmore

Abstract: Individual automobile insurance claims are characterized by over-dispersion relative to the Poisson model. In addition, claim propensities vary among individuals in any insurance portfolio. This paper presents a model which takes account of both characteristics. The model employs the negative-binomial distribution as the distribution for individual-level claims and a Pareto distribution as the distribution for claim propensities within the portfolio. The paper shows that the resulting model is tractable and has a number of attractive properties which make it suitable for this application. The fit of the model to actual claim numbers for automobile third party liability insurance is examined and found acceptable. Bayes’ theorem is then applied to this model to calculate illustrative optimal premiums under the Bonus-Malus System (BMS).


2017 ◽  
Vol 9 (3) ◽  
pp. 6
Author(s):  
Volition Tlhalitshi Montshiwa ◽  
Ntebogang Dinah Moroke

Abstract: Sample size requirements are common in many multivariate analysis techniques as one of the measures taken to ensure the robustness of such techniques, but such requirements have not been of interest in the area of count data models. As such, this study investigated the effect of sample size on the efficiency of six commonly used count data models, namely the Poisson regression model (PRM), negative binomial regression model (NBRM), zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson hurdle model (PHM) and negative binomial hurdle model (NBHM). The data used in this study were sourced from Data First and were collected by Statistics South Africa through the Marriage and Divorce database. PRM, NBRM, ZIP, ZINB, PHM and NBHM were applied to ten randomly selected samples ranging from 4392 to 43916 and differing by 10% in size. The six models were compared using the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Vuong’s test for over-dispersion, McFadden RSQ, Mean Square Error (MSE) and Mean Absolute Deviation (MAD). The results revealed that, in general, the negative binomial-based models outperformed the Poisson-based models. However, the results did not reveal the effect of sample size variations on the efficiency of the models, since there was no consistency in the change in AIC, BIC, Vuong’s test for over-dispersion, McFadden RSQ, MSE and MAD as the sample size increased.
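
Two of the comparison criteria named in this abstract have simple closed forms: AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of estimated parameters and n the sample size. The sketch below applies them to made-up log-likelihoods; only the sample size 4392 comes from the abstract, and the fitted values and parameter counts are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion; smaller is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; penalizes parameters more as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical (log-likelihood, parameter count) pairs for two models.
fits = {"PRM": (-12500.0, 5), "NBRM": (-11900.0, 6)}
n_obs = 4392  # smallest sample size mentioned in the abstract

for name, (loglik, k) in fits.items():
    print(name, aic(loglik, k), round(bic(loglik, k, n_obs), 1))
```

Because the BIC penalty depends on n, rankings by AIC and BIC can diverge as the sample grows, which is one reason to track both when varying sample size.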


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Ashenafi A. Yirga ◽  
Sileshi F. Melesse ◽  
Henry G. Mwambi ◽  
Dawit G. Ayele

Abstract: It is of great interest for a biomedical analyst or an investigator to correctly model the CD4 cell count or disease biomarkers of a patient in the presence of covariates or factors determining the disease progression over time. The Poisson mixed-effects model (PMM) can be an appropriate choice for repeated count data. However, this model is not realistic because of the restriction that the mean and variance are equal. Therefore, the PMM is replaced by the negative binomial mixed-effects model (NBMM). The latter model effectively manages the over-dispersion of the longitudinal data. We evaluate and compare the proposed models and their application to the number of CD4 cells of HIV-infected patients recruited in the CAPRISA 002 Acute Infection Study. The results show that the NBMM has appropriate properties and outperforms the PMM in terms of handling over-dispersion of the data. Multiple imputation techniques are also used to handle missing values in the dataset to get valid inferences for parameter estimates. In addition, the results imply that baseline BMI, HAART initiation, baseline viral load, and the number of sexual partners were significantly associated with the patient’s CD4 count in both fitted models. Comparison, discussion, and conclusion of the results of the fitted models complete the study.

