Modeling Count Data: The Poisson and Negative Binomial Regression Models

Poisson and negative binomial regression procedures have proliferated, and now are available in virtually all statistical packages. Along with the regression procedures themselves are procedures for addressing issues related to the over-dispersion and excessive zeros commonly observed in count data. These approaches, zero-inflated Poisson and zero-inflated negative binomial models, use logit or probit models for the “excess” zeros and count regression models for the counted data. Although these models are often appropriate on statistical grounds, their interpretation may prove substantively difficult. This article explores this dilemma, using data from a study of individuals released from facilities maintained by the Massachusetts Department of Correction.
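As a rough illustration of the two-part structure described above, the sketch below writes out a zero-inflated Poisson probability mass function. It assumes the common mixture form in which an observation is a structural ("excess") zero with probability `pi` and otherwise follows a Poisson distribution with mean `lam`; the function names and parameter values are illustrative and do not come from the article.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of observing count k when the mean is lam (> 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson pmf (assumed mixture form).

    pi  : probability of a structural ("excess") zero; in a regression this
          would come from a logit or probit model.
    lam : mean of the Poisson count component.
    """
    if k == 0:
        # A zero can be structural or an ordinary Poisson zero.
        return pi + (1.0 - pi) * math.exp(-lam)
    return (1.0 - pi) * poisson_pmf(k, lam)


# Illustrative values only: 30% excess zeros, Poisson mean of 2.
lam, pi = 2.0, 0.3
print("P(Y=0) plain Poisson:", round(poisson_pmf(0, lam), 4))
print("P(Y=0) zero-inflated:", round(zip_pmf(0, lam, pi), 4))
```

The interpretive difficulty the abstract mentions is visible here: a single covariate can act on `pi`, on `lam`, or on both, so its overall effect on the expected count `(1 - pi) * lam` is not read off one coefficient.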


A Study of Count Regression Models for Mortality Rate

CAUCHY ◽

10.18860/ca.v7i1.13642 ◽

2021 ◽

Vol 7 (1) ◽

pp. 142-151

Author(s):

Anwar Fitrianto

Keyword(s):

Mortality Rate ◽

Regression Model ◽

Count Data ◽

Bayesian Information Criterion ◽

Regression Models ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Information Criterion ◽

Poisson Regression Model ◽

Binomial Regression

This paper discusses how to fit overdispersed count data. A Poisson regression model, a Negative Binomial 1 regression model (NEGBIN 1), and a Negative Binomial 2 regression model (NEGBIN 2) were proposed to fit mortality rate data. The models are compared by their Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values to determine which one fits the data best. The results show that the data are indeed overdispersed, that is, more variable than a Poisson model allows. Among the three models, NEGBIN 1 is preferred.
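The comparison by information criteria can be sketched as follows. This is a minimal illustration and not the paper's analysis: it uses made-up toy counts, intercept-only models, and a coarse grid search for the dispersion parameter instead of a proper optimizer. It relies on the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, and on the usual variance functions (NEGBIN 1: mu(1 + alpha); NEGBIN 2: mu + alpha mu^2). Without covariates the two negative binomial forms describe the same family, so only NEGBIN 2 is fitted here.

```python
import math


def poisson_loglik(y, mu):
    """Log-likelihood of counts y under a Poisson model with common mean mu."""
    return sum(-mu + yi * math.log(mu) - math.lgamma(yi + 1) for yi in y)


def nb2_loglik(y, mu, alpha):
    """Log-likelihood under NEGBIN 2 with mean mu and variance mu + alpha*mu**2."""
    r = 1.0 / alpha
    return sum(
        math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
        + r * math.log(r / (r + mu)) + yi * math.log(mu / (r + mu))
        for yi in y
    )


def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * log_lik


# Toy counts, invented for illustration (variance far exceeds the mean).
y = [0, 0, 1, 1, 2, 3, 5, 8, 12, 20]
n = len(y)
mu_hat = sum(y) / n  # sample mean, the MLE of the mean in both intercept-only models

ll_pois = poisson_loglik(y, mu_hat)

# Coarse grid search over alpha; a real analysis would use an optimizer.
alpha_grid = [0.01 * j for j in range(1, 501)]
alpha_hat = max(alpha_grid, key=lambda a: nb2_loglik(y, mu_hat, a))
ll_nb2 = nb2_loglik(y, mu_hat, alpha_hat)

results = {
    "Poisson": {"AIC": aic(ll_pois, 1), "BIC": bic(ll_pois, 1, n)},
    "NEGBIN 2": {"AIC": aic(ll_nb2, 2), "BIC": bic(ll_nb2, 2, n)},
}
for name, crit in results.items():
    print(f"{name:9s} AIC={crit['AIC']:.2f} BIC={crit['BIC']:.2f}")
```

The model with the smaller AIC or BIC is preferred; BIC penalizes the extra dispersion parameter more heavily as the sample grows.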


Overdisp: A Stata (and Mata) Package for Direct Detection of Overdispersion in Poisson and Negative Binomial Regression Models

Statistics Optimization & Information Computing ◽

10.19139/soic-2310-5070-557 ◽

2020 ◽

Vol 8 (3) ◽

pp. 773-789

Author(s):

Luiz Paulo Lopes Fávero ◽

Patrícia Belfiore ◽

Marco Aurélio dos Santos ◽

R. Freitas Souza

Keyword(s):

Count Data ◽

Data Model ◽

Regression Models ◽

Negative Binomial ◽

Direct Detection ◽

Negative Binomial Regression ◽

The Other ◽

Explanatory Variables ◽

Data Regression ◽

Binomial Regression

Stata has several procedures that can be used in analyzing count-data regression models and, more specifically, in studying the behavior of the dependent variable, conditional on explanatory variables. Identifying overdispersion in count-data models is one of the most important procedures that allow researchers to correctly choose estimations such as Poisson or negative binomial, given the distribution of the dependent variable. The main purpose of this paper is to present a new command for the identification of overdispersion in the data as an alternative to the procedure presented by Cameron and Trivedi [5], since it directly identifies overdispersion in the data, without the need to previously estimate a specific type of count-data model. When estimating Poisson or negative binomial regression models in which the dependent variable is quantitative, with discrete and non-negative values, the new Stata package overdisp helps researchers to directly propose more consistent and adequate models. As a second contribution, we also present a simulation to show the consistency of the overdispersion test using the overdisp command. Findings show that, if the test indicates equidispersion in the data, there is consistent evidence that the distribution of the dependent variable is, in fact, Poisson. If, on the other hand, the test indicates overdispersion in the data, researchers should investigate more deeply whether the dependent variable actually exhibits better adherence to the Poisson-Gamma distribution or not.
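For context, the model-based procedure that the abstract contrasts itself with can be sketched roughly as below. This is an assumed, simplified rendering of a Cameron and Trivedi style auxiliary regression, not the algorithm of the overdisp command, whose internals are not described here. Fitted Poisson means `mu` are taken as given, the quantity ((y - mu)^2 - y) / mu is regressed on mu without an intercept, and a clearly positive slope suggests overdispersion. The toy data and the intercept-only fitted means are illustrative.

```python
import math


def overdispersion_aux_regression(y, mu):
    """Auxiliary no-intercept OLS regression of z = ((y - mu)^2 - y) / mu on mu.

    y  : observed counts
    mu : fitted means from a previously estimated Poisson model
    Returns (alpha_hat, t_stat); alpha_hat > 0 points toward overdispersion.
    Assumes the residuals are not all zero, otherwise the t statistic is undefined.
    """
    z = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu)]
    sxx = sum(mi * mi for mi in mu)
    alpha_hat = sum(zi * mi for zi, mi in zip(z, mu)) / sxx
    resid = [zi - alpha_hat * mi for zi, mi in zip(z, mu)]
    s2 = sum(r * r for r in resid) / (len(y) - 1)  # one estimated slope
    se = math.sqrt(s2 / sxx)
    return alpha_hat, alpha_hat / se


# Toy counts, invented for illustration; an intercept-only Poisson model
# has fitted mean equal to the sample mean for every observation.
y = [0, 0, 1, 1, 2, 3, 5, 8, 12, 20]
mu = [sum(y) / len(y)] * len(y)

alpha_hat, t_stat = overdispersion_aux_regression(y, mu)
print(f"alpha_hat={alpha_hat:.3f}, t={t_stat:.2f}")
```

Because this check needs fitted means, a Poisson model has to be estimated first, which is the step the abstract says the new command avoids.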


Statistical models for analyzing count data: predictors of length of stay among HIV patients in Portugal using a multilevel model

BMC Health Services Research ◽

10.1186/s12913-021-06389-1 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Ahmed Nabil Shaaban ◽

Bárbara Peleteiro ◽

Maria Rosario O. Martins

Keyword(s):

Length Of Stay ◽

Regression Model ◽

Random Effects ◽

Count Data ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Comprehensive Approach ◽

Negative Binomial Regression Model ◽

Hiv Patients ◽

Binomial Regression

Abstract. Background: This study offers a comprehensive approach to precisely analyze the complexly distributed length of stay among HIV admissions in Portugal. Objective: To provide an illustration of statistical techniques for analysing count data using longitudinal predictors of length of stay among HIV hospitalizations in Portugal. Method: A total of 26,505 registered discharges in Portuguese National Health Service (NHS) facilities between January 2009 and December 2017, classified under the Major Diagnostic Category (MDC) created for patients with HIV infection, with HIV/AIDS as a main or secondary cause of admission, were used to predict length of stay among HIV hospitalizations in Portugal. Several strategies were applied to select the best-fitting count model among the Poisson regression model, the zero-inflated Poisson model, the negative binomial regression model, and the zero-inflated negative binomial regression model. A random hospital effects term was incorporated into the negative binomial model to examine the dependence between observations within the same hospital. A multivariable analysis was performed to assess the effect of covariates on length of stay. Results: The median length of stay in our study was 11 days (interquartile range: 6–22). Statistical comparisons among the count models revealed that the random-effects negative binomial models provided the best fit to the observed data. Admissions among males and admissions associated with TB infection, pneumocystis, cytomegalovirus, candidiasis, toxoplasmosis, or mycobacterium disease exhibited a highly significant increase in length of stay. Perfect trends were observed in which a higher number of diagnoses or procedures led to a significantly longer length of stay. The random-effects term included in our model, which refers to unexplained factors specific to each hospital, revealed obvious differences in quality among the hospitals included in our study.
Conclusions: This study provides a comprehensive approach to address unique problems associated with the prediction of length of stay among HIV patients in Portugal.


Detection of influenza-like illness aberrations by directly monitoring Pearson residuals of fitted negative binomial regression models

BMC Public Health ◽

10.1186/s12889-015-1500-4 ◽

2015 ◽

Vol 15 (1) ◽

Cited By ~ 5

Author(s):

Ta-Chien Chan ◽

Yung-Chu Teng ◽

Jing-Shiang Hwang

Keyword(s):

Regression Models ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Influenza Like Illness ◽

Pearson Residuals ◽

Binomial Regression


A Combined PLS and Negative Binomial Regression Model for Inferring Association Networks from Next-Generation Sequencing Count Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2017.2665495 ◽

2018 ◽

Vol 15 (3) ◽

pp. 760-773 ◽

Cited By ~ 2

Author(s):

Maiju Pesonen ◽

Jaakko Nevalainen ◽

Steven Potter ◽

Somnath Datta ◽

Susmita Datta

Keyword(s):

Next Generation Sequencing ◽

Regression Model ◽

Count Data ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Negative Binomial Regression Model ◽

Next Generation ◽

Binomial Regression ◽

Generation Sequencing


Group regularization for zero-inflated negative binomial regression models with an application to health care demand in Germany

Statistics in Medicine ◽

10.1002/sim.7804 ◽

2018 ◽

Vol 37 (20) ◽

pp. 3012-3026 ◽

Cited By ~ 2

Author(s):

Saptarshi Chatterjee ◽

Shrabanti Chowdhury ◽

Himel Mallick ◽

Prithish Banerjee ◽

Broti Garai

Keyword(s):

Health Care ◽

Regression Models ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Health Care Demand ◽

Binomial Regression


MODELING WITH GEOGRAPHICALLY WEIGHTED NEGATIVE BINOMIAL REGRESSION (Case study: The Number of Leprosy Patients in West Java)

Xplore Journal of Statistics ◽

10.29244/xplore.v10i3.833 ◽

2021 ◽

Vol 10 (3) ◽

pp. 226-236

Author(s):

Khusnul Khotimah ◽

Itasia Dina Sulvianti ◽

Pika Silvianti

Keyword(s):

Regression Model ◽

Count Data ◽

Poisson Regression ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Kernel Weight ◽

Negative Binomial Regression Model ◽

West Java ◽

Binomial Regression ◽

Spatial Heterogeneity

The number of leprosy patients in West Java is an example of count data. The analysis commonly used for count data is Poisson regression. This research determines the variables that influence the number of leprosy patients in West Java, using data from 2019. The data exhibit overdispersion and spatial heterogeneity. To handle overdispersion, the negative binomial regression model can be employed, while spatial heterogeneity is addressed by adding an adaptive bisquare kernel weight. The resulting Geographically Weighted Negative Binomial Regression (GWNBR) model with adaptive bisquare kernel weighting classifies the regencies/cities of West Java into ten groups based on the variables that significantly influence the number of leprosy patients. In general, the percentage of households with Clean and Healthy Behavior (PHBS) has a significant effect in all regencies/cities in West Java. For Bogor Regency, Depok City, Bogor City, and Pangandaran Regency in particular, the percentage of people living in poverty does not have a significant effect on the number of leprosy patients.
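The adaptive bisquare kernel weighting mentioned above can be sketched as follows. The kernel itself is the standard bisquare form, w = (1 - (d/h)^2)^2 for d < h and 0 otherwise. How the bandwidth is chosen is an assumption here: the sketch sets each location's bandwidth to the distance to its k-th nearest neighbour, which is one common adaptive rule, and the abstract does not say which rule or which k the authors used. Coordinates and function names are illustrative.

```python
import math


def bisquare_weight(d: float, h: float) -> float:
    """Bisquare kernel: weight decays smoothly to zero at distance h."""
    if d >= h:
        return 0.0
    return (1.0 - (d / h) ** 2) ** 2


def adaptive_bisquare_weights(coords, i, k):
    """Weights of all locations for the local regression at location i.

    Assumption: the bandwidth for location i is the distance to its k-th
    nearest neighbour, so dense areas get narrow kernels and sparse areas
    get wide ones.
    """
    dists = [math.dist(coords[i], c) for c in coords]
    neighbour_dists = sorted(d for j, d in enumerate(dists) if j != i)
    h = neighbour_dists[k - 1]
    return [bisquare_weight(d, h) for d in dists]


# Illustrative planar coordinates, not real regency/city centroids.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (10.0, 10.0)]
weights = adaptive_bisquare_weights(coords, i=0, k=3)
print([round(w, 3) for w in weights])
```

In a geographically weighted model, these weights enter the local likelihood for location `i`, so nearby regencies influence its coefficient estimates more than distant ones.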


6. Poisson and Negative Binomial Regression Models

Regression Models for Categorical, Count, and Related Variables ◽

10.1525/9780520965492-008 ◽

2019 ◽

pp. 131-158

Keyword(s):

Regression Models ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Binomial Regression


What Do We Know About Changing Economic Activity of Firms?

Studies in Microeconomics ◽

10.1177/2321022218869790 ◽

2019 ◽

pp. 232102221886979

Author(s):

Radhika Pandey ◽

Amey Sapre ◽

Pramod Sinha

Keyword(s):

Economic Activity ◽

Regression Models ◽

Statistical Approach ◽

Negative Binomial ◽

Large Data ◽

Poisson Model ◽

Negative Binomial Regression ◽

Data Set ◽

Binomial Regression ◽

History Of

Identification of primary economic activity of firms is a prerequisite for compiling several macro aggregates. In this paper, we take a statistical approach to understand the extent of changes in primary economic activity of firms over time and across different industries. We use the history of economic activity of over 46,000 firms spread over 25 years from CMIE Prowess to identify the number of times firms change the nature of their business. Using the count of changes, we estimate Poisson and Negative Binomial regression models to gain predictability over changing economic activity across industry groups. We show that a Poisson model accurately characterizes the distribution of count of changes across industries and that firms with a long history are more likely to have changed their primary economic activity over the years. Findings show that classification can be a crucial problem in a large data set like the MCA21 and can even lead to distortions in value addition estimates at the industry level. JEL Classifications: D22, E00, E01
