Poisson regression for linguists: A tutorial introduction to modelling count data with brms

A Poisson model is typically assumed for count data. In many cases, however, the dependent variable contains many zeros, and because of these zeros its mean and variance are no longer equal, as the Poisson model requires. In fact, the variance is often much larger than the mean, a condition called over-dispersion. A Poisson model is therefore unsuitable for data with too many zeros, and a hurdle Poisson regression model is suggested to overcome the over-dispersion problem. Furthermore, the response variable in such cases is censored for some values. In this paper, a censored hurdle Poisson regression model is introduced for count data with many zeros. The model has one response variable and one or more explanatory variables. The estimation of the regression parameters by maximum likelihood is discussed, and the goodness of fit of the regression model is examined. We study the effects of right censoring on the estimated parameters and their standard errors via an example.
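As a rough illustration of the model class this abstract describes, the sketch below writes out a hurdle Poisson probability function and the likelihood contribution of a right-censored observation. It assumes the common parameterization (a point mass at zero plus a zero-truncated Poisson), which may differ from the paper's; the function names and the values of `pi_zero` and `lam` are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability P(Y = k) for rate lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def hurdle_poisson_pmf(k, pi_zero, lam):
    """Hurdle Poisson: point mass pi_zero at zero, and a zero-truncated
    Poisson(lam) for the positive counts."""
    if k == 0:
        return pi_zero
    return (1.0 - pi_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))

def loglik_term(y, censored, pi_zero, lam):
    """Log-likelihood contribution of one observation.
    If censored is True, only Y >= y is known (right censoring at y)."""
    if not censored:
        return math.log(hurdle_poisson_pmf(y, pi_zero, lam))
    below = sum(hurdle_poisson_pmf(k, pi_zero, lam) for k in range(y))
    return math.log(1.0 - below)

# Hypothetical parameter values, for illustration only.
pi_zero, lam = 0.6, 2.5
ks = range(0, 100)  # the tail beyond 100 is negligible for this lam
mean = sum(k * hurdle_poisson_pmf(k, pi_zero, lam) for k in ks)
var = sum((k - mean) ** 2 * hurdle_poisson_pmf(k, pi_zero, lam) for k in ks)
print(mean, var)  # the variance exceeds the mean for these values
```

Summing `loglik_term` over all observations gives the log-likelihood that a maximum likelihood routine would maximize; in a regression setting, `pi_zero` and `lam` would each depend on the explanatory variables through a link function.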

MODELING COUNT DATA USING POISSON REGRESSION IN EVALUATION OF EDUCATIONAL PERFORMANCE: A NEW PARADIGM

Advances and Applications in Statistics ◽ 10.17654/as066010077 ◽ 2021 ◽ Vol 66 (1) ◽ pp. 77-95

Author(s): Nawal G. Alghamdi ◽ Muhammad Aslam ◽ Khushnoor Khan

Keyword(s): Count Data ◽ Poisson Regression ◽ Educational Performance ◽ New Paradigm

COMPARISON OF THE BIVARIATE NEGATIVE BINOMIAL REGRESSION MODEL WITH THE GEOGRAPHICALLY WEIGHTED NEGATIVE BINOMIAL BIVARIATE REGRESSION (GWNBBR) MODEL FOR INFANT AND MATERNAL MORTALITY IN CENTRAL JAVA

Jurnal Gaussian ◽ 10.14710/j.gauss.v10i4.33096 ◽ 2022 ◽ Vol 10 (4) ◽ pp. 488-498

Author(s): Yashmine Noor Islami ◽ Dwi Ispriyanti ◽ Puspita Kartikasari

Keyword(s): Infant Mortality ◽ Maternal Mortality ◽ Count Data ◽ Poisson Regression ◽ Negative Binomial ◽ Poor People ◽ Significant Variable ◽ Central Java ◽ Observation Area ◽ Bivariate Regression

Infant mortality (0-11 months) and maternal mortality (during pregnancy, childbirth, and the postpartum period) are significant indicators of the level of public health. Central Java Province, which has 35 regencies/cities, is among the top five regions with the highest numbers of infant and maternal deaths in Indonesia. The numbers of infant and maternal deaths are count data, so Poisson regression can be used to analyze the factors that influence them. Poisson regression assumes equidispersion, that is, the variance equals the mean. Frequently, however, the variance of count data is greater than the mean, which is known as overdispersion. In this research, bivariate negative binomial regression is used as a solution to overcome the overdispersion problem in Poisson regression. This method produces a global model. In reality, the geographical, socio-cultural, and economic conditions of each region differ. This reflects spatial heterogeneity, so the model needs to be developed into Geographically Weighted Negative Binomial Bivariate Regression (GWNBBR). The GWNBBR model weights observations based on the position of, or distance between, one observation area and another. The significant variables for modeling infant mortality cases were the percentage of obstetric complications treated (X1), the percentage of infants who were exclusively breastfed (X3), and the percentage of poor people (X5). The significant variable for modeling maternal mortality cases was the percentage of poor people (X5). Based on the AIC value, the GWNBBR model is better than the bivariate negative binomial regression model because it has a smaller AIC value.
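Two quantities in this abstract are simple to compute by hand: the variance-to-mean ratio used to check equidispersion, and the AIC used to compare models. The sketch below is only illustrative; the counts, log-likelihoods, and parameter counts are hypothetical and are not taken from the study.

```python
import statistics

def dispersion_index(counts):
    """Sample variance divided by sample mean; near 1 under equidispersion."""
    return statistics.variance(counts) / statistics.mean(counts)

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical counts for a handful of regions, not the study's data.
counts = [2, 0, 15, 3, 1, 22, 4, 0, 9, 30]
print(dispersion_index(counts))  # well above 1 suggests overdispersion

# Hypothetical log-likelihoods and parameter counts for two fitted models.
aic_global = aic(log_likelihood=-120.4, n_params=6)
aic_local = aic(log_likelihood=-110.9, n_params=9)
print(aic_global, aic_local)  # the model with the smaller value is preferred
```

A ratio well above 1 is only a first diagnostic; a formal check would use the residuals of a fitted Poisson regression rather than the raw counts.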

MODELING WITH GEOGRAPHICALLY WEIGHTED NEGATIVE BINOMIAL REGRESSION (Case study: The Number of Leprosy Patients in West Java)

Xplore Journal of Statistics ◽ 10.29244/xplore.v10i3.833 ◽ 2021 ◽ Vol 10 (3) ◽ pp. 226-236

Author(s): Khusnul Khotimah ◽ Itasia Dina Sulvianti ◽ Pika Silvianti

Keyword(s): Regression Model ◽ Count Data ◽ Poisson Regression ◽ Negative Binomial ◽ Negative Binomial Regression ◽ Kernel Weight ◽ Negative Binomial Regression Model ◽ West Java ◽ Binomial Regression ◽ Spatial Heterogeneity

The number of leprosy patients in West Java is an example of count data. The analysis commonly used for count data is Poisson regression. This research determines the variables that influence the number of leprosy patients in West Java, using data from 2019. The data exhibit overdispersion and spatial heterogeneity. To handle overdispersion, the negative binomial regression model can be employed, while spatial heterogeneity is handled by adding adaptive bisquare kernel weights. The resulting Geographically Weighted Negative Binomial Regression (GWNBR) model with adaptive bisquare kernel weighting classifies the regencies/cities in West Java into ten groups based on the variables that significantly influence the number of leprosy patients. In general, the percentage of households with Clean and Healthy Behavior (PHBS) has a significant effect in all regencies/cities in West Java. For Bogor Regency, Depok City, Bogor City, and Pangandaran Regency in particular, the percentage of people living in poverty does not have a significant effect on the number of leprosy patients.
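The adaptive bisquare kernel mentioned here can be sketched as follows. This assumes the form commonly used in geographically weighted regression, where the bandwidth at each location is the distance to its q-th nearest neighbour; the abstract does not state the exact variant used, and the coordinates and the function name are hypothetical.

```python
import math

def adaptive_bisquare_weights(points, i, q):
    """Weights of all locations relative to location i.

    The bandwidth h_i is the distance from i to its q-th nearest neighbour,
    so it adapts to how densely the locations are packed around i.
    Assumes distinct locations and 1 <= q < len(points).
    """
    xi, yi = points[i]
    dists = [math.hypot(x - xi, y - yi) for x, y in points]
    # sorted(dists)[0] is the zero distance from location i to itself
    h = sorted(dists)[q]
    return [(1 - (d / h) ** 2) ** 2 if d < h else 0.0 for d in dists]

# Hypothetical centroid coordinates, not real regencies or cities.
points = [(0, 0), (1, 0), (0, 2), (3, 3), (6, 1)]
weights = adaptive_bisquare_weights(points, i=0, q=3)
print(weights)
```

Each location gets its own weight vector, so a separate weighted model is fitted per regency/city. That is why the significant variables can differ from one area to another.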

Approaches for dealing with various sources of overdispersion in modeling count data: Scale adjustment versus modeling

Statistical Methods in Medical Research ◽ 10.1177/0962280215588569 ◽ 2015 ◽ Vol 26 (4) ◽ pp. 1802-1823 ◽ Cited By ~ 13

Author(s): Elizabeth H Payne ◽ James W Hardin ◽ Leonard E Egede ◽ Viswanathan Ramakrishnan ◽ Anbesaw Selassie ◽ ...

Keyword(s): Error Estimates ◽ Standard Error ◽ Count Data ◽ Poisson Regression ◽ Negative Binomial ◽ Rank Order ◽ Good Practice ◽ Information Criteria ◽ Population Heterogeneity ◽ Count Response

Overdispersion is a common problem in count data. It can occur due to extra population heterogeneity, omission of key predictors, and outliers. Unless properly handled, it can lead to invalid inference. Our goal is to assess the differential performance of methods for dealing with overdispersion from several sources. We considered six different approaches: unadjusted Poisson regression (Poisson), deviance-scale-adjusted Poisson regression (DS-Poisson), Pearson-scale-adjusted Poisson regression (PS-Poisson), negative-binomial regression (NB), and two generalized linear mixed models (GLMM) with random intercept, log-link, and Poisson (Poisson-GLMM) and negative-binomial (NB-GLMM) distributions. To rank order the preference of the models, we used Akaike information criterion/Bayesian information criterion values, standard errors, and 95% confidence-interval coverage of the parameter values. To compare these methods, we used simulated count data with overdispersion of different magnitudes from three different sources. The mean of the count response was associated with three predictors. Data from two real case studies are also analyzed. The simulation results showed that NB and NB-GLMM were preferred for dealing with overdispersion resulting from any of the sources we considered. Poisson and DS-Poisson often produced smaller standard-error estimates than expected, while PS-Poisson conversely produced larger standard-error estimates. Thus, it is good practice to compare several model options to determine the best method of modeling count data.
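The two scale adjustments compared in this abstract can be sketched as below. The sketch assumes the usual quasi-Poisson definitions (Pearson chi-square or deviance divided by the residual degrees of freedom, with standard errors multiplied by the square root of that scale); the abstract does not spell out its formulas, and the counts, fitted means, and function names are hypothetical.

```python
import math

def pearson_scale(y, mu, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

def deviance_scale(y, mu, n_params):
    """Poisson deviance divided by the residual degrees of freedom."""
    dev = 0.0
    for yi, mi in zip(y, mu):
        # y * log(y / mu) is taken as 0 when y == 0
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        dev += 2.0 * (term - (yi - mi))
    return dev / (len(y) - n_params)

def adjust_se(se, scale):
    """Rescale a Poisson standard error by sqrt(scale)."""
    return se * math.sqrt(scale)

# Hypothetical observed counts and fitted means from a 2-parameter model.
y = [0, 1, 7, 2, 12, 0, 5, 9]
mu = [2.0, 2.5, 3.0, 3.5, 5.0, 5.5, 6.0, 8.5]
phi_p = pearson_scale(y, mu, n_params=2)
phi_d = deviance_scale(y, mu, n_params=2)
print(phi_p, phi_d, adjust_se(0.10, phi_p))
```

Both adjustments leave the coefficient estimates unchanged and only rescale their standard errors, whereas the negative binomial and mixed-model approaches change the fitted model itself.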
