Using Poisson Class Regression To Analyze Count Data in Correctional and Forensic Psychology

2007 ◽  
Vol 34 (12) ◽  
pp. 1659-1674 ◽  
Author(s):  
Glenn D. Walters

The benchmark model for count data is the Poisson distribution, and the standard statistical procedure for analyzing count data is Poisson regression. However, highly restrictive assumptions lead to frequent misspecification of the Poisson model. Alternative approaches, such as negative binomial regression, zero-modified procedures, and truncated and censored models, are consequently required to handle count data in many social science contexts. Empirical examples from correctional and forensic psychology are provided to illustrate the importance of replacing ordinary least squares regression with Poisson class procedures when count data are analyzed.
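One of the restrictive assumptions of the Poisson model is equidispersion: the variance equals the mean. A minimal sketch of a quick check follows, using hypothetical counts that are not taken from the article. It computes the sample variance-to-mean ratio; values well above 1 suggest overdispersion and point toward alternatives such as negative binomial regression.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance-to-mean ratio of a sequence of counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical counts (illustration only): many zeros plus a few large values.
counts = [0, 0, 0, 0, 1, 1, 2, 3, 7, 12]
ratio = dispersion_index(counts)
print(f"variance-to-mean ratio: {ratio:.2f}")
```

A ratio near 1 is consistent with a Poisson model; this is only a descriptive screen, not a formal test.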

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Ahmed Nabil Shaaban ◽  
Bárbara Peleteiro ◽  
Maria Rosario O. Martins

Abstract Background: This study offers a comprehensive approach to precisely analyze the complexly distributed length of stay among HIV admissions in Portugal. Objective: To provide an illustration of statistical techniques for analysing count data using longitudinal predictors of length of stay among HIV hospitalizations in Portugal. Method: Discharges registered in Portuguese National Health Service (NHS) facilities between January 2009 and December 2017, a total of 26,505 classified under the Major Diagnostic Category (MDC) created for patients with HIV infection, with HIV/AIDS as a main or secondary cause of admission, were used to predict length of stay among HIV hospitalizations in Portugal. Several strategies were applied to select the best-fitting count model among the Poisson regression model, the zero-inflated Poisson model, the negative binomial regression model, and the zero-inflated negative binomial regression model. A random hospital effects term was incorporated into the negative binomial model to examine the dependence between observations within the same hospital. A multivariable analysis was performed to assess the effect of covariates on length of stay. Results: The median length of stay in our study was 11 days (interquartile range: 6–22). Statistical comparisons among the count models revealed that the random-effects negative binomial models provided the best fit to the observed data. Admissions among males or admissions associated with TB infection, pneumocystis, cytomegalovirus, candidiasis, toxoplasmosis, or mycobacterium disease exhibited a highly significant increase in length of stay. Perfect trends were observed in which a higher number of diagnoses or procedures led to a significantly longer length of stay. The random-effects term included in our model, which refers to unexplained factors specific to each hospital, revealed obvious differences in quality among the hospitals included in our study.
Conclusions: This study provides a comprehensive approach to address unique problems associated with the prediction of length of stay among HIV patients in Portugal.


2016 ◽  
Vol 63 (1) ◽  
pp. 77-87 ◽  
Author(s):  
William H. Fisher ◽  
Stephanie W. Hartwell ◽  
Xiaogang Deng

Poisson and negative binomial regression procedures have proliferated, and now are available in virtually all statistical packages. Along with the regression procedures themselves are procedures for addressing issues related to the over-dispersion and excessive zeros commonly observed in count data. These approaches, zero-inflated Poisson and zero-inflated negative binomial models, use logit or probit models for the “excess” zeros and count regression models for the counted data. Although these models are often appropriate on statistical grounds, their interpretation may prove substantively difficult. This article explores this dilemma, using data from a study of individuals released from facilities maintained by the Massachusetts Department of Correction.
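The two-part structure described above can be sketched as a probability mass function. This is a minimal illustration, not the article's analysis: the function names and the numeric values are hypothetical, and a logit link is assumed for the "excess" zero part.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing count k with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def logistic(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: a structural zero with probability pi_zero,
    otherwise a draw from Poisson(lam), which can itself be zero."""
    poisson_part = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return (pi_zero + poisson_part) if k == 0 else poisson_part


# Hypothetical values: logit-scale predictor -1.0 for the zero part,
# mean 2.5 for the count part.
pi_zero = logistic(-1.0)
lam = 2.5
probs = [zip_pmf(k, lam, pi_zero) for k in range(60)]
zip_mean = sum(k * p for k, p in enumerate(probs))
```

Because a zero can arise from either component, a covariate may enter both the zero part and the count part with separate coefficients, which is one reason such models can be hard to interpret substantively.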


2021 ◽  
Vol 10 (3) ◽  
pp. 226-236
Author(s):  
Khusnul Khotimah ◽  
Itasia Dina Sulvianti ◽  
Pika Silvianti

The number of leprosy cases in West Java is an example of count data. The analysis commonly used for count data is Poisson regression. This research determines the variables that influence the number of leprosy cases in West Java, using data from 2019. The data exhibit overdispersion and spatial heterogeneity. To handle overdispersion, the negative binomial regression model can be employed, while spatial heterogeneity is handled by adding adaptive bisquare kernel weights. The resulting Geographically Weighted Negative Binomial Regression (GWNBR) model with adaptive bisquare kernel weighting classifies the regencies/cities in West Java into ten groups based on the variables that significantly influence the number of leprosy cases. In general, the percentage of households with Clean and Healthy Behavior (PHBS) has a significant effect in all regencies/cities in West Java. For Bogor Regency, Depok City, Bogor City, and Pangandaran Regency specifically, the percentage of people living in poverty does not have a significant effect on the number of leprosy cases.
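As an illustration of adaptive bisquare kernel weighting, the sketch below uses one common definition: the bandwidth for each location is the distance to its k-th nearest neighbour, and weights fall to zero at or beyond that distance. The coordinates, the choice of k, and the function name are hypothetical; the paper's exact bandwidth selection is not described in the abstract.

```python
import math


def adaptive_bisquare_weights(points, i, k):
    """Bisquare weights of all points relative to points[i].

    The bandwidth adapts to the local density: it is the distance from
    points[i] to its k-th nearest neighbour (the point itself excluded).
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    dists = [math.dist(points[i], p) for p in points]
    # sorted(dists)[0] is the point itself at distance 0, so index k
    # is the k-th nearest neighbour.
    bandwidth = sorted(dists)[k]
    if bandwidth == 0:
        raise ValueError("bandwidth is zero (duplicate coordinates)")
    return [(1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
            for d in dists]


# Hypothetical planar coordinates of four locations (illustration only).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
weights = adaptive_bisquare_weights(points, i=0, k=2)
```

In a geographically weighted model, weights like these would enter the local likelihood for each location, so that nearby observations influence the local coefficients more than distant ones.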


2019 ◽  
pp. 232102221886979
Author(s):  
Radhika Pandey ◽  
Amey Sapre ◽  
Pramod Sinha

Identification of primary economic activity of firms is a prerequisite for compiling several macro aggregates. In this paper, we take a statistical approach to understand the extent of changes in primary economic activity of firms over time and across different industries. We use the history of economic activity of over 46,000 firms spread over 25 years from CMIE Prowess to identify the number of times firms change the nature of their business. Using the count of changes, we estimate Poisson and Negative Binomial regression models to gain predictability over changing economic activity across industry groups. We show that a Poisson model accurately characterizes the distribution of count of changes across industries and that firms with a long history are more likely to have changed their primary economic activity over the years. Findings show that classification can be a crucial problem in a large data set like the MCA21 and can even lead to distortions in value addition estimates at the industry level. JEL Classifications: D22, E00, E01


2020 ◽  
Vol 2 (3) ◽  
Author(s):  
Yuqing Zhang ◽  
Giovanni Parmigiani ◽  
W Evan Johnson

Abstract The benefit of integrating batches of genomic data to increase statistical power is often hindered by batch effects, or unwanted variation in data caused by differences in technical factors across batches. It is therefore critical to effectively address batch effects in genomic data to overcome these challenges. Many existing methods for batch effects adjustment assume the data follow a continuous, bell-shaped Gaussian distribution. However, in RNA-seq studies the data are typically skewed, over-dispersed counts, so this assumption is not appropriate and may lead to erroneous results. Negative binomial regression models have been used previously to better capture the properties of counts. We developed a batch correction method, ComBat-seq, using a negative binomial regression model that retains the integer nature of count data in RNA-seq studies, making the batch-adjusted data compatible with common differential expression software packages that require integer counts. We show in realistic simulations that the ComBat-seq adjusted data result in better statistical power and control of false positives in differential expression compared to data adjusted by the other available methods. We further demonstrated in a real data example that ComBat-seq successfully removes batch effects and recovers the biological signal in the data.
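To make the over-dispersion point concrete, the sketch below evaluates a negative binomial distribution in one common mean/dispersion parameterization, where the variance is mu + phi * mu**2 and therefore exceeds the mean whenever phi > 0. This parameterization and the numeric values are assumptions for illustration; it is not the ComBat-seq implementation.

```python
import math


def nb_log_pmf(k, mu, phi):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion phi, so that variance = mu + phi * mu**2."""
    r = 1.0 / phi  # "size" parameter of the distribution
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu))
            + k * math.log(mu / (r + mu)))


# Hypothetical mean and dispersion (illustration only).
mu, phi = 3.0, 0.5
probs = [math.exp(nb_log_pmf(k, mu, phi)) for k in range(400)]

# Numerically recover the moments from the probabilities.
nb_mean = sum(k * p for k, p in enumerate(probs))
nb_var = sum((k - nb_mean) ** 2 * p for k, p in enumerate(probs))
print(f"mean={nb_mean:.3f}, variance={nb_var:.3f}")
```

As phi approaches 0 the variance approaches the mean and the distribution approaches the Poisson.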


Author(s):  
Luay Habeeb Hashim ◽  
Ahmad Naeem Flaih

Count data models deal with count response variables. Count data record the number of times that a certain event occurs in a fixed interval, and the observations are non-negative integers {0,1,2,…}. Because of the nature of count data, the response variable usually cannot be assumed to follow a normal distribution. Linear regression is therefore not an appropriate method for analyzing count data, because the distribution is skewed, and using it is likely to bias the results. Given these limitations, the Poisson regression model and “negative binomial regression” are likely to be the appropriate models for analyzing count data. Sometimes researchers observe more zeros than expected. Count data with many zeros lead to the concept of “zero-inflation”. Data with abundant zeros are especially common in health, marketing, finance, econometrics, ecology, statistical quality control, geography, and environmental fields when counting the occurrence of certain behavioral and natural events, such as the frequency of alcohol use, drug taking, the number of cigarettes smoked, earthquakes, and rainfall. Some models have been used to analyze such count data, such as the “zero-altered Poisson” (ZAP) model and the “negative binomial” model. In this paper, the Poisson, negative binomial, ZAP, and zero-altered negative binomial (ZANB) models were used to analyze rainfall data.
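For reference, a zero-altered (hurdle) Poisson model is commonly written as the two-part distribution below. The symbols are notation assumed here rather than taken from the paper: $\pi_0$ is the probability of a zero and $\lambda$ is the rate of the zero-truncated Poisson part.

```latex
P(Y = y) =
\begin{cases}
  \pi_0, & y = 0, \\[4pt]
  (1 - \pi_0)\,\dfrac{e^{-\lambda}\lambda^{y}}{y!\,\bigl(1 - e^{-\lambda}\bigr)}, & y = 1, 2, \dots
\end{cases}
```

In this formulation every zero comes from the first part, whereas a zero-inflated model allows zeros from both the inflation part and the count part.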


Author(s):  
Takuya Hasebe

In this article, I describe the escount command, which implements the estimation of an endogenous switching model with count-data outcomes, where a potential outcome differs across two alternate treatment statuses. escount allows for either a Poisson or a negative binomial regression model with lognormal latent heterogeneity. After estimating the parameters of the switching regression model, one can estimate various treatment effects with the command teescount. I also describe the command lncount, which fits the Poisson or negative binomial regression model with lognormal latent heterogeneity.

