Inference about the ratio of means from Negative Binomial paired count data

Abstract: Count data with excess zeros are frequently encountered in practice. For example, the number of health service visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of such data is that the count measure has more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, the fundamental differences between these two types of models have not been thoroughly investigated. In this article, we review the zero-inflated and hurdle models and highlight their differences in terms of their data-generating processes. We also conduct simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.
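The difference in data-generating processes mentioned in the abstract above can be sketched with intercept-only Poisson versions of the two models. This is a minimal illustration, not the authors' code; the function names and the parameters `lam`, `pi` and `p0` are assumptions introduced here.

```python
import math

def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k) with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the outcome is a
    structural zero; otherwise it is drawn from Poisson(lam), which
    can itself produce zeros. Zeros therefore have two sources."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def hurdle_pmf(k, lam, p0):
    """Hurdle Poisson: a zero occurs with probability p0; all positive
    counts come from a zero-truncated Poisson(lam). Zeros have a
    single source."""
    if k == 0:
        return p0
    return (1.0 - p0) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
```

In the zero-inflated model the total probability of a zero is `pi + (1 - pi) * exp(-lam)`, so it can never fall below the Poisson zero probability. In the hurdle model `p0` is free, so it can describe either an excess or a deficit of zeros.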

Analysis of count data in the setting of cervical cancer detection

Journal of Investigative Medicine ◽

10.1136/jim-2020-001381 ◽

2020 ◽

Vol 68 (6) ◽

pp. 1196-1198

Author(s):

Christina G Bracamontes ◽

Thelma Carrillo ◽

Jane Montealegre ◽

Leonid Fradkin ◽

Michele Follen ◽

...

Keyword(s):

Sexual Abuse ◽

Count Data ◽

Pap Smear ◽

Negative Binomial ◽

El Paso ◽

Language Preference ◽

A Value ◽

History Of ◽

Zip Model ◽

Endocervical Canal

Women with an abnormal Pap smear are often referred to colposcopy, a procedure during which endocervical curettage (ECC) may be performed. ECC is a scraping of the endocervical canal lining. Our goal was to compare the performance of a naïve Poisson (NP) regression model with that of a zero-inflated Poisson (ZIP) model when identifying predictors of the number of distress/pain vocalizations made by women undergoing ECC. Data on women seen in the colposcopy clinic at a medical school in El Paso, Texas, were analyzed. The outcome was the number of pain vocalizations made by the patient during ECC. Six dichotomous predictors were evaluated. Initially, NP regression was used to model the data. Because a high proportion of patients did not make any vocalizations, a ZIP model was also fit, and relative rates (RRs) and 95% CIs were calculated. The Akaike information criterion (AIC) was used to identify the better model (NP or ZIP). Of the 210 women, 154 (73.3%) had a value of 0 for the number of ECC vocalizations. NP identified three statistically significant predictors (language preference of the subject, sexual abuse history and length of the colposcopy), while ZIP identified one: history of sexual abuse (yes vs no; adjusted RR=2.70, 95% CI 1.47 to 4.97). ZIP performed better than NP regression and was the preferred model. Clinicians and epidemiologists should consider using the ZIP model (or the zero-inflated negative binomial model) for zero-inflated count data.
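An AIC comparison of a Poisson and a ZIP model, as described in the abstract above, can be sketched for the intercept-only case. This is a hedged illustration rather than the study's analysis: it has no covariates, fits the ZIP model with a simple EM algorithm (an assumption made here, since the abstract does not state the fitting method), and uses helper names invented for this sketch.

```python
import math

def poisson_loglik(counts, lam):
    """Poisson log-likelihood of the counts at mean lam."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)

def fit_poisson(counts):
    """Intercept-only Poisson: the MLE of the mean is the sample mean."""
    lam = sum(counts) / len(counts)
    return lam, poisson_loglik(counts, lam)

def fit_zip(counts, n_iter=500):
    """Intercept-only zero-inflated Poisson fitted by EM.
    Returns (pi, lam, log-likelihood)."""
    n = len(counts)
    n0 = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi, lam = 0.5 * n0 / n, total / n          # crude starting values
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update mixing weight and Poisson mean
        pi = n0 * z / n
        lam = total / (n - n0 * z)
    ll = n0 * math.log(pi + (1.0 - pi) * math.exp(-lam))
    ll += sum(math.log(1.0 - pi) + y * math.log(lam) - lam - math.lgamma(y + 1)
              for y in counts if y > 0)
    return pi, lam, ll

def aic(loglik, n_params):
    """Akaike information criterion; smaller is better."""
    return 2 * n_params - 2 * loglik
```

A usage example on made-up counts (not data from the study):

```python
counts = [0] * 15 + [1, 2, 2, 3, 3, 4, 5]    # hypothetical toy data
_, ll_p = fit_poisson(counts)
_, _, ll_z = fit_zip(counts)
print(aic(ll_p, 1), aic(ll_z, 2))             # the lower AIC is preferred
```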

Statistical models for analyzing count data: predictors of length of stay among HIV patients in Portugal using a multilevel model

BMC Health Services Research ◽

10.1186/s12913-021-06389-1 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Ahmed Nabil Shaaban ◽

Bárbara Peleteiro ◽

Maria Rosario O. Martins

Keyword(s):

Length Of Stay ◽

Regression Model ◽

Random Effects ◽

Count Data ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Comprehensive Approach ◽

Negative Binomial Regression Model ◽

Hiv Patients ◽

Binomial Regression

Abstract. Background: This study offers a comprehensive approach to analyzing the complex distribution of length of stay among HIV admissions in Portugal. Objective: To illustrate statistical techniques for analysing count data, using longitudinal predictors of length of stay among HIV hospitalizations in Portugal. Method: Registered discharges from Portuguese National Health Service (NHS) facilities between January 2009 and December 2017, a total of 26,505 classified under the Major Diagnostic Category (MDC) created for patients with HIV infection, with HIV/AIDS as a main or secondary cause of admission, were used to predict length of stay among HIV hospitalizations in Portugal. Several strategies were applied to select the best-fitting count model among the Poisson regression model, the zero-inflated Poisson model, the negative binomial regression model, and the zero-inflated negative binomial regression model. A random hospital effects term was incorporated into the negative binomial model to examine the dependence between observations within the same hospital. A multivariable analysis was performed to assess the effect of covariates on length of stay. Results: The median length of stay in our study was 11 days (interquartile range: 6–22). Statistical comparisons among the count models revealed that the random-effects negative binomial models provided the best fit to the observed data. Admissions among males and admissions associated with TB infection, pneumocystis, cytomegalovirus, candidiasis, toxoplasmosis, or mycobacterium disease showed a highly significant increase in length of stay. Clear trends were observed in which a higher number of diagnoses or procedures led to a significantly longer length of stay. The random-effects term included in our model, which refers to unexplained factors specific to each hospital, revealed obvious differences in quality among the hospitals included in our study.
Conclusions: This study provides a comprehensive approach to addressing the unique problems associated with predicting length of stay among HIV patients in Portugal.

Transition models for count data: a flexible alternative to fixed distribution models

Statistical Methods & Applications ◽

10.1007/s10260-021-00558-6 ◽

2021 ◽

Author(s):

Moritz Berger ◽

Gerhard Tutz

Keyword(s):

Count Data ◽

Regression Models ◽

Negative Binomial ◽

Real Data ◽

Distribution Models ◽

Explanatory Variables ◽

Excess Zeros ◽

Proposed Model ◽

Transition Models ◽

Fixed Distribution

Abstract: A flexible semiparametric class of models is introduced that offers an alternative to classical regression models for count data, such as the Poisson and negative binomial models, as well as to more general models accounting for excess zeros that are also based on fixed distributional assumptions. The model lets the data themselves determine the distribution of the response variable but, in its basic form, uses a parametric term that specifies the effect of explanatory variables. In addition, an extended version is considered in which the effects of covariates are specified nonparametrically. The proposed model and traditional models are compared in simulations and in several real data applications from the areas of health and social science.

Beta-binomial models for meta-analysis with binary outcomes: Variations, extensions, and additional insights from econometrics

Research Methods in Medicine & Health Sciences ◽

10.1177/2632084321996225 ◽

2021 ◽

pp. 263208432199622

Author(s):

Tim Mathes ◽

Oliver Kuss

Keyword(s):

Simulation Study ◽

Count Data ◽

Negative Binomial ◽

Meta Analysis ◽

Negative Binomial Regression ◽

Binary Outcomes ◽

Small Scale ◽

Panel Count Data ◽

Count Data Models ◽

Meta Analyses

Background: Meta-analysis of systematically reviewed studies on interventions is the cornerstone of evidence-based medicine. Here we introduce the common-beta beta-binomial (BB) model for meta-analysis with binary outcomes and elucidate its equivalence to panel count data models. Methods: We present a variation of the standard “common-rho” BB model (BBST) for meta-analysis, namely a “common-beta” BB model. This model has an interesting connection to fixed-effect negative binomial regression models (FE-NegBin) for panel count data. Using this equivalence, it is possible to estimate an extension of the FE-NegBin with an additional multiplicative overdispersion term (RE-NegBin) while preserving a closed-form likelihood. An advantage of the connection to econometric models is that the models can be easily implemented, because “standard” statistical software for panel count data can be used. We illustrate the methods with two real-world example datasets. Furthermore, we show the results of a small-scale simulation study that compares the new models to the BBST. The input parameters of the simulation were informed by actually performed meta-analyses. Results: In both example datasets, the NegBin models, in particular the RE-NegBin, showed a smaller effect and had narrower 95% confidence intervals. In our simulation study, median bias was negligible for all methods, but the upper quartile for median bias suggested that BBST is most affected by positive bias. Regarding coverage probability, BBST and the RE-NegBin model outperformed the FE-NegBin model. Conclusion: For meta-analyses with binary outcomes, the considered common-beta BB models may be valuable extensions to the family of BB models.

Analysis of one-way layout of count data with negative binomial variation

Biometrika ◽

10.1093/biomet/75.2.215 ◽

1988 ◽

Vol 75 (2) ◽

pp. 215-222 ◽

Cited By ~ 29

Author(s):

R. K. BARNWAL ◽

S. R. PAUL

Keyword(s):

Count Data ◽

Negative Binomial

An Application Comparison of Two Negative Binomial Models on Rainfall Count Data

Journal of Physics Conference Series ◽

10.1088/1742-6596/1818/1/012100 ◽

2021 ◽

Vol 1818 (1) ◽

pp. 012100

Author(s):

L. H. Hashim ◽

N. K. Dreeb ◽

K. H. Hashim ◽

Mushtak A. K. Shiker

Keyword(s):

Count Data ◽

Negative Binomial ◽

Negative Binomial Models ◽

Binomial Models

Managing Inflation

Crime & Delinquency ◽

10.1177/0011128716679796 ◽

2016 ◽

Vol 63 (1) ◽

pp. 77-87 ◽

Cited By ~ 5

Author(s):

William H. Fisher ◽

Stephanie W. Hartwell ◽

Xiaogang Deng

Keyword(s):

Count Data ◽

Regression Models ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Excess Zeros ◽

Binomial Regression ◽

Negative Binomial Models ◽

Using Data ◽

Over Dispersion ◽

Binomial Models

Poisson and negative binomial regression procedures have proliferated, and now are available in virtually all statistical packages. Along with the regression procedures themselves are procedures for addressing issues related to the over-dispersion and excessive zeros commonly observed in count data. These approaches, zero-inflated Poisson and zero-inflated negative binomial models, use logit or probit models for the “excess” zeros and count regression models for the counted data. Although these models are often appropriate on statistical grounds, their interpretation may prove substantively difficult. This article explores this dilemma, using data from a study of individuals released from facilities maintained by the Massachusetts Department of Correction.

BayCount: A Bayesian Decomposition Method for Inferring Tumor Heterogeneity using RNA-Seq Counts

10.1101/218511 ◽

2017 ◽

Cited By ~ 1

Author(s):

Fangzheng Xie ◽

Mingyuan Zhou ◽

Yanxun Xu

Keyword(s):

Rna Sequencing ◽

Count Data ◽

Decomposition Method ◽

Tumor Heterogeneity ◽

Negative Binomial ◽

Expression Profiles ◽

The Cancer Genome Atlas ◽

Cancer Prognosis ◽

Real World Data ◽

Tumor Sample

Abstract: Tumors are heterogeneous: a tumor sample usually consists of a set of subclones with distinct transcriptional profiles and potentially different degrees of aggressiveness and responses to drugs. Understanding tumor heterogeneity is therefore critical for precise cancer prognosis and treatment. In this paper, we introduce BayCount, a Bayesian decomposition method to infer tumor heterogeneity with highly over-dispersed RNA sequencing count data. Using negative binomial factor analysis, BayCount takes into account both the between-sample and gene-specific random effects on raw counts of sequencing reads mapped to each gene. For the posterior inference, we develop an efficient compound Poisson based blocked Gibbs sampler. Simulation studies show that BayCount is able to accurately estimate the subclonal structure, including the number of subclones, the proportions of these subclones in each tumor sample, and the gene expression profiles in each subclone. For real-world data examples, we apply BayCount to The Cancer Genome Atlas lung cancer and kidney cancer RNA sequencing count data and obtain biologically interpretable results. Our method represents the first effort in characterizing tumor heterogeneity using RNA sequencing count data that simultaneously removes the need to normalize the counts, achieves statistical robustness, and obtains biologically/clinically meaningful insights. The R package BayCount implementing our model and algorithm is available for download.

Comparison of the Bivariate Negative Binomial Regression Model with the Geographically Weighted Negative Binomial Bivariate Regression (GWNBBR) Model for Infant Mortality and Maternal Mortality in Central Java

Jurnal Gaussian ◽

10.14710/j.gauss.v10i4.33096 ◽

2022 ◽

Vol 10 (4) ◽

pp. 488-498

Author(s):

Yashmine Noor Islami ◽

Dwi Ispriyanti ◽

Puspita Kartikasari

Keyword(s):

Infant Mortality ◽

Maternal Mortality ◽

Count Data ◽

Poisson Regression ◽

Negative Binomial ◽

Poor People ◽

Significant Variable ◽

Central Java ◽

Observation Area ◽

Bivariate Regression

Infant mortality (0-11 months) and maternal mortality (during pregnancy, childbirth, and postpartum) are significant indicators of the level of public health. Central Java Province, which has 35 regencies/cities, is among the five regions with the highest numbers of infant and maternal deaths in Indonesia. The numbers of infant and maternal deaths are count data, so Poisson regression can be used to analyze the factors that influence them. Poisson regression requires the assumption of equidispersion, that is, a variance equal to the mean. Frequently, however, the variance of count data is greater than the mean, which is known as overdispersion. In this research, bivariate negative binomial regression is used to overcome the problem of overdispersion in Poisson regression. This method produces a global model. In reality, the geographical, socio-cultural, and economic conditions of each region differ. This reflects spatial heterogeneity, so the model needs to be extended to Geographically Weighted Negative Binomial Bivariate Regression (GWNBBR). The GWNBBR model assigns weights based on the position of, or distance between, observation areas. Significant variables for modeling infant mortality were the percentage of obstetric complications treated (X1), the percentage of infants who were exclusively breastfed (X3), and the percentage of poor people (X5). The significant variable for modeling maternal mortality was the percentage of poor people (X5). Based on the AIC value, the GWNBBR model is better than the bivariate negative binomial regression model because it has a smaller AIC value.
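The equidispersion assumption described in the abstract above can be checked informally by comparing the sample variance with the sample mean. The sketch below is an illustration only: the function names are invented here, and the moment estimate assumes the common NB2 variance form Var(Y) = mu + alpha * mu^2, which the abstract does not specify.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean. Values near 1 are
    consistent with Poisson equidispersion; values well above 1
    suggest overdispersion."""
    return variance(counts) / mean(counts)

def nb2_alpha_moments(counts):
    """Method-of-moments estimate of the dispersion parameter alpha
    under the assumed NB2 variance Var(Y) = mu + alpha * mu**2.
    Truncated at 0 when the data are not overdispersed."""
    m = mean(counts)
    v = variance(counts)
    return max(0.0, (v - m) / (m * m))
```

For the made-up counts `[0, 0, 1, 1, 2, 8]` the mean is 2 and the sample variance is 9.2, so the dispersion index is 4.6 and the moment estimate of alpha is 1.8, which would point toward a negative binomial rather than a Poisson model.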
