Negative binomial graphical model with excess zeros

Author(s):  
Beomjin Park ◽  
Hosik Choi ◽  
Changyi Park
Author(s):  
Moritz Berger ◽  
Gerhard Tutz

Abstract. A flexible semiparametric class of models is introduced that offers an alternative to classical regression models for count data, such as the Poisson and negative binomial models, as well as to more general models that account for excess zeros but are also based on fixed distributional assumptions. The model lets the data themselves determine the distribution of the response variable but, in its basic form, uses a parametric term to specify the effect of explanatory variables. In addition, an extended version is considered in which the effects of covariates are specified nonparametrically. The proposed model and traditional models are compared in simulations and in several real data applications from health and social science.


2016 ◽  
Vol 5 (4) ◽  
pp. 133
Author(s):  
NI PUTU PREMA DEWANTI ◽  
MADE SUSILAWATI ◽  
I GUSTI AYU MADE SRINADI

Poisson regression is a nonlinear regression that is often used for count data and assumes equidispersion (variance equal to the mean). In practice, however, the equidispersion assumption is often violated. One such violation is overdispersion (variance greater than the mean). One cause of overdispersion is an excessive number of zero values in the response variable (excess zeros). There are many methods for handling overdispersion caused by excess zeros. Two of them are Zero Inflated Poisson (ZIP) regression and Zero Inflated Negative Binomial (ZINB) regression. The purpose of this research is to determine which regression model is better at handling overdispersed data. The data analyzed using ZIP and ZINB regression are maternal mortality data from the Province of Bali, in which more than 50% of the values of the response variable are zeros. In this research, ZINB regression performs better than ZIP regression for modeling maternal mortality. The independent variable that affects maternal mortality in the Province of Bali is the percentage of mothers who make pregnancy visits, with ZINB regression models and . 
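The link between excess zeros and overdispersion described above can be checked numerically. The following is a minimal sketch, not taken from the article: it assumes the standard ZIP parameterization with a Poisson rate `lam` and an extra-zero probability `pi` (both names are illustrative), and the parameter values are hypothetical.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson: a point mass at zero mixed with Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson

def zip_mean_var(lam, pi):
    """Closed-form mean and variance of the ZIP distribution."""
    mean = (1.0 - pi) * lam
    var = mean * (1.0 + pi * lam)  # exceeds the mean whenever pi > 0 and lam > 0
    return mean, var

# Hypothetical values: half of the observations are structural zeros.
mean, var = zip_mean_var(lam=2.0, pi=0.5)
print(mean, var)  # → 1.0 2.0
```

With `pi = 0` the variance collapses to the mean, which is the Poisson equidispersion case; any positive `pi` pushes the variance above the mean.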


2016 ◽  
Vol 63 (1) ◽  
pp. 77-87 ◽  
Author(s):  
William H. Fisher ◽  
Stephanie W. Hartwell ◽  
Xiaogang Deng

Poisson and negative binomial regression procedures have proliferated, and now are available in virtually all statistical packages. Along with the regression procedures themselves are procedures for addressing issues related to the over-dispersion and excessive zeros commonly observed in count data. These approaches, zero-inflated Poisson and zero-inflated negative binomial models, use logit or probit models for the “excess” zeros and count regression models for the counted data. Although these models are often appropriate on statistical grounds, their interpretation may prove substantively difficult. This article explores this dilemma, using data from a study of individuals released from facilities maintained by the Massachusetts Department of Correction.
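The two-part structure mentioned above (a logit model for the excess zeros and a count regression for the counts) can be sketched as a log-likelihood. This is an illustrative sketch under assumptions, not the article's procedure: the function names, the coefficient vectors `beta` and `gamma`, the dispersion parameter `size`, and the toy data are all hypothetical, and the same covariates are used in both parts only for brevity.

```python
import math

def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu and variance mu + mu**2 / size."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)

def zinb_loglik(y, X, beta, gamma, size):
    """Log-likelihood of a ZINB regression: log link for the mean, logit link for excess zeros."""
    total = 0.0
    for yi, xi in zip(y, X):
        mu = math.exp(sum(b * v for b, v in zip(beta, xi)))    # count part
        eta = sum(g * v for g, v in zip(gamma, xi))
        pi = 1.0 / (1.0 + math.exp(-eta))                      # excess-zero part
        p = (1.0 - pi) * nb_pmf(yi, mu, size)
        if yi == 0:
            p += pi  # a zero can come from either part
        total += math.log(p)
    return total

# Toy data (hypothetical): an intercept column plus one covariate.
y = [0, 0, 3, 1, 0]
X = [[1.0, 0.2], [1.0, 1.5], [1.0, 0.1], [1.0, 0.7], [1.0, 2.0]]
print(zinb_loglik(y, X, beta=[0.5, -0.3], gamma=[-1.0, 0.8], size=2.0))
```

In practice this function would be passed to a numerical optimizer. The interpretive difficulty the abstract raises is visible in the code: a covariate can enter through `beta`, through `gamma`, or through both, and each has a different meaning.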


2011 ◽  
Vol 54 (6) ◽  
pp. 661-675
Author(s):  
N. Mielenz ◽  
K. Thamm ◽  
M. Bulang ◽  
J. Spilke

Abstract. In this paper count data with excess zeros and repeated observations per subject are evaluated. If the number of values observed for the zero event in the trial substantially exceeds the expected number (derived from the Poisson or from the negative binomial distribution), then there is an excess of zeros. Hurdle and zero-inflated models with random effects are available in order to evaluate this type of data. In this paper both model approaches are presented and are used for the evaluation of the number of visits to the feeder per cow per hour. Finally, for the analysis of the target trait a hurdle model with random effects based on a negative binomial distribution was used. This analysis was derived from a detailed comparison of models and was needed because of a simpler computer implementation. For improved interpretation of the results, the levels of the explanatory factors (for example, the classes of lactation) were not averaged in the link scale, but rather in the response scale. The deciding explanatory variables for the pattern of visiting activities in the 24-hour cycle are the milking and cleaning times at hours 4, 7, 12 and 20. The highly significant differences in the visiting frequencies of cows of the first lactation and those of higher lactations were explained by competition for access to the feeder and thus to the feed.
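The difference between the two model approaches named above can be made concrete with their probability functions. The sketch below is an assumption-laden illustration, not the authors' implementation: it omits the random effects entirely, and the names `p_zero`, `pi`, `mu` and `size` and their values are hypothetical.

```python
import math

def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu and variance mu + mu**2 / size."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)

def hurdle_nb_pmf(k, p_zero, mu, size):
    """Hurdle model: all zeros come from one binary part; positives follow a zero-truncated NB."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * nb_pmf(k, mu, size) / (1.0 - nb_pmf(0, mu, size))

def zinb_pmf(k, pi, mu, size):
    """Zero-inflated model: zeros come from both the inflation part and the NB part."""
    return pi * (k == 0) + (1.0 - pi) * nb_pmf(k, mu, size)

# Hypothetical parameter values for comparison.
print(hurdle_nb_pmf(0, p_zero=0.6, mu=3.0, size=2.0))  # equals p_zero by construction
print(zinb_pmf(0, pi=0.6, mu=3.0, size=2.0))           # larger than pi: NB zeros are added
```

Because the hurdle likelihood separates into a binary part and a truncated count part, the two parts can be fitted independently, which is one common reason a hurdle model is simpler to implement.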


Parasitology ◽  
2009 ◽  
Vol 136 (13) ◽  
pp. 1695-1705 ◽  
Author(s):  
P. VOUNATSOU ◽  
G. RASO ◽  
M. TANNER ◽  
E. K. N'GORAN ◽  
J. UTZINGER

Summary. Progress has been made in mapping and predicting the risk of schistosomiasis using Bayesian geostatistical inference. Applications primarily focused on risk profiling of prevalence rather than infection intensity, although the latter is particularly important for morbidity control. In this review, the underlying assumptions used in a study mapping Schistosoma mansoni infection intensity in East Africa are examined. We argue that the assumption of stationarity needs to be relaxed, and that the negative binomial assumption might result in misleading inference because of a high number of excess zeros (individuals without an infection). We developed a Bayesian geostatistical zero-inflated (ZI) regression model that assumes a non-stationary spatial process. Our model is validated with a high-quality georeferenced database from western Côte d'Ivoire, consisting of demographic, environmental, parasitological and socio-economic data. Nearly 40% of the 3818 participating schoolchildren were infected with S. mansoni, and the mean egg count among infected children was 162 eggs per gram of stool (EPG), ranging between 24 and 6768 EPG. Compared to a negative binomial and ZI Poisson and negative binomial models, the Bayesian non-stationary ZI negative binomial model showed a better fit to the data. We conclude that geostatistical ZI models produce more accurate maps of helminth infection intensity than the spatial negative binomial ones.


2017 ◽  
Vol 23 (2) ◽  
Author(s):  
Samuel Iddi ◽  
Esther O. Nwoko

Abstract. Count outcomes are often modelled using Poisson regression. However, this model imposes a strict mean-variance relationship that is unappealing in many contexts. Several studies in the life sciences result in count outcomes with excessive numbers of zeros. The presence of the excess zeros introduces extra dispersion in the data which cannot be accounted for by the traditional Poisson regression. The zero-inflated Poisson (ZIP) and zero-inflated negative binomial models are popular alternatives. The zero-inflated models comprise two key components: a logistic part which models the zeros, and a Poisson component to handle the positive counts. Both components allow the inclusion of covariates. Civettini and Hines [


2013 ◽  
Vol 838-841 ◽  
pp. 2081-2087
Author(s):  
Jian Xu ◽  
Ke Si You

Developing statistical methods to identify significant factors associated with roadways is one of the most feasible ways to understand the nature of traffic accidents. In this study, a zero-inflated negative binomial (ZINB) model was developed to allow for overdispersion and excess zeros and to examine the effects of land use, design and environmental factors. The statistical tests show that the ZINB model is preferred to zero-inflated Poisson and negative binomial models because it describes crash counts associated with severe injuries and fatalities more effectively. The results show that fatalities are positively associated with segment length, surface width, land use variables and rainfall. For example, a one-inch increase in rainfall will result in an increase of 0.02% in fatalities. Interestingly, distance to hospitals has a positive effect, which suggests that longer distances lead to higher fatalities, presumably due to time lost in transporting crash victims to hospitals.


2016 ◽  
Vol 27 (4) ◽  
pp. 1187-1201 ◽  
Author(s):  
Marzieh Mahmoodi ◽  
Abbas Moghimbeigi ◽  
Kazem Mohammad ◽  
Javad Faradmal

This study proposes semiparametric models for the analysis of hierarchical count data that exhibit excess zeros and overdispersion simultaneously. The methods discussed in this paper handle nonlinear covariate effects through flexible semiparametric multilevel regression techniques. A comprehensive comparison of semiparametric multilevel zero-inflated negative binomial and semiparametric multilevel zero-inflated generalized Poisson models is provided on real and simulated data. An EM algorithm based on Newton–Raphson equations is developed for maximum penalized likelihood estimation. The performance of the proposed models is assessed in a Monte Carlo simulation study. We also illustrate the methods by analyzing decayed, missing, and filled teeth of children aged 5–14 years.


2020 ◽  
Author(s):  
Hunyong Cho ◽  
Chuwen Liu ◽  
John S. Preisser ◽  
Di Wu

Summary. Measuring gene-gene dependence in single cell RNA sequencing (scRNA-seq) count data is often of interest and remains challenging, because an unidentified portion of the zero counts represent non-detected RNA due to technical reasons. Conventional statistical methods that fail to account for technical zeros incorrectly measure the dependence among genes. To address this problem, we propose a bivariate zero-inflated negative binomial (BZINB) model constructed using a bivariate Poisson-gamma mixture with dropout indicators for the technical (excess) zeros. Parameters are estimated based on the EM algorithm and are used to measure the underlying dependence by decomposing the two sources of zeros. Compared to existing models, the proposed BZINB model is specifically designed for estimating dependence and is more flexible, while preserving the marginal zero-inflated negative binomial distributions. Additionally, it has a simple latent variable framework, allowing parameters to have clear and intuitive interpretations, and its computation is feasible with large scale data. Using a recent scRNA-seq dataset, we illustrate model fitting and how the model-based measures can be different from naive measures. The inferential ability of the proposed model is evaluated in a simulation study. An R package ‘bzinb’ is available on CRAN.

