Modeling excess zeros and heterogeneity in count data from a complex survey design with application to the demographic health survey in sub-Saharan Africa

2016 ◽  
Vol 27 (1) ◽  
pp. 208-220 ◽  
Author(s):  
Lin Dai ◽  
Michael D Sweat ◽  
Mulugeta Gebregziabher

Purpose: To show a novel application of a weighted zero-inflated negative binomial model in modeling count data with excess zeros and heterogeneity to quantify the regional variation in HIV/AIDS prevalence in sub-Saharan African countries. Methods: Data come from the latest round of the Demographic and Health Survey (DHS) conducted in three countries (Ethiopia-2011, Kenya-2009 and Rwanda-2010) using a two-stage cluster sampling design. The outcome is an aggregate count of HIV cases in each census enumeration area of each country. The outcome data are characterized by excess zeros and heterogeneity due to clustering. We compare scale-weighted zero-inflated negative binomial models with and without random effects to account for zero-inflation, complex survey design and clustering. Finally, we provide marginalized rate ratio estimates from the best zero-inflated negative binomial model. Results: The best-fitting zero-inflated negative binomial model is scale weighted and has a common random intercept for the three countries. Rate ratio estimates from the final model show that HIV prevalence is associated with age and gender distribution, HIV acceptance, and HIV knowledge, and that its regional variation is associated with divorce rate, burden of sexually transmitted diseases and rural residence. Conclusions: A scale-weighted zero-inflated negative binomial model with proper modeling of random effects is shown to be the best model for count data from a complex survey design characterized by excess zeros and extra heterogeneity. In our data example, the final rate ratio estimates show significant regional variation in the factors associated with HIV prevalence, indicating that HIV intervention strategies should be tailored to the unique factors found in each country.
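The weighted zero-inflated negative binomial likelihood that this abstract refers to can be sketched as below. This is a minimal illustration and not the authors' implementation: the NB2 parameterization (variance mu + alpha * mu^2), the function names, the omission of covariates and random effects, and the choice to rescale weights so that they sum to the sample size are all assumptions made here, and the data are made up.

```python
import math


def nb_logpmf(y, mu, alpha):
    """Log-pmf of a negative binomial (NB2) count with mean mu and
    dispersion alpha, so that Var(Y) = mu + alpha * mu**2 (assumed form)."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def zinb_logpmf(y, mu, alpha, pi):
    """Log-pmf of a zero-inflated NB: with probability pi the count is a
    structural zero, otherwise it is drawn from the NB distribution."""
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, alpha)))
    return math.log(1.0 - pi) + nb_logpmf(y, mu, alpha)


def scale_weights(weights):
    """Rescale sampling weights so they sum to the number of observations
    (one common scaling choice; an assumption, not the paper's method)."""
    n, total = len(weights), sum(weights)
    return [w * n / total for w in weights]


def weighted_loglik(counts, weights, mu, alpha, pi):
    """Weighted pseudo-log-likelihood: each observation's contribution is
    multiplied by its scaled weight."""
    scaled = scale_weights(weights)
    return sum(w * zinb_logpmf(y, mu, alpha, pi)
               for y, w in zip(counts, scaled))


# Made-up counts and sampling weights, for illustration only.
counts = [0, 0, 0, 2, 5, 0, 1]
weights = [1.2, 0.8, 2.0, 1.0, 0.5, 1.5, 1.0]
print(weighted_loglik(counts, weights, mu=1.5, alpha=0.8, pi=0.3))
```

In a real analysis the mean and the zero-inflation probability would depend on covariates, and the parameters would be estimated by maximizing this quantity rather than being fixed by hand.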

2018 ◽  
Vol 7 (3) ◽  
pp. 22 ◽  
Author(s):  
Oyindamola B Yusuf ◽  
Rotimi Felix Afolabi ◽  
Ayoola S Ayoola

Poisson and negative binomial regression models have been used as a standard for modelling count outcomes, but these methods do not take into account the problems associated with excess zeros. Zero-inflated and hurdle models have been proposed to model count data with excess zeros. This study therefore compared the performance of zero-inflated (zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB)) and hurdle (hurdle Poisson (HP) and hurdle negative binomial (HNB)) models in determining the factors associated with the number of Antenatal Care (ANC) visits in Nigeria. Using the 2013 Nigeria Demographic and Health Survey dataset, a sample of 19 652 women of reproductive age who gave birth in the five years prior to the survey and provided information about ANC visits was utilised. Data were analysed using descriptive statistics and the ZIP, ZINB, HP and HNB models, and information criteria (AIC/BIC) were used to assess model fit. Participants’ mean age was 29.5 ± 7.3 years and the median number of ANC visits was 4 (range: 0 - 30). About half (54.9%) of the participants had at least 4 ANC visits while 33.9% had none. The ZINB (AIC = 83 039.4; BIC = 83 470.3) fitted the data better than the ZIP or HP; however, the HNB (AIC = 83 041.4; BIC = 83 472.3) competed favorably with it. The zero-inflated negative binomial model provided the better fit for the data. We suggest the zero-inflated negative binomial model for count data with excess zeros of unknown sources, such as the number of ANC visits in Nigeria.
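The model comparison in this abstract rests on information criteria, where a lower value indicates a better trade-off between fit and complexity. The sketch below states the standard AIC and BIC formulas and ranks the two models whose values the abstract reports; the function and variable names are illustrative, and the log-likelihoods and parameter counts behind the reported values are not given in the abstract, so they are not reconstructed here.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


# AIC/BIC values as reported in the abstract above.
reported = {
    "ZINB": {"AIC": 83039.4, "BIC": 83470.3},
    "HNB": {"AIC": 83041.4, "BIC": 83472.3},
}

# The preferred model is the one with the smallest criterion value.
best_by_aic = min(reported, key=lambda m: reported[m]["AIC"])
best_by_bic = min(reported, key=lambda m: reported[m]["BIC"])
print(best_by_aic, best_by_bic)  # prints "ZINB ZINB"
```

The two models differ by only about 2 points on either criterion, which is consistent with the abstract's remark that the HNB model competed closely with the ZINB model.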


2019 ◽  
Vol 49 (4) ◽  
Author(s):  
Edilson Marcelino Silva ◽  
Thais Destefani Ribeiro Furtado ◽  
Jaqueline Gonçalves Fernandes ◽  
Marcelo Ângelo Cirillo ◽  
Joel Augusto Muniz

ABSTRACT: Coffee crops play an important role in Brazilian agriculture, with a high level of social and economic participation resulting from the jobs created in the supply chain, from the income obtained by producers, and from the revenue generated for the country by coffee bean exports. In coffee plant growth, leaves have a determinant role in higher production; therefore, the leaf count per plant provides relevant information to producers for adequate crop management, such as foliar fertilizer applications. To describe count data, the Poisson model is the most commonly employed model; when count data show overdispersion, the negative binomial model has been found to be more adequate. The objective of this study was to compare the fit of the Poisson and negative binomial models to data on the leaf count per plant in coffee seedlings. Data were collected from an experiment with a randomized block design with 30 treatments, three replicates, and four plants per plot. Data from only one treatment, in which the number of leaves was counted over time, were employed. The first count was conducted on 8 April 2016, and the other counts were performed 18, 32, 47, 62, 76, 95, 116, 133, and 153 days after the first evaluation, for a total of ten measurements. The fit of the models was assessed based on deviance values and simulated envelopes for residuals. The assessment indicated that the Poisson model was inadequate for describing the data due to overdispersion. The negative binomial model adequately fitted the observations and was indicated to describe the number of leaves of coffee plants. Based on the negative binomial model, the expected relative increase in the number of leaves was 0.9768% per day.
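The reported daily increase can be related to a regression slope under one assumption that the abstract does not state: that the negative binomial model uses a log link with time in days as the covariate. In that case a slope beta implies a relative change of exp(beta) - 1 per day, and the change compounds over longer intervals. The sketch below works through that arithmetic; the function names are illustrative.

```python
import math

DAILY_INCREASE = 0.009768  # 0.9768% per day, as reported in the abstract


def slope_from_relative_increase(rel):
    """Log-link slope beta such that exp(beta) - 1 equals rel."""
    return math.log1p(rel)


def relative_increase(beta, days=1):
    """Expected relative change in the mean count after `days` days,
    assuming a log link with time in days as the covariate."""
    return math.expm1(beta * days)


beta = slope_from_relative_increase(DAILY_INCREASE)
print(f"slope per day: {beta:.6f}")
print(f"relative increase over 153 days: {relative_increase(beta, 153):.3f}")
```

Because the effect is multiplicative, the increase over the 153-day observation window is larger than 153 times the daily figure.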


Author(s):  
Luay Habeeb Hashim ◽  
Ahmad Naeem Flaih

Count data, including zero counts, arise in a wide variety of applications; hence, models for counts have become widely popular in many fields. In statistics, count data can be defined as observations that take only non-negative integer values. Sometimes researchers observe more zeros than expected; this excess of zeros is known as zero-inflation. Data with abundant zeros are especially common in health, marketing, finance, econometrics, ecology, statistical quality control, geography, and environmental fields when counting the occurrence of certain behavioral and natural events, such as the frequency of alcohol use, drug use, the number of cigarettes smoked, the occurrence of earthquakes, and rainfall. Several models have been used to analyze count data, such as the zero-inflated Poisson (ZIP) model and the negative binomial model. In this paper, the Poisson, negative binomial, ZIP, and zero-inflated negative binomial (ZINB) models were used to analyze rainfall data.
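The notion of zero-inflation defined in this abstract can be made concrete with the zero-inflated Poisson distribution, which mixes a point mass at zero (probability pi) with an ordinary Poisson count. The sketch below is illustrative only: the parameter values are made up and the function names are not taken from the paper.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k events under a Poisson(lam) model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero, otherwise it follows a Poisson(lam) distribution."""
    if k == 0:
        return pi + (1 - pi) * poisson_pmf(0, lam)
    return (1 - pi) * poisson_pmf(k, lam)


lam, pi = 2.0, 0.3  # illustrative values, not estimates from any data set
print("P(Y = 0) under Poisson:", poisson_pmf(0, lam))
print("P(Y = 0) under ZIP:    ", zip_pmf(0, lam, pi))
```

For any pi greater than zero, the ZIP model assigns more probability to zero than a Poisson model with the same lam, which is what "more zeros than expected" means in practice.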


2019 ◽  
Vol 47 (2) ◽  
pp. 287-305 ◽  
Author(s):  
Eghbal Zandkarimi ◽  
Abbas Moghimbeigi ◽  
Hossein Mahjub ◽  
Reza Majdzadeh
