Using Count Data Models to Predict Epiphytic Bryophyte Recruitment in Schima superba Gardn. et Champ. Plantations in Urban Forests

Epiphytic bryophytes perform essential ecosystem functions, but their sensitivity to environmental quality and change makes their survival and development vulnerable to global change, especially habitat loss in urban environments. Fortunately, extensive urban tree planting programs worldwide have had a positive effect on the colonization and development of epiphytic bryophytes. However, how epiphytic bryophytes occur and grow on planted trees remains poorly understood, especially in urban environments. In the present study, we surveyed the distribution of epiphytic bryophytes on tree trunks in a Schima superba Gardn. et Champ. urban plantation and then developed count data models incorporating tree characteristics, stand characteristics, human disturbance, terrain factors, and microclimate to identify the drivers of epiphytic bryophyte recruitment. Six count models (Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, hurdle-Poisson, and hurdle-negative binomial) were compared to account for the zero-inflated data structure. Our results show that (i) the shaded side and base of tree trunks were the preferred locations for bryophytes to colonize in urban plantations, (ii) both hurdle models performed well in modeling epiphytic bryophyte recruitment, and (iii) both hurdle models showed that tree height, diameter at breast height (DBH), leaf area index (LAI), and altitude (ALT) promoted the occurrence of epiphytic bryophytes, whereas the height under branch and the interference intensity of human activities inhibited it. Specifically, DBH and LAI had positive effects on the species richness recruitment count; similarly, DBH and ALT had positive effects on the abundance recruitment count, but slope had a negative effect.
To promote the occurrence and growth of epiphytic bryophytes in urban tree planting programs, we suggest that managers regulate suitable habitats by cultivating and protecting large trees, promoting canopy closure, and controlling human disturbance.

A comparison of zero-inflated and hurdle models for modeling zero-inflated count data

Journal of Statistical Distributions and Applications, 2021, Vol 8 (1). DOI: 10.1186/s40488-021-00121-4

Author(s): Cindy Xin Feng

Keyword(s): Health Services, Count Data, Goodness Of Fit, Negative Binomial, Simulation Studies, Final Choice, Hurdle Models, Count Distribution, Careful Assessment

Abstract: Count data with excess zeros are frequently encountered in practice. For example, the number of health services visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of this type of data is that the count measure tends to have more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, there is still a lack of investigation of the fundamental differences between these two types of models. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data-generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.
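The difference in data-generating processes mentioned above can be sketched with the two probability mass functions. This is a minimal illustration using the textbook Poisson versions of both models, not code from the article; the function names and the parameter values (`pi`, `p0`, `lam`) are assumptions chosen for the example.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability P(Y = k)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson.

    With probability pi the observation is a structural zero; otherwise it
    is a Poisson(lam) draw, which can itself be zero. Zeros therefore come
    from two sources.
    """
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_poisson_pmf(k, p0, lam):
    """Hurdle Poisson.

    All zeros come from a single binary process with P(Y = 0) = p0; the
    positive counts follow a Poisson(lam) truncated at zero.
    """
    if k == 0:
        return p0
    return (1 - p0) * poisson_pmf(k, lam) / (1 - math.exp(-lam))


# Hypothetical parameter values, for illustration only.
zip_total = sum(zip_pmf(k, 0.3, 2.0) for k in range(60))
hurdle_total = sum(hurdle_poisson_pmf(k, 0.3, 2.0) for k in range(60))
```

In the zero-inflated form, P(Y = 0) is always larger than the mixing weight `pi` because the count component contributes zeros too, whereas in the hurdle form P(Y = 0) equals `p0` exactly.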

Beta-binomial models for meta-analysis with binary outcomes: Variations, extensions, and additional insights from econometrics

Research Methods in Medicine & Health Sciences, 2021, pp. 263208432199622. DOI: 10.1177/2632084321996225

Author(s): Tim Mathes, Oliver Kuss

Keyword(s): Simulation Study, Count Data, Negative Binomial, Meta Analysis, Negative Binomial Regression, Binary Outcomes, Small Scale, Panel Count Data, Count Data Models, Meta Analyses

Background: Meta-analysis of systematically reviewed studies on interventions is the cornerstone of evidence-based medicine. In the following, we introduce the common-beta beta-binomial (BB) model for meta-analysis with binary outcomes and elucidate its equivalence to panel count data models. Methods: We present a variation of the standard “common-rho” BB (BBST model) for meta-analysis, namely a “common-beta” BB model. This model has an interesting connection to fixed-effect negative binomial regression models (FE-NegBin) for panel count data. Using this equivalence, it is possible to estimate an extension of the FE-NegBin with an additional multiplicative overdispersion term (RE-NegBin), while preserving a closed-form likelihood. An advantage of the connection to econometric models is that the models can be easily implemented, because “standard” statistical software for panel count data can be used. We illustrate the methods with two real-world example datasets. Furthermore, we show the results of a small-scale simulation study that compares the new models to the BBST. The input parameters of the simulation were informed by actually performed meta-analyses. Results: In both example datasets, the NegBin models, in particular the RE-NegBin, showed a smaller effect and had narrower 95% confidence intervals. In our simulation study, median bias was negligible for all methods, but the upper quartile for median bias suggested that BBST is most affected by positive bias. Regarding coverage probability, BBST and the RE-NegBin model outperformed the FE-NegBin model. Conclusion: For meta-analyses with binary outcomes, the considered common-beta BB models may be valuable extensions to the family of BB models.

Modeling the Frequency of Auto Insurance Claims by Means of Poisson and Negative Binomial Models

Annals of the Alexandru Ioan Cuza University - Economics, 2015, Vol 62 (2), pp. 151-168. DOI: 10.1515/aicue-2015-0011. Cited By ~ 3

Author(s): Mihaela David, Dănuţ-Vasile Jemna

Keyword(s): Risk Factors, Count Data, Negative Binomial, Information Criteria, Count Data Models, Auto Insurance, Insurance Portfolio, Negative Binomial Models, Degree Of Risk, Binomial Models

Abstract: Within non-life insurance pricing, an accurate evaluation of claim frequency, also known in theory as count data, is an essential part of determining an insurance premium according to the policyholder’s degree of risk. Count regression analysis allows the identification of the risk factors and the prediction of the expected frequency of claims given the characteristics of policyholders. The aim of this paper is to verify several hypotheses related to the methodology of count data models and to the risk factors used to explain the frequency of claims. In addition to the standard Poisson regression, Negative Binomial models are applied to a French auto insurance portfolio. The best model was chosen by means of the log-likelihood ratio and the information criteria. Based on this model, the profile of the policyholders with the highest degree of risk is determined.
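As a rough sketch of how a Poisson and a negative binomial fit can be compared by log-likelihood ratio and information criteria, the snippet below fits intercept-only versions of both to a small made-up vector of counts. It is not the paper's analysis: the data are invented, there are no covariates, and the negative binomial dispersion is found by a coarse grid search rather than a proper optimizer. The NB2 parameterisation (mean `mu`, size `r`) and all names are assumptions.

```python
import math


def poisson_loglik(counts, mu):
    """Log-likelihood of i.i.d. Poisson(mu) counts."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in counts)


def negbin_loglik(counts, mu, r):
    """Log-likelihood of i.i.d. negative binomial counts.

    NB2 form: mean mu, variance mu + mu**2 / r (smaller r = more spread).
    """
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu))
        for k in counts
    )


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2 * loglik


# Invented toy claim counts (not from the paper's portfolio).
claims = [0, 0, 0, 1, 0, 2, 0, 0, 7, 0, 1, 0, 12, 0, 3, 0, 0, 5, 0, 1]

# For an intercept-only model the sample mean maximises both likelihoods in mu.
mu_hat = sum(claims) / len(claims)
ll_pois = poisson_loglik(claims, mu_hat)

# Coarse grid search over the dispersion parameter r (0.05 to 20.0).
grid = [0.05 * i for i in range(1, 401)]
ll_nb = max(negbin_loglik(claims, mu_hat, r) for r in grid)

# Likelihood-ratio statistic for Poisson (1 parameter) vs NB (2 parameters).
lr_stat = 2 * (ll_nb - ll_pois)
```

Lower AIC or BIC indicates the preferred model. Because the Poisson is the limiting case of the negative binomial as `r` grows, the likelihood-ratio statistic tests a parameter on the boundary, so its reference distribution needs more care than a plain chi-square with one degree of freedom.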

Spatio-temporal modelling of tick life-stage count data with spatially varying coefficients

Geospatial Health, 2021, Vol 16 (2). DOI: 10.4081/gh.2021.1004

Author(s): Thabo Lephoto, Henry Mwambi, Oliver Bodhlyera, Holly Gaff

Keyword(s): Count Data, Negative Binomial, Life Stage, Poisson Model, Information Criteria, Count Data Models, Varying Coefficients, Time Interaction, York County, Spatio Temporal

There is a vast amount of geo-referenced data in many fields of study, including ecological studies. Geo-referencing is usually done by point referencing, that is, by latitudes and longitudes, or by areal referencing, which includes districts, counties, states, provinces and other administrative units. The availability of large geo-referenced datasets for modelling has necessitated the development and application of spatial statistical methods. However, spatially varying coefficient models exploring the abundance of tick counts remain limited. In this study we used data collected and prepared by researchers in the Department of Biological Sciences at Old Dominion University, Virginia, USA. We modelled tick life-stage counts and abundance variability using data collected monthly from May 2009 through December 2018 at 12 sampling locations covering 5 different habitats (numbered 1-5) and three habitat types, namely woods, edges and grass. Spatio-temporal Poisson and spatio-temporal negative binomial (NB) count data models were fitted to the data and compared using the deviance information criterion (DIC). The NB model outperformed the Poisson model, with all of its DIC values being smaller than those of the Poisson model. Results showed that the covariates varied spatially across counties. There was a decreasing time (in years) effect over the study period. However, even though the time effect was decreasing, space-time interaction effects were seen to be increasing over time in York County.

Extended negative binomial hurdle models

Statistical Methods in Medical Research, 2018, Vol 28 (5), pp. 1540-1551. DOI: 10.1177/0962280218766567

Author(s): Maengseok Noh, Youngjo Lee

Keyword(s): Count Data, Negative Binomial, Random Effect, Random Effect Model, Real Data, Hurdle Models, Poisson Models, Effect Model, Zero Rate, General Statistical

Poisson models are widely used for statistical inference on count data. However, zero-inflation or zero-deflation can occur together with either overdispersion or underdispersion. Currently, there is no available model for count data that allows an excessive occurrence of zeros along with underdispersion in the non-zero counts, even though the need for such models has been reported. Furthermore, given an excessive zero rate, we need a model that allows a larger degree of overdispersion than existing models. In this paper, we use a random-effect model to produce a general statistical model that accommodates such phenomena occurring in real data analyses.
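A quick way to see what over- or underdispersion and zero-inflation or zero-deflation mean in practice is to compare a sample against what a Poisson with the same mean would imply. The sketch below is a simple descriptive check, not the random-effect model proposed in the paper; the function names and the toy counts are assumptions.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    A Poisson variable has variance equal to its mean, so values well above
    1 suggest overdispersion and values well below 1 suggest underdispersion.
    """
    return variance(counts) / mean(counts)


def zero_rates(counts):
    """Observed share of zeros and the share a Poisson with the same mean implies."""
    observed = counts.count(0) / len(counts)
    expected = math.exp(-mean(counts))
    return observed, expected


# Invented toy counts, for illustration only.
toy_counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 14]
index = dispersion_index(toy_counts)
observed_zero, poisson_zero = zero_rates(toy_counts)
```

An observed zero share above the Poisson-implied share points toward zero-inflation, and one below it toward zero-deflation. These are informal diagnostics only and do not replace a formal model comparison.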

Discrete Distribution Based on Compound Sum to Model Dental Caries Count Data

Caries Research, 2016, Vol 51 (1), pp. 68-78. DOI: 10.1159/000450891. Cited By ~ 1

Author(s): Jean-Noel Vergnes, Jean-Philippe Boucher, Nathalie Lelong, Michel Sixou, Cathy Nabet

Keyword(s): Dental Caries, Count Data, Negative Binomial, Disease Process, Discrete Distribution, Risk Indicators, Numerical Application, Score Functions, Hurdle Models, Epidemiological Surveys

Methods for analysing dental caries and associated risk indicators have evolved considerably in recent decades. The use of zero-inflated or hurdle models is increasing so as to take account of the decayed, missing, and filled teeth (DMFT) distribution, which is positively skewed and has a high proportion of zero scores. However, there is a need to develop new statistical models that involve pragmatic biological considerations on dental caries in epidemiological surveys. In this paper, we show that the zero-inflated and the hurdle models can both be expressed as a compound sum. Using the same compound sum, we then present the generalized negative binomial (GNB) distribution for dental caries count data, and provide a numerical application using the data of the EPIPAP study. The GNB model generates the best score functions while handling the lifetime dental caries disease process better. In conclusion, the GNB model suits the nature of some count data, in particular when structural zeros are unlikely to occur and when several latent spells can lead to new countable events. For these reasons, the use of the GNB distribution appears to be relevant for the modelling of dental caries count data.

intcount: A command for fitting count-data models from interval data

The Stata Journal: Promoting communications on statistics and Stata, 2019, Vol 19 (3), pp. 645-666. DOI: 10.1177/1536867x19874240. Cited By ~ 1

Author(s): Stephen Pudney

Keyword(s): Count Data, Regression Models, Negative Binomial, Interval Data, Healthcare Services, Data Models, Count Data Models, The UK

In this article, I describe a community-contributed command, intcount, that fits one of several regression models for count data observed in interval form. The models available are Poisson, negative binomial, and binomial, and they can be fit in standard or zero-inflated form. I illustrate the command with an application to analysis of data from the UK Understanding Society survey on the demand for healthcare services.

The Effect of Sample Size on the Efficiency of Count Data Models: Application to Marriage Data

Journal of Economics and Behavioral Studies, 2017, Vol 9 (3), pp. 6. DOI: 10.22610/jebs.v9i3.1742

Author(s): Volition Tlhalitshi Montshiwa, Ntebogang Dinah Moroke

Keyword(s): Regression Model, Sample Size, Count Data, Negative Binomial, Information Criterion, Data Models, Hurdle Model, Negative Binomial Regression Model, Count Data Models, Over Dispersion

Abstract: Sample size requirements are common in many multivariate analysis techniques as one of the measures taken to ensure the robustness of such techniques, but such requirements have not been of interest in the area of count data models. This study therefore investigated the effect of sample size on the efficiency of six commonly used count data models, namely the Poisson regression model (PRM), Negative binomial regression model (NBRM), Zero-inflated Poisson (ZIP), Zero-inflated negative binomial (ZINB), Poisson Hurdle model (PHM) and Negative binomial hurdle model (NBHM). The data used in this study were sourced from Data First and were collected by Statistics South Africa through the Marriage and Divorce database. PRM, NBRM, ZIP, ZINB, PHM and NBHM were applied to ten randomly selected samples ranging from 4392 to 43916 and differing by 10% in size. The six models were compared using the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Vuong’s test for over-dispersion, McFadden RSQ, Mean Square Error (MSE) and Mean Absolute Deviation (MAD). The results revealed that, generally, the Negative Binomial-based models outperformed the Poisson-based models. However, the results did not reveal the effect of sample size variations on the efficiency of the models, since there was no consistency in the change in AIC, BIC, Vuong’s test for over-dispersion, McFadden RSQ, MSE and MAD as the sample size increased.
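Two of the comparison measures named above, MSE and MAD, can be sketched in a few lines. The abstract does not define them, so this assumes they are computed between observed counts and model-fitted means; the function names and the tiny example vectors are hypothetical.

```python
def mean_squared_error(observed, predicted):
    """Average squared difference between observed counts and fitted means."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_deviation(observed, predicted):
    """Average absolute difference between observed counts and fitted means."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical observed counts and fitted means from some count model.
observed = [0, 1, 3]
fitted = [1.0, 1.0, 1.0]
mse = mean_squared_error(observed, fitted)
mad = mean_absolute_deviation(observed, fitted)
```

Unlike AIC and BIC, these two measures do not penalise the number of parameters, which is one reason they are usually reported alongside information criteria rather than instead of them.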

Spatiotemporal hurdle models for zero-inflated count data: Exploring trends in emergency department visits

Statistical Methods in Medical Research, 2016, Vol 25 (6), pp. 2558-2576. DOI: 10.1177/0962280214527079. Cited By ~ 11

Author(s): Brian Neelon, Howard H Chang, Qiang Ling, Nicole S Hastings

Keyword(s): Emergency Department, Random Effects, Count Data, Negative Binomial, Emergency Department Visits, Emergency Department Use, Generalized Poisson, Hurdle Models, Spatiotemporal Trends, Temporal Smoothing

Motivated by a study exploring spatiotemporal trends in emergency department use, we develop a class of two-part hurdle models for the analysis of zero-inflated areal count data. The models consist of two components—one for the probability of any emergency department use and one for the number of emergency department visits given use. Through a hierarchical structure, the models incorporate both patient- and region-level predictors, as well as spatially and temporally correlated random effects for each model component. The random effects are assigned multivariate conditionally autoregressive priors, which induce dependence between the components and provide spatial and temporal smoothing across adjacent spatial units and time periods, resulting in improved inferences. To accommodate potential overdispersion, we consider a range of parametric specifications for the positive counts, including truncated negative binomial and generalized Poisson distributions. We adopt a Bayesian inferential approach, and posterior computation is handled conveniently within standard Bayesian software. Our results indicate that the negative binomial and generalized Poisson hurdle models vastly outperform the Poisson hurdle model, demonstrating that overdispersed hurdle models provide a useful approach to analyzing zero-inflated spatiotemporal data.
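The two-part structure described above (one component for any use, one for the number of visits given use) can be illustrated by simulating from the simplest possible hurdle model. This sketch uses a plain Poisson for the positive counts and no covariates or random effects, so it is far simpler than the paper's hierarchical spatiotemporal models; the probability `p_use`, the rate `lam` and all function names are assumptions.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_hurdle_poisson(p_use, lam, rng):
    """Part 1: any use at all? Part 2: number of visits given use (always >= 1)."""
    if rng.random() >= p_use:
        return 0  # the hurdle was not crossed
    while True:
        # Rejection sampling from the zero-truncated Poisson.
        k = sample_poisson(lam, rng)
        if k > 0:
            return k


# Hypothetical settings: 40% of patients have any visit; rate 2.5 among users.
rng = random.Random(0)
visits = [sample_hurdle_poisson(0.4, 2.5, rng) for _ in range(5000)]
zero_share = visits.count(0) / len(visits)
```

Because every zero arises from the first part alone, the share of zeros in the simulated data should be close to `1 - p_use`, whatever value `lam` takes.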

The Effect of Sample Size on the Efficiency of Count Data Models: Application to Marriage Data

Journal of Economics and Behavioral Studies, 2017, Vol 9 (3(J)), pp. 6-18. DOI: 10.22610/jebs.v9i3(j).1742

Author(s): Volition Tlhalitshi Montshiwa, Ntebogang Dinah Moroke

Keyword(s): Regression Model, Sample Size, Count Data, Negative Binomial, Information Criterion, Data Models, Hurdle Model, Negative Binomial Regression Model, Count Data Models, Over Dispersion

Abstract: Sample size requirements are common in many multivariate analysis techniques as one of the measures taken to ensure the robustness of such techniques, but such requirements have not been of interest in the area of count data models. This study therefore investigated the effect of sample size on the efficiency of six commonly used count data models, namely the Poisson regression model (PRM), Negative binomial regression model (NBRM), Zero-inflated Poisson (ZIP), Zero-inflated negative binomial (ZINB), Poisson Hurdle model (PHM) and Negative binomial hurdle model (NBHM). The data used in this study were sourced from Data First and were collected by Statistics South Africa through the Marriage and Divorce database. PRM, NBRM, ZIP, ZINB, PHM and NBHM were applied to ten randomly selected samples ranging from 4392 to 43916 and differing by 10% in size. The six models were compared using the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Vuong’s test for over-dispersion, McFadden RSQ, Mean Square Error (MSE) and Mean Absolute Deviation (MAD). The results revealed that, generally, the Negative Binomial-based models outperformed the Poisson-based models. However, the results did not reveal the effect of sample size variations on the efficiency of the models, since there was no consistency in the change in AIC, BIC, Vuong’s test for over-dispersion, McFadden RSQ, MSE and MAD as the sample size increased.
