Modeling Count Data for Healthcare Utilization: an Empirical Study of Outpatient Visits Among Vietnamese Older People
Abstract
Background Vietnam's population is aging rapidly, which poses potentially critical issues for older people; central among these is the demand for healthcare. However, healthcare utilization, measured here as count data, is challenging to model because such data are typically skewed, with a large mass at zero. This study compares empirical econometric strategies for modeling healthcare utilization (measured as the number of outpatient visits in the last 12 months) and identifies the determinants of healthcare utilization among Vietnamese older people based on the best-fitting model.
Methods Using data from the 2006 Vietnam Household Living Standard Survey (N=2426), we examined nine econometric regression models for count data to identify the best-fitting one. For in-sample model selection, we used model selection criteria, statistical tests, and goodness-of-fit measures. In addition, we conducted 10-fold cross-validation checks to examine the reliability of the in-sample model selection. Finally, we used marginal effects from the best-fitting model to identify the factors associated with the number of outpatient visits among Vietnamese older people.
Results We found strong evidence in favor of the hurdle negative binomial model 2 (HNB2) in both the in-sample selection and the 10-fold cross-validation checks. The marginal effects from the HNB2 showed that ethnicity, region, household size, health insurance, smoking status, non-communicable diseases, and disability were significantly associated with the number of outpatient visits. The predicted probabilities for each count event showed distinct trends in healthcare utilization among specific groups: at low count events, women and people in the younger age group used more healthcare than did men and people in older age groups, but the trend reversed at higher count events.
Conclusions Data come in all shapes and sizes; this study highlights the importance of model specification checks and model selection criteria in avoiding potentially biased estimates that result from model misspecification. The findings lay the groundwork for future research on modeling healthcare utilization in developing countries and could be used to forecast healthcare demand and make provisions for healthcare costs.
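
For readers unfamiliar with hurdle models, the following is a minimal sketch of how a hurdle negative binomial (NB2) distribution assigns probability to each count, which is the quantity behind the "predicted probabilities for each count event" mentioned above. It is an illustration under stated assumptions, not the estimation code used in the study: the function names and the parameter values (`p_zero`, `mu`, `alpha`) are hypothetical, and covariates are omitted. The zero outcome gets its own probability, and positive counts follow an NB2 distribution truncated at zero.

```python
import math


def nb2_pmf(y, mu, alpha):
    """NB2 probability of count y, with mean mu and dispersion alpha.

    Under this parameterization the variance is mu + alpha * mu**2.
    """
    r = 1.0 / alpha
    log_p = (
        math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def hurdle_nb2_pmf(y, p_zero, mu, alpha):
    """Hurdle NB2 probability of count y.

    p_zero is the probability of not crossing the hurdle (zero visits);
    positive counts follow a zero-truncated NB2 distribution.
    """
    if y == 0:
        return p_zero
    truncation = 1.0 - nb2_pmf(0, mu, alpha)
    return (1.0 - p_zero) * nb2_pmf(y, mu, alpha) / truncation


# Hypothetical parameter values, chosen only for illustration.
P_ZERO, MU, ALPHA = 0.6, 3.0, 1.5

for visits in range(6):
    prob = hurdle_nb2_pmf(visits, P_ZERO, MU, ALPHA)
    print(f"P(visits = {visits}) = {prob:.4f}")
```

Because the zero probability is modeled separately from the positive counts, a covariate can raise the chance of any visit while lowering the expected number of visits among users, which is one way opposite trends at low and high counts can arise.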