Abstract
Motivated by the Coronavirus Disease (COVID-19) pandemic, which is caused by the SARS-CoV-2 virus, and the important problem of forecasting the number of daily deaths and the number of cumulative deaths, this paper examines the construction of prediction regions or intervals under the no-covariate or intercept-only Poisson model, the Poisson regression model, and a new over-dispersed Poisson regression model. These models are useful in settings where the events of interest are rare. For the no-covariate Poisson model and the Poisson regression model, several prediction regions are developed and their performances are compared through simulation studies. The methods are applied to the problem of forecasting the number of daily deaths and the number of cumulative deaths in the United States (US) due to COVID-19. To examine their predictive accuracy in light of what actually happened, daily deaths data until May 15, 2020 were used to forecast cumulative deaths by June 1, 2020. The observed data were found to be over-dispersed relative to the Poisson regression model. A novel over-dispersed Poisson regression model is therefore proposed. This new model, which is distinct from the negative binomial regression (NBR) model, builds on frailty ideas from survival analysis and quantifies over-dispersion through an additional parameter. It has the flavor of a discrete measurement error model and, in contrast to the NBR model, has a viable physical interpretation. The Poisson regression model is a hidden model in this over-dispersed Poisson regression model, obtained as a limiting case when the over-dispersion parameter increases to infinity. A prediction region for the cumulative number of US deaths due to COVID-19 by October 1, 2020, given the data until September 1, 2020, is presented. Realized daily and cumulative death values from September 1 to September 25 are compared to the prediction region limits.
Finally, the paper discusses limitations of the proposed procedures and mentions open research problems. It also points out dangers and pitfalls of forecasting over a long horizon, especially during a pandemic, where events, both foreseen and unforeseen, could impact point predictions and prediction regions.