Evaluation of a Parsimonious COVID-19 Outbreak Prediction Model Using Publicly Available Datasets (Preprint)
BACKGROUND: The coronavirus disease 2019 (COVID-19) pandemic has changed public health policies and personal lifestyles through lockdowns and mandates. Governments are rapidly revising policies to increase hospital capacity and supply personal protective equipment in order to mitigate disease spread in distressed regions. Current models that predict COVID-19 case counts and spread, such as those based on deep learning, offer limited explainability and generalizability. This leaves a need for accurate and robust outbreak prediction models that balance parsimony and fit.

OBJECTIVE: We seek to leverage readily accessible datasets from multiple states to train and evaluate a parsimonious predictive model capable of identifying county-level risk of COVID-19 outbreaks on a day-to-day basis.

METHODS: The model uses two data inputs: daily COVID-19 case counts per county and county populations. We developed a gold standard for outbreaks across California, Indiana, and Iowa. The model was trained on data from March 1 to August 31, 2020, then tested on data from September 1 to October 31, 2020, against the gold standard to derive confusion matrix statistics.

RESULTS: The model achieved sensitivities of 92%, 90%, and 81% for Indiana, Iowa, and California, respectively. The precision in each state was above 85%, and the specificity and accuracy were generally greater than 95%.

CONCLUSIONS: The parsimonious model provides a generalizable and simple alternative approach to outbreak prediction. Our methodology could be tested on diverse regions to aid government officials and hospitals with resource allocation.
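
For readers unfamiliar with the confusion matrix statistics reported above (sensitivity, specificity, precision, and accuracy), the following minimal sketch shows how they are conventionally computed from binary outbreak labels. It is illustrative only and is not the authors' code: the function name `confusion_stats`, the `ConfusionStats` container, and the toy labels are assumptions, with each label standing for one hypothetical county-day (1 = outbreak, 0 = no outbreak).

```python
from dataclasses import dataclass


@dataclass
class ConfusionStats:
    sensitivity: float  # TP / (TP + FN): share of true outbreaks that were flagged
    specificity: float  # TN / (TN + FP): share of non-outbreaks correctly left unflagged
    precision: float    # TP / (TP + FP): share of flagged county-days that were true outbreaks
    accuracy: float     # (TP + TN) / all county-days


def confusion_stats(gold, predicted):
    """Compute standard confusion matrix statistics from two equal-length
    sequences of binary labels (hypothetical helper, not from the paper)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    tn = sum(1 for g, p in pairs if not g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)

    def ratio(num, den):
        # Return NaN when a statistic is undefined (empty denominator).
        return num / den if den else float("nan")

    return ConfusionStats(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        precision=ratio(tp, tp + fp),
        accuracy=ratio(tp + tn, len(pairs)),
    )


# Toy example with made-up labels for ten county-days (not study data).
toy_gold = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(confusion_stats(toy_gold, toy_pred))
```

Because outbreak days are typically much rarer than non-outbreak days, specificity and accuracy can be high even when many outbreaks are missed, which is why sensitivity and precision are reported alongside them.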