Abstract

The number of confirmed COVID-19 cases, relative to population size, has varied greatly throughout the United States and even within the same city. Across zip codes in New York City, the epicentre of the epidemic, the number of cases per 100,000 residents has ranged from 437 to 4227, roughly a 1:10 ratio. To guide policy decisions regarding containment and the reopening of the economy, schools and other institutions, it is vital to identify the factors that drive this large variation.

This paper reports on a statistical study of incidence variation by zip code across New York City. Among the many socio-economic and demographic measures considered, average household size emerges as the single most important explanatory variable: in our final model specification, an increase in average household size by one member increases the zip code incidence rate by at least 876 cases per 100,000 residents, 23% of the range of incidence rates, at a 95% confidence level.

The percentage of the population above the age of 65, the percentage below the poverty line, and their interaction term are also strongly positively associated with zip code incidence rates. In terms of ethnic/racial characteristics, the percentages of African Americans, Hispanics and Asians within the population are significantly associated with incidence rates, but the magnitude of the impact is considerably smaller. (The proportion of Asians within a zip code has a negative association.)

These significant associations may be explained by comorbidities, known to be more prevalent among the black and Hispanic population segments and less prevalent among the Asian segment. In turn, the increased prevalence of these comorbidities among the black and Hispanic population is, in large part, the result of poorer dietary habits and more limited access to healthcare, themselves driven by lower incomes.

Contrary to popular belief, population density, per se, does not have a significantly positive impact. Indeed, population density and zip code incidence rate are negatively correlated, with a correlation coefficient of -0.33.

Our model specification is based on a well-established epidemiologic model that explains the effects of household sizes on R0, the basic reproductive number of an epidemic.

Our findings support implemented and proposed policies to quarantine pre-acute and post-acute patients, as well as nursing home admission policies.
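
As a quick check on the arithmetic quoted in this abstract, the short Python sketch below recomputes the spread of incidence rates and the share of that spread attributed to one additional household member. It uses only the three figures stated above (437, 4227 and 876 cases per 100,000 residents); the function and constant names are hypothetical and chosen for illustration, and the sketch does not reproduce the regression itself.

```python
# Figures quoted in the abstract, all in cases per 100,000 residents.
MIN_RATE = 437    # lowest zip code incidence rate
MAX_RATE = 4227   # highest zip code incidence rate
HOUSEHOLD_EFFECT_LOWER_BOUND = 876  # lower bound on the effect of one extra household member


def incidence_range(lowest: float, highest: float) -> float:
    """Spread between the highest and lowest incidence rates."""
    return highest - lowest


def share_of_range(effect: float, lowest: float, highest: float) -> float:
    """Effect size expressed as a fraction of the incidence-rate range."""
    return effect / incidence_range(lowest, highest)


if __name__ == "__main__":
    spread = incidence_range(MIN_RATE, MAX_RATE)
    share = share_of_range(HOUSEHOLD_EFFECT_LOWER_BOUND, MIN_RATE, MAX_RATE)
    print(f"Range of incidence rates: {spread} per 100,000")
    print(f"Household-size effect as share of range: {share:.1%}")
    print(f"Highest-to-lowest ratio: {MAX_RATE / MIN_RATE:.1f}")
```

The range works out to 3790 cases per 100,000, so the lower bound of 876 corresponds to about 23% of it, and the highest rate is a little under ten times the lowest, consistent with the approximate 1:10 ratio stated above.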