Geographically Masking Addresses to Study COVID-19 Clusters
Abstract

Background: The spatio-temporal analysis of cases is a good way to study an epidemic, and the recent COVID-19 pandemic unfortunately generated a huge amount of data. However, analysing this raw data, which includes for instance the addresses of people who contracted COVID-19, raises privacy issues, and geomasking is necessary to preserve both people's privacy and the spatial accuracy required for analysis. This paper proposes different geomasking techniques adapted to this COVID-19 data.

Methods: Different techniques are adapted from the literature and tested on a synthetic dataset mimicking the spatio-temporal spread of COVID-19 in Paris and in a more rural nearby region. These techniques are assessed in terms of k-anonymity and cluster preservation.

Results: Three adapted geomasking techniques are proposed: aggregation, bimodal Gaussian perturbation, and simulated crowding. All three can be useful in different use cases, but bimodal Gaussian perturbation is the best technique overall, and simulated crowding is the most promising one, provided some improvements are introduced to avoid points with a low k-anonymity.

Conclusions: It is possible to use geomasking techniques on the addresses of people who caught COVID-19 while preserving the important spatial patterns.
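As an illustration of two notions named in the abstract, the following minimal Python sketch shows a generic bimodal Gaussian perturbation and one common way to estimate spatial k-anonymity. It is not the adapted method of this paper: the function names, the projected coordinates in metres, the values of `mean_radius` and `sigma`, and the k-anonymity definition used here are all assumptions made for the example.

```python
import math
import random


def bimodal_gaussian_mask(x, y, mean_radius=100.0, sigma=25.0, rng=random):
    """Displace a projected point (coordinates in metres) in a random direction.

    The displacement distance is drawn from a Gaussian centred on
    mean_radius, so very small displacements are unlikely. Both
    parameter values are illustrative assumptions.
    """
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # abs() keeps the distance non-negative if sigma is large
    # compared with mean_radius.
    distance = abs(rng.gauss(mean_radius, sigma))
    return x + distance * math.cos(angle), y + distance * math.sin(angle)


def spatial_k(original, population, masked):
    """Estimate spatial k-anonymity for one masked point.

    Assumed definition: the number of candidate locations in
    `population` that lie at least as close to the masked point
    as the original location does.
    """
    radius = math.dist(original, masked)
    return sum(1 for p in population if math.dist(p, masked) <= radius)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the example is repeatable
    home = (0.0, 0.0)
    # Hypothetical candidate addresses on a 50 m grid around the home.
    addresses = [(50.0 * i, 50.0 * j) for i in range(-5, 6) for j in range(-5, 6)]
    masked = bimodal_gaussian_mask(*home, rng=rng)
    print("masked point:", masked)
    print("estimated k:", spatial_k(home, addresses, masked))
```

Under this definition, a point masked in a dense urban area would typically reach a higher k than a point displaced by the same distance in a rural area, which is one reason a masking distance may need to depend on local population density.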