Measuring The Impact of Spatial Perturbations on The Relationship Between Data Privacy and Validity of Descriptive Statistics

2020 ◽  
Author(s):  
Kelly Broen ◽  
Rob Trangucci ◽  
Jon Zelner

Abstract Background: Like many scientific fields, epidemiology is addressing issues of research reproducibility. Spatial epidemiology, which often uses the inherently identifiable variable of participant address, must balance reproducibility with participant privacy. In this study, we assess the impact of several different data perturbation methods on key spatial statistics and patient privacy. Methods: We analyzed the impact of perturbation on spatial patterns in the full set of address-level mortality data from Lawrence, MA during the period from 1911-1913. The original death locations were perturbed using seven different published approaches to stochastic and deterministic spatial data anonymization. Key spatial descriptive statistics were calculated for each perturbation, including changes in spatial pattern center, Global Moran’s I, Local Moran’s I, distance to the k-th nearest neighbors, and the L-function (a normalized form of Ripley’s K). A spatially adapted form of k-anonymity was used to measure the privacy protection conferred by each method, and its compliance with HIPAA privacy standards. Results: Random perturbation at 50 meters, donut masking between 5 and 50 meters, and Voronoi masking maintain the validity of descriptive spatial statistics better than other perturbations. Grid center masking with both 100x100 and 250x250 meter cells led to large changes in descriptive spatial statistics. None of the perturbation methods adhered to the HIPAA standard that all points have a k-anonymity > 10. All other perturbation methods employed had at least 265 points, or over 6%, not adhering to the HIPAA standard. Conclusions: Using the set of published perturbation methods applied in this analysis, HIPAA-compliant de-identification was not compatible with maintaining key spatial patterns as measured by our chosen summary statistics. Further research should investigate alternative methods for balancing tradeoffs between spatial data privacy and preservation of key patterns in public health data that are of scientific and medical importance.

2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Kelly Broen ◽  
Rob Trangucci ◽  
Jon Zelner

Abstract Background: Like many scientific fields, epidemiology is addressing issues of research reproducibility. Spatial epidemiology, which often uses the inherently identifiable variable of participant address, must balance reproducibility with participant privacy. In this study, we assess the impact of several different data perturbation methods on key spatial statistics and patient privacy. Methods: We analyzed the impact of perturbation on spatial patterns in the full set of address-level mortality data from Lawrence, MA during the period from 1911 to 1913. The original death locations were perturbed using seven different published approaches to stochastic and deterministic spatial data anonymization. Key spatial descriptive statistics were calculated for each perturbation, including changes in spatial pattern center, Global Moran’s I, Local Moran’s I, distance to the k-th nearest neighbors, and the L-function (a normalized form of Ripley’s K). A spatially adapted form of k-anonymity was used to measure the privacy protection conferred by each method, and its compliance with HIPAA and GDPR privacy standards. Results: Random perturbation at 50 m, donut masking between 5 and 50 m, and Voronoi masking maintain the validity of descriptive spatial statistics better than other perturbations. Grid center masking with both 100 × 100 and 250 × 250 m cells led to large changes in descriptive spatial statistics. None of the perturbation methods adhered to the HIPAA standard that all points have a k-anonymity > 10. All other perturbation methods employed had at least 265 points, or over 6%, not adhering to the HIPAA standard. Conclusions: Using the set of published perturbation methods applied in this analysis, HIPAA- and GDPR-compliant de-identification was not compatible with maintaining key spatial patterns as measured by our chosen summary statistics. Further research should investigate alternative methods for balancing tradeoffs between spatial data privacy and preservation of key patterns in public health data that are of scientific and medical importance.
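
The donut masking named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `donut_mask`, the assumption that coordinates are already projected in metres, and the choice to sample the displacement uniformly over the ring's area are all assumptions made here for illustration.

```python
import math
import random


def donut_mask(points, r_min=5.0, r_max=50.0, rng=None):
    """Displace each (x, y) point by a random distance in [r_min, r_max].

    Assumes projected coordinates in metres (hypothetical input format).
    """
    rng = rng or random.Random()
    masked = []
    for x, y in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)
        # Assumption: sample uniformly over the ring's area, so the radius
        # is the square root of a uniform draw between the squared bounds.
        radius = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
        masked.append((x + radius * math.cos(angle),
                       y + radius * math.sin(angle)))
    return masked


if __name__ == "__main__":
    original = [(0.0, 0.0), (120.0, 80.0)]
    print(donut_mask(original, rng=random.Random(42)))
```

Under these assumptions, setting `r_min=0.0` and `r_max=50.0` gives a simple form of random perturbation within 50 m, since the inner exclusion zone disappears.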


Author(s):  
D. Ballari ◽  
L. Campozano ◽  
E. Samaniego ◽  
D. Orellana

Abstract. Climate teleconnections show remote and large-scale relationships between distant points on Earth. Their relations to precipitation are important for monitoring and anticipating the anomalies that they can produce in the local climate, such as flood and drought events impacting agriculture, health, and hydropower generation. Climate teleconnections in relation to precipitation have been widely studied. Nevertheless, the spatial association of the teleconnection patterns (i.e. the spatial delineation of regions with teleconnections) has received little attention. Such spatial association makes it possible to characterize how stable (heterogeneity/dependent and statistically significant) the underlying spatial phenomenon is for a given pattern. Thus our objective was to characterize the spatial association of climate teleconnection patterns related to precipitation using an exploratory spatial data analysis approach. Global and local indicators of spatial association (Moran’s I and LISA) were used to detect spatial patterns of teleconnections based on TRMM satellite images and climate indices. Moran’s I depicted high positive spatial association for different climate indices, and LISA depicted two types of teleconnection patterns. The homogeneous patterns were localized in the Coast and Amazonian regions, while the dispersed patterns had a major presence in the Highlands. The results also showed some areas that, despite moderate to high teleconnection influences, had random spatial patterns (i.e. non-significant spatial association). Other areas showed both teleconnections and significant spatial association, but with dispersed patterns. This pointed to the need to explore the local underlying features (topography, orientation, wind and micro-climates) that restrict (non-significant spatial association) or reaffirm (dispersed patterns) the teleconnection patterns.


2021 ◽  
Vol 13 (21) ◽  
pp. 12277
Author(s):  
Xinba Li ◽  
Chuanrong Zhang

While it is well-known that housing prices generally increased in the United States (U.S.) during the COVID-19 pandemic crisis, to the best of our knowledge, there has been no research conducted to understand the spatial patterns and heterogeneity of housing price changes in the U.S. real estate market during the crisis. There has been less attention on the consequences of this pandemic, in terms of the spatial distribution of housing price changes in the U.S. The objective of this study was to explore the spatial patterns and heterogeneous distribution of housing price change rates across different areas of the U.S. real estate market during the COVID-19 pandemic. We calculated the global Moran’s I, Anselin’s local Moran’s I, and Getis-Ord’s statistics of the housing price change rates in 2856 U.S. counties. The following two major findings were obtained: (1) The influence of the COVID-19 pandemic crisis on housing price change varied across space in the U.S. The patterns not only differed from metropolitan areas to rural areas, but also varied from one metropolitan area to another. (2) It seems that COVID-19 made Americans more cautious about buying property in densely populated urban downtowns that had higher levels of virus infection; therefore, it was found that during the COVID-19 pandemic year of 2020–2021, the housing price hot spots were typically located in more affordable suburbs, smaller cities, and areas away from high-cost, high-density urban downtowns. This study may be helpful for understanding the relationship between the COVID-19 pandemic and the real estate market, as well as human behaviors in response to the pandemic.


2014 ◽  
Vol 955-959 ◽  
pp. 3893-3898
Author(s):  
Yu Hong Wu

Based on exploratory spatial data analysis (ESDA) and GIS technology, the spatial differences in the rural economic development level of Qinhuangdao city were investigated using town-level data on rural residents’ per capita net income in Qinhuangdao city from 2007 to 2011. The global Moran’s I values for rural residents’ per capita net income at the town level showed significant positive spatial autocorrelation and significant spatial aggregation in its spatial distribution. However, the global Moran’s I value showed a decreasing trend from 2007 to 2011, indicating an enlarged spatial disparity of the rural economy at the town level. The Moran scatter plots and LISA cluster maps of 2007 and 2011 showed that most towns were characterized by positive local spatial association, i.e. they were located in the HH or the LL quadrant. The significant HH towns were mostly found in the south of Qinhuangdao city, Haigang district, Changli county and Lulong county. The significant LL towns were mostly found in Qinglong county, in the north of Qinhuangdao city.
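
As a rough illustration of the two tools named above, the sketch below computes global Moran's I from its standard definition, I = (n / S0) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², and assigns each unit to a Moran scatter plot quadrant (HH, LL, HL, LH). The function names, the dense-list weight matrix, and the toy chain of four units are assumptions made for illustration. Significance testing, which LISA cluster maps require, is deliberately left out.

```python
def global_morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]           # deviations from the mean
    s0 = sum(sum(row) for row in weights)    # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def moran_quadrants(values, weights):
    """Label each unit HH, LL, HL or LH (own value vs. neighbours' mean)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    labels = []
    for i in range(n):
        row_sum = sum(weights[i])
        # Spatial lag: weighted average deviation of the neighbours.
        lag = (sum(weights[i][j] * z[j] for j in range(n)) / row_sum
               if row_sum else 0.0)
        labels.append(("H" if z[i] >= 0 else "L") +
                      ("H" if lag >= 0 else "L"))
    return labels


if __name__ == "__main__":
    # Hypothetical example: four units in a row, each adjacent to the next.
    chain = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
    print(global_morans_i([10, 9, 2, 1], chain))
    print(moran_quadrants([10, 9, 2, 1], chain))
```

A positive value indicates that similar values sit next to each other, which is the situation the abstract describes for towns in the HH and LL quadrants.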


Author(s):  
Tess McCarthy ◽  
Jerry Ratcliffe

Advances in computing technology and analytical techniques have given crime analysts increasingly powerful toolboxes with which to unlock the spatial patterns and processes of crime. However, the utility of such tools is still bounded by the “garbage in, garbage out” maxim, whereby analytical output is only as reliable as the analytical input. Therefore, this chapter reviews some of the sources of spatial data inaccuracy that must be considered when analyzing crime. Given the prevalence of street addresses as a spatial location identifier for crime events, particular attention is given to the accuracy and optimum parameters for geographically referencing address data. Example data drawn from burglary records in the city of Wollongong, Australia, illustrate the significance of the issues and the impact that poor address management can have on the analysis of crime. The chapter emphasizes the practical, by outlining address correction options and summarizing recent research that identifies optimum settings for geocoding software tools.


2020 ◽  
Vol 15 (1) ◽  
Author(s):  
Huling Li ◽  
Hui Li ◽  
Zhongxing Ding ◽  
Zhibin Hu ◽  
Feng Chen ◽  
...  

The cluster of pneumonia cases linked to coronavirus disease 2019 (Covid-19), first reported in China in late December 2019, raised global concern, particularly as the cumulative number of cases reported between 10 January and 5 March 2020 reached 80,711. In order to better understand the spread of this new virus, we characterized the spatial patterns of Covid-19 cumulative cases using ArcGIS v.10.4.1 based on spatial autocorrelation and cluster analysis using Global Moran’s I (Moran, 1950), Local Moran’s I and Getis-Ord General G (Ord and Getis, 2001). Up to 5 March 2020, Hubei Province, the origin of the Covid-19 epidemic, had reported 67,592 Covid-19 cases, while the confirmed cases in the surrounding provinces Guangdong, Henan, Zhejiang and Hunan were 1351, 1272, 1215 and 1018, respectively. The top five regions with respect to incidence were the following provinces: Hubei (11.423/10,000), Zhejiang (0.212/10,000), Jiangxi (0.201/10,000), Beijing (0.196/10,000) and Chongqing (0.186/10,000). Global Moran’s I analysis results showed that the incidence of Covid-19 is not negatively correlated in space (p=0.407413>0.05) and the High-Low cluster analysis demonstrated that there were no high-value incidence clusters (p=0.076098>0.05), while Local Moran’s I analysis indicated that Hubei is the only province with High-Low aggregation (p<0.0001).


2018 ◽  
Vol 46 (6) ◽  
pp. 647-658 ◽  
Author(s):  
Mohammadreza Rajabi ◽  
Ali Mansourian ◽  
Petter Pilesjö ◽  
Daniel Oudin Åström ◽  
Klas Cederin ◽  
...  

Aims: Cardiovascular disease (CVD) is one of the leading causes of mortality and morbidity worldwide, including in Sweden. The main aim of this study was to explore the temporal trends and spatial patterns of CVD in Sweden using spatial autocorrelation analyses. Methods: The CVD admission rates between 2000 and 2010 throughout Sweden were entered as the input disease data for the analytic processes performed for the Swedish capital, Stockholm, and also for the whole of Sweden. Age-adjusted admission rates were calculated using a direct standardisation approach for men and women, and temporal trend analyses were performed on the standardised rates. Global Moran’s I was used to explore the structure of patterns, and Anselin’s local Moran’s I, together with Kulldorff’s scan statistic, was applied to explore the geographical patterns of admission rates. Results: The rates followed a spatially clustered pattern in Sweden with differences occurring between sexes. Accordingly, hot spots were identified in northern Sweden, with higher intensity identified for men, together with clusters in central Sweden. Cold spots were identified in the vicinity of the three major Swedish cities of Stockholm, Gothenburg and Malmö. Conclusions: The findings of this study can serve as a basis for distribution of health-care resources, preventive measures and exploration of aetiological factors.
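
The direct standardisation mentioned in the Methods can be sketched as a weighted average of age-specific rates, with weights taken from a standard population. The function name, the per-100,000 scaling and the two-group example figures below are hypothetical and do not come from the study.

```python
def direct_standardised_rate(cases, population, standard_population,
                             per=100_000):
    """Age-adjusted rate: age-specific rates weighted by a standard population.

    All three lists are ordered by the same age groups.
    """
    if not (len(cases) == len(population) == len(standard_population)):
        raise ValueError("all inputs must cover the same age groups")
    total_standard = sum(standard_population)
    weighted = sum((c / p) * s
                   for c, p, s in zip(cases, population, standard_population))
    return weighted / total_standard * per


if __name__ == "__main__":
    # Hypothetical figures for two age groups.
    print(direct_standardised_rate([10, 40], [1000, 2000], [3000, 1000]))
```

Because every region is weighted by the same standard population, differences in the resulting rates are not driven by differences in age structure.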


2021 ◽  
Vol 9 (1) ◽  
pp. e001731
Author(s):  
Fernando Gomez-Peralta ◽  
Cristina Abreu ◽  
Manuel Benito ◽  
Rafael J Barranco

Introduction: The geographical distribution of hypoglycemic events requiring emergency assistance was explored in Andalusia (Spain), and potentially associated societal factors were determined. Research design and methods: This was a database analysis of hypoglycemia requiring prehospital emergency assistance from the Public Company for Health Emergencies (Empresa Pública de Emergencias Sanitarias (EPES)) in Andalusia during 2012, which served 8 393 159 people. Databases of the National Statistics Institute, Basic Spatial Data of Andalusia and System of Multiterritorial Information of Andalusia were used to retrieve spatial data and population characteristics. Geographic Information System software (QGIS and GeoDA) was used for analysis and linkage across databases. Spatial analyses of geographical location influence in hypoglycemic events were assessed using Moran’s I statistics, and linear regressions were used to determine their association with population characteristics. Results: The EPES attended 1 137 738 calls requesting medical assistance, with a mean hypoglycemia incidence of 95.0±61.6 cases per 100 000 inhabitants. There were significant differences in hypoglycemia incidence between basic healthcare zones attributable to their geographical location in the overall population (Moran’s I index 0.122, z-score 7.870, p=0.001), women (Moran’s I index 0.088, z-score 6.285, p=0.001), men (Moran’s I index 0.076, z-score 4.914, p=0.001) and aged >64 years (Moran’s I index 0.147, z-score 9.753, p=0.001). Hypoglycemia incidence was higher within unemployed individuals (β=0.003, p=0.001) and unemployed women (β=0.005, p=0.001), while lower within individuals aged <16 years (β=−0.004, p=0.040), higher academic level (secondary studies) (β=−0.003, p=0.004) and women with secondary studies (β=−0.005, p<0.001). In subjects aged >64 years, lower rate of hypoglycemia was associated with more single-person homes (β=−0.008, p=0.022) and sports facilities (β=−0.342, p=0.012). Conclusions: This analysis supports the geographical distribution of hypoglycemia in the overall population, both genders and subjects aged >64 years, which was affected by societal factors such as unemployment, literacy/education, housing and sports facilities. These data can be useful to design specific prevention programs.


2019 ◽  
Author(s):  
Tongtiegang Zhao ◽  
Wei Zhang ◽  
Yongyong Zhang ◽  
Xiaohong Chen

Abstract. Fully-coupled global climate models (GCMs) generate a vast amount of high-dimensional forecast data of the global climate; therefore, interpreting and understanding the predictive performance is a critical issue in applying GCM forecasts. Spatial plotting is a powerful tool to identify where forecasts perform well and where forecasts are not satisfactory. Here we build upon the spatial plotting of anomaly correlation between forecast ensemble mean and observations and derive significant spatial patterns to illustrate the predictive performance. For the anomaly correlation derived from the ten sets of forecasts archived in the North America Multi-Model Ensemble (NMME) experiment, the global and local Moran's I are calculated to associate anomaly correlation at neighbouring grid cells to one another. The global Moran's I indicates that at the global scale anomaly correlation at one grid cell relates significantly and positively to anomaly correlation at surrounding grid cells, while the local Moran's I reveals clusters of grid cells with high, neutral, and low anomaly correlation. Overall, the forecasts produced by GCMs of similar settings and at the same climate center exhibit similar clustering of anomaly correlation. In the meantime, the forecasts in NMME show complementary performances. About 80 % of grid cells across the globe fall into the cluster of high anomaly correlation under at least one of the ten sets of forecasts. While anomaly correlation exhibits substantial spatial variability, the clustering approach serves as a filter of noise to identify spatial patterns and yields insights into the predictive performance of GCM seasonal forecasts of global precipitation.


2021 ◽  
Vol 10 (1) ◽  
pp. 31-45
Author(s):  
Resha Moniyana ◽  
Ahmad Dhea Pratama

Analytical approaches to poverty continue to develop as understanding of its spatial and temporal patterns becomes more complex. The main objective of this study is to examine the spatial patterns and characteristics of poverty by looking at the percentage of poor people and the level of inequality. The methods used are Moran's I, the Moran scatterplot and LISA for the spatial data, and the Williamson Index for testing development inequality. The research area covers 15 districts/cities in 2015-2019. The percentage of poor people across districts/cities in Lampung Province has a positive Moran's I value, meaning a clustered spatial pattern in which neighbouring areas share the same characteristics. Development inequality has a negative Moran's I value, meaning a spatial pattern in which neighbouring areas have different characteristics, in 2015-2019. The poverty analysis indicates that during the 5-year study period, 5 districts in Lampung Province were still trapped in high poverty levels. The results for regional development inequality with the Williamson Index indicate 3 regions with high levels of inequality, 4 regions with moderate inequality and 8 regions with low levels of inequality.
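
For readers unfamiliar with the Williamson Index, it is commonly defined as a population-weighted coefficient of variation of regional per capita income, V_w = sqrt(Σᵢ (yᵢ − ȳ)² · pᵢ / P) / ȳ. The sketch below follows that common definition. The function name and the example figures are hypothetical, and the study's own data and thresholds for "high", "moderate" and "low" inequality are not reproduced here.

```python
import math


def williamson_index(income_per_capita, population):
    """Population-weighted coefficient of variation of regional income.

    income_per_capita[i] and population[i] describe region i.
    """
    total_pop = sum(population)
    # Overall per capita income, i.e. the population-weighted mean.
    mean_income = sum(y * p for y, p in
                      zip(income_per_capita, population)) / total_pop
    weighted_var = sum((y - mean_income) ** 2 * p / total_pop
                       for y, p in zip(income_per_capita, population))
    return math.sqrt(weighted_var) / mean_income


if __name__ == "__main__":
    # Hypothetical example: two equally sized regions.
    print(williamson_index([10.0, 30.0], [100, 100]))
```

A value of 0 means all regions have the same per capita income, and larger values indicate greater regional inequality.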

