Real-world trajectory sharing with local differential privacy

2021 ◽  
Vol 14 (11) ◽  
pp. 2283-2295
Author(s):  
Teddy Cunningham ◽  
Graham Cormode ◽  
Hakan Ferhatosmanoglu ◽  
Divesh Srivastava

Sharing trajectories is beneficial for many real-world applications, such as managing disease spread through contact tracing and tailoring public services to a population's travel patterns. However, public concern over privacy and data protection has limited the extent to which this data is shared. Local differential privacy enables data sharing in which users share a perturbed version of their data, but existing mechanisms fail to incorporate user-independent public knowledge (e.g., business locations and opening times, public transport schedules, geo-located tweets). This limitation makes mechanisms too restrictive, gives unrealistic outputs, and ultimately leads to low practical utility. To address these concerns, we propose a local differentially private mechanism that is based on perturbing hierarchically-structured, overlapping n-grams (i.e., contiguous subsequences of length n) of trajectory data. Our mechanism uses a multi-dimensional hierarchy over publicly available external knowledge of real-world places of interest to improve the realism and utility of the perturbed, shared trajectories. Importantly, including real-world public data does not negatively affect privacy or efficiency. Our experiments, using real-world data and a range of queries, each with real-world application analogues, demonstrate the superiority of our approach over a range of alternative methods.
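A common building block for this kind of local perturbation is k-ary randomized response applied per n-gram. The sketch below is a minimal illustration under naive budget splitting, not the authors' hierarchy-aware mechanism; all function names and the even budget split are assumptions.

```python
import math
import random
from itertools import product

def krr_perturb(value, domain, epsilon, rng=random.Random(0)):
    # k-ary randomized response: keep the true value with probability
    # e^eps / (e^eps + k - 1); otherwise report a uniformly random other value.
    k = len(domain)
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return value
    return rng.choice([v for v in domain if v != value])

def perturb_trajectory(traj, domain, epsilon, n=2, rng=random.Random(0)):
    # Perturb overlapping n-grams independently, splitting the privacy
    # budget evenly across them (naive sequential composition, not the
    # paper's hierarchical scheme).
    grams = [tuple(traj[i:i + n]) for i in range(len(traj) - n + 1)]
    gram_domain = list(product(domain, repeat=n))
    eps_each = epsilon / max(len(grams), 1)
    return [krr_perturb(g, gram_domain, eps_each, rng) for g in grams]
```

With a very large epsilon the perturbation becomes a no-op, which is a handy sanity check; with small epsilon the reported n-grams approach a uniform draw from the domain.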

BMJ Open ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. e045886
Author(s):  
Yiying Hu ◽  
Jianying Guo ◽  
Guanqiao Li ◽  
Xi Lu ◽  
Xiang Li ◽  
...  

Objectives: This study quantified how the efficiency of testing and contact tracing impacts the spread of COVID-19. The average time interval between infection and quarantine, whether asymptomatic cases are tested or not, and initial delays to beginning a testing and tracing programme were investigated.
Setting: We developed a novel individual-level network model, called CoTECT (Testing Efficiency and Contact Tracing model for COVID-19), using key parameters from recent studies to quantify the impacts of testing and tracing efficiency. The model distinguishes infection from confirmation by integrating a 'T' compartment, which represents infections confirmed by testing and quarantine. The compartments of presymptomatic (E), asymptomatic (I), symptomatic (Is), and death with (F) or without (f) test confirmation were also included in the model. Three scenarios were evaluated in a closed population of 3000 individuals to mimic community-level dynamics. Real-world data from four Nordic countries were also analysed.
Primary and secondary outcome measures: Simulation results: total/peak daily infections and confirmed cases, total deaths (confirmed/unconfirmed by testing), fatalities, and the case fatality rate. Real-world analysis: confirmed cases and deaths per million people.
Results: (1) Shortening the duration between Is and T from 12 to 4 days reduces infections by 85.2% and deaths by 88.8%. (2) Testing and tracing regardless of symptoms reduces infections by 35.7% and deaths by 46.2% compared with testing only symptomatic cases. (3) Reducing the delay to implementing a testing and tracing programme from 50 to 10 days reduces infections by 35.2% and deaths by 44.6%. These results were robust to sensitivity analysis. An analysis of real-world data showed that the number of tests per case early in the pandemic is critical for reducing confirmed cases and the fatality rate.
Conclusions: Reducing testing delays will help to contain outbreaks. These results provide policymakers with quantitative evidence of efficiency as a critical factor in developing testing and contact tracing strategies.
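CoTECT itself is an individual-level network model; the deterministic toy sketch below only illustrates the headline mechanism, namely that moving infectious cases into a tested-and-quarantined compartment T faster shrinks the outbreak. All parameter names and values here are illustrative assumptions, not those of the paper.

```python
def simulate(days=200, n=3000, beta=0.4, sigma=0.2, gamma=0.1, test_delay=4.0):
    # Deterministic SEIR-style sketch with a T (tested/quarantined)
    # compartment: infectious cases move to T at rate 1/test_delay and
    # stop transmitting. A toy illustration, not the CoTECT model itself.
    s, e, i, t, r = n - 1.0, 0.0, 1.0, 0.0, 0.0
    total_infections = 1.0
    for _ in range(days):
        new_e = beta * s * i / n        # new exposures
        new_i = sigma * e               # exposed become infectious
        new_t = i / test_delay          # infectious confirmed and quarantined
        new_r = gamma * i               # untested recovery/removal
        s -= new_e
        e += new_e - new_i
        i += new_i - new_t - new_r
        t += new_t
        r += new_r
        total_infections += new_e
    return total_infections
```

Even in this crude form, shortening `test_delay` from 12 to 4 days cuts the attack size substantially, matching the direction of the paper's result (1).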


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-13 ◽  
Author(s):  
Zhao Li ◽  
Long Zhang ◽  
Chenyi Lei ◽  
Xia Chen ◽  
Jianliang Gao ◽  
...  

Modeling user behaviors as sequential learning provides key advantages in predicting future user actions, such as the next product to purchase or the next song to listen to, for the purpose of personalized search and recommendation. Traditional methods for modeling sequential user behaviors usually depend on the premise of Markov processes, while recently recurrent neural networks (RNNs) have been adopted for their power in modeling sequences. In this paper, we propose integrating an attention mechanism into RNNs to better model sequential user behaviors. Specifically, we design a network featuring Attention with Long-term Interval-based Gated Recurrent Units (ALI-GRU) to model temporal sequences of user actions. Compared to previous works, our network exploits temporal information extracted by a time-interval-based GRU, in addition to a standard GRU for encoding user actions, and uses a specially designed matrix-form attention function to characterize both the long-term preferences and short-term intents of users; the attention-weighted features are finally decoded to predict the next user action. We performed experiments on two well-known public datasets as well as a large dataset built from real-world data of one of the largest online shopping websites. Experimental results show that the proposed ALI-GRU achieves significant improvements over state-of-the-art RNN-based methods. ALI-GRU has also been adopted in a real-world application, and the results of an online A/B test further demonstrate its practical value.
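The attention-weighted decoding step can be sketched as a simple pooling over GRU hidden states. This scalar-score sketch does not reproduce ALI-GRU's matrix-form attention; the query vector and shapes are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(hidden, query_w):
    # Weight a sequence of GRU hidden states (T x d) by learned scores and
    # return the attention-weighted summary used to predict the next action.
    # `query_w` (d,) stands in for a learned query; ALI-GRU's actual scoring
    # function is matrix-form, which this scalar-score sketch does not capture.
    scores = hidden @ query_w            # (T,) one score per time step
    weights = softmax(scores)            # attention distribution over steps
    return weights @ hidden              # (d,) context vector
```

When one hidden state aligns strongly with the query, its weight dominates and the context vector is essentially that state, which is the intended "attend to the relevant action" behavior.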


2021 ◽  
Author(s):  
MinDong Sung ◽  
Dongchul Cha ◽  
Yu Rang Park

BACKGROUND Privacy is of increasing interest in the present big data era, particularly regarding medical data. Specifically, differential privacy has emerged as the standard method for privacy-preserving data analysis and data publishing. OBJECTIVE We applied differential privacy to medical data with diverse parameters and checked (i) the feasibility of our algorithms with synthetic data and (ii) the balance between data privacy and utility, using machine learning techniques. METHODS All data were normalized to range between –1 and 1, and the bounded Laplacian method was applied to prevent the generation of out-of-bound values after applying the differential privacy algorithm. To preserve the cardinality of the categorical variables, we performed post-processing via discretization. The algorithm was evaluated using both synthetic and real-world data (the eICU Collaborative Research Database). We evaluated the difference between the original data and the perturbed data using the misclassification rate for categorical data and the mean squared error for continuous data. Further, we compared the performance of classification models that predict in-hospital mortality using the real-world data. RESULTS The misclassification rate of categorical variables ranged between 0.49 and 0.85 when epsilon was 0.1, and it converged to 0 as epsilon increased; when epsilon was between 10^2 and 10^3, the misclassification rate dropped rapidly to 0. Similarly, the mean squared error of continuous variables decreased as epsilon increased. The performance of models developed from perturbed data converged to that of models developed from the original data as epsilon increased. In particular, the accuracy of a random forest model developed from the original data was 0.801, and it ranged from 0.757 to 0.81 as epsilon increased from 0.1 to 10,000. CONCLUSIONS We applied local differential privacy to medical domain data, which are diverse and high-dimensional. Higher noise may offer enhanced privacy, but it simultaneously hinders utility. An appropriate degree of noise for data perturbation should be chosen to balance privacy and utility depending on the specific situation.
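The bound-preserving idea can be sketched with rejection sampling. Note this is an approximation: the actual bounded Laplacian mechanism adjusts the noise distribution so the epsilon guarantee stays exact, rather than merely resampling; parameter names here are hypothetical.

```python
import random

def bounded_laplace(value, epsilon, sensitivity=2.0, lo=-1.0, hi=1.0,
                    rng=random.Random(0)):
    # Laplace noise drawn as a random sign on an exponential draw;
    # rejection sampling keeps the output inside [lo, hi]. NOTE: truncation
    # by resampling only approximates the bounded Laplacian mechanism, which
    # rescales the distribution to preserve the stated epsilon exactly.
    scale = sensitivity / epsilon  # Laplace scale b = sensitivity / epsilon
    while True:
        sign = 1.0 if rng.random() < 0.5 else -1.0
        noisy = value + sign * rng.expovariate(1.0 / scale)
        if lo <= noisy <= hi:
            return noisy
```

As epsilon grows the noise scale shrinks and the output converges to the true value, mirroring the abstract's observation that utility metrics converge as epsilon increases.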


2020 ◽  
Author(s):  
Deepti Gurdasani ◽  
Hisham Ziauddeen

In the early stages of a pandemic, mathematical models can provide invaluable insights into transmission dynamics, help predict disease spread, and evaluate control measures. However, models are only valid within the limits of the parameters examined. As reliable parameter estimates are rarely available early in a new pandemic, best-guess estimates are used, and these need to be constantly reviewed as new real-world data emerge. Estimating how sensitive a model is to changes in its parameters can provide useful information about validity when parameters are uncertain. Interpreting models without considering these factors can lead to flawed inferences, which can have far-reaching effects when they inform public health policy. We illustrate this here using the model of Hellewell et al., published in The Lancet Global Health in 2020. This model suggested that case detection and contact tracing were unlikely to be an effective strategy for pandemic control, and it is likely to have informed the UK government's decision to cease testing and contact tracing on 12 March 2020. We show that this model is very sensitive to the assumed delay between case detection and isolation. We demonstrate that when the delay parameter is changed to a median of 1 day, which is very plausible in the context of current rapid testing, the model predicts a >80% probability of controlling the epidemic within 12 weeks, with relatively modest contact tracing. These results suggest that rapid testing, contact tracing, and isolation could be effective strategies to control transmission.
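The sensitivity to the isolation delay can be made concrete with a deliberately crude back-of-envelope calculation: if transmission were spread uniformly over the infectious period, only the pre-isolation fraction of it occurs. This is an assumption-laden illustration, not the Hellewell et al. branching-process model; all parameters are hypothetical.

```python
def r_effective(r0, infectious_period, isolation_delay):
    # If transmission is uniform over the infectious period and a detected
    # case is isolated `isolation_delay` days into it, only the pre-isolation
    # fraction of transmission happens. Crude sketch of why results hinge on
    # the detection-to-isolation delay; not the paper's actual model.
    exposed_fraction = min(isolation_delay / infectious_period, 1.0)
    return r0 * exposed_fraction
```

Under these toy numbers, a 1-day delay drives the effective reproduction number well below 1 (outbreak controlled), while an 8-day delay leaves it well above 1, which is the qualitative point of the sensitivity analysis.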


2010 ◽  
Vol 19 (05) ◽  
pp. 647-677 ◽  
Author(s):  
LAURA DIOŞAN ◽  
ALEXANDRINA ROGOZAN ◽  
JEAN-PIERRE PECUCHET

Classic kernel-based classifiers use only a single kernel, but real-world applications have emphasized the need to consider a combination of kernels, also known as a multiple kernel (MK), in order to boost classification accuracy by adapting better to the characteristics of the data. Our purpose is to design a complex multiple kernel automatically by evolutionary means. To achieve this, we propose a hybrid model that combines a Genetic Programming (GP) algorithm and a kernel-based Support Vector Machine (SVM) classifier. In our model, each GP chromosome is a tree that encodes the mathematical expression of a multiple kernel. The evolutionary search for the optimal MK is guided by the fitness (or efficiency) of each candidate MK. The complex multiple kernels evolved in this manner (eCMKs) are compared to several classic simple kernels (SKs), to a convex linear multiple kernel (cLMK), and to an evolutionary linear multiple kernel (eLMK) on several real-world data sets from the UCI repository. The numerical experiments show that an SVM using the evolved complex multiple kernels performs better than one using classic simple kernels. Moreover, on the considered data sets, the new multiple kernels outperform both linear multiple kernels (cLMK and eLMK). These results emphasize that the SVM algorithm requires a combination of kernels more complex than a linear one in order to boost its performance.
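What lets a GP tree compose kernels freely is a closure property: sums and products of valid (positive semi-definite) kernels are again valid kernels. The sketch below illustrates this; the specific kernels and tree are hypothetical examples, not an evolved chromosome from the paper.

```python
import math

def rbf(gamma):
    # Gaussian RBF kernel: exp(-gamma * ||x - y||^2).
    return lambda x, y: math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def poly(degree, coef0=1.0):
    # Polynomial kernel: (<x, y> + coef0)^degree.
    return lambda x, y: (sum(a * b for a, b in zip(x, y)) + coef0) ** degree

def combine(op, k1, k2):
    # Sums and products of PSD kernels are PSD, so a GP tree whose internal
    # nodes are '+' and '*' always encodes a valid multiple kernel.
    if op == '+':
        return lambda x, y: k1(x, y) + k2(x, y)
    if op == '*':
        return lambda x, y: k1(x, y) * k2(x, y)
    raise ValueError(op)

# One tree a GP chromosome might encode: (rbf(0.5) * poly(2)) + rbf(1.0)
mk = combine('+', combine('*', rbf(0.5), poly(2)), rbf(1.0))
```

The resulting `mk` can be passed wherever a kernel function is expected (e.g., a precomputed Gram matrix for an SVM), and it remains symmetric by construction.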


2020 ◽  
Vol 9 (6) ◽  
pp. 404
Author(s):  
Ruihong Yao ◽  
Fei Wang ◽  
Shuhui Chen ◽  
Shuang Zhao

The popularity of location-enabled mobile devices and Location-Based Services (LBS) generates massive spatio-temporal data every day. Because of the close relationship between behavior patterns and movement trajectories, trajectory data mining has been applied in numerous fields to find behavior patterns. Among these techniques, discovering traveling companions is one of the most fundamental. This paper proposes a flexible framework named GroupSeeker for discovering traveling companions in vast real-world trajectory data. With real-world data sources, it is important to avoid the candidate-omission problem that affects time-snapshot-slicing-based methods. These methods do not work well on sparse real-world data, where sparsity is caused by sampling failures or manual intervention. In this paper, a five-stage framework comprising Data Preprocessing, Spatio-temporal Clustering, Candidate Voting, Pseudo-companion Filtering, and Group Merging is proposed to discover traveling companions. The framework works well even when the data span several days. Experimental results on two real-world data sources, which offer massive data subsets with different scales and sampling frequencies, show the effectiveness and robustness of the framework. Moreover, the proposed framework performs more efficiently when discovering companions over a long-term period.
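The cluster-then-vote core of such a pipeline can be sketched as follows. This toy version covers only the Spatio-temporal Clustering and Candidate Voting stages, assumes clusters are already computed per time window, and omits Pseudo-companion Filtering and Group Merging; all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def discover_companions(snapshots, min_votes):
    # `snapshots` maps a time window to a list of clusters (sets of object
    # ids). Every pair sharing a cluster earns one vote per window; pairs
    # with at least `min_votes` votes are reported as traveling companions.
    votes = defaultdict(int)
    for clusters in snapshots.values():
        for cluster in clusters:
            for a, b in combinations(sorted(cluster), 2):
                votes[(a, b)] += 1
    return {pair for pair, v in votes.items() if v >= min_votes}
```

Voting across windows, rather than requiring co-presence in every snapshot, is what makes this style of approach tolerant of the sparse, irregularly sampled data the abstract describes.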


2018 ◽  
Vol 7 (1) ◽  
pp. 24
Author(s):  
Juan Ignacio Martín-Legendre

This paper presents a review of the main available indicators for measuring poverty and income inequality, examining their properties and suitability for different types of economic analyses, and providing real-world data to illustrate how they work. Although some of these metrics, such as the Gini coefficient, are the most frequently used for this purpose, it is crucially important for researchers and policy-makers to take into account alternative methods that can offer complementary information in order to better understand these issues at all levels.
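As an example of how such indicators work, the Gini coefficient can be computed directly from its mean-absolute-difference definition:

```python
def gini(incomes):
    # Gini coefficient via the mean-absolute-difference formula:
    #   G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean)
    # 0 means perfect equality; values approach 1 as one unit holds everything.
    xs = list(incomes)
    n = len(xs)
    mean = sum(xs) / n
    mad = sum(abs(a - b) for a in xs for b in xs)
    return mad / (2 * n * n * mean)
```

For instance, a perfectly equal distribution yields 0, while a population of four where one person holds all income yields 0.75, i.e. (n - 1)/n.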


2021 ◽  
Vol 13 (18) ◽  
pp. 3713
Author(s):  
Jie Liu ◽  
Xin Cao ◽  
Pingchuan Zhang ◽  
Xueli Xu ◽  
Yangyang Liu ◽  
...  

As an essential step in the restoration of the Terracotta Warriors, the classification of fragments directly affects the performance of fragment matching and splicing. However, most existing methods are based on traditional techniques and have low classification accuracy, so a practical and effective classification method for fragments is urgently needed. To this end, an attention-based multi-scale neural network named AMS-Net is proposed to extract significant geometric and semantic features. AMS-Net is a hierarchical structure consisting of a multi-scale set abstraction block (MS-BLOCK) and a fully connected (FC) layer. MS-BLOCK consists of a local-global layer (LGLayer) and an improved multi-layer perceptron (IMLP). With a multi-scale strategy, LGLayer extracts local and global features from different scales in parallel. IMLP concatenates high-level and low-level features for the classification task. Extensive experiments on the public data set (ModelNet40/10) and the real-world Terracotta Warrior fragment data set are conducted. With normals as input, accuracy reaches 93.52% and 96.22%, respectively. On the real-world data set, this accuracy is the best among existing methods. The robustness and effectiveness of the network on the task of 3D point cloud classification are also investigated, demonstrating that the proposed end-to-end learning network is effective and well suited to classifying the Terracotta Warrior fragments.
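The multi-scale grouping idea behind MS-BLOCK can be illustrated with a hand-rolled sketch: gather neighbours at several radii around a center point and concatenate per-scale summaries. AMS-Net learns these summaries with shared MLPs and attention; the centroid summary below is a stand-in assumption, not the paper's architecture.

```python
import numpy as np

def multi_scale_features(points, center, radii):
    # For each radius, gather the neighbours of `center` within that radius
    # and summarize them (here by their centroid), then concatenate the
    # per-scale summaries into one feature vector. Larger radii capture
    # global context; smaller radii capture local geometry.
    feats = []
    for r in radii:
        d = np.linalg.norm(points - center, axis=1)
        nbrs = points[d <= r]
        feats.append(nbrs.mean(axis=0) if len(nbrs) else np.zeros(points.shape[1]))
    return np.concatenate(feats)
```

Concatenating scales is what lets a downstream classifier see fine fracture-surface detail and overall fragment shape at the same time.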


2016 ◽  
Vol 22 ◽  
pp. 219
Author(s):  
Roberto Salvatori ◽  
Olga Gambetti ◽  
Whitney Woodmansee ◽  
David Cox ◽  
Beloo Mirakhur ◽  
...  

VASA ◽  
2019 ◽  
Vol 48 (2) ◽  
pp. 134-147 ◽  
Author(s):  
Mirko Hirschl ◽  
Michael Kundi

Abstract. Background: In randomized controlled trials (RCTs), direct-acting oral anticoagulants (DOACs) showed a superior risk-benefit profile in comparison to vitamin K antagonists (VKAs) for patients with nonvalvular atrial fibrillation. Patients enrolled in such studies do not necessarily reflect the whole target population treated in real-world practice. Materials and methods: By a systematic literature search, 88 studies including 3,351,628 patients and providing over 2.9 million patient-years of follow-up were identified. Hazard ratios and event rates for the main efficacy and safety outcomes were extracted, and the results for DOACs and VKAs were combined by network meta-analysis. In addition, meta-regression was performed to identify factors responsible for heterogeneity across studies. Results: For stroke and systemic embolism, as well as for major bleeding and intracranial bleeding, real-world studies gave virtually the same results as RCTs, with higher efficacy and lower major bleeding risk (for dabigatran and apixaban) and lower risk of intracranial bleeding (all DOACs) compared to VKAs. Results for gastrointestinal bleeding were consistently better for DOACs, and hazard ratios of myocardial infarction were significantly lower in real-world studies for dabigatran and apixaban compared to RCTs. By a ranking analysis, we found apixaban to be the safest anticoagulant drug, while rivaroxaban, closely followed by dabigatran, was the most efficacious. Risk of bias and heterogeneity were assessed and had little impact on the overall results. Analysis of effect modification can guide clinical decisions, as no single DOAC was superior or inferior to the others under all conditions. Conclusions: DOACs were at least as efficacious as VKAs. In terms of safety endpoints, DOACs performed better under real-world conditions than in RCTs. The current real-world data showed that differences in efficacy and safety, despite generally low event rates, exist between DOACs. Knowledge about these differences in performance can contribute to more personalized medicine.

