Histogram Publication over Numerical Values under Local Differential Privacy

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xu Zheng ◽  
Ke Yan ◽  
Jingyuan Duan ◽  
Wenyi Tang ◽  
Ling Tian

Local differential privacy has been considered the standard measurement for privacy preservation in distributed data collection. Corresponding mechanisms have been designed for multiple types of tasks, such as frequency estimation for categorical values and mean value estimation for numerical values. However, the histogram publication of numerical values, which contains abundant and crucial clues about the whole dataset, has not been thoroughly considered under this measurement. Simply encoding data into different intervals for each query would quickly exhaust both the bandwidth and the privacy budget, which is infeasible in real scenarios. Therefore, this paper proposes a highly efficient framework for differentially private histogram publication of numerical values in a distributed environment. The proposed algorithms can efficiently exploit the correlations among multiple queries and achieve optimal resource consumption. We also conduct extensive experiments on real-world data traces, and the results validate the improvement achieved by the proposed algorithms.
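
The naive baseline mentioned in the abstract above, in which each contributor encodes a numerical value into an interval and reports it privately for a single query, can be sketched as follows. This is a minimal illustration using generalized randomized response, a standard local differential privacy mechanism; it is not the framework proposed in the paper. The function names, the equal-width binning, and the parameters `low`, `high`, `num_bins`, and `epsilon` are assumptions made for this example.

```python
import math
import random


def bin_index(value, low, high, num_bins):
    """Map a value in [low, high] to one of num_bins equal-width intervals."""
    if not low <= value <= high:
        raise ValueError("value outside the assumed public range")
    width = (high - low) / num_bins
    # The upper bound belongs to the last interval.
    return min(int((value - low) / width), num_bins - 1)


def krr_perturb(true_bin, num_bins, epsilon, rng):
    """Generalized randomized response: keep the true bin with probability p,
    otherwise report one of the other bins uniformly at random."""
    p = math.exp(epsilon) / (math.exp(epsilon) + num_bins - 1)
    if rng.random() < p:
        return true_bin
    other = rng.randrange(num_bins - 1)
    return other if other < true_bin else other + 1


def estimate_histogram(reports, num_bins, epsilon):
    """Unbiased frequency estimate computed by the collector from noisy reports."""
    e = math.exp(epsilon)
    p = e / (e + num_bins - 1)  # probability of reporting the true bin
    q = 1.0 / (e + num_bins - 1)  # probability of reporting a given other bin
    counts = [0] * num_bins
    for r in reports:
        counts[r] += 1
    n = len(reports)
    return [(c / n - q) / (p - q) for c in counts]


if __name__ == "__main__":
    rng = random.Random(7)
    values = [rng.uniform(0, 100) for _ in range(10000)]  # synthetic data
    reports = [krr_perturb(bin_index(v, 0, 100, 5), 5, 1.0, rng) for v in values]
    print(estimate_histogram(reports, 5, 1.0))
```

Each new query with different intervals would require a fresh report and a fresh share of the privacy budget under this baseline, which is the cost the abstract describes as infeasible.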

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Guoming Lu ◽  
Xu Zheng ◽  
Jingyuan Duan ◽  
Ling Tian ◽  
Xia Wang

Data publication from multiple contributors has long been considered a fundamental task for data processing in various domains. It has been treated as a prominent prerequisite for enabling AI techniques in wireless networks. With the emergence of diversified smart devices and applications, data held by individuals have become more pervasive and nontrivial to publish. First, the data are more private and sensitive, as they cover every aspect of daily life, from the incoming data to the fitness data. Second, the publication of such data is also bandwidth-consuming, as they are likely to be stored on mobile devices. Local differential privacy has been considered a novel paradigm for such distributed data publication. However, existing works mostly require encoding the contents into a vector space for publication, which is still costly in network resources. Therefore, this work proposes a novel framework for highly efficient privacy-preserving data publication. Specifically, two sampling-based algorithms are proposed for histogram publication, the histogram being an important statistic for data analysis. The first algorithm applies a bit-level sampling strategy to both reduce the overall bandwidth and balance the cost among contributors. The second algorithm allows consumers to adjust their focus on different intervals and can properly allocate the sampling ratios to optimize the overall performance. Both the analysis and the validation on real-world data traces demonstrate the advantages of our work.
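
One way to read the bit-level sampling idea in the abstract above is sketched below: instead of sending a full perturbed one-hot vector, each contributor samples a single bit position, perturbs that bit with binary randomized response, and sends only the position and the bit. This is an illustrative sketch under that assumption, not the authors' algorithm; the function names and the uniform sampling of positions are hypothetical choices for this example.

```python
import math
import random


def report_one_bit(true_bin, num_bins, epsilon, rng):
    """Sample one position of the one-hot encoding and perturb only that bit."""
    j = rng.randrange(num_bins)  # position chosen independently of the data
    bit = 1 if j == true_bin else 0
    keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    if rng.random() >= keep:
        bit = 1 - bit  # flip with probability 1 - keep
    return j, bit


def estimate_from_bits(reports, num_bins, epsilon):
    """Estimate per-interval frequencies from (position, bit) reports.
    Assumes epsilon > 0 so that the debiasing denominator is nonzero."""
    keep = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    ones = [0] * num_bins
    totals = [0] * num_bins
    for j, bit in reports:
        totals[j] += 1
        ones[j] += bit
    estimates = []
    for j in range(num_bins):
        if totals[j] == 0:
            estimates.append(0.0)  # no contributor sampled this position
            continue
        observed = ones[j] / totals[j]
        # E[observed] = (1 - keep) + f * (2 * keep - 1); solve for f.
        estimates.append((observed - (1.0 - keep)) / (2.0 * keep - 1.0))
    return estimates


if __name__ == "__main__":
    rng = random.Random(3)
    true_bins = [rng.choice([0, 0, 1, 2]) for _ in range(20000)]  # synthetic
    reports = [report_one_bit(b, 3, 1.0, rng) for b in true_bins]
    print(estimate_from_bits(reports, 3, 1.0))
```

Every contributor uploads one index and one bit regardless of the number of intervals, which keeps the per-contributor cost equal. A non-uniform choice of positions would correspond to adjusting sampling ratios across intervals, as the second algorithm in the abstract does.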


Author(s):  
Shalin Eliabeth S. ◽  
Sarju S.

Big data privacy preservation is one of the most pressing issues in industry today. Data privacy problems often go unidentified when input data are published in a cloud environment. Data privacy preservation in Hadoop concerns hiding sensitive information while publishing the input dataset to the distributed environment. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability and execution time. At present, many cloud applications that anonymize big data face the same kinds of problems. To address them, a data anonymization algorithm called Two-Phase Top-Down Specialization (TPTDS) is introduced and implemented in Hadoop. The input big data consist of 45,222 records of adult information with 15 attribute values. The proposed TPTDS algorithm is implemented with multidimensional anonymization in the MapReduce framework, which increases the efficiency of the big data processing system. Experiments with TPTDS on Hadoop in both one-dimensional and multidimensional MapReduce frameworks show better results for multidimensional anonymization on the adult dataset. Datasets are generalized in a top-down manner, with anonymization performed through specialization operations on a taxonomy tree, and the multidimensional MapReduce framework yields better IGPL values. The experiments show that the solution improves the IGPL values and the anonymity parameter and decreases the execution time of big data privacy preservation compared with the existing algorithm. These results should be broadly applicable in distributed environments.


Smart Cities ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 349-371
Author(s):  
Hassan Mehmood ◽  
Panos Kostakos ◽  
Marta Cortes ◽  
Theodoros Anagnostopoulos ◽  
Susanna Pirttikangas ◽  
...  

Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem that has been introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to patch this problem, emerging communication and sensing technologies are generating a massive amount of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide critical analysis of results retrieved. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.
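
As an illustration of what an active concept drift detector does on a data stream, the sketch below implements the Page-Hinkley test, a classic change detector for an increase in the mean of a stream. The abstract above does not name the algorithms that were evaluated, so this choice, the class name, and the default parameters (`delta`, `threshold`, `min_samples`) are assumptions made for the example.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an upward shift in a stream's mean."""

    def __init__(self, delta=0.005, threshold=50.0, min_samples=30):
        self.delta = delta  # tolerated magnitude of change
        self.threshold = threshold  # alarm level for the test statistic
        self.min_samples = min_samples  # warm-up period before alarms
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0  # cumulative deviation from the running mean
        self.min_cum = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation and return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        drifted = (self.cum - self.min_cum) > self.threshold
        return self.n >= self.min_samples and drifted


def first_drift(stream, detector):
    """Return the index of the first observation that triggers an alarm."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    stream = [0.0] * 100 + [10.0] * 100  # synthetic stream, shift at index 100
    print(first_drift(stream, PageHinkley()))
```

In a distributed deployment, one detector instance per stream partition would keep this state locally, so only alarms need to cross administrative domains.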


2019 ◽  
pp. 16-21
Author(s):  
Steve Selvin

The mean value is perhaps the most fundamental statistic and the chapter describes its properties and features that make it an important and necessary analytic tool.


Author(s):  
Kamalkumar Macwan ◽  
Sankita Patel

Recently, social network platforms have gained the attention of people worldwide. People post, share, and update their views freely on such platforms. The huge volume of data contained in social networks is utilized for various purposes such as research, market analysis, product popularity, and prediction. Although these data provide much useful information, they raise issues of user privacy. This chapter discusses the various privacy preservation methods applied to the original social network dataset to preserve privacy against attacks. The two areas of privacy preservation addressed in this chapter are anonymization in social network data publication and differential privacy in node degree publishing.
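
Differentially private node degree publishing can be illustrated with the Laplace mechanism applied to a degree histogram. The sketch below assumes edge-level differential privacy on a simple undirected graph, where adding or removing one edge moves two nodes between histogram bins, so the L1 sensitivity of the histogram is 4. It is a generic textbook construction, not necessarily the method covered in the chapter, and the function names are hypothetical.

```python
import random


def degree_histogram(edges, num_nodes):
    """Count how many nodes have each degree in a simple undirected graph."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hist = [0] * num_nodes  # possible degrees are 0 .. num_nodes - 1
    for d in degree:
        hist[d] += 1
    return hist


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_degree_histogram(edges, num_nodes, epsilon, rng):
    """Publish the degree histogram under edge-level differential privacy."""
    sensitivity = 4.0  # one edge changes two degrees, each touching two bins
    scale = sensitivity / epsilon
    hist = degree_histogram(edges, num_nodes)
    return [count + laplace_noise(scale, rng) for count in hist]


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # toy graph
    print(noisy_degree_histogram(edges, 4, 1.0, random.Random(11)))
```

The published counts are real-valued and may be negative; any rounding or clamping applied afterwards is post-processing and does not weaken the privacy guarantee.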


Author(s):  
Sandro Fiore ◽  
Alessandro Negro ◽  
Salvatore Vadacca ◽  
Massimo Cafaro ◽  
Giovanni Aloisio ◽  
...  

Grid computing is an emerging and enabling technology allowing organizations to easily share, integrate and manage resources in a distributed environment. Computational Grid allows running millions of jobs in parallel, but the huge amount of generated data has caused another interesting problem: the management (classification, storage, discovery etc.) of distributed data, i.e., a Data Grid specific issue. In the last decade, many efforts concerning the management of data (grid-storage services, metadata services, grid-database access and integration services etc.) identify data management as a real challenge for the next generation petascale grid environments. This work provides an architectural overview of the GRelC DAS, a grid database access service developed in the context of the GRelC Project and currently used for production/tutorial activities both in gLite and Globus based grid environments.


Author(s):  
Iftikhar U. Sikder ◽  
Aryya Gangopadhyay

This chapter introduces the research issues on spatial decision-making in the context of distributed geo-spatial data warehouse. Spatial decision-making in a distributed environment involves access to data and models from heterogeneous sources and composing disparate services into a meaningful integration. The chapter reviews system integration and interoperability issues of spatial data and models in a distributed computing environment. We present a prototype system to illustrate the collaborative access to data and as a model for supporting spatial decision-making.

