Trajectory Clustering and k-NN for Robust Privacy Preserving k-NN Query Processing in GeoSpark

Privacy preservation and anonymity have gained significant attention from the big data perspective. We expect that forthcoming frameworks and theories will establish several solutions for privacy protection. The k-anonymity model is a key solution that has been widely employed to prevent data re-identification, and it is the focus of this work. Data modeling has also gained significant attention from the big data perspective. It is believed that advancing distributed environments will provide users with several solutions for efficient spatio-temporal data management. GeoSpark is utilized in the current work, as it is a key solution that has been widely employed for spatial data. Specifically, it works on top of Apache Spark, the main framework leveraged by the research community and organizations for big data transformation, processing and visualization. To this end, we focused on representing trajectory data so that it is applicable to the GeoSpark environment, and we designed a GeoSpark-based approach for the efficient management of real spatio-temporal data. The next step is to gain a deeper understanding of the data by applying k nearest neighbor (k-NN) queries, either with or without indexing methods. The k-anonymity set computation, which is the main component of privacy preservation evaluation and the main issue of our previous works, is evaluated in the GeoSpark environment. More specifically, the focus here is on the time cost of k-anonymity set computation along with vulnerability measurement. The extracted results are presented in tables and figures for visual inspection.
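
As a rough illustration of computing k-anonymity sets with k-NN queries "with or without indexing", the following plain-Python sketch compares a brute-force scan with a simple uniform-grid index over 2-D points. It does not use GeoSpark's API (GeoSpark runs on Spark and the JVM); the grid index, its cell size, the function names and the "user plus k - 1 neighbours" convention for the anonymity set are all hypothetical stand-ins chosen for illustration. Timing the two functions on a real dataset would mirror the kind of time-cost comparison described above.

```python
import heapq
import math
import random
from collections import defaultdict

Point = tuple[float, float]


def knn_brute_force(points: list[Point], q: int, k: int) -> list[int]:
    """Indices of the k nearest neighbours of points[q] (q itself excluded), by full scan."""
    qx, qy = points[q]
    cands = ((math.hypot(x - qx, y - qy), i) for i, (x, y) in enumerate(points) if i != q)
    return [i for _, i in heapq.nsmallest(k, cands)]


class GridIndex:
    """Hypothetical uniform-grid index: every point is bucketed by its integer cell coordinates."""

    def __init__(self, points: list[Point], cell: float):
        self.points, self.cell = points, cell
        self.buckets: dict[tuple[int, int], list[int]] = defaultdict(list)
        for i, (x, y) in enumerate(points):
            self.buckets[self._key(x, y)].append(i)

    def _key(self, x: float, y: float) -> tuple[int, int]:
        return math.floor(x / self.cell), math.floor(y / self.cell)

    def knn(self, q: int, k: int) -> list[int]:
        if not 0 < k < len(self.points):
            raise ValueError("k must satisfy 0 < k < number of points")
        qx, qy = self.points[q]
        cx, cy = self._key(qx, qy)
        best: list[tuple[float, int]] = []  # (distance, index), kept sorted and truncated to k
        ring = 0
        while True:
            # Scan only the cells on the square ring at Chebyshev distance `ring`.
            for gx in range(cx - ring, cx + ring + 1):
                for gy in range(cy - ring, cy + ring + 1):
                    if max(abs(gx - cx), abs(gy - cy)) != ring:
                        continue
                    for i in self.buckets.get((gx, gy), ()):
                        if i != q:
                            x, y = self.points[i]
                            best.append((math.hypot(x - qx, y - qy), i))
            best = sorted(best)[:k]
            # Points in cells beyond this ring lie more than ring * cell away from the query,
            # so once the k-th candidate is within that radius the answer cannot change.
            if len(best) == k and best[-1][0] <= ring * self.cell:
                return [i for _, i in best]
            ring += 1


if __name__ == "__main__":
    random.seed(7)
    pts = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(2000)]
    index = GridIndex(pts, cell=5.0)
    user, k = 0, 10
    # Assumed convention: the anonymity set is the user plus its k - 1 nearest neighbours.
    anonymity_set = [user] + index.knn(user, k - 1)
    print(anonymity_set == [user] + knn_brute_force(pts, user, k - 1))
```

Both functions break distance ties by point index, so the indexed and non-indexed answers are directly comparable; the grid only changes how many points are examined.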

Trajectory Clustering and k-NN for Robust Privacy Preserving Spatiotemporal Databases

Algorithms | 10.3390/a11120207 | 2018 | Vol 11 (12) | pp. 207 | Cited By ~ 2

Author(s): Elias Dritsas, Maria Trigka, Panagiotis Gerolymatos, Spyros Sioutas

Keyword(s): Nearest Neighbor, Dimensional Space, Motion Vector, Research Work, Privacy Preserving, Mobile Users, Trajectory Clustering, K Nearest Neighbor, Trajectory Data, Spatiotemporal Databases

In the context of this research work, we studied the problem of privacy preservation in spatiotemporal databases. In particular, we investigated the k-anonymity of mobile users based on real trajectory data. The k-anonymity set consists of the k nearest neighbors. We constructed a motion vector of the form (x, y, g, v), where x and y are the spatial coordinates, g is the angle of direction, and v is the velocity of a mobile user, and studied the problem in four-dimensional space. We followed two approaches. The former applied only the k-Nearest Neighbor (k-NN) algorithm to the whole dataset, while the latter combined trajectory clustering, based on K-means, with k-NN. Specifically, it applied k-NN inside a cluster of mobile users with a similar motion pattern (g, v). We defined a metric, called vulnerability, that measures the rate at which the k-NNs vary. This metric ranges from 1/k (high robustness) to 1 (low robustness) and represents the probability that the real identity of a mobile user is discovered by a potential attacker. The aim of this work was to prove that, with high probability, this rate tends to a value very close to 1/k in the clustering method, which means that k-anonymity is highly preserved. Through experiments on real spatial datasets, we evaluated the anonymity robustness, the so-called vulnerability, of the proposed method.
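
The two approaches and the vulnerability range can be made concrete with a small sketch. It builds anonymity sets from k-NN in (x, y, g, v) space, optionally restricted to a K-means cluster on (g, v), and scores one user's sets over time. The vulnerability formula here is our own hypothetical reading, chosen only because it reproduces the stated 1/k to 1 range: an attacker intersects a user's anonymity sets across timestamps and guesses uniformly among the survivors. The abstract does not give the exact definition. Further simplifications: features are not rescaled, the heading angle g is treated as a plain number (no wrap-around at 360 degrees), K-means is seeded with the first rows, and the anonymity set is taken to be the user plus its k - 1 nearest neighbours.

```python
from __future__ import annotations

import math
from typing import Sequence

Motion = tuple[float, float, float, float]  # (x, y, g, v): position, heading angle, speed


def kmeans(data: Sequence[Sequence[float]], n_clusters: int, iters: int = 50) -> list[int]:
    """Plain Lloyd's k-means returning a cluster label per row.
    Simplification: the first n_clusters rows are used as the initial centres."""
    centres = [list(row) for row in data[:n_clusters]]
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(n_clusters), key=lambda c: math.dist(row, centres[c])) for row in data]
        for c in range(n_clusters):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def anonymity_sets(users: list[Motion], k: int, n_clusters: int | None = None) -> list[set[int]]:
    """Anonymity set of each user: the user plus its k - 1 nearest neighbours in (x, y, g, v).
    With n_clusters given, users are first grouped by k-means on (g, v) and the k-NN search
    is restricted to the user's own cluster (the clustering variant)."""
    if n_clusters is None:
        labels = [0] * len(users)  # whole dataset treated as one group
    else:
        labels = kmeans([(g, v) for _, _, g, v in users], n_clusters)
    result = []
    for u, mu in enumerate(users):
        peers = [w for w in range(len(users)) if w != u and labels[w] == labels[u]]
        peers.sort(key=lambda w: (math.dist(mu, users[w]), w))
        result.append({u, *peers[: k - 1]})
    return result


def vulnerability(history: list[set[int]]) -> float:
    """Hypothetical reading of the metric (not the paper's formula): the attacker intersects
    one user's anonymity sets over successive timestamps and guesses among the users left.
    Unchanged sets give 1/k; completely changing sets leave only the user, giving 1."""
    return 1.0 / len(set.intersection(*history))


if __name__ == "__main__":
    # Two hypothetical snapshots of the same four users (list index = user id).
    snap_t0 = [(0, 0, 90, 10), (1, 0, 270, 2), (3, 0, 90, 10), (4, 0, 270, 2)]
    snap_t1 = [(5, 0, 90, 10), (6, 0, 270, 2), (8, 0, 90, 10), (9, 0, 270, 2)]
    for n_clusters in (None, 2):
        history = [anonymity_sets(s, k=2, n_clusters=n_clusters)[0] for s in (snap_t0, snap_t1)]
        print(n_clusters, vulnerability(history))
```

Under this reading, clustering on (g, v) helps because users who move alike tend to stay near each other, so the intersection of their anonymity sets stays close to size k and the score stays close to 1/k.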

A Survey on Big Data Processing Frameworks for Mobility Analytics

ACM SIGMOD Record | 10.1145/3484622.3484626 | 2021 | Vol 50 (2) | pp. 18-29

Author(s): Christos Doulkeridis, Akrivi Vlachou, Nikos Pelekis, Yannis Theodoridis

Keyword(s): Big Data, Data Processing, Data Management, Spatial Data, State Of The Art, Temporal Data, Big Data Processing, Mobility Data, New Challenges, Spatio Temporal

In the current era of big spatial data, the vast amount of mobility data produced (by sensors, GPS-equipped devices, surveillance networks, radars, etc.) poses new challenges related to mobility analytics. A cornerstone facilitator for performing mobility analytics at scale is the availability of big data processing frameworks and techniques tailored for spatial and spatio-temporal data. Motivated by this pressing need, in this paper we provide a survey of big data processing frameworks for mobility analytics. Particular focus is put on the underlying techniques, namely indexing, partitioning and query processing, which are essential for enabling efficient and scalable data management. In this way, this report serves as a useful guide to state-of-the-art methods and modern techniques for scalable mobility data management and analytics.

A Survey on Big Data for Trajectory Analytics

ISPRS International Journal of Geo-Information | 10.3390/ijgi9020088 | 2020 | Vol 9 (2) | pp. 88

Author(s): Damião Ribeiro de Almeida, Cláudio de Souza Baptista, Fabio Gomes de Andrade, Amilcar Soares

Keyword(s): Big Data, Data Analysis, Moving Objects, Research Field, Temporal Data, Trajectory Data, Management Technology, Spatial Big Data, Analytical Processing, Spatio Temporal

Trajectory data allow the study of the behavior of moving objects, from humans to animals. Wireless communication, mobile devices, and technologies such as the Global Positioning System (GPS) have contributed to the growth of the trajectory research field. With the considerable growth in the volume of trajectory data, storing such data in Spatial Database Management Systems (SDBMS) has become challenging. Hence, Spatial Big Data emerges as a data management technology for indexing, storing, and retrieving large volumes of spatio-temporal data. A Data Warehouse (DW) is one of the premier infrastructures for Big Data analysis and complex query processing. Trajectory Data Warehouses (TDW) emerge as DWs dedicated to trajectory data analysis. The primary goals of this survey are to list and discuss problems that use TDW and to identify future directions for work in this field. This article collects the state of the art in Big Data trajectory analytics. Understanding how research on trajectory data is being conducted, which main techniques have been used, and how they can be embedded in an Online Analytical Processing (OLAP) architecture can enhance the efficiency and development of decision-making systems that deal with trajectory data.

An Enhanced Performance of K-Nearest Neighbor (K-NN) Classifier to Meet New Big Data Necessities

IOP Conference Series Materials Science and Engineering | 10.1088/1757-899x/928/3/032013 | 2020 | Vol 928 | pp. 032013

Author(s): Ihab L. Hussein Alsammak, Humam M. Abdul Sahib, Wasan H. Itwee

Keyword(s): Big Data, Nearest Neighbor, K Nearest Neighbor

Precision Pig Farming Image Analysis Using Random Forest and Boruta Predictive Big Data Analysis Using Neural Network and K-Nearest Neighbor

2021 2nd International Conference on Intelligent Engineering and Management (ICIEM) | 10.1109/iciem51511.2021.9445328 | 2021

Author(s): S. A. Shaik Mazhar, G. Suseendran

Keyword(s): Neural Network, Image Analysis, Big Data, Data Analysis, Random Forest, Nearest Neighbor, Big Data Analysis, K Nearest Neighbor, Pig Farming

A Survey on Spatio-temporal Data Analytics Systems

ACM Computing Surveys | 10.1145/3507904 | 2022

Author(s): Md Mahbub Alam, Luis Torgo, Albert Bifet

Keyword(s): Programming Languages, Spatial Data, Data Analytics, Spatial Databases, Location Based Services, Temporal Data, Wide Range, Data Volume, Spatio Temporal, GIS Software

Due to the surge in spatio-temporal data volume, the popularity of location-based services and applications, and the importance of knowledge extracted from spatio-temporal data for solving a wide range of real-world problems, a plethora of research and development work has been done in the area of spatial and spatio-temporal data analytics in the past decade. The main goal of existing works was to develop algorithms and technologies to capture, store, manage, analyze, and visualize spatial or spatio-temporal data. Researchers have contributed either by adding spatio-temporal support to existing systems, by developing new systems from scratch, or by implementing algorithms for processing spatio-temporal data. The existing ecosystem of spatial and spatio-temporal data analytics systems can be categorized into three groups: (1) spatial databases (SQL and NoSQL), (2) big spatial data processing infrastructures, and (3) programming languages and GIS software. Since existing surveys mostly investigated infrastructures for processing big spatial data, this survey explores the whole ecosystem of spatial and spatio-temporal analytics. It also portrays the importance and future of spatial and spatio-temporal data analytics.

Parallel kNN Queries for Big Data Based on Voronoi Diagram Using MapReduce

Advances in Data Mining and Database Management - Handbook of Research on Innovative Database Query Processing Techniques | 10.4018/978-1-4666-8767-7.ch014 | 2015 | pp. 392-414

Author(s): Wei Yan

Keyword(s): Big Data, Voronoi Diagram, Spatial Databases, Nearest Neighbor, Programming Model, Dimensional Space, Data Sets, Two Dimensional, K Nearest Neighbor, K Nearest Neighbors

In cloud computing environments, parallel kNN queries for big data are an important issue. The k nearest neighbor queries (kNN queries), which find the k nearest neighbors in a dataset S for every object in another dataset R, are a primitive operator widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. This chapter proposes a parallel method for kNN queries on big data using the MapReduce programming model. First, it proposes an approximate algorithm that maps multi-dimensional datasets into two-dimensional datasets and transforms kNN queries into a sequence of two-dimensional point searches. Then, in two-dimensional space, it proposes a partitioning method based on the Voronoi diagram, which incorporates the Voronoi diagram into an R-tree. Furthermore, it proposes an efficient algorithm for processing kNN queries based on the R-tree using the MapReduce programming model. Finally, it presents the results of extensive experimental evaluations, which indicate the efficiency of the proposed approach.
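
The Voronoi partitioning step can be illustrated on a single machine. Assigning every point of S to its nearest pivot places it in that pivot's Voronoi cell, which plays the role of the map phase. A query point r then scans its own cell first and opens another cell only if r's distance to the bisecting line between the two pivots is smaller than its current k-th distance, since every point of that cell lies beyond the bisector. This is a simplified, exact 2-D sketch under our own assumptions (randomly sampled pivots, hypothetical function names). It is not the chapter's algorithm, which also maps multi-dimensional data approximately to 2-D and indexes the cells with an R-tree.

```python
import heapq
import math
import random

Point = tuple[float, float]


def voronoi_partition(S: list[Point], pivots: list[Point]) -> list[list[int]]:
    """'Map' step: bucket every point of S under its nearest pivot, i.e. its Voronoi cell."""
    cells: list[list[int]] = [[] for _ in pivots]
    for i, s in enumerate(S):
        cells[min(range(len(pivots)), key=lambda j: math.dist(s, pivots[j]))].append(i)
    return cells


def knn_join(R: list[Point], S: list[Point], pivots: list[Point], k: int) -> list[list[int]]:
    """For every r in R, indices of its k nearest points in S (exact), skipping whole cells.
    Assumes the pivots are pairwise distinct."""
    cells = voronoi_partition(S, pivots)
    result = []
    for r in R:
        d_piv = [math.dist(r, p) for p in pivots]
        home = min(range(len(pivots)), key=d_piv.__getitem__)

        def bound(j: int) -> float:
            # Distance from r to the bisector of pivots[home] and pivots[j]; every point of
            # cell j lies on the far side of that bisector, so it is at least this far from r.
            if j == home:
                return 0.0
            sep = math.dist(pivots[home], pivots[j])
            return (d_piv[j] ** 2 - d_piv[home] ** 2) / (2 * sep)

        best: list[tuple[float, int]] = []  # ascending (distance, index), at most k entries
        for j in sorted(range(len(pivots)), key=bound):
            if len(best) == k and bound(j) > best[-1][0]:
                break  # all remaining cells have even larger lower bounds
            best = heapq.nsmallest(k, best + [(math.dist(r, S[i]), i) for i in cells[j]])
        result.append([i for _, i in best])
    return result


if __name__ == "__main__":
    rng = random.Random(3)
    S = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(1000)]
    R = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(5)]
    pivots = rng.sample(S, 16)  # hypothetical pivot choice: a random sample of S
    print(knn_join(R, S, pivots, k=3))
```

In a MapReduce setting, `voronoi_partition` corresponds roughly to the map phase, and each cell could be processed by a reducer together with those points of neighbouring cells that the bisector bound cannot rule out; how the chapter handles that replication is not described above.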

Parallel Queries of Cluster-Based k Nearest Neighbor in MapReduce

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Managing Big Data in Cloud Computing Environments | 10.4018/978-1-4666-9834-5.ch007 | 2016 | pp. 163-182

Author(s): Wei Yan

Keyword(s): Spatial Data, Spatial Databases, Nearest Neighbor, Programming Model, K Nearest Neighbor, K Nearest Neighbors, Data Intensive, Parallel Queries, Massive Spatial Data, Nearest Neighbor Queries

Parallel k Nearest Neighbor queries over massive spatial data are an important issue. The k nearest neighbor queries (kNN queries), which find the k nearest neighbors in a dataset S for every point in another dataset R, are a useful tool widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. In cloud computing environments, the MapReduce programming model is a well-accepted framework for data-intensive applications over clusters of computers. This chapter proposes a parallel method for kNN queries based on clusters in the MapReduce programming model. First, it proposes a partitioning method for spatial data using the Voronoi diagram. Then, it clusters the data points after partitioning using the k-means method. Furthermore, it proposes an efficient algorithm for processing kNN queries based on the k-means clusters using the MapReduce programming model. Finally, extensive experiments evaluate the efficiency of the proposed approach.
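
Once the points of a partition are grouped by k-means, a kNN query can skip whole clusters using the triangle inequality. No point of a cluster with centre c and radius rho (the distance from c to its farthest member) can be closer to a query q than d(q, c) - rho. The sketch below shows only this pruning rule on a single partition. The k-means routine, its random seeding and all names are our own simplifying assumptions, not the chapter's MapReduce implementation, and the Voronoi partitioning step is omitted.

```python
import heapq
import math
import random

Point = tuple[float, float]


def kmeans(points: list[Point], m: int, iters: int = 20, seed: int = 0) -> list[list[int]]:
    """Plain Lloyd's k-means; returns the member indices of each non-empty cluster."""
    centres = random.Random(seed).sample(points, m)
    for _ in range(iters):
        groups: list[list[int]] = [[] for _ in range(m)]
        for i, p in enumerate(points):
            groups[min(range(m), key=lambda c: math.dist(p, centres[c]))].append(i)
        # Recompute each centre as its members' mean; an empty cluster keeps its old centre.
        centres = [
            tuple(sum(points[i][d] for i in g) / len(g) for d in range(2)) if g else centres[c]
            for c, g in enumerate(groups)
        ]
    return [g for g in groups if g]


class ClusteredKnn:
    """Exact kNN over one partition whose points are grouped into k-means clusters."""

    def __init__(self, points: list[Point], m: int):
        self.points = points
        self.clusters: list[tuple[Point, float, list[int]]] = []
        for members in kmeans(points, m):
            centre = tuple(sum(points[i][d] for i in members) / len(members) for d in range(2))
            radius = max(math.dist(centre, points[i]) for i in members)
            self.clusters.append((centre, radius, members))

    def query(self, q: Point, k: int) -> list[int]:
        # Visit clusters in order of their lower bound d(q, centre) - radius.
        order = sorted(self.clusters, key=lambda c: math.dist(q, c[0]) - c[1])
        best: list[tuple[float, int]] = []  # ascending (distance, index), at most k entries
        for centre, radius, members in order:
            if len(best) == k and math.dist(q, centre) - radius > best[-1][0]:
                break  # no remaining cluster can contain a closer point
            best = heapq.nsmallest(k, best + [(math.dist(q, self.points[i]), i) for i in members])
        return [i for _, i in best]


if __name__ == "__main__":
    rng = random.Random(2)
    pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(1000)]
    index = ClusteredKnn(pts, m=10)
    print(index.query((25.0, 75.0), k=5))
```

The pruning is exact as long as the radius of each cluster is computed from its final members; tighter clusters give larger lower bounds and therefore skip more of the partition.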

Improving k-Nearest Neighbor Pattern Recognition Models for Privacy-Preserving Data Analysis

2019 IEEE International Conference on Big Data (Big Data) | 10.1109/bigdata47090.2019.9006281 | 2019

Author(s): Walisa Romsaiyud, Henning Schnoor, Wilhelm Hasselbring

Keyword(s): Pattern Recognition, Data Analysis, Nearest Neighbor, Privacy Preserving, K Nearest Neighbor

Storage Efficient Trajectory Clustering and k-NN for Robust Privacy Preserving Spatio-Temporal Databases

Algorithms | 10.3390/a12120266 | 2019 | Vol 12 (12) | pp. 266 | Cited By ~ 2

Author(s): Elias Dritsas, Andreas Kanavos, Maria Trigka, Spyros Sioutas, Athanasios Tsakalidis

Keyword(s): Privacy Preservation, Dimensional Space, Research Work, Temporal Databases, Mobile Object, Linear Component, Trajectory Data, Spatio Temporal, Storage Problem, The One

Storing massive volumes of spatio-temporal data has become a difficult task as GPS capabilities and wireless communication technologies have become prevalent in modern mobile devices. As a result, massive trajectory data are produced, incurring expensive costs for storage, transmission, and query processing. A number of algorithms for compressing trajectory data have been proposed in order to overcome these difficulties. These algorithms try to reduce the size of trajectory data while preserving the quality of the information. In the context of this research work, we focus on both the privacy preservation and the storage problems of spatio-temporal databases. To address these issues, we propose an efficient framework for trajectory representation, entitled DUST (DUal-based Spatio-temporal Trajectory), in which a raw trajectory is split into a number of linear sub-trajectories that are subjected to a dual transformation, which yields the representatives of each linear component of the initial trajectory; thus, the compressed trajectory achieves a compression ratio equal to M:1. To our knowledge, we are the first to study and address k-NN queries on nonlinear moving object trajectories that are represented in dual dimensional space. Additionally, the proposed approach is expected to reinforce the privacy protection of such data. Specifically, even if an intruder has access to the dual points of the trajectory data and tries to reproduce the native points that fit a specific component of the initial trajectory, the identity of the mobile object will remain secure with high probability. In this way, the privacy of the k-anonymity method is reinforced. Through experiments on real spatial datasets, we evaluate the robustness of the new approach and compare it with the one studied in our previous work.
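
To make the dual transformation concrete: a linear sub-trajectory whose motion follows x(t) = v_x * t + a_x and y(t) = v_y * t + a_y can be represented by the single dual point (v_x, a_x, v_y, a_y), so a segment of M raw samples collapses to one representative. The sketch below assumes fixed-length segments of M samples and a least-squares line fit per coordinate. The abstract does not say how DUST chooses segment boundaries or which dual space it uses, so both choices and all function names are hypothetical.

```python
import math

Sample = tuple[float, float, float]  # (t, x, y)
Dual = tuple[float, float, float, float]  # (v_x, a_x, v_y, a_y)


def fit_line(ts: list[float], vs: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept of vs against ts (slope 0 for a single timestamp)."""
    n = len(ts)
    mt, mv = sum(ts) / n, sum(vs) / n
    stt = sum((t - mt) ** 2 for t in ts)
    stv = sum((t - mt) * (v - mv) for t, v in zip(ts, vs))
    slope = stv / stt if stt else 0.0
    return slope, mv - slope * mt


def to_dual(traj: list[Sample], m: int) -> list[tuple[float, float, Dual]]:
    """Split a time-ordered trajectory into consecutive chunks of m samples (hypothetical
    segmentation) and map each chunk to (t_start, t_end, dual point)."""
    out = []
    for start in range(0, len(traj), m):
        seg = traj[start:start + m]
        ts = [t for t, _, _ in seg]
        vx, ax = fit_line(ts, [x for _, x, _ in seg])
        vy, ay = fit_line(ts, [y for _, _, y in seg])
        out.append((ts[0], ts[-1], (vx, ax, vy, ay)))
    return out


def position(dual_traj: list[tuple[float, float, Dual]], t: float) -> tuple[float, float]:
    """Approximate (x, y) at time t using the last segment that starts at or before t
    (times between two segments are extrapolated from the earlier segment's line)."""
    if not dual_traj[0][0] <= t <= dual_traj[-1][1]:
        raise ValueError("t is outside the trajectory's time span")
    vx, ax, vy, ay = [d for t0, _, d in dual_traj if t0 <= t][-1]
    return vx * t + ax, vy * t + ay


if __name__ == "__main__":
    # Hypothetical slightly non-linear trajectory, sampled once per time unit.
    raw = [(float(t), 2.0 * t + 0.1 * (t % 3), 5.0 - t) for t in range(12)]
    dual = to_dual(raw, m=4)
    err = max(math.dist((x, y), position(dual, t)) for t, x, y in raw)
    print(len(raw), "samples ->", len(dual), "dual points; max error", round(err, 3))
```

In this sketch each segment keeps two timestamps plus four coefficients instead of M (t, x, y) triples; the reconstruction error shows how much detail of the raw samples is lost, which is also what an intruder holding only the dual points would face.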
