Efficient Spark-based framework for big geospatial data query processing and analysis

Framework for GeoSpatial Query Processing by Integrating Cassandra With Hadoop

GIS Applications in the Tourism and Hospitality Industry - Advances in Hospitality, Tourism, and the Services Industry | 10.4018/978-1-5225-5088-4.ch001 | 2018 | pp. 1-41

Author(s): S. Vasavi, Mallela Padma Priya, Anu A. Gokhale

Keyword(s): Response Time, Query Processing, Geospatial Data, Distributed Data, NoSQL Databases, Storage And Retrieval, Query Response Time, Partitioning Algorithm, Hadoop Framework, Partitioning Technique

As digitization advances, devices such as sensors and cameras are increasingly connected to the internet and produce big data. The variety of this data has driven the emergence of NoSQL databases, such as Cassandra, which are designed for scalability and availability, while the Hadoop framework was developed for storing and processing distributed data. In this chapter, the authors investigate the storage and retrieval of geospatial data by integrating Hadoop and Cassandra under two partitioning techniques: prefix-based partitioning and Cassandra's default partitioner (Murmur3Partitioner). A geohash value is generated that acts as the partition key and also supports efficient search, which reduces data retrieval time. When users issue spatial queries, such as finding the nearest locations, the search in the Cassandra database is run under both partitioning techniques, and the query response times are compared to determine which is more effective. The results show that prefix-based partitioning is more efficient than Murmur3 partitioning.
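To make the geohash idea concrete, here is a minimal sketch of standard geohash encoding and of a prefix-derived partition key. It is an illustration only, not the chapter's implementation: the function names `geohash_encode` and `partition_key` and the prefix length of 4 are assumptions chosen for the example.

```python
# Standard geohash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # geohash interleaves bits, starting with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)

def partition_key(lat, lon, prefix_len=4):
    """Hypothetical prefix-based key: the first prefix_len geohash characters."""
    return geohash_encode(lat, lon, precision=prefix_len)

print(geohash_encode(57.64911, 10.40744, precision=6))
print(partition_key(57.64911, 10.40744), partition_key(57.60, 10.45))
```

Because a geohash prefix identifies a bounding cell, the two nearby points in the last line map to the same key, so a nearest-location query under a prefix scheme could be directed at one partition. A hash partitioner such as Murmur3 would instead generally scatter keys across the token ring.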

Integrated k-NN Query Processing Based on Geospatial Data Services

Grid and Cooperative Computing - GCC 2005 - Lecture Notes in Computer Science | 10.1007/11590354_71 | 2005 | pp. 554-559

Author(s): Guifen Tang, Luo Chen, Yunxiang Liu, Shulei Liu, Ning Jing

Keyword(s): Query Processing, Geospatial Data, Data Services

Research on the Spatial Query Technology of Geospatial Database

Applied Mechanics and Materials | 10.4028/www.scientific.net/amm.571-572.600 | 2014 | Vol 571-572 | pp. 600-605

Author(s): Lei Gang Sun, Jian Feng Liu, Quan Hong Xu

Keyword(s): Data Model, Storage Management, Geospatial Data, Spatial Query, Oracle Spatial, Practical Applications, Data Query, Geospatial Database, Data Volume, Query Model

Application requirements for geospatial data are growing in number and complexity as research in the geosciences advances. Based on an in-depth study of the Oracle Spatial storage management mechanism, this paper proposes a method that applies graph theory to optimizing spatial queries over massive geographical data, and establishes a geospatial data query model to address the low efficiency of spatial queries in geospatial databases. Drawing on practical applications, the paper runs both a conventional spatial query test and a spatial query test based on the geospatial data query model. The results show that the model-based spatial query is more efficient than the conventional method, and that the improvement becomes increasingly apparent as the data volume grows.

Efficient social network data query processing on MapReduce

Proceedings of the 5th ACM workshop on HotPlanet - HotPlanet '13 | 10.1145/2491159.2491169 | 2013 | Cited By ~ 7

Author(s): Liu Liu, Jiangtao Yin, Lixin Gao

Keyword(s): Social Network, Query Processing, Network Data, Social Network Data, Data Query

Adaptive Query Processing in Data Grids

Handbook of Research on P2P and Grid Systems for Service-Oriented Computing | 10.4018/978-1-61520-686-5.ch016 | 2010 | pp. 382-395

Author(s): Chunjiang Zhao, Junwei Cao, Huarui Wu, Weiwei Chen

Keyword(s): Query Processing, Data Grid, Service Differentiation, Wide Area, Data Grids, Single Node, Data Query, QoS Differentiation, Multiple Data, Autonomous Data Sources

The data grid integrates wide-area autonomous data sources and provides users with a unified data query and processing infrastructure. Data grids require adaptive data query and processing to provide better quality of service (QoS) to users and applications despite dynamically changing resources and environments. Existing adaptive query processing (AQP) techniques only partially meet data grid requirements, and some existing work addresses only domain-specific or single-node query processing problems. Data grids provide new mechanisms for monitoring and discovering data and resources across domains in a wide area. Data query in grids can use this information to adapt better to the dynamic nature of the grid environment. In this work, an adaptive controller is proposed that dynamically adjusts the resource shares allocated to multiple data query requests in order to meet a specified level of service differentiation. The controller parameters are tuned automatically at runtime based on a predefined cost function and an online learning method. Simulation results show that the controller can meet given QoS differentiation targets and adapt to dynamic system resources among multiple data query processing requests, even when total demand from users and applications exceeds system capability.
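The feedback idea behind such a controller can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' controller: it uses a fixed gain rather than runtime tuning by a cost function and online learning, it handles only two request classes, and it assumes a class's response time equals its demand divided by its resource share. The names `adjust_share` and `simulate` are hypothetical.

```python
import math

def adjust_share(share, response_times, target_ratio, gain=0.2):
    """One feedback step: return class 0's new share (class 1 gets the rest)."""
    measured_ratio = response_times[0] / response_times[1]
    # Positive error: class 0 is slower than the target allows,
    # so it should receive a larger share of the resource.
    error = math.log(measured_ratio / target_ratio)
    new_share = share + gain * error * share * (1.0 - share)
    # Clamp so that neither class is ever starved completely.
    return min(0.95, max(0.05, new_share))

def simulate(demands, target_ratio, steps=100):
    """Assumed toy model: response time = demand / share."""
    share = 0.5
    for _ in range(steps):
        response_times = (demands[0] / share, demands[1] / (1.0 - share))
        share = adjust_share(share, response_times, target_ratio)
    return share

# Class 0 should see half the response time of class 1.
final_share = simulate(demands=(3.0, 2.0), target_ratio=0.5)
print(round(final_share, 3))
```

In this toy model the target ratio of 0.5 is met exactly when class 0 holds a share of 0.75, and the loop settles toward that value. The fixed gain is the main simplification: a controller tuned at runtime, as the abstract describes, would adjust it as conditions change.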
