Efficient Spark-based framework for big geospatial data query processing and analysis

Author(s):  
Isam Mashhour Aljawarneh ◽  
Paolo Bellavista ◽  
Antonio Corradi ◽  
Rebecca Montanari ◽  
Luca Foschini ◽  
...  
Author(s):  
Tian Zhao ◽  
Chuanrong Zhang ◽  
Mingzhen Wei ◽  
Zhong-Ren Peng

2019 ◽  
pp. 353-388
Author(s):  
S. Vasavi ◽  
Mallela Padma Priya ◽  
Anu A. Gokhale

We are moving towards digitization, connecting devices such as sensors and cameras to the internet and producing big data. The variety of this data has paved the way for the emergence of NoSQL databases, such as Cassandra, that offer scalability and availability. The Hadoop framework was developed for storing and processing distributed data. In this chapter, the authors investigate the storage and retrieval of geospatial data by integrating Hadoop and Cassandra, comparing prefix-based partitioning with Cassandra's default partitioner (Murmur3Partitioner). A geohash value is generated, which acts as the partition key and also supports efficient search, reducing the time taken to retrieve data. When users issue spatial queries, such as finding the nearest locations, the search in the Cassandra database is run under both partitioning techniques, and the query response times are compared to determine which method is more effective. The results show that prefix-based partitioning is more efficient than Murmur3 partitioning.
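As a rough illustration of how a geohash can serve as a prefix-based partition key, the following Python sketch encodes coordinates with the standard geohash algorithm and groups points by a shared prefix. It is not the chapter's implementation and does not touch Cassandra or Hadoop; the function names (`geohash_encode`, `partition_key`, `build_partitions`), the prefix length of 4, and the sample points are assumptions made for this example.

```python
from collections import defaultdict

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def partition_key(geohash, prefix_len=4):
    """Hypothetical prefix-based key: nearby points share a prefix."""
    return geohash[:prefix_len]


def build_partitions(points, prefix_len=4):
    """Group (name, lat, lon) records by geohash prefix."""
    partitions = defaultdict(list)
    for name, lat, lon in points:
        gh = geohash_encode(lat, lon)
        partitions[partition_key(gh, prefix_len)].append((name, gh))
    return dict(partitions)


# Illustrative sample data (assumed, not from the chapter).
points = [
    ("a", 57.64911, 10.40744),
    ("b", 57.64950, 10.40800),
    ("c", -33.86880, 151.20930),
]
partitions = build_partitions(points)
```

Because nearby points tend to share a geohash prefix, a nearest-location lookup in this sketch only needs to read the bucket whose key matches the query point's prefix. A hash-based partitioner, by contrast, deliberately spreads similar keys apart. Points close to a cell boundary can fall into neighbouring buckets, so a fuller version would also check adjacent prefixes.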


Author(s):  
Guifen Tang ◽  
Luo Chen ◽  
Yunxiang Liu ◽  
Shulei Liu ◽  
Ning Jing

2014 ◽  
Vol 571-572 ◽  
pp. 600-605
Author(s):  
Lei Gang Sun ◽  
Jian Feng Liu ◽  
Quan Hong Xu

Application requirements for geospatial data are growing in number and complexity as the volume of data increases with advancing research in the geosciences. Based on an in-depth study of the Oracle Spatial storage management mechanism, this paper proposes a method that applies graph theory to optimizing spatial queries over massive geographical data, and establishes a geospatial data query model to address the low efficiency of spatial queries in geospatial databases. Drawing on practical applications, the paper runs both a conventional spatial query test and a spatial query test based on the geospatial data query model. The results show that the model-based spatial query is more efficient than the conventional method. The model greatly improves spatial query performance, and the improvement becomes increasingly apparent as the data volume grows.


Author(s):  
Chunjiang Zhao ◽  
Junwei Cao ◽  
Huarui Wu ◽  
Weiwei Chen

The data grid integrates wide-area autonomous data sources and provides users with a unified data query and processing infrastructure. Data grids require adaptive data query and processing to provide better quality of service (QoS) to users and applications despite dynamically changing resources and environments. Existing adaptive query processing (AQP) techniques only partially meet data grid requirements, and some existing work addresses only domain-specific or single-node query processing problems. Data grids provide new mechanisms for monitoring and discovering data and resources across domains over a wide area. Data queries in grids can use this information to adapt better to the dynamic nature of the grid environment. In this work, an adaptive controller is proposed that dynamically adjusts the resource shares allocated to multiple data query requests in order to meet a specified level of service differentiation. The controller parameters are automatically tuned at runtime based on a predefined cost function and an online learning method. Simulation results show that the controller can meet given QoS differentiation targets and adapt to dynamic system resources among multiple data query processing requests, even when total demand from users and applications exceeds system capacity.
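The idea of adjusting resource shares to hit a service differentiation target can be illustrated with a very small feedback loop. The Python sketch below is not the controller proposed in the work: it assumes a toy model in which a request class's response time is its demand divided by its resource share, uses two classes only, and keeps a fixed gain rather than tuning it online from a cost function. All names (`response_times`, `adjust_share`, `run_controller`) and parameter values are hypothetical.

```python
def response_times(share_a, demand_a=1.0, demand_b=1.0):
    """Toy model (assumption): response time = demand / resource share."""
    return demand_a / share_a, demand_b / (1.0 - share_a)


def adjust_share(share_a, target_ratio, gain=0.05):
    """One control step: move class A's share toward the target ratio.

    The controlled quantity is rt_b / rt_a, i.e. how many times slower
    class B is served than class A.
    """
    rt_a, rt_b = response_times(share_a)
    error = target_ratio - rt_b / rt_a
    new_share = share_a + gain * error  # fixed gain, not tuned online
    return min(0.95, max(0.05, new_share))  # keep both shares positive


def run_controller(target_ratio=2.0, steps=50, share_a=0.5):
    """Iterate the control step and report the final share and ratio."""
    for _ in range(steps):
        share_a = adjust_share(share_a, target_ratio)
    rt_a, rt_b = response_times(share_a)
    return share_a, rt_b / rt_a


share, ratio = run_controller()
```

With equal demands and a target ratio of 2, the loop settles near a two-thirds share for class A. In the work described above, the gain would instead be adapted at runtime; the fixed value here is chosen only so that the toy loop converges.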

