◾ Approaches for High-Performance Big Data Processing: Applications and Challenges

2018 ◽  
Vol 88 ◽  
pp. 693-695 ◽  
Author(s):  
Yulei Wu ◽  
Yang Xiang ◽  
Jingguo Ge ◽  
Peter Muller

◾ A High-Performance Geospatial Big Data Processing System Based on MapReduce

2018 ◽  
Vol 7 (10) ◽  
pp. 399 ◽  
Author(s):  
Junghee Jo ◽  
Kang-Woo Lee

With the rapid development of Internet of Things (IoT) technologies, the increasing volume and diversity of sources of geospatial big data have created challenges in storing, managing, and processing data. In addition to the general characteristics of big data, the unique properties of spatial data make the handling of geospatial big data even more complicated. To help users implement geospatial big data applications in a MapReduce framework, several big data processing systems have extended the original Hadoop to support spatial properties. Most of those platforms, however, have added spatial functionality in the form of a plug-in. Although a plug-in offers a convenient way to add new features to an existing system, it has several limitations. In particular, executing spatial and nonspatial operations by alternating between the existing system and the plug-in adds read and write overheads to the workflow, significantly reducing performance. To address this issue, we have developed Marmot, a high-performance geospatial big data processing system based on MapReduce. Marmot extends Hadoop at a low level to support seamless integration between spatial and nonspatial operations within a single framework, improving the performance of geoprocessing workflows. This paper explains the overall architecture and data model of Marmot, as well as the main algorithm for automatically constructing MapReduce jobs from a given spatial analysis task. To illustrate how Marmot transforms a sequence of spatial analysis operators into map and reduce functions that achieve better performance, this paper presents an example of a spatial analysis that retrieves the number of subway stations per city in Korea. The paper also experimentally demonstrates that Marmot generally outperforms SpatialHadoop, one of the leading plug-in-based spatial big data frameworks, particularly on complex and time-intensive queries involving a spatial index.
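The subway-station example makes the operator-to-MapReduce mapping concrete. Marmot's actual API is not shown in the abstract, so the following is a minimal plain-Java sketch of the logical decomposition only, under stated simplifications: the map phase performs the spatial join, reduced here to point-in-rectangle tests against hypothetical city bounding boxes instead of real polygon geometry, and the reduce phase sums the per-city counts. All class, record, and place names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// A minimal sketch of the map/reduce decomposition described above.
// City boundaries are simplified to axis-aligned rectangles; a real
// system would use polygon geometry and a spatial index (e.g. an R-tree).
public class StationsPerCity {

    record Station(String name, double lon, double lat) {}

    record City(String name, double minLon, double minLat,
                double maxLon, double maxLat) {
        boolean contains(Station s) {
            return s.lon() >= minLon && s.lon() <= maxLon
                && s.lat() >= minLat && s.lat() <= maxLat;
        }
    }

    public static void main(String[] args) {
        List<City> cities = List.of(
                new City("Seoul", 126.8, 37.4, 127.2, 37.7),
                new City("Busan", 128.9, 35.0, 129.3, 35.3));
        List<Station> stations = List.of(
                new Station("City Hall", 126.98, 37.57),
                new Station("Gangnam", 127.03, 37.50),
                new Station("Seomyeon", 129.06, 35.16));

        // Map phase: the spatial join. Emit (cityName, 1) for every
        // station that falls inside a city's boundary.
        // Shuffle + reduce phase: group by city and sum the counts.
        Map<String, Long> stationsPerCity = stations.stream()
                .flatMap(s -> cities.stream()
                        .filter(c -> c.contains(s))
                        .map(c -> Map.entry(c.name(), 1L)))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.summingLong(Map.Entry::getValue)));

        stationsPerCity.forEach((city, n) -> System.out.println(city + ": " + n));
    }
}
```

In Marmot itself, such a pipeline is compiled into Hadoop map and reduce functions automatically, avoiding the extra reads and writes a plug-in approach incurs when control alternates between Hadoop and the spatial extension.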


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Saad Ahmed Dheyab ◽  
Mohammed Najm Abdullah ◽  
Buthainah Fahran Abed

Abstract. The analysis and processing of big data is one of the most important challenges researchers are working on, seeking approaches that offer high performance, low cost, and high accuracy. In this paper, a novel approach for big data processing and management is proposed that differs from existing ones: the proposed method employs not only main memory to read and handle big data, but also memory-mapped space that extends main memory onto storage. From a methodological viewpoint, the novelty of this paper is the segmentation stage, which splits big data using memory mapping and broadcasts all segments to a number of processors using a parallel message passing interface. From an application viewpoint, the paper presents a high-performance approach based on a homogeneous network that works in parallel to encrypt and decrypt big data using the AES algorithm. The approach is implemented on the Windows operating system using .NET libraries.
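The segmentation idea is concrete enough to sketch. The authors' implementation uses .NET libraries on Windows with an MPI-style broadcast across a homogeneous network; none of that code appears in the abstract, so the sketch below is a loose Java analogue under stated assumptions: the file is split into fixed-size segments, each segment is memory-mapped rather than read into heap memory, and the segments are encrypted concurrently, with local threads standing in for MPI ranks. AES in CTR mode is assumed because its keystream can be positioned at any block offset, which keeps segments independent; the paper does not state which AES mode is used. All names and sizes are illustrative.

```java
import java.io.RandomAccessFile;
import java.math.BigInteger;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Sketch: encrypt a large file by memory-mapping fixed-size segments
// and processing them concurrently. CTR mode keeps segments independent,
// because the keystream can be positioned at any 16-byte block offset.
public class ParallelAesCtr {

    static final int BLOCK = 16;                    // AES block size in bytes
    static final long SEGMENT = 64L * 1024 * 1024;  // 64 MiB per task (multiple of 16)

    // Counter value for the segment starting at byte offset `off`:
    // the base IV plus one increment per preceding 16-byte block.
    static IvParameterSpec ivForOffset(byte[] iv, long off) {
        BigInteger ctr = new BigInteger(1, iv).add(BigInteger.valueOf(off / BLOCK));
        byte[] raw = ctr.toByteArray(), out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n);
        return new IvParameterSpec(out);
    }

    public static void main(String[] args) throws Exception {
        String inPath = args[0], outPath = args[1];  // plaintext in, ciphertext out
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[BLOCK];
        new SecureRandom().nextBytes(iv);

        try (RandomAccessFile in = new RandomAccessFile(inPath, "r");
             RandomAccessFile out = new RandomAccessFile(outPath, "rw")) {
            long size = in.length();
            out.setLength(size);                    // CTR preserves the length
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
            List<Future<?>> tasks = new ArrayList<>();

            for (long off = 0; off < size; off += SEGMENT) {
                final long start = off, len = Math.min(SEGMENT, size - off);
                tasks.add(pool.submit(() -> {
                    // Map only this segment of the input and output files.
                    MappedByteBuffer src = in.getChannel()
                            .map(FileChannel.MapMode.READ_ONLY, start, len);
                    MappedByteBuffer dst = out.getChannel()
                            .map(FileChannel.MapMode.READ_WRITE, start, len);
                    Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
                    c.init(Cipher.ENCRYPT_MODE, key, ivForOffset(iv, start));
                    c.doFinal(src, dst);
                    dst.force();                    // flush the segment to disk
                    return null;
                }));
            }
            pool.shutdown();
            for (Future<?> t : tasks) t.get();      // propagate any task failure
        }
    }
}
```

Decryption is the same operation in CTR mode, so the same segment-parallel loop applies unchanged.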


Author(s):  
A. K. Tripathi ◽  
S. Agrawal ◽  
R. D. Gupta

Abstract. The emergence of new tools and technologies to gather information has generated the problem of processing spatial big data. Solving this problem requires new research, techniques, innovation, and development. Spatial big data is characterized by the five V's: volume, velocity, veracity, variety, and value. Hadoop is the most widely used framework that addresses these problems, but it requires high-performance computing resources to store and process such huge data. The emergence of cloud computing has provided on-demand, elastic, scalable, and pay-per-use computing resources with which users can develop their own computing environments. The main objective of this paper is to develop a cloud-enabled Hadoop framework that combines cloud technology and high-performance computing resources with the conventional Hadoop framework to support spatial big data solutions. The paper also compares the conventional Hadoop framework with the proposed cloud-enabled Hadoop framework. It is observed that the proposed cloud-enabled Hadoop framework is much more efficient for spatial big data processing than currently available solutions.
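The abstract gives no implementation details, so the following is a speculative sketch of what "cloud-enabled" can mean at the job level: an ordinary Hadoop MapReduce job (here a simple bounding-box filter over point records) whose input and output paths point at elastic cloud object storage through Hadoop's s3a connector instead of on-premise HDFS. The bucket names, the query window, and the CSV layout are assumptions for illustration, not the authors' framework.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// A conventional Hadoop job; only the paths change when it is moved to
// cloud-provisioned storage (here via the s3a connector from hadoop-aws).
public class BBoxFilterJob {

    // Input lines: "id,lon,lat". Emit (windowName, 1) for points inside
    // the query window; a real spatial framework would use an index.
    public static class BBoxMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] f = value.toString().split(",");
            double lon = Double.parseDouble(f[1]), lat = Double.parseDouble(f[2]);
            if (lon >= 126.8 && lon <= 127.2 && lat >= 37.4 && lat <= 37.7) {
                ctx.write(new Text("seoul-window"), ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> vals, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : vals) sum += v.get();
            ctx.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "bbox-filter");
        job.setJarByClass(BBoxFilterJob.class);
        job.setMapperClass(BBoxMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        // Hypothetical bucket paths; on-premise HDFS paths work unchanged.
        FileInputFormat.addInputPath(job, new Path("s3a://spatial-data/points/"));
        FileOutputFormat.setOutputPath(job, new Path("s3a://spatial-data/out/"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Under this reading, the conventional Hadoop code runs unchanged; what changes is where the storage and compute resources are provisioned.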

