Data Prefetching and Eviction Mechanisms of In-Memory Storage Systems Based on Scheduling for Big Data Processing

Purpose – The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in big data environment. Design/methodology/approach – First, the integration of Apache Hive, Cloudera Impala and BDAS Shark make the platform support SQL-like query. Next, users can access a single interface and select the best performance of big data warehouse platform automatically by the proposed optimizer. Finally, the distributed memory storage system Memcached incorporated into the distributed file system, Apache HDFS, is employed for fast caching query results. Therefore, if users query the same SQL command, the same result responds rapidly from the cache system instead of suffering the repeated searches in a big data warehouse and taking a longer time to retrieve. Findings – As a result the proposed approach significantly improves the overall performance and dramatically reduces the search time as querying a database, especially applying for the high-repeatable SQL commands under multi-user mode. Research limitations/implications – Currently, Shark’s latest stable version 0.9.1 does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK7. Using Oracle JDK8 or Open JDK will cause serious errors, and some software will be unable to run. Practical implications – The problem with this system is that some blocks are missing when too many blocks are stored in one result (about 100,000 records). Another problem is that the sequential writing into In-memory cache wastes time. Originality/value – When the remaining memory capacity is 2 GB or less on each server, Impala and Shark will have a lot of page swapping, causing extremely low performance. When the data scale is larger, it may cause the JVM I/O exception and make the program crash. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala’s consumption of memory resources is between those of Shark and Hive. This amount of remaining memory is sufficient for Impala’s maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and sets the amount of remaining memory as Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB) as the critical points. The program automatically selects Hive when memory is less than 15 percent, Impala at 15 to 75 percent and Shark at more than 75 percent.

Download Full-text

A novel approach for big data processing using message passing interface based on memory mapping

Journal Of Big Data ◽

10.1186/s40537-019-0275-3 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Saad Ahmed Dheyab ◽

Mohammed Najm Abdullah ◽

Buthainah Fahran Abed

Keyword(s):

Big Data ◽

Data Processing ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

Memory Storage ◽

Big Data Processing ◽

Memory Space ◽

Novel Approach ◽

Memory Mapping

AbstractThe analysis and processing of big data are one of the most important challenges that researchers are working on to find the best approaches to handle it with high performance, low cost and high accuracy. In this paper, a novel approach for big data processing and management was proposed that differed from the existing ones; the proposed method employs not only the memory space to reads and handle big data, it also uses space of memory-mapped extended from memory storage. From a methodological viewpoint, the novelty of this paper is the segmentation stage of big data using memory mapping and broadcasting all segments to a number of processors using a parallel message passing interface. From an application viewpoint, the paper presents a high-performance approach based on a homogenous network which works parallelly to encrypt-decrypt big data using AES algorithm. This approach can be done on Windows Operating System using .NET libraries.

Download Full-text