EFFICIENT USAGE OF MEMORY MANAGEMENT IN BIG DATA USING “ANTI-CACHING”

B. Sri Divya

doi:10.15623/ijret.2015.0408052

HAMR: A dataflow-based real-time in-memory cluster computing engine

The International Journal of High Performance Computing Applications ◽

10.1177/1094342016672080 ◽

2016 ◽

Vol 31 (5) ◽

pp. 361-374 ◽

Cited By ~ 3

Author(s):

Yao Wu ◽

Long Zheng ◽

Brian Heilig ◽

Guang R Gao

Keyword(s):

Big Data ◽

Memory Management ◽

High Performance ◽

Cluster Computing ◽

Programming Model ◽

Distributed Processing ◽

Large Data ◽

Computing System ◽

Fine Grain ◽

Execution Model

As attention to big data grows, cluster computing systems for distributed processing of large data sets have become a mainstream and critical requirement in high-performance distributed systems research. One of the most successful systems is Hadoop, which uses MapReduce as its programming and execution model and uses disks as the intermediate medium for processing huge volumes of data. Spark, an in-memory computing engine, solves iterative and interactive problems more efficiently. However, the current consensus is that neither is the final solution for big data, owing to the MapReduce-like programming model, the synchronous execution model, the restriction to batch processing, and so on. A new solution, and in particular a fundamental evolution, is needed to bring big data solutions into a new era. In this paper, we introduce a new cluster computing system called HAMR, which supports both batch and stream processing. To achieve better performance, HAMR integrates high-performance computing approaches, namely dataflow fundamentals, into a big data solution. More specifically, HAMR is designed entirely around in-memory computing to reduce unnecessary disk access overhead; task scheduling and memory management are fine-grained to exploit more parallelism; and asynchronous execution improves the efficiency of compute resource usage and balances the workload better across the whole cluster. The experimental results show that HAMR can outperform Hadoop MapReduce and Spark by up to 19x and 7x, respectively, in the same cluster environment. Furthermore, HAMR can handle data sizes that scale well beyond the capabilities of Spark.
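The contrast the abstract draws between a stage-at-a-time model and a fine-grained dataflow model can be illustrated with a small single-process sketch. This is only an analogy under stated assumptions: the function names (`batch_word_count`, `tokenize`, `pipelined_word_count`) are hypothetical, nothing here is HAMR's, Hadoop's, or Spark's API, and the sketch does not model distribution or true asynchrony across nodes.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(lines: List[str]) -> Counter:
    """Stage-at-a-time: the whole intermediate result is built first."""
    # Stage 1 runs to completion and materializes every token.
    words = [w.lower() for line in lines for w in line.split()]
    # Stage 2 starts only after stage 1 has finished (a barrier).
    return Counter(words)


def tokenize(lines: Iterable[str]) -> Iterator[str]:
    """Emit tokens one at a time instead of building a full list."""
    for line in lines:
        for w in line.split():
            yield w.lower()


def pipelined_word_count(lines: Iterable[str]) -> Counter:
    """Record-at-a-time: each token is counted as soon as it is produced."""
    counts: Counter = Counter()
    for w in tokenize(lines):
        # No intermediate collection is held between the two stages.
        counts[w] += 1
    return counts


if __name__ == "__main__":
    sample = ["big data", "Big memory data"]
    print(batch_word_count(sample))
    print(pipelined_word_count(sample))
```

Both functions compute the same counts; the difference is only in when the downstream stage is allowed to start and how much intermediate data is held at once.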

“Anti-Caching”-based elastic memory management for Big Data

2015 IEEE 31st International Conference on Data Engineering ◽

10.1109/icde.2015.7113375 ◽

2015 ◽

Cited By ~ 4

Author(s):

Hao Zhang ◽

Gang Chen ◽

Beng Chin Ooi ◽

Weng-Fai Wong ◽

Shensen Wu ◽

...

Keyword(s):

Big Data ◽

Memory Management

Memory Management for Big Data Mining – Cache Hit Rate Estimation of LessFU

Procedia Technology ◽

10.1016/j.protcy.2014.10.214 ◽

2014 ◽

Vol 17 ◽

pp. 114-121

Author(s):

Kenichi Yoshida

Keyword(s):

Data Mining ◽

Big Data ◽

Memory Management ◽

Rate Estimation ◽

Hit Rate ◽

Big Data Mining

Speculative region-based memory management for big data systems

Proceedings of the 8th Workshop on Programming Languages and Operating Systems - PLOS '15 ◽

10.1145/2818302.2818308 ◽

2015 ◽

Cited By ~ 3

Author(s):

Khanh Nguyen ◽

Lu Fang ◽

Guoqing Xu ◽

Brian Demsky

Keyword(s):

Big Data ◽

Memory Management ◽

Data Systems ◽

Big Data Systems

PFB+_ Tree For Big Data Memory Management System

Indian Journal of Public Health Research & Development ◽

10.5958/0976-5506.2018.00666.6 ◽

2018 ◽

Vol 9 (6) ◽

pp. 531

Author(s):

K Santhi ◽

T T Chellatamilan ◽

B Valarmathi

Keyword(s):

Big Data ◽

Management System ◽

Memory Management ◽

Data Memory

Panthera: holistic memory management for big data processing over hybrid memories

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2019 ◽

10.1145/3314221.3314650 ◽

2019 ◽

Cited By ~ 4

Author(s):

Chenxi Wang ◽

Huimin Cui ◽

Ting Cao ◽

John Zigman ◽

Haris Volos ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

Memory Management ◽

Big Data Processing

BigData Analysis in Healthcare: Apache Hadoop, Apache Spark and Apache Flink

Frontiers in Health Informatics ◽

10.30699/fhi.v8i1.180 ◽

2019 ◽

Vol 8 (1) ◽

pp. 14 ◽

Cited By ~ 1

Author(s):

Elham Nazari ◽

Mohammad Hasan Shahriari ◽

Hamed Tabesh

Keyword(s):

Big Data ◽

Error Detection ◽

Memory Management ◽

High Speed ◽

Scientific Information ◽

High Volume ◽

Apache Spark ◽

Data Set ◽

Apache Hadoop ◽

The Subject

Introduction: Health care data is increasing, and correct analysis of such data will improve the quality of care and reduce costs. Such data has features such as high volume, variety, and high-speed production, which make it impossible to analyze with ordinary hardware and software platforms. Choosing the right platform for managing this kind of data is therefore very important. The purpose of this study is to introduce and compare the most popular and widely used platform for processing big data, Apache Hadoop MapReduce, with two platforms that have recently gained great prominence, Apache Spark and Apache Flink.

Material and Methods: This study is a survey based on searches of the ProQuest, PubMed, Google Scholar, Science Direct, Scopus, IranMedex, Irandoc, Magiran, ParsMedline and Scientific Information Database (SID) databases, as well as web reviews and specialized books, using related and standard keywords. Finally, 80 articles related to the subject of the study were reviewed.

Results: The findings showed that each of the studied platforms can be characterized by features such as data processing, support for different languages, processing speed, computational model, memory management, optimization, delay, error tolerance, scalability, performance, compatibility, and security. Overall, Apache Hadoop offers simplicity, error detection, and cluster-based scalability management, but because it is based on batch processing, it is slow for complex analyses and does not support stream processing. Apache Spark is a distributed computational platform that can process a big data set in memory with a very fast response time. Apache Flink allows users to store data in memory and load it multiple times, and provides a sophisticated fault tolerance mechanism that continuously retrieves the state of the data flow.

Conclusion: The application of big data analysis and processing platforms varies according to need. In other words, the technologies are complementary: each is applicable in a particular field and cannot be considered separately from the others. Depending on the purpose and expectations, one must either select a platform for analysis or decide whether to design custom tools on these platforms.
