DERBY: a memory management system for distributed main memory databases

Author(s): J. Griffioen, T. Anderson, Y. Breitbart, R. Vingralek

Author(s): J. D. Sewall, S. J. Pennycook, A. Duran, X. Tian, R. Narayanaswamy

Author(s): G. Kornaros, I. Papaefstathiou, A. Nikologiannis, N. Zervos

2021, Vol 11 (18), pp. 8476
Author(s): June Choi, Jaehyun Lee, Jik-Soo Kim, Jaehwan Lee

In this paper, we present several optimization strategies that improve the overall performance of the distributed in-memory computing system Apache Spark. Despite its distributed memory management capability for iterative jobs and intermediate data, Spark suffers significant performance degradation when the available main memory (DRAM, typically used for data caching) is limited. To address this problem, we leverage an SSD (solid-state drive) to supplement the lack of main memory bandwidth. Specifically, we present an optimization methodology for Apache Spark that jointly investigates the effects of changing the capacity fractions of the shuffle and storage spaces in the Spark JVM heap configuration and of applying different RDD caching policies (e.g., SSD-backed memory caching). Our extensive experimental results show that the proposed optimization techniques improve overall performance by up to 42%.
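
To make the idea of capacity fractions in the JVM heap concrete, the following is a minimal sketch, not the authors' method. It assumes Spark's unified memory model, in which `spark.memory.fraction` (default 0.6) and `spark.memory.storageFraction` (default 0.5) divide the heap that remains after roughly 300 MB of reserved memory. The abstract does not say which Spark version or which settings were tuned, so the parameter choice here is an assumption, and the function name `heap_regions` is hypothetical.

```python
# Approximate memory Spark's unified memory manager reserves per executor (MB).
RESERVED_MB = 300


def heap_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate (storage_mb, execution_mb, user_mb) for one executor heap.

    Assumes the unified memory model: memory_fraction mirrors
    spark.memory.fraction and storage_fraction mirrors
    spark.memory.storageFraction. The storage/execution boundary is soft
    in Spark, so these are nominal sizes rather than hard limits.
    """
    if heap_mb <= RESERVED_MB:
        raise ValueError("heap must be larger than the reserved memory")
    usable = heap_mb - RESERVED_MB          # heap left after the reserved part
    unified = usable * memory_fraction      # shared by storage and execution
    storage = unified * storage_fraction    # nominal space for cached RDD blocks
    execution = unified - storage           # shuffles, joins, sorts, aggregations
    user = usable - unified                 # user data structures and metadata
    return storage, execution, user


if __name__ == "__main__":
    # Compare the default split with a storage-heavy split on a 4 GB heap.
    for sf in (0.5, 0.8):
        s, e, u = heap_regions(4096, storage_fraction=sf)
        print(f"storageFraction={sf}: storage={s:.0f} MB, "
              f"execution={e:.0f} MB, user={u:.0f} MB")
```

Raising the storage fraction leaves less nominal room for shuffle and other execution memory, which is the kind of trade-off the abstract describes. For the caching side, persisting an RDD with a disk-capable storage level such as `MEMORY_AND_DISK` while `spark.local.dir` points at an SSD is one way to approximate SSD-backed caching; whether this matches the paper's policy is an assumption.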

