An efficient cache management scheme for accessing small files in Distributed File Systems

Due to the large data volumes and low-latency requirements of modern web services, an in-memory key-value (KV) cache such as Redis or Memcached is often an inevitable choice. The in-memory cache holds hot data, reduces request latency, and alleviates the load on back-end databases. Inheriting from traditional hardware cache design, many existing KV cache systems still use recency-based replacement algorithms, e.g., least recently used (LRU) or its approximations. However, the diversity of miss penalties distinguishes a KV cache from a hardware cache, and inadequate consideration of penalty can substantially compromise space utilization and request service time. KV accesses also exhibit locality, which must be coordinated with miss penalty to guide cache management. In this article, we first discuss how to enhance an existing cache model, the Average Eviction Time (AET) model, so that it can model a KV cache. We then apply the model to Redis and propose pRedis (Penalty- and Locality-aware Memory Allocation in Redis), which quantitatively combines data locality and miss penalty to guide memory allocation and replacement in Redis. We also explore the diurnal behavior of a KV store and exploit long-term reuse, replacing the original passive eviction mechanism with an automatic dump/load mechanism that smooths the transition between access peaks and valleys. Our evaluation shows that pRedis effectively reduces average and tail access latency with minimal time and space overhead. For both real-world and synthetic workloads, our approach reduces latency by 14.0%∼52.3% on average compared with a state-of-the-art penalty-aware cache management scheme, Hyperbolic Caching (HC), and its performance is more quantitatively predictable. Moreover, dynamically switching policies between pRedis and HC lowers average latency by a further 1.1%∼5.5%.
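The idea of combining locality with miss penalty can be made concrete with a small sketch. This follows the basic AET model for LRU as we understand it, not the enhanced version described above: P(t) is the fraction of accesses whose reuse time exceeds t (first accesses count as infinite), the average eviction time T of a c-item cache is the smallest T with the sum of P(t) over t < T reaching c, and the miss ratio is P(T). On top of the resulting miss-ratio curves, the hypothetical `allocate` function splits a fixed number of slots among request classes so as to minimize expected total penalty (miss ratio × accesses × per-miss penalty). The class names, traces, penalties, and the assumption that each class gets its own LRU partition are illustrative only; pRedis's actual allocation and replacement logic may differ.

```python
from collections import Counter


def aet_mrc(trace, max_size):
    """LRU miss ratio for cache sizes 0..max_size (in items), per the basic AET model."""
    n = len(trace)
    if n == 0:
        return [0.0] * (max_size + 1)
    # Reuse-time histogram: rt = distance (in accesses) to the previous access of the same key.
    last_seen, hist, cold = {}, Counter(), 0
    for i, key in enumerate(trace):
        if key in last_seen:
            hist[i - last_seen[key]] += 1
        else:
            cold += 1  # first access: reuse time treated as infinite
        last_seen[key] = i
    max_rt = max(hist, default=0)
    # remaining = number of accesses with reuse time > t, so P(t) = remaining / n.
    mrc, remaining, filled, t = [], n, 0.0, 0
    for c in range(max_size + 1):
        # Advance t until sum_{k<t} P(k) >= c; that t is AET(c).
        # Past max_rt, P(t) stays at cold / n, which is then the answer.
        while filled < c and t <= max_rt:
            filled += remaining / n
            t += 1
            remaining -= hist.get(t, 0)
        mrc.append(remaining / n)  # miss ratio = P(AET(c))
    return mrc


def allocate(classes, total_slots):
    """Split total_slots among classes {name: (trace, miss_penalty)} to minimize
    expected total miss penalty, assuming each class gets its own LRU partition."""
    costs = {}
    for name, (trace, penalty) in classes.items():
        mrc = aet_mrc(trace, total_slots)
        costs[name] = [m * len(trace) * penalty for m in mrc]
    # dp: slots used so far -> (lowest cost found, allocation achieving it)
    dp = {0: (0.0, {})}
    for name in classes:
        nxt = {}
        for used, (cost, alloc) in dp.items():
            for c in range(total_slots - used + 1):
                cand = (cost + costs[name][c], {**alloc, name: c})
                if used + c not in nxt or cand[0] < nxt[used + c][0]:
                    nxt[used + c] = cand
        dp = nxt
    return min(dp.values(), key=lambda entry: entry[0])[1]


trace_a = list("abc") * 4  # cyclic scan over 3 keys
trace_b = list("xy") * 6   # cyclic scan over 2 keys
print(aet_mrc(trace_a, 4))  # [1.0, 1.0, 1.0, 0.25, 0.25]
print(allocate({"A": (trace_a, 10.0), "B": (trace_b, 1.0)}, 3))  # {'A': 3, 'B': 0}
print(allocate({"A": (trace_a, 1.0), "B": (trace_b, 1.0)}, 3))
```

With per-miss penalties of 10 and 1, all three slots go to class A; with equal penalties, class B receives at least two. A recency-only policy sees the same access streams in both cases and cannot express that difference. A small dynamic program is used instead of a greedy marginal-gain loop because cyclic traces produce cliff-shaped miss-ratio curves on which greedy allocation can stall.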


A validated performance model for distributed file systems

Journal of Systems and Software, Vol 10 (3), pp. 169-185, 1989. DOI: 10.1016/0164-1212(89)90030-7

Cited By ~ 1

Author(s): Anna Hać

Keyword(s): File Systems; Performance Model; Distributed File Systems


PABIRS: A data access middleware for distributed file systems

2015 IEEE 31st International Conference on Data Engineering (ICDE), 2015. DOI: 10.1109/icde.2015.7113277

Cited By ~ 1

Author(s): Sai Wu, Gang Chen, Xianke Zhou, Zhenjie Zhang, Anthony K. H. Tung, ...

Keyword(s): File Systems; Data Access; Distributed File Systems


Distributed File Systems

Proceedings Thirteenth IEEE Symposium on Mass Storage Systems. Toward Distributed Storage and Data Management Systems, 1994. DOI: 10.1109/mass.1994.373020

Author(s): R. Watson

Keyword(s): File Systems; Distributed File Systems


ALDM: Adaptive Loading Data Migration in Distributed File Systems

IEEE Transactions on Magnetics, Vol 49 (6), pp. 2645-2652, 2013. DOI: 10.1109/tmag.2013.2251616

Cited By ~ 2

Author(s): Zhipeng Tan, Wei Zhou, Dan Feng, Wenhua Zhang

Keyword(s): File Systems; Data Migration; Distributed File Systems


Performance Evaluations of Distributed File Systems for Scientific Big Data in FUSE Environment

Electronics, Vol 10 (12), pp. 1471, 2021. DOI: 10.3390/electronics10121471

Author(s): Jun-Yeong Lee, Moon-Hyun Kim, Syed Asif Raza Shah, Sang-Un Ahn, Heejun Yoon, ...

Keyword(s): Data Storage; Scale Up; File Systems; Performance Evaluations; Distributed File Systems; Data Intensive Computing; Data Intensive; Tremendous Amount; Computing Environments; And Performance

Data are important and ever-growing in data-intensive scientific environments. This growth in research data requires storage systems that play pivotal roles in data management and analysis for scientific discovery. Redundant Array of Independent Disks (RAID), a well-known storage technology that combines multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, RAID requires RAID-capable hardware or software to build a RAID-enabled disk array, and RAID-based storage is difficult to scale up. To mitigate this problem, many distributed file systems have been developed and are actively used in various environments, especially in data-intensive computing facilities that must handle tremendous amounts of data. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre, and EOS, for data-intensive environments. In our experiments, we configured the distributed file systems in a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which must be considered in data-intensive computing environments.
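The read and write measurements described above can be approximated with a simple sequential-throughput probe run against each file system's FUSE mount point. This is a minimal sketch, not the study's benchmark tool: the `seq_throughput` function, the scratch-file name, and the file and block sizes are illustrative assumptions. Varying the block size is one way to observe how data features such as I/O size change the results.

```python
import os
import tempfile
import time


def seq_throughput(directory, file_size=64 * 1024 * 1024, block_size=1024 * 1024):
    """Write then read one file sequentially in `directory`; return (write_MBps, read_MBps)."""
    path = os.path.join(directory, "dfs_bench.tmp")  # hypothetical scratch-file name
    block = os.urandom(block_size)
    n_blocks = max(file_size // block_size, 1)
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # ask the file system to persist data before stopping the clock
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_size):
                pass
        read_s = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)
    megabytes = n_blocks * block_size / 1e6
    return megabytes / max(write_s, 1e-9), megabytes / max(read_s, 1e-9)


if __name__ == "__main__":
    # Replace with a FUSE mount point, e.g. a CephFS or GlusterFS client mount.
    target = tempfile.gettempdir()
    for bs in (4 * 1024, 1024 * 1024):
        w, r = seq_throughput(target, file_size=16 * 1024 * 1024, block_size=bs)
        print(f"block={bs:>8} B  write={w:8.1f} MB/s  read={r:8.1f} MB/s")
```

Because the read pass immediately follows the write, it may be served from the client's page cache; a fair comparison between file systems would use files larger than client memory or drop caches between passes.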


Load-aware Adaptive Cache Management Scheme for Enterprise-level Stackable Cryptographic File System

2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2020. DOI: 10.1109/hpcc-smartcity-dss50907.2020.00006

Author(s): Chunhua Xiao, Yanyue Pan, Dandan Xu, Weichen Liu, Shuting Sun, ...

Keyword(s): File System; Cache Management; Management Scheme; Enterprise Level; Adaptive Cache


Octopus+: An RDMA-Enabled Distributed Persistent Memory File System

ACM Transactions on Storage, Vol 17 (3), pp. 1-25, 2021. DOI: 10.1145/3448418

Author(s): Bohong Zhu, Youmin Chen, Qing Wang, Youyou Lu, Jiwu Shu

Keyword(s): High Speed; High Performance; File System; Direct Memory Access; File Systems; Distributed File Systems; Persistent Memory; Memory Modules; Non Volatile Memory; Volatile Memory

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and their heavily layered software designs leave high-speed hardware under-exploited. In this article, we propose Octopus+, an RDMA-enabled distributed persistent memory file system that redesigns file system internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory-copying overhead, and clients actively fetch and push data themselves to rebalance the load between the server and the network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between the file system and networking, and an efficient distributed transaction mechanism for consistency. Octopus+ also supports replication to provide better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves close to the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.
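The self-identified RPC idea can be illustrated with a pure-Python simulation that needs no RDMA hardware. As we understand the original Octopus design, a client sends a request with an RDMA write-with-immediate (`IBV_WR_RDMA_WRITE_WITH_IMM` in libibverbs) and places its identity in the 32-bit immediate field, so the server learns who sent a message, and where it landed, directly from the completion event instead of scanning every client's buffer. Octopus+ may differ in detail; the 16/16 bit split, `ServerSim`, and the other names below are hypothetical.

```python
from collections import deque

NODE_BITS = 16              # hypothetical split of the 32-bit immediate field
SLOT_BITS = 32 - NODE_BITS  # remaining bits index the sender's message slot


def pack_imm(node_id: int, slot: int) -> int:
    """Encode sender identity and message slot into one 32-bit immediate value."""
    if not (0 <= node_id < (1 << NODE_BITS) and 0 <= slot < (1 << SLOT_BITS)):
        raise ValueError("node_id or slot out of range")
    return (node_id << SLOT_BITS) | slot


def unpack_imm(imm: int) -> tuple[int, int]:
    """Recover (node_id, slot) from an immediate value."""
    return imm >> SLOT_BITS, imm & ((1 << SLOT_BITS) - 1)


class ServerSim:
    """Toy server: per-client message buffers plus a completion queue of immediates."""

    def __init__(self, num_nodes: int, slots_per_node: int):
        self.buffers = [[None] * slots_per_node for _ in range(num_nodes)]
        self.completions = deque()

    def write_with_imm(self, node_id: int, slot: int, payload: bytes) -> None:
        # Stands in for a one-sided RDMA write into the server's buffer...
        self.buffers[node_id][slot] = payload
        # ...whose completion event carries the immediate value to the receiver.
        self.completions.append(pack_imm(node_id, slot))

    def poll(self):
        """Return (sender, payload) for the next message without scanning buffers."""
        node_id, slot = unpack_imm(self.completions.popleft())
        return node_id, self.buffers[node_id][slot]


server = ServerSim(num_nodes=4, slots_per_node=8)
server.write_with_imm(3, 5, b"mkdir /a")
print(server.poll())  # (3, b'mkdir /a')
```

The point of the design is that sender lookup costs one bit-unpack per completion, independent of how many clients share the server.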


Design and Implementation of a Metadata Management Scheme for Large Distributed File Systems

HCCache: A Hybrid Client-Side Cache Management Scheme for I/O-intensive Workloads in Network-Based File Systems

Penalty- and Locality-aware Memory Allocation in Redis Using Enhanced AET
