Reducing Write Amplification for Inodes of Journaling File System using Persistent Memory

Author(s):  
Chaoshu Yang ◽  
Duo Liu ◽  
Xianzhang Chen ◽  
Runyu Zhang ◽  
Wenbin Wang ◽  
...  
2021 ◽  
Vol 17 (3) ◽  
pp. 1-25
Author(s):  
Bohong Zhu ◽  
Youmin Chen ◽  
Qing Wang ◽  
Youyou Lu ◽  
Jiwu Shu

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and their heavily layered software designs leave high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus+, which redesigns file system internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory copying overhead, and performs all data fetching and pushing in clients to rebalance the load between the server and the network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between the file system and networking layers, and an efficient distributed transaction mechanism for consistency. Octopus+ also supports replication to provide better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.


2021 ◽  
Vol 14 (10) ◽  
pp. 1872-1885
Author(s):  
Baoyue Yan ◽  
Xuntao Cheng ◽  
Bo Jiang ◽  
Shibin Chen ◽  
Canfang Shang ◽  
...  

The recent byte-addressable and large-capacity commercialized persistent memory (PM) is promising to drive database as a service (DBaaS) into uncharted territories. This paper investigates how to leverage PM to revisit the conventional LSM-tree based OLTP storage engines designed for the DRAM-SSD hierarchy for DBaaS instances. Specifically, we (1) propose a lightweight PM allocator named Halloc customized for the LSM-tree, (2) build a high-performance Semi-persistent Memtable utilizing the persistent in-memory writes of PM, (3) design a concurrent commit algorithm named Reorder Ring to achieve log-free transaction processing for OLTP workloads, and (4) present a Global Index as the new globally sorted persistent level with non-blocking in-memory compaction. The design of Reorder Ring and Semi-persistent Memtable achieves fast writes without synchronized logging overheads and achieves near-instant recovery time. Moreover, the design of Semi-persistent Memtable and Global Index with in-memory compaction enables byte-addressable persistent levels in PM, which significantly reduces read and write amplification as well as background compaction overheads. The overall evaluation shows that our proposal over the PM-SSD hierarchy outperforms the baseline by up to 3.8x in the YCSB benchmark and by 2x in the TPC-C benchmark.


2018 ◽  
Vol 67 (7) ◽  
pp. 1023-1038
Author(s):  
Tseng-Yi Chen ◽  
Yuan-Hao Chang ◽  
Shuo-Han Chen ◽  
Chih-Ching Kuo ◽  
Ming-Chang Yang ◽  
...  

2021 ◽  
Vol 17 (3) ◽  
pp. 1-26
Author(s):  
Baoquan Zhang ◽  
David H. C. Du

Computer systems utilizing byte-addressable Non-Volatile Memory (NVM) as memory/storage can provide low-latency data persistence. The widely used key-value stores based on the Log-Structured Merge Tree (LSM-Tree) are still beneficial for NVM systems in terms of space and write efficiency. However, the significant write amplification introduced by the leveled compaction of LSM-Tree degrades the write performance of the key-value store and shortens the lifetime of the NVM devices. Existing studies propose new compaction methods to reduce write amplification. Unfortunately, they result in a relatively large read amplification. In this article, we propose NVLSM, a key-value store for NVM systems using LSM-Tree with a new accumulative compaction. By fully utilizing the byte-addressability of NVM, accumulative compaction uses pointers to accumulate data into multiple floors in a logically sorted run to reduce the number of compactions required. We also propose a cascading searching scheme for reads among the multiple floors to reduce read amplification. As a result, NVLSM reduces write amplification with only small increases in read amplification. We compare NVLSM with key-value stores using LSM-Tree with two other compaction methods: leveled compaction and fragmented compaction. Our evaluations show that NVLSM reduces write amplification by up to 67% compared with LSM-Tree using leveled compaction without significantly increasing the read amplification. In write-intensive workloads, NVLSM reduces the average latency by 15.73%–41.2% compared to other key-value stores.
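
To make the "up to 67%" figure above concrete, the following minimal sketch treats write amplification as the ratio of bytes physically written to the device to bytes the application asked to write, which is the usual definition. The function names and the byte counts are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def write_amplification(device_bytes: int, user_bytes: int) -> float:
    """Ratio of bytes written to the device to bytes written by the application."""
    if user_bytes <= 0:
        raise ValueError("user_bytes must be positive")
    return device_bytes / user_bytes


def after_reduction(baseline_wa: float, reduction: float) -> float:
    """Write amplification remaining after a fractional reduction (0.67 means 67%)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_wa * (1.0 - reduction)


# Hypothetical workload: 1 GiB of user writes causes 30 GiB of device writes.
GIB = 2**30
baseline = write_amplification(device_bytes=30 * GIB, user_bytes=1 * GIB)

# Apply the best-case 67% reduction quoted in the abstract to that baseline.
reduced = after_reduction(baseline, 0.67)

print(f"baseline WA = {baseline:.1f}, after a 67% reduction = {reduced:.1f}")
```

Under these assumed numbers, each user byte would cost roughly 10 device bytes instead of 30, which is why a reduction of this size matters for both write throughput and device lifetime.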


2013 ◽  
Vol 664 ◽  
pp. 1050-1054
Author(s):  
Jun Wang ◽  
Ge Huang

The overhead of metadata journaling consists of extra space and the performance degradation caused by frequent journal data flushes. A remote journal scheme based on a journaling file system was designed and implemented. The scheme moves the frequent journal I/O activity from the local disk to a remote server. According to the experiments in this paper, the remote journal improves performance by about 8% to 19%, and the penalty is light: although it needs more CPU time for network transfer, that overhead is less than 8%, and the extra network bandwidth it consumes is less than 6% in metadata-bound workloads.

