Object Storage
Recently Published Documents

TOTAL DOCUMENTS: 184 (five years: 44)
H-INDEX: 10 (five years: 2)

Author(s): Yaoguang Huo, Junfeng Ma, Hui Li, Xin Yang, Han Wang, et al.

2021
Author(s): Germán T. Eizaguirre, Marc Sánchez-Artigas, Pedro García-López

2021, Vol. 11 (18), pp. 8540
Author(s): Frank Gadban, Julian Kunkel

The line between HPC and Cloud is getting blurry: performance is still the main driver in HPC, while cloud storage systems are assumed to offer low latency, high throughput, high availability, and scalability. The Simple Storage Service (S3) has emerged as the de facto storage API for object storage in the Cloud. This paper examines whether the S3 API is already a viable alternative for HPC access patterns in terms of performance, or whether further performance advancements are necessary. For this purpose: (a) We extend two common HPC I/O benchmarks, the IO500 and MD-Workbench, to quantify the performance of the S3 API. We perform the analysis on the Mistral supercomputer by launching the enhanced benchmarks against different S3 implementations: on-premises (Swift, MinIO) and in the Cloud (Google, IBM…). We find that these implementations do not yet meet the demanding performance and scalability expectations of HPC workloads. (b) We identify the cause of the performance loss by systematically replacing parts of a popular S3 client library with lightweight replacements of lower stack components. The resulting S3Embedded library is highly scalable and leverages the shared cluster file systems of HPC infrastructure to accommodate arbitrary S3 client applications. A second library, S3remote, uses TCP/IP instead of HTTP for communication and provides a single local S3 gateway on each node. By broadening the scope of the IO500, this research enables the community to track the performance growth of S3 and encourages the sharing of best practices for performance optimization. The analysis also shows that storage-level performance of Cloud and HPC can converge over time when a high-performance S3 library such as S3Embedded is used.
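The paper's measurements come from extended IO500 and MD-Workbench runs, which are not reproduced here. As a minimal sketch of the kind of measurement involved, the snippet below times small-object PUT/GET round trips against an S3 endpoint with boto3; the endpoint URL, credentials, bucket name, and object sizes are placeholders, and this is not the authors' benchmark code.

```python
"""Minimal sketch of an S3 small-object latency probe (not the paper's IO500/MD-Workbench extension)."""
import os
import time

import boto3  # standard AWS SDK S3 client

ENDPOINT = "http://localhost:9000"   # hypothetical on-premises S3 endpoint (e.g. MinIO)
BUCKET = "latency-test"              # placeholder bucket, assumed to already exist
OBJECT_SIZE = 64 * 1024              # 64 KiB objects, a typical small-I/O size
N_OBJECTS = 100

s3 = boto3.client(
    "s3",
    endpoint_url=ENDPOINT,
    aws_access_key_id=os.environ.get("S3_ACCESS_KEY", "minioadmin"),
    aws_secret_access_key=os.environ.get("S3_SECRET_KEY", "minioadmin"),
)
payload = os.urandom(OBJECT_SIZE)

put_times, get_times = [], []
for i in range(N_OBJECTS):
    key = f"probe/{i:06d}"

    t0 = time.perf_counter()
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    put_times.append(time.perf_counter() - t0)

    t0 = time.perf_counter()
    s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    get_times.append(time.perf_counter() - t0)

print(f"mean PUT latency: {sum(put_times) / N_OBJECTS * 1e3:.2f} ms")
print(f"mean GET latency: {sum(get_times) / N_OBJECTS * 1e3:.2f} ms")
```

Pointing the same probe at an on-premises service and at a cloud endpoint gives a first impression of the per-request overhead that the paper attributes largely to the HTTP-based client stack.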


2021, Vol. 18 (8), pp. 109-120
Author(s): Haiyang Yu, Hui Li, Xin Yang, Huajun Ma

2021, Vol. 23 (05), pp. 791-796
Author(s): Rahul Jyoti, Saumitra Kulkarni, Kirti Wanjale, et al.

This paper surveys the data storage architectures used today for data of various types and presents an in-depth analysis of their use cases and drawbacks in different scenarios. We examine the limitations of traditional storage architectures and discuss modern solutions to these problems. The survey provides a detailed comparison of three storage architectures: file storage, block storage, and object storage. It gives readers sufficient information to choose among these architectures for their data according to the use case.
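To make the contrast between two of the surveyed architectures concrete, the sketch below compares the file-storage access model (hierarchical paths, in-place byte-range updates) with the object-storage model (flat keys, whole-object PUT/GET). The paths, bucket name, and endpoint are placeholders; block storage is not shown because it exposes raw blocks below the file-system layer rather than an application-level API.

```python
"""Sketch contrasting file-storage and object-storage access models."""
from pathlib import Path

import boto3

# --- File storage: hierarchical paths, partial in-place updates are possible ---
path = Path("data/reports/2021/summary.txt")      # placeholder path
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text("draft report\n")
with path.open("r+") as f:
    f.seek(0)
    f.write("final")                               # overwrite a byte range in place

# --- Object storage: flat keys, whole-object PUT/GET, no in-place update ---
s3 = boto3.client("s3", endpoint_url="http://localhost:9000")   # placeholder endpoint
key = "reports/2021/summary.txt"   # "/" is only a naming convention, not a directory
s3.put_object(Bucket="my-bucket", Key=key, Body=b"final report\n")
body = s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()
# Modifying an object means re-uploading it in full:
s3.put_object(Bucket="my-bucket", Key=key, Body=body + b"appendix\n")
```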


2021
Author(s): Marco Kulüke, Fabian Wachsmann, Georg Leander Siemund, Hannes Thiemann, Stephan Kindermann

This study provides guidance to data providers on how to transfer existing netCDF data from a hierarchical storage system to Zarr on an object storage system.

In recent years, object storage systems have become an alternative to traditional hierarchical file systems because they are easily scalable and offer faster data retrieval than hierarchical storage systems.

The Earth system sciences, and climate science in particular, handle large amounts of data. These data are usually represented as multi-dimensional arrays and traditionally stored in netCDF format on hierarchical file systems. However, the current netCDF-4 format is not yet optimized for object storage systems: netCDF data on an object store can only be transferred at the file level, which results in heavy download volumes. The Zarr format can mitigate this problem; because chunks and metadata are accessed directly, it reduces data transfers and increases input/output speed in parallel computing environments.

As one of the largest climate data providers worldwide, the German Climate Computing Center (DKRZ) continuously works towards efficient ways to make data accessible to users. This use case shows the conversion and transfer of a subset of the Coupled Model Intercomparison Project Phase 6 (CMIP6) climate data archive from netCDF on the hierarchical file system to Zarr on the OpenStack object store, known as Swift, using the Zarr Python package. Finally, the study evaluates to what extent Zarr-formatted climate data on an object storage system is a meaningful addition to the existing high-performance computing environment of the DKRZ.
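A minimal sketch of this kind of conversion with the Zarr Python stack is shown below. The file name, chunk sizes, bucket/prefix, and endpoint are placeholders, and an S3-compatible endpoint is assumed for simplicity; the actual DKRZ pipeline targets Swift and may differ in detail.

```python
"""Sketch of a netCDF-to-Zarr conversion targeting an object store (placeholders throughout)."""
import fsspec          # maps object-store keys to a store Zarr can write to
import xarray as xr    # reads netCDF and writes Zarr (uses dask for chunked I/O)

# Open the netCDF source lazily, with chunks chosen to match the intended Zarr chunks.
ds = xr.open_dataset("tas_Amon_CMIP6_subset.nc", chunks={"time": 120})

# Map a location in the object store to a key-value store.
# An S3-compatible endpoint is assumed here; Swift can be addressed analogously.
store = fsspec.get_mapper(
    "s3://cmip6-zarr/tas_Amon_subset.zarr",                        # placeholder bucket/prefix
    client_kwargs={"endpoint_url": "https://objectstore.example.org"},  # placeholder endpoint
)

# Each Zarr chunk becomes one object, so clients can later fetch individual chunks
# and the consolidated metadata instead of downloading whole netCDF files.
ds.to_zarr(store, mode="w", consolidated=True)
```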

