Object Storage
Recently Published Documents

TOTAL DOCUMENTS: 184 (five years: 44)
H-INDEX: 10 (five years: 2)

Author(s): Yaoguang Huo, Junfeng Ma, Hui Li, Xin Yang, Han Wang, et al.

2021
Author(s): Germán T. Eizaguirre, Marc Sánchez-Artigas, Pedro García-López

2021, Vol. 11 (18), pp. 8540
Author(s): Frank Gadban, Julian Kunkel

The line between HPC and Cloud is getting blurry: performance is still the main driver in HPC, while cloud storage systems are assumed to offer low latency, high throughput, high availability, and scalability. The Simple Storage Service (S3) has emerged as the de facto storage API for object storage in the Cloud. This paper examines whether the S3 API is already a viable alternative for HPC access patterns in terms of performance, or whether further performance advancements are necessary. For this purpose: (a) We extend two common HPC I/O benchmarks, the IO500 and MD-Workbench, to quantify the performance of the S3 API. We perform the analysis on the Mistral supercomputer by launching the enhanced benchmarks against different S3 implementations: on-premises (Swift, MinIO) and in the Cloud (Google, IBM…). We find that these implementations do not yet meet the demanding performance and scalability expectations of HPC workloads. (b) We identify the cause of the performance loss by systematically replacing parts of a popular S3 client library with lightweight replacements of lower stack components. The resulting S3Embedded library is highly scalable and leverages the shared cluster file systems of HPC infrastructure to accommodate arbitrary S3 client applications. A second library, S3remote, uses TCP/IP instead of HTTP for communication and provides a single local S3 gateway on each node. By broadening the scope of the IO500, this research enables the community to track the performance growth of S3 and encourages the sharing of best practices for performance optimization. The analysis also shows that storage-level performance of Cloud and HPC can converge over time when a high-performance S3 library such as S3Embedded is used.
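The paper's measurements come from extended IO500 and MD-Workbench runs, which are not reproduced here. As a minimal sketch of the kind of measurement involved, the snippet below times small-object PUT/GET round trips against an S3 endpoint with boto3; the endpoint URL, credentials, bucket name, and object sizes are placeholders, and this is not the authors' benchmark code.

```python
"""Minimal sketch of an S3 small-object latency probe (not the paper's IO500/MD-Workbench extension)."""
import os
import time

import boto3  # standard AWS SDK S3 client

ENDPOINT = "http://localhost:9000"   # hypothetical on-premises S3 endpoint (e.g. MinIO)
BUCKET = "latency-test"              # placeholder bucket, assumed to already exist
OBJECT_SIZE = 64 * 1024              # 64 KiB objects, a typical small-I/O size
N_OBJECTS = 100

s3 = boto3.client(
    "s3",
    endpoint_url=ENDPOINT,
    aws_access_key_id=os.environ.get("S3_ACCESS_KEY", "minioadmin"),
    aws_secret_access_key=os.environ.get("S3_SECRET_KEY", "minioadmin"),
)
payload = os.urandom(OBJECT_SIZE)

put_times, get_times = [], []
for i in range(N_OBJECTS):
    key = f"probe/{i:06d}"

    t0 = time.perf_counter()
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    put_times.append(time.perf_counter() - t0)

    t0 = time.perf_counter()
    s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    get_times.append(time.perf_counter() - t0)

print(f"mean PUT latency: {sum(put_times) / N_OBJECTS * 1e3:.2f} ms")
print(f"mean GET latency: {sum(get_times) / N_OBJECTS * 1e3:.2f} ms")
```

Pointing the same probe at an on-premises service and at a cloud endpoint gives a first impression of the per-request overhead that the paper attributes largely to the HTTP-based client stack.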


2021, Vol. 18 (8), pp. 109-120
Author(s): Haiyang Yu, Hui Li, Xin Yang, Huajun Ma

2021, Vol. 23 (05), pp. 791-796
Author(s): Rahul Jyoti, Saumitra Kulkarni, Kirti Wanjale, et al.

This paper surveys the data storage architectures used today for data of various types and presents an in-depth analysis of their use cases and drawbacks in different scenarios. We examine the limitations of traditional storage architectures and discuss modern solutions to these problems. The survey provides a detailed comparison of three storage architectures: file storage, block storage, and object storage. It gives readers sufficient information to choose among these architectures for their data according to the use case.
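To make the contrast between two of the surveyed architectures concrete, the sketch below compares the file-storage access model (hierarchical paths, in-place byte-range updates) with the object-storage model (flat keys, whole-object PUT/GET). The paths, bucket name, and endpoint are placeholders; block storage is not shown because it exposes raw blocks below the file-system layer rather than an application-level API.

```python
"""Sketch contrasting file-storage and object-storage access models."""
from pathlib import Path

import boto3

# --- File storage: hierarchical paths, partial in-place updates are possible ---
path = Path("data/reports/2021/summary.txt")      # placeholder path
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text("draft report\n")
with path.open("r+") as f:
    f.seek(0)
    f.write("final")                               # overwrite a byte range in place

# --- Object storage: flat keys, whole-object PUT/GET, no in-place update ---
s3 = boto3.client("s3", endpoint_url="http://localhost:9000")   # placeholder endpoint
key = "reports/2021/summary.txt"   # "/" is only a naming convention, not a directory
s3.put_object(Bucket="my-bucket", Key=key, Body=b"final report\n")
body = s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()
# Modifying an object means re-uploading it in full:
s3.put_object(Bucket="my-bucket", Key=key, Body=body + b"appendix\n")
```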


2021
Author(s): Marco Kulüke, Fabian Wachsmann, Georg Leander Siemund, Hannes Thiemann, Stephan Kindermann

This study provides guidance to data providers on how to transfer existing netCDF data from a hierarchical storage system to Zarr on an object storage system.

In recent years, object storage systems have become an alternative to traditional hierarchical file systems because they are easily scalable and offer faster data retrieval than hierarchical storage systems.

The Earth system sciences, and climate science in particular, handle large amounts of data. These data are usually represented as multi-dimensional arrays and traditionally stored in netCDF format on hierarchical file systems. However, the current netCDF-4 format is not yet optimized for object storage systems: netCDF data on an object store can only be transferred at the file level, which results in heavy download volumes. The Zarr format can mitigate this problem; because chunks and metadata are accessed directly, it reduces data transfers and increases input/output speed in parallel computing environments.

As one of the largest climate data providers worldwide, the German Climate Computing Center (DKRZ) continuously works towards efficient ways to make data accessible to users. This use case shows the conversion and transfer of a subset of the Coupled Model Intercomparison Project Phase 6 (CMIP6) climate data archive from netCDF on the hierarchical file system to Zarr on the OpenStack object store, known as Swift, using the Zarr Python package. Finally, the study evaluates to what extent Zarr-formatted climate data on an object storage system is a meaningful addition to the existing high-performance computing environment of the DKRZ.
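A minimal sketch of this kind of conversion with the Zarr Python stack is shown below. The file name, chunk sizes, bucket/prefix, and endpoint are placeholders, and an S3-compatible endpoint is assumed for simplicity; the actual DKRZ pipeline targets Swift and may differ in detail.

```python
"""Sketch of a netCDF-to-Zarr conversion targeting an object store (placeholders throughout)."""
import fsspec          # maps object-store keys to a store Zarr can write to
import xarray as xr    # reads netCDF and writes Zarr (uses dask for chunked I/O)

# Open the netCDF source lazily, with chunks chosen to match the intended Zarr chunks.
ds = xr.open_dataset("tas_Amon_CMIP6_subset.nc", chunks={"time": 120})

# Map a location in the object store to a key-value store.
# An S3-compatible endpoint is assumed here; Swift can be addressed analogously.
store = fsspec.get_mapper(
    "s3://cmip6-zarr/tas_Amon_subset.zarr",                        # placeholder bucket/prefix
    client_kwargs={"endpoint_url": "https://objectstore.example.org"},  # placeholder endpoint
)

# Each Zarr chunk becomes one object, so clients can later fetch individual chunks
# and the consolidated metadata instead of downloading whole netCDF files.
ds.to_zarr(store, mode="w", consolidated=True)
```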

