Hierarchical Storage Systems and File Formats for Web Archiving

Author(s): Hiroyuki Kawano
Author(s): Phillip K.C. Tse

As shown in Chapter VI, data striping has been applied successfully to disks to reduce the time needed to access objects. The striping technique has likewise been investigated as a way to reduce the time needed to access objects stored in tape libraries. The parallel tape striping method applies striping directly, placing data stripes across multiple tapes. The triangular placement method changes the order in which data stripes are stored on the tapes to improve performance further. The next section describes the parallel tape striping method, followed by an analysis of its performance. The triangular placement method is then explained, followed by an analysis of its performance.
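To make the idea of parallel tape striping concrete, here is a minimal sketch that places stripes round-robin across tapes and estimates the time to read a whole object. It assumes each tape sits in its own drive, all drives read concurrently, and every stripe is full-sized; the names `TapeModel`, `stripe_round_robin`, and `access_time` and the drive parameters are hypothetical and are not taken from the chapter. It does not model the triangular placement method.

```python
import math
from dataclasses import dataclass


@dataclass
class TapeModel:
    # Hypothetical drive parameters, for illustration only.
    transfer_rate_mb_s: float  # sustained transfer rate of one drive (MB/s)
    reposition_s: float        # load + locate time before the first stripe is read (s)


def stripe_round_robin(num_stripes: int, num_tapes: int) -> list[list[int]]:
    """Assign stripe i to tape i mod num_tapes (round-robin parallel striping)."""
    placement: list[list[int]] = [[] for _ in range(num_tapes)]
    for stripe in range(num_stripes):
        placement[stripe % num_tapes].append(stripe)
    return placement


def access_time(object_mb: float, stripe_mb: float, num_tapes: int, model: TapeModel) -> float:
    """Estimated time to read a whole striped object.

    Assumes each tape is mounted in its own drive, all drives position and read
    in parallel, and every stripe is full-sized, so the tape holding the most
    stripes determines when the object is complete.
    """
    num_stripes = math.ceil(object_mb / stripe_mb)
    placement = stripe_round_robin(num_stripes, num_tapes)
    busiest = max(len(stripes) for stripes in placement)
    return model.reposition_s + busiest * stripe_mb / model.transfer_rate_mb_s


model = TapeModel(transfer_rate_mb_s=100.0, reposition_s=30.0)
print(access_time(1000, 100, 1, model))  # 40.0: all 10 stripes on one tape
print(access_time(1000, 100, 4, model))  # 33.0: the busiest of 4 tapes holds 3 stripes
```

Under these assumptions, striping shortens only the transfer portion; the positioning time is paid once in parallel, which is why the gain is smaller than the number of tapes.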


Author(s): Phillip K.C. Tse

The main objective of the tertiary storage level is to provide huge storage capacity at low cost. Several types of storage devices can be used at the tertiary storage level in Hierarchical Storage Systems (HSS). They include:
• Magnetic tapes
• Optical disks
• Optical tapes
These storage devices consist of fixed storage drives and removable media units. The storage drives are fixed to the computer system, whereas the media units can be removed from the drives, so the storage capacity can be expanded by adding more media units. When data on a media unit are accessed, the media unit is fetched from its normal storage location and one of the storage drives on the computer system is chosen. If that drive already holds a media unit, the old unit is unloaded and ejected, and the new media unit is then loaded into the drive. Each type of storage device may handle its drives and media units differently. Magnetic tapes are described in the next section. Optical tapes are presented next, and optical disks are briefly described before the chapter is summarized.
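The media exchange sequence described above (fetch the media unit, choose a drive, unload and eject any unit already in it, load the new one) can be sketched as a small simulation. The class `TertiaryLibrary` is hypothetical, and the round-robin choice of which occupied drive to reuse is an assumption made for illustration; the text does not specify a drive selection policy.

```python
from typing import Optional


class TertiaryLibrary:
    """Hypothetical model of fixed drives plus removable media units."""

    def __init__(self, num_drives: int):
        # None means the drive is empty.
        self.drives: list[Optional[str]] = [None] * num_drives
        self.exchanges = 0      # number of unload/eject + load cycles performed
        self._next_victim = 0   # assumed round-robin choice of drive to reuse

    def access(self, media_id: str) -> int:
        """Make media_id available in some drive and return that drive's index."""
        # Already mounted: no exchange is needed.
        if media_id in self.drives:
            return self.drives.index(media_id)
        if None in self.drives:
            # Prefer an empty drive.
            drive = self.drives.index(None)
        else:
            # All drives are occupied: the old media unit is unloaded and ejected.
            drive = self._next_victim
            self._next_victim = (self._next_victim + 1) % len(self.drives)
            self.exchanges += 1
        # Fetch the new media unit from its normal location and load it.
        self.drives[drive] = media_id
        return drive


lib = TertiaryLibrary(num_drives=2)
for media in ["A", "B", "A", "C"]:
    print(media, "->", lib.access(media))
print(lib.drives, lib.exchanges)  # ['C', 'B'] 1
```

Counting exchanges this way makes it easy to compare drive selection policies, since each exchange adds an unload, eject, and load delay to the access time.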


Author(s): Phillip K.C. Tse

We described contiguous placement in the previous chapter and the statistical strategy for placing objects on disks in Chapter IV. In this chapter, we describe the statistical strategy for placing objects on hierarchical storage systems. The objective of the data placement methods is to minimize the time to access objects from the hierarchical storage system. The statistical strategy adjusts the statistical distribution of object access times so that the mean access time is optimal. The frequency-based placement method differentiates objects according to their access frequencies: more frequently accessed objects are placed in more convenient locations, and less frequently accessed objects are placed in less convenient locations. We describe the frequency-based placement method in the next section and then analyze its performance. Finally, we summarize the chapter.


2013, Vol 21 (3-4), pp. 65-78
Author(s): Wei Ding, Yuanrui Zhang, Mahmut Kandemir, Seung Woo Son

The file layout of array data is a critical factor that affects the behavior of storage caches, yet it has so far received little attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O-intensive application programs and compared its performance against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.
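Why file layout matters for a storage cache can be shown with a toy experiment. This is not the paper's compiler scheme: it assumes a single thread, a single-level LRU cache of file blocks, and hypothetical array and block sizes, and simply counts cache misses when a disk-resident array is traversed column by column under a row-major versus a column-major file layout.

```python
from collections import OrderedDict


def file_offset(i: int, j: int, rows: int, cols: int, layout: str) -> int:
    """Element offset of a[i][j] in the file under the given layout."""
    return i * cols + j if layout == "row-major" else j * rows + i


def cache_misses(rows: int, cols: int, layout: str, block_elems: int,
                 cache_blocks: int, order: str) -> int:
    """Count LRU cache misses (in file blocks) for a full traversal of the array."""
    cache: OrderedDict[int, bool] = OrderedDict()
    misses = 0
    if order == "row-wise":
        indices = ((i, j) for i in range(rows) for j in range(cols))
    else:  # column-wise traversal
        indices = ((i, j) for j in range(cols) for i in range(rows))
    for i, j in indices:
        block = file_offset(i, j, rows, cols, layout) // block_elems
        if block in cache:
            cache.move_to_end(block)  # hit: mark block as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


# Hypothetical sizes: 64 x 64 array, 16 elements per block, 2-block cache.
print(cache_misses(64, 64, "column-major", 16, 2, "column-wise"))  # 256
print(cache_misses(64, 64, "row-major", 16, 2, "column-wise"))     # 4096
```

With a layout that matches the access pattern, each block is fetched once; with a mismatched layout, every access misses. A layout optimizer, such as the compiler-driven scheme described above, aims to pick the matching layout automatically from the code's access patterns.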


2021
Author(s): Marco Kulüke, Fabian Wachsmann, Georg Leander Siemund, Hannes Thiemann, Stephan Kindermann

This study provides guidance to data providers on how to convert existing netCDF data on a hierarchical storage system to Zarr and transfer it to an object storage system.

In recent years, object storage systems have become an alternative to traditional hierarchical file systems because they are easily scalable and offer faster data retrieval than hierarchical storage systems.

Earth system sciences, and climate science in particular, handle large amounts of data. These data are usually represented as multi-dimensional arrays and traditionally stored in netCDF format on hierarchical file systems. However, the current netCDF-4 format is not yet optimized for object storage systems. NetCDF data can only be transferred from an object storage at the file level, which results in heavy download volumes. The Zarr format can mitigate this problem: because it allows direct access to individual chunks and metadata, it reduces data transfers and hence increases input/output speed in parallel computing environments.

As one of the largest climate data providers worldwide, the German Climate Computing Center (DKRZ) continuously works towards efficient ways to make data accessible to users. This use case shows the conversion and transfer of a subset of the Coupled Model Intercomparison Project Phase 6 (CMIP6) climate data archive from netCDF on the hierarchical file system into Zarr on the OpenStack object store, known as Swift, using the Zarr Python package. Finally, this study evaluates to what extent Zarr-formatted climate data on an object storage system are a meaningful addition to the existing high performance computing environment of the DKRZ.
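The claim that chunk-level access reduces transfer volume can be illustrated with simple arithmetic. The sketch below compares downloading a whole file against reading only the chunks that overlap a selection. It ignores compression, metadata, and partial edge chunks (the array shape is assumed divisible by the chunk shape), and the array shape, chunk shape, and function names are hypothetical; it does not use the Zarr Python package itself.

```python
import itertools
import math


def chunks_for_selection(chunks, selection):
    """Chunk grid indices overlapping a selection of (start, stop) per dimension."""
    ranges = [range(start // c, (stop - 1) // c + 1)
              for (start, stop), c in zip(selection, chunks)]
    return list(itertools.product(*ranges))


def bytes_transferred(shape, chunks, selection, itemsize):
    """Return (whole-file bytes, chunk-level bytes) for reading a selection."""
    whole_file = math.prod(shape) * itemsize
    chunk_bytes = math.prod(chunks) * itemsize
    n_chunks = len(chunks_for_selection(chunks, selection))
    return whole_file, n_chunks * chunk_bytes


# Hypothetical daily global field (time, lat, lon) stored as float32,
# chunked one time step per chunk; read the first 10 days.
shape = (365, 180, 360)
chunks = (1, 180, 360)
selection = ((0, 10), (0, 180), (0, 360))
print(bytes_transferred(shape, chunks, selection, itemsize=4))  # (94608000, 2592000)
```

Here the chunk-level read moves about 2.7% of the bytes a file-level download would; the actual ratio depends on how well the chunk shape matches typical access patterns.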

