Storage Hierarchy
Recently Published Documents


TOTAL DOCUMENTS: 44 (last five years: 7)

H-INDEX: 5 (last five years: 1)

2021 ◽  
Vol 17 (4) ◽  
pp. 1-21
Author(s):  
Devarshi Ghoshal ◽  
Lavanya Ramakrishnan

Scientific workflows in High Performance Computing (HPC) environments process large amounts of data. The storage hierarchy on HPC systems is getting deeper, driven by new technologies (NVRAMs, SSDs, etc.). There is a need for new programming abstractions that let users seamlessly manage data at the workflow level on multi-tiered storage systems while providing optimal workflow performance and use of storage resources. In previous work, we introduced Managing Data on Tiered Storage for Scientific Workflows (MaDaTS), a software architecture that uses a Virtual Data Space (VDS) abstraction to hide the complexities of the underlying storage system while allowing users to control data management strategies. In this article, we detail the data-centric programming abstractions that allow users to manage a workflow around its data on the storage layer. These abstractions simplify data management for scientific workflows on multi-tiered storage systems without affecting workflow performance or storage capacity. We measure the overheads introduced by the programming abstractions of MaDaTS and their effectiveness. Our results show that these abstractions make optimal use of the capacity of smaller storage tiers and simplify data management without adding performance overheads.
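
To make the VDS idea concrete, here is a minimal Python sketch of what a virtual-data-space style abstraction could look like: a virtual data object decouples a workflow's logical file from its physical location, and the space binds it to a storage tier at runtime. All class, method, and tier names are illustrative assumptions, not the actual MaDaTS API.

```python
# Hypothetical sketch of a virtual-data-space (VDS) style abstraction,
# loosely inspired by the MaDaTS description above. Names are illustrative;
# this is NOT the real MaDaTS API.

class VirtualDataObject:
    """A workflow file decoupled from its physical location."""
    def __init__(self, logical_path):
        self.logical_path = logical_path
        self.physical_path = None   # bound to a tier at runtime

class VirtualDataSpace:
    """Maps virtual data objects onto a multi-tiered storage hierarchy."""
    TIERS = {"burst_buffer": "/bb", "scratch": "/scratch", "archive": "/archive"}

    def __init__(self, strategy="workflow_aware"):
        self.strategy = strategy    # data management strategy (assumed name)
        self.objects = []

    def map(self, logical_path):
        vdo = VirtualDataObject(logical_path)
        self.objects.append(vdo)
        return vdo

    def stage(self, vdo, tier):
        # A real implementation would copy data and track tier capacity;
        # here we only record the logical-to-physical binding.
        vdo.physical_path = f"{self.TIERS[tier]}/{vdo.logical_path.lstrip('/')}"
        return vdo.physical_path

# Usage: bind a simulation output to the fast tier for an analysis task.
vds = VirtualDataSpace()
out = vds.map("/project/sim/output.h5")
print(vds.stage(out, "burst_buffer"))
```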


2021 ◽  
Vol 55 (1) ◽  
pp. 99-107
Author(s):  
Luis Thomas ◽  
Sebastien Gougeaud ◽  
Stéphane Rubini ◽  
Philippe Deniel ◽  
Jalil Boukhobza

The emergence of exascale machines in HPC will have the foreseen consequence of putting more pressure on the storage systems in place, not only in terms of capacity but also of bandwidth and latency. With a limited budget, one cannot imagine using only storage-class memory, which leads to the use of a heterogeneous, tiered storage hierarchy. To make the most efficient use of the high-performance tier in this hierarchy, we need to place user data on the right tier at the right time. In this paper, we assume a two-tier storage hierarchy with a high-performance tier and a high-capacity archival tier. Files are placed on the high-performance tier at creation time and moved to the capacity tier once their lifetime expires, that is, once they are no longer accessed. The main contribution of this paper lies in the design of a file lifetime prediction model based solely on the file's path, using a convolutional neural network (CNN). Results show that our solution strikes a good trade-off between accuracy and underestimation. Our model reaches an accuracy close to previous work (around 98.60% compared to 98.84%) while reducing underestimations by almost 10x, to 2.21% (compared to 21.86%). The reduction in underestimations is crucial, as it avoids misplacing files in the capacity tier while they are still in use.
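
As a rough illustration of the approach, the sketch below builds a byte-level 1-D CNN over the characters of a file path, in the spirit of the paper's path-based lifetime prediction. The architecture, the eight lifetime buckets, and all hyperparameters are assumptions for illustration, not the authors' exact model.

```python
# Minimal sketch of a path-based lifetime classifier: a 1-D CNN over the
# bytes of a file path. Hyperparameters are assumed, not from the paper.
import torch
import torch.nn as nn

MAX_LEN, VOCAB = 128, 256  # truncate/pad paths; byte-level vocabulary

class PathLifetimeCNN(nn.Module):
    def __init__(self, n_classes=8):  # e.g. 8 lifetime buckets (assumed)
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 32)
        self.conv = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                  # x: (batch, MAX_LEN) byte ids
        h = self.embed(x).transpose(1, 2)  # -> (batch, 32, MAX_LEN)
        return self.fc(self.conv(h).squeeze(-1))

def encode(path: str) -> torch.Tensor:
    b = path.encode()[:MAX_LEN].ljust(MAX_LEN, b"\0")  # pad with NUL bytes
    return torch.tensor(list(b), dtype=torch.long)

model = PathLifetimeCNN()
logits = model(encode("/scratch/user/run42/checkpoint.h5").unsqueeze(0))
print(logits.shape)  # torch.Size([1, 8]): one score per lifetime bucket
```

In this framing, an underestimation corresponds to predicting a shorter-lifetime bucket than the file's true one, which is what would trigger a premature move to the capacity tier.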


2020 ◽  
Vol 20 (3) ◽  
pp. 211-222
Author(s):  
Nikolaus Glombiewski ◽  
Philipp Götze ◽  
Michael Körber ◽  
Andreas Morgen ◽  
Bernhard Seeger

Event stores face the difficult challenge of continuously ingesting massive temporal data streams while satisfying demanding query and recovery requirements. Many of today’s systems deal with multiple hardware-based trade-offs. For instance, long-term storage solutions balance keeping data in cheap secondary media (SSDs, HDDs) against performance-oriented main-memory caches. As an alternative, in-memory systems focus on performance while sacrificing monetary cost and, to some degree, recovery guarantees. The advent of persistent memory (PMem) has led to a multitude of novel research proposals aiming to alleviate those trade-offs in various fields. So far, however, there is no proposal for a PMem-powered specialized event store. Based on ChronicleDB, we present several complementary approaches for a three-layer architecture featuring main memory, PMem, and secondary storage. We enhance some of ChronicleDB’s components with PMem for better insertion and query performance as well as stronger recovery guarantees. At the same time, the three-layer architecture aims to keep the overall dollar cost of the system low. The limitations and opportunities of a PMem-enhanced event store serve as important groundwork for comprehensive system designs exploiting a modern storage hierarchy.
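
The sketch below illustrates the spirit of such a three-layer ingestion path in Python: events are persisted to a log first (standing in for PMem) for recovery, buffered in DRAM for fast access, and migrated to secondary storage in batches. Paths, record layout, and the flush policy are assumptions; real PMem access would go through DAX-mapped files or a library such as libpmem rather than ordinary file I/O.

```python
# Simplified sketch of a three-layer ingestion path (DRAM -> PMem -> disk)
# in the spirit of a PMem-enhanced event store. Policies are assumed.
import os, struct, tempfile

BUF_EVENTS = 4  # flush threshold (assumed)

class TieredEventStore:
    def __init__(self, pmem_log_path, disk_path):
        self.buffer = []                   # layer 1: DRAM write buffer
        self.pmem_log = open(pmem_log_path, "ab", buffering=0)  # layer 2
        self.disk_path = disk_path         # layer 3: secondary storage

    def append(self, ts: int, payload: bytes):
        rec = struct.pack("<qI", ts, len(payload)) + payload
        self.pmem_log.write(rec)           # persist first: recovery guarantee
        os.fsync(self.pmem_log.fileno())
        self.buffer.append(rec)            # hot data stays queryable in DRAM
        if len(self.buffer) >= BUF_EVENTS:
            self._flush()

    def _flush(self):
        with open(self.disk_path, "ab") as f:  # migrate cold data to disk
            f.writelines(self.buffer)
        self.buffer.clear()
        self.pmem_log.truncate(0)          # log entries now durable on disk

d = tempfile.mkdtemp()
store = TieredEventStore(f"{d}/log.pmem", f"{d}/events.dat")
for i in range(5):
    store.append(i, b"event-%d" % i)
```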


2019 ◽  
Vol 62 (11) ◽  
pp. 114-120 ◽  
Author(s):  
Raja Appuswamy ◽  
Goetz Graefe ◽  
Renata Borovica-Gajic ◽  
Anastasia Ailamaki

Author(s):  
Dayananda P. ◽  
Sowmyarani C. N.

The size of semi-structured data is increasing continuously, and handling it efficiently is a challenging task. Keyword search is an important capability because it lets users retrieve the required information without knowing the data's storage hierarchy. There are several challenges in handling XML data. This chapter discusses them in terms of lowest common ancestor (LCA) semantics, efficient query processing, and retrieving top-k results for the data a user needs. Existing approaches are grouped into classes based on how the problem and its solution are tackled. Keyword search and ranking techniques for retrieving the desired information are analyzed in detail.
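
As a small worked example of LCA semantics, the sketch below assigns Dewey labels (the child-index path from the root) to XML elements and computes the LCA of two keyword matches as the longest common prefix of their labels. This is a generic illustration of the technique, not the chapter's own algorithm.

```python
# Minimal sketch of LCA-based keyword search over XML using Dewey labels.
import xml.etree.ElementTree as ET

def dewey_index(root):
    """Map each keyword to the Dewey labels of elements containing it."""
    index = {}
    def walk(elem, label):
        for word in (elem.text or "").strip().lower().split():
            index.setdefault(word, []).append(label)
        for i, child in enumerate(elem):
            walk(child, label + (i,))
    walk(root, (0,))
    return index

def lca(a, b):
    """Longest common prefix of two Dewey labels = lowest common ancestor."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]

doc = ET.fromstring(
    "<bib><book><title>storage hierarchy</title>"
    "<author>smith</author></book></bib>")
idx = dewey_index(doc)
print(lca(idx["storage"][0], idx["smith"][0]))  # (0, 0): the <book> node
```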


Author(s):  
A. Akoguz ◽  
S. Bozkurt ◽  
A. A. Gozutok ◽  
G. Alp ◽  
E. G. Turan ◽  
...  

High resolution in satellite imagery comes with a fundamental problem: the large volume of telemetry data that must be stored after the downlink operation. Moreover, the post-processing and image-enhancement steps performed after acquisition increase file sizes even further, making the data harder to store and more time-consuming to transmit from one site to another. Compressing both raw and processed data is therefore a necessity for archiving stations seeking to save space. The lossless compression algorithms examined in this study aim to compress the data without any loss of the spectral information it holds. To this end, well-known open-source programs supporting the relevant algorithms were applied to processed GeoTIFF images from Airbus Defence & Space's SPOT 6 & 7 satellites (1.5 m GSD), acquired and stored by the ITU Center for Satellite Communications and Remote Sensing (ITU CSCRS). The algorithms evaluated are Lempel-Ziv-Welch (LZW), the Lempel-Ziv-Markov chain algorithm (LZMA & LZMA2), Lempel-Ziv-Oberhumer (LZO), Deflate & Deflate64, Prediction by Partial Matching (PPMd or PPM2), and the Burrows-Wheeler Transform (BWT), compared over sample datasets in terms of how much the image data can be compressed while guaranteeing lossless reconstruction.
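
A minimal version of such an experiment can be run with Python's standard library, which covers three of the listed algorithm families: zlib (Deflate), lzma (LZMA), and bz2 (BWT-based bzip2). LZW, LZO, and PPMd have no stdlib bindings and are omitted here; the input file name below is hypothetical, and any GeoTIFF (or other binary) file would do.

```python
# Sketch of a lossless-compression comparison in the spirit of the study,
# limited to Python stdlib codecs: Deflate (zlib), LZMA, bzip2 (BWT-based).
import bz2, lzma, zlib

def compare(path):
    raw = open(path, "rb").read()
    codecs = {
        "deflate": zlib.compress(raw, level=9),
        "lzma": lzma.compress(raw, preset=9),
        "bzip2 (BWT)": bz2.compress(raw, compresslevel=9),
    }
    for name, blob in codecs.items():
        ratio = len(raw) / len(blob)
        print(f"{name:12s} {len(blob):>12d} bytes  ratio {ratio:.2f}x")
    # lossless check: decompression must reproduce the input bit-for-bit
    assert zlib.decompress(codecs["deflate"]) == raw

compare("scene_spot6.tif")  # hypothetical file name
```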

