Distributed Storage Systems for Data Intensive Computing

In this chapter, the authors present an overview of the utility of distributed storage systems in supporting modern applications that are increasingly data-intensive. Their coverage of distributed storage systems is driven by the requirements imposed by data-intensive computing rather than being a mere summary of storage systems. To this end, they delve into several aspects of supporting data-intensive analysis, such as data staging, offloading, checkpointing, and end-user access to terabytes of data, and illustrate the use of novel techniques and methodologies for realizing distributed storage systems therein. The data deluge from scientific experiments, observations, and simulations is affecting all of the aforementioned day-to-day operations in data-intensive computing. Modern distributed storage systems employ techniques that can help improve application performance, alleviate I/O bandwidth bottlenecks, mask failures, and improve data availability. The authors present key guiding principles involved in the construction of such storage systems, along with the associated tradeoffs, design, and architecture, all with an eye toward addressing the challenges of data-intensive scientific applications. They highlight the concepts involved using several case studies of state-of-the-art storage systems that are currently available in the data-intensive computing landscape.

Overview of Big Data-Intensive Storage and its Technologies for Cloud and Fog Computing

Research Anthology on Privatizing and Securing Data ◽

10.4018/978-1-7998-8954-0.ch005 ◽

2021 ◽

pp. 112-153

Author(s):

Richard S. Segall ◽

Jeffrey S Cook ◽

Gao Niu

Keyword(s):

Big Data ◽

Data Storage ◽

High Performance ◽

Storage Systems ◽

Fog Computing ◽

Storage Management ◽

Data Intensive Computing ◽

Computing Systems ◽

Application Performance ◽

Data Intensive

Computing systems are becoming increasingly data-intensive because of the explosion of data and the need to process it, so storage management is critical to application performance in such systems. However, if the existing resource management frameworks in these systems lack support for storage management, applications can suffer unpredictable performance degradation under input/output (I/O) contention. Storage management of data-intensive systems is therefore a challenge, and Big Data plays a major role in storage systems for data-intensive computing. This article addresses these difficulties and discusses High Performance Computing (HPC) systems, the background of storage systems for data-intensive applications, storage patterns and storage mechanisms for Big Data, the Top 10 Cloud Storage Systems for data-intensive computing in today's world, and the interface between Big Data Intensive Storage and Cloud/Fog Computing. Big Data storage, its server statistics, and usage distributions for the Top 500 Supercomputers in the world are also presented graphically and discussed as data-intensive storage components that can be interfaced with Fog-to-cloud interactions and enabling protocols.

Overview of Big Data-Intensive Storage and its Technologies for Cloud and Fog Computing

International Journal of Fog Computing ◽

10.4018/ijfc.2019010104 ◽

2019 ◽

Vol 2 (1) ◽

pp. 74-113 ◽

Cited By ~ 1

Author(s):

Richard S. Segall ◽

Jeffrey S Cook ◽

Gao Niu

Keyword(s):

Big Data ◽

Data Storage ◽

High Performance ◽

Storage Systems ◽

Fog Computing ◽

Storage Management ◽

Data Intensive Computing ◽

Computing Systems ◽

Application Performance ◽

Data Intensive

An Application-Oriented Cache Allocation and Prefetching Method for Long-Running Applications in Distributed Storage Systems

Chinese Journal of Electronics ◽

10.1049/cje.2019.05.004 ◽

2019 ◽

Vol 28 (4) ◽

pp. 773-780 ◽

Cited By ~ 1

Author(s):

Chang Guo ◽

Ying Li ◽

Hongzhi Liu ◽

Zhonghai Wu

Keyword(s):

Storage Systems ◽

Distributed Storage ◽

Distributed Storage Systems ◽

Cache Allocation

Analysis of a Stochastic Model of Replication in Large Distributed Storage Systems

ACM SIGMETRICS Performance Evaluation Review ◽

10.1145/3143314.3078531 ◽

2017 ◽

Vol 45 (1) ◽

pp. 51-51

Author(s):

Wen Sun ◽

Véronique Simon ◽

Sébastien Monnet ◽

Philippe Robert ◽

Pierre Sens

Keyword(s):

Stochastic Model ◽

Storage Systems ◽

Distributed Storage ◽

Distributed Storage Systems

Optimal Node Selection for Data Regeneration in Heterogeneous Distributed Storage Systems

2015 44th International Conference on Parallel Processing ◽

10.1109/icpp.2015.48 ◽

2015 ◽

Cited By ~ 6

Author(s):

Qingyuan Gong ◽

Jiaqi Wang ◽

Dongsheng Wei ◽

Jin Wang ◽

Xin Wang

Keyword(s):

Storage Systems ◽

Distributed Storage ◽

Node Selection ◽

Optimal Node ◽

Distributed Storage Systems ◽

Selection For

A Generic Transformation to Enable Optimal Repair in MDS Codes for Distributed Storage Systems

IEEE Transactions on Information Theory ◽

10.1109/tit.2018.2855059 ◽

2018 ◽

Vol 64 (9) ◽

pp. 6257-6267 ◽

Cited By ~ 13

Author(s):

Jie Li ◽

Xiaohu Tang ◽

Chao Tian

Keyword(s):

Storage Systems ◽

Distributed Storage ◽

Mds Codes ◽

Distributed Storage Systems ◽

Generic Transformation

Data placement strategy in data center distributed storage systems

2016 IEEE International Conference on Communication Systems (ICCS) ◽

10.1109/iccs.2016.7833566 ◽

2016 ◽

Cited By ~ 1

Author(s):

Yang Qin ◽

Xiao Ai ◽

Lingjian Chen ◽

Weihong Yang

Keyword(s):

Data Center ◽

Storage Systems ◽

Distributed Storage ◽

Data Placement ◽

Distributed Storage Systems

Tree-Structured Parallel Regeneration Based on Regenerating Codes for Multiple Data Losses in Distributed Storage Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.918.295 ◽

2014 ◽

Vol 918 ◽

pp. 295-300

Author(s):

Peng Fei You ◽

Yu Xing Peng ◽

Zhen Huang ◽

Chang Jian Wang

Keyword(s):

Storage Systems ◽

Distributed Storage ◽

Data Loss ◽

Data Reliability ◽

Data Redundancy ◽

Regeneration Time ◽

Multiple Data ◽

Distributed Storage Systems ◽

Regenerating Codes ◽

Reliability And Availability

In distributed storage systems, erasure codes are an attractive data redundancy solution that can provide the same reliability as replication while requiring much less storage space. Multiple data losses occur commonly, and the lost data should be regenerated to maintain data redundancy in distributed storage systems. Regeneration for multiple data losses is expected to finish as soon as possible, because the regeneration time influences the data reliability and availability of distributed storage systems. However, multiple data losses are usually regenerated by repairing single data losses one by one, which results in a long overall regeneration time and severely reduces the data reliability and availability of distributed storage systems. In this paper, we propose a tree-structured parallel regeneration scheme based on regenerating codes (TPRORC) for multiple data losses in distributed storage systems. In our scheme, multiple regeneration trees based on regenerating codes are constructed. First, these trees are created independently; each tree shares no edges with the others and is responsible for one data loss. Second, every regeneration tree based on regenerating codes has the least network traffic and bandwidth-optimized paths for regenerating its data loss. The scheme can thus perform parallel regeneration for multiple data losses by using multiple optimized topology trees, in which network bandwidth is utilized efficiently and the individual regenerations overlap. Our simulation results show that the tree-structured parallel regeneration scheme reduces the regeneration time significantly compared to other regular regeneration schemes.
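The two quantitative claims in this abstract (less storage than replication at equal fault tolerance, and shorter overall regeneration time when repairs overlap) can be illustrated with a minimal sketch. It is not the TPRORC scheme itself. The function names and the per-loss repair times are hypothetical, the erasure code is assumed to be an (n, k) MDS code, and the parallel case is idealized by assuming the repairs run over edge-disjoint trees with no bandwidth contention.

```python
from fractions import Fraction


def replication_overhead(failures_tolerated: int) -> Fraction:
    """Raw bytes stored per byte of user data with (f + 1)-way replication."""
    return Fraction(failures_tolerated + 1, 1)


def mds_overhead(k: int, failures_tolerated: int) -> Fraction:
    """Raw bytes stored per byte of user data with an (n, k) MDS code.

    An MDS code with n = k + f fragments tolerates the loss of any f of them.
    """
    n = k + failures_tolerated
    return Fraction(n, k)


def overall_regeneration_time(per_loss_seconds, parallel: bool) -> float:
    """Idealized overall time to repair several losses.

    Serial repair handles one loss after another, so the times add up.
    Parallel repair (assumption: no shared links, no contention) finishes
    when the slowest single repair finishes.
    """
    return max(per_loss_seconds) if parallel else sum(per_loss_seconds)


if __name__ == "__main__":
    f = 2  # number of simultaneous failures to tolerate
    print("replication overhead:", replication_overhead(f))
    print("MDS (k=4) overhead:  ", mds_overhead(4, f))

    repair_times = [4.0, 6.0, 5.0]  # hypothetical per-loss repair times (s)
    print("serial:  ", overall_regeneration_time(repair_times, parallel=False))
    print("parallel:", overall_regeneration_time(repair_times, parallel=True))
```

In this idealized model, tolerating two failures costs 3x storage with replication but 1.5x with a (6, 4) MDS code, and the parallel repair time is bounded by the slowest tree rather than by the sum of all repairs. Real regeneration times also depend on link bandwidths and tree topology, which this sketch does not model.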