A Data Management in a Private Cloud Storage Environment Utilizing High Performance Distributed File Systems

Author(s):  
Tiago S. Soares ◽  
M.A.R. Dantas ◽  
Douglas D.J. de Macedo ◽  
Michael A. Bauer


2021 ◽ 
Vol 17 (3) ◽  
pp. 1-25
Author(s):  
Bohong Zhu ◽  
Youmin Chen ◽  
Qing Wang ◽  
Youyou Lu ◽  
Jiwu Shu

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and their heavy layered software designs leave this high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus+, which redesigns the file system's internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory-copying overhead, and actively fetches and pushes data entirely on the client side to rebalance the load between the server and the network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between the file system and networking layers, and an efficient distributed transaction mechanism for consistency. Octopus+ also adds a replication feature for better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders-of-magnitude better performance than existing distributed file systems.
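To make the notification idea concrete, the following minimal Python sketch (an illustration under stated assumptions, not the authors' code) emulates a self-identified RPC: the client writes a request carrying its own sequence number into a shared buffer standing in for an RDMA-registered memory region, and the server detects arrival by polling for that number instead of waiting on a separate network notification.

```python
# Minimal sketch of a "self-identified RPC": the message carries its own
# sequence number, so the receiver discovers it by scanning memory rather
# than by a completion event. The shared bytearray stands in for an
# RDMA-registered memory region; names here are illustrative only.

import struct

BUF_SIZE = 64
HDR = struct.Struct("<IQ")          # (sequence number, payload length)

shared_buf = bytearray(BUF_SIZE)    # stands in for registered memory

def client_post(seq: int, payload: bytes) -> None:
    """Emulate a one-sided RDMA write: payload first, header last, so the
    sequence number only becomes visible once the data is in place."""
    shared_buf[HDR.size:HDR.size + len(payload)] = payload
    shared_buf[:HDR.size] = HDR.pack(seq, len(payload))

def server_poll(expected_seq: int):
    """Server-side detection: scan the buffer for the expected sequence
    number instead of blocking on a completion queue."""
    seq, length = HDR.unpack_from(shared_buf)
    if seq == expected_seq:
        return bytes(shared_buf[HDR.size:HDR.size + length])
    return None                     # request not (fully) written yet

client_post(1, b"mkdir /data")
print(server_poll(1))               # b'mkdir /data'
```

Writing the header after the payload mirrors how a one-sided write can make a request visible only once its data is complete, which is what lets the server trust a sequence number it finds by polling.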


2021 ◽  
Vol 11 (18) ◽  
pp. 8540
Author(s):  
Frank Gadban ◽  
Julian Kunkel

The line between HPC and Cloud is getting blurry: performance is still the main driver in HPC, while cloud storage systems are assumed to offer low latency, high throughput, high availability, and scalability. The Simple Storage Service (S3) has emerged as the de facto storage API for object storage in the Cloud. This paper investigates whether the S3 API is already a viable alternative for HPC access patterns in terms of performance, or whether further performance advancements are necessary. For this purpose: (a) We extend two common HPC I/O benchmarks, the IO500 and MD-Workbench, to quantify the performance of the S3 API. We perform the analysis on the Mistral supercomputer by launching the enhanced benchmarks against different S3 implementations: on-premises (Swift, MinIO) and in the Cloud (Google, IBM…). We find that these implementations do not yet meet the demanding performance and scalability expectations of HPC workloads. (b) We identify the cause of the performance loss by systematically replacing parts of a popular S3 client library with lightweight replacements of lower stack components. The resulting S3Embedded library is highly scalable and leverages the shared cluster file systems of HPC infrastructure to accommodate arbitrary S3 client applications. Another introduced library, S3remote, uses TCP/IP for communication instead of HTTP and provides a single local S3 gateway on each node. By broadening the scope of the IO500, this research enables the community to track the performance growth of S3 and encourages the sharing of best practices for performance optimization. The analysis also shows that a performance convergence at the storage level between Cloud and HPC is possible over time by using a high-performance S3 library like S3Embedded.
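As a rough illustration of the small-object access pattern such benchmarks exercise, the sketch below times repeated PUT/GET pairs against an S3-compatible endpoint using boto3. The endpoint URL, bucket name, and credentials are placeholders, not values from the paper, and the loop is far simpler than the IO500 or MD-Workbench workloads.

```python
# Illustrative micro-benchmark (not the paper's extended IO500 code):
# time many small PUT/GET pairs against an S3-compatible endpoint.
# Endpoint, bucket, and credentials below are placeholders to adapt.

import time
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",   # e.g. a local MinIO server
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

bucket, n, payload = "bench", 100, b"x" * 4096   # 4 KiB objects

start = time.perf_counter()
for i in range(n):
    key = f"obj-{i}"
    s3.put_object(Bucket=bucket, Key=key, Body=payload)
    s3.get_object(Bucket=bucket, Key=key)["Body"].read()
elapsed = time.perf_counter() - start

print(f"{2 * n / elapsed:.1f} ops/s, {elapsed / (2 * n) * 1e3:.2f} ms/op")
```

Per-operation latency on small objects is exactly where HTTP-based S3 stacks tend to fall behind cluster file systems, which motivates the paper's S3Embedded and S3remote replacements of the lower stack layers.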


2018 ◽  
Vol 210 ◽  
pp. 04042
Author(s):  
Ammar Alhaj Ali ◽  
Pavel Varacha ◽  
Said Krayem ◽  
Roman Jasek ◽  
Petr Zacek ◽  
...  

Nowadays, a wide range of systems and applications, especially in high-performance computing, depend on distributed environments to process and analyze huge amounts of data. As the volume of data grows enormously, providing efficient, scalable, and reliable storage solutions has become one of the major issues for scientific computing. The storage solution used by big data systems is the Distributed File System (DFS), which builds a hierarchical and unified view over multiple file servers and shares on the network. In this paper we present the Hadoop Distributed File System (HDFS) as the DFS for big data systems, and we introduce Event-B as a formal method for modeling it. Event-B is a mature formal method that has been widely used in industrial projects across domains such as automotive, transportation, space, business information, and medical devices. We also propose the Rodin platform as the modeling tool for Event-B: it integrates modeling and proving, is open source, and supports a large number of plug-in tools.
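Event-B models are normally written and proved inside Rodin itself; purely to convey their shape, the following Python analogue (hypothetical, not Event-B syntax) mimics a machine with state variables, an invariant, and events with guards and actions, using an HDFS-style replication property as the toy invariant.

```python
# A rough Python analogue of an Event-B machine's structure (real models
# are written and verified in Rodin, not Python). The toy invariant,
# chosen only for illustration, is that every HDFS-style block keeps at
# least REPLICATION live replicas.

REPLICATION = 3

# State: block id -> set of datanodes currently holding a replica.
blocks = {"blk_1": {"dn1", "dn2", "dn3"}}

def invariant() -> bool:
    """INVARIANT: every block has at least REPLICATION replicas."""
    return all(len(nodes) >= REPLICATION for nodes in blocks.values())

def replicate(block: str, node: str) -> None:
    """EVENT replicate: guard = node lacks the block; action = copy it."""
    if block in blocks and node not in blocks[block]:   # guard
        blocks[block].add(node)                          # action

def node_failure(node: str) -> None:
    """EVENT node_failure: the node's replicas disappear from all sets."""
    for nodes in blocks.values():
        nodes.discard(node)

node_failure("dn1")
print(invariant())        # False: blk_1 dropped below 3 replicas
replicate("blk_1", "dn4")
print(invariant())        # True: re-replication restores the invariant
```

In Rodin, the proof obligations would require showing that every event preserves the invariant; here the prints merely observe when it is violated and restored.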


Author(s):  
Jaroslav Pokorny ◽  
Bela Stantic

The development and wide acceptance of data-driven applications in many aspects of our daily lives generates vast volumes of diverse data, which can be collected and analyzed to support valuable decisions. Managing and processing this big data is a challenge. The development and extensive use of highly distributed and scalable systems to process big data have been widely considered. New data management architectures (e.g., distributed file systems and NoSQL databases) are used in this context. However, features of big data, such as their complexity and the demands of data analytics, indicate that these concepts solve big data problems only partially. The development of so-called NewSQL databases is therefore highly relevant, and even a special category of big data management systems is being considered. In this chapter, the authors discuss these trends, evaluate some current approaches to big data processing and analytics, identify the current challenges, and suggest possible research directions.


Big Data ◽  
2016 ◽  
pp. 2074-2097 ◽  
Author(s):  
Jaroslav Pokorny ◽  
Bela Stantic

The development and extensive use of highly distributed and scalable systems to process Big Data have been widely considered. New data management architectures, e.g. distributed file systems and NoSQL databases, are used in this context. However, features of Big Data, such as their complexity and the demands of data analytics, indicate that these concepts solve Big Data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of Big Data Management Systems is being considered. In this work we discuss these trends, evaluate some current approaches to Big Data processing, identify the current challenges, and suggest possible research directions.


2013 ◽  
Vol 380-384 ◽  
pp. 2589-2592
Author(s):  
Jun Wei Ge ◽  
Feng Yang ◽  
Yi Qiu Fang

Many characteristics of P2P technology, such as decentralization, scalability, robustness, high performance, and load balancing, are in line with cloud storage design requirements. In this article, we propose a cloud storage model based on P2P technology. The model uses a multi-data-server architecture, which effectively resolves the bottleneck of centralized cloud storage systems, thereby increasing the performance of the cloud storage system and the quality of its services, and enhancing service reliability. By studying the Paxos data consistency algorithm and the metadata consistency problem of the model system, we improve and optimize Basic Paxos so that it can be applied in the model system.
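For readers unfamiliar with the algorithm, here is a compact single-decree Basic Paxos sketch in Python (an illustrative baseline, not the optimized variant the article develops): a value is chosen once a majority of acceptors accept it, and any later proposal must adopt a value that was already accepted.

```python
# Compact single-decree Basic Paxos sketch (illustrative baseline only).
# A proposal is chosen once a majority of acceptors accept it; later
# proposers must adopt the highest-ballot value already accepted.

class Acceptor:
    def __init__(self):
        self.promised = 0        # highest ballot promised so far
        self.accepted = None     # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        """Phase 1b: promise to ignore lower ballots; report any value
        already accepted so the proposer must adopt it."""
        if ballot > self.promised:
            self.promised = ballot
            return self.accepted
        return "nack"

    def accept(self, ballot, value):
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, value):
    """One proposer round; returns the chosen value or None."""
    # Phase 1a/1b: gather promises from a majority.
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [p for p in promises if p != "nack"]
    if len(granted) <= len(acceptors) // 2:
        return None
    # Adopt the highest-ballot value already accepted, if any.
    prior = [p for p in granted if p is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2a/2b: ask the acceptors to accept the value.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks > len(acceptors) // 2 else None

acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, 1, "meta-update-A"))  # 'meta-update-A'
print(propose(acceptors, 2, "meta-update-B"))  # still 'meta-update-A'
```

The second call shows the safety property that matters for metadata consistency: once a majority has accepted a value, competing proposers converge on it rather than overwriting it.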

