Secure resilient high performance file system for distributed systems

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate file system and network layers, and the heavy layered software designs leave high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus + , to redesign file system internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus + directly accesses a shared persistent memory pool to reduce memory copying overhead, and actively fetches and pushes data all in clients to rebalance the load between the server and network. For metadata operations, Octopus + introduces self-identified remote procedure calls for immediate notification between file systems and networking, and an efficient distributed transaction mechanism for consistency. Octopus + is enabled with replication feature to provide better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus + achieves nearly the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.

Download Full-text

On the benefits of a workflow-aware file system in high-performance computing systems

Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05) ◽

10.1109/hpcasia.2005.58 ◽

2005 ◽

Author(s):

Yang Wang ◽

P. Lu

Keyword(s):

High Performance Computing ◽

High Performance ◽

File System ◽

Computing Systems ◽

Performance Computing

Download Full-text

Flow control for fully adaptive routing††Part of this research was first presented at the 18th IEEE International Symposium on High Performance Computer Architecture (HPCA-2012) [32]. The journal extension edition of the HPCA-2012 paper was published in IEEE Transactions on Parallel and Distributed Systems [33].

Networks-On-Chip ◽

10.1016/b978-0-12-800979-6.00006-8 ◽

2015 ◽

pp. 175-214

Keyword(s):

Distributed Systems ◽

Flow Control ◽

International Symposium ◽

Computer Architecture ◽

High Performance ◽

Adaptive Routing ◽

Fully Adaptive ◽

High Performance Computer ◽

Ieee International Symposium

Download Full-text

High Performance Storage for Big Data Analytics and Visualization

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques ◽

10.4018/978-1-5225-3142-5.ch010 ◽

2018 ◽

pp. 254-275

Author(s):

Armando Fandango ◽

William Rivera

Keyword(s):

Big Data ◽

High Speed ◽

High Performance ◽

File System ◽

Predictive Analytics ◽

Big Data Analytics ◽

File Systems ◽

Distributed Applications ◽

System Level ◽

File Formats

Scientific Big Data being gathered at exascale needs to be stored, retrieved and manipulated. The storage stack for scientific Big Data includes a file system at the system level for physical organization of the data, and a file format and input/output (I/O) system at the application level for logical organization of the data; both of them of high-performance variety for exascale. The high-performance file system is designed with concurrent access, high-speed transmission and fault tolerance characteristics. High-performance file formats and I/O are designed to allow parallel and distributed applications with easy and fast access to Big Data. These specialized file formats make it easier to store and access Big Data for scientific visualization and predictive analytics. This chapter provides a brief review of the characteristics of high-performance file systems such as Lustre and GPFS, and high-performance file formats such as HDF5, NetCDF, MPI-IO, and HDFS.

Download Full-text

Adaptive Threshold Based Scheduler for Batch of Independent Jobs for Cloud Computing System

Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing ◽

10.4018/978-1-7998-5339-8.ch110 ◽

2021 ◽

pp. 2246-2266

Author(s):

TAJ ALAM ◽

PARITOSH DUBEY ◽

ANKIT KUMAR

Keyword(s):

Cloud Computing ◽

Distributed Systems ◽

High Performance ◽

Large Scale ◽

Real Life ◽

Interval Estimation ◽

Computing System ◽

Adaptive Threshold ◽

Batch Simulation ◽

Heterogeneous Distributed Systems

Distributed systems are efficient means of realizing high-performance computing (HPC). They are used in meeting the demand of executing large-scale high-performance computational jobs. Scheduling the tasks on such computational resources is one of the prime concerns in the heterogeneous distributed systems. Scheduling jobs on distributed systems are NP-complete in nature. Scheduling requires either heuristic or metaheuristic approach for sub-optimal but acceptable solutions. An adaptive threshold-based scheduler is one such heuristic approach. This work proposes adaptive threshold-based scheduler for batch of independent jobs (ATSBIJ) with the objective of optimizing the makespan of the jobs submitted for execution on cloud computing systems. ATSBIJ exploits the features of interval estimation for calculating the threshold values for generation of efficient schedule of the batch. Simulation studies on CloudSim ensures that the ATSBIJ approach works effectively for real life scenario.

Download Full-text