A high-performance distributed file system for large-scale concurrent HD video streams

Infrastructure-As-A-Service (IAAS) provides an environmental setup under any type of cloud. In Distributed file system (DFS), nodes are simultaneously serve computing and storage functions; that is parallel Data Processing and storage in cloud. Here, file is considered as a data or load. That file is partitioned into a number of File chunks (FC) allocated in distinct nodes so that Map Reduce tasks can be performed in parallel over the nodes. Files and Nodes can be dynamically created, deleted, and added. This results in load imbalance in a distributed file system; that is, the file chunks are not distributed as uniformly as possible among the Chunk Servers (CS). Emerging distributed file systems in production systems strongly depend on a central node for chunk reallocation or Distributed node to maintain global knowledge of all chunks. This dependence is clearly inadequate in a large-scale, failure-prone environment because the central load balancer is put under considerable workload that is linearly scaled with the system size, it may thus become the performance bottleneck and the single point of failure and memory wastage in distributed nodes. So, we have to enhance the Client side module with server side module to create, delete and update the file chunks in Client Module. And manage the overall private cloud and apply dynamic load balancing algorithm to perform auto scaling options in private cloud. In this project, a fully distributed load rebalancing algorithm is presented to cope with the load imbalance problem.

Download Full-text

Large-scale simulation of replica placement algorithms for a serverless distributed file system

MASCOTS 2001, Proceedings Ninth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems ◽

10.1109/mascot.2001.948882 ◽

2002 ◽

Cited By ~ 12

Author(s):

J.R. Douceur ◽

R.P. Wattenhofer

Keyword(s):

Large Scale ◽

File System ◽

Distributed File System ◽

Replica Placement ◽

Large Scale Simulation ◽

Placement Algorithms

Download Full-text

MAHA-FS : A Distributed File System for High Performance Metadata Processing and Random IO

KIPS Transactions on Software and Data Engineering ◽

10.3745/ktsde.2013.2.2.091 ◽

2013 ◽

Vol 2 (2) ◽

pp. 91-96 ◽

Cited By ~ 7

Author(s):

Young Chang Kim ◽

Dong Oh Kim ◽

Hong Yeon Kim ◽

Young Kyun Kim ◽

Wan Choi

Keyword(s):

High Performance ◽

File System ◽

Distributed File System ◽

Metadata Processing

Download Full-text

Modeling of distributed file System in big data storage by event- B

MATEC Web of Conferences ◽

10.1051/matecconf/201821004042 ◽

2018 ◽

Vol 210 ◽

pp. 04042

Author(s):

Ammar Alhaj Ali ◽

Pavel Varacha ◽

Said Krayem ◽

Roman Jasek ◽

Petr Zacek ◽

...

Keyword(s):

Big Data ◽

Data Storage ◽

High Performance ◽

File System ◽

Formal Method ◽

File Systems ◽

Distributed File System ◽

Distributed File Systems ◽

Data Systems ◽

Big Data Systems

Nowadays, a wide set of systems and application, especially in high performance computing, depends on distributed environments to process and analyses huge amounts of data. As we know, the amount of data increases enormously, and the goal to provide and develop efficient, scalable and reliable storage solutions has become one of the major issue for scientific computing. The storage solution used by big data systems is Distributed File Systems (DFSs), where DFS is used to build a hierarchical and unified view of multiple file servers and shares on the network. In this paper we will offer Hadoop Distributed File System (HDFS) as DFS in big data systems and we will present an Event-B as formal method that can be used in modeling, where Event-B is a mature formal method which has been widely used in a number of industry projects in a number of domains, such as automotive, transportation, space, business information, medical device and so on, And will propose using the Rodin as modeling tool for Event-B, which integrates modeling and proving as well as the Rodin platform is open source, so it supports a large number of plug-in tools.

Download Full-text

Data Storage Technology and its Development Based on Cloud Computing

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.1275 ◽

2013 ◽

Vol 756-759 ◽

pp. 1275-1279

Author(s):

Lin Na Huang ◽

Feng Hua Liu

Keyword(s):

Cloud Computing ◽

Data Storage ◽

Cloud Storage ◽

High Performance ◽

File System ◽

Storage System ◽

Distributed File System ◽

Cloud Data ◽

Storage Technology ◽

Cloud Data Storage

Cloud storage of high performance is the basic condition for cloud computing. This article introduces the concept and advantage of cloud storage, discusses the infrastructure of cloud storage system as well as the architecture of cloud data storage, researches the details about the design of Distributed File System within cloud data storage, at the same time, puts forward different developing strategies for the enterprises according to the different roles that the enterprises are acting as during the developing process of cloud computing.

Download Full-text

Research on parallel data processing of data mining platform in the background of cloud computing

Journal of Intelligent Systems ◽

10.1515/jisys-2020-0113 ◽

2021 ◽

Vol 30 (1) ◽

pp. 479-486

Author(s):

Lingrui Bu ◽

Hui Zhang ◽

Haiyan Xing ◽

Lijun Wu

Keyword(s):

Data Mining ◽

Data Processing ◽

Parallel Algorithm ◽

Large Scale ◽

File System ◽

Large Data ◽

Distributed File System ◽

Data Set ◽

Traditional Algorithm ◽

Hadoop Distributed File System

Abstract The efficient processing of large-scale data has very important practical value. In this study, a data mining platform based on Hadoop distributed file system was designed, and then K-means algorithm was improved with the idea of max-min distance. On Hadoop distributed file system platform, the parallelization was realized by MapReduce. Finally, the data processing effect of the algorithm was analyzed with Iris data set. The results showed that the parallel algorithm divided more correct samples than the traditional algorithm; in the single-machine environment, the parallel algorithm ran longer; in the face of large data sets, the traditional algorithm had insufficient memory, but the parallel algorithm completed the calculation task; the acceleration ratio of the parallel algorithm was raised with the expansion of cluster size and data set size, showing a good parallel effect. The experimental results verifies the reliability of parallel algorithm in big data processing, which makes some contributions to further improve the efficiency of data mining.

Download Full-text

McrEngine: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression

Scientific Programming ◽

10.1155/2013/341672 ◽

2013 ◽

Vol 21 (3-4) ◽

pp. 149-163 ◽

Cited By ~ 5

Author(s):

Tanzima Zerin Islam ◽

Kathryn Mohror ◽

Saurabh Bagchi ◽

Adam Moody ◽

Bronis R. de Supinski ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Large Scale ◽

File System ◽

Scale Up ◽

Parallel File System ◽

Data Semantics ◽

Using Data ◽

Performance Computing ◽

Compute Time

High performance computing (HPC) systems use checkpoint-restart to tolerate failures. Typically, applications store their states in checkpoints on a parallel file system (PFS). As applications scale up, checkpoint-restart incurs high overheads due to contention for PFS resources. The high overheads force large-scale applications to reduce checkpoint frequency, which means more compute time is lost in the event of failure. We alleviate this problem through a scalable checkpoint-restart system, mcrEngine. McrEngine aggregates checkpoints from multiple application processes with knowledge of the data semantics available through widely-used I/O libraries, e.g., HDF5 and netCDF, and compresses them. Our novel scheme improves compressibility of checkpoints up to 115% over simple concatenation and compression. Our evaluation with large-scale application checkpoints show that mcrEngine reduces checkpointing overhead by up to 87% and restart overhead by up to 62% over a baseline with no aggregation or compression.

Download Full-text

Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?

2013 IEEE 21st Annual Symposium on High-Performance Interconnects ◽

10.1109/hoti.2013.24 ◽

2013 ◽

Cited By ~ 12

Author(s):

Nusrat S. Islam ◽

Xiaoyi Lu ◽

Md. Wasi-ur-Rahman ◽

Dhabaleswar K. Panda

Keyword(s):

High Performance ◽

File System ◽

Distributed File System ◽

Hadoop Distributed File System

Download Full-text