Comparative Analysis of Distributed File Systems

Author(s):  
Basireddy Ithihas Reddy

There has been great interest in computing experiments carried out on shared-nothing clusters of commodity machines, where multiple systems run in parallel and work closely together towards the same goal. In such clusters, the distributed execution engine MapReduce frequently handles the primary input-output workload. Numerous file systems exist on Windows and Linux, such as NTFS, ReFS, FAT and FAT32; we studied them and implemented a few distributed file systems. Distributed file systems (DFS) have been found to work very well on many small files, but some do not deliver the expected results on large files. We implemented benchmark testing algorithms in each distributed file system for small and large files, and the analysis is put forward in this paper. We also came across various implementation issues in the different DFS, and these are mentioned in this paper as well.
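The small-file versus large-file behaviour measured in this work can be reproduced with a simple timing harness. The sketch below is a minimal, hypothetical benchmark, not the authors' actual test code; the mount point /mnt/dfs, the file counts and the sizes are illustrative assumptions. It writes and reads back many small files and one large file on a DFS mount and reports throughput for each case.

```python
import os
import time

MOUNT = "/mnt/dfs"          # assumed DFS mount point (illustrative)
SMALL_COUNT = 1000          # number of small files
SMALL_SIZE = 4 * 1024       # 4 KiB per small file
LARGE_SIZE = 1 * 1024**3    # one 1 GiB file
CHUNK = 4 * 1024**2         # 4 MiB chunk for the large file

def bench_small_files():
    """Write and read back many small files; return (write_s, read_s)."""
    payload = os.urandom(SMALL_SIZE)
    t0 = time.perf_counter()
    for i in range(SMALL_COUNT):
        with open(f"{MOUNT}/small_{i}.bin", "wb") as f:
            f.write(payload)
    t1 = time.perf_counter()
    for i in range(SMALL_COUNT):
        with open(f"{MOUNT}/small_{i}.bin", "rb") as f:
            f.read()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

def bench_large_file():
    """Write and read back one large file in fixed-size chunks."""
    chunk = os.urandom(CHUNK)
    path = f"{MOUNT}/large.bin"
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(LARGE_SIZE // CHUNK):
            f.write(chunk)
    t1 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

if __name__ == "__main__":
    ws, rs = bench_small_files()
    wl, rl = bench_large_file()
    small_mb = SMALL_COUNT * SMALL_SIZE / 1024**2
    large_mb = LARGE_SIZE / 1024**2
    print(f"small files: write {small_mb/ws:.1f} MiB/s, read {small_mb/rs:.1f} MiB/s")
    print(f"large file : write {large_mb/wl:.1f} MiB/s, read {large_mb/rl:.1f} MiB/s")
```

Note that the client page cache can inflate the read figures, so a real benchmark would also drop caches or use direct I/O between the write and read phases.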

2017
Vol 5 (23) (3)
pp. 53-73
Author(s):  
Karolina Kula
Marcin Markowski

In this paper we present a comparative analysis of the three most popular distributed file systems: Ceph, LizardFS and GlusterFS. The main evaluation criterion was the performance of the systems, understood as the number of read and write operations per unit of time and the maximum data transfer rate. The analysis is based on the results of numerous experiments conducted on an experimental testbed prepared for this purpose. The experimental part is preceded by a survey of the technology and of the most popular existing systems of this type. The paper presents the results of the experiments, their analysis and practical conclusions.
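The two metrics used as the evaluation criterion, operations per second and maximum transfer rate, can be approximated with a short script. The sketch below is a simplified stand-in for a dedicated benchmark tool such as fio, not the authors' testbed setup; the test file path, file size and run length are illustrative assumptions. It issues random 4 KiB writes against a preallocated file on the file system under test and reports IOPS and throughput.

```python
import os
import random
import time

TEST_FILE = "/mnt/dfs/iops_test.bin"   # assumed path on the file system under test
FILE_SIZE = 256 * 1024**2              # 256 MiB preallocated test file
BLOCK = 4096                           # 4 KiB blocks, as in typical IOPS tests
DURATION = 10.0                        # seconds per run

def measure_random_write_iops():
    """Issue random 4 KiB writes for DURATION seconds; return (IOPS, MiB/s)."""
    buf = os.urandom(BLOCK)
    fd = os.open(TEST_FILE, os.O_WRONLY | os.O_CREAT)
    os.ftruncate(fd, FILE_SIZE)              # preallocate the file
    blocks = FILE_SIZE // BLOCK
    ops = 0
    start = time.perf_counter()
    while time.perf_counter() - start < DURATION:
        offset = random.randrange(blocks) * BLOCK
        os.pwrite(fd, buf, offset)            # positional write, no seek needed
        ops += 1
    elapsed = time.perf_counter() - start
    os.fsync(fd)                              # flush to the storage backend
    os.close(fd)
    return ops / elapsed, ops * BLOCK / 1024**2 / elapsed

if __name__ == "__main__":
    iops, mibps = measure_random_write_iops()
    print(f"random 4 KiB writes: {iops:.0f} IOPS, {mibps:.1f} MiB/s")
```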


2021
Vol 11 (8)
pp. 3298
Author(s):  
Jeong-Joon Kim

Various techniques have been used in distributed file systems for data availability and stability. Typically, data are stored using a replication-based distributed file system, but because of its poor space efficiency, the erasure-coding (EC) technique has been adopted more widely in recent years. The EC technique improves space efficiency compared with replication. However, it introduces several performance degradation factors, such as encoding and decoding overhead and input/output (I/O) degradation. This study therefore proposes a buffering and combining technique in which the various I/O requests that occur during encoding in an EC-based distributed file system are combined into one and processed together. In addition, it proposes four recovery measures (disk I/O load distribution, random block layout, multi-thread-based parallel recovery, and a matrix recycling technique) to distribute the disk I/O loads generated during decoding.
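The buffering and combining idea can be illustrated with a small sketch. The code below shows only the general principle, not the proposed implementation: incoming small write requests are accumulated, and only when a full stripe of k data blocks is available is a single combined encode-and-flush performed. XOR parity is used here as a stand-in for a real erasure code such as Reed-Solomon, and all names are hypothetical.

```python
K = 4                     # data blocks per stripe
BLOCK_SIZE = 4096         # bytes per block

def xor_parity(blocks):
    """Compute a single parity block as a stand-in for real EC encoding."""
    parity = bytearray(BLOCK_SIZE)
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

class CombiningWriteBuffer:
    """Accumulate small writes and encode them one full stripe at a time."""

    def __init__(self, flush_stripe):
        self.pending = []            # buffered data blocks not yet encoded
        self.flush_stripe = flush_stripe

    def write(self, data: bytes):
        # Pad each request to a full block; a real system would pack bytes tightly.
        block = data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]
        self.pending.append(block)
        if len(self.pending) == K:   # full stripe: one combined encode + I/O
            stripe = self.pending
            self.pending = []
            self.flush_stripe(stripe, xor_parity(stripe))

def demo_flush(stripe, parity):
    print(f"flushing stripe of {len(stripe)} data blocks + 1 parity block")

if __name__ == "__main__":
    buf = CombiningWriteBuffer(demo_flush)
    for i in range(10):              # ten small writes trigger two stripe flushes
        buf.write(f"request {i}".encode())
```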


2021
Vol 10 (1)
Author(s):  
Ikuo Kuroiwa

Extending the technique of unit structure analysis, which was originally developed by Ozaki (J Econ 73(5):720–748, 1980), this study introduces a method of value chain mapping that uses international input–output data and reveals both the upstream and downstream transactions of goods and services, as well as primary input (value added) and final output (final demand) transactions, which emerge along the entire value chain. This method is then applied to the agricultural value chain of three Greater Mekong Subregion countries: Thailand, Vietnam, and Cambodia. The results show that the agricultural value chain has been increasingly internationalized, although there is still room to benefit from participating in global value chains, especially in a country such as Cambodia. Although there are some constraints regarding the methodology and data, the method proves useful in tracing the entire value chain.
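Input-output-based value chain tracing of this kind rests on the Leontief inverse: one unit of final demand f induces total output x = (I - A)^{-1} f along the entire chain. The following sketch uses a made-up three-sector coefficient matrix, not the actual Greater Mekong Subregion data, to show the computation.

```python
import numpy as np

# Hypothetical 3-sector input coefficient matrix A (agriculture, manufacturing, services);
# A[i, j] is the input from sector i required per unit of output of sector j.
A = np.array([
    [0.10, 0.05, 0.02],
    [0.20, 0.30, 0.10],
    [0.05, 0.10, 0.15],
])

# One unit of final demand for agriculture only.
f = np.array([1.0, 0.0, 0.0])

# Leontief inverse: total (direct + indirect) output induced along the value chain.
L = np.linalg.inv(np.eye(3) - A)
x = L @ f

print("output induced in each sector:", np.round(x, 3))
```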


Author(s):  
Sai Wu
Gang Chen
Xianke Zhou
Zhenjie Zhang
Anthony K. H. Tung
...  

2013
Vol 49 (6)
pp. 2645-2652
Author(s):  
Zhipeng Tan
Wei Zhou
Dan Feng
Wenhua Zhang

2009
Vol 51 (1)
pp. 71-85
Author(s):  
R. Rioux

This paper describes a simple cost-push price model developed at the Structural Analysis Division of Statistics Canada. It is a traditional input-output cost-push model adapted to use the rectangular industry-by-commodity input-output tables for Canada, and it can be considered the "dual" of the output model. Instead of analysing the propagation of demand through the economic system, the price model analyses the propagation of factor prices throughout the system. The purpose of such a price formation model is to determine the impact on industry selling prices and domestic commodity prices arising from a change in import commodity prices and primary input prices. The price model is static; it allows no substitution and its structure is quite rigid. It is considered an annual model, although it can be used for a different time period. The model is fully operational and is widely used by many government and private agencies.
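In this dual formulation, selling prices satisfy p = A^T p + v, so p = (I - A^T)^{-1} v, where A is the input coefficient matrix and v is the vector of primary input (and import) costs per unit of output. The sketch below uses a made-up coefficient matrix to show how a shock to primary input prices in one industry propagates to the selling prices of all industries.

```python
import numpy as np

# Hypothetical input coefficient matrix:
# A[i, j] is the input from industry i per unit of output of industry j.
A = np.array([
    [0.10, 0.05, 0.02],
    [0.20, 0.30, 0.10],
    [0.05, 0.10, 0.15],
])

def prices(v):
    """Cost-push prices: p = (I - A^T)^{-1} v, the dual of the output model."""
    return np.linalg.solve(np.eye(3) - A.T, v)

v_base = np.array([0.65, 0.55, 0.73])           # primary input cost per unit of output
v_shock = v_base + np.array([0.10, 0.0, 0.0])   # cost increase hits industry 1 only

p_base, p_shock = prices(v_base), prices(v_shock)
print("price change by industry:", np.round(p_shock - p_base, 3))
```

Because the model is static and allows no substitution, the cost increase in industry 1 is passed through to every industry that uses its output, directly or indirectly.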


Electronics
2021
Vol 10 (12)
pp. 1471
Author(s):  
Jun-Yeong Lee
Moon-Hyun Kim
Syed Asif Raza Shah
Sang-Un Ahn
Heejun Yoon
...  

Data are important and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play pivotal roles in data management and analysis for scientific discoveries. Redundant Array of Independent Disks (RAID), a well-known storage technology combining multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, it requires RAID-capable hardware or software to build a RAID-enabled disk array, and RAID-based storage is difficult to scale up. To mitigate this problem, many distributed file systems have been developed and are actively used in various environments, especially in data-intensive computing facilities, where a tremendous amount of data has to be handled. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre and EOS, for data-intensive environments. In our experiments, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which have to be considered in data-intensive computing environments.
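For contrast with the distributed file systems benchmarked here, the basic RAID idea of combining multiple disks into one logical volume can be sketched in a few lines. The example below is a purely illustrative RAID-0-style striping of a byte stream across directories standing in for disks, with made-up paths and chunk sizes; it is not how the benchmarked systems or a production RAID layer are implemented.

```python
import os

# Directories standing in for independent disks (illustrative paths).
DISKS = ["/tmp/disk0", "/tmp/disk1", "/tmp/disk2", "/tmp/disk3"]
CHUNK = 64 * 1024          # stripe unit: 64 KiB per chunk

def stripe_write(name, data):
    """Split data into chunks and write them round-robin across the 'disks'."""
    for disk in DISKS:
        os.makedirs(disk, exist_ok=True)
    n_chunks = -(-len(data) // CHUNK)          # ceiling division
    for idx in range(n_chunks):
        chunk = data[idx * CHUNK:(idx + 1) * CHUNK]
        disk = DISKS[idx % len(DISKS)]
        with open(f"{disk}/{name}.{idx}", "wb") as f:
            f.write(chunk)

def stripe_read(name, size):
    """Reassemble the logical file by reading chunks back in order."""
    out = bytearray()
    n_chunks = -(-size // CHUNK)
    for idx in range(n_chunks):
        disk = DISKS[idx % len(DISKS)]
        with open(f"{disk}/{name}.{idx}", "rb") as f:
            out += f.read()
    return bytes(out)

if __name__ == "__main__":
    payload = os.urandom(1_000_000)
    stripe_write("demo", payload)
    assert stripe_read("demo", len(payload)) == payload
    print("striped write/read round trip OK")
```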

