Comparative Analysis of Distributed File Systems

Author(s):  
Basireddy Ithihas Reddy

There has been great interest in computing experiments carried out on shared-nothing clusters of commodity machines, where multiple systems run in parallel and work closely together towards the same goal. In such clusters, the distributed execution engine MapReduce frequently handles the primary input-output workload. Numerous file systems exist on Windows and Linux, such as NTFS, ReFS, FAT and FAT32; we studied them and implemented a few distributed file systems. Distributed file systems (DFS) have been found to work very well on many small files, but some do not deliver the expected results on large files. We implemented benchmark testing algorithms in each distributed file system for small and large files, and the analysis is put forward in this paper. We also came across various implementation issues in the different DFS, and these are mentioned in this paper as well.
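The small-file versus large-file behaviour measured in this work can be reproduced with a simple timing harness. The sketch below is a minimal, hypothetical benchmark, not the authors' actual test code; the mount point /mnt/dfs, the file counts and the sizes are illustrative assumptions. It writes and reads back many small files and one large file on a DFS mount and reports throughput for each case.

```python
import os
import time

MOUNT = "/mnt/dfs"          # assumed DFS mount point (illustrative)
SMALL_COUNT = 1000          # number of small files
SMALL_SIZE = 4 * 1024       # 4 KiB per small file
LARGE_SIZE = 1 * 1024**3    # one 1 GiB file
CHUNK = 4 * 1024**2         # 4 MiB chunk for the large file

def bench_small_files():
    """Write and read back many small files; return (write_s, read_s)."""
    payload = os.urandom(SMALL_SIZE)
    t0 = time.perf_counter()
    for i in range(SMALL_COUNT):
        with open(f"{MOUNT}/small_{i}.bin", "wb") as f:
            f.write(payload)
    t1 = time.perf_counter()
    for i in range(SMALL_COUNT):
        with open(f"{MOUNT}/small_{i}.bin", "rb") as f:
            f.read()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

def bench_large_file():
    """Write and read back one large file in fixed-size chunks."""
    chunk = os.urandom(CHUNK)
    path = f"{MOUNT}/large.bin"
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(LARGE_SIZE // CHUNK):
            f.write(chunk)
    t1 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

if __name__ == "__main__":
    ws, rs = bench_small_files()
    wl, rl = bench_large_file()
    small_mb = SMALL_COUNT * SMALL_SIZE / 1024**2
    large_mb = LARGE_SIZE / 1024**2
    print(f"small files: write {small_mb/ws:.1f} MiB/s, read {small_mb/rs:.1f} MiB/s")
    print(f"large file : write {large_mb/wl:.1f} MiB/s, read {large_mb/rl:.1f} MiB/s")
```

Note that the client page cache can inflate the read figures, so a real benchmark would also drop caches or use direct I/O between the write and read phases.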

2017
Vol 5 (23) (3)
pp. 53-73
Author(s):  
Karolina Kula
Marcin Markowski

In this paper we present a comparative analysis of the three most popular distributed file systems: Ceph, LizardFS and GlusterFS. The main evaluation criterion was the performance of the systems, understood as the number of read and write operations per unit of time and the maximum data transfer rate. The analysis is based on the results of numerous experiments conducted on an experimental testbed prepared for this purpose. The experimental part is preceded by a survey of the technology and of the most popular existing systems of this type. The paper presents the results of the experiments, their analysis and practical conclusions.
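The two metrics used as the evaluation criterion, operations per second and maximum transfer rate, can be approximated with a short script. The sketch below is a simplified stand-in for a dedicated benchmark tool such as fio, not the authors' testbed setup; the test file path, file size and run length are illustrative assumptions. It issues random 4 KiB writes against a preallocated file on the file system under test and reports IOPS and throughput.

```python
import os
import random
import time

TEST_FILE = "/mnt/dfs/iops_test.bin"   # assumed path on the file system under test
FILE_SIZE = 256 * 1024**2              # 256 MiB preallocated test file
BLOCK = 4096                           # 4 KiB blocks, as in typical IOPS tests
DURATION = 10.0                        # seconds per run

def measure_random_write_iops():
    """Issue random 4 KiB writes for DURATION seconds; return (IOPS, MiB/s)."""
    buf = os.urandom(BLOCK)
    fd = os.open(TEST_FILE, os.O_WRONLY | os.O_CREAT)
    os.ftruncate(fd, FILE_SIZE)              # preallocate the file
    blocks = FILE_SIZE // BLOCK
    ops = 0
    start = time.perf_counter()
    while time.perf_counter() - start < DURATION:
        offset = random.randrange(blocks) * BLOCK
        os.pwrite(fd, buf, offset)            # positional write, no seek needed
        ops += 1
    elapsed = time.perf_counter() - start
    os.fsync(fd)                              # flush to the storage backend
    os.close(fd)
    return ops / elapsed, ops * BLOCK / 1024**2 / elapsed

if __name__ == "__main__":
    iops, mibps = measure_random_write_iops()
    print(f"random 4 KiB writes: {iops:.0f} IOPS, {mibps:.1f} MiB/s")
```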


2021
Vol 11 (8)
pp. 3298
Author(s):  
Jeong-Joon Kim

Various techniques have been used in distributed file systems for data availability and stability. Typically, data are stored using a replication-based distributed file system, but because of its poor space efficiency, the erasure-coding (EC) technique has been adopted more widely in recent years. The EC technique improves space efficiency compared with replication. However, it introduces several performance degradation factors, such as encoding and decoding overhead and input/output (I/O) degradation. This study therefore proposes a buffering and combining technique in which the various I/O requests that occur during encoding in an EC-based distributed file system are combined into one and processed together. In addition, it proposes four recovery measures (disk I/O load distribution, random block layout, multi-thread-based parallel recovery, and a matrix recycling technique) to distribute the disk I/O loads generated during decoding.
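The buffering and combining idea can be illustrated with a small sketch. The code below shows only the general principle, not the proposed implementation: incoming small write requests are accumulated, and only when a full stripe of k data blocks is available is a single combined encode-and-flush performed. XOR parity is used here as a stand-in for a real erasure code such as Reed-Solomon, and all names are hypothetical.

```python
K = 4                     # data blocks per stripe
BLOCK_SIZE = 4096         # bytes per block

def xor_parity(blocks):
    """Compute a single parity block as a stand-in for real EC encoding."""
    parity = bytearray(BLOCK_SIZE)
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

class CombiningWriteBuffer:
    """Accumulate small writes and encode them one full stripe at a time."""

    def __init__(self, flush_stripe):
        self.pending = []            # buffered data blocks not yet encoded
        self.flush_stripe = flush_stripe

    def write(self, data: bytes):
        # Pad each request to a full block; a real system would pack bytes tightly.
        block = data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]
        self.pending.append(block)
        if len(self.pending) == K:   # full stripe: one combined encode + I/O
            stripe = self.pending
            self.pending = []
            self.flush_stripe(stripe, xor_parity(stripe))

def demo_flush(stripe, parity):
    print(f"flushing stripe of {len(stripe)} data blocks + 1 parity block")

if __name__ == "__main__":
    buf = CombiningWriteBuffer(demo_flush)
    for i in range(10):              # ten small writes trigger two stripe flushes
        buf.write(f"request {i}".encode())
```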


2021
Vol 10 (1)
Author(s):  
Ikuo Kuroiwa

Extending the technique of unit structure analysis, which was originally developed by Ozaki (J Econ 73(5):720–748, 1980), this study introduces a method of value chain mapping that uses international input–output data and reveals both the upstream and downstream transactions of goods and services, as well as primary input (value added) and final output (final demand) transactions, which emerge along the entire value chain. This method is then applied to the agricultural value chain of three Greater Mekong Subregion countries: Thailand, Vietnam, and Cambodia. The results show that the agricultural value chain has been increasingly internationalized, although there is still room to benefit from participating in global value chains, especially in a country such as Cambodia. Although there are some constraints regarding the methodology and data, the method proves useful in tracing the entire value chain.
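Input-output-based value chain tracing of this kind rests on the Leontief inverse: one unit of final demand f induces total output x = (I - A)^{-1} f along the entire chain. The following sketch uses a made-up three-sector coefficient matrix, not the actual Greater Mekong Subregion data, to show the computation.

```python
import numpy as np

# Hypothetical 3-sector input coefficient matrix A (agriculture, manufacturing, services);
# A[i, j] is the input from sector i required per unit of output of sector j.
A = np.array([
    [0.10, 0.05, 0.02],
    [0.20, 0.30, 0.10],
    [0.05, 0.10, 0.15],
])

# One unit of final demand for agriculture only.
f = np.array([1.0, 0.0, 0.0])

# Leontief inverse: total (direct + indirect) output induced along the value chain.
L = np.linalg.inv(np.eye(3) - A)
x = L @ f

print("output induced in each sector:", np.round(x, 3))
```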


Author(s):  
Sai Wu
Gang Chen
Xianke Zhou
Zhenjie Zhang
Anthony K. H. Tung
...  

2013
Vol 49 (6)
pp. 2645-2652
Author(s):  
Zhipeng Tan
Wei Zhou
Dan Feng
Wenhua Zhang

2009
Vol 51 (1)
pp. 71-85
Author(s):  
R. Rioux

This paper describes a simple cost-push price model developed at the Structural Analysis Division of Statistics Canada. It is a traditional input-output cost-push model adapted to use the rectangular industry-by-commodity input-output tables for Canada, and it can be considered the "dual" of the output model. Instead of analysing the propagation of demand through the economic system, the price model analyses the propagation of factor prices throughout the system. The purpose of such a price formation model is to determine the impact on industry selling prices and domestic commodity prices arising from a change in import commodity prices and primary input prices. The price model is static; it allows no substitution and its structure is quite rigid. It is considered an annual model, although it can be used for a different time period. The model is fully operational and is widely used by many government and private agencies.
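In this dual formulation, selling prices satisfy p = A^T p + v, so p = (I - A^T)^{-1} v, where A is the input coefficient matrix and v is the vector of primary input (and import) costs per unit of output. The sketch below uses a made-up coefficient matrix to show how a shock to primary input prices in one industry propagates to the selling prices of all industries.

```python
import numpy as np

# Hypothetical input coefficient matrix:
# A[i, j] is the input from industry i per unit of output of industry j.
A = np.array([
    [0.10, 0.05, 0.02],
    [0.20, 0.30, 0.10],
    [0.05, 0.10, 0.15],
])

def prices(v):
    """Cost-push prices: p = (I - A^T)^{-1} v, the dual of the output model."""
    return np.linalg.solve(np.eye(3) - A.T, v)

v_base = np.array([0.65, 0.55, 0.73])           # primary input cost per unit of output
v_shock = v_base + np.array([0.10, 0.0, 0.0])   # cost increase hits industry 1 only

p_base, p_shock = prices(v_base), prices(v_shock)
print("price change by industry:", np.round(p_shock - p_base, 3))
```

Because the model is static and allows no substitution, the cost increase in industry 1 is passed through to every industry that uses its output, directly or indirectly.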


Electronics
2021
Vol 10 (12)
pp. 1471
Author(s):  
Jun-Yeong Lee
Moon-Hyun Kim
Syed Asif Raza Shah
Sang-Un Ahn
Heejun Yoon
...  

Data are important and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play pivotal roles in data management and analysis for scientific discoveries. Redundant Array of Independent Disks (RAID), a well-known storage technology combining multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, it requires RAID-capable hardware or software to build a RAID-enabled disk array, and RAID-based storage is difficult to scale up. To mitigate this problem, many distributed file systems have been developed and are actively used in various environments, especially in data-intensive computing facilities, where a tremendous amount of data has to be handled. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre and EOS, for data-intensive environments. In our experiments, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which have to be considered in data-intensive computing environments.
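For contrast with the distributed file systems benchmarked here, the basic RAID idea of combining multiple disks into one logical volume can be sketched in a few lines. The example below is a purely illustrative RAID-0-style striping of a byte stream across directories standing in for disks, with made-up paths and chunk sizes; it is not how the benchmarked systems or a production RAID layer are implemented.

```python
import os

# Directories standing in for independent disks (illustrative paths).
DISKS = ["/tmp/disk0", "/tmp/disk1", "/tmp/disk2", "/tmp/disk3"]
CHUNK = 64 * 1024          # stripe unit: 64 KiB per chunk

def stripe_write(name, data):
    """Split data into chunks and write them round-robin across the 'disks'."""
    for disk in DISKS:
        os.makedirs(disk, exist_ok=True)
    n_chunks = -(-len(data) // CHUNK)          # ceiling division
    for idx in range(n_chunks):
        chunk = data[idx * CHUNK:(idx + 1) * CHUNK]
        disk = DISKS[idx % len(DISKS)]
        with open(f"{disk}/{name}.{idx}", "wb") as f:
            f.write(chunk)

def stripe_read(name, size):
    """Reassemble the logical file by reading chunks back in order."""
    out = bytearray()
    n_chunks = -(-size // CHUNK)
    for idx in range(n_chunks):
        disk = DISKS[idx % len(DISKS)]
        with open(f"{disk}/{name}.{idx}", "rb") as f:
            out += f.read()
    return bytes(out)

if __name__ == "__main__":
    payload = os.urandom(1_000_000)
    stripe_write("demo", payload)
    assert stripe_read("demo", len(payload)) == payload
    print("striped write/read round trip OK")
```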

