Distributed File Systems: Latest Research Papers

With the opportunities brought by the Internet of Things (IoT), maintaining concurrency and privacy is a considerable challenge when a huge number of resource-constrained distributed devices are involved. Blockchain has become popular for its benefits, including decentralization, persistence, immutability, auditability, and consensus. The construction of IoT-based distributed file systems has received great attention worldwide. A new generation of IoT-based distributed file systems that integrate blockchain technology has been proposed, such as Swarm and the InterPlanetary File System. The IoT raises new technical challenges, such as credibility, harmonization, large-volume data, heterogeneity, and constrained resources. Centralized access control technologies do not provide the credibility needed to ensure data security in the IoT. In this work, we propose an attribute-based access control model for the IoT. The system does not require an access control list for each device, which makes access management more effective. Moreover, we use blockchain technology to record attributes, avoid data tampering, and eliminate a single point of failure at edge computing devices. IoT devices control the user's environment as well as the collection of his or her private data; therefore, exposing the user's personal data to non-trusted private and public servers may result in privacy leakage. To automate the system, smart contracts are used for data access, whereas Proof of Authority is used to enhance the system's performance and optimize gas consumption. Through smart contracts, the data owner can store ciphertext on a blockchain. Data can only be decrypted within a valid access period, and the trace function is achieved by storing the invocation and creation of smart contracts on the blockchain. Scalability issues can also be resolved by using a multichain blockchain. Finally, the simulation results show that the proposed system is efficient for the IoT.
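The attribute-based check with a valid access period described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' smart contract: the `Policy` class, the attribute names, and the integer timestamps are all hypothetical, and the blockchain, encryption, and Proof of Authority parts are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: required attributes plus a validity window."""
    required: dict      # attribute name -> required value
    valid_from: int     # inclusive start, seconds since epoch
    valid_until: int    # exclusive end, seconds since epoch


def is_access_allowed(subject_attrs: dict, policy: Policy, now: int) -> bool:
    """Grant access only if every required attribute matches and `now`
    falls inside the policy's validity window."""
    in_window = policy.valid_from <= now < policy.valid_until
    attrs_match = all(
        subject_attrs.get(name) == value
        for name, value in policy.required.items()
    )
    return in_window and attrs_match


# Illustrative values only; none of them come from the paper.
policy = Policy(required={"role": "nurse", "ward": "B"},
                valid_from=1000, valid_until=2000)
nurse = {"role": "nurse", "ward": "B", "id": "dev-17"}
visitor = {"role": "visitor", "ward": "B"}

print(is_access_allowed(nurse, policy, now=1500))    # True
print(is_access_allowed(visitor, policy, now=1500))  # False
print(is_access_allowed(nurse, policy, now=2500))    # False
```

Because the decision depends only on attributes and time, no per-device access control list has to be maintained, which is the property the abstract highlights.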

Octopus+: An RDMA-Enabled Distributed Persistent Memory File System

ACM Transactions on Storage ◽ 10.1145/3448418 ◽ 2021 ◽ Vol 17 (3) ◽ pp. 1-25

Author(s): Bohong Zhu ◽ Youmin Chen ◽ Qing Wang ◽ Youyou Lu ◽ Jiwu Shu

Keyword(s): High Speed ◽ High Performance ◽ File System ◽ Direct Memory Access ◽ File Systems ◽ Distributed File Systems ◽ Persistent Memory ◽ Memory Modules ◽ Non Volatile Memory ◽ Volatile Memory

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and their heavily layered software designs leave high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus+, which redesigns file system internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory copying overhead, and has clients actively fetch and push all data to rebalance the load between the server and the network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between the file system and the networking layer, and an efficient distributed transaction mechanism for consistency. Octopus+ also supports replication to provide better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.

An Effective Techniques Using Apriori and Logistic Methods in Cloud Computing

IARS' International Research Journal ◽ 10.51611/iars.irj.v11i2.2021.167 ◽ 2021 ◽ Vol 11 (2) ◽ pp. 35-39

Author(s): S. Selvam

Keyword(s): Cloud Computing ◽ File Systems ◽ Data Prefetching ◽ Distributed File Systems ◽ Storage Server ◽ Storage Servers ◽ Cloud Environments ◽ Prefetching Technique ◽ Client Device ◽ Client System

This paper presents an initiative data prefetching scheme on the storage servers of distributed file systems for cloud computing. The storage server receives information about frequently accessed data from the client system, analyzes it, and forwards the prefetched data to the client machine. To put this technique to work, information about client nodes is piggybacked onto the real client I/O requests and then forwarded to the relevant storage server. Next, two prediction algorithms are proposed to forecast future block access operations and so determine which data should be fetched on the storage servers in advance. Finally, the prefetched data can be pushed from the storage server to the relevant client device. Through a series of evaluation experiments with a group of application benchmarks, we demonstrate that the presented initiative prefetching technique helps distributed file systems in cloud environments achieve better I/O performance. In particular, configuration-limited client machines in the cloud are not responsible for predicting I/O access operations, which contributes to better system performance on them.
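The flow in this abstract (the client piggybacks access information on a real request, the storage server predicts future blocks, and the server pushes them) can be sketched as follows. The abstract does not specify its two prediction algorithms, so a simple constant-stride predictor stands in for them here as an assumption; `PrefetchingServer`, `handle_request`, and the parameter names are hypothetical.

```python
from collections import defaultdict, deque


class PrefetchingServer:
    """Toy storage server that predicts the next blocks per client."""

    def __init__(self, history_len: int = 4, depth: int = 2):
        # Recent block numbers per client, bounded to `history_len` entries.
        self.history = defaultdict(lambda: deque(maxlen=history_len))
        self.depth = depth  # how many blocks to prefetch ahead

    def handle_request(self, client_id: str, block: int, piggyback: list) -> list:
        """Record the access and return the block numbers to push.

        `piggyback` carries recent block numbers (oldest first) that the
        server has not seen yet, attached to the real I/O request.
        """
        hist = self.history[client_id]
        hist.extend(piggyback)
        hist.append(block)
        return self._predict(list(hist))

    def _predict(self, blocks: list) -> list:
        """Stand-in predictor: extrapolate a constant non-zero stride,
        otherwise prefetch nothing."""
        if len(blocks) < 3:
            return []
        strides = {b - a for a, b in zip(blocks, blocks[1:])}
        if len(strides) != 1:
            return []
        stride = strides.pop()
        if stride == 0:
            return []
        return [blocks[-1] + stride * i for i in range(1, self.depth + 1)]


server = PrefetchingServer()
print(server.handle_request("client-a", block=12, piggyback=[8, 10]))  # [14, 16]
print(server.handle_request("client-b", block=7, piggyback=[3, 9]))    # []
```

All prediction state lives on the server, so a resource-limited client only has to attach a short history to requests it was sending anyway.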

Comparative Analysis of Distributed File Systems

International Journal for Research in Applied Science and Engineering Technology ◽ 10.22214/ijraset.2021.37261 ◽ 2021 ◽ Vol 9 (VIII) ◽ pp. 20-26

Author(s): Basireddy Ithihas Reddy

Keyword(s): Comparative Analysis ◽ File Systems ◽ Distributed File Systems ◽ Input Output ◽ Primary Input ◽ Multiple Systems ◽ Execution Engine ◽ Distributed Execution ◽ Testing Algorithms ◽ Distributed Files

There has been great interest in computing experiments that have proven useful on shared-nothing computers and commodity machines. Such experiments need multiple systems running in parallel and working closely together towards the same goal. The distributed execution engine MapReduce frequently handles the primary input-output workload for such clusters. Numerous file systems are available, viz. NTFS, ReFS, FAT, and FAT32 on Windows and Linux; we studied them and implemented a few distributed file systems. It has been observed that distributed file systems (DFS) work very well on many small files, but some do not produce the expected output on large files. We implemented benchmark testing algorithms on each distributed file system for small and large files, and the analysis is presented in this paper. We also encountered implementation issues with several DFS, which are described in this paper as well.
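A benchmark of the kind this abstract describes, comparing many small files against a few large ones, might look like the sketch below. It is an illustrative assumption rather than the paper's test algorithm: `write_read_benchmark`, the file counts, and the sizes are hypothetical, and a temporary local directory stands in for a real DFS mount point.

```python
import os
import tempfile
import time


def write_read_benchmark(directory: str, file_count: int, file_size: int) -> dict:
    """Write then read `file_count` files of `file_size` bytes in `directory`
    and return the bytes moved and the elapsed seconds of each phase."""
    payload = os.urandom(file_size)
    paths = [os.path.join(directory, f"bench_{i}.bin") for i in range(file_count)]

    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to storage
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    bytes_read = 0
    for path in paths:
        with open(path, "rb") as f:
            bytes_read += len(f.read())
    read_seconds = time.perf_counter() - start

    for path in paths:
        os.remove(path)  # clean up so runs do not affect each other

    return {"bytes": file_count * file_size, "bytes_read": bytes_read,
            "write_seconds": write_seconds, "read_seconds": read_seconds}


# A temporary local directory stands in for a DFS mount point here.
with tempfile.TemporaryDirectory() as mount_point:
    small = write_read_benchmark(mount_point, file_count=200, file_size=4 * 1024)
    large = write_read_benchmark(mount_point, file_count=2, file_size=8 * 1024 * 1024)
    for name, result in (("small", small), ("large", large)):
        print(name, result)
```

The read phase here may be served from the operating system's page cache, so a real comparison would need to clear or bypass caches and repeat each run several times.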

Performance Evaluations of Distributed File Systems for Scientific Big Data in FUSE Environment

Electronics ◽ 10.3390/electronics10121471 ◽ 2021 ◽ Vol 10 (12) ◽ pp. 1471

Author(s): Jun-Yeong Lee ◽ Moon-Hyun Kim ◽ Syed Asif Raza Shah ◽ Sang-Un Ahn ◽ Heejun Yoon ◽ ...

Keyword(s): Data Storage ◽ Scale Up ◽ File Systems ◽ Performance Evaluations ◽ Distributed File Systems ◽ Data Intensive Computing ◽ Data Intensive ◽ Tremendous Amount ◽ Computing Environments ◽ And Performance

Data are important and ever growing in data-intensive scientific environments. Such growth in research data requires data storage systems that play pivotal roles in data management and analysis for scientific discoveries. Redundant Array of Independent Disks (RAID), a well-known storage technology that combines multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, building a RAID-enabled disk array requires RAID-capable hardware or software. In addition, RAID-based storage is difficult to scale up. To mitigate these problems, many distributed file systems have been developed and are actively used in various environments, especially in data-intensive computing facilities, where a tremendous amount of data has to be handled. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre, and EOS, for data-intensive environments. In our experiment, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which have to be considered in data-intensive computing environments.

Access Control Conflict Resolution in Distributed File Systems using CRDTs

Proceedings of the 8th Workshop on Principles and Practice of Consistency for Distributed Data ◽ 10.1145/3447865.3457970 ◽ 2021

Author(s): Elena Yanakieva ◽ Michael Youssef ◽ Ahmad Hussein Rezae ◽ Annette Bieniusa

Keyword(s): Conflict Resolution ◽ Access Control ◽ File Systems ◽ Distributed File Systems

Erasure-Coding-Based Storage and Recovery for Distributed Exascale Storage Systems

Applied Sciences ◽ 10.3390/app11083298 ◽ 2021 ◽ Vol 11 (8) ◽ pp. 3298

Author(s): Jeong-Joon Kim

Keyword(s): File System ◽ File Systems ◽ Data Availability ◽ Distributed File System ◽ Distributed File Systems ◽ Erasure Coding ◽ Input Output ◽ Space Efficiency ◽ Random Block ◽ Replication Technique

Various techniques have been used in distributed file systems to provide data availability and stability. Typically, data are stored using replication, but because of its poor space efficiency, erasure coding (EC) has been adopted more recently. EC is more space-efficient than replication. However, EC has several sources of performance degradation, such as encoding and decoding overhead and input/output (I/O) degradation. Thus, this study proposes a buffering and combining technique in which the various I/O requests that occur during encoding in an EC-based distributed file system are combined into one and processed together. In addition, it proposes four recovery measures (disk I/O load distribution, random block layout, multi-thread-based parallel recovery, and a matrix recycle technique) to distribute the disk I/O loads generated during decoding.
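The space-efficiency argument for erasure coding over replication can be made concrete with a small Python sketch. The abstract does not name the code it uses, so single XOR parity, the simplest erasure code, is assumed here for illustration; the function names, the 4+1 layout, and the 3-way replication baseline are likewise assumptions.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list) -> list:
    """Return the stripe: the k data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe: list, lost_index: int) -> bytes:
    """Rebuild the block at `lost_index` by XOR-ing all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def storage_overhead(data_units: int, redundant_units: int) -> float:
    """Extra storage as a fraction of the original data size."""
    return redundant_units / data_units


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
stripe = encode(data)
print(recover(stripe, lost_index=2))  # b'CCCC'
print(storage_overhead(1, 2))         # 3-way replication: 2.0
print(storage_overhead(4, 1))         # 4+1 XOR parity: 0.25
```

Single parity tolerates only one lost block per stripe; codes such as Reed-Solomon tolerate more at the cost of heavier encoding and decoding, which is the performance trade-off the abstract refers to.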
