Opass: Analysis and Optimization of Parallel Data Access on Distributed File Systems

Author(s): Jiangling Yin, Jun Wang, Jian Zhou, Tyler Lukasiewicz, Dan Huang, ...
2018, Vol 67 (3), pp. 388-402

Author(s): Dan Huang, Dezhi Han, Jun Wang, Jiangling Yin, Xunchao Chen, ...

Author(s): Sai Wu, Gang Chen, Xianke Zhou, Zhenjie Zhang, Anthony K. H. Tung, ...

Current systems stress the protection of data stored on cloud servers without giving much consideration to the protection of data while users access it. Encryption is the technique popularly used to protect stored data: it scrambles the data and stores it in a form that makes no sense unless decrypted with the appropriate key. Every cloud service provider ensures data is stored in encrypted form on its servers. Encryption alone is not sufficient to protect user data, since anyone who acquires the appropriate key can decrypt the data. Encrypting the data before uploading it to the cloud helps to an extent: the data is then encrypted twice, once by the user and once by the cloud service provider, which prevents the cloud service provider, as well as other third parties, from accessing it. However, even this approach is not efficient or sufficient to protect user data, because the pattern of accesses can still leak information. An ORAM algorithm is used to enable access to user data stored on distributed file systems, comprising multiple servers at a single location or at multiple locations across the globe, in a manner that protects user privacy during access. Reshuffling the data blocks stored on third-party servers ensures that the user's access pattern remains hidden. The ORAM algorithm does not hinder data access and does not cause any major drop in the data access rate. To ensure security, we propose a load-balancing technique to guarantee a smooth and safe approach to data access.
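The abstract leaves the reshuffling mechanism implicit. Below is a minimal sketch of the underlying idea, a client-side shuffle ORAM; the dict-based server, the ToyShuffleORAM class, and the pad-based cipher are hypothetical stand-ins rather than the scheme proposed above, and the one-time-pad "encryption" is only a placeholder for a real re-randomizing cipher such as AES-CTR.

```python
import os
import random

class ToyShuffleORAM:
    """Toy client-side ORAM: every access touches all blocks and is followed
    by a full re-encrypt-and-reshuffle, so an honest-but-curious server
    learns nothing about which logical block was used."""

    def __init__(self, server, num_blocks, block_size=64):
        self.server = server                 # dict: slot -> ciphertext (stands in for remote storage)
        self.n = num_blocks
        self.block_size = block_size
        self.pos = list(range(num_blocks))   # client-private position map: logical id -> server slot
        for slot in range(num_blocks):
            self.server[slot] = self._encrypt(b"\x00" * block_size)

    def _encrypt(self, pt):
        # Placeholder cipher: a fresh random pad kept with the ciphertext.
        # NOT secure; it only makes re-uploaded blocks look fresh each round.
        pad = os.urandom(len(pt))
        return pad, bytes(a ^ b for a, b in zip(pt, pad))

    def _decrypt(self, ct):
        pad, body = ct
        return bytes(a ^ b for a, b in zip(pad, body))

    def access(self, logical_id, new_data=None):
        # 1. Read *every* slot so the server cannot single out the target block.
        plain = [self._decrypt(self.server[self.pos[l]]) for l in range(self.n)]
        value = plain[logical_id]
        if new_data is not None:
            plain[logical_id] = new_data.ljust(self.block_size, b"\x00")
        # 2. Reshuffle: a fresh permutation plus re-encryption of all blocks
        #    makes two accesses to the same block unlinkable across rounds.
        new_pos = list(range(self.n))
        random.shuffle(new_pos)
        for l in range(self.n):
            self.server[new_pos[l]] = self._encrypt(plain[l])
        self.pos = new_pos
        return value

server = {}
oram = ToyShuffleORAM(server, num_blocks=8)
oram.access(3, new_data=b"secret")
assert oram.access(3).rstrip(b"\x00") == b"secret"
```

Because every access touches all blocks, this toy scheme costs O(n) per access; practical constructions such as Path ORAM bring this down to O(log n) per access, which is what makes the "no major drop in data access rate" claim plausible in practice.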


2013, Vol 49 (6), pp. 2645-2652
Author(s): Zhipeng Tan, Wei Zhou, Dan Feng, Wenhua Zhang

Electronics, 2021, Vol 10 (12), pp. 1471
Author(s): Jun-Yeong Lee, Moon-Hyun Kim, Syed Asif Raza Shah, Sang-Un Ahn, Heejun Yoon, ...

Data are important and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play pivotal roles in data management and analysis for scientific discoveries. Redundant Array of Independent Disks (RAID), a well-known storage technology combining multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, it requires RAID-capable hardware or software to build a RAID-enabled disk array, and RAID-based storage is difficult to scale up. To mitigate these problems, many distributed file systems have been developed and are actively used in various environments, especially in data-intensive computing facilities where a tremendous amount of data has to be handled. In this study, we investigated and benchmarked various distributed file systems, namely Ceph, GlusterFS, Lustre, and EOS, for data-intensive environments. In our experiments, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify, for each file system, the characteristics that affect read and write performance depending on the properties of the data, which must be considered in data-intensive computing environments.
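The abstract does not describe the benchmark harness itself. Purely as an illustration of how such a comparison can be driven from a client, the sketch below times one sequential write and one sequential read per FUSE mount point; the mount paths and the throughput helper are hypothetical. A real run should also drop the page cache between phases, or read a file larger than RAM, to avoid measuring cached reads.

```python
import os
import time

def throughput(mount_point, file_size_mib=256, block_kib=1024):
    """Time one sequential write and one sequential read on a mounted
    file system and report (write MiB/s, read MiB/s)."""
    path = os.path.join(mount_point, "bench.tmp")
    block = os.urandom(block_kib * 1024)
    blocks = file_size_mib * 1024 // block_kib

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())          # ensure the data reaches the backend
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kib * 1024):
            pass                      # caveat: may hit the page cache
    read_s = time.perf_counter() - start

    os.remove(path)
    return file_size_mib / write_s, file_size_mib / read_s

# Hypothetical FUSE mount points; substitute your own.
for mnt in ["/mnt/cephfs", "/mnt/glusterfs", "/mnt/lustre", "/mnt/eos"]:
    if os.path.ismount(mnt):
        w, r = throughput(mnt)
        print(f"{mnt}: write {w:.1f} MiB/s, read {r:.1f} MiB/s")
```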


2021, Vol 17 (3), pp. 1-25
Author(s): Bohong Zhu, Youmin Chen, Qing Wang, Youyou Lu, Jiwu Shu

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and the heavy layered software designs leave this high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus+, which redesigns the file system's internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory-copying overhead, and actively fetches and pushes data entirely at the clients to rebalance the load between the server and the network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between the file system and the network, and an efficient distributed transaction mechanism for consistency. Octopus+ also provides a replication feature for better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders-of-magnitude better performance than existing distributed file systems.
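Octopus+ itself is built on RDMA verbs and real persistent memory. As a rough illustration of its "clients directly access a shared persistent memory pool" design, the sketch below uses a memory-mapped file in place of a registered RDMA region; the PMPoolClient class, the pool path, and the block layout are all hypothetical.

```python
import mmap
import os

BLOCK = 4096  # assumed fixed block size for the pool layout

class PMPoolClient:
    """A mmap-ed file stands in for a registered persistent-memory region;
    the slicing below mimics one-sided RDMA reads/writes, where the client,
    not the server CPU, computes the target address and moves the bytes."""

    def __init__(self, pool_path, num_blocks):
        size = num_blocks * BLOCK
        fd = os.open(pool_path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(fd, size)
        self.mem = mmap.mmap(fd, size)   # the "registered" shared region
        os.close(fd)                     # mmap keeps its own reference

    def write_block(self, block_id, data):
        off = block_id * BLOCK
        self.mem[off:off + len(data)] = data   # one-sided write: no server-side copy

    def read_block(self, block_id, length=BLOCK):
        off = block_id * BLOCK
        return self.mem[off:off + length]      # one-sided read

client = PMPoolClient("/tmp/pm_pool.img", num_blocks=1024)
client.write_block(7, b"hello octopus")
print(client.read_block(7, 13))
```

In the real system the server CPU stays out of this data path entirely: the client computes the offset from metadata obtained earlier and issues a one-sided RDMA READ or WRITE, which is what removes the memory-copying overhead the abstract refers to.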

