Interoperable job execution and data access through UNICORE and the Global Federated File System

XRootD has been established as a standard for WAN data access in HEP and HENP. Site-specific features, like those existing at GSI, have historically been hard to implement with native methods. XRootD allows basic native functionality to be replaced with custom implementations through plug-ins; XRootD clients have supported this since version 4.0. In this contribution, we present our XRootD-based developments, motivated by their use in the current ALICE Tier 2 Centre at GSI and the upcoming ALICE Analysis Facility. Among other things, we present an XRootD redirector plug-in that redirects local clients directly to a shared file system, along with the required changes to the XRootD base code, which have been publicly available since XRootD version 4.8.0. Furthermore, a prototype of an XRootD-based disk caching system for opportunistic resources has been developed.
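The redirection idea in this abstract can be illustrated with a minimal sketch. It is not the XRootD plug-in API (real plug-ins are written in C++ against XRootD's own interfaces); the subnet, mount point, server name, and the `redirect` function are all hypothetical and only show the decision a redirector might make for local versus remote clients.

```python
import ipaddress

# All three values below are hypothetical placeholders, not taken from the paper.
LOCAL_NET = ipaddress.ip_network("10.0.0.0/8")        # assumed site-local subnet
SHARED_MOUNT = "/shared"                              # assumed shared file system mount
REMOTE_SERVER = "root://dataserver.example.org:1094"  # assumed data server


def redirect(client_ip: str, logical_path: str) -> str:
    """Return the location a client should open for a logical file path.

    Local clients are sent straight to the shared file system;
    everyone else is sent to a regular data server URL.
    """
    if ipaddress.ip_address(client_ip) in LOCAL_NET:
        return "file://" + SHARED_MOUNT + logical_path
    # root:// URLs separate host and absolute path with a double slash.
    return REMOTE_SERVER + "/" + logical_path


if __name__ == "__main__":
    print(redirect("10.1.2.3", "/alice/x.root"))
    print(redirect("192.0.2.7", "/alice/x.root"))
```

The point of the sketch is only that local clients bypass the data server entirely, which is the behaviour the abstract attributes to the redirector plug-in.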

A Novel Query Method for Spatial Data in Mobile Cloud Computing Environment

Wireless Communications and Mobile Computing ◽

10.1155/2018/1059231 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Guangsheng Chen ◽

Pei Nie ◽

Weipeng Jing

Keyword(s):

Spatial Data ◽

File System ◽

Fold Increase ◽

Data Access ◽

Disk File ◽

Mobile Environment ◽

Traffic Demand ◽

Data Query ◽

Distributed Query ◽

Low Efficiency

With the development of network communication and a 1000-fold increase in traffic demand from 4G to 5G, it is critical to provide an efficient and fast spatial data access interface for applications in mobile environments. In view of the low I/O efficiency and high latency of existing methods, this paper presents a memory-based spatial data query method that uses the distributed memory file system Alluxio to store data and builds a two-level index based on the Alluxio key-value structure, addressing the low efficiency of traditional methods. In addition, according to the characteristics of the Spark computing framework, a data input format for spatial data query is proposed, which can selectively read the file data and reduce data I/O. Comparative experiments show that the memory-based file system Alluxio has better I/O performance than the disk file system, and that, compared with the traditional distributed query method, the proposed method greatly reduces retrieval time.
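As a rough illustration of how a two-level index can enable selective reads, the sketch below keeps everything in plain Python dictionaries. It does not use Alluxio or Spark, and the grid cell size, class name, and block identifiers are assumptions made for the example, not details from the paper.

```python
from collections import defaultdict

CELL = 10.0  # hypothetical grid cell size


def cell_of(x, y):
    """Map a coordinate to its grid cell (floor division handles negatives)."""
    return (int(x // CELL), int(y // CELL))


class TwoLevelIndex:
    """Level 1: grid cell -> block ids. Level 2: block id -> records."""

    def __init__(self):
        self.cell_to_blocks = defaultdict(set)
        self.block_to_points = defaultdict(list)
        self.blocks_read = 0  # number of blocks touched by the last query

    def insert(self, block_id, x, y, payload):
        self.cell_to_blocks[cell_of(x, y)].add(block_id)
        self.block_to_points[block_id].append((x, y, payload))

    def query(self, xmin, ymin, xmax, ymax):
        """Return payloads inside the box, reading only candidate blocks."""
        cx0, cy0 = cell_of(xmin, ymin)
        cx1, cy1 = cell_of(xmax, ymax)
        blocks = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                blocks |= self.cell_to_blocks.get((cx, cy), set())
        self.blocks_read = len(blocks)
        return sorted(
            p
            for b in blocks
            for (x, y, p) in self.block_to_points[b]
            if xmin <= x <= xmax and ymin <= y <= ymax
        )
```

A query first resolves the grid cells covered by the bounding box, then touches only the blocks registered for those cells, which is the sense in which data I/O is reduced.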

GFFS — THE XSEDE GLOBAL FEDERATED FILE SYSTEM

Parallel Processing Letters ◽

10.1142/s0129626413400057 ◽

2013 ◽

Vol 23 (02) ◽

pp. 1340005 ◽

Cited By ~ 6

Author(s):

ANDREW GRIMSHAW ◽

MARK MORGAN ◽

AVINASH KALYANARAMAN

Keyword(s):

Access Control ◽

File System ◽

Data Access ◽

Computational Science ◽

Science And Engineering ◽

Organizational Boundaries ◽

Science Community ◽

Data Access Patterns ◽

Access Patterns ◽

Existing Data

A federated, secure, standardized, scalable, and transparent mechanism to access and share resources, particularly data resources, across organizational boundaries, one that does not require application modification and does not disrupt existing data access patterns, has been needed for some time in the computational science community. The Global Federated File System (GFFS) addresses this need and is a foundational component of the NSF-funded eXtreme Science and Engineering Discovery Environment (XSEDE) program. The GFFS allows user applications to access (create, read, update, delete) remote resources in a location-transparent fashion. Existing applications, whether they are statically linked binaries, dynamically linked binaries, or scripts (shell, Perl, Python), can access resources anywhere in the GFFS without modification (subject to access control). In this paper we present an overview of the GFFS and its most common use cases: accessing data at an NSF center from a home or campus, accessing data on a campus machine from an NSF center, directly sharing data with a collaborator at another institution, accessing remote computing resources, and interacting with remote running jobs. We present these use cases and how they are realized using the GFFS.

Exploring Efficient Architectures on Remote In-Memory NVM over RDMA

ACM Transactions on Embedded Computing Systems ◽

10.1145/3477004 ◽

2021 ◽

Vol 20 (5s) ◽

pp. 1-20

Author(s):

Qingfeng Zhuge ◽

Hao Zhang ◽

Edwin Hsing-Mean Sha ◽

Rui Xu ◽

Jun Liu ◽

...

Keyword(s):

High Performance ◽

File System ◽

File Systems ◽

Data Access ◽

Main Memory ◽

Memory Modules ◽

Significant Performance ◽

Architectural Structures ◽

Memory Accesses ◽

Careful Design

Efficiently accessing remote file data remains a challenging problem for data processing systems. Developments in non-volatile dual in-line memory modules (NVDIMMs), in-memory file systems, and RDMA networks provide new opportunities for solving the problem of remote data access. A general understanding of NVDIMMs, such as Intel Optane DC Persistent Memory (DCPM), is that they expand main memory capacity at the cost of performance several times lower than DRAM. With the in-depth exploration presented in this paper, however, we show that the potential of NVDIMMs for high-performance, remote in-memory accesses can be revealed through careful design. We explore multiple architectural structures for accessing remote NVDIMMs in a real system using Optane DCPM, and compare the performance of the various structures. Experiments show significant performance gaps among different ways of using NVDIMMs as a memory address space accessible through an RDMA interface. Furthermore, we design and implement a prototype of a user-level, in-memory file system, RIMFS, in the device DAX mode on Optane DCPM. By comparing against the DAX-supported Linux file system, Ext4-DAX, we show that the performance of remote reads on RIMFS over RDMA is 11.44× higher than that on a remote Ext4-DAX on average. The experimental results also show that the performance of remote accesses on RIMFS is maintained on a heavily loaded data server with CPU utilization as high as 90%, while the performance of remote reads on Ext4-DAX is significantly reduced by 49.3%, and the performance of local reads on Ext4-DAX is even more significantly reduced by 90.1%. The performance comparisons of writes exhibit the same trends.

XtreemFS

Data Intensive Storage Services for Cloud Environments ◽

10.4018/978-1-4666-3934-8.ch016 ◽

2013 ◽

pp. 267-285 ◽

Cited By ~ 3

Author(s):

Jan Stender ◽

Michael Berlin ◽

Alexander Reinefeld

Keyword(s):

Data Storage ◽

Large Scale ◽

File System ◽

Fault Tolerant ◽

File Systems ◽

Data Access ◽

Comprehensive Overview ◽

Cloud Providers ◽

The Face ◽

Cloud Users

Cloud computing poses new challenges to data storage. While cloud providers use shared distributed hardware, which is inherently unreliable and insecure, cloud users expect their data to be safely and securely stored, available at any time, and accessible in the same way as their locally stored data. In this chapter, the authors present XtreemFS, a file system for the cloud. XtreemFS reconciles the need of cloud providers for cheap scale-out storage solutions with that of cloud users for reliable, secure, and easy data access. The main contributions of the chapter are: a description of the internal architecture of XtreemFS, which presents an approach to building large-scale distributed POSIX-compliant file systems on top of cheap, off-the-shelf hardware; a description of the XtreemFS security infrastructure, which guarantees the isolation of individual users despite shared and insecure storage and network resources; a comprehensive overview of replication mechanisms in XtreemFS, which guarantee consistency, availability, and durability of data in the face of component failures; and an overview of the snapshot infrastructure of XtreemFS, which allows momentary states of the file system to be captured and frozen in a scalable and fault-tolerant fashion. The authors also compare XtreemFS with existing solutions and argue for its practicability and potential in the cloud storage market.

Blockchain-Based Distributed Patient-Centric Image Management System

Applied Sciences ◽

10.3390/app11010196 ◽

2020 ◽

Vol 11 (1) ◽

pp. 196

Author(s):

Mohamed Yaseen Jabarulla ◽

Heung-No Lee

Keyword(s):

Access Control ◽

File System ◽

Medical Images ◽

Control Policy ◽

Data Access ◽

Image Management ◽

Concept Design ◽

Smart Contract ◽

Image Management System ◽

Patient Centric

In recent years, many researchers have focused on developing a feasible solution for storing and exchanging medical images in the field of health care. Current practices are deployed on cloud-based centralized data centers, which increase maintenance costs, require massive storage space, and raise privacy concerns about sharing information over a network. Therefore, it is important to design a framework that enables efficient sharing and storing of big medical data within a trustless environment. In the present paper, we propose a novel proof-of-concept design for a distributed patient-centric image management (PCIM) system that aims to ensure the safety and control of patients' private data without using a centralized infrastructure. In this system, we employ the Ethereum blockchain and a distributed file system technology called the Inter-Planetary File System (IPFS). We then implement an Ethereum smart contract, called the patient-centric access control protocol, to enable a distributed and trustworthy access control policy. IPFS provides the means for decentralized storage of medical images with global accessibility. We describe how the PCIM system architecture facilitates distributed and secure patient-centric data access across multiple entities such as hospitals, patients, and image requestors. Finally, we deploy a smart contract prototype on an Ethereum testnet blockchain and evaluate the proposed framework within the Windows environment. The evaluation results demonstrate that the proposed scheme is efficient and feasible.

Performance Evaluation of NVMe-over-TCP Using Journaling File Systems in International WAN

Electronics ◽

10.3390/electronics10202486 ◽

2021 ◽

Vol 10 (20) ◽

pp. 2486

Author(s):

Se-young Yu

Keyword(s):

File System ◽

Data Transfer ◽

Storage System ◽

File Systems ◽

Data Access ◽

Streaming Data ◽

Fine Tuning ◽

Long Distance ◽

Data Movement ◽

Remote Data

Distributing Big Data for science is pushing the capabilities of networks and computing systems. However, the fundamental concept of copying data from one machine to another has not been challenged in collaborative science. As recent storage system development uses modern fabrics to provide faster remote data access with lower overhead, traditional data movement using Data Transfer Nodes (DTNs) must cope with the paradigm shift from a store-and-forward model to streaming data with direct storage access over the network. This study evaluates NVMe-over-TCP (NVMe-TCP) in a long-distance network using different file systems and configurations to characterize remote NVMe file system access performance in MAN and WAN data movement scenarios. We found that NVMe-TCP is better suited to remote data reads than to remote data writes over the network, and that using RAID0 can significantly improve performance in a long-distance network. Additionally, fine-tuning the file system can improve remote write performance in DTNs over a long-distance network.

GFS: A Distributed File System with Multi-source Data Access and Replication for Grid Computing

Advances in Grid and Pervasive Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-01671-4_12 ◽

2009 ◽

pp. 119-130 ◽

Cited By ~ 4

Author(s):

Chun-Ting Chen ◽

Chun-Chen Hsu ◽

Jan-Jan Wu ◽

Pangfeng Liu

Keyword(s):

Grid Computing ◽

File System ◽

Data Access ◽

Distributed File System ◽

Source Data

Effective and Scalable Data Access Control in Onedata Large Scale Distributed Virtual File System

Procedia Computer Science ◽

10.1016/j.procs.2017.05.054 ◽

2017 ◽

Vol 108 ◽

pp. 445-454 ◽

Cited By ~ 1

Author(s):

Michał Wrzeszcz ◽

Łukasz Opioła ◽

Konrad Zemek ◽

Bartosz Kryza ◽

Łukasz Dutka ◽

...

Keyword(s):

Access Control ◽

Large Scale ◽

File System ◽

Data Access ◽

Data Access Control ◽

Virtual File System

ExSeisDat: A set of parallel I/O and workflow libraries for petroleum seismology

Oil & Gas Science and Technology – Revue d’IFP Energies nouvelles ◽

10.2516/ogst/2018048 ◽

2018 ◽

Vol 73 ◽

pp. 74 ◽

Cited By ~ 1

Author(s):

Meghan A. Fisher ◽

Pádraig Ó. Conbhuí ◽

Cathal Ó. Brion ◽

Jean-Thomas Acquaviva ◽

Seán Delaney ◽

...

Keyword(s):

Seismic Data ◽

Memory Management ◽

File System ◽

File Systems ◽

Data Access ◽

System Level ◽

Data Set ◽

Extreme Scale ◽

Data Access Patterns ◽

Access Patterns

Seismic data-sets are extremely large and are broken into data files, ranging in size from 100s of GiBs to 10s of TiBs and larger. The parallel I/O for these files is complex due to the amount of data along with varied and multiple access patterns within individual files. Properties of legacy file formats, such as the de-facto standard SEG-Y, also contribute to the decrease in developer productivity while working with these files. SEG-Y files embed their own internal layout which could lead to conflict with traditional, file-system-level layout optimization schemes. Additionally, as seismic files continue to increase in size, memory bottlenecks will be exacerbated, resulting in the need for smart I/O optimization not only to increase the efficiency of read/writes, but to manage memory usage as well. The ExSeisDat (Extreme-Scale Seismic Data) set of libraries addresses these problems through the development and implementation of easy to use, object oriented libraries that are portable and open source with bindings available in multiple languages. The lower level parallel I/O library, ExSeisPIOL (Extreme-Scale Seismic Parallel I/O Library), targets SEG-Y and other proprietary formats, simplifying I/O by internally interfacing MPI-I/O and other I/O interfaces. The I/O is explicitly handled; end users only need to define the memory limits, decomposition of I/O across processes, and data access patterns when reading and writing data. ExSeisPIOL bridges the layout gap between the SEG-Y file structure and file system organization. The higher level parallel seismic workflow library, ExSeisFlow (Extreme-Scale Seismic workFlow), leverages ExSeisPIOL, further simplifying I/O by implicitly handling all I/O parameters, thus allowing geophysicists to focus on domain-specific development. 
Operations in ExSeisFlow focus on prestack processing and can be performed on single traces, individual gathers, and across entire surveys, including out of core sorting, binning, filtering, and transforming. To optimize memory management, the workflow only reads in data pertinent to the operations being performed instead of an entire file. A smart caching system manages the read data, discarding it when no longer needed in the workflow. As the libraries are optimized to handle spatial and temporal locality, they are a natural fit to burst buffer technologies, particularly DDN’s Infinite Memory Engine (IME) system. With appropriate access semantics or through the direct exploitation of the low-level interfaces, the ExSeisDat stack on IME delivers a significant improvement to I/O performance over standalone parallel file systems like Lustre.
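The caching behaviour described above (read only what an operation needs, discard it once the workflow no longer needs it) can be sketched with a use-counted cache. This is not the ExSeisDat API; the `WorkflowCache` class, its methods, and the loader callback are hypothetical names chosen for illustration.

```python
class WorkflowCache:
    """Hypothetical use-counted cache: data is loaded on first access and
    dropped as soon as the planned number of uses has been consumed."""

    def __init__(self, loader):
        self._loader = loader   # callable that reads one chunk of data by key
        self._data = {}         # key -> loaded chunk
        self._remaining = {}    # key -> planned uses still outstanding
        self.loads = 0          # how many times the loader was actually called

    def plan(self, key, uses):
        """Declare that `uses` upcoming operations will need `key`."""
        self._remaining[key] = self._remaining.get(key, 0) + uses

    def get(self, key):
        """Return the chunk for `key`, loading it at most once while planned."""
        if key not in self._data:
            self._data[key] = self._loader(key)
            self.loads += 1
        value = self._data[key]
        self._remaining[key] = self._remaining.get(key, 0) - 1
        if self._remaining[key] <= 0:
            # Last planned use: free the memory immediately.
            del self._data[key]
            self._remaining.pop(key)
        return value

    def cached_keys(self):
        return sorted(self._data)
```

In this sketch an unplanned key is loaded and evicted on the same call, so memory is held only for data that a later operation has explicitly declared it will reuse.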
