Interoperable job execution and data access through UNICORE and the Global Federated File System

Author(s):  
Shahbaz Memon ◽  
Morris Riedel ◽  
Chris Koeritz ◽  
Andrew Grimshaw
Keyword(s):  
2019 ◽  
Vol 214 ◽  
pp. 04005
Author(s):  
Jan Knedlik ◽  
Paul Kramp ◽  
Kilian Schwarz ◽  
Thorsten Kollegger

XRootD† has been established as a standard for WAN data access in HEP and HENP. Site specific features, like those existing at GSI, have historically been hard to implement with native methods. XRootD allows a custom replacement of basic functionality for native XRootD functions through the use of plug-ins. XRootD clients allow this since version 4.0. In this contribution, our XRootD based developments motivated by the use in the current ALICE Tier 2 Centre at GSI and the upcoming ALICE Analysis Facility will be shown. Among other things, an XRootD redirector plug-in which redirects local clients directly to a shared file system, as well as the needed changes to the XRootD base code, which are publicly available since XRootD version 4.8.0, will be presented. Furthermore, a prototype for an XRootD based disk caching system for opportunistic resources has been developed.


2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Guangsheng Chen ◽  
Pei Nie ◽  
Weipeng Jing

With the development of network communication, a 1000-fold increase in traffic demand from 4G to 5G, it is critical to provide efficient and fast spatial data access interface for applications in mobile environment. In view of the low I/O efficiency and high latency of existing methods, this paper presents a memory-based spatial data query method that uses the distributed memory file system Alluxio to store data and build a two-level index based on the Alluxio key-value structure; moreover, it aims to solve the problem of low efficiency of traditional method; according to the characteristics of Spark computing framework, a data input format for spatial data query is proposed, which can selectively read the file data and reduce the data I/O. The comparative experiments show that the memory-based file system Alluxio has better I/O performance than the disk file system; compared with the traditional distributed query method, the method we proposed reduces the retrieval time greatly.


2013 ◽  
Vol 23 (02) ◽  
pp. 1340005 ◽  
Author(s):  
ANDREW GRIMSHAW ◽  
MARK MORGAN ◽  
AVINASH KALYANARAMAN

Federated, secure, standardized, scalable, and transparent mechanism to access and share resources, particularly data resources, across organizational boundaries that does not require application modification and does not disrupt existing data access patterns has been needed for some time in the computational science community. The Global Federated File System (GFFS) addresses this need and is a foundational component of the NSF-funded eXtreme Science and Engineering Discovery Environment (XSEDE) program. The GFFS allows user applications to access (create, read, update, delete) remote resources in a location-transparent fashion. Existing applications, whether they are statically linked binaries, dynamically linked binaries, or scripts (shell, PERL, Python), can access resources anywhere in the GFFS without modification (subject to access control). In this paper we present an overview of the GFFS and its most common use cases: accessing data at an NSF center from a home or campus, accessing data on a campus machine from an NSF center, directly sharing data with a collaborator at another institution, accessing remote computing resources, and interacting with remote running jobs. We present these uses cases and how they are realized using the GFFS.


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-20
Author(s):  
Qingfeng Zhuge ◽  
Hao Zhang ◽  
Edwin Hsing-Mean Sha ◽  
Rui Xu ◽  
Jun Liu ◽  
...  

Efficiently accessing remote file data remains a challenging problem for data processing systems. Development of technologies in non-volatile dual in-line memory modules (NVDIMMs), in-memory file systems, and RDMA networks provide new opportunities towards solving the problem of remote data access. A general understanding about NVDIMMs, such as Intel Optane DC Persistent Memory (DCPM), is that they expand main memory capacity with a cost of multiple times lower performance than DRAM. With an in-depth exploration presented in this paper, however, we show an interesting finding that the potential of NVDIMMs for high-performance, remote in-memory accesses can be revealed through careful design. We explore multiple architectural structures for accessing remote NVDIMMs in a real system using Optane DCPM, and compare the performance of various structures. Experiments are conducted to show significant performance gaps among different ways of using NVDIMMs as memory address space accessible through RDMA interface. Furthermore, we design and implement a prototype of user-level, in-memory file system, RIMFS, in the device DAX mode on Optane DCPM. By comparing against the DAX-supported Linux file system, Ext4-DAX, we show that the performance of remote reads on RIMFS over RDMA is 11.44 higher than that on a remote Ext4-DAX on average. The experimental results also show that the performance of remote accesses on RIMFS is maintained on a heavily loaded data server with CPU utilization as high as 90%, while the performance of remote reads on Ext4-DAX is significantly reduced by 49.3%, and the performance of local reads on Ext4-DAX is even more significantly reduced by 90.1%. The performance comparisons of writes exhibit the same trends.


Author(s):  
Jan Stender ◽  
Michael Berlin ◽  
Alexander Reinefeld

Cloud computing poses new challenges to data storage. While cloud providers use shared distributed hardware, which is inherently unreliable and insecure, cloud users expect their data to be safely and securely stored, available at any time, and accessible in the same way as their locally stored data. In this chapter, the authors present XtreemFS, a file system for the cloud. XtreemFS reconciles the need of cloud providers for cheap scale-out storage solutions with that of cloud users for a reliable, secure, and easy data access. The main contributions of the chapter are: a description of the internal architecture of XtreemFS, which presents an approach to build large-scale distributed POSIX-compliant file systems on top of cheap, off-the-shelf hardware; a description of the XtreemFS security infrastructure, which guarantees an isolation of individual users despite shared and insecure storage and network resources; a comprehensive overview of replication mechanisms in XtreemFS, which guarantee consistency, availability, and durability of data in the face of component failures; an overview of the snapshot infrastructure of XtreemFS, which allows to capture and freeze momentary states of the file system in a scalable and fault-tolerant fashion. The authors also compare XtreemFS with existing solutions and argue for its practicability and potential in the cloud storage market.


2020 ◽  
Vol 11 (1) ◽  
pp. 196
Author(s):  
Mohamed Yaseen Jabarulla ◽  
Heung-No Lee

In recent years, many researchers have focused on developing a feasible solution for storing and exchanging medical images in the field of health care. Current practices are deployed on cloud-based centralized data centers, which increase maintenance costs, require massive storage space, and raise privacy concerns about sharing information over a network. Therefore, it is important to design a framework to enable sharing and storing of big medical data efficiently within a trustless environment. In the present paper, we propose a novel proof-of-concept design for a distributed patient-centric image management (PCIM) system that is aimed to ensure safety and control of patient private data without using a centralized infrastructure. In this system, we employed an emerging Ethereum blockchain and a distributed file system technology called Inter-Planetary File System (IPFS). Then, we implemented an Ethereum smart contract called the patient-centric access control protocol to enable a distributed and trustworthy access control policy. IPFS provides the means for decentralized storage of medical images with global accessibility. We describe how the PCIM system architecture facilitates the distributed and secured patient-centric data access across multiple entities such as hospitals, patients, and image requestors. Finally, we deployed a smart contract prototype on an Ethereum testnet blockchain and evaluated the proposed framework within the Windows environment. The evaluation results demonstrated that the proposed scheme is efficient and feasible.


Electronics ◽  
2021 ◽  
Vol 10 (20) ◽  
pp. 2486
Author(s):  
Se-young Yu

Distributing Big Data for science is pushing the capabilities of networks and computing systems. However, the fundamental concept of copying data from one machine to another has not been challenged in collaborative science. As recent storage system development uses modern fabrics to provide faster remote data access with lower overhead, traditional data movement using Data Transfer Nodes must cope with the paradigm shift from a store-and-forward model to streaming data with direct storage access over the networks. This study evaluates NVMe-over-TCP (NVMe-TCP) in a long-distance network using different file systems and configurations to characterize remote NVMe file system access performance in MAN and WAN data moving scenarios. We found that NVMe-TCP is more suitable for remote data read than remote data write over the networks, and using RAID0 can significantly improve performance in a long-distance network. Additionally, a fine-tuning file system can improve remote write performance in DTNs with a long-distance network.


2017 ◽  
Vol 108 ◽  
pp. 445-454 ◽  
Author(s):  
Michaƚ Wrzeszcz ◽  
Łukasz Opioƚa ◽  
Konrad Zemek ◽  
Bartosz Kryza ◽  
Łukasz Dutka ◽  
...  

Author(s):  
Meghan A. Fisher ◽  
Pádraig Ó. Conbhuí ◽  
Cathal Ó. Brion ◽  
Jean-Thomas Acquaviva ◽  
Seán Delaney ◽  
...  

Seismic data-sets are extremely large and are broken into data files, ranging in size from 100s of GiBs to 10s of TiBs and larger. The parallel I/O for these files is complex due to the amount of data along with varied and multiple access patterns within individual files. Properties of legacy file formats, such as the de-facto standard SEG-Y, also contribute to the decrease in developer productivity while working with these files. SEG-Y files embed their own internal layout which could lead to conflict with traditional, file-system-level layout optimization schemes. Additionally, as seismic files continue to increase in size, memory bottlenecks will be exacerbated, resulting in the need for smart I/O optimization not only to increase the efficiency of read/writes, but to manage memory usage as well. The ExSeisDat (Extreme-Scale Seismic Data) set of libraries addresses these problems through the development and implementation of easy to use, object oriented libraries that are portable and open source with bindings available in multiple languages. The lower level parallel I/O library, ExSeisPIOL (Extreme-Scale Seismic Parallel I/O Library), targets SEG-Y and other proprietary formats, simplifying I/O by internally interfacing MPI-I/O and other I/O interfaces. The I/O is explicitly handled; end users only need to define the memory limits, decomposition of I/O across processes, and data access patterns when reading and writing data. ExSeisPIOL bridges the layout gap between the SEG-Y file structure and file system organization. The higher level parallel seismic workflow library, ExSeisFlow (Extreme-Scale Seismic workFlow), leverages ExSeisPIOL, further simplifying I/O by implicitly handling all I/O parameters, thus allowing geophysicists to focus on domain-specific development. Operations in ExSeisFlow focus on prestack processing and can be performed on single traces, individual gathers, and across entire surveys, including out of core sorting, binning, filtering, and transforming. To optimize memory management, the workflow only reads in data pertinent to the operations being performed instead of an entire file. A smart caching system manages the read data, discarding it when no longer needed in the workflow. As the libraries are optimized to handle spatial and temporal locality, they are a natural fit to burst buffer technologies, particularly DDN’s Infinite Memory Engine (IME) system. With appropriate access semantics or through the direct exploitation of the low-level interfaces, the ExSeisDat stack on IME delivers a significant improvement to I/O performance over standalone parallel file systems like Lustre.


Sign in / Sign up

Export Citation Format

Share Document