Providing large-scale disk storage at CERN

2019 ◽  
Vol 214 ◽  
pp. 04033
Author(s):  
Hervé Rousseau ◽  
Belinda Chan Kwok Cheong ◽  
Cristian Contescu ◽  
Xavier Espinal Curull ◽  
Jan Iven ◽  
...  

The CERN IT Storage group operates multiple distributed storage systems and is responsible for supporting the infrastructure that accommodates all CERN storage requirements, from the physics data generated by LHC and non-LHC experiments to the personal files of CERN users. EOS is now the key component of the CERN storage strategy: it sustains high incoming throughput for experiment data-taking while running concurrent, complex production workloads. This high-performance distributed storage now provides more than 250 PB of raw disk and is the key component behind the success of CERNBox, the CERN cloud synchronisation service, which allows syncing and sharing files on all major mobile and desktop platforms and provides offline availability for any data stored in the EOS infrastructure. CERNBox has seen exponential growth in files and data stored over the last couple of years, thanks to its increasing popularity within the CERN user community and to its integration with a multitude of other CERN services (Batch, SWAN, Microsoft Office). In parallel, CASTOR is being simplified and is transitioning from an HSM into an archival system, focusing mainly on the long-term recording of the primary data from the detectors and paving the road to the next-generation tape archival system, CTA. The storage services at CERN also cover the needs of the rest of our community: Ceph as the data back-end for the CERN OpenStack infrastructure, NFS services and S3 functionality; AFS for legacy home-directory filesystem services and its ongoing phase-out; and CVMFS for software distribution. In this paper we summarise our experience in supporting all our distributed storage systems and the ongoing work in evolving our infrastructure, including the testing of very dense storage building blocks (nodes with more than 1 PB of raw space) for the challenges ahead.
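
As a minimal sketch of how a client might talk to EOS over the XRootD protocol (written against the XRootD Python bindings; the endpoint and path below are placeholder assumptions, not actual production paths):

    # Minimal sketch: browsing an EOS namespace over XRootD.
    # Assumes the `xrootd` Python bindings are installed; endpoint and path
    # are placeholders.
    from XRootD import client
    from XRootD.client.flags import DirListFlags

    # Connect to a (hypothetical) EOS instance.
    fs = client.FileSystem('root://eospublic.cern.ch')

    # List a directory together with stat information for each entry.
    status, listing = fs.dirlist('/eos/demo/data', DirListFlags.STAT)
    if not status.ok:
        raise RuntimeError(status.message)

    for entry in listing:
        print(entry.name, entry.statinfo.size)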

2019 ◽  
Vol 214 ◽  
pp. 05008 ◽  
Author(s):  
Jozsef Makai ◽  
Andreas Joachim Peters ◽  
Georgios Bitzes ◽  
Elvin Alin Sindrilaru ◽  
Michal Kamil Simon ◽  
...  

Complex, large-scale distributed systems are frequently used to solve extraordinary computing, storage and other problems. However, the development of these systems usually requires working with several software components, maintaining and improving a large codebase, and providing a collaborative environment for many developers working together. The central role that such complex systems play in mission-critical tasks and in the daily activity of their users means that any software bug affecting the availability of the service has far-reaching effects. Providing an easily extensible testing framework is a prerequisite for building confidence both in the system and among the developers who contribute to the code. The testing framework can address concrete bugs found in the codebase, thus avoiding future regressions, and also gives a high degree of confidence to people contributing new code. Easily incorporating other people's work into the project greatly helps scale out manpower, so that having more developers contributing to the project can actually result in more work being done rather than more bugs added. In this paper we go through the case study of EOS, the CERN disk storage system, and introduce the methods and mechanisms used to achieve fully automatic regression and robustness testing, along with continuous integration, for such a large-scale, complex and critical system using a container-based environment.
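
As a rough sketch of the container-based testing pattern described here (the image name and test command are hypothetical placeholders, not the actual EOS test suite): each regression scenario runs inside a disposable container, and the build fails if the scenario regresses.

    # Hedged sketch of a container-based regression test: start a throwaway
    # container, run one regression scenario inside it, and check the exit code.
    import subprocess

    def run_regression(image: str, command: list[str]) -> bool:
        """Run `command` inside a fresh container of `image`; True on success."""
        result = subprocess.run(
            ["docker", "run", "--rm", image, *command],
            capture_output=True, text=True, timeout=600,
        )
        if result.returncode != 0:
            print(result.stdout, result.stderr)
        return result.returncode == 0

    if __name__ == "__main__":
        # Hypothetical image and test entry point.
        ok = run_regression("example/eos-testbed:latest", ["/tests/run_regression.sh"])
        raise SystemExit(0 if ok else 1)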


2013 ◽  
Vol 5 (1) ◽  
pp. 53-69
Author(s):  
Jacques Jorda ◽  
Aurélien Ortiz ◽  
Abdelaziz M’zoughi ◽  
Salam Traboulsi

Grid computing is commonly used for large-scale applications requiring huge computation capabilities. In such distributed architectures, the data storage on the distributed storage resources must be handled by a dedicated storage system to ensure the required quality of service. In order to simplify data placement on nodes and to increase the performance of applications, a storage virtualization layer can be used. This layer can be a single parallel filesystem (like GPFS) or a more complex middleware. The latter is preferred as it allows data placement on the nodes to be tuned to increase both the reliability and the performance of data access. Thus, in such a middleware, a dedicated monitoring system must be used to ensure optimal performance. In this paper, the authors briefly introduce the Visage middleware, a middleware for storage virtualization. They present the most broadly used grid monitoring systems and explain why these are not adequate for virtualized storage monitoring. The authors then present the architecture of their monitoring system dedicated to storage virtualization, introduce the workload prediction model used to select the best node for data placement, and demonstrate its accuracy in a simple experiment.
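
As an illustration of what a workload prediction model for data placement can look like (a simple exponentially weighted moving average, assumed here for the sake of example and not necessarily the model used in Visage):

    # Illustration only, not Visage's actual model: predict each node's
    # near-future load with an exponentially weighted moving average of its
    # recent load samples, then place data on the node with the lowest prediction.
    def ewma(samples, alpha=0.5):
        """Exponentially weighted moving average of a load history."""
        prediction = samples[0]
        for load in samples[1:]:
            prediction = alpha * load + (1 - alpha) * prediction
        return prediction

    def best_node(load_history):
        """Pick the node whose predicted load is lowest."""
        return min(load_history, key=lambda node: ewma(load_history[node]))

    # Example: per-node load samples (e.g. recent I/O utilisation, 0..1).
    history = {
        "node-a": [0.7, 0.6, 0.8],
        "node-b": [0.3, 0.4, 0.2],
        "node-c": [0.5, 0.5, 0.5],
    }
    print(best_node(history))  # -> "node-b"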


2018 ◽  
Vol 7 (4.6) ◽  
pp. 13
Author(s):  
Mekala Sandhya ◽  
Ashish Ladda ◽  
Dr. Uma N Dulhare

In the current generation of the Internet, information and data grow continuously across a wide range of Internet services and applications; hundreds of billions, even trillions, of web pages are indexed. Such large volumes of data give people access to a mass of information, but at the same time make it harder to discover useful knowledge within it. Cloud computing can provide the infrastructure for handling such large data. It exhibits two significant characteristics of distributed computing: scalability and high availability. Scalability means the platform can seamlessly extend to large-scale clusters; availability means it can tolerate node errors, so node failures do not prevent a program from running correctly. Combining cloud computing with data mining enables significant data processing on high-performance machines. Mass data storage together with distributed computing provides a new method for mass data mining and becomes an effective solution for distributed storage and efficient computation in data mining.
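
As a toy sketch of the map-and-aggregate style of processing alluded to here (not taken from the paper), word frequencies are counted over data chunks in parallel worker processes and the partial results are merged; the same pattern scaled across cluster nodes is what distributed data-mining frameworks provide.

    # Toy illustration of distributed-style data mining: map a counting step
    # over chunks of data in parallel workers, then reduce the partial results.
    from collections import Counter
    from multiprocessing import Pool

    def count_words(chunk):
        """Map step: count word occurrences in one chunk of text."""
        return Counter(chunk.split())

    if __name__ == "__main__":
        chunks = [
            "cloud storage cloud computing",
            "data mining data storage",
            "distributed computing distributed storage",
        ]
        with Pool(processes=3) as pool:
            partial_counts = pool.map(count_words, chunks)
        # Reduce step: merge the per-chunk counters into a global result.
        total = sum(partial_counts, Counter())
        print(total.most_common(3))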


Author(s):  
Liviu Popa-Simil

Present High Performance Scientific Computing (HPSC) systems face strong limitations when full integration from nano-materials up to the operational system is desired. HPSC systems would have to be upgraded from the exa-scale machines currently being designed, probably available after 2015, to even higher computing power and storage capability, up to yotta-scale, in order to simulate systems from the nano-scale to the macro-scale and thereby greatly improve the safety and performance of future advanced nuclear power structures. The road from today's peta-scale systems to yotta-scale computers, which would barely be sufficient for these calculation needs, is difficult and requires revolutionary new ideas in HPSC, and probably the large-scale use of Quantum Supercomputers (QSC), which are now in the development stage.
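
As a back-of-the-envelope aid to the scales mentioned above: peta-, exa- and yotta-scale correspond to roughly 10^15, 10^18 and 10^24 operations per second, so yotta-scale is about a billion times the compute power of today's peta-scale machines.

    # Back-of-the-envelope scale comparison for peta-, exa- and yotta-scale FLOPS.
    scales = {"peta": 1e15, "exa": 1e18, "zetta": 1e21, "yotta": 1e24}
    for name, flops in scales.items():
        print(f"{name}-scale: {flops:.0e} FLOPS, "
              f"{flops / scales['peta']:.0e}x a peta-scale machine")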


2012 ◽  
Vol 532-533 ◽  
pp. 677-681
Author(s):  
Li Qun Luo ◽  
Si Jin He

The advent of the cloud is drastically changing High Performance Computing (HPC) application scenarios, yet current virtual-machine-based IaaS architectures are not designed for HPC applications. This paper presents a new cloud-oriented storage system that constructs a large-scale memory grid in a distributed environment in order to support low-latency data access for HPC applications. This Cloud Memory model is built by implementing a private virtual file system (PVFS) on top of a virtual operating system (OS), which allows HPC applications to access data in Cloud Memory in the same way they would access local disks.
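
The central idea, exposing remote memory through a disk-like interface, can be sketched in miniature as follows; this is an illustration only and does not reflect the paper's actual PVFS implementation.

    # Miniature illustration of a memory-backed, disk-like block interface:
    # data lives in RAM (here a dict of fixed-size blocks) but is accessed
    # through the same read/write-by-offset calls a local disk would offer.
    # (Single-block reads/writes only, for brevity.)
    BLOCK_SIZE = 4096

    class MemoryBlockStore:
        def __init__(self):
            self._blocks = {}  # block number -> bytes

        def write(self, offset: int, data: bytes) -> None:
            block_no, start = divmod(offset, BLOCK_SIZE)
            block = bytearray(self._blocks.get(block_no, b"\0" * BLOCK_SIZE))
            block[start:start + len(data)] = data
            self._blocks[block_no] = bytes(block)

        def read(self, offset: int, length: int) -> bytes:
            block_no, start = divmod(offset, BLOCK_SIZE)
            block = self._blocks.get(block_no, b"\0" * BLOCK_SIZE)
            return block[start:start + length]

    store = MemoryBlockStore()
    store.write(0, b"hello cloud memory")
    print(store.read(0, 18))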


2014 ◽  
Vol 1030-1032 ◽  
pp. 1619-1622
Author(s):  
Bing Xin Zhu ◽  
Jing Tao Li

In a large-scale storage system, the various computing, transfer and storage devices differ physically both in performance and in characteristics such as reliability. The data-access load on storage devices is also not uniform, varying considerably in both space and time. Storing all data on high-performance equipment is therefore unrealistic and unwise. The concept of hierarchical storage effectively solves this problem: it monitors data-access loads and, depending on the load and the application requirements, optimally places data across storage resources according to their properties [1]. Traditional classification policies generally work on file data, classifying files by their access frequency or an I/O heat index. Starting from the concept of website user value, and addressing the disadvantages of traditional data classification strategies, this paper puts forward a centralized data classification strategy based on user value.
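
As a hedged illustration of the contrast drawn here (made-up weights and thresholds, not the paper's algorithm): a traditional policy ranks files purely by access heat, while a user-value-based policy weights that heat by the value of the users accessing the file before choosing a tier.

    # Illustrative tiering policy: score each file by access heat weighted by
    # the value of the users accessing it, then place hot data on fast storage.
    # The weights and thresholds are made-up examples, not the paper's values.
    def placement_score(access_count, user_values):
        """Combine raw access frequency with the mean value of accessing users."""
        avg_user_value = sum(user_values) / len(user_values) if user_values else 0.0
        return access_count * avg_user_value

    def choose_tier(score, ssd_threshold=100.0):
        return "ssd-tier" if score >= ssd_threshold else "hdd-tier"

    files = {
        "report.pdf": (200, [0.9, 0.8]),  # frequently accessed by high-value users
        "backup.tar": (300, [0.1]),       # frequently accessed by a low-value user
    }
    for name, (count, values) in files.items():
        print(name, choose_tier(placement_score(count, values)))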


2014 ◽  
Vol 513-517 ◽  
pp. 1046-1051
Author(s):  
Yong Chuan Li ◽  
Yu Xing Peng ◽  
Hui Ba Li

With the rapid development of cloud computing, many storage structures have been proposed to satisfy the requirements of cloud-based software. Most existing distributed storage systems focus on a single objective and only provide a single storage structure. In this paper we present a novel block-level distributed storage system named Flex, which integrates storage resources dispersed across the network into a single whole. Flex uses a device mapping framework to create dynamic and flexible storage structures for users. We have implemented a prototype and evaluated its performance; the results show that Flex provides high performance across diverse storage structures.
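
The device mapping framework itself is not detailed in the abstract; as a rough illustration of the general idea, the sketch below maps logical block ranges of a virtual volume onto devices spread across different nodes, the kind of indirection needed to present dispersed resources as a single volume. Node and device names are placeholders.

    # Rough illustration of block-level device mapping: a virtual volume is a
    # list of extents, each redirecting a range of logical blocks to a device
    # on some node.
    from dataclasses import dataclass

    @dataclass
    class Extent:
        start: int    # first logical block of the extent
        length: int   # number of blocks in the extent
        node: str     # storage node holding the extent
        device: str   # physical device on that node

    VOLUME = [
        Extent(0,    1000, "node-1", "/dev/sdb"),
        Extent(1000, 1000, "node-2", "/dev/sdc"),
    ]

    def resolve(logical_block: int):
        """Translate a logical block into (node, device, physical block)."""
        for ext in VOLUME:
            if ext.start <= logical_block < ext.start + ext.length:
                return ext.node, ext.device, logical_block - ext.start
        raise ValueError("block outside the volume")

    print(resolve(1500))  # -> ('node-2', '/dev/sdc', 500)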


2020 ◽  
Vol 245 ◽  
pp. 01018
Author(s):  
Jörn Adamczewski-Musch ◽  
Thomas Stibor

Since 2018, several FAIR Phase 0 beamtimes have been operated at GSI, Darmstadt. Here the challenging new technologies for the upcoming FAIR facility are tested while various physics experiments are performed with the existing GSI accelerators. One of these challenges concerns the performance, reliability and scalability of the experiment data storage. Raw data collected by the event-building software of a large-scale detector data acquisition system has to be safely written to a mass storage system such as a magnetic tape library. Besides this long-term archive, it is often required to process the data as soon as possible on a high-performance compute farm. The C library LTSM ("Lightweight Tivoli Storage Management") has been developed at the GSI IT department based on the IBM TSM software. It provides a file API that allows raw listmode data files to be written via TCP/IP sockets directly to an IBM TSM storage server. Moreover, the LTSM library offers Lustre HSM ("Hierarchical Storage Management") capabilities for seamlessly archiving and retrieving data stored on the Lustre file system and the TSM server. In spring 2019, LTSM was employed at the FAIR Phase 0 beamtimes at GSI. For the HADES experiment, LTSM was integrated into the DABC ("Data Acquisition Backbone Core") event-building software. During the four weeks of Ag+Ag beam at 1.58 AGeV, the HADES event builders transferred about 400 TB of data via 8 parallel 10 GbE sockets, both to the TSM archive and to the "GSI Green Cube" HPC farm. For other FAIR Phase 0 experiments using the vintage MBS ("Multi Branch System") event builders, an LTSM gateway application has been developed to connect the legacy RFIO ("Remote File I/O") protocol of these DAQ systems to the new storage interface.
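
LTSM's actual C file API is not reproduced here; the sketch below only illustrates the general data path described above, an event builder streaming raw data in chunks over a TCP socket to an archive endpoint, with placeholder host, port and framing.

    # Generic illustration of the data path described above (not the LTSM API):
    # stream a raw data file in chunks over a TCP socket to an archive service.
    # Host, port and the simple length-prefixed framing are placeholders.
    import socket
    import struct

    def stream_file_to_archive(path: str, host: str, port: int) -> None:
        with socket.create_connection((host, port)) as sock, open(path, "rb") as f:
            while True:
                chunk = f.read(1 << 20)  # 1 MiB chunks
                if not chunk:
                    break
                # Prefix each chunk with its length so the receiver can re-frame it.
                sock.sendall(struct.pack("!I", len(chunk)) + chunk)
            sock.sendall(struct.pack("!I", 0))  # zero-length frame marks end of file

    # Example (hypothetical endpoint):
    # stream_file_to_archive("run042.lmd", "archive.example.org", 7777)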

