The Belle II Raw Data Management System

2020, Vol 245, pp. 04005
Author(s): Michel Hernández Villanueva, Ikuo Ueda

The Belle II experiment, a major upgrade of the previous asymmetric e⁺e⁻ collider experiment Belle, is expected to produce tens of petabytes of data per year owing to the luminosity increase from the upgraded SuperKEKB accelerator. The distributed computing system of the Belle II experiment plays a key role, storing and distributing data reliably so that they can be easily accessed and analyzed by more than 1000 collaborators. In particular, the Belle II Raw Data Management system has been developed with the aim of uploading output files onto grid storage, registering them in the file and metadata catalogs, and making two replicas of the full raw data set using the Belle II Distributed Data Management system. It has been implemented as an extension of DIRAC (Distributed Infrastructure with Remote Agent Control) and consists of a database, services, client and monitoring tools, and several agents that handle the data automatically. The first year of data taken with the full Belle II detector has been managed successfully by the Belle II Raw Data Management system. The design, current status, and performance are presented, and prospects for improvements towards full-luminosity data taking are reviewed.
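
The workflow the abstract outlines (upload a raw file to grid storage, register it in the catalogs, then replicate it) maps naturally onto DIRAC's data-management client. Below is a minimal sketch of one such cycle using DIRAC's DataManager API; the LFN path and storage-element names are invented for illustration, and the production system drives this logic from automated, database-backed agents rather than a script.

```python
# Hypothetical sketch of the upload/register/replicate cycle described above,
# using DIRAC's DataManager client. The LFN path and SE names are invented;
# the production system runs this logic inside automated agents.
from DIRAC.Core.Base import Script
Script.parseCommandLine()  # initialise the DIRAC configuration first

from DIRAC.DataManagementSystem.Client.DataManager import DataManager

dm = DataManager()
lfn = "/belle/raw/2019/exp08/run00001/raw.0001.root"  # hypothetical LFN
local_file = "raw.0001.root"

# Upload to grid storage and register in the file catalog(s)
result = dm.putAndRegister(lfn, local_file, "KEK-TMP-SE")  # SE name assumed
if not result["OK"]:
    raise RuntimeError(result["Message"])

# Make a second replica of the raw data at a remote site
result = dm.replicateAndRegister(lfn, "BNL-TMP-SE")  # SE name assumed
if not result["OK"]:
    raise RuntimeError(result["Message"])
```

In the real system, each step would presumably be retried and its outcome recorded in the database, so that failed transfers are picked up on the next agent cycle.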

2021, Vol 251, pp. 02026
Author(s): Cédric Serfon, John Steven De Stefano, Michel Hernández Villanueva, Hironori Ito, Yuji Kato, et al.

DIRAC and Rucio are two standard pieces of software widely used in the HEP domain. DIRAC provides Workload and Data Management functionalities, among other things, while Rucio is a dedicated, advanced Distributed Data Management system. Many communities that already use DIRAC have expressed their interest in using DIRAC for Workload Management in combination with Rucio for Data Management. In this paper, we describe the integration of the Rucio File Catalog into DIRAC that was initially developed for the Belle II collaboration.
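
Conceptually, the integration exposes Rucio behind DIRAC's pluggable file-catalog interface, so existing DIRAC components keep calling the usual catalog methods while Rucio serves the replica information. The fragment below is a simplified sketch of that idea, not the actual BelleDIRAC plugin: the scope name is assumed, error handling is reduced to a bare minimum, and only one catalog operation is shown.

```python
# Simplified sketch of a DIRAC file-catalog plugin backed by Rucio.
# Not the actual BelleDIRAC implementation: scope handling and error
# mapping are stripped down for illustration.
from DIRAC import S_OK, S_ERROR
from rucio.client import Client


class RucioFileCatalogClient:
    """Translate DIRAC catalog calls into Rucio client calls."""

    def __init__(self, scope="belle"):  # scope name assumed
        self.scope = scope
        self.client = Client()

    def getReplicas(self, lfns):
        """Return {lfn: {rse: pfn}} for each requested LFN."""
        dids = [{"scope": self.scope, "name": lfn} for lfn in lfns]
        successful, failed = {}, {}
        try:
            for rep in self.client.list_replicas(dids):
                # 'rses' maps each RSE to a list of accessible PFNs
                successful[rep["name"]] = {
                    rse: pfns[0] for rse, pfns in rep["rses"].items() if pfns
                }
        except Exception as exc:
            return S_ERROR(str(exc))
        return S_OK({"Successful": successful, "Failed": failed})


# Usage (requires a configured Rucio client environment):
# catalog = RucioFileCatalogClient()
# print(catalog.getReplicas(["/belle/raw/.../raw.0001.root"]))
```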


2014, Vol 513 (3), pp. 032095
Author(s): Wataru Takase, Yoshimi Matsumoto, Adil Hasan, Francesca Di Lodovico, Yoshiyuki Watase, et al.

2019, Vol 214, pp. 04001
Author(s): Qiumei Ma, Yao Zhang

The BESIII experiment has accumulated about 1 PB of raw data and 1 PB of DST data since 2009, so managing these data, together with the conditions data, is very important. The BESIII data management system has run successfully for ten years, offering a full-featured, time-tested service to BESIII offline and physics users. We designed a robust structure for the system and established sound backup and maintenance strategies.


2019, Vol 214, pp. 04031
Author(s): Malachi Schram

The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, started taking physics data in early 2018 and plans to accumulate 50 ab⁻¹, approximately 50 times more data than the Belle experiment. The collaboration expects to manage and process approximately 200 PB of data. Computing at this scale requires efficient and coordinated use of geographically distributed compute resources in North America, Asia, and Europe, and will take advantage of high-speed global networks. We present the general Belle II distributed data management system and computing results from the first phase of data taking.


2019, Vol 214, pp. 07009
Author(s): Frank Berghaus, Tobias Wegner, Mario Lassnig, Marcus Ebert, Cedric Serfon, et al.

Input data for applications that run in cloud computing centres can be stored at remote repositories, typically with multiple copies of the most popular data stored at many sites. Locating and retrieving the remote data can be challenging, and we believe that federating the storage can address this problem. In this approach, the closest copy of the data is used, based on geographical or other information. Currently, we are using the dynamic data federation Dynafed, a software solution developed by CERN IT. Dynafed supports several industry-standard interfaces, such as Amazon S3, Microsoft Azure, and HTTP with WebDAV extensions. Dynafed functions as an abstraction layer under which protocol-dependent authentication details are hidden from the user, who only needs to provide an X.509 certificate. We have set up an instance of Dynafed and integrated it into the ATLAS distributed data management system, Rucio. We report on the challenges faced during the installation and integration.
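
From a client's point of view, Dynafed looks like an ordinary HTTP/WebDAV endpoint that redirects each request to a nearby replica. A minimal sketch of such an access follows, assuming a hypothetical federation URL and a standard grid-proxy file; real workflows would more typically use gfal2 or davix clients, but plain HTTPS works the same way.

```python
# Minimal sketch of reading a file through a Dynafed federation endpoint.
# The URL is hypothetical; authentication uses an X.509 proxy certificate,
# behind which Dynafed hides the protocol-specific credentials of the
# backing storage.
import requests

DYNAFED_URL = "https://dynafed.example.org/fed/atlas/some/dataset/file.root"  # assumed
PROXY = "/tmp/x509up_u1000"  # conventional grid-proxy location; uid varies

# Dynafed answers with an HTTP redirect to the closest replica;
# requests follows the redirect transparently.
resp = requests.get(
    DYNAFED_URL,
    cert=(PROXY, PROXY),  # X.509 proxy serves as both cert and key
    verify="/etc/grid-security/certificates",  # CA bundle for grid hosts
    stream=True,
)
resp.raise_for_status()
with open("file.root", "wb") as out:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        out.write(chunk)
```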


2008, Vol 119 (7), pp. 072027
Author(s): R. Ricardo, B. Miguel, G. Benjamin, G. Vincent, L. Mario, et al.

2020, Vol 245, pp. 04007
Author(s): Siarhei Padolski, Hironori Ito, Paul Laycock, Ruslan Mashinistov, Hideki Miyake, et al.

The Belle II experiment started taking physics data in April 2018, with a total volume of 340 petabytes estimated by the end of operations in the late 2020s across all files, including raw events, Monte Carlo, and skims. Originally designed as a fully integrated component of the BelleDIRAC production system, the Belle II distributed data management (DDM) software needs to manage data across about 29 storage elements worldwide for a collaboration of nearly 1000 physicists. By late 2018, this software required significant performance improvements to meet the requirements of physics data taking and was seriously lacking in automation. Rucio, the DDM solution created by ATLAS, was an obvious alternative, but required tight integration with BelleDIRAC and a seamless yet non-trivial migration. This contribution describes the work done on both DDM options, the current status of the software running successfully in production, and the problems associated with balancing long-term operations cost against short-term risk.
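
With Rucio as the DDM backend, a policy such as "keep two copies of the raw data" is expressed declaratively as a replication rule that Rucio's daemons then enforce and repair. A hedged sketch using the standard Rucio client is shown below; the scope, dataset name, and RSE expression are invented for illustration.

```python
# Sketch of declaring a replication policy with the standard Rucio client.
# Scope, dataset name and RSE expression are invented; Rucio's daemons
# create and maintain the replicas needed to satisfy the rule.
from rucio.client import Client

client = Client()
dids = [{"scope": "belle", "name": "raw.exp08.run00001"}]  # assumed DID

client.add_replication_rule(
    dids=dids,
    copies=2,                    # e.g. two raw-data replicas
    rse_expression="tape=True",  # hypothetical RSE attribute expression
    lifetime=None,               # keep indefinitely
)
```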


1979, Vol 18 (04), pp. 199-202
Author(s): F. Lustman, P. Lanthier, D. Charbonneau

A patient-oriented data management system is described. The environment was cardiology, with a heavy emphasis on research, and the MEDIC system was designed to meet day-to-day program needs. The data are organized in speciality files, with dynamic patient records composed of subrecords of different types; the schema is described by a data definition language. Application packages include data quality control, medical reporting, and general inquiry. After five years of extensive use in various clinical applications, its utility has been confirmed, as has its low cost. What was initially seen as its main disadvantage, the multifile structure, can now be counted among its advantages, providing data independence and improved performance. Although the system is now partially outdated, the experience acquired with its use has proven very helpful in selecting the future database management system.
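
As a rough modern illustration (and only that) of the record model described, a dynamic patient record composed of subrecords of different types can be rendered with Python dataclasses as below; all field and type names are invented, since the paper's actual data definition language is not reproduced here.

```python
# Rough modern rendering of the MEDIC record model: a dynamic patient
# record composed of typed subrecords. All names are invented; the
# original system described its schema in a dedicated data definition
# language rather than code.
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Admission:           # one subrecord type
    date: str
    ward: str


@dataclass
class CathReport:          # another subrecord type (cardiology)
    date: str
    findings: str


@dataclass
class PatientRecord:
    patient_id: int
    # A dynamic record: any number of subrecords, of mixed types
    subrecords: List[Union[Admission, CathReport]] = field(default_factory=list)


record = PatientRecord(patient_id=42)
record.subrecords.append(Admission(date="1978-03-01", ward="CCU"))
record.subrecords.append(CathReport(date="1978-03-02", findings="normal"))
```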

