The Belle II Raw Data Management System

2020, Vol 245, pp. 04005
Author(s): Michel Hernández Villanueva, Ikuo Ueda

The Belle II experiment, a major upgrade of the previous asymmetric e⁺e⁻ collider experiment Belle, is expected to produce tens of petabytes of data per year owing to the luminosity increase from the upgraded SuperKEKB accelerator. The distributed computing system of the Belle II experiment plays a key role, storing and distributing data reliably so that they can be easily accessed and analyzed by more than 1000 collaborators. In particular, the Belle II Raw Data Management system has been developed with the aim of uploading output files onto grid storage, registering them in the file and metadata catalogs, and making two replicas of the full raw data set using the Belle II Distributed Data Management system. It has been implemented as an extension of DIRAC (Distributed Infrastructure with Remote Agent Control) and consists of a database, services, client and monitoring tools, and several agents that handle the data automatically. The first year of data taken with the full Belle II detector has been managed successfully by the Belle II Raw Data Management system. The design, current status, and performance are presented, and prospects for improvements towards full-luminosity data taking are reviewed.
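
The workflow the abstract outlines (upload a raw file to grid storage, register it in the catalogs, then replicate it) maps naturally onto DIRAC's data-management client. Below is a minimal sketch of one such cycle using DIRAC's DataManager API; the LFN path and storage-element names are invented for illustration, and the production system drives this logic from automated, database-backed agents rather than a script.

```python
# Hypothetical sketch of the upload/register/replicate cycle described above,
# using DIRAC's DataManager client. The LFN path and SE names are invented;
# the production system runs this logic inside automated agents.
from DIRAC.Core.Base import Script
Script.parseCommandLine()  # initialise the DIRAC configuration first

from DIRAC.DataManagementSystem.Client.DataManager import DataManager

dm = DataManager()
lfn = "/belle/raw/2019/exp08/run00001/raw.0001.root"  # hypothetical LFN
local_file = "raw.0001.root"

# Upload to grid storage and register in the file catalog(s)
result = dm.putAndRegister(lfn, local_file, "KEK-TMP-SE")  # SE name assumed
if not result["OK"]:
    raise RuntimeError(result["Message"])

# Make a second replica of the raw data at a remote site
result = dm.replicateAndRegister(lfn, "BNL-TMP-SE")  # SE name assumed
if not result["OK"]:
    raise RuntimeError(result["Message"])
```

In the real system, each step would presumably be retried and its outcome recorded in the database, so that failed transfers are picked up on the next agent cycle.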

2021, Vol 251, pp. 02026
Author(s): Cédric Serfon, John Steven De Stefano, Michel Hernández Villanueva, Hironori Ito, Yuji Kato, et al.

DIRAC and Rucio are two standard pieces of software widely used in the HEP domain. DIRAC provides Workload and Data Management functionalities, among other things, while Rucio is a dedicated, advanced Distributed Data Management system. Many communities that already use DIRAC have expressed their interest in using DIRAC for Workload Management in combination with Rucio for Data Management. In this paper, we describe the integration of the Rucio File Catalog into DIRAC that was initially developed for the Belle II collaboration.
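
Conceptually, the integration exposes Rucio behind DIRAC's pluggable file-catalog interface, so existing DIRAC components keep calling the usual catalog methods while Rucio serves the replica information. The fragment below is a simplified sketch of that idea, not the actual BelleDIRAC plugin: the scope name is assumed, error handling is reduced to a bare minimum, and only one catalog operation is shown.

```python
# Simplified sketch of a DIRAC file-catalog plugin backed by Rucio.
# Not the actual BelleDIRAC implementation: scope handling and error
# mapping are stripped down for illustration.
from DIRAC import S_OK, S_ERROR
from rucio.client import Client


class RucioFileCatalogClient:
    """Translate DIRAC catalog calls into Rucio client calls."""

    def __init__(self, scope="belle"):  # scope name assumed
        self.scope = scope
        self.client = Client()

    def getReplicas(self, lfns):
        """Return {lfn: {rse: pfn}} for each requested LFN."""
        dids = [{"scope": self.scope, "name": lfn} for lfn in lfns]
        successful, failed = {}, {}
        try:
            for rep in self.client.list_replicas(dids):
                # 'rses' maps each RSE to a list of accessible PFNs
                successful[rep["name"]] = {
                    rse: pfns[0] for rse, pfns in rep["rses"].items() if pfns
                }
        except Exception as exc:
            return S_ERROR(str(exc))
        return S_OK({"Successful": successful, "Failed": failed})


# Usage (requires a configured Rucio client environment):
# catalog = RucioFileCatalogClient()
# print(catalog.getReplicas(["/belle/raw/.../raw.0001.root"]))
```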


2014, Vol 513 (3), pp. 032095
Author(s): Wataru Takase, Yoshimi Matsumoto, Adil Hasan, Francesca Di Lodovico, Yoshiyuki Watase, et al.

2019, Vol 214, pp. 04001
Author(s): Qiumei Ma, Yao Zhang

The BESIII experiment has accumulated about 1 PB of raw data and 1 PB of DST data since 2009, so managing these data, together with the conditions data, is very important. The BESIII data management system has run successfully for ten years, offering a full-featured, time-tested service to BESIII offline and physics users. We designed a robust structure for the system and established sound backup and maintenance strategies.


2019, Vol 214, pp. 04031
Author(s): Malachi Schram

The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, started taking physics data in early 2018 and plans to accumulate 50 ab⁻¹, approximately 50 times more data than the Belle experiment. The collaboration expects to manage and process approximately 200 PB of data. Computing at this scale requires efficient and coordinated use of geographically distributed compute resources in North America, Asia, and Europe, and will take advantage of high-speed global networks. We present the general Belle II distributed data management system and computing results from the first phase of data taking.


2019, Vol 214, pp. 07009
Author(s): Frank Berghaus, Tobias Wegner, Mario Lassnig, Marcus Ebert, Cedric Serfon, et al.

Input data for applications that run in cloud computing centres can be stored at remote repositories, typically with multiple copies of the most popular data stored at many sites. Locating and retrieving the remote data can be challenging, and we believe that federating the storage can address this problem. In this approach, the closest copy of the data is used, based on geographical or other information. Currently, we are using the dynamic data federation Dynafed, a software solution developed by CERN IT. Dynafed supports several industry-standard interfaces, such as Amazon S3, Microsoft Azure, and HTTP with WebDAV extensions. Dynafed functions as an abstraction layer under which protocol-dependent authentication details are hidden from the user, who only needs to provide an X.509 certificate. We have set up an instance of Dynafed and integrated it into the ATLAS distributed data management system, Rucio. We report on the challenges faced during the installation and integration.
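
From a client's point of view, Dynafed looks like an ordinary HTTP/WebDAV endpoint that redirects each request to a nearby replica. A minimal sketch of such an access follows, assuming a hypothetical federation URL and a standard grid-proxy file; real workflows would more typically use gfal2 or davix clients, but plain HTTPS works the same way.

```python
# Minimal sketch of reading a file through a Dynafed federation endpoint.
# The URL is hypothetical; authentication uses an X.509 proxy certificate,
# behind which Dynafed hides the protocol-specific credentials of the
# backing storage.
import requests

DYNAFED_URL = "https://dynafed.example.org/fed/atlas/some/dataset/file.root"  # assumed
PROXY = "/tmp/x509up_u1000"  # conventional grid-proxy location; uid varies

# Dynafed answers with an HTTP redirect to the closest replica;
# requests follows the redirect transparently.
resp = requests.get(
    DYNAFED_URL,
    cert=(PROXY, PROXY),  # X.509 proxy serves as both cert and key
    verify="/etc/grid-security/certificates",  # CA bundle for grid hosts
    stream=True,
)
resp.raise_for_status()
with open("file.root", "wb") as out:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        out.write(chunk)
```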


2008, Vol 119 (7), pp. 072027
Author(s): R. Ricardo, B. Miguel, G. Benjamin, G. Vincent, L. Mario, et al.

2020, Vol 245, pp. 04007
Author(s): Siarhei Padolski, Hironori Ito, Paul Laycock, Ruslan Mashinistov, Hideki Miyake, et al.

The Belle II experiment started taking physics data in April 2018, with a total volume of 340 petabytes estimated by the end of operations in the late 2020s across all files, including raw events, Monte Carlo, and skims. Originally designed as a fully integrated component of the BelleDIRAC production system, the Belle II distributed data management (DDM) software needs to manage data across about 29 storage elements worldwide for a collaboration of nearly 1000 physicists. By late 2018, this software required significant performance improvements to meet the requirements of physics data taking and was seriously lacking in automation. Rucio, the DDM solution created by ATLAS, was an obvious alternative, but required tight integration with BelleDIRAC and a seamless yet non-trivial migration. This contribution describes the work done on both DDM options, the current status of the software running successfully in production, and the problems associated with balancing long-term operations cost against short-term risk.
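
With Rucio as the DDM backend, a policy such as "keep two copies of the raw data" is expressed declaratively as a replication rule that Rucio's daemons then enforce and repair. A hedged sketch using the standard Rucio client is shown below; the scope, dataset name, and RSE expression are invented for illustration.

```python
# Sketch of declaring a replication policy with the standard Rucio client.
# Scope, dataset name and RSE expression are invented; Rucio's daemons
# create and maintain the replicas needed to satisfy the rule.
from rucio.client import Client

client = Client()
dids = [{"scope": "belle", "name": "raw.exp08.run00001"}]  # assumed DID

client.add_replication_rule(
    dids=dids,
    copies=2,                    # e.g. two raw-data replicas
    rse_expression="tape=True",  # hypothetical RSE attribute expression
    lifetime=None,               # keep indefinitely
)
```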


1979, Vol 18 (04), pp. 199-202
Author(s): F. Lustman, P. Lanthier, D. Charbonneau

A patient-oriented data management system is described. The environment was cardiology, with a heavy emphasis on research, and the MEDIC system was designed to meet day-to-day program needs. The data are organized in speciality files, with dynamic patient records composed of subrecords of different types; the schema is described by a data definition language. Application packages include data quality control, medical reporting, and general inquiry. After five years of extensive use in various clinical applications, its utility has been confirmed, as has its low cost. What was initially seen as its main disadvantage, the multifile structure, can now be counted among its advantages, providing data independence and improved performance. Although the system is now partially outdated, the experience acquired with its use has proven very helpful in selecting the future database management system.
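
As a rough modern illustration (and only that) of the record model described, a dynamic patient record composed of subrecords of different types can be rendered with Python dataclasses as below; all field and type names are invented, since the paper's actual data definition language is not reproduced here.

```python
# Rough modern rendering of the MEDIC record model: a dynamic patient
# record composed of typed subrecords. All names are invented; the
# original system described its schema in a dedicated data definition
# language rather than code.
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Admission:           # one subrecord type
    date: str
    ward: str


@dataclass
class CathReport:          # another subrecord type (cardiology)
    date: str
    findings: str


@dataclass
class PatientRecord:
    patient_id: int
    # A dynamic record: any number of subrecords, of mixed types
    subrecords: List[Union[Admission, CathReport]] = field(default_factory=list)


record = PatientRecord(patient_id=42)
record.subrecords.append(Admission(date="1978-03-01", ward="CCU"))
record.subrecords.append(CathReport(date="1978-03-02", findings="normal"))
```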

