Building and using containers at HPC centres for the ATLAS experiment

2019 ◽  
Vol 214 ◽  
pp. 07005
Author(s):  
Douglas Benjamin ◽  
Taylor Childers ◽  
David Lesny ◽  
Danila Oleynik ◽  
Sergey Panitkin ◽  
...  

The HPC environment presents several challenges to the ATLAS experiment in running its automated computational workflows smoothly and efficiently, in particular with regard to software distribution and I/O load. CVMFS, a vital component of the LHC Computing Grid, is not always available in HPC environments. ATLAS computing has experimented with all-inclusive containers, and later developed an environment to produce such containers for both Shifter and Singularity. The all-inclusive containers include most of the recent ATLAS software releases, database releases, and other tools extracted from CVMFS. This has helped ATLAS to distribute software automatically to HPC centres with an environment identical to that provided by CVMFS, and it has significantly reduced the metadata I/O load on HPC shared file systems. Production operation at NERSC has shown that with this type of container we can fit transparently into the previously developed ATLAS operation methods and, at the same time, scale up to run many more jobs.
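As a rough illustration of the approach (not the experiment's actual tooling), the following Python sketch writes a minimal Singularity definition that bundles a software release tree previously extracted from CVMFS; the base image, release name and directory layout are hypothetical placeholders.

```python
# Sketch: generate a Singularity definition for an "all-inclusive" container
# that bundles a software release tree extracted from /cvmfs/atlas.cern.ch.
# The base image, release name and paths are hypothetical placeholders.
from pathlib import Path

def write_definition(release: str, extracted_tree: str, out: str = "atlas.def") -> None:
    """Write a Singularity definition that copies a pre-extracted release tree."""
    definition = f"""\
Bootstrap: docker
From: centos:7

%files
    {extracted_tree} /cvmfs/atlas.cern.ch

%environment
    export ATLAS_RELEASE={release}
"""
    Path(out).write_text(definition)

if __name__ == "__main__":
    # Hypothetical release and a directory rsync'ed out of CVMFS beforehand.
    write_definition("21.0.15", "./cvmfs-extract/atlas.cern.ch")
```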

Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1471
Author(s):  
Jun-Yeong Lee ◽  
Moon-Hyun Kim ◽  
Syed Asif Raza Shah ◽ 
Sang-Un Ahn ◽  
Heejun Yoon ◽  
...  

Data volumes are large and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play a pivotal role in data management and analysis for scientific discovery. Redundant Array of Independent Disks (RAID), a well-known storage technology combining multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, it requires RAID-capable hardware or software to build a RAID-enabled disk array, and RAID-based storage is difficult to scale up. To mitigate these problems, many distributed file systems have been developed and are actively used in various environments, especially in data-intensive computing facilities where tremendous amounts of data have to be handled. In this study, we investigated and benchmarked several distributed file systems, namely Ceph, GlusterFS, Lustre and EOS, for data-intensive environments. In our experiment, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which have to be considered in data-intensive computing environments.
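For context, below is a minimal sketch of the kind of sequential write/read throughput measurement such a benchmark performs against a FUSE mount point; the mount path, file size and block size are hypothetical, and a real study would use a dedicated benchmarking tool and drop caches between the write and read passes.

```python
# Sketch: a minimal sequential write/read throughput test against a
# FUSE-mounted distributed file system (e.g. Ceph, GlusterFS, Lustre or EOS).
# The mount point, file size and block size are hypothetical parameters.
import os
import time

def throughput(mount_point: str, size_mb: int = 1024, block_kb: int = 1024) -> tuple[float, float]:
    """Return (write_MBps, read_MBps) for one sequential pass."""
    path = os.path.join(mount_point, "bench.tmp")
    block = os.urandom(block_kb * 1024)
    n_blocks = size_mb * 1024 // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    write_mbps = size_mb / (time.perf_counter() - start)

    # Note: reading back immediately may hit the page cache and flatter the result.
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kb * 1024):
            pass
    read_mbps = size_mb / (time.perf_counter() - start)

    os.remove(path)
    return write_mbps, read_mbps

if __name__ == "__main__":
    print(throughput("/mnt/cephfs"))  # hypothetical FUSE mount point
```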


2020 ◽  
Vol 245 ◽  
pp. 07010
Author(s):  
Marcelo Vogel ◽  
Mikhail Borodin ◽  
Alessandra Forti ◽  
Lukas Heinrich

This paper describes the deployment of the offline software of the ATLAS experiment at the LHC in containers for use in production workflows such as simulation and reconstruction. To achieve this goal we are using Docker and Singularity, two lightweight virtualization technologies that can encapsulate software packages inside complete file systems. Deploying offline releases via containers removes the interdependence between the runtime environment needed for job execution and the configuration of the computing nodes at the sites. Docker or Singularity would provide a uniform runtime environment for the grid, HPCs and a variety of opportunistic resources. Additionally, releases may be supplemented with the detector’s conditions data, removing the need for network connectivity at computing nodes, which is normally quite restricted at HPCs. In preparation for this goal, we have built Docker and Singularity images containing single full releases of ATLAS software for running detector simulation and reconstruction jobs in runtime environments without a network connection. Unlike similar efforts that produce containers by packing all possible dependencies of every possible workflow into heavy images (≈ 200 GB), our approach is to include only what is needed for specific workflows and to manage dependencies efficiently via software package managers. This approach leads to more stable packaged releases where the dependencies are clear and the resulting images have more portable sizes (≈ 16 GB). In an effort to cover a wider variety of workflows, we are deploying images that can be used in raw data reconstruction. This is particularly challenging because access to the experiment’s conditions payload consumes substantial database resources when processing data. We describe here a prototype pipeline in which images are provisioned only with the conditions payload necessary to satisfy the jobs’ requirements. This database-on-demand approach would keep images slim, portable and capable of supporting various workflows in a standalone fashion in environments with no network connectivity.
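The following Python sketch illustrates the database-on-demand idea in the simplest possible terms: only the conditions payload files required by the requested tags are staged into the image build context. The tag names, payload catalogue and directory layout are hypothetical and do not reflect the experiment's actual conditions database interface.

```python
# Sketch: stage only the conditions payloads a given workflow needs into a
# container build context (the "database-on-demand" idea described above).
# Tag names, the payload catalogue and the directory layout are hypothetical.
import shutil
from pathlib import Path

# Hypothetical catalogue mapping conditions tags to payload files.
PAYLOAD_CATALOGUE = {
    "CONDBR2-BLKPA-2018-11": ["geometry.db", "calib_run358031.db"],
    "OFLCOND-MC16-SDR-25":   ["geometry.db", "mc_calib.db"],
}

def stage_payloads(tags: list[str], source: Path, build_context: Path) -> list[Path]:
    """Copy only the payload files required by the requested tags."""
    dest = build_context / "conditions"
    dest.mkdir(parents=True, exist_ok=True)
    needed = {name for tag in tags for name in PAYLOAD_CATALOGUE[tag]}
    return [Path(shutil.copy2(source / name, dest / name)) for name in sorted(needed)]

if __name__ == "__main__":
    # Hypothetical source directory holding the full set of payload files.
    stage_payloads(["CONDBR2-BLKPA-2018-11"], Path("/data/conditions"), Path("./context"))
```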


2014 ◽  
Vol 146 ◽  
pp. 119-127 ◽  
Author(s):  
K. Anand Rao ◽  
T. Sreenivas ◽  
Madhu Vinjamur ◽  
A.K. Suri

2021 ◽  
Vol 3 ◽  
Author(s):  
Khalil B. Ramadi ◽  
Shriya S. Srinivasan

Healthcare innovation is impeded by high costs, the need for diverse skillsets, and complex regulatory processes. The COVID-19 pandemic exposed critical gaps in the current framework, especially those lying at the boundary between cutting-edge academic research and industry-scale manufacturing and production. While many resource-rich geographies had the expertise required to solve challenges posed by the pandemic, mechanisms to unite the appropriate institutions and to scale up, fund, and mobilize solutions on a timescale relevant to the emergency were lacking. We characterize the orthogonal spatial and temporal axes that dictate innovation. To address their limitations, we propose a “pre-emptive innovation infrastructure” incorporating in-house hospital innovation teams, consortia-based assembly of expertise, and novel funding mechanisms to combat future emergencies. By leveraging the strengths of academic, medical, government, and industrial institutions, this framework could improve ongoing innovation and supercharge the infrastructure for healthcare emergencies.


2020 ◽  
Vol 245 ◽  
pp. 09010
Author(s):  
Michal Svatoš ◽  
Jiří Chudoba ◽  
Petr Vokáč

The distributed computing system of the ATLAS experiment at the LHC is allowed to opportunistically use resources at the Czech national HPC centre IT4Innovations in Ostrava. Jobs are submitted via an ARC Compute Element (ARC-CE) installed at the grid site in Prague; scripts and input files are shared between the ARC-CE and a shared file system located at the HPC centre via sshfs. This basic submission system has been in operation since the end of 2017. Several improvements were made to increase the amount of resources that ATLAS can use. The most significant change was migrating the submission system to support pre-emptable jobs, after the HPC centre’s management decided to start pre-empting opportunistic jobs. Another improvement concerned the sshfs connection, which appeared to be a limiting factor of the system: the submission system now consists of several ARC-CE machines, and various sshfs parameters were tested in an attempt to increase throughput. As a result of these improvements, the utilisation of the Czech national HPC centre by ATLAS distributed computing has increased.
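A minimal sketch of how such sshfs parameter tests might be timed is given below; the remote endpoint, mount point and option sets are hypothetical examples rather than the configuration used at the site, and only standard sshfs/ssh options are shown.

```python
# Sketch: time a simple sequential write over an sshfs mount for a few
# candidate option sets, in the spirit of the sshfs tuning described above.
# The remote endpoint, mount point and option sets are hypothetical.
import subprocess
import time
from pathlib import Path

REMOTE = "login.hpc.example.cz:/scratch/atlas"   # hypothetical endpoint
MOUNT = Path("/tmp/sshfs-test")
OPTION_SETS = ["reconnect", "reconnect,compression=no", "reconnect,Ciphers=aes128-ctr"]

def time_write(size_mb: int = 256) -> float:
    """Write size_mb of zeros through the mount and return MB/s."""
    block = b"\0" * (1024 * 1024)
    start = time.perf_counter()
    with open(MOUNT / "probe.tmp", "wb") as f:
        for _ in range(size_mb):
            f.write(block)
    return size_mb / (time.perf_counter() - start)

if __name__ == "__main__":
    MOUNT.mkdir(exist_ok=True)
    for opts in OPTION_SETS:
        subprocess.run(["sshfs", REMOTE, str(MOUNT), "-o", opts], check=True)
        try:
            print(opts, f"{time_write():.1f} MB/s")
        finally:
            subprocess.run(["fusermount", "-u", str(MOUNT)], check=True)
```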


2019 ◽  
Vol 214 ◽  
pp. 03040 ◽  
Author(s):  
Alexander Undrus

PowerPC machines and high-performance computers (HPC) are important resources for computing in the ATLAS experiment. Future LHC data processing will require more resources than Grid computing, currently using approximately 100,000 cores at well over 100 sites, can provide. Supercomputers are extremely powerful, joining hundreds of thousands of CPUs, but their architectures use different instruction sets: ATLAS binary software distributions built for x86 do not run on them, and emulating x86 incurs a severe performance loss. This paper describes the methodology of installing ATLAS software from source code on supercomputers. The installation procedure includes downloading the ATLAS simulation release code, comprising about 0.7 million lines of C++ and Python, as well as the source code of more than 50 external packages such as ROOT and Geant4, followed by compilation and rigorous unit and integration testing. The paper reports the application of this procedure on the Titan HPC and the PowerPC-based Summit at the Oak Ridge Leadership Computing Facility (OLCF).
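A highly simplified sketch of the checkout-configure-build-test cycle such a source installation entails is shown below; the repository URL, branch and parallelism are hypothetical, and the real ATLAS installation procedure involves considerably more orchestration.

```python
# Sketch: a minimal build-and-test driver for installing a large C++/Python
# release from source on a non-x86 machine (checkout, configure, compile, test).
# The repository URL, branch and CMake layout are hypothetical placeholders.
import subprocess
from pathlib import Path

REPO = "https://gitlab.example.org/atlas/athena.git"   # hypothetical URL
BRANCH = "release/21.0.15"                             # hypothetical branch
SRC, BUILD = Path("src"), Path("build")

def run(cmd, **kwargs):
    """Echo and execute a command, failing loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True, **kwargs)

if __name__ == "__main__":
    if not SRC.exists():
        run(["git", "clone", "--depth", "1", "-b", BRANCH, REPO, str(SRC)])
    BUILD.mkdir(exist_ok=True)
    run(["cmake", "-S", str(SRC), "-B", str(BUILD), "-DCMAKE_BUILD_TYPE=Release"])
    run(["cmake", "--build", str(BUILD), "-j", "64"])
    run(["ctest", "--output-on-failure"], cwd=BUILD)   # unit and integration tests
```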


Author(s):  
Timothy G. Barraclough

This final chapter summarizes conclusions from the book and highlights a few general areas for future work. The species model for the structure of diversity is found to be useful and largely supported by current data, but is open to future tests against explicit alternative models. It is also a vital component for understanding and predicting contemporary evolution in the diverse systems that all organisms live in. The common evolutionary framework for microbial and multicellular life is highlighted, while drawing attention to current gaps in understanding for each type of organism. Future work needs to scale up to develop model systems of diverse assemblages and clades, including time-series data ranging from contemporary to geological scales. The imminent avalanche of genome data for thousands of individuals sampled within and between species is identified as a key challenge and opportunity. Finally, this chapter repeats the challenge that evolutionary biologists should embrace diversity and need to attempt to predict evolution in diverse systems, in order to deliver solutions of benefit to society.


2014 ◽  
Vol 740 ◽  
pp. 196-213 ◽  
Author(s):  
Zi Wu ◽  
G. Q. Chen

Associated with Taylor’s classical analysis of scalar solute dispersion in the laminar flow of a solvent in a straight pipe, this work explores the approach towards transverse uniformity of concentration distribution. Mei’s homogenization technique is extended to find solutions for the concentration transport. Chatwin’s result for the approach to longitudinal normality is recovered in terms of the mean concentration over the cross-section. The asymmetrical structure of the concentration cloud and the transverse variation of the concentration distribution are concretely illustrated for the initial stage. The rate of approach to uniformity is shown to be much slower than that to normality. When the longitudinal normality of mean concentration is well established, the maximum transverse concentration difference remains near one-half of the centroid concentration of the cloud. A time scale up to $10 R^2/D$ ($R$ is the radius of the pipe and $D$ is the molecular diffusivity) is suggested to characterize the transition to transverse uniformity, in contrast to the time scale of $0.1 R^2/D$ estimated by Taylor for the initial stage of dispersion, and that of $1.0 R^2/D$ by Chatwin for longitudinal normality.
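To give a feel for the magnitudes, here is a short numerical sketch evaluating the three quoted time scales for illustrative parameter values (a 1 mm radius pipe and a typical small-molecule diffusivity in water); these numbers are examples, not values taken from the paper.

```python
# Sketch: evaluate the dispersion time scales quoted above for illustrative
# parameter values; the factors 0.1, 1.0 and 10 multiply R^2/D.
R = 1.0e-3      # pipe radius [m] (illustrative)
D = 1.0e-9      # molecular diffusivity [m^2/s] (illustrative)

tau = R**2 / D  # characteristic radial diffusion time, here 1000 s
for label, factor in [("initial stage (Taylor)", 0.1),
                      ("longitudinal normality (Chatwin)", 1.0),
                      ("transverse uniformity (this work)", 10.0)]:
    print(f"{label:35s} {factor * tau:8.0f} s")
```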

