Building and using containers at HPC centres for the ATLAS experiment

2019 ◽  
Vol 214 ◽  
pp. 07005
Author(s):  
Douglas Benjamin ◽  
Taylor Childers ◽  
David Lesny ◽  
Danila Oleynik ◽  
Sergey Panitkin ◽  
...  

The HPC environment presents several challenges to the ATLAS experiment in running its automated computational workflows smoothly and efficiently, in particular with regard to software distribution and I/O load. CVMFS, a vital component of the LHC Computing Grid, is not always available in HPC environments. ATLAS computing has experimented with all-inclusive containers, and later developed an environment to produce such containers for both Shifter and Singularity. The all-inclusive containers include most of the recent ATLAS software releases, database releases, and other tools extracted from CVMFS. This has helped ATLAS to distribute software automatically to HPC centres with an environment identical to that provided by CVMFS, and it has significantly reduced the metadata I/O load on HPC shared file systems. Production operation at NERSC has shown that with this type of container we can fit transparently into the previously developed ATLAS operation methods and, at the same time, scale up to run many more jobs.
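As a rough illustration of the approach (not the experiment's actual tooling), the following Python sketch writes a minimal Singularity definition that bundles a software release tree previously extracted from CVMFS; the base image, release name and directory layout are hypothetical placeholders.

```python
# Sketch: generate a Singularity definition for an "all-inclusive" container
# that bundles a software release tree extracted from /cvmfs/atlas.cern.ch.
# The base image, release name and paths are hypothetical placeholders.
from pathlib import Path

def write_definition(release: str, extracted_tree: str, out: str = "atlas.def") -> None:
    """Write a Singularity definition that copies a pre-extracted release tree."""
    definition = f"""\
Bootstrap: docker
From: centos:7

%files
    {extracted_tree} /cvmfs/atlas.cern.ch

%environment
    export ATLAS_RELEASE={release}
"""
    Path(out).write_text(definition)

if __name__ == "__main__":
    # Hypothetical release and a directory rsync'ed out of CVMFS beforehand.
    write_definition("21.0.15", "./cvmfs-extract/atlas.cern.ch")
```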

Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1471
Author(s):  
Jun-Yeong Lee ◽  
Moon-Hyun Kim ◽  
Syed Asif Raza Shah ◽ 
Sang-Un Ahn ◽  
Heejun Yoon ◽  
...  

Data volumes are large and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play a pivotal role in data management and analysis for scientific discovery. Redundant Array of Independent Disks (RAID), a well-known storage technology combining multiple disks into a single large logical volume, has been widely used for data redundancy and performance improvement. However, it requires RAID-capable hardware or software to build a RAID-enabled disk array, and RAID-based storage is difficult to scale up. To mitigate these problems, many distributed file systems have been developed and are actively used in various environments, especially in data-intensive computing facilities where tremendous amounts of data have to be handled. In this study, we investigated and benchmarked several distributed file systems, namely Ceph, GlusterFS, Lustre and EOS, for data-intensive environments. In our experiment, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect read and write performance depending on the features of the data, which have to be considered in data-intensive computing environments.
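For context, below is a minimal sketch of the kind of sequential write/read throughput measurement such a benchmark performs against a FUSE mount point; the mount path, file size and block size are hypothetical, and a real study would use a dedicated benchmarking tool and drop caches between the write and read passes.

```python
# Sketch: a minimal sequential write/read throughput test against a
# FUSE-mounted distributed file system (e.g. Ceph, GlusterFS, Lustre or EOS).
# The mount point, file size and block size are hypothetical parameters.
import os
import time

def throughput(mount_point: str, size_mb: int = 1024, block_kb: int = 1024) -> tuple[float, float]:
    """Return (write_MBps, read_MBps) for one sequential pass."""
    path = os.path.join(mount_point, "bench.tmp")
    block = os.urandom(block_kb * 1024)
    n_blocks = size_mb * 1024 // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    write_mbps = size_mb / (time.perf_counter() - start)

    # Note: reading back immediately may hit the page cache and flatter the result.
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kb * 1024):
            pass
    read_mbps = size_mb / (time.perf_counter() - start)

    os.remove(path)
    return write_mbps, read_mbps

if __name__ == "__main__":
    print(throughput("/mnt/cephfs"))  # hypothetical FUSE mount point
```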


2020 ◽  
Vol 245 ◽  
pp. 07010
Author(s):  
Marcelo Vogel ◽  
Mikhail Borodin ◽  
Alessandra Forti ◽  
Lukas Heinrich

This paper describes the deployment of the offline software of the ATLAS experiment at the LHC in containers for use in production workflows such as simulation and reconstruction. To achieve this goal we are using Docker and Singularity, two lightweight virtualization technologies that can encapsulate software packages inside complete file systems. Deploying offline releases via containers removes the interdependence between the runtime environment needed for job execution and the configuration of the computing nodes at the sites. Docker or Singularity would provide a uniform runtime environment for the grid, HPCs and a variety of opportunistic resources. Additionally, releases may be supplemented with the detector’s conditions data, removing the need for network connectivity at computing nodes, which is normally quite restricted at HPCs. In preparation for this goal, we have built Docker and Singularity images containing single full releases of ATLAS software for running detector simulation and reconstruction jobs in runtime environments without a network connection. Unlike similar efforts that produce containers by packing all possible dependencies of every possible workflow into heavy images (≈ 200 GB), our approach is to include only what is needed for specific workflows and to manage dependencies efficiently via software package managers. This approach leads to more stable packaged releases where the dependencies are clear and the resulting images have more portable sizes (≈ 16 GB). In an effort to cover a wider variety of workflows, we are deploying images that can be used in raw data reconstruction. This is particularly challenging because access to the experiment’s conditions payload consumes substantial database resources when processing data. We describe here a prototype pipeline in which images are provisioned only with the conditions payload necessary to satisfy the jobs’ requirements. This database-on-demand approach would keep images slim, portable and capable of supporting various workflows in a standalone fashion in environments with no network connectivity.
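The following Python sketch illustrates the database-on-demand idea in the simplest possible terms: only the conditions payload files required by the requested tags are staged into the image build context. The tag names, payload catalogue and directory layout are hypothetical and do not reflect the experiment's actual conditions database interface.

```python
# Sketch: stage only the conditions payloads a given workflow needs into a
# container build context (the "database-on-demand" idea described above).
# Tag names, the payload catalogue and the directory layout are hypothetical.
import shutil
from pathlib import Path

# Hypothetical catalogue mapping conditions tags to payload files.
PAYLOAD_CATALOGUE = {
    "CONDBR2-BLKPA-2018-11": ["geometry.db", "calib_run358031.db"],
    "OFLCOND-MC16-SDR-25":   ["geometry.db", "mc_calib.db"],
}

def stage_payloads(tags: list[str], source: Path, build_context: Path) -> list[Path]:
    """Copy only the payload files required by the requested tags."""
    dest = build_context / "conditions"
    dest.mkdir(parents=True, exist_ok=True)
    needed = {name for tag in tags for name in PAYLOAD_CATALOGUE[tag]}
    return [Path(shutil.copy2(source / name, dest / name)) for name in sorted(needed)]

if __name__ == "__main__":
    # Hypothetical source directory holding the full set of payload files.
    stage_payloads(["CONDBR2-BLKPA-2018-11"], Path("/data/conditions"), Path("./context"))
```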


2014 ◽  
Vol 146 ◽  
pp. 119-127 ◽  
Author(s):  
K. Anand Rao ◽  
T. Sreenivas ◽  
Madhu Vinjamur ◽  
A.K. Suri

2021 ◽  
Vol 3 ◽  
Author(s):  
Khalil B. Ramadi ◽  
Shriya S. Srinivasan

Healthcare innovation is impeded by high costs, the need for diverse skillsets, and complex regulatory processes. The COVID-19 pandemic exposed critical gaps in the current framework, especially those lying at the boundary between cutting-edge academic research and industry-scale manufacturing and production. While many resource-rich geographies had the expertise required to solve challenges posed by the pandemic, mechanisms to unite the appropriate institutions and to scale up, fund, and mobilize solutions on a timescale relevant to the emergency were lacking. We characterize the orthogonal spatial and temporal axes that dictate innovation. To address their limitations, we propose a “pre-emptive innovation infrastructure” incorporating in-house hospital innovation teams, consortia-based assembly of expertise, and novel funding mechanisms to combat future emergencies. By leveraging the strengths of academic, medical, government, and industrial institutions, this framework could improve ongoing innovation and supercharge the infrastructure for healthcare emergencies.


2020 ◽  
Vol 245 ◽  
pp. 09010
Author(s):  
Michal Svatoš ◽  
Jiří Chudoba ◽  
Petr Vokáč

The distributed computing system of the ATLAS experiment at the LHC is allowed to opportunistically use resources at the Czech national HPC centre IT4Innovations in Ostrava. Jobs are submitted via an ARC Compute Element (ARC-CE) installed at the grid site in Prague; scripts and input files are shared between the ARC-CE and a shared file system located at the HPC centre via sshfs. This basic submission system has been in operation since the end of 2017. Several improvements were made to increase the amount of resources that ATLAS can use. The most significant change was migrating the submission system to support pre-emptable jobs, after the HPC centre’s management decided to start pre-empting opportunistic jobs. Another improvement concerned the sshfs connection, which appeared to be a limiting factor of the system: the submission system now consists of several ARC-CE machines, and various sshfs parameters were tested in an attempt to increase throughput. As a result of these improvements, the utilisation of the Czech national HPC centre by ATLAS distributed computing has increased.
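A minimal sketch of how such sshfs parameter tests might be timed is given below; the remote endpoint, mount point and option sets are hypothetical examples rather than the configuration used at the site, and only standard sshfs/ssh options are shown.

```python
# Sketch: time a simple sequential write over an sshfs mount for a few
# candidate option sets, in the spirit of the sshfs tuning described above.
# The remote endpoint, mount point and option sets are hypothetical.
import subprocess
import time
from pathlib import Path

REMOTE = "login.hpc.example.cz:/scratch/atlas"   # hypothetical endpoint
MOUNT = Path("/tmp/sshfs-test")
OPTION_SETS = ["reconnect", "reconnect,compression=no", "reconnect,Ciphers=aes128-ctr"]

def time_write(size_mb: int = 256) -> float:
    """Write size_mb of zeros through the mount and return MB/s."""
    block = b"\0" * (1024 * 1024)
    start = time.perf_counter()
    with open(MOUNT / "probe.tmp", "wb") as f:
        for _ in range(size_mb):
            f.write(block)
    return size_mb / (time.perf_counter() - start)

if __name__ == "__main__":
    MOUNT.mkdir(exist_ok=True)
    for opts in OPTION_SETS:
        subprocess.run(["sshfs", REMOTE, str(MOUNT), "-o", opts], check=True)
        try:
            print(opts, f"{time_write():.1f} MB/s")
        finally:
            subprocess.run(["fusermount", "-u", str(MOUNT)], check=True)
```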


2019 ◽  
Vol 214 ◽  
pp. 03040 ◽  
Author(s):  
Alexander Undrus

PowerPC machines and high-performance computers (HPC) are important resources for computing in the ATLAS experiment. Future LHC data processing will require more resources than Grid computing, currently using approximately 100,000 cores at well over 100 sites, can provide. Supercomputers are extremely powerful, joining hundreds of thousands of CPUs, but their architectures use different instruction sets: ATLAS binary software distributions built for x86 do not run on them, and emulating x86 incurs a severe performance loss. This paper describes the methodology of installing ATLAS software from source code on supercomputers. The installation procedure includes downloading the ATLAS simulation release code, comprising about 0.7 million lines of C++ and Python, as well as the source code of more than 50 external packages such as ROOT and Geant4, followed by compilation and rigorous unit and integration testing. The paper reports the application of this procedure on the Titan HPC and the PowerPC-based Summit at the Oak Ridge Leadership Computing Facility (OLCF).
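A highly simplified sketch of the checkout-configure-build-test cycle such a source installation entails is shown below; the repository URL, branch and parallelism are hypothetical, and the real ATLAS installation procedure involves considerably more orchestration.

```python
# Sketch: a minimal build-and-test driver for installing a large C++/Python
# release from source on a non-x86 machine (checkout, configure, compile, test).
# The repository URL, branch and CMake layout are hypothetical placeholders.
import subprocess
from pathlib import Path

REPO = "https://gitlab.example.org/atlas/athena.git"   # hypothetical URL
BRANCH = "release/21.0.15"                             # hypothetical branch
SRC, BUILD = Path("src"), Path("build")

def run(cmd, **kwargs):
    """Echo and execute a command, failing loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True, **kwargs)

if __name__ == "__main__":
    if not SRC.exists():
        run(["git", "clone", "--depth", "1", "-b", BRANCH, REPO, str(SRC)])
    BUILD.mkdir(exist_ok=True)
    run(["cmake", "-S", str(SRC), "-B", str(BUILD), "-DCMAKE_BUILD_TYPE=Release"])
    run(["cmake", "--build", str(BUILD), "-j", "64"])
    run(["ctest", "--output-on-failure"], cwd=BUILD)   # unit and integration tests
```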


Author(s):  
Timothy G. Barraclough

This final chapter summarizes conclusions from the book and highlights a few general areas for future work. The species model for the structure of diversity is found to be useful and largely supported by current data, but is open to future tests against explicit alternative models. It is also a vital component for understanding and predicting contemporary evolution in the diverse systems that all organisms live in. The common evolutionary framework for microbial and multicellular life is highlighted, while drawing attention to current gaps in understanding for each type of organism. Future work needs to scale up to develop model systems of diverse assemblages and clades, including time-series data ranging from contemporary to geological scales. The imminent avalanche of genome data for thousands of individuals sampled within and between species is identified as a key challenge and opportunity. Finally, this chapter repeats the challenge that evolutionary biologists should embrace diversity and need to attempt to predict evolution in diverse systems, in order to deliver solutions of benefit to society.


2014 ◽  
Vol 740 ◽  
pp. 196-213 ◽  
Author(s):  
Zi Wu ◽  
G. Q. Chen

Associated with Taylor’s classical analysis of scalar solute dispersion in the laminar flow of a solvent in a straight pipe, this work explores the approach towards transverse uniformity of concentration distribution. Mei’s homogenization technique is extended to find solutions for the concentration transport. Chatwin’s result for the approach to longitudinal normality is recovered in terms of the mean concentration over the cross-section. The asymmetrical structure of the concentration cloud and the transverse variation of the concentration distribution are concretely illustrated for the initial stage. The rate of approach to uniformity is shown to be much slower than that to normality. When the longitudinal normality of mean concentration is well established, the maximum transverse concentration difference remains near one-half of the centroid concentration of the cloud. A time scale up to $10 R^2/D$ ($R$ is the radius of the pipe and $D$ is the molecular diffusivity) is suggested to characterize the transition to transverse uniformity, in contrast to the time scale of $0.1 R^2/D$ estimated by Taylor for the initial stage of dispersion, and that of $1.0 R^2/D$ by Chatwin for longitudinal normality.
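To give a feel for the magnitudes, here is a short numerical sketch evaluating the three quoted time scales for illustrative parameter values (a 1 mm radius pipe and a typical small-molecule diffusivity in water); these numbers are examples, not values taken from the paper.

```python
# Sketch: evaluate the dispersion time scales quoted above for illustrative
# parameter values; the factors 0.1, 1.0 and 10 multiply R^2/D.
R = 1.0e-3      # pipe radius [m] (illustrative)
D = 1.0e-9      # molecular diffusivity [m^2/s] (illustrative)

tau = R**2 / D  # characteristic radial diffusion time, here 1000 s
for label, factor in [("initial stage (Taylor)", 0.1),
                      ("longitudinal normality (Chatwin)", 1.0),
                      ("transverse uniformity (this work)", 10.0)]:
    print(f"{label:35s} {factor * tau:8.0f} s")
```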

