Operating an HPC/HTC Cluster with Fully Containerized Jobs Using HTCondor, Singularity, CephFS and CVMFS
Abstract
High-performance and high-throughput computing (HPC/HTC) is challenged by ever-increasing demands on software stacks and by increasingly divergent requirements from different research communities. This led to a reassessment of the operational concept of HPC/HTC clusters at the Physikalisches Institut at the University of Bonn. As a result, the present HPC/HTC cluster (named BAF2) introduces several conceptual changes compared to conventional clusters. All jobs now run in containers, and a container-aware resource management system is used, which allowed us to switch to a model without login/head nodes. Furthermore, a modern, feature-rich storage system with powerful interfaces has been deployed. We describe the design considerations, the implemented functionality, and the operational experience gained with this new-generation setup, which has proven very successful and is well accepted by its users.