Cosmos: A Unified Accounting System both for the HTCondor and Slurm Clusters at IHEP

2020, Vol 245, pp. 07060
Author(s): Ran Du, Jingyan Shi, Xiaowei Jiang, Jiaheng Zou

HTCondor was adopted to manage the High Throughput Computing (HTC) cluster at IHEP in 2016. In 2017 a Slurm cluster was set up to run High Performance Computing (HPC) jobs. To provide accounting services for these two clusters, we implemented a unified accounting system named Cosmos. The different workloads bring different accounting requirements. In brief, there are four types of jobs to account for. First, 30 million single-core jobs run in the HTCondor cluster every year. Second, Virtual Machine (VM) jobs run in the legacy HTCondor VM cluster. Third, parallel jobs run in the Slurm cluster, and some of these jobs run on GPU worker nodes to accelerate computing. Finally, selected HTC jobs are migrated from the HTCondor cluster to the Slurm cluster for research purposes. To satisfy all of these requirements, Cosmos is implemented with four layers: acquisition, integration, statistics and presentation. The paper presents the issues encountered at each layer and their solutions. Cosmos has run in production for two years, and its operation shows that it functions well and meets the requirements of both the HTCondor and Slurm clusters.
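The four-layer design (acquisition, integration, statistics, presentation) can be sketched as a small pipeline. The Python below is a minimal illustration under stated assumptions, not Cosmos's actual implementation: the raw record fields, group names, job-type labels and the core-hour/GPU-hour metrics are all hypothetical, and a real acquisition layer would read from the HTCondor and Slurm accounting sources rather than from in-memory lists.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Unified job record produced by the integration layer (hypothetical schema)."""
    cluster: str
    group: str
    job_type: str
    cores: int
    gpus: int
    wall_seconds: float


def acquire():
    """Acquisition: raw per-scheduler records (illustrative fields, not real schemas)."""
    condor = [
        {"group": "physics", "cpus": 1, "wall": 3600, "vm": False},   # single-core HTC job
        {"group": "physics", "cpus": 1, "wall": 7200, "vm": True},    # legacy VM job
    ]
    slurm = [
        {"account": "theory", "ncpus": 32, "ngpus": 0, "elapsed": 1800, "migrated": False},
        {"account": "theory", "ncpus": 8, "ngpus": 2, "elapsed": 3600, "migrated": False},
        {"account": "physics", "ncpus": 1, "ngpus": 0, "elapsed": 3600, "migrated": True},
    ]
    return condor, slurm


def integrate(condor, slurm):
    """Integration: map both sources onto one schema and tag the job type."""
    records = [
        JobRecord("htcondor", j["group"], "vm" if j["vm"] else "htc", j["cpus"], 0, j["wall"])
        for j in condor
    ]
    for j in slurm:
        # Migrated HTC jobs are kept separate from native parallel (CPU or GPU) jobs.
        job_type = "migrated_htc" if j["migrated"] else "parallel"
        records.append(JobRecord("slurm", j["account"], job_type, j["ncpus"], j["ngpus"], j["elapsed"]))
    return records


def statistics(records):
    """Statistics: sum core-hours and GPU-hours per (cluster, group, job type)."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for r in records:
        key = (r.cluster, r.group, r.job_type)
        totals[key][0] += r.cores * r.wall_seconds / 3600
        totals[key][1] += r.gpus * r.wall_seconds / 3600
    return dict(totals)


def present(totals):
    """Presentation: render a plain-text report, one row per key."""
    lines = [f"{'cluster':<10}{'group':<10}{'type':<14}{'core-h':>8}{'gpu-h':>7}"]
    for (cluster, group, job_type), (core_h, gpu_h) in sorted(totals.items()):
        lines.append(f"{cluster:<10}{group:<10}{job_type:<14}{core_h:>8.1f}{gpu_h:>7.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(present(statistics(integrate(*acquire()))))
```

Keeping each layer as a separate function mirrors the layered split: a new job source only needs its own acquisition and integration mapping, while the statistics and presentation layers stay unchanged.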

2019, Vol 214, pp. 03024
Author(s): Vladimir Brik, David Schultz, Gonzalo Merino

Here we report IceCube’s first experiences of running GPU simulations on the Titan supercomputer. This undertaking was non-trivial because Titan is designed for High Performance Computing (HPC) workloads, whereas IceCube’s workloads fall under the High Throughput Computing (HTC) category. In particular: (i) Titan’s design, policies, and tools are geared heavily toward large MPI applications, while IceCube’s workloads consist of large numbers of relatively small independent jobs, (ii) Titan compute nodes run Cray Linux, which is not directly compatible with IceCube software, and (iii) Titan compute nodes cannot access outside networks, making it impossible to access IceCube’s CVMFS repositories and workload management systems. This report examines our experience of packaging our application in Singularity containers and using HTCondor as the second-level scheduler on the Titan supercomputer.


2000, Vol 8 (2), pp. 95-108
Author(s): Joan M. Francioni, Cherri M. Pancake

Throughout 1998, the High Performance Debugging Forum worked on defining a base-level standard for high performance debuggers. The standard had to meet the sometimes conflicting constraints of being useful to users, realistically implementable by developers, and architecturally independent across multiple platforms. To meet criteria for timeliness, the standard had to be defined in one year and in such a way that it could be implemented within an additional year. The Forum was successful, and in November 1998 released Version 1 of the HPD Standard. Implementations of the standard are currently underway. This paper presents an overview of Version 1 of the standard and an analysis of the process by which the standard was developed. The status of implementation efforts and plans for follow-on efforts are discussed as well.


Author(s): Kamer Kaya, Ayat Hatem, Hatice Gülçin Özer, Kun Huang, Ümit V. Çatalyürek

2020, Vol 245, pp. 07006
Author(s): Cécile Cavet, Martin Souchal, Sébastien Gadrat, Gilles Grasseau, Andrea Satirana, ...

The High Performance Computing (HPC) domain aims to optimize code in order to use the latest multicore and parallel technologies, including specific processor instructions. In this computing framework, portability and reproducibility are key concepts. One way to handle these requirements is to use Linux containers. These “light virtual machines” encapsulate an application, together with its environment, in Linux processes. Containers have recently been rediscovered for their ability to provide both a multi-infrastructure environment for developers and system administrators and reproducibility through image build files. Two container solutions are emerging: Docker for microservices and Singularity for computing applications. We present here the status of the ComputeOps project, whose goal is to study the benefits of containers for HPC applications.

