Using Managed High Performance Computing Systems for High-Throughput Computing

Here we report IceCube’s first experiences of running GPU simulations on the Titan supercomputer. This undertaking was non-trivial because Titan is designed for High Performance Computing (HPC) workloads, whereas IceCube’s workloads fall under the High Throughput Computing (HTC) category. In particular: (i) Titan’s design, policies, and tools are geared heavily toward large MPI applications, while IceCube’s workloads consist of large numbers of relatively small independent jobs, (ii) Titan compute nodes run Cray Linux, which is not directly compatible with IceCube software, and (iii) Titan compute nodes cannot access outside networks, making it impossible to access IceCube’s CVMFS repositories and workload management systems. This report examines our experience of packaging our application in Singularity containers and using HTCondor as the second-level scheduler on the Titan supercomputer.

High-Throughput Computing Versus High-Performance Computing for Groundwater Applications

Ground Water ◽ 10.1111/gwat.12320 ◽ 2015 ◽ Vol 53 (2) ◽ pp. 180-184 ◽ Cited By ~ 11

Author(s): Michael N. Fienen ◽ Randall J. Hunt

Keyword(s): High Performance Computing ◽ High Throughput ◽ High Performance ◽ High Throughput Computing ◽ Performance Computing

Cosmos: A Unified Accounting System both for the HTCondor and Slurm Clusters at IHEP

EPJ Web of Conferences ◽ 10.1051/epjconf/202024507060 ◽ 2020 ◽ Vol 245 ◽ pp. 07060

Author(s): Ran Du ◽ Jingyan Shi ◽ Xiaowei Jiang ◽ Jiaheng Zou

Keyword(s): High Performance Computing ◽ Virtual Machine ◽ High Throughput ◽ High Performance ◽ Accounting System ◽ High Throughput Computing ◽ Parallel Jobs ◽ The Status ◽ Set Up ◽ Performance Computing

HTCondor was adopted to manage the High Throughput Computing (HTC) cluster at IHEP in 2016. In 2017 a Slurm cluster was set up to run High Performance Computing (HPC) jobs. To provide accounting services for these two clusters, we implemented a unified accounting system named Cosmos. Different workloads bring different accounting requirements. In brief, four types of jobs must be accounted for. First, 30 million single-core jobs run in the HTCondor cluster every year. Second, Virtual Machine (VM) jobs run in the legacy HTCondor VM cluster. Third, parallel jobs run in the Slurm cluster, and some of these jobs run on the GPU worker nodes to accelerate computing. Lastly, some selected HTC jobs are migrated from the HTCondor cluster to the Slurm cluster for research purposes. To satisfy all of these requirements, Cosmos is implemented with four layers: acquisition, integration, statistics, and presentation. Details about the issues and solutions of each layer will be presented in the paper. Cosmos has run in production for two years, and its status shows that it functions well and meets the requirements of both the HTCondor and Slurm clusters.

MonSTer: An Out-of-the-Box Monitoring Tool for High Performance Computing Systems

2020 IEEE International Conference on Cluster Computing (CLUSTER) ◽ 10.1109/cluster49012.2020.00022 ◽ 2020

Author(s): Jie Li ◽ Ghazanfar Ali ◽ Ngan Nguyen ◽ Jon Hass ◽ Alan Sill ◽ ...

Keyword(s): High Performance Computing ◽ High Performance ◽ Monitoring Tool ◽ Computing Systems ◽ Performance Computing

Session details: Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)

ACM SIGMETRICS Performance Evaluation Review ◽ 10.1145/3263957 ◽ 2011 ◽ Vol 38 (4)

Keyword(s): High Performance Computing ◽ High Performance ◽ Performance Modeling ◽ International Workshop ◽ Special Issue ◽ Computing Systems ◽ Performance Computing

Treasure Hunt Framework: Distributing Metaheuristics on High Performance Computing Systems

Swarm and Evolutionary Computation ◽ 10.1016/j.swevo.2021.100906 ◽ 2021 ◽ pp. 100906

Author(s): Peter Frank Perroni ◽ Myriam Regattieri Delgado ◽ Daniel Weingaertner

Keyword(s): High Performance Computing ◽ High Performance ◽ Computing Systems ◽ Performance Computing

Session details: Special issue on the 2nd international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 11)

ACM SIGMETRICS Performance Evaluation Review ◽ 10.1145/3264251 ◽ 2012 ◽ Vol 40 (2)

Keyword(s): High Performance Computing ◽ High Performance ◽ Performance Modeling ◽ International Workshop ◽ Special Issue ◽ Computing Systems ◽ Performance Computing

Achieving Safety for Power Shifting in Overprovisioned High Performance Computing Systems

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) ◽ 10.1109/ipdpsw.2016.160 ◽ 2016

Author(s): Shirley Moore

Keyword(s): High Performance Computing ◽ High Performance ◽ Computing Systems ◽ Performance Computing

Overview of AIM: supporting computer vision on heterogeneous high-performance computing systems

10.1117/12.323476 ◽ 1998

Author(s): Monica Sweat ◽ Joseph N. Wilson

Keyword(s): Computer Vision ◽ High Performance Computing ◽ High Performance ◽ Computing Systems ◽ Performance Computing
