Cosmos : A Unified Accounting System both for the HTCondor and Slurm Clusters at IHEP

HTCondor was adopted to manage the High Throughput Computing (HTC) cluster at IHEP in 2016. In 2017 a Slurm cluster was set up to run High Performance Computing (HPC) jobs. To provide accounting services for these two clusters, we implemented a unified accounting system named Cosmos. The different workloads bring different accounting requirements. In brief, four types of jobs must be accounted for. First, 30 million single-core jobs run in the HTCondor cluster every year. Second, Virtual Machine (VM) jobs run in the legacy HTCondor VM cluster. Third, parallel jobs run in the Slurm cluster, and some of these jobs run on GPU worker nodes to accelerate computing. Last, some selected HTC jobs are migrated from the HTCondor cluster to the Slurm cluster for research purposes. To satisfy all of these requirements, Cosmos is implemented with four layers: acquisition, integration, statistics and presentation. Details about the issues and solutions of each layer will be presented in the paper. Cosmos has run in production for two years, and its operational status shows that it functions well and meets the requirements of both the HTCondor and Slurm clusters.
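The four layers named in the abstract (acquisition, integration, statistics, presentation) can be illustrated with a minimal Python sketch. This is not the Cosmos implementation: the record fields, the sample rows, and the function names `acquire`, `integrate`, `summarize` and `present` are all hypothetical, and the acquisition step uses hard-coded rows instead of querying HTCondor or Slurm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical common schema shared by both schedulers."""
    scheduler: str
    user: str
    cores: int
    gpus: int
    wall_seconds: float


def acquire():
    """Acquisition layer: stand-in for reading each scheduler's job history.

    The rows and their keys are invented sample data, not real scheduler output.
    """
    htcondor_rows = [
        {"owner": "alice", "cpus": 1, "runtime_s": 3600},
        {"owner": "bob", "cpus": 1, "runtime_s": 1800},
    ]
    slurm_rows = [
        {"user": "alice", "ncpus": 8, "ngpus": 2, "elapsed_s": 7200},
    ]
    return htcondor_rows, slurm_rows


def integrate(htcondor_rows, slurm_rows):
    """Integration layer: map scheduler-specific rows onto the common schema."""
    records = [
        JobRecord("htcondor", r["owner"], r["cpus"], 0, r["runtime_s"])
        for r in htcondor_rows
    ]
    records += [
        JobRecord("slurm", r["user"], r["ncpus"], r["ngpus"], r["elapsed_s"])
        for r in slurm_rows
    ]
    return records


def summarize(records):
    """Statistics layer: job count, core-hours and GPU-hours per (user, scheduler)."""
    totals = defaultdict(lambda: {"jobs": 0, "core_hours": 0.0, "gpu_hours": 0.0})
    for rec in records:
        entry = totals[(rec.user, rec.scheduler)]
        hours = rec.wall_seconds / 3600
        entry["jobs"] += 1
        entry["core_hours"] += rec.cores * hours
        entry["gpu_hours"] += rec.gpus * hours
    return dict(totals)


def present(totals):
    """Presentation layer: render the statistics as a plain-text table."""
    lines = [f"{'user':<8}{'scheduler':<10}{'jobs':>5}{'core_hours':>12}{'gpu_hours':>11}"]
    for (user, scheduler), t in sorted(totals.items()):
        lines.append(
            f"{user:<8}{scheduler:<10}{t['jobs']:>5}"
            f"{t['core_hours']:>12.2f}{t['gpu_hours']:>11.2f}"
        )
    return "\n".join(lines)


totals = summarize(integrate(*acquire()))
report = present(totals)
print(report)
```

Keeping the scheduler-specific field names confined to the integration step means the statistics and presentation steps only ever see one record type, which is one plausible reason for separating the layers this way.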

Using Managed High Performance Computing Systems for High-Throughput Computing

Conquering Big Data with High Performance Computing ◽

10.1007/978-3-319-33742-5_4 ◽

2016 ◽

pp. 61-79 ◽

Cited By ~ 2

Author(s):

Lucas A. Wilson

Keyword(s):

High Performance Computing ◽

High Throughput ◽

High Performance ◽

Computing Systems ◽

High Throughput Computing ◽

Performance Computing

An Introduction to Big Data, High Performance Computing, High-Throughput Computing, and Hadoop

Conquering Big Data with High Performance Computing ◽

10.1007/978-3-319-33742-5_1 ◽

2016 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Ritu Arora

Keyword(s):

Big Data ◽

High Performance Computing ◽

High Throughput ◽

High Performance ◽

High Throughput Computing ◽

Performance Computing

Running IceCube GPU simulations on Titan

EPJ Web of Conferences ◽

10.1051/epjconf/201921403024 ◽

2019 ◽

Vol 214 ◽

pp. 03024

Author(s):

Vladimir Brik ◽

David Schultz ◽

Gonzalo Merino

Keyword(s):

High Performance Computing ◽

High Throughput ◽

High Performance ◽

Management Systems ◽

Workload Management ◽

High Throughput Computing ◽

Large Numbers ◽

Mpi Applications ◽

Performance Computing

Here we report IceCube’s first experiences of running GPU simulations on the Titan supercomputer. This undertaking was non-trivial because Titan is designed for High Performance Computing (HPC) workloads, whereas IceCube’s workloads fall under the High Throughput Computing (HTC) category. In particular: (i) Titan’s design, policies, and tools are geared heavily toward large MPI applications, while IceCube’s workloads consist of large numbers of relatively small independent jobs, (ii) Titan compute nodes run Cray Linux, which is not directly compatible with IceCube software, and (iii) Titan compute nodes cannot access outside networks, making it impossible to access IceCube’s CVMFS repositories and workload management systems. This report examines our experience of packaging our application in Singularity containers and using HTCondor as the second-level scheduler on the Titan supercomputer.

High-Throughput Computing Versus High-Performance Computing for Groundwater Applications

Ground Water ◽

10.1111/gwat.12320 ◽

2015 ◽

Vol 53 (2) ◽

pp. 180-184 ◽

Cited By ~ 11

Author(s):

Michael N. Fienen ◽

Randall J. Hunt

Keyword(s):

High Performance Computing ◽

High Throughput ◽

High Performance ◽

High Throughput Computing ◽

Performance Computing

A Debugging Standard for High-Performance Computing

Scientific Programming ◽

10.1155/2000/971291 ◽

2000 ◽

Vol 8 (2) ◽

pp. 95-108 ◽

Cited By ~ 4

Author(s):

Joan M. Francioni ◽

Cherri M. Pancake

Keyword(s):

High Performance Computing ◽

High Performance ◽

Base Level ◽

The Status ◽

Performance Debugging ◽

One Year ◽

Performance Computing

Throughout 1998, the High Performance Debugging Forum worked on defining a base level standard for high performance debuggers. The standard had to meet the sometimes conflicting constraints of being useful to users, realistically implementable by developers, and architecturally independent across multiple platforms. To meet criteria for timeliness, the standard had to be defined in one year and in such a way that it could be implemented within an additional year. The Forum was successful, and in November 1998 released Version 1 of the HPD Standard. Implementations of the standard are currently underway. This paper presents an overview of Version 1 of the standard and an analysis of the process by which the standard was developed. The status of implementation efforts and plans for follow-on efforts are discussed as well.

High-Performance Computing In High-Throughput Sequencing

Biological Knowledge Discovery Handbook ◽

10.1002/9781118617151.ch43 ◽

2013 ◽

pp. 981-1002 ◽

Cited By ~ 1

Author(s):

Kamer Kaya ◽

Ayat Hatem ◽

Hatice Gülçin Özer ◽

Kun Huang ◽

Ümit V. Çatalyürek

Keyword(s):

High Performance Computing ◽

High Throughput ◽

High Performance ◽

High Throughput Sequencing ◽

Performance Computing

MapReduce Accounting System Integrated with High-Performance Computing Infrastructure

Proceedings of the 2015 International Conference on Big Data Applications and Services - BigDAS '15 ◽

10.1145/2837060.2837122 ◽

2015 ◽

Author(s):

Chia-Chuan Chuang

Keyword(s):

High Performance Computing ◽

High Performance ◽

Accounting System ◽

Performance Computing ◽

Computing Infrastructure

Improvements of common open Grid standards to increase High Throughput and High Performance Computing effectiveness on large-scale Grid and e-science infrastructures

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) ◽

10.1109/ipdpsw.2010.5470916 ◽

2010 ◽

Cited By ~ 2

Author(s):

M. Riedel ◽

M.S. Memon ◽

A.S. Memon ◽

A. Streit ◽

F. Wolf ◽

...

Keyword(s):

High Performance Computing ◽

High Throughput ◽

High Performance ◽

Large Scale ◽

Large Scale Grid ◽

Scale Grid ◽

Performance Computing

ComputeOps: Container for High Performance Computing

EPJ Web of Conferences ◽

10.1051/epjconf/202024507006 ◽

2020 ◽

Vol 245 ◽

pp. 07006

Author(s):

Cécile Cavet ◽

Martin Souchal ◽

Sébastien Gadrat ◽

Gilles Grasseau ◽

Andrea Satirana ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Virtual Machines ◽

Key Concepts ◽

The Status ◽

Image Building ◽

Computing Framework ◽

Performance Computing ◽

Linux Containers

The High Performance Computing (HPC) domain aims to optimize code in order to use the latest multicore and parallel technologies, including specific processor instructions. In this computing framework, portability and reproducibility are key concepts. One way to handle these requirements is to use Linux containers. These “light virtual machines” make it possible to encapsulate an application together with its environment in Linux processes. Containers have recently been rediscovered because they provide both a multi-infrastructure environment for developers and system administrators and reproducibility through image build files. Two container solutions are emerging: Docker for microservices and Singularity for computing applications. We present here the status of the ComputeOps project, whose goal is to study the benefit of containers for HPC applications.

An Interference-Aware Virtual Machine Placement Strategy for High Performance Computing Applications in Clouds

2018 Symposium on High Performance Computing Systems (WSCAD) ◽

10.1109/wscad.2018.00024 ◽

2018 ◽

Author(s):

Maicon Melo Alves ◽

Luan Teylo ◽

Yuri Frota ◽

Lucia M.A. Drummond

Keyword(s):

High Performance Computing ◽

Virtual Machine ◽

High Performance ◽

Virtual Machine Placement ◽

Performance Computing
