PanDA and RADICAL-Pilot Integration: Enabling the Pilot Paradigm on HPC Resources

2019 ◽  
Vol 214 ◽  
pp. 03057
Author(s):  
Andre Merzky ◽  
Pavlo Svirin ◽  
Matteo Turilli

PanDA executes millions of ATLAS jobs a month on Grid systems with more than 300,000 cores. Currently, PanDA is compatible with only a few high-performance computing (HPC) resources due to differing edge services and operational policies; it does not implement the pilot paradigm on HPC; and it does not dynamically optimize resource allocation among queues. We integrated the PanDA Harvester service and the RADICAL-Pilot (RP) system to overcome these limitations and enable the execution of ATLAS, Molecular Dynamics, and other workloads on HPC resources. This paper offers two main contributions: (1) introducing PanDA Harvester and RADICAL-Pilot, two systems independently developed to support high-throughput computing (HTC) on high-performance computing (HPC) infrastructures; and (2) describing the integration of these two systems to produce a middleware component with unique functionalities, including the concurrent execution of heterogeneous workloads on the Titan OLCF machine. We integrated Harvester and RP by prototyping a Next Generation Executor (NGE) to expose RP capabilities and manage the execution of PanDA workloads. In this way, we minimized the reengineering of the two systems, allowing their integration while both remain in production.
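The core idea behind the pilot paradigm referenced here is to acquire one large batch allocation and then schedule many independent payloads into it, avoiding a separate batch-queue wait per task. The sketch below illustrates that pattern only; the class and method names are hypothetical and do not correspond to the actual Harvester, NGE, or RADICAL-Pilot APIs.

```python
# Illustrative only: a toy "pilot" that holds a fixed pool of cores and
# schedules many small tasks into it, instead of submitting each task to
# the batch system individually. Names do not correspond to the actual
# Harvester, NGE, or RADICAL-Pilot interfaces.
import subprocess
from concurrent.futures import ThreadPoolExecutor


class ToyPilot:
    """Stand-in for a batch allocation (e.g. one Titan job) that executes
    many independent payloads within its lifetime."""

    def __init__(self, cores: int):
        self.pool = ThreadPoolExecutor(max_workers=cores)

    def submit(self, cmd):
        # Each payload runs inside the already-acquired allocation,
        # so no further batch-queue wait is incurred per task.
        return self.pool.submit(subprocess.run, cmd, capture_output=True)

    def shutdown(self):
        self.pool.shutdown(wait=True)


if __name__ == "__main__":
    pilot = ToyPilot(cores=4)
    futures = [pilot.submit(["echo", f"payload {i}"]) for i in range(16)]
    for f in futures:
        print(f.result().stdout.decode().strip())
    pilot.shutdown()
```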

2020 ◽  
Vol 16 (8) ◽  
pp. 155014772093275 ◽  
Author(s):  
Muhammad Shuaib Qureshi ◽  
Muhammad Bilal Qureshi ◽  
Muhammad Fayaz ◽  
Wali Khan Mashwani ◽  
Samir Brahim Belhaouari ◽  
...  

An efficient resource allocation scheme plays a vital role in scheduling applications on high-performance computing resources to achieve the desired level of service. Most of the existing literature on resource allocation covers real-time services, with timing constraints as the primary parameter. Resource allocation schemes for real-time services have been designed with various architectures (static, dynamic, centralized, or distributed) and quality-of-service criteria (cost efficiency, completion time minimization, energy efficiency, and memory optimization). In this analysis, numerous resource allocation schemes for real-time services in various high-performance computing (distributed and non-distributed) domains have been studied and compared on the basis of common parameters such as application type, operational environment, optimization goal, architecture, system size, resource type, optimality, simulation tool, comparison technique, and input data. The basic aim of this study is to provide a consolidated platform for researchers working on scheduling and allocating high-performance computing resources to real-time services. This work comprehensively discusses, integrates, analyzes, and categorizes all resource allocation schemes for real-time services into five high-performance computing classes: grid, cloud, edge, fog, and multicore computing systems. The workflow representations of the studied schemes help readers understand the basic operation and architectures of these mechanisms and identify further research gaps.
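As a concrete illustration of one recurring pattern among such schemes, the hypothetical sketch below shows a deadline-aware (earliest-deadline-first) allocator that places each real-time task on whichever resource can finish it soonest; the names and the scoring rule are assumptions for illustration, not a specific scheme from the survey.

```python
# Hypothetical EDF-style allocation of real-time tasks to resources.
# All names, fields, and the acceptance rule are illustrative.
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runtime: float   # estimated execution time (s)
    deadline: float  # absolute deadline (s)


@dataclass
class Resource:
    name: str
    free_at: float = 0.0  # time at which the resource becomes idle


def allocate(tasks, resources):
    """Assign tasks in earliest-deadline-first order to the resource that
    finishes them soonest; reject tasks that would miss their deadline."""
    mapping = {}
    for task in sorted(tasks, key=lambda t: t.deadline):
        best = min(resources, key=lambda r: r.free_at + task.runtime)
        finish = best.free_at + task.runtime
        if finish <= task.deadline:
            best.free_at = finish
            mapping[task.name] = best.name
        else:
            mapping[task.name] = "rejected"
    return mapping


if __name__ == "__main__":
    tasks = [Task("t1", 3, 10), Task("t2", 5, 6), Task("t3", 2, 4)]
    resources = [Resource("r1"), Resource("r2")]
    print(allocate(tasks, resources))
```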


Author(s):  
Mohammad Samadi Gharajeh

Grid systems and cloud servers are two distributed networks that deliver computing resources (e.g., file storage) to users' services via a large and often global network of computers. Virtualization technology can enhance the efficiency of these networks by dedicating the available resources to multiple execution environments. This chapter describes applications of virtualization technology in grid systems and cloud servers. It presents different aspects of virtualized networks from systematic and pedagogical perspectives. Virtual machine abstraction virtualizes high-performance computing environments to increase service quality. In addition, the grid virtualization engine and virtual clusters are used in grid systems to run users' services efficiently in virtualized environments. The chapter also explains various virtualization technologies in cloud servers. The evaluation results analyze the performance of high-performance computing and virtualized grid systems in terms of bandwidth, latency, number of nodes, and throughput.


Author(s):  
David L Hart

TeraGrid has deployed a significant monitoring and accounting infrastructure in order to understand its operational success. In this paper, we present an analysis of the jobs reported by TeraGrid for 2008. We consider the workload from several perspectives: traditional high-performance computing (HPC) workload characteristics; grid-oriented workload characteristics; and finally user- and group-oriented characteristics. We use metrics reported in prior studies of HPC and grid systems in order to understand whether such metrics provide useful information for managing and studying resource federations. This study highlights the importance of distinguishing between analyses of job patterns and work patterns; shows that small sets of users dominate the workload in terms of both job and work patterns; and shows that aggregate analyses across even loosely coupled federations, with incomplete information for individual systems, reflect patterns seen in more tightly coupled grids and in single HPC systems.
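The distinction between job patterns and work patterns can be made concrete with a small worked example: ranking the same accounting records by job count and by core-hours can identify different dominant users. The sketch below uses invented records and field names purely for illustration.

```python
# Hypothetical sketch of the job-pattern vs. work-pattern distinction.
# Records are invented (user, cores, wallclock_hours) tuples, as one
# might extract from an accounting log.
from collections import defaultdict

jobs = [
    ("alice", 8, 1.0), ("alice", 8, 0.5), ("alice", 8, 0.5),
    ("bob", 4096, 12.0),
    ("carol", 16, 2.0), ("carol", 16, 2.0),
]

job_count = defaultdict(int)
core_hours = defaultdict(float)
for user, cores, hours in jobs:
    job_count[user] += 1
    core_hours[user] += cores * hours

print("by job count :", sorted(job_count.items(), key=lambda kv: -kv[1]))
print("by core-hours:", sorted(core_hours.items(), key=lambda kv: -kv[1]))
# alice dominates the job pattern, while bob dominates the work pattern.
```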


2013 ◽  
Vol 9 (3) ◽  
pp. 1091-1098 ◽  
Author(s):  
Sukalyan Goswami ◽  
Ajanta De Sarkar

Grid computing, or the computational grid, has become a vast research field in academia. It is a promising platform that provides resource sharing through multi-institutional virtual organizations for dynamic problem solving. Such platforms are much more cost-effective than traditional high-performance computing systems. Because it offers scalable resources, grid computing has also become popular in industry. However, a computational grid has constraints and requirements that differ from those of traditional high-performance computing systems. To fully exploit such grid systems, resource management and scheduling are key challenges, where task allocation and load balancing are a common problem for most grid systems because the load on individual grid resources is dynamic in nature. The objective of this paper is to review the existing load balancing algorithms and techniques applicable to grid computing and to propose a layered service-oriented framework for the computational grid to solve the prevailing problem of dynamic load balancing.
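To make the dynamic load-balancing problem concrete, the hypothetical sketch below dispatches each incoming task to the currently least-loaded resource, re-evaluating utilization at every decision; the names are illustrative and do not describe the layered framework proposed in the paper.

```python
# Illustrative least-loaded-first dispatching over dynamically loaded
# grid resources. All names are hypothetical.
import random


class GridResource:
    def __init__(self, name: str, capacity: float):
        self.name = name
        self.capacity = capacity
        self.load = 0.0  # changes dynamically as tasks arrive and finish

    def utilization(self) -> float:
        return self.load / self.capacity


def dispatch(task_cost: float, resources):
    """Place a task on the resource with the lowest current utilization."""
    target = min(resources, key=lambda r: r.utilization())
    target.load += task_cost
    return target


if __name__ == "__main__":
    grid = [GridResource("site-A", 100), GridResource("site-B", 60)]
    for i in range(5):
        cost = random.uniform(5, 20)
        site = dispatch(cost, grid)
        print(f"task {i} ({cost:.1f}) -> {site.name}")
```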

