PanDA and RADICAL-Pilot Integration: Enabling the Pilot Paradigm on HPC Resources

2019 ◽  
Vol 214 ◽  
pp. 03057
Author(s):  
Andre Merzky ◽  
Pavlo Svirin ◽  
Matteo Turilli

PanDA executes millions of ATLAS jobs a month on Grid systems with more than 300,000 cores. Currently, PanDA is compatible with only a few high-performance computing (HPC) resources due to differing edge services and operational policies; it does not implement the pilot paradigm on HPC; and it does not dynamically optimize resource allocation among queues. We integrated the PanDA Harvester service and the RADICAL-Pilot (RP) system to overcome these limitations and enable the execution of ATLAS, Molecular Dynamics, and other workloads on HPC resources. This paper offers two main contributions: (1) introducing PanDA Harvester and RADICAL-Pilot, two systems independently developed to support high-throughput computing (HTC) on high-performance computing (HPC) infrastructures; and (2) describing the integration of these two systems to produce a middleware component with unique functionalities, including the concurrent execution of heterogeneous workloads on the Titan OLCF machine. We integrated Harvester and RP by prototyping a Next Generation Executor (NGE) to expose RP capabilities and manage the execution of PanDA workloads. In this way, we minimized the reengineering of the two systems, allowing their integration while both remain in production.
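The core idea behind the pilot paradigm referenced here is to acquire one large batch allocation and then schedule many independent payloads into it, avoiding a separate batch-queue wait per task. The sketch below illustrates that pattern only; the class and method names are hypothetical and do not correspond to the actual Harvester, NGE, or RADICAL-Pilot APIs.

```python
# Illustrative only: a toy "pilot" that holds a fixed pool of cores and
# schedules many small tasks into it, instead of submitting each task to
# the batch system individually. Names do not correspond to the actual
# Harvester, NGE, or RADICAL-Pilot interfaces.
import subprocess
from concurrent.futures import ThreadPoolExecutor


class ToyPilot:
    """Stand-in for a batch allocation (e.g. one Titan job) that executes
    many independent payloads within its lifetime."""

    def __init__(self, cores: int):
        self.pool = ThreadPoolExecutor(max_workers=cores)

    def submit(self, cmd):
        # Each payload runs inside the already-acquired allocation,
        # so no further batch-queue wait is incurred per task.
        return self.pool.submit(subprocess.run, cmd, capture_output=True)

    def shutdown(self):
        self.pool.shutdown(wait=True)


if __name__ == "__main__":
    pilot = ToyPilot(cores=4)
    futures = [pilot.submit(["echo", f"payload {i}"]) for i in range(16)]
    for f in futures:
        print(f.result().stdout.decode().strip())
    pilot.shutdown()
```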

2020 ◽  
Vol 16 (8) ◽  
pp. 155014772093275 ◽  
Author(s):  
Muhammad Shuaib Qureshi ◽  
Muhammad Bilal Qureshi ◽  
Muhammad Fayaz ◽  
Wali Khan Mashwani ◽  
Samir Brahim Belhaouari ◽  
...  

An efficient resource allocation scheme plays a vital role in scheduling applications on high-performance computing resources to achieve the desired level of service. Most of the existing literature on resource allocation covers real-time services, with timing constraints as the primary parameter. Resource allocation schemes for real-time services have been designed with various architectures (static, dynamic, centralized, or distributed) and quality-of-service criteria (cost efficiency, completion time minimization, energy efficiency, and memory optimization). In this analysis, numerous resource allocation schemes for real-time services in various high-performance computing (distributed and non-distributed) domains have been studied and compared on the basis of common parameters such as application type, operational environment, optimization goal, architecture, system size, resource type, optimality, simulation tool, comparison technique, and input data. The basic aim of this study is to provide a consolidated platform for researchers working on scheduling and allocating high-performance computing resources to real-time services. This work comprehensively discusses, integrates, analyzes, and categorizes all resource allocation schemes for real-time services into five high-performance computing classes: grid, cloud, edge, fog, and multicore computing systems. The workflow representations of the studied schemes help readers understand the basic operation and architectures of these mechanisms and identify further research gaps.
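As a concrete illustration of one recurring pattern among such schemes, the hypothetical sketch below shows a deadline-aware (earliest-deadline-first) allocator that places each real-time task on whichever resource can finish it soonest; the names and the scoring rule are assumptions for illustration, not a specific scheme from the survey.

```python
# Hypothetical EDF-style allocation of real-time tasks to resources.
# All names, fields, and the acceptance rule are illustrative.
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    runtime: float   # estimated execution time (s)
    deadline: float  # absolute deadline (s)


@dataclass
class Resource:
    name: str
    free_at: float = 0.0  # time at which the resource becomes idle


def allocate(tasks, resources):
    """Assign tasks in earliest-deadline-first order to the resource that
    finishes them soonest; reject tasks that would miss their deadline."""
    mapping = {}
    for task in sorted(tasks, key=lambda t: t.deadline):
        best = min(resources, key=lambda r: r.free_at + task.runtime)
        finish = best.free_at + task.runtime
        if finish <= task.deadline:
            best.free_at = finish
            mapping[task.name] = best.name
        else:
            mapping[task.name] = "rejected"
    return mapping


if __name__ == "__main__":
    tasks = [Task("t1", 3, 10), Task("t2", 5, 6), Task("t3", 2, 4)]
    resources = [Resource("r1"), Resource("r2")]
    print(allocate(tasks, resources))
```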


Author(s):  
Mohammad Samadi Gharajeh

Grid systems and cloud servers are two distributed networks that deliver computing resources (e.g., file storage) to users' services via a large and often global network of computers. Virtualization technology can enhance the efficiency of these networks by dedicating the available resources to multiple execution environments. This chapter describes applications of virtualization technology in grid systems and cloud servers. It presents different aspects of virtualized networks from systematic and pedagogical perspectives. Virtual machine abstraction virtualizes high-performance computing environments to increase service quality. In addition, the grid virtualization engine and virtual clusters are used in grid systems to run users' services efficiently in virtualized environments. The chapter also explains various virtualization technologies in cloud servers. The evaluation results analyze the performance of high-performance computing and virtualized grid systems in terms of bandwidth, latency, number of nodes, and throughput.


Author(s):  
David L Hart

TeraGrid has deployed a significant monitoring and accounting infrastructure in order to understand its operational success. In this paper, we present an analysis of the jobs reported by TeraGrid for 2008. We consider the workload from several perspectives: traditional high-performance computing (HPC) workload characteristics; grid-oriented workload characteristics; and finally user- and group-oriented characteristics. We use metrics reported in prior studies of HPC and grid systems in order to understand whether such metrics provide useful information for managing and studying resource federations. This study highlights the importance of distinguishing between analyses of job patterns and work patterns; shows that small sets of users dominate the workload in terms of both job and work patterns; and shows that aggregate analyses across even loosely coupled federations, with incomplete information for individual systems, reflect patterns seen in more tightly coupled grids and in single HPC systems.
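The distinction between job patterns and work patterns can be made concrete with a small worked example: ranking the same accounting records by job count and by core-hours can identify different dominant users. The sketch below uses invented records and field names purely for illustration.

```python
# Hypothetical sketch of the job-pattern vs. work-pattern distinction.
# Records are invented (user, cores, wallclock_hours) tuples, as one
# might extract from an accounting log.
from collections import defaultdict

jobs = [
    ("alice", 8, 1.0), ("alice", 8, 0.5), ("alice", 8, 0.5),
    ("bob", 4096, 12.0),
    ("carol", 16, 2.0), ("carol", 16, 2.0),
]

job_count = defaultdict(int)
core_hours = defaultdict(float)
for user, cores, hours in jobs:
    job_count[user] += 1
    core_hours[user] += cores * hours

print("by job count :", sorted(job_count.items(), key=lambda kv: -kv[1]))
print("by core-hours:", sorted(core_hours.items(), key=lambda kv: -kv[1]))
# alice dominates the job pattern, while bob dominates the work pattern.
```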


2013 ◽  
Vol 9 (3) ◽  
pp. 1091-1098 ◽  
Author(s):  
Sukalyan Goswami ◽  
Ajanta De Sarkar

Grid computing, or the computational grid, has become a vast research field in academia. It is a promising platform that provides resource sharing through multi-institutional virtual organizations for dynamic problem solving. Such platforms are much more cost-effective than traditional high-performance computing systems. Because it offers scalable resources, grid computing has also become popular in industry. However, a computational grid has constraints and requirements that differ from those of traditional high-performance computing systems. To fully exploit such grid systems, resource management and scheduling are key challenges, where task allocation and load balancing are a common problem for most grid systems because the load on individual grid resources is dynamic in nature. The objective of this paper is to review the existing load balancing algorithms and techniques applicable to grid computing and to propose a layered service-oriented framework for the computational grid to solve the prevailing problem of dynamic load balancing.
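To make the dynamic load-balancing problem concrete, the hypothetical sketch below dispatches each incoming task to the currently least-loaded resource, re-evaluating utilization at every decision; the names are illustrative and do not describe the layered framework proposed in the paper.

```python
# Illustrative least-loaded-first dispatching over dynamically loaded
# grid resources. All names are hypothetical.
import random


class GridResource:
    def __init__(self, name: str, capacity: float):
        self.name = name
        self.capacity = capacity
        self.load = 0.0  # changes dynamically as tasks arrive and finish

    def utilization(self) -> float:
        return self.load / self.capacity


def dispatch(task_cost: float, resources):
    """Place a task on the resource with the lowest current utilization."""
    target = min(resources, key=lambda r: r.utilization())
    target.load += task_cost
    return target


if __name__ == "__main__":
    grid = [GridResource("site-A", 100), GridResource("site-B", 60)]
    for i in range(5):
        cost = random.uniform(5, 20)
        site = dispatch(cost, grid)
        print(f"task {i} ({cost:.1f}) -> {site.name}")
```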

