Raythena: a vertically integrated scheduler for ATLAS applications on heterogeneous distributed resources

The ATLAS experiment has successfully integrated HighPerformance Computing resources (HPCs) in its production system. Unlike the current generation of HPC systems, and the LHC computing grid, the next generation of supercomputers is expected to be extremely heterogeneous in nature: different systems will have radically different architectures, and most of them will provide partitions optimized for different kinds of workloads. In this work we explore the applicability of concepts and tools realized in Ray (the high-performance distributed execution framework targeting large-scale machine learning applications) to ATLAS event throughput optimization on heterogeneous distributed resources, ranging from traditional grid clusters to Exascale computers. We present a prototype of Raythena, a Ray-based implementation of the ATLAS Event Service (AES), a fine-grained event processing workflow aimed at improving the efficiency of ATLAS workflows on opportunistic resources, specifically HPCs. The AES is implemented as an event processing task farm that distributes packets of events to several worker processes running on multiple nodes. Each worker in the task farm runs an event-processing application (Athena) as a daemon. The whole system is orchestrated by Ray, which assigns work in a distributed, possibly heterogeneous, environment. For all its flexibility, the AES implementation is currently comprised of multiple separate layers that communicate through ad-hoc command-line and filebased interfaces. The goal of Raythena is to integrate these layers through a feature-rich, efficient application framework. Besides increasing usability and robustness, a vertically integrated scheduler will enable us to explore advanced concepts such as dynamically shaping of workflows to exploit currently available resources, particularly on heterogeneous systems.

Download Full-text

Enabling low latency at large-scale data center and high-performance computing interconnect networks using fine-grained all-optical switching technology

2017 International Conference on Optical Network Design and Modeling (ONDM) ◽

10.23919/ondm.2017.7958532 ◽

2017 ◽

Cited By ~ 3

Author(s):

Nan Hua ◽

Zhizhen Zhong ◽

Xiaoping Zheng

Keyword(s):

Data Center ◽

High Performance ◽

Optical Switching ◽

Large Scale ◽

Fine Grained ◽

Large Scale Data ◽

All Optical ◽

Performance Computing ◽

All Optical Switching ◽

Scale Data

Download Full-text

DAPT: A package enabling distributed automated parameter testing

Gigabyte ◽

10.46471/gigabyte.22 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Ben Duggan ◽

John Metzcar ◽

Paul Macklin

Keyword(s):

High Performance ◽

Large Scale ◽

Ad Hoc ◽

Simulation Models ◽

Power Combining ◽

Agent Based ◽

Tool Set ◽

Computational Resources ◽

Performance Computing ◽

Python Package

Modern agent-based models (ABM) and other simulation models require evaluation and testing of many different parameters. Managing that testing for large scale parameter sweeps (grid searches), as well as storing simulation data, requires multiple, potentially customizable steps that may vary across simulations. Furthermore, parameter testing, processing, and analysis are slowed if simulation and processing jobs cannot be shared across teammates or computational resources. While high-performance computing (HPC) has become increasingly available, models can often be tested faster with the use of multiple computers and HPC resources. To address these issues, we created the Distributed Automated Parameter Testing (DAPT) Python package. By hosting parameters in an online (and often free) “database”, multiple individuals can run parameter sets simultaneously in a distributed fashion, enabling ad hoc crowdsourcing of computational power. Combining this with a flexible, scriptable tool set, teams can evaluate models and assess their underlying hypotheses quickly. Here, we describe DAPT and provide an example demonstrating its use.

Download Full-text

Analytical Performance Estimation for Large-Scale Reconfigurable Dataflow Platforms

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3452742 ◽

2021 ◽

Vol 14 (3) ◽

pp. 1-21

Author(s):

Ryota Yasudo ◽

José G. F. Coutinho ◽

Ana-Lucia Varbanescu ◽

Wayne Luk ◽

Hideharu Amano ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Heterogeneous Systems ◽

Performance Estimation ◽

Performance Impact ◽

Accurate Performance ◽

Computing Platforms ◽

Reduced Power Consumption ◽

Performance Computing

Next-generation high-performance computing platforms will handle extreme data- and compute-intensive problems that are intractable with today’s technology. A promising path in achieving the next leap in high-performance computing is to embrace heterogeneity and specialised computing in the form of reconfigurable accelerators such as FPGAs, which have been shown to speed up compute-intensive tasks with reduced power consumption. However, assessing the feasibility of large-scale heterogeneous systems requires fast and accurate performance prediction. This article proposes Performance Estimation for Reconfigurable Kernels and Systems (PERKS), a novel performance estimation framework for reconfigurable dataflow platforms. PERKS makes use of an analytical model with machine and application parameters for predicting the performance of multi-accelerator systems and detecting their bottlenecks. Model calibration is automatic, making the model flexible and usable for different machine configurations and applications, including hypothetical ones. Our experimental results show that PERKS can predict the performance of current workloads on reconfigurable dataflow platforms with an accuracy above 91%. The results also illustrate how the modelling scales to large workloads, and how performance impact of architectural features can be estimated in seconds.

Download Full-text

A MapReduce Clone Car Identification Model over Traffic Data Stream

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.346.117 ◽

2013 ◽

Vol 346 ◽

pp. 117-122

Author(s):

Wen Chuan Yang ◽

Guang Jie Lin ◽

Jiang Yong Wang

Keyword(s):

High Performance ◽

Large Scale ◽

Ad Hoc ◽

Image Data ◽

Traffic Information ◽

Identification System ◽

Traffic Data ◽

Traffic Surveillance ◽

Fine Grain ◽

Large Scale Dataset

Accompany the widely use of Intelligent Traffic in China, all traffic input data streams to the Traffic Surveillance Center (TSC). Some metropolitan TSC, such as in Beijing, produces up to 18 million records and 1T image data arriving every hour. Normally, the job of the TSC is to monitor and retain data. There is a tendency to put more capability into the TSC, such as ad-hoc query for clone car identification and feedback abnormal traffic information. Thus we definitely need to think about what can be kept in working storage and how to analysis it. Obviously, the ordinary database cannot handle the massive dataset and complex ad-hoc query. MapReduce is a popular and widely used fine grain parallel runtime, which is developed for high performance processing of large scale dataset. In this paper, we propose CarMR, a MapReduce Clone Car Identification system based on Hive/Hadoop frameworks. A distributed file system HDFS is used in CarMR for fast data sharing and query. CarMR supports fast locating clone car and also optimizes the route to catch fugitive. Our results show that the model achieves a higher efficiency.

Download Full-text

Novel submission modes for tightly coupled jobs across distributed resources for reduced time-to-solution

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2009.0054 ◽

2009 ◽

Vol 367 (1897) ◽

pp. 2545-2556 ◽

Cited By ~ 2

Author(s):

Promita Chakraborty ◽

Shantenu Jha ◽

Daniel S. Katz

Keyword(s):

High Performance ◽

Large Scale ◽

Wait Time ◽

Peak Performance ◽

Distributed Resources ◽

Vast Number ◽

Multiple Input ◽

Tightly Coupled ◽

And Performance ◽

Multiple Machines

The problems of scheduling a single parallel job across a large-scale distributed system are well known and surprisingly difficult to solve. In addition, because of the issues involved in distributed submission, such as co-reserving resources, and managing accounts and certificates simultaneously on multiple machines, etc., the vast number of high-performance computing (HPC) application users have been happy to remain restricted to submitting jobs to single machines. Meanwhile, the need to simulate larger and more complex physical systems continues to grow, with a concomitant increase in the number of cores required to solve the resulting scientific problems. One might reduce the demand on load per machine, and eventually the wait-time in queue, by decomposing the problem to use two resources in such circumstances, even though there might be a reduction in the peak performance. This motivates a question. Can otherwise monolithic jobs running on single resources be distributed over more than one machine such that there is an overall reduction in the time-to-solution? In this paper, we briefly discuss the development and performance of a parallel molecular dynamics code and its generalization to work on multiple distributed machines (using MPICH-G2). We benchmark and validate the performance of our simulations over multiple input datasets of varying sizes. The primary aim of this work, however, is to show that the time-to-solution can be reduced by sacrificing some peak performance and distributing over multiple machines.

Download Full-text

Research of a MapReduce Model to Process the Traffic Big Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.548-549.1853 ◽

2014 ◽

Vol 548-549 ◽

pp. 1853-1856 ◽

Cited By ~ 1

Author(s):

Wen Chuan Yang ◽

He Chen ◽

Qing Yi Qu

Keyword(s):

High Performance ◽

Large Scale ◽

Ad Hoc ◽

Processing System ◽

Traffic Information ◽

Traffic Data ◽

Fine Grain ◽

Large Scale Dataset ◽

Data Processing Center ◽

Mapreduce Model

Normally, the job of the Traffic Data Processing Center (TDPC) is to monitor and retain data. There is a tendency to put more capability into the TDPC, such as ad-hoc query for speeding car identification and feedback abnormal traffic information. Thus we definitely need to think about what can be kept in working storage and how to analysis it. Obviously, the ordinary database cannot handle the massive dataset and complex ad-hoc query. MapReduce is a popular and widely used fine grain parallel runtime, which is developed for high performance processing of large scale dataset. In this paper, we propose MRTP, a MapReduce Traffic Processing system based on Hive/Hadoop frameworks. A distributed file system HDFS is used in MRTP for fast data sharing and query. MRTP supports fast locating speeding car and also optimizes the route to catch fugitive. Our results show that the model achieves a higher efficiency.

Download Full-text

DYNAMIC JOB SCHEDULING IN GRID COMPUTING

International Journal of Computer and Communication Technology ◽

10.47893/ijcct.2016.1364 ◽

2016 ◽

pp. 186-189

Author(s):

JANI KUNTESH KETAN ◽

ARPITA SHAH

Keyword(s):

Grid Computing ◽

High Performance ◽

Large Scale ◽

Heterogeneous Computing ◽

Ant Colony Algorithm ◽

Job Scheduling ◽

Heuristic Algorithms ◽

Optimal Schedule ◽

Heterogeneous Systems ◽

Sequential Method

Grid computing is growing rapidly in the distributed heterogeneous systems for utilizing and sharing large-scale resources to solve complex scientific problems. Scheduling is the most recent topic used to achieve high performance in grid environments. It aims to find a suitable allocation of resources for each job. A typical problem which arises during this task is the decision of scheduling. It is about an effective utilization of processor to minimize tardiness time of a job, when it is being scheduled. Scheduling jobs to resources in grid computing is complicated due to the distributed and heterogeneous nature of the resources. The efficient scheduling of independent jobs in a heterogeneous computing environment is an important problem in domains such as grid computing. In general, finding optimal schedule for such an environment using the traditional sequential method is an NP-hard problem whereas heuristic approaches will provide near optimal solutions for complex problems. The Ant colony algorithm, which is one of the heuristic algorithms, suits well for the grid scheduling environment using stigmeric communication.

Download Full-text

Hierarchical semantic interaction-based deep hashing network for cross-modal retrieval

PeerJ Computer Science ◽

10.7717/peerj-cs.552 ◽

2021 ◽

Vol 7 ◽

pp. e552

Author(s):

Shubai Chen ◽

Song Wu ◽

Li Wang

Keyword(s):

High Performance ◽

Large Scale ◽

High Efficiency ◽

Specific Information ◽

Linear Interaction ◽

Fine Grained ◽

Semantic Correlation ◽

Deep Hashing ◽

Public Datasets ◽

Semantic Interaction

Due to the high efficiency of hashing technology and the high abstraction of deep networks, deep hashing has achieved appealing effectiveness and efficiency for large-scale cross-modal retrieval. However, how to efficiently measure the similarity of fine-grained multi-labels for multi-modal data and thoroughly explore the intermediate layers specific information of networks are still two challenges for high-performance cross-modal hashing retrieval. Thus, in this paper, we propose a novel Hierarchical Semantic Interaction-based Deep Hashing Network (HSIDHN) for large-scale cross-modal retrieval. In the proposed HSIDHN, the multi-scale and fusion operations are first applied to each layer of the network. A Bidirectional Bi-linear Interaction (BBI) policy is then designed to achieve the hierarchical semantic interaction among different layers, such that the capability of hash representations can be enhanced. Moreover, a dual-similarity measurement (“hard” similarity and “soft” similarity) is designed to calculate the semantic similarity of different modality data, aiming to better preserve the semantic correlation of multi-labels. Extensive experiment results on two large-scale public datasets have shown that the performance of our HSIDHN is competitive to state-of-the-art deep cross-modal hashing methods.

Download Full-text

Enabling Fair Pricing on High Performance Computer Systems with Node Sharing

Scientific Programming ◽

10.1155/2014/906454 ◽

2014 ◽

Vol 22 (2) ◽

pp. 59-74 ◽

Cited By ~ 4

Author(s):

Alex D. Breslow ◽

Ananta Tiwari ◽

Martin Schulz ◽

Laura Carrington ◽

Lingjia Tang ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Mean Absolute Error ◽

Absolute Error ◽

Interference Detection ◽

Fair Pricing ◽

Fine Grained ◽

High Performance Computer ◽

Shutter Mechanism ◽

Application Execution

Co-location, where multiple jobs share compute nodes in large-scale HPC systems, has been shown to increase aggregate throughput and energy efficiency by 10–20%. However, system operators disallow co-location due to fair-pricing concerns, i.e., a pricing mechanism that considers performance interference from co-running jobs. In the current pricing model, application execution time determines the price, which results in unfair prices paid by the minority of users whose jobs suffer from co-location. This paper presents POPPA, a runtime system that enables fair pricing by delivering precise online interference detection and facilitates the adoption of supercomputers with co-locations. POPPA leverages a novel shutter mechanism – a cyclic, fine-grained interference sampling mechanism to accurately deduce the interference between co-runners – to provide unbiased pricing of jobs that share nodes. POPPA is able to quantify inter-application interference within 4% mean absolute error on a variety of co-located benchmark and real scientific workloads.

Download Full-text

Architecture of an Event Processing Application for Monitoring Cardiac Patient Wait Times

International Journal of Information Technology and Web Engineering ◽

10.4018/jitwe.2012010101 ◽

2012 ◽

Vol 7 (1) ◽

pp. 1-16 ◽

Cited By ~ 3

Author(s):

Aladdin Baarah ◽

Alain Mouttham ◽

Liam Peyton

Keyword(s):

Real Time ◽

Business Process ◽

Business Processes ◽

Current Practice ◽

Cardiac Patient ◽

Wait Times ◽

Event Processing ◽

Processing Application ◽

Fine Grained

Presented is an architecture for event processing applications that manage business processes, and the authors use a case study of monitoring cardiac patient wait times to evaluate their architecture and illustrate our approach. Event processing applications can collect streams of events from sensors for processing to infer critical medical events in real time. However, to manage business processes, it is critical to understand not only where in the hospital those events occur, but also where in the business process those events are occurring. Metrics, such as wait times, can be computed in real-time by using complex event processing to integrate and aggregate events in support of fine grained monitoring of business processes. The authors evaluate their architecture against both current practice and related works in the literature.

Download Full-text