Analytical and Numerical Evaluation of Co-Scheduling Strategies and Their Application

Computers, 2021, Vol 10 (10), pp. 122
Author(s): Ruslan Kuchumov, Vladimir Korkhov

Applications in high-performance computing (HPC) may not use all available computational resources, leaving some of them underutilized. By co-scheduling, i.e., running more than one application on the same computational node, it is possible to improve resource utilization and overall throughput. Some applications may have conflicting resource requirements, however, and co-scheduling them can cause performance degradation, so scheduling decisions must take this into account. In this paper, we formalize the co-scheduling problem and propose multiple scheduling strategies to solve it: an optimal strategy, an online strategy, and heuristic strategies. These strategies vary in the optimality of the solutions they produce and in the a priori information about the system they require. We show theoretically that the online strategy produces schedules whose competitive ratio has a constant upper bound. This allows us to solve the co-scheduling problem with heuristic strategies that approximate the online strategy. Numerical simulations show how the heuristic strategies compare to the optimal strategy for different input systems. We also propose a method for measuring the input parameters of the model in practice and evaluate it on HPC benchmark applications. The measurement method proves highly accurate, which allows the proposed scheduling strategies to be applied in a scheduler implementation.
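
As a concrete illustration of what a co-scheduling heuristic can look like, the Python sketch below greedily packs applications onto nodes so that the combined CPU and memory-bandwidth demand of each co-scheduled group stays within node capacity. All names and thresholds are our own assumptions for illustration, not the paper's model, which additionally accounts for the slowdown co-located applications inflict on each other.

```python
# Hypothetical greedy co-scheduling heuristic (first-fit decreasing on the
# dominant resource). Resource names and the capacity cap are assumptions.
from dataclasses import dataclass

@dataclass
class App:
    name: str
    cpu: float    # fraction of node CPU the application keeps busy
    membw: float  # fraction of node memory bandwidth it consumes

def co_schedule(apps, cap=1.0):
    """Pack apps into co-scheduled groups so that the summed utilization
    of every resource stays within the node capacity `cap`."""
    groups = []
    for app in sorted(apps, key=lambda a: max(a.cpu, a.membw), reverse=True):
        for group in groups:
            if (sum(a.cpu for a in group) + app.cpu <= cap
                    and sum(a.membw for a in group) + app.membw <= cap):
                group.append(app)
                break
        else:
            groups.append([app])  # no compatible node: open a new one
    return groups

apps = [App("cfd", 0.9, 0.2), App("graph", 0.3, 0.7),
        App("io-bound", 0.2, 0.1), App("stencil", 0.5, 0.8)]
for i, group in enumerate(co_schedule(apps)):
    print(f"node {i}: {[a.name for a in group]}")
```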


2015, Vol 12 (1), pp. 1-15
Author(s): Luis F. Castillo, Germán López-Gartner, Gustavo A. Isaza, Mariana Sánchez, Jeferson Arango, ...

The need to process large quantities of data generated by genomic sequencing poses a difficult task for life scientists who are not familiar with command-line operations or with developments in high-performance computing and parallelization. This knowledge gap, along with unfamiliarity with the necessary processes, can hinder the execution of data-processing tasks. Furthermore, many of the bioinformatics tools commonly used by the scientific community are presented as isolated, unrelated entities that offer no integrated, guided, and assisted interaction with facilities for scheduling computational resources, or with distribution, processing, and mapping with runtime analysis. This paper presents a first approximation of a Web Services platform-based architecture (GITIRBio) that acts as a distributed front-end system for autonomous and assisted processing of parallel bioinformatics pipelines; it has been validated using multiple sequences. Additionally, the platform allows integration with semantic gene repositories for annotation searches. GITIRBio is available at: http://c-head.ucaldas.edu.co:8080/gitirbio
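
For flavor, a client interaction with such a front-end might look like the Python sketch below. The route, payload fields, and returned identifier are invented for illustration; they are not GITIRBio's documented API.

```python
# Hypothetical client for a GITIRBio-style pipeline front-end. The route
# ("/api/jobs"), the payload fields, and the "job_id" response field are
# invented for illustration; they are not the platform's documented API.
import requests

BASE = "http://c-head.ucaldas.edu.co:8080/gitirbio"  # URL from the abstract

def submit_pipeline(fasta_path, pipeline="annotation"):
    with open(fasta_path, "rb") as fh:
        resp = requests.post(f"{BASE}/api/jobs",
                             files={"sequences": fh},
                             data={"pipeline": pipeline})
    resp.raise_for_status()
    return resp.json()["job_id"]
```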


2019, Vol 214, pp. 07012
Author(s): Nikita Balashov, Maxim Bashashin, Pavel Goncharov, Ruslan Kuchumov, Nikolay Kutovskiy, ...

Cloud computing has become a routine tool for scientists in many fields. The JINR cloud infrastructure provides JINR users with computational resources for various scientific calculations. To speed up the achievement of scientific results, the JINR cloud service for parallel applications has been developed. It consists of several components and implements a flexible, modular architecture that makes it possible to use both additional applications and various types of resources as computational backends. An example of using the Cloud&HybriLIT resources in scientific computing is the study of superconducting processes in stacked long Josephson junctions (LJJ). LJJ systems have been studied intensively because of the prospect of practical applications in nano-electronics and quantum computing. In this contribution we summarize the experience of applying the Cloud&HybriLIT resources to high-performance computing of physical characteristics of the LJJ system.
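
The standard continuum model behind such LJJ studies is the perturbed sine-Gordon equation; the paper treats stacked junctions, which couple several such equations. As a minimal sketch under that assumption, the following integrates a single junction with an explicit finite-difference scheme (all parameter values are illustrative).

```python
# Minimal sketch: explicit finite-difference integration of the perturbed
# sine-Gordon equation phi_tt + a*phi_t = phi_xx - sin(phi) + g, the
# standard model of a single long Josephson junction (a: damping,
# g: bias current). All parameter values are illustrative.
import numpy as np

L, N, T, dt = 40.0, 400, 100.0, 0.01
a, g = 0.1, 0.05
dx = L / N
x = np.linspace(0.0, L, N)

phi = 4.0 * np.arctan(np.exp(x - L / 2))  # single fluxon initial state
phi_old = phi.copy()                       # zero initial velocity

for _ in range(int(T / dt)):
    lap = np.zeros_like(phi)
    lap[1:-1] = (phi[2:] - 2.0 * phi[1:-1] + phi[:-2]) / dx**2
    lap[0], lap[-1] = lap[1], lap[-2]      # crude open boundaries
    # damped leapfrog step
    phi_new = (2.0 * phi - (1.0 - a * dt / 2.0) * phi_old
               + dt**2 * (lap - np.sin(phi) + g)) / (1.0 + a * dt / 2.0)
    phi_old, phi = phi, phi_new

print("mean phase after integration:", phi.mean())
```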


2015
Author(s): Felipe Maciel, Carina Oliveira, Renato Juaçaba Neto, João Alencar, Paulo Rego, ...

In this paper, we propose a novel architecture that enables the implementation of a cyber environment composed of different High Performance Computing (HPC) infrastructures (i.e., clusters, grids, and clouds). To access this cyber environment, scientific researchers do not have to become computer experts: they provide a description of the problem as input to the cyber environment and then receive their results, without being responsible for managing the computational resources. We provide a prototype of the architecture and present an evaluation based on a real workload of scientific application executions. The results show the advantages of the proposed architecture. In addition, this work provides guidelines for developing cyber environments focused on e-Science.
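
The core idea, submitting a declarative problem description and letting the environment place it on a suitable back-end, can be caricatured in a few lines of Python. The field names and the toy placement policy below are assumptions for illustration, not the paper's actual design.

```python
# Illustrative sketch of the front-end idea: the researcher writes a
# declarative problem description; a placement policy picks the back-end.
# Field names and the toy policy are assumptions, not the paper's design.
from dataclasses import dataclass

@dataclass
class ProblemDescription:
    app: str           # scientific application to run
    inputs: list       # input files or parameters
    cores: int         # requested degree of parallelism
    deadline_h: float  # acceptable turnaround time, in hours

def choose_backend(p: ProblemDescription) -> str:
    """Toy policy: small jobs stay on the local cluster, large but
    deadline-tolerant jobs go to the grid, everything else to the cloud."""
    if p.cores <= 64:
        return "cluster"
    if p.deadline_h >= 24:
        return "grid"
    return "cloud"

job = ProblemDescription("blast", ["genome.fa"], cores=128, deadline_h=4.0)
print(choose_backend(job))  # -> cloud
```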


Author(s): Zeba Khanam, Sangeet Saha, Dimitri Ognibene, Klaus McDonald-Maier, Shoaib Ehsan

2001, Vol 13 (9), pp. 1995-2003
Author(s): Allan Kardec Barros, Andrzej Cichocki

In this work we develop a very simple batch learning algorithm for semi-blind extraction of a desired source signal with temporal structure from linear mixtures. Although we use the concepts of sequential blind source extraction and independent component analysis, we carry out the extraction neither in a completely blind manner nor under the assumption that the sources are statistically independent. In fact, we show that a priori information about the autocorrelation function of the primary sources can be used to extract the desired signals (sources of interest) from their linear mixtures. Extensive computer simulations and experiments with real data confirm the validity and high performance of the proposed algorithm.
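
A compact reconstruction of the idea (not necessarily the paper's exact batch rule): after whitening the mixtures, power iteration on the symmetrized covariance at the a priori known lag of the desired source pulls out the component with the strongest autocorrelation at that lag.

```python
# Compact sketch of lag-based semi-blind extraction: whiten the mixtures,
# then run power iteration on the symmetrized lagged covariance. The lag
# tau is the a priori knowledge about the desired source.
import numpy as np

rng = np.random.default_rng(0)
n, tau = 5000, 20                                  # tau: a priori lag
s1 = np.sin(2 * np.pi * np.arange(n) / tau)        # desired, period = tau
s2 = rng.standard_normal(n)                        # interfering source
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ np.vstack([s1, s2])

X = X - X.mean(axis=1, keepdims=True)              # center
d, E = np.linalg.eigh(np.cov(X))
Z = (E / np.sqrt(d)).T @ X                         # whiten

R = Z[:, tau:] @ Z[:, :-tau].T / (n - tau)         # lagged covariance
R = (R + R.T) / 2                                  # symmetrize

w = rng.standard_normal(2)
for _ in range(100):                               # power iteration
    w = R @ w
    w /= np.linalg.norm(w)

y = w @ Z                                          # extracted signal
print("correlation with desired source:",
      round(abs(np.corrcoef(y, s1)[0, 1]), 3))
```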


2021
Author(s): Mariza Ferro, Vinicius P. Klôh, Matheus Gritz, Vitor de Sá, Bruno Schulze

Understanding, through runtime behavior, the impact of scientific applications on computational architectures should guide the use of resources in high-performance computing systems. In this work, we propose an analysis based on Machine Learning (ML) algorithms to gather knowledge about the performance of these applications from hardware events and derived performance metrics. Nine NAS benchmarks were executed and their hardware events collected. The experimental results were used to train a Neural Network, a Decision Tree Regressor, and a Linear Regression model, with the goal of predicting the runtime of scientific applications from the performance metrics.
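
A minimal scikit-learn sketch of that experimental setup follows, with synthetic stand-ins for the measured hardware events (in a real run the features would come from performance counters, e.g. instructions retired, cache misses, branch misses).

```python
# Minimal sketch of the described setup with synthetic stand-ins for the
# hardware-event features; the three model families match the abstract.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 4))                      # 4 "hardware events"
runtime = 3 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 0.1, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, runtime, random_state=0)
for model in (LinearRegression(),
              DecisionTreeRegressor(max_depth=5),
              MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "R^2 =", round(model.score(X_te, y_te), 2))
```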


Gigabyte, 2021, Vol 2021, pp. 1-10
Author(s): Ben Duggan, John Metzcar, Paul Macklin

Modern agent-based models (ABMs) and other simulation models require evaluation and testing of many different parameters. Managing that testing for large-scale parameter sweeps (grid searches), as well as storing the simulation data, requires multiple, potentially customizable steps that may vary across simulations. Furthermore, parameter testing, processing, and analysis are slowed if simulation and processing jobs cannot be shared across teammates or computational resources. Although high-performance computing (HPC) has become increasingly available, models can often be tested faster by combining multiple computers with HPC resources. To address these issues, we created the Distributed Automated Parameter Testing (DAPT) Python package. By hosting parameters in an online (and often free) “database”, multiple individuals can run parameter sets simultaneously in a distributed fashion, enabling ad hoc crowdsourcing of computational power. Combined with a flexible, scriptable tool set, this lets teams evaluate models and assess their underlying hypotheses quickly. Here, we describe DAPT and provide an example demonstrating its use.
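
The claim/run/write-back loop that DAPT automates can be sketched as follows. Note this is an illustration of the pattern, not DAPT's actual API: a real deployment would use an online database with atomic claims rather than this in-memory stand-in.

```python
# Illustration of the claim/run/write-back loop behind distributed
# parameter sweeps. NOT DAPT's actual API: a real deployment would use
# an online database with atomic claims instead of an in-memory list.
import itertools
import socket

db = [{"id": i, "diffusion": d, "decay": k, "status": "todo"}
      for i, (d, k) in enumerate(itertools.product([0.1, 1.0], [0.01, 0.1]))]

def claim_next():
    """Claim the next untested parameter set (a real backend would lock)."""
    for row in db:
        if row["status"] == "todo":
            row["status"] = "claimed:" + socket.gethostname()
            return row
    return None

def run_model(params):
    return params["diffusion"] / params["decay"]  # simulation placeholder

while (params := claim_next()) is not None:
    params["result"] = run_model(params)
    params["status"] = "done"

for row in db:
    print(row)
```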


2018, Vol 106 (4)
Author(s): Jean-Paul Courneya, Alexa Mayo

Despite having an ideal setup in their labs for wet work, researchers often lack the computational infrastructure to analyze the magnitude of data that result from “-omics” experiments. In this innovative project, the library supports the analysis of high-throughput data from global molecular profiling experiments by offering a high-performance computer with open source software, along with expert bioinformationist support. The audience for this new service is faculty, staff, and students for whom using the university’s large-scale CORE computational resources is not warranted because those resources exceed the needs of smaller projects. In the library’s approach, users are empowered to analyze high-throughput data that they otherwise could not on their own computers. To develop the project, the library’s bioinformationist identified the ideal computing hardware and a group of open source bioinformatics software packages to provide analysis options for experimental data such as scientific images, sequence reads, and flow cytometry files. To close the loop between learning and practice, the bioinformationist developed self-guided learning materials and offers workshops and consultations on topics such as the National Center for Biotechnology Information’s BLAST, Bioinformatics on the Cloud, and ImageJ. Researchers then apply the data analysis techniques they learned in the classroom in an ideal computing environment.

