Analytical and Numerical Evaluation of Co-Scheduling Strategies and Their Application

Computers, 2021, Vol 10 (10), pp. 122
Author(s): Ruslan Kuchumov, Vladimir Korkhov

Applications in high-performance computing (HPC) may not use all available computational resources, leaving some of them underutilized. By co-scheduling, i.e., running more than one application on the same computational node, it is possible to improve resource utilization and overall throughput. Some applications may have conflicting resource requirements, however, and co-scheduling them can cause performance degradation, so scheduling decisions must take this into account. In this paper, we formalize the co-scheduling problem and propose multiple scheduling strategies to solve it: an optimal strategy, an online strategy, and heuristic strategies. These strategies vary in the optimality of the solutions they produce and in the a priori information about the system they require. We show theoretically that the online strategy produces schedules whose competitive ratio has a constant upper bound. This allows us to solve the co-scheduling problem with heuristic strategies that approximate the online strategy. Numerical simulations show how the heuristic strategies compare to the optimal strategy for different input systems. We also propose a method for measuring the input parameters of the model in practice and evaluate it on HPC benchmark applications. The measurement method proves highly accurate, which allows the proposed scheduling strategies to be applied in a scheduler implementation.
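
As a concrete illustration of what a co-scheduling heuristic can look like, the Python sketch below greedily packs applications onto nodes so that the combined CPU and memory-bandwidth demand of each co-scheduled group stays within node capacity. All names and thresholds are our own assumptions for illustration, not the paper's model, which additionally accounts for the slowdown co-located applications inflict on each other.

```python
# Hypothetical greedy co-scheduling heuristic (first-fit decreasing on the
# dominant resource). Resource names and the capacity cap are assumptions.
from dataclasses import dataclass

@dataclass
class App:
    name: str
    cpu: float    # fraction of node CPU the application keeps busy
    membw: float  # fraction of node memory bandwidth it consumes

def co_schedule(apps, cap=1.0):
    """Pack apps into co-scheduled groups so that the summed utilization
    of every resource stays within the node capacity `cap`."""
    groups = []
    for app in sorted(apps, key=lambda a: max(a.cpu, a.membw), reverse=True):
        for group in groups:
            if (sum(a.cpu for a in group) + app.cpu <= cap
                    and sum(a.membw for a in group) + app.membw <= cap):
                group.append(app)
                break
        else:
            groups.append([app])  # no compatible node: open a new one
    return groups

apps = [App("cfd", 0.9, 0.2), App("graph", 0.3, 0.7),
        App("io-bound", 0.2, 0.1), App("stencil", 0.5, 0.8)]
for i, group in enumerate(co_schedule(apps)):
    print(f"node {i}: {[a.name for a in group]}")
```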


2015, Vol 12 (1), pp. 1-15
Author(s): Luis F. Castillo, Germán López-Gartner, Gustavo A. Isaza, Mariana Sánchez, Jeferson Arango, ...

The need to process large quantities of data generated by genomic sequencing poses a difficult task for life scientists who are not familiar with command-line operations or with developments in high-performance computing and parallelization. This knowledge gap, along with unfamiliarity with the necessary processes, can hinder the execution of data-processing tasks. Furthermore, many of the bioinformatics tools commonly used by the scientific community are presented as isolated, unrelated entities that offer no integrated, guided, and assisted interaction with facilities for scheduling computational resources, or with distribution, processing, and mapping with runtime analysis. This paper presents a first approximation of a Web Services platform-based architecture (GITIRBio) that acts as a distributed front-end system for autonomous and assisted processing of parallel bioinformatics pipelines; it has been validated using multiple sequences. Additionally, the platform allows integration with semantic gene repositories for annotation searches. GITIRBio is available at: http://c-head.ucaldas.edu.co:8080/gitirbio
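
For flavor, a client interaction with such a front-end might look like the Python sketch below. The route, payload fields, and returned identifier are invented for illustration; they are not GITIRBio's documented API.

```python
# Hypothetical client for a GITIRBio-style pipeline front-end. The route
# ("/api/jobs"), the payload fields, and the "job_id" response field are
# invented for illustration; they are not the platform's documented API.
import requests

BASE = "http://c-head.ucaldas.edu.co:8080/gitirbio"  # URL from the abstract

def submit_pipeline(fasta_path, pipeline="annotation"):
    with open(fasta_path, "rb") as fh:
        resp = requests.post(f"{BASE}/api/jobs",
                             files={"sequences": fh},
                             data={"pipeline": pipeline})
    resp.raise_for_status()
    return resp.json()["job_id"]
```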


2019, Vol 214, pp. 07012
Author(s): Nikita Balashov, Maxim Bashashin, Pavel Goncharov, Ruslan Kuchumov, Nikolay Kutovskiy, ...

Cloud computing has become a routine tool for scientists in many fields. The JINR cloud infrastructure provides JINR users with computational resources for various scientific calculations. To speed up the achievement of scientific results, the JINR cloud service for parallel applications has been developed. It consists of several components and implements a flexible, modular architecture that makes it possible to use both additional applications and various types of resources as computational backends. An example of using the Cloud&HybriLIT resources in scientific computing is the study of superconducting processes in stacked long Josephson junctions (LJJ). LJJ systems have been studied intensively because of the prospect of practical applications in nano-electronics and quantum computing. In this contribution we summarize the experience of applying the Cloud&HybriLIT resources to high-performance computing of physical characteristics of the LJJ system.
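
The standard continuum model behind such LJJ studies is the perturbed sine-Gordon equation; the paper treats stacked junctions, which couple several such equations. As a minimal sketch under that assumption, the following integrates a single junction with an explicit finite-difference scheme (all parameter values are illustrative).

```python
# Minimal sketch: explicit finite-difference integration of the perturbed
# sine-Gordon equation phi_tt + a*phi_t = phi_xx - sin(phi) + g, the
# standard model of a single long Josephson junction (a: damping,
# g: bias current). All parameter values are illustrative.
import numpy as np

L, N, T, dt = 40.0, 400, 100.0, 0.01
a, g = 0.1, 0.05
dx = L / N
x = np.linspace(0.0, L, N)

phi = 4.0 * np.arctan(np.exp(x - L / 2))  # single fluxon initial state
phi_old = phi.copy()                       # zero initial velocity

for _ in range(int(T / dt)):
    lap = np.zeros_like(phi)
    lap[1:-1] = (phi[2:] - 2.0 * phi[1:-1] + phi[:-2]) / dx**2
    lap[0], lap[-1] = lap[1], lap[-2]      # crude open boundaries
    # damped leapfrog step
    phi_new = (2.0 * phi - (1.0 - a * dt / 2.0) * phi_old
               + dt**2 * (lap - np.sin(phi) + g)) / (1.0 + a * dt / 2.0)
    phi_old, phi = phi, phi_new

print("mean phase after integration:", phi.mean())
```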


2015
Author(s): Felipe Maciel, Carina Oliveira, Renato Juaçaba Neto, João Alencar, Paulo Rego, ...

In this paper, we propose a novel architecture that enables the implementation of a cyber environment composed of different High Performance Computing (HPC) infrastructures (i.e., clusters, grids, and clouds). To access this cyber environment, scientific researchers do not have to become computer experts: they provide a description of the problem as input to the cyber environment and then receive their results, without being responsible for managing the computational resources. We provide a prototype of the architecture and present an evaluation based on a real workload of scientific application executions. The results show the advantages of the proposed architecture. In addition, this work provides guidelines for developing cyber environments focused on e-Science.
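
The core idea, submitting a declarative problem description and letting the environment place it on a suitable back-end, can be caricatured in a few lines of Python. The field names and the toy placement policy below are assumptions for illustration, not the paper's actual design.

```python
# Illustrative sketch of the front-end idea: the researcher writes a
# declarative problem description; a placement policy picks the back-end.
# Field names and the toy policy are assumptions, not the paper's design.
from dataclasses import dataclass

@dataclass
class ProblemDescription:
    app: str           # scientific application to run
    inputs: list       # input files or parameters
    cores: int         # requested degree of parallelism
    deadline_h: float  # acceptable turnaround time, in hours

def choose_backend(p: ProblemDescription) -> str:
    """Toy policy: small jobs stay on the local cluster, large but
    deadline-tolerant jobs go to the grid, everything else to the cloud."""
    if p.cores <= 64:
        return "cluster"
    if p.deadline_h >= 24:
        return "grid"
    return "cloud"

job = ProblemDescription("blast", ["genome.fa"], cores=128, deadline_h=4.0)
print(choose_backend(job))  # -> cloud
```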


Author(s): Zeba Khanam, Sangeet Saha, Dimitri Ognibene, Klaus McDonald-Maier, Shoaib Ehsan

2001, Vol 13 (9), pp. 1995-2003
Author(s): Allan Kardec Barros, Andrzej Cichocki

In this work we develop a very simple batch learning algorithm for semi-blind extraction of a desired source signal with temporal structure from linear mixtures. Although we use the concepts of sequential blind source extraction and independent component analysis, we carry out the extraction neither in a completely blind manner nor under the assumption that the sources are statistically independent. In fact, we show that a priori information about the autocorrelation function of the primary sources can be used to extract the desired signals (sources of interest) from their linear mixtures. Extensive computer simulations and experiments with real data confirm the validity and high performance of the proposed algorithm.
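
A compact reconstruction of the idea (not necessarily the paper's exact batch rule): after whitening the mixtures, power iteration on the symmetrized covariance at the a priori known lag of the desired source pulls out the component with the strongest autocorrelation at that lag.

```python
# Compact sketch of lag-based semi-blind extraction: whiten the mixtures,
# then run power iteration on the symmetrized lagged covariance. The lag
# tau is the a priori knowledge about the desired source.
import numpy as np

rng = np.random.default_rng(0)
n, tau = 5000, 20                                  # tau: a priori lag
s1 = np.sin(2 * np.pi * np.arange(n) / tau)        # desired, period = tau
s2 = rng.standard_normal(n)                        # interfering source
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ np.vstack([s1, s2])

X = X - X.mean(axis=1, keepdims=True)              # center
d, E = np.linalg.eigh(np.cov(X))
Z = (E / np.sqrt(d)).T @ X                         # whiten

R = Z[:, tau:] @ Z[:, :-tau].T / (n - tau)         # lagged covariance
R = (R + R.T) / 2                                  # symmetrize

w = rng.standard_normal(2)
for _ in range(100):                               # power iteration
    w = R @ w
    w /= np.linalg.norm(w)

y = w @ Z                                          # extracted signal
print("correlation with desired source:",
      round(abs(np.corrcoef(y, s1)[0, 1]), 3))
```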


2021
Author(s): Mariza Ferro, Vinicius P. Klôh, Matheus Gritz, Vitor de Sá, Bruno Schulze

Understanding, through runtime behavior, the impact of scientific applications on computational architectures should guide the use of resources in high-performance computing systems. In this work, we propose an analysis based on Machine Learning (ML) algorithms to gather knowledge about the performance of these applications from hardware events and derived performance metrics. Nine NAS benchmarks were executed and their hardware events collected. The experimental results were used to train a Neural Network, a Decision Tree Regressor, and a Linear Regression model, with the goal of predicting the runtime of scientific applications from the performance metrics.
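
A minimal scikit-learn sketch of that experimental setup follows, with synthetic stand-ins for the measured hardware events (in a real run the features would come from performance counters, e.g. instructions retired, cache misses, branch misses).

```python
# Minimal sketch of the described setup with synthetic stand-ins for the
# hardware-event features; the three model families match the abstract.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 4))                      # 4 "hardware events"
runtime = 3 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 0.1, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, runtime, random_state=0)
for model in (LinearRegression(),
              DecisionTreeRegressor(max_depth=5),
              MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "R^2 =", round(model.score(X_te, y_te), 2))
```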


Gigabyte, 2021, Vol 2021, pp. 1-10
Author(s): Ben Duggan, John Metzcar, Paul Macklin

Modern agent-based models (ABMs) and other simulation models require evaluation and testing of many different parameters. Managing that testing for large-scale parameter sweeps (grid searches), as well as storing the simulation data, requires multiple, potentially customizable steps that may vary across simulations. Furthermore, parameter testing, processing, and analysis are slowed if simulation and processing jobs cannot be shared across teammates or computational resources. Although high-performance computing (HPC) has become increasingly available, models can often be tested faster by combining multiple computers with HPC resources. To address these issues, we created the Distributed Automated Parameter Testing (DAPT) Python package. By hosting parameters in an online (and often free) “database”, multiple individuals can run parameter sets simultaneously in a distributed fashion, enabling ad hoc crowdsourcing of computational power. Combined with a flexible, scriptable tool set, this lets teams evaluate models and assess their underlying hypotheses quickly. Here, we describe DAPT and provide an example demonstrating its use.
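
The claim/run/write-back loop that DAPT automates can be sketched as follows. Note this is an illustration of the pattern, not DAPT's actual API: a real deployment would use an online database with atomic claims rather than this in-memory stand-in.

```python
# Illustration of the claim/run/write-back loop behind distributed
# parameter sweeps. NOT DAPT's actual API: a real deployment would use
# an online database with atomic claims instead of an in-memory list.
import itertools
import socket

db = [{"id": i, "diffusion": d, "decay": k, "status": "todo"}
      for i, (d, k) in enumerate(itertools.product([0.1, 1.0], [0.01, 0.1]))]

def claim_next():
    """Claim the next untested parameter set (a real backend would lock)."""
    for row in db:
        if row["status"] == "todo":
            row["status"] = "claimed:" + socket.gethostname()
            return row
    return None

def run_model(params):
    return params["diffusion"] / params["decay"]  # simulation placeholder

while (params := claim_next()) is not None:
    params["result"] = run_model(params)
    params["status"] = "done"

for row in db:
    print(row)
```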


2018, Vol 106 (4)
Author(s): Jean-Paul Courneya, Alexa Mayo

Despite having an ideal setup in their labs for wet work, researchers often lack the computational infrastructure to analyze the magnitude of data that result from “-omics” experiments. In this innovative project, the library supports the analysis of high-throughput data from global molecular profiling experiments by offering a high-performance computer with open source software, along with expert bioinformationist support. The audience for this new service is faculty, staff, and students for whom using the university’s large-scale CORE computational resources is not warranted because those resources exceed the needs of smaller projects. In the library’s approach, users are empowered to analyze high-throughput data that they otherwise could not on their own computers. To develop the project, the library’s bioinformationist identified the ideal computing hardware and a group of open source bioinformatics software packages to provide analysis options for experimental data such as scientific images, sequence reads, and flow cytometry files. To close the loop between learning and practice, the bioinformationist developed self-guided learning materials and offers workshops and consultations on topics such as the National Center for Biotechnology Information’s BLAST, Bioinformatics on the Cloud, and ImageJ. Researchers then apply the data analysis techniques they learned in the classroom in an ideal computing environment.

