Scheduling and Resource Provisioning Algorithms for Scientific Workflows on Commercial Clouds

2021
Author(s): Vahid Arabnejad
Basic science is becoming ever more computationally intensive, increasing the need for large-scale compute and storage resources, whether within a high-performance computing cluster or, more recently, within the cloud. Commercial clouds have increasingly become a viable platform for hosting scientific analyses and computation due to their elasticity, recent introduction of specialist hardware, and pay-as-you-go cost model. This computing paradigm therefore presents a low-capital, low-barrier alternative to operating dedicated eScience infrastructure. Indeed, commercial clouds now enable universal access to capabilities previously available only to large, well-funded research groups. While the potential benefits of cloud computing are clear, there are still significant technical hurdles to obtaining the best execution efficiency while trading off cost. In most cases, large-scale scientific computation is represented as a workflow for scheduling and runtime provisioning. Such scheduling becomes an even more challenging problem on cloud systems due to the dynamic nature of the cloud, in particular its elasticity, its pricing models (both static and dynamic), its non-homogeneous resource types, and its vast array of services. The mapping of workflow tasks onto a set of provisioned instances is an instance of the general scheduling problem and is NP-complete. In addition, certain runtime constraints must be met, the most typical being the cost of the computation and the time it requires to complete. This thesis addresses the scientific workflow scheduling problem in the cloud: scheduling workflow tasks on cloud resources so that users meet their defined constraints, such as budget and deadline, while providers maximize profits and resource utilization. Moreover, it explores different mechanisms and strategies for distributing the defined constraints over a workflow and investigates their impact on the overall cost of the resulting schedule.
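To make the constraint-distribution idea concrete, the following is a minimal sketch (not the thesis's actual algorithms) of one common strategy: divide a user-supplied deadline across the levels of the workflow DAG in proportion to their estimated work, then provision the cheapest instance type that lets each task meet its sub-deadline. The instance catalogue, runtimes, and prices are hypothetical.

```python
from collections import defaultdict

# Hypothetical instance catalogue: (name, relative speed, $ per hour).
INSTANCE_TYPES = [("small", 1.0, 0.1), ("medium", 2.0, 0.2), ("large", 4.0, 0.8)]

def level_of(task, deps, memo):
    """Depth of a task in the DAG (entry tasks are level 0)."""
    if task not in memo:
        parents = deps.get(task, [])
        memo[task] = 0 if not parents else 1 + max(level_of(p, deps, memo) for p in parents)
    return memo[task]

def schedule(tasks, deps, runtimes, deadline):
    """Distribute the deadline over DAG levels, then pick the cheapest
    instance type that lets each task finish within its sub-deadline."""
    memo, levels = {}, defaultdict(list)
    for t in tasks:
        levels[level_of(t, deps, memo)].append(t)
    # Give each level a deadline share proportional to its longest task.
    weights = {lvl: max(runtimes[t] for t in ts) for lvl, ts in levels.items()}
    total = sum(weights.values())
    plan, cost = {}, 0.0
    for lvl, ts in sorted(levels.items()):
        sub_deadline = deadline * weights[lvl] / total
        for t in ts:
            for name, speed, price in sorted(INSTANCE_TYPES, key=lambda x: x[2]):
                runtime = runtimes[t] / speed
                if runtime <= sub_deadline:  # cheapest type that is fast enough
                    plan[t], cost = name, cost + price * runtime / 3600
                    break
            else:  # no type meets the sub-deadline: fall back to the fastest
                name, speed, price = INSTANCE_TYPES[-1]
                plan[t], cost = name, cost + price * runtimes[t] / speed / 3600
    return plan, cost

# Toy workflow: a fork-join of two simulations between a prep and a merge step.
tasks = ["prep", "sim1", "sim2", "merge"]
deps = {"sim1": ["prep"], "sim2": ["prep"], "merge": ["sim1", "sim2"]}
runtimes = {"prep": 600, "sim1": 3600, "sim2": 1800, "merge": 300}  # seconds
print(schedule(tasks, deps, runtimes, deadline=5400))
```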



2020
Vol. 245, pp. 07036
Author(s): Christoph Beyer, Stefan Bujack, Stefan Dietrich, Thomas Finnern, Martin Flemming, et al.

DESY is one of the largest accelerator laboratories in Europe. It develops and operates state-of-the-art accelerators for fundamental science in the areas of high energy physics, photon science and accelerator development. While for decades high energy physics (HEP) has been the most prominent user of DESY's compute, storage and network infrastructure, other scientific areas such as photon science and accelerator development have caught up and now dominate the demands on DESY's infrastructure resources, with significant consequences for IT resource provisioning. In this contribution, we present an overview of the computational, storage and network resources serving the various physics communities on site, ranging from high-throughput computing (HTC), batch-like offline processing in the Grid and the interactive user analysis resources in the National Analysis Factory (NAF) for the HEP community, to the computing needs of accelerator development and of photon science facilities such as PETRA III or the European XFEL. Since DESY is involved in these experiments and their data taking, their requirements include fast, low-latency online processing for data taking and calibration as well as offline processing; these high-performance computing (HPC) workloads run on the dedicated Maxwell HPC cluster. As all communities face significant challenges from changing environments and increasing data rates in the coming years, we discuss the changes to the computing and storage infrastructures that these challenges will require. We present DESY's compute cloud and container orchestration plans as a basis for infrastructure and platform services, and show examples of Jupyter notebooks for small-scale interactive analysis, as well as their integration into large-scale resources such as batch systems or Spark clusters. To overcome the fragmentation of resources across the scientific communities at DESY, we explore how to integrate them into a seamless user experience in an Interdisciplinary Data Analysis Facility.


Author(s): D. Sirisha, G. Vijayakumari

Compute-intensive applications expressed as workflows require Heterogeneous Processing Systems (HPS) to attain high performance and minimize turnaround time. Efficient scheduling of workflow tasks is paramount to realizing the full potential of HPS, and is a challenging NP-complete problem. In the present work, a Branch and Bound (BnB) strategy is applied to optimally schedule workflow tasks. The proposed bounds are tighter, simpler and less complex than existing bounds, and the upper bound is closer to the exact solution. Moreover, bounds on resource provisioning are devised to execute workflows in the minimum possible time while utilizing resources optimally. The performance of the proposed BnB strategy is evaluated on a suite of benchmark workflows. The experimental results reveal that the proposed BnB strategy improved upon the solutions of existing heuristic scheduling algorithms in more than 20 percent of the cases, and generated schedules that were better by over 7 percent in 82.6 percent of those cases.
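As an illustration of the general technique (the paper's specific bounds are not given in the abstract), the sketch below applies branch and bound to map independent tasks onto heterogeneous processors, pruning any partial assignment whose optimistic lower bound on the makespan cannot beat the best complete schedule found so far. Precedence constraints are omitted for brevity.

```python
def branch_and_bound(task_times, n_procs):
    """Assign tasks to processors to minimize makespan.
    task_times[t][p] = runtime of task t on processor p (heterogeneity)."""
    best = {"makespan": float("inf"), "plan": None}

    def lower_bound(loads, next_task):
        # Optimistic bound: every remaining task runs at its fastest speed,
        # spread perfectly evenly over all processors.
        remaining = sum(min(row) for row in task_times[next_task:])
        return max(max(loads), (sum(loads) + remaining) / n_procs)

    def expand(next_task, loads, plan):
        if next_task == len(task_times):
            if max(loads) < best["makespan"]:
                best["makespan"], best["plan"] = max(loads), plan[:]
            return
        if lower_bound(loads, next_task) >= best["makespan"]:
            return  # prune: this branch cannot beat the incumbent schedule
        for p in range(n_procs):
            loads[p] += task_times[next_task][p]
            plan.append(p)
            expand(next_task + 1, loads, plan)
            plan.pop()
            loads[p] -= task_times[next_task][p]

    expand(0, [0.0] * n_procs, [])
    return best["plan"], best["makespan"]

# Example: 4 tasks on 2 heterogeneous processors.
times = [[4, 2], [3, 6], [5, 5], [2, 1]]
print(branch_and_bound(times, 2))
```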


2012
Vol. 8 (4), pp. 102
Author(s): Claudia Canali, Riccardo Lancellotti

The recent growth in demand for modern applications, combined with the shift to the cloud computing paradigm, has led to the establishment of large-scale cloud data centers. The increasing size of these infrastructures represents a major challenge for monitoring and managing system resources. Available solutions typically consider every Virtual Machine (VM) as a black box with independent characteristics, and address scalability by reducing the number of monitored resource samples, in most cases considering only average CPU usage sampled at a coarse time granularity. We claim that scalability issues can instead be addressed by leveraging the similarity between VMs in terms of resource usage patterns. In this paper we propose an automated methodology to cluster VMs depending on their usage of multiple resources, both system- and network-related, assuming no knowledge of the services executed on them. This innovative methodology exploits the correlation between resource usage metrics to cluster similar VMs together. We evaluate the methodology through a case study with data from an enterprise data center, and show that high performance can be achieved in automatic VM clustering. Furthermore, we estimate the reduction in the amount of data collected, showing that our proposal may simplify monitoring requirements and help administrators take decisions on resource management in cloud computing data centers.
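A minimal sketch of the core idea, on synthetic monitoring data: represent each VM by the correlations between its own resource time series (CPU, memory, network in/out), so that VMs running similar services get similar feature vectors, and then cluster those vectors. scikit-learn's k-means is used here purely for illustration; the abstract does not prescribe a specific clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def vm_features(metrics):
    """metrics: array of shape (n_resources, n_samples) for one VM.
    The feature vector is the upper triangle of the correlation matrix
    between the VM's resource time series (CPU, memory, net in/out, ...)."""
    corr = np.corrcoef(metrics)
    iu = np.triu_indices_from(corr, k=1)
    return np.nan_to_num(corr[iu])  # guard against constant series

def cluster_vms(all_metrics, n_clusters=3):
    """Cluster VMs by the similarity of their resource-usage patterns."""
    X = np.vstack([vm_features(m) for m in all_metrics])
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)

# Synthetic example: 6 VMs, 4 resource metrics, 288 samples (5-min grain, 1 day).
rng = np.random.default_rng(0)
vms = [rng.standard_normal((4, 288)) for _ in range(6)]
print(cluster_vms(vms))
```

The design point the paper makes is that only the compact feature vectors (here, six correlation values per VM) need to be kept, rather than every raw sample, which is where the reduction in collected data comes from.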


2020
Vol. 7 (1)
Author(s): E. A. Huerta, Asad Khan, Edward Davis, Colleen Bushell, William D. Gropp, et al.

Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches that enable scientific and engineering breakthroughs in the big data era. Innovative Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology that now drive a multi-billion-dollar industry and play an ever-increasing role in shaping human social patterns. As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient for the computational grand challenges brought about by scientific facilities that produce data at a rate and volume that outstrip the computing capabilities of available cyberinfrastructure platforms. This realization has been driving the confluence of AI and high-performance computing (HPC) to reduce time-to-insight and to enable systematic studies of domain-inspired AI architectures and optimization schemes for data-driven discovery. In this article we present a summary of recent developments in this field, and describe specific advances that the authors are spearheading to accelerate and streamline the use of HPC platforms to design and apply accelerated AI algorithms in academia and industry.
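As a concrete illustration of moving beyond single-GPU training, the sketch below shows generic data-parallel training with PyTorch's DistributedDataParallel, in which gradients are synchronized across GPUs during the backward pass. PyTorch, the toy model, and the synthetic data are assumptions for illustration; the article does not tie its discussion to any one framework.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launch with: torchrun --nproc_per_node=<num_gpus> train.py
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda()      # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])  # wraps gradient all-reduce
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for step in range(100):
        x = torch.randn(64, 128).cuda()           # stand-in for a data loader
        y = torch.randint(0, 10, (64,)).cuda()
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()  # gradients are all-reduced across GPUs here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```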


PeerJ
2018
Vol. 6, pp. e5551
Author(s): Maria Luiza Mondelli, Thiago Magalhães, Guilherme Loss, Michael Wilde, Ian Foster, et al.

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. The framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how applying machine learning techniques can enrich the analysis process.
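To suggest the flavor of such provenance queries, here is a minimal sketch against a hypothetical relational schema (the abstract does not specify BioWorkbench's actual schema), asking a typical performance question: how much time each workflow activity consumed and how often it failed.

```python
import sqlite3

# Hypothetical provenance schema, loosely inspired by the kind of data
# BioWorkbench collects; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task_exec (
        workflow TEXT, activity TEXT, duration_s REAL, exit_code INTEGER
    );
    INSERT INTO task_exec VALUES
        ('SwiftPhylo', 'align',      120.0, 0),
        ('SwiftPhylo', 'build_tree', 340.5, 0),
        ('SwiftGECKO', 'compare',     88.2, 1);
""")

# A typical performance query: time spent and failure count per activity.
rows = conn.execute("""
    SELECT workflow, activity,
           SUM(duration_s)     AS total_s,
           SUM(exit_code != 0) AS failures
    FROM task_exec
    GROUP BY workflow, activity
    ORDER BY total_s DESC
""").fetchall()
for r in rows:
    print(r)
```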


2014
Author(s): R. Daniel Kortschak, David L. Adelson

bíogo is a framework designed to ease the development and maintenance of computationally intensive bioinformatics applications. The library is written in the Go programming language, a garbage-collected, strictly typed compiled language with built-in support for concurrent processing and performance comparable to C and Java. It provides a variety of data types and utility functions to facilitate manipulation and analysis of large-scale genomic and other biological data. bíogo uses a concise and expressive syntax, lowering the barriers to entry for researchers who need to process large data sets with custom analyses while retaining computational safety and ease of code review. We believe bíogo provides an excellent environment for training and research in computational biology because of its combination of strict typing, simple and expressive syntax, and high performance.


2019
Vol. 16 (4), pp. 1-20
Author(s): S. Sabahat H. Bukhari, Yunni Xia

The cloud computing paradigm provides an ideal platform for supporting large-scale scientific-workflow-based applications over the internet. However, the scheduling and execution of scientific workflows still face various challenges, such as cost and response time management, which require handling the acquisition delays of physical servers while minimizing the overall completion time of workflows. A careful investigation of existing methods shows that most approaches assume static performance of physical machines (PMs) and ignore the impact of resource acquisition delays in their scheduling models. In this article, the authors present a meta-heuristic-based method for scheduling scientific workflows that aims to reduce workflow completion time by appropriately managing the acquisition and transmission delays required for inter-PM communications. The authors also carry out extensive case studies based on real-world commercial clouds and multiple workflow templates. Experimental results clearly show that the proposed method outperforms state-of-the-art ones such as ICPCP, CEGA, and JIT-C in terms of workflow completion time.
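The abstract does not name the meta-heuristic used, so the sketch below illustrates the general approach with a small genetic algorithm over task-to-VM assignments, charging each VM an acquisition (boot) delay before its first task, in the spirit of the article's cost model. All runtimes, speeds, and delays are hypothetical.

```python
import random

random.seed(1)

RUNTIMES = [30, 50, 20, 40, 60, 25]  # task runtimes on a baseline VM (s)
VM_SPEEDS = [1.0, 2.0, 4.0]          # relative speeds of 3 VM types
ACQ_DELAY = [10, 20, 40]             # acquisition (boot) delay per VM type (s)

def completion_time(assign):
    """Serial-per-VM completion time, charging each used VM its boot delay."""
    finish = [0.0] * len(VM_SPEEDS)
    for task, vm in enumerate(assign):
        if finish[vm] == 0.0:
            finish[vm] = ACQ_DELAY[vm]  # pay the acquisition delay once
        finish[vm] += RUNTIMES[task] / VM_SPEEDS[vm]
    return max(finish)

def evolve(pop_size=30, generations=200):
    """Tiny genetic algorithm over task->VM assignment vectors."""
    pop = [[random.randrange(len(VM_SPEEDS)) for _ in RUNTIMES]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=completion_time)          # fittest (fastest) first
        survivors = pop[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(RUNTIMES))
            child = a[:cut] + b[cut:]          # one-point crossover
            if random.random() < 0.2:          # occasional mutation
                child[random.randrange(len(child))] = random.randrange(len(VM_SPEEDS))
            children.append(child)
        pop = survivors + children
    return pop[0], completion_time(pop[0])

print(evolve())  # best assignment found and its completion time
```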


2021
Vol. 2 (4), pp. 1-20
Author(s): Changyuan Lin, Hamzeh Khazaei, Andrew Walenstein, Andrew Malton

Embedded sensors and smart devices have turned the environments around us into smart spaces that can automatically evolve with the needs of users and adapt to new conditions. While smart spaces are beneficial and desired in many aspects, they can be compromised, exposing privacy and security, or rendering the whole environment a hostile space in which regular tasks can no longer be accomplished. In fact, ensuring the security of smart spaces is a very challenging task due to the heterogeneity of devices, the vast attack surface, and device resource limitations. The key objective of this study is to minimize the manual work involved in enforcing the security of smart spaces by leveraging the autonomic computing paradigm in the management of IoT environments. More specifically, we strive to build an autonomic manager that can monitor the smart space continuously, analyze the context, plan and execute countermeasures to maintain the desired level of security, and reduce the liability and risks of security breaches. We follow the microservice architecture pattern and propose a generic ontology named Secure Smart Space Ontology (SSSO) for describing dynamic contextual information in security-enhanced smart spaces. Based on SSSO, we build a four-layer autonomic security manager that continuously monitors the managed spaces, analyzes contextual information and events, and automatically plans and implements adaptive security policies. As the evaluation, focusing on a current BlackBerry customer problem, we deployed the proposed autonomic security manager to maintain the security of a smart conference room with 32 devices and 66 services. The high performance of the proposed solution was also evaluated on a large-scale deployment with over 1.8 million triples.
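The monitor-analyze-plan-execute cycle described here follows the classic autonomic computing (MAPE-K) pattern; the following is a minimal sketch of such a loop for a smart space, with hypothetical device names, risk categories, and countermeasures (SSSO itself and the real four-layer manager are far richer).

```python
from dataclasses import dataclass

@dataclass
class Event:
    device: str
    kind: str  # e.g. "new_connection", "firmware_outdated"

# Hypothetical policy: maps an analyzed risk to a countermeasure.
COUNTERMEASURES = {
    "unknown_device": lambda ev: print(f"quarantine {ev.device} on guest VLAN"),
    "stale_firmware": lambda ev: print(f"schedule update for {ev.device}"),
}

KNOWN_DEVICES = {"projector-1", "thermostat-2"}

def analyze(ev):
    """Map a raw event to a risk category (the 'A' in MAPE-K)."""
    if ev.kind == "new_connection" and ev.device not in KNOWN_DEVICES:
        return "unknown_device"
    if ev.kind == "firmware_outdated":
        return "stale_firmware"
    return None

def mape_loop(event_source):
    """Monitor-Analyze-Plan-Execute loop over a stream of events."""
    for ev in event_source:                  # Monitor
        risk = analyze(ev)                   # Analyze
        if risk:
            action = COUNTERMEASURES[risk]   # Plan: pick a countermeasure
            action(ev)                       # Execute

mape_loop([Event("camera-9", "new_connection"),
           Event("thermostat-2", "firmware_outdated")])
```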

