On improving resource utilization and system throughput of master slave job scheduling in heterogeneous systems

2008 ◽  
Vol 45 (1) ◽  
pp. 129-150 ◽  
Author(s):  
Ching-Hsien Hsu ◽  
Tai-Lung Chen ◽  
Jong-Hyuk Park
2021 ◽  
Author(s):  
David Andrew Lloyd Tenty

As we approach the limits of Moore’s law, the cloud computing landscape is becoming ever more heterogeneous in order to extract more performance from available resources. Meanwhile, the container-based cloud is growing in importance as a lightweight way to deploy applications. A unified heterogeneous-systems framework for container-based applications in the heterogeneous cloud is therefore required. We present a bytecode-based framework and its implementation, called Man O’ War, which allows the creation of novel, portable LLVM bitcode-based containers for use in the heterogeneous cloud. Containers in Man O’ War-enabled systems can be efficiently specialized for the hardware available within the cloud, expanding the frontiers for optimization in heterogeneous cloud environments. We demonstrate that a framework utilizing portable bytecode-based containers eases optimizations such as heterogeneous scaling, which have the potential to improve resource utilization and significantly lower costs for users of the public cloud.


2019 ◽  
Vol 214 ◽  
pp. 08004 ◽  
Author(s):  
R. Du ◽  
J. Shi ◽  
J. Zou ◽  
X. Jiang ◽  
Z. Sun ◽  
...  

Two production clusters co-exist at the Institute of High Energy Physics (IHEP). One is a High Throughput Computing (HTC) cluster with HTCondor as the workload manager; the other is a High Performance Computing (HPC) cluster with Slurm as the workload manager. The resources of the HTCondor cluster are funded by multiple experiments, and resource utilization has reached more than 90% by adopting a dynamic resource-sharing mechanism. Nevertheless, a bottleneck arises when more resources are requested by multiple experiments at the same moment. On the other hand, parallel jobs running on the Slurm cluster exhibit some specific attributes, such as a high degree of parallelism, low job count, and long wall time. Such attributes tend to generate free resource slots that are suitable for jobs from the HTCondor cluster. As a result, a mechanism that transparently schedules jobs from the HTCondor cluster onto the Slurm cluster would improve the resource utilization of the Slurm cluster and reduce job queue time for the HTCondor cluster. In this contribution, we present three methods to migrate HTCondor jobs to the Slurm cluster and conclude that HTCondor-C is the preferred one. Furthermore, because the design philosophies and application scenarios of HTCondor and Slurm differ, some issues and possible solutions related to job scheduling are presented.
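The backfill idea above can be sketched as a simple routing decision: idle single-core HTCondor jobs are matched against free CPU slots on the Slurm cluster. The selection logic below is a minimal illustration; in a real deployment the inputs would come from tools such as `condor_q` and `sinfo`, and the actual submission would go through HTCondor-C.

```python
# Minimal sketch of the HTC->HPC backfill decision described above.
# Cluster state is passed in as plain data; the job names are illustrative.

def select_jobs_to_migrate(idle_jobs, free_slurm_slots):
    """Pick idle HTC jobs, in FIFO order, that fit the free HPC capacity.

    idle_jobs        -- list of (job_id, cores) tuples, FIFO order
    free_slurm_slots -- number of currently idle CPU cores on Slurm
    """
    selected = []
    remaining = free_slurm_slots
    for job_id, cores in idle_jobs:
        if cores <= remaining:        # only jobs that fit the free slots
            selected.append(job_id)
            remaining -= cores
    return selected

queue = [("htc.1", 1), ("htc.2", 8), ("htc.3", 1), ("htc.4", 1)]
print(select_jobs_to_migrate(queue, 3))   # ['htc.1', 'htc.3', 'htc.4']
```

Note that the wide parallel job `htc.2` is skipped: long, highly parallel Slurm workloads leave small free slots, which is exactly why narrow HTC jobs are the natural backfill candidates.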


Author(s):  
Hai Jiang ◽  
Yanqing Ji

Computation mobility enables running programs to move among machines and underpins performance gains, fault tolerance, and increased system throughput. State-carrying code (SCC) is a software mechanism that achieves such computation mobility by saving and retrieving computation states during normal program execution in heterogeneous multi-core/many-core clusters. This chapter analyzes the pros and cons of different state saving/retrieving mechanisms. To achieve a portable, flexible, and scalable solution, SCC adopts the application-level thread-migration approach. Major deployment features are explained, and one example system, MigThread, is used to illustrate implementation details. Future trends are given to point out how SCC can evolve into a complete lightweight virtual machine. New high-productivity languages might step in to raise SCC to the language level. With SCC, thorough resource utilization is expected.
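The core of application-level state saving/retrieving can be illustrated with a toy example: the computation periodically serializes its live state, which a (possibly different) machine can deserialize to resume. Real systems such as MigThread capture stack and heap state transparently; this sketch only migrates an explicit state dictionary.

```python
# Toy illustration of SCC-style application-level checkpointing:
# the loop state is explicit, so it can be saved, shipped, and resumed.

import pickle

def partial_sum(state):
    """Run up to 5 loop iterations, then return a checkpointable state."""
    for _ in range(5):
        if state["i"] > state["n"]:
            state["done"] = True
            break
        state["acc"] += state["i"]
        state["i"] += 1
    return state

state = {"i": 1, "n": 8, "acc": 0, "done": False}
state = partial_sum(state)                 # runs on machine A
blob = pickle.dumps(state)                 # state travels over the network
state = pickle.loads(blob)                 # machine B restores it
while not state["done"]:                   # resumes exactly where A stopped
    state = partial_sum(state)
print(state["acc"])                        # sum of 1..8 = 36
```

Because the state is captured at the application level rather than as a raw memory image, the checkpoint stays portable across heterogeneous machines, which is the property SCC exploits.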


2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Bao Rong Chang ◽  
Yun-Da Lee ◽  
Po-Hao Liao

The crucial problem in integrating multiple platforms is how to adapt to their individual computing features so as to execute assignments most efficiently and obtain the best outcome. This paper introduces two new approaches to the big data platform, RHadoop and SparkR, and integrates them to form high-performance big data analytics across multiple platforms, as part of business intelligence (BI), to carry out rapid data retrieval and analytics with R programming. This paper aims to optimize job scheduling using the MSHEFT algorithm and to implement optimized platform selection based on computing features, improving system throughput significantly. In addition, users simply issue R commands, rather than running Java or Scala programs, to perform data retrieval and analytics on the proposed platforms. As a result, according to the performance index calculated for various methods, the optimized platform selection significantly reduces the execution time of data retrieval and analytics, and the scheduling optimization further increases system efficiency.
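The platform-selection idea can be sketched as a HEFT-style list scheduler of the kind the MSHEFT optimization builds on: jobs are ordered by a rank (here, simply descending size) and each is placed on the platform with the earliest finish time. The job sizes and per-platform speeds below are made up for illustration; the actual MSHEFT ranking and cost model in the paper are richer.

```python
# Illustrative HEFT-style list scheduling: rank jobs, then assign each
# to the platform that finishes it earliest given current platform load.

def schedule(jobs, speeds):
    """jobs: {name: work units}; speeds: {platform: units/sec}.
    Returns (assignment, makespan)."""
    ready = {p: 0.0 for p in speeds}           # time each platform frees up
    assignment = {}
    for name, work in sorted(jobs.items(), key=lambda j: -j[1]):
        # earliest-finish-time rule: try the job on every platform
        best = min(speeds, key=lambda p: ready[p] + work / speeds[p])
        ready[best] += work / speeds[best]
        assignment[name] = best
    return assignment, max(ready.values())

jobs = {"q1": 60, "q2": 30, "q3": 30}          # hypothetical R queries
speeds = {"spark": 2.0, "hadoop": 1.0}         # hypothetical throughputs
plan, makespan = schedule(jobs, speeds)
print(plan, makespan)                          # makespan 45.0
```

Note how `q2` lands on the slower platform: with `q1` already occupying the fast one, the earliest-finish-time rule spreads the load instead of queueing everything on the nominally fastest platform, which is the essence of computing-feature-aware platform selection.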


2021 ◽  
Vol 11 (13) ◽  
pp. 6200
Author(s):  
Jin-young Choi ◽  
Minkyoung Cho ◽  
Jik-Soo Kim

Recently, “Big Data” platform technologies have become crucial for the distributed processing of diverse unstructured or semi-structured data as the amount of data generated increases rapidly. To manage such Big Data effectively, Cloud Computing has played an important role by providing scalable data storage and computing resources for competitive and economical Big Data processing. Accordingly, the server virtualization technologies that are the cornerstone of Cloud Computing have attracted a lot of research interest. However, conventional hypervisor-based virtualization can suffer performance degradation due to its heavily loaded guest operating systems and rigid resource allocations. On the other hand, container-based virtualization can provide the same level of service faster and with a lighter footprint by effectively eliminating the guest OS layer. In addition, container-based virtualization enables efficient cloud resource management by dynamically adjusting the allocated computing resources (e.g., CPU and memory) at runtime through “Vertical Elasticity”. In this paper, we present our practice and experience of employing an adaptive resource utilization scheme for Big Data workloads in container-based cloud environments by leveraging the vertical elasticity of Docker, a representative container-based virtualization technique. We perform extensive experiments running several Big Data workloads on representative Big Data platforms: Apache Hadoop and Spark. During workload execution, our adaptive resource utilization scheme periodically monitors the resource usage patterns of running containers and dynamically adjusts the allocated computing resources, which can result in substantial improvements in overall system throughput.
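The monitor-and-adjust loop behind vertical elasticity can be sketched as a pure decision function: grow a container's CPU quota when it runs hot, shrink it when it idles. The thresholds and step size below are made-up illustrations, and applying the new quota would in practice go through the Docker SDK's `Container.update()` (e.g. its `cpu_quota` parameter), which this sketch deliberately leaves out.

```python
# Sketch of the adaptive vertical-elasticity decision described above.
# Only the decision logic is shown; actually resizing the container is
# a separate Docker API call.

def next_cpu_quota(usage_ratio, quota,
                   high=0.8, low=0.3, step=0.25, floor=50_000):
    """Return the new cpu_quota (microseconds per period) for a container.

    usage_ratio -- observed CPU usage / current quota over the last window
    quota       -- current cpu_quota
    """
    if usage_ratio > high:                     # throttled: scale up
        return int(quota * (1 + step))
    if usage_ratio < low:                      # over-provisioned: reclaim
        return max(floor, int(quota * (1 - step)))
    return quota                               # within band: leave alone

print(next_cpu_quota(0.95, 200_000))   # hot container  -> 250000
print(next_cpu_quota(0.10, 200_000))   # idle container -> 150000
print(next_cpu_quota(0.50, 200_000))   # steady         -> 200000
```

Running this periodically per container is what lets CPU freed from an idle Hadoop container be reassigned to a busy Spark container without restarting either.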


Author(s):  
Tarun Kumar Ghosh ◽  
Sanjoy Das

Grid computing is a high-performance distributed computing paradigm comprising different types of resources such as computing, storage, and communication. The job scheduling problem is to assign resource-intensive user jobs to the available grid resources efficiently, so as to achieve high system throughput and satisfy user requirements. The problem has become more challenging with the ever-increasing size of grid systems. Optimal job scheduling is NP-complete, so exact solutions are intractable for large instances and near-optimal solutions are instead sought with meta-heuristic techniques. This chapter presents a hybrid algorithm for job scheduling using a genetic algorithm (GA) and the cuckoo search algorithm (CSA) to allocate jobs to resources in a grid system efficiently so that makespan, flowtime, and job failure rate are minimized. The proposed algorithm combines the advantages of both GA and CSA. The results are compared with standard GA, CSA, and ant colony optimization (ACO) to show the merit of the proposed algorithm.
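The hybrid idea can be illustrated with a toy makespan-only version: candidate schedules (job-to-resource assignments) are evolved with GA-style crossover and mutation, while a cuckoo-search-style step perturbs the best schedule and the better of the two offspring evicts the worst. The job lengths, resource speeds, and operators below are made up; the chapter's actual algorithm also optimizes flowtime and job failure rate.

```python
# Toy GA + cuckoo-search hybrid for job-to-resource assignment,
# minimizing makespan only. All problem data here is illustrative.

import random

JOBS = [40, 20, 30, 10, 25]          # job lengths (work units)
SPEEDS = [1.0, 2.0]                  # resource speeds (units/sec)

def makespan(assign):
    """assign[i] is the resource index job i runs on."""
    load = [0.0] * len(SPEEDS)
    for job, res in zip(JOBS, assign):
        load[res] += job / SPEEDS[res]
    return max(load)

def evolve(pop, generations=50, rng=random.Random(7)):
    for _ in range(generations):
        pop.sort(key=makespan)                       # best first
        a, b = pop[0], pop[1]
        cut = rng.randrange(1, len(JOBS))            # GA: one-point crossover
        child = a[:cut] + b[cut:]
        child[rng.randrange(len(JOBS))] = rng.randrange(len(SPEEDS))  # mutate
        cuckoo = list(pop[0])                        # CSA: perturb best nest
        cuckoo[rng.randrange(len(JOBS))] = rng.randrange(len(SPEEDS))
        pop[-1] = min(child, cuckoo, key=makespan)   # evict the worst nest
    return min(pop, key=makespan)

rng = random.Random(3)
pop = [[rng.randrange(2) for _ in JOBS] for _ in range(6)]
best = evolve(pop)
print(round(makespan(best), 1))
```

Keeping the best individual untouched (only the worst slot is ever replaced) makes the search monotone, so the hybrid can only improve on the best GA-style or cuckoo-style candidate seen so far.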


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 17557-17571
Author(s):  
Mulya Agung ◽  
Yuta Watanabe ◽  
Henning Weber ◽  
Ryusuke Egawa ◽  
Hiroyuki Takizawa
