On improving resource utilization and system throughput of master slave job scheduling in heterogeneous systems

2008 ◽  
Vol 45 (1) ◽  
pp. 129-150 ◽  
Author(s):  
Ching-Hsien Hsu ◽  
Tai-Lung Chen ◽  
Jong-Hyuk Park
2021 ◽  
Author(s):  
David Andrew Lloyd Tenty

As we approach the limits of Moore’s law, the cloud computing landscape is becoming ever more heterogeneous in order to extract more performance from available resources. Meanwhile, the container-based cloud is growing in importance as a lightweight way to deploy applications. A unified heterogeneous-systems framework for container-based applications in the heterogeneous cloud is therefore required. We present a bytecode-based framework and its implementation, called Man O’ War, which allows the creation of novel, portable LLVM bitcode-based containers for use in the heterogeneous cloud. Containers in Man O’ War-enabled systems can be efficiently specialized for the hardware available within the cloud, expanding the frontiers for optimization in heterogeneous cloud environments. We demonstrate that a framework utilizing portable bytecode-based containers eases optimizations such as heterogeneous scaling, which have the potential to improve resource utilization and significantly lower costs for users of the public cloud.


2019 ◽  
Vol 214 ◽  
pp. 08004 ◽  
Author(s):  
R. Du ◽  
J. Shi ◽  
J. Zou ◽  
X. Jiang ◽  
Z. Sun ◽  
...  

Two production clusters co-exist at the Institute of High Energy Physics (IHEP). One is a High Throughput Computing (HTC) cluster with HTCondor as the workload manager; the other is a High Performance Computing (HPC) cluster with Slurm as the workload manager. The resources of the HTCondor cluster are funded by multiple experiments, and resource utilization has reached more than 90% by adopting a dynamic resource-sharing mechanism. Nevertheless, a bottleneck arises when more resources are requested by multiple experiments at the same moment. On the other hand, parallel jobs running on the Slurm cluster exhibit some specific attributes, such as a high degree of parallelism, low job count, and long wall time. Such attributes tend to generate free resource slots that are suitable for jobs from the HTCondor cluster. As a result, a mechanism that transparently schedules jobs from the HTCondor cluster onto the Slurm cluster would improve the resource utilization of the Slurm cluster and reduce job queue time for the HTCondor cluster. In this contribution, we present three methods to migrate HTCondor jobs to the Slurm cluster and conclude that HTCondor-C is the preferred one. Furthermore, because the design philosophies and application scenarios of HTCondor and Slurm differ, some issues and possible solutions related to job scheduling are presented.
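The backfill idea above can be sketched as a simple routing decision: idle single-core HTCondor jobs are matched against free CPU slots on the Slurm cluster. The selection logic below is a minimal illustration; in a real deployment the inputs would come from tools such as `condor_q` and `sinfo`, and the actual submission would go through HTCondor-C.

```python
# Minimal sketch of the HTC->HPC backfill decision described above.
# Cluster state is passed in as plain data; the job names are illustrative.

def select_jobs_to_migrate(idle_jobs, free_slurm_slots):
    """Pick idle HTC jobs, in FIFO order, that fit the free HPC capacity.

    idle_jobs        -- list of (job_id, cores) tuples, FIFO order
    free_slurm_slots -- number of currently idle CPU cores on Slurm
    """
    selected = []
    remaining = free_slurm_slots
    for job_id, cores in idle_jobs:
        if cores <= remaining:        # only jobs that fit the free slots
            selected.append(job_id)
            remaining -= cores
    return selected

queue = [("htc.1", 1), ("htc.2", 8), ("htc.3", 1), ("htc.4", 1)]
print(select_jobs_to_migrate(queue, 3))   # ['htc.1', 'htc.3', 'htc.4']
```

Note that the wide parallel job `htc.2` is skipped: long, highly parallel Slurm workloads leave small free slots, which is exactly why narrow HTC jobs are the natural backfill candidates.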


Author(s):  
Hai Jiang ◽  
Yanqing Ji

Computation mobility enables running programs to move among machines and underpins performance gains, fault tolerance, and increased system throughput. State-carrying code (SCC) is a software mechanism that achieves such computation mobility by saving and retrieving computation states during normal program execution in heterogeneous multi-core/many-core clusters. This chapter analyzes the pros and cons of different state saving/retrieving mechanisms. To achieve a portable, flexible, and scalable solution, SCC adopts the application-level thread-migration approach. Major deployment features are explained, and one example system, MigThread, is used to illustrate implementation details. Future trends are given to point out how SCC can evolve into a complete lightweight virtual machine. New high-productivity languages might step in to raise SCC to the language level. With SCC, thorough resource utilization is expected.
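The core of application-level state saving/retrieving can be illustrated with a toy example: the computation periodically serializes its live state, which a (possibly different) machine can deserialize to resume. Real systems such as MigThread capture stack and heap state transparently; this sketch only migrates an explicit state dictionary.

```python
# Toy illustration of SCC-style application-level checkpointing:
# the loop state is explicit, so it can be saved, shipped, and resumed.

import pickle

def partial_sum(state):
    """Run up to 5 loop iterations, then return a checkpointable state."""
    for _ in range(5):
        if state["i"] > state["n"]:
            state["done"] = True
            break
        state["acc"] += state["i"]
        state["i"] += 1
    return state

state = {"i": 1, "n": 8, "acc": 0, "done": False}
state = partial_sum(state)                 # runs on machine A
blob = pickle.dumps(state)                 # state travels over the network
state = pickle.loads(blob)                 # machine B restores it
while not state["done"]:                   # resumes exactly where A stopped
    state = partial_sum(state)
print(state["acc"])                        # sum of 1..8 = 36
```

Because the state is captured at the application level rather than as a raw memory image, the checkpoint stays portable across heterogeneous machines, which is the property SCC exploits.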


2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Bao Rong Chang ◽  
Yun-Da Lee ◽  
Po-Hao Liao

The crucial problem in integrating multiple platforms is how to adapt to their individual computing features so as to execute assignments most efficiently and obtain the best outcome. This paper introduces two new approaches to the big data platform, RHadoop and SparkR, and integrates them to form high-performance big data analytics across multiple platforms, as part of business intelligence (BI), to carry out rapid data retrieval and analytics with R programming. This paper aims to optimize job scheduling using the MSHEFT algorithm and to implement optimized platform selection based on computing features, improving system throughput significantly. In addition, users simply issue R commands, rather than running Java or Scala programs, to perform data retrieval and analytics on the proposed platforms. As a result, according to the performance index calculated for various methods, the optimized platform selection significantly reduces the execution time of data retrieval and analytics, and the scheduling optimization further increases system efficiency.
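The platform-selection idea can be sketched as a HEFT-style list scheduler of the kind the MSHEFT optimization builds on: jobs are ordered by a rank (here, simply descending size) and each is placed on the platform with the earliest finish time. The job sizes and per-platform speeds below are made up for illustration; the actual MSHEFT ranking and cost model in the paper are richer.

```python
# Illustrative HEFT-style list scheduling: rank jobs, then assign each
# to the platform that finishes it earliest given current platform load.

def schedule(jobs, speeds):
    """jobs: {name: work units}; speeds: {platform: units/sec}.
    Returns (assignment, makespan)."""
    ready = {p: 0.0 for p in speeds}           # time each platform frees up
    assignment = {}
    for name, work in sorted(jobs.items(), key=lambda j: -j[1]):
        # earliest-finish-time rule: try the job on every platform
        best = min(speeds, key=lambda p: ready[p] + work / speeds[p])
        ready[best] += work / speeds[best]
        assignment[name] = best
    return assignment, max(ready.values())

jobs = {"q1": 60, "q2": 30, "q3": 30}          # hypothetical R queries
speeds = {"spark": 2.0, "hadoop": 1.0}         # hypothetical throughputs
plan, makespan = schedule(jobs, speeds)
print(plan, makespan)                          # makespan 45.0
```

Note how `q2` lands on the slower platform: with `q1` already occupying the fast one, the earliest-finish-time rule spreads the load instead of queueing everything on the nominally fastest platform, which is the essence of computing-feature-aware platform selection.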


2021 ◽  
Vol 11 (13) ◽  
pp. 6200
Author(s):  
Jin-young Choi ◽  
Minkyoung Cho ◽  
Jik-Soo Kim

Recently, “Big Data” platform technologies have become crucial for the distributed processing of diverse unstructured or semi-structured data as the amount of data generated increases rapidly. To manage such Big Data effectively, Cloud Computing has played an important role by providing scalable data storage and computing resources for competitive and economical Big Data processing. Accordingly, the server virtualization technologies that are the cornerstone of Cloud Computing have attracted a lot of research interest. However, conventional hypervisor-based virtualization can suffer performance degradation due to its heavily loaded guest operating systems and rigid resource allocations. On the other hand, container-based virtualization can provide the same level of service faster and with a lighter footprint by effectively eliminating the guest OS layer. In addition, container-based virtualization enables efficient cloud resource management by dynamically adjusting the allocated computing resources (e.g., CPU and memory) at runtime through “Vertical Elasticity”. In this paper, we present our practice and experience of employing an adaptive resource utilization scheme for Big Data workloads in container-based cloud environments by leveraging the vertical elasticity of Docker, a representative container-based virtualization technique. We perform extensive experiments running several Big Data workloads on representative Big Data platforms: Apache Hadoop and Spark. During workload execution, our adaptive resource utilization scheme periodically monitors the resource usage patterns of running containers and dynamically adjusts the allocated computing resources, which can result in substantial improvements in overall system throughput.
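The monitor-and-adjust loop behind vertical elasticity can be sketched as a pure decision function: grow a container's CPU quota when it runs hot, shrink it when it idles. The thresholds and step size below are made-up illustrations, and applying the new quota would in practice go through the Docker SDK's `Container.update()` (e.g. its `cpu_quota` parameter), which this sketch deliberately leaves out.

```python
# Sketch of the adaptive vertical-elasticity decision described above.
# Only the decision logic is shown; actually resizing the container is
# a separate Docker API call.

def next_cpu_quota(usage_ratio, quota,
                   high=0.8, low=0.3, step=0.25, floor=50_000):
    """Return the new cpu_quota (microseconds per period) for a container.

    usage_ratio -- observed CPU usage / current quota over the last window
    quota       -- current cpu_quota
    """
    if usage_ratio > high:                     # throttled: scale up
        return int(quota * (1 + step))
    if usage_ratio < low:                      # over-provisioned: reclaim
        return max(floor, int(quota * (1 - step)))
    return quota                               # within band: leave alone

print(next_cpu_quota(0.95, 200_000))   # hot container  -> 250000
print(next_cpu_quota(0.10, 200_000))   # idle container -> 150000
print(next_cpu_quota(0.50, 200_000))   # steady         -> 200000
```

Running this periodically per container is what lets CPU freed from an idle Hadoop container be reassigned to a busy Spark container without restarting either.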


Author(s):  
Tarun Kumar Ghosh ◽  
Sanjoy Das

Grid computing is a high-performance distributed computing paradigm comprising different types of resources such as computing, storage, and communication. The job scheduling problem is to assign resource-intensive user jobs to the available grid resources efficiently, so as to achieve high system throughput and satisfy user requirements. The problem has become more challenging with the ever-increasing size of grid systems. Optimal job scheduling is NP-complete, so exact solutions are intractable for large instances and near-optimal solutions are instead sought with meta-heuristic techniques. This chapter presents a hybrid algorithm for job scheduling using a genetic algorithm (GA) and the cuckoo search algorithm (CSA) to allocate jobs to resources in a grid system efficiently so that makespan, flowtime, and job failure rate are minimized. The proposed algorithm combines the advantages of both GA and CSA. The results are compared with standard GA, CSA, and ant colony optimization (ACO) to show the merit of the proposed algorithm.
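The hybrid idea can be illustrated with a toy makespan-only version: candidate schedules (job-to-resource assignments) are evolved with GA-style crossover and mutation, while a cuckoo-search-style step perturbs the best schedule and the better of the two offspring evicts the worst. The job lengths, resource speeds, and operators below are made up; the chapter's actual algorithm also optimizes flowtime and job failure rate.

```python
# Toy GA + cuckoo-search hybrid for job-to-resource assignment,
# minimizing makespan only. All problem data here is illustrative.

import random

JOBS = [40, 20, 30, 10, 25]          # job lengths (work units)
SPEEDS = [1.0, 2.0]                  # resource speeds (units/sec)

def makespan(assign):
    """assign[i] is the resource index job i runs on."""
    load = [0.0] * len(SPEEDS)
    for job, res in zip(JOBS, assign):
        load[res] += job / SPEEDS[res]
    return max(load)

def evolve(pop, generations=50, rng=random.Random(7)):
    for _ in range(generations):
        pop.sort(key=makespan)                       # best first
        a, b = pop[0], pop[1]
        cut = rng.randrange(1, len(JOBS))            # GA: one-point crossover
        child = a[:cut] + b[cut:]
        child[rng.randrange(len(JOBS))] = rng.randrange(len(SPEEDS))  # mutate
        cuckoo = list(pop[0])                        # CSA: perturb best nest
        cuckoo[rng.randrange(len(JOBS))] = rng.randrange(len(SPEEDS))
        pop[-1] = min(child, cuckoo, key=makespan)   # evict the worst nest
    return min(pop, key=makespan)

rng = random.Random(3)
pop = [[rng.randrange(2) for _ in JOBS] for _ in range(6)]
best = evolve(pop)
print(round(makespan(best), 1))
```

Keeping the best individual untouched (only the worst slot is ever replaced) makes the search monotone, so the hybrid can only improve on the best GA-style or cuckoo-style candidate seen so far.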


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 17557-17571
Author(s):  
Mulya Agung ◽  
Yuta Watanabe ◽  
Henning Weber ◽  
Ryusuke Egawa ◽  
Hiroyuki Takizawa
