Location of Processor Allocator and Job Scheduler and Its Impact on CMP Performance

2012 ◽  
Vol 58 (1) ◽  
pp. 9-14 ◽  
Author(s):  
Dawid Zydek ◽  
Grzegorz Chmaj ◽  
Alaa Shawky ◽  
Henry Selvaraj

High Performance Computing (HPC) architectures are being developed continually with the aim of achieving exascale capability by 2020. The processors being developed and used as nodes in HPC systems are Chip Multiprocessors (CMPs) with a number of cores. In this paper, we continue our effort towards a better processor allocation process. The Processor Allocator (PA) and Job Scheduler (JS) proposed and implemented in our previous works are explored in the context of their best location on the chip. We propose a system in which all locations on a chip can be analyzed, considering the energy used by the Network-on-Chip (NoC), the PA and JS, and the processing elements. We present energy models for the researched CMP components, a mathematical model of the system, and an experimentation system. Based on experimental results, proper placement of the PA and JS on a chip can provide up to 45% NoC energy savings.
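As a rough illustration of why the location of the PA and JS matters, the following Python sketch compares candidate locations on a small 2D mesh. It is not the energy model of the paper: the mesh topology, the use of Manhattan hop count as a proxy for NoC energy, the uniform `energy_per_hop`, and all function names are assumptions made for illustration only.

```python
from itertools import product


def hops(a, b):
    """Manhattan distance between two mesh coordinates (XY routing assumed)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def noc_energy(pa_node, rows, cols, energy_per_hop=1.0, messages_per_core=1):
    """Hypothetical NoC energy if every node exchanges messages with pa_node.

    Energy is modelled as (total hops) * energy_per_hop * messages_per_core,
    which is an assumed simplification, not the model from the paper.
    """
    total_hops = sum(hops(pa_node, node)
                     for node in product(range(rows), range(cols)))
    return total_hops * energy_per_hop * messages_per_core


def best_location(rows, cols):
    """Return the mesh node that minimises the assumed NoC energy."""
    nodes = list(product(range(rows), range(cols)))
    return min(nodes, key=lambda n: noc_energy(n, rows, cols))


if __name__ == "__main__":
    print(best_location(3, 3), noc_energy((1, 1), 3, 3), noc_energy((0, 0), 3, 3))
```

Under these assumptions, placing the PA at the center of a 3x3 mesh costs 12 hop-units against 18 for a corner. These figures are specific to this toy model and are unrelated to the 45% savings reported in the abstract.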

Author(s):  
Mehdi Modarressi ◽  
Hamid Sarbazi-Azad

In this chapter, we present a reconfigurable architecture for networks-on-chip (NoCs) on which arbitrary application-specific topologies can be implemented. The proposed NoC can dynamically tailor its topology to the traffic pattern of different applications, aiming to address one of the main drawbacks of existing application-specific NoC optimization methods, namely that they optimize the NoC for the traffic pattern of a single application. Supporting multiple applications is a critical feature of an NoC, because several different applications are integrated into modern, complex multi-core systems-on-chip and chip multiprocessors, and an NoC designed to run exactly one application does not necessarily meet the design constraints of other applications. The proposed NoC supports multiple applications by configuring itself as the topology that best matches the traffic pattern of the currently running application. In this chapter, we first introduce the proposed reconfigurable topology and then address the two problems of core-to-network mapping and topology exploration. Experimental results show that this architecture effectively improves the performance of NoCs and reduces power consumption.


2020 ◽  
Vol 10 (7) ◽  
pp. 2634
Author(s):  
JunWeon Yoon ◽  
TaeYoung Hong ◽  
ChanYeol Park ◽  
Seo-Young Noh ◽  
HeonChang Yu

High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. Such an approach can reduce overall job execution time and increase the capacity to solve large-scale, complex problems. In a supercomputer, the job scheduler, the HPC system's flagship tool, is responsible for distributing and managing the resources of large systems. In this paper, we analyze the execution log of the job scheduler over a certain period of time and propose an optimization approach to reduce the idle time of jobs. In our experiment, we found that the main cause of job delay is resource waiting. The execution time of the entire job is significantly delayed because of the increase in idle resources that must be ready when a large-scale job is submitted. The backfilling algorithm can reduce the inefficiency of these idle resources and help to reduce the execution time of the job. Therefore, we propose a backfilling algorithm that can be applied to the supercomputer. The experimental results show that the overall execution time is reduced.
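To make the idea of backfilling concrete, here is a minimal Python sketch of one scheduling pass in the style of EASY backfilling. The abstract does not specify which backfilling variant the authors use, so the algorithm choice, the function name `easy_backfill`, and the tuple layouts are assumptions for illustration.

```python
def easy_backfill(free_nodes, running, queue):
    """Return names of queued jobs that can start now (time 0).

    running: list of (finish_time, nodes) for jobs already executing.
    queue:   list of (name, nodes, estimated_runtime) in arrival order.
    """
    started = []
    queue = list(queue)
    running = list(running)

    # Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0][1] <= free_nodes:
        name, nodes, runtime = queue.pop(0)
        free_nodes -= nodes
        running.append((runtime, nodes))
        started.append(name)
    if not queue:
        return started

    # Reserve the earliest time at which the blocked head job can start.
    head_nodes = queue[0][1]
    available = free_nodes
    shadow_time = None
    for finish, nodes in sorted(running):
        available += nodes
        if available >= head_nodes:
            shadow_time = finish
            break
    if shadow_time is None:
        return started  # head job needs more nodes than will ever be free
    extra_nodes = available - head_nodes  # nodes left over at shadow_time

    # Backfill later jobs only if they cannot delay the reserved head job.
    for name, nodes, runtime in queue[1:]:
        if nodes > free_nodes:
            continue
        if runtime <= shadow_time:
            free_nodes -= nodes  # finishes before the reservation begins
            started.append(name)
        elif nodes <= extra_nodes:
            free_nodes -= nodes  # runs past the reservation on spare nodes
            extra_nodes -= nodes
            started.append(name)
    return started


if __name__ == "__main__":
    jobs = [("A", 8, 10), ("B", 2, 3), ("C", 3, 20), ("D", 2, 20)]
    print(easy_backfill(4, [(5, 6)], jobs))
```

In the example, 4 nodes are idle while the 8-node job A waits for a running job to finish at time 5. Strict first-come-first-served scheduling would leave those nodes idle, whereas this pass starts B and D without delaying A's reserved start.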


Author(s):  
K. Suresh Kumar ◽  
S. Anitha ◽  
M. Gayathri

In this model, a runtime cache data mapping is discussed for 3-D stacked L2 caches to minimize the overall energy of 3-D chip multiprocessors (CMPs). The suggested method considers both the temperature distribution and the memory traffic of 3-D CMPs. Experimental results show an energy reduction of up to 22.88% compared to an existing solution that considers only the temperature distribution. New trends envisage 3D Multi-Processor System-on-Chip (MPSoC) design as a promising solution to keep increasing the performance of next-generation high performance computing (HPC) systems. However, as the power density of HPC systems increases with the arrival of 3D MPSoCs, for which an energy reduction of up to 19.55% is reported, supplying electrical power to the computing equipment and constantly removing the generated heat is rapidly becoming the dominant cost in any HPC facility.


2021 ◽  
Author(s):  
Hao Wang ◽  
Yi-Qin Dai ◽  
Jie Yu ◽  
Yong Dong

Improving resource utilization is an important goal of the high-performance computing systems of supercomputing centers. To meet this goal, the job scheduler of a high-performance computing system often uses backfilling scheduling to fill short jobs into the gaps left by jobs at the front of the queue. Backfilling scheduling needs to know the running time of each job. In the past, job running times were usually given by users and often far exceeded the actual running time of the job, which led to inaccurate backfilling and a waste of computing resources. In particular, when the predicted job running time is lower than the actual time, the damage to the utilization of the system's computing resources becomes more serious. Therefore, the prediction accuracy of the job running time is crucial to the utilization of system resources. Machine learning methods can make more accurate predictions of the job running time. Targeting parallel aerodynamics applications, we propose a job running time prediction framework, SU, that combines supervised and unsupervised learning, and verify it on real historical data from the high-performance computing systems of the China Aerodynamics Research and Development Center (CARDC). The experimental results show that SU has a high prediction accuracy (80.46%) and a low underestimation rate (24.85%).
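The two reported metrics can be illustrated with a short Python sketch. The abstract does not define how accuracy is computed, so the definition used here (the mean of min(predicted, actual) / max(predicted, actual) per job) is an assumption, as is the function name `prediction_metrics`. The underestimation rate is taken to be the fraction of jobs whose predicted running time is lower than the actual one.

```python
def prediction_metrics(predicted, actual):
    """Return (accuracy, underestimation_rate) for positive running times.

    accuracy: mean of min(p, a) / max(p, a) over all jobs (assumed definition).
    underestimation_rate: fraction of jobs with predicted < actual.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal length")
    pairs = list(zip(predicted, actual))
    accuracy = sum(min(p, a) / max(p, a) for p, a in pairs) / len(pairs)
    underestimated = sum(1 for p, a in pairs if p < a)
    return accuracy, underestimated / len(pairs)


if __name__ == "__main__":
    acc, under = prediction_metrics([100, 50, 200, 80], [100, 100, 100, 100])
    print(f"accuracy={acc:.2f}, underestimation rate={under:.2f}")
```

Tracking the underestimation rate separately reflects the point made in the abstract: a prediction that is too low is more harmful to backfilling than one that is too high, so a single symmetric accuracy figure would hide the more damaging kind of error.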


2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
Wen-Chung Tsai ◽  
Ying-Cherng Lan ◽  
Yu-Hen Hu ◽  
Sao-Jie Chen

The next generation of multiprocessor systems on chip (MPSoCs) and chip multiprocessors (CMPs) will contain hundreds or thousands of cores. Such a many-core system requires high-performance interconnections to transfer data among the cores on the chip. Traditional system components interface with the interconnection backbone via a bus interface. This interconnection backbone can be an on-chip bus or a multilayer bus architecture. With the advent of many-core architectures, the bus architecture becomes the performance bottleneck of the on-chip interconnection framework. In contrast, the network on chip (NoC) has become a promising on-chip communication infrastructure, commonly considered an aggressive long-term approach to on-chip communications. Accordingly, this paper first discusses several common architectures and prevalent techniques that deal well with the design issues of communication performance, power consumption, signal integrity, and system scalability in an NoC. Finally, a novel bidirectional NoC (BiNoC) architecture with a dynamically self-reconfigurable bidirectional channel is proposed to break the performance bottleneck caused by bandwidth restriction in conventional NoCs.

