Location of Processor Allocator and Job Scheduler and Its Impact on CMP Performance

High Performance Computing (HPC) architectures are being developed continually with the aim of achieving exascale capability by 2020. The processors developed and used as nodes in HPC systems are Chip Multiprocessors (CMPs) with a number of cores. In this paper, we continue our effort towards a better processor allocation process. The Processor Allocator (PA) and Job Scheduler (JS) proposed and implemented in our previous works are explored in the context of their best location on the chip. We propose a system in which all locations on a chip can be analyzed, considering the energy used by the Network-on-Chip (NoC), the PA and JS, and the processing elements. We present energy models for the researched CMP components, a mathematical model of the system, and an experimentation system. Based on experimental results, proper placement of the PA and JS on a chip can provide up to 45% NoC energy savings.
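
The idea of evaluating every candidate location for the PA and JS can be illustrated with a toy model. The sketch below is not the authors' energy model: it assumes a 2D mesh NoC with XY routing (hop count equal to Manhattan distance), one message from every core to the PA/JS node, and hypothetical per-hop costs `e_link` and `e_router`.

```python
from itertools import product


def manhattan(a, b):
    """Hop count between two mesh nodes under XY routing (assumed)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def noc_energy(pa_loc, width, height, e_link=1.0, e_router=1.0, msgs_per_core=1):
    """Toy NoC energy for all cores sending messages to the PA/JS node.

    A message over `hops` links is assumed to traverse `hops` links and
    `hops + 1` routers. e_link and e_router are hypothetical unit costs.
    """
    total = 0.0
    for node in product(range(width), range(height)):
        if node == pa_loc:
            continue  # the PA/JS node does not message itself
        hops = manhattan(node, pa_loc)
        total += msgs_per_core * (hops * e_link + (hops + 1) * e_router)
    return total


def best_location(width, height, **costs):
    """Exhaustively evaluate every mesh node as the PA/JS location."""
    candidates = product(range(width), range(height))
    return min(candidates, key=lambda loc: noc_energy(loc, width, height, **costs))


if __name__ == "__main__":
    print(best_location(3, 3))          # centre of a 3x3 mesh
    print(noc_energy((1, 1), 3, 3))     # energy with PA/JS at the centre
    print(noc_energy((0, 0), 3, 3))     # energy with PA/JS in a corner
```

Under these assumptions the centre of a 3x3 mesh is cheaper than a corner, which matches the intuition that placement changes NoC energy; the magnitude of the savings in a real CMP depends on the traffic and energy models used in the paper.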

Phoenix: A Runtime Environment for High Performance Computing on Chip Multiprocessors

2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing ◽

10.1109/pdp.2009.41 ◽

2009 ◽

Cited By ~ 4

Author(s):

Avneesh Pant ◽

Hassan Jafri ◽

Volodymyr Kindratenko

Keyword(s):

High Performance Computing ◽

High Performance ◽

Chip Multiprocessors ◽

Runtime Environment ◽

On Chip ◽

Performance Computing

Merging Plasmonics and Silicon Photonics Towards Greener and Faster “Network-on-Chip” Solutions for Data Centers and High-Performance Computing Systems

Plasmonics - Principles and Applications ◽

10.5772/51853 ◽

2012 ◽

Cited By ~ 3

Author(s):

Sotirios Papaioannou ◽

Konstantinos Vyrsokinos ◽

Dimitrios Kalavrouziotis ◽

Giannis Giannoulis ◽

Dimitrios Apostolopoulos ◽

...

Keyword(s):

High Performance Computing ◽

Silicon Photonics ◽

High Performance ◽

Data Centers ◽

Network On Chip ◽

Computing Systems ◽

On Chip ◽

Performance Computing

A High-Performance and Low-Power On-Chip Network with Reconfigurable Topology

Dynamic Reconfigurable Network-on-Chip Design ◽

10.4018/978-1-61520-807-4.ch013 ◽

2010 ◽

pp. 309-329

Author(s):

Mehdi Modarressi ◽

Hamid Sarbazi-Azad

Keyword(s):

High Performance ◽

Chip Multiprocessors ◽

Optimization Methods ◽

Critical Feature ◽

Design Constraints ◽

Core System ◽

Traffic Pattern ◽

On Chip ◽

Network Mapping ◽

Application Specific

In this chapter, we present a reconfigurable architecture for networks-on-chip (NoCs) on which arbitrary application-specific topologies can be implemented. The proposed NoC can dynamically tailor its topology to the traffic pattern of different applications, aiming to address one of the main drawbacks of existing application-specific NoC optimization methods, namely that they optimize NoCs based on the traffic pattern of a single application. Supporting multiple applications is a critical feature of an NoC, because several different applications are integrated into modern, complex multi-core systems-on-chip and chip multiprocessors, and an NoC designed to run exactly one application does not necessarily meet the design constraints of other applications. The proposed NoC supports multiple applications by configuring itself into the topology that best matches the traffic pattern of the currently running application. In this chapter, we first introduce the proposed reconfigurable topology and then address the two problems of core-to-network mapping and topology exploration. Experimental results show that this architecture effectively improves the performance of NoCs and reduces power consumption.

Log Analysis-Based Resource and Execution Time Improvement in HPC: A Case Study

Applied Sciences ◽

10.3390/app10072634 ◽

2020 ◽

Vol 10 (7) ◽

pp. 2634

Author(s):

JunWeon Yoon ◽

TaeYoung Hong ◽

ChanYeol Park ◽

Seo-Young Noh ◽

HeonChang Yu

Keyword(s):

Execution Time ◽

High Performance ◽

Large Scale ◽

Experimental Result ◽

Optimization Approach ◽

Root Cause ◽

Large Systems ◽

Job Scheduler ◽

Performance Computing

High-performance computing (HPC) uses many distributed computing resources to solve large computational science problems through parallel computation. Such an approach can reduce overall job execution time and increase the capacity for solving large-scale and complex problems. In a supercomputer, the job scheduler, HPC's flagship tool, is responsible for distributing and managing the resources of large systems. In this paper, we analyze the execution log of the job scheduler over a certain period of time and propose an optimization approach to reduce the idle time of jobs. In our experiment, we found that the main root cause of delayed jobs is closely related to resource waiting. The execution time of the entire workload is affected and significantly delayed by the increase in idle resources that must be held ready when a large-scale job is submitted. A backfilling algorithm can make use of these idle resources and help to reduce job execution time. Therefore, we propose a backfilling algorithm that can be applied to the supercomputer. The experimental results show that the overall execution time is reduced.
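
As an illustration of how backfilling uses idle resources, here is a minimal sketch of one scheduling step of EASY-style backfilling. It is not the algorithm from the paper, whose details the abstract does not give. The `Job` type, the function name `easy_backfill`, and the input layout are hypothetical, and runtimes are assumed to be user estimates.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int     # nodes requested
    runtime: int   # user-estimated runtime (assumed to be an upper bound)


def easy_backfill(now, free_nodes, running, queue):
    """One scheduling step of EASY-style backfilling (illustrative sketch).

    running: list of (end_time, nodes) for jobs already executing.
    queue:   list of Job in first-come-first-served order.
    Returns the names of the jobs started at time `now`.
    """
    started = []
    queue = list(queue)
    running = list(running)

    # 1. Start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # 2. Reserve for the blocked head job: find the "shadow time" at which
    #    enough nodes will be free, and the nodes left over at that time.
    head = queue[0]
    avail, shadow = free_nodes, None
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow = end
            break
    if shadow is None:
        return started  # head job can never fit; nothing to backfill around
    extra = avail - head.nodes

    # 3. Backfill later jobs that do not delay the head job's reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        ends_in_time = now + job.runtime <= shadow
        if ends_in_time or job.nodes <= extra:
            free_nodes -= job.nodes
            if not ends_in_time:
                extra -= job.nodes  # it will still hold nodes at shadow time
            started.append(job.name)
    return started


if __name__ == "__main__":
    queue = [Job("A", 8, 5), Job("B", 2, 8), Job("C", 3, 20), Job("D", 2, 20)]
    # 4 nodes idle now; a running job releases 6 nodes at t=10.
    print(easy_backfill(now=0, free_nodes=4, running=[(10, 6)], queue=queue))
```

In this example the head job A must wait until t=10, so the four idle nodes would otherwise sit unused. B is started because it finishes before A's reservation, and D is started because it fits in the nodes A will not need.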

A Space-Efficient On-Chip Compressed Cache Organization for High Performance Computing

Parallel and Distributed Processing and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-540-30566-8_109 ◽

2004 ◽

pp. 952-964 ◽

Cited By ~ 1

Author(s):

Keun Soo Yim ◽

Jang-Soo Lee ◽

Jihong Kim ◽

Shin-Dug Kim ◽

Kern Koh

Keyword(s):

High Performance Computing ◽

High Performance ◽

Cache Organization ◽

On Chip ◽

Performance Computing

3D Stacked Cache Data Management for Energy Minimization of 3D Chip Multiprocessor

International Journal of Students Research in Technology & Management ◽

10.18510/ijsrtm.2015.325 ◽

2015 ◽

Vol 3 (2) ◽

pp. 264-268

Author(s):

K. Suresh Kumar ◽

S. Anitha ◽

M. Gayathri

Keyword(s):

Temperature Distribution ◽

High Performance ◽

Chip Multiprocessors ◽

Electrical Power ◽

Chip Multiprocessor ◽

Energy Reduction ◽

Experimental Result ◽

Data Mapping ◽

Promising Solution ◽

On Chip

In this model, a runtime cache data mapping is discussed for 3-D stacked L2 caches to minimize the overall energy of 3-D chip multiprocessors (CMPs). The suggested method considers both the temperature distribution and the memory traffic of 3-D CMPs. Experimental results show an energy reduction of up to 22.88% compared to an existing solution that considers only the temperature distribution. New tendencies envisage 3D Multi-Processor System-on-Chip (MPSoC) design as a promising solution to keep increasing the performance of next-generation high performance computing (HPC) systems. However, as the power density of HPC systems increases with the arrival of 3D MPSoCs, supplying electrical power to the computing equipment and constantly removing the generated heat is rapidly becoming the dominant cost in any HPC facility; an energy reduction of up to 19.55% is also reported.

Software Controlled Reconfigurable On-chip Memory for High Performance Computing

Intelligent Memory Systems - Lecture Notes in Computer Science ◽

10.1007/3-540-44570-6_2 ◽

2001 ◽

pp. 15-32 ◽

Cited By ~ 2

Author(s):

Hiroshi Nakamura ◽

Masaaki Kondo ◽

Taisuke Boku

Keyword(s):

High Performance Computing ◽

High Performance ◽

On Chip ◽

Performance Computing

Low Power High Performance Computing on Arm System-on-Chip in Astrophysics

Advances in Intelligent Systems and Computing - Proceedings of the Future Technologies Conference (FTC) 2019 ◽

10.1007/978-3-030-32520-6_33 ◽

2019 ◽

pp. 427-446

Author(s):

Giuliano Taffoni ◽

Sara Bertocco ◽

Igor Coretti ◽

David Goz ◽

Antonio Ragagnin ◽

...

Keyword(s):

Low Power ◽

High Performance Computing ◽

High Performance ◽

System On Chip ◽

On Chip ◽

Performance Computing

Predicting running time of aerodynamic jobs in HPC system by combining supervised and unsupervised learning method

10.21203/rs.3.rs-360961/v1 ◽

2021 ◽

Author(s):

Hao Wang ◽

Yi-Qin Dai ◽

Jie Yu ◽

Yong Dong

Keyword(s):

Unsupervised Learning ◽

High Performance Computing ◽

Prediction Accuracy ◽

High Performance ◽

Computing Systems ◽

Running Time ◽

Supervised And Unsupervised Learning ◽

Underestimation Rate ◽

Job Scheduler ◽

Performance Computing

Improving resource utilization is an important goal of the high-performance computing systems of supercomputing centers. To meet this goal, the job scheduler of a high-performance computing system often uses backfilling scheduling to fill short jobs into the gaps left by jobs at the front of the queue. Backfilling scheduling needs to know the running time of each job. In the past, job running times were usually given by users and often far exceeded the actual running time of the job, which leads to inaccurate backfilling and a waste of computing resources. In particular, when the predicted job running time is lower than the actual time, the damage to the utilization of the system's computing resources becomes more serious. Therefore, the prediction accuracy of the job running time is crucial to the utilization of system resources. Machine learning methods can make more accurate predictions of the job running time. Targeting parallel aerodynamics applications, we propose SU, a job running time prediction framework that combines supervised and unsupervised learning, and verify it on real historical data from the high-performance computing systems of the China Aerodynamics Research and Development Center (CARDC). The experimental results show that SU has a high prediction accuracy (80.46%) and a low underestimation rate (24.85%).
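
The two reported metrics can be made concrete with a small sketch. The abstract does not define them, so the definitions below are assumptions: per-job accuracy is taken as min(predicted, actual) / max(predicted, actual), averaged over jobs, and the underestimation rate is the fraction of jobs whose predicted running time is below the actual one. The function names are hypothetical.

```python
def prediction_accuracy(predicted, actual):
    """Mean of min(p, a) / max(p, a) over all jobs (assumed definition)."""
    scores = []
    for p, a in zip(predicted, actual):
        hi = max(p, a)
        scores.append(1.0 if hi == 0 else min(p, a) / hi)
    return sum(scores) / len(scores)


def underestimation_rate(predicted, actual):
    """Fraction of jobs predicted to run shorter than they actually did."""
    under = sum(1 for p, a in zip(predicted, actual) if p < a)
    return under / len(predicted)


if __name__ == "__main__":
    predicted = [100, 50, 200, 80]   # seconds, illustrative values only
    actual = [100, 100, 100, 80]
    print(prediction_accuracy(predicted, actual))
    print(underestimation_rate(predicted, actual))
```

Tracking the underestimation rate separately matters for backfilling: a job that outlives its predicted running time can delay the reserved job behind it, whereas an overestimate only wastes a backfilling opportunity.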

Networks on Chips: Structure and Design Methodologies

Journal of Electrical and Computer Engineering ◽

10.1155/2012/509465 ◽

2012 ◽

Vol 2012 ◽

pp. 1-15 ◽

Cited By ~ 20

Author(s):

Wen-Chung Tsai ◽

Ying-Cherng Lan ◽

Yu-Hen Hu ◽

Sao-Jie Chen

Keyword(s):

High Performance ◽

Chip Multiprocessors ◽

Multiprocessor System ◽

Communication Performance ◽

Core System ◽

Traditional System ◽

On Chip ◽

Many Core ◽

Bus Architecture

The next generation of multiprocessor systems-on-chip (MPSoCs) and chip multiprocessors (CMPs) will contain hundreds or thousands of cores. Such a many-core system requires high-performance interconnections to transfer data among the cores on the chip. Traditional system components interface with the interconnection backbone via a bus interface. This interconnection backbone can be an on-chip bus or a multilayer bus architecture. With the advent of many-core architectures, the bus architecture becomes the performance bottleneck of the on-chip interconnection framework. In contrast, the network-on-chip (NoC) has become a promising on-chip communication infrastructure and is commonly considered an aggressive long-term approach for on-chip communications. Accordingly, this paper first discusses several common architectures and prevalent techniques that address the design issues of communication performance, power consumption, signal integrity, and system scalability in an NoC. Finally, a novel bidirectional NoC (BiNoC) architecture with a dynamically self-reconfigurable bidirectional channel is proposed to break the performance bottleneck caused by bandwidth restriction in conventional NoCs.
