Achieving Memory Access Equalization Via Round-Trip Routing Latency Prediction in 3D Many-Core NoCs

Round-trip latency prediction for memory access fairness in mesh-based many-core architectures

IEICE Electronics Express ◽

10.1587/elex.11.20141027 ◽

2014 ◽

Vol 11 (24) ◽

pp. 20141027-20141027

Author(s):

Yang Li ◽

Xiaowen Chen ◽

Xiaohui Zhao ◽

Yong Yang ◽

Hengzhu Liu

Keyword(s):

Memory Access ◽

Round Trip ◽

Many Core

Memory Access Analysis of Many-core System with Abundant Bandwidth

2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip ◽

10.1109/mcsoc.2015.14 ◽

2015 ◽

Cited By ~ 1

Author(s):

Chuan Tang ◽

Dan Liu ◽

Zuocheng Xing ◽

Peng Yang ◽

Zhe Wang ◽

...

Keyword(s):

Memory Access ◽

Core System ◽

Many Core

Agent-Based Memory Access for Many-Core CMPs

2014 IEEE 13th International Symposium on Parallel and Distributed Computing ◽

10.1109/ispdc.2014.9 ◽

2014 ◽

Author(s):

Weiwei Fu ◽

Mingmin Yuan ◽

Tianzhou Chen ◽

Li Liu

Keyword(s):

Memory Access ◽

Agent Based ◽

Many Core

The Need for HPC Computing in Network Science

Advances in Computer and Electrical Engineering - Creativity in Load-Balance Schemes for Multi/Many-Core Heterogeneous Graph Computing ◽

10.4018/978-1-5225-3799-1.ch001 ◽

2018 ◽

pp. 1-29

Keyword(s):

High Performance Computing ◽

Real World ◽

High Performance ◽

Heterogeneous Computing ◽

Network Science ◽

Memory Access ◽

Combined Use ◽

Hardware Architectures ◽

Many Core ◽

Performance Computing

The size of complex networks leads to long traversal times, which can be mitigated by exploiting pervasive multi-core and many-core parallel hardware architectures. However, several factors make the design of efficient parallel graph traversal algorithms difficult: unstructured problems, data-driven computation, irregular memory access, poor locality, and low computing load. In this chapter, the authors introduce the synergy between Network Science and High Performance Computing and motivate the combined use of multi/many-core heterogeneous computing and Network Science techniques to tackle the above-mentioned challenges and to efficiently traverse the structure of massive real-world graphs.

Round-trip DRAM Access Fairness in 3D NoC-based Many-core Systems

ACM Transactions on Embedded Computing Systems ◽

10.1145/3126561 ◽

2017 ◽

Vol 16 (5s) ◽

pp. 1-21 ◽

Cited By ~ 3

Author(s):

Xiaowen Chen ◽

Zhonghai Lu ◽

Sheng Liu ◽

Shuming Chen

Keyword(s):

Round Trip ◽

Many Core

A memory access model for highly-threaded many-core architectures

Future Generation Computer Systems ◽

10.1016/j.future.2013.06.020 ◽

2014 ◽

Vol 30 ◽

pp. 202-215 ◽

Cited By ~ 25

Author(s):

Lin Ma ◽

Kunal Agrawal ◽

Roger D. Chamberlain

Keyword(s):

Memory Access ◽

Many Core ◽

Access Model

Aristotle: A Performance Impact Indicator for the OpenCL Kernels Using Local Memory

Scientific Programming ◽

10.1155/2014/623841 ◽

2014 ◽

Vol 22 (3) ◽

pp. 239-257 ◽

Cited By ~ 6

Author(s):

Jianbin Fang ◽

Henk Sips ◽

Ana Lucia Varbanescu

Keyword(s):

Memory Performance ◽

Empirical Evaluation ◽

Memory Access ◽

Performance Impact ◽

Impact Indicator ◽

Local Memory ◽

Performance Variability ◽

Access Patterns ◽

Many Core

Due to the increasing complexity of multi/many-core architectures (with their mix of caches and scratch-pad memories) and applications (with different memory access patterns), the performance of many workloads becomes increasingly variable. In this work, we address one of the main causes of this performance variability: the efficiency of the memory system. Specifically, based on an empirical evaluation driven by memory access patterns, we qualify and partially quantify the performance impact of using local memory in multi/many-core processors. To do so, we systematically describe memory access patterns (MAPs) in an application-agnostic manner. Next, for each identified MAP, we use OpenCL (for portability reasons) to generate two microbenchmarks: a “naive” version (without local memory) and an “optimized” version (using local memory). We then evaluate both of them on commonly used multi-core and many-core platforms, and we log their performance. What we eventually obtain is a local memory performance database, indexed by various MAPs and platforms. Further, we propose a set of composing rules for multiple MAPs. Thus, we can get an indicator of whether using local memory is beneficial in the presence of multiple memory access patterns. This indication can be used either to avoid the hassle of implementing optimizations with too little gain or, alternatively, to give a rough prediction of the performance gain.
