Design Space Optimization of Shared Memory Architecture in Accelerator-rich Systems

2021, Vol. 26(4), pp. 1-31
Author(s): Mitali Sinha, Gade Sri Harsha, Pramit Bhattacharyya, Sujay Deb

Shared memory architectures, as opposed to private-only memories, provide a viable alternative for meeting the ever-increasing memory requirements of multi-accelerator systems while achieving high performance under stringent area and energy constraints. However, indiscriminate memory sharing degrades performance due to network contention and the latency of accessing shared memory. We propose the Accelerator Shared Memory (ASM) framework, which derives an optimal private/shared memory configuration and shared data allocation under a system's resource and network constraints. Evaluations show that ASM delivers up to 34.35% and 31.34% improvements in performance and energy, respectively, over baseline systems.
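The abstract does not spell out how ASM searches the design space, but the core idea of splitting a fixed SRAM budget between private scratchpads and a shared pool can be illustrated with a toy exhaustive search. The sketch below is a hypothetical cost model built for illustration only: the `Accel` struct, the latency constants, and the 256 KB budget are all assumptions, not details from the paper.

```cpp
// toy_asm_split.cpp -- hypothetical sketch, NOT the ASM algorithm from the paper.
#include <algorithm>
#include <iostream>
#include <vector>

struct Accel {
    double footprintKB;  // working-set size (assumed)
    double sharedFrac;   // fraction of the working set that is shareable (assumed)
};

// Toy latency model: private-SRAM hits cost 1 cycle, shared-pool hits pay a
// 4-cycle network penalty, and spills to DRAM cost 40 cycles. All constants
// are illustrative assumptions, not measured values.
double avgLatency(const std::vector<Accel>& accels, double privPerAccelKB,
                  double sharedPoolKB) {
    double lat = 0.0;
    double sharedLeft = sharedPoolKB;
    for (const auto& a : accels) {
        double privData = std::min((1.0 - a.sharedFrac) * a.footprintKB,
                                   privPerAccelKB);
        double sharedData = std::min(a.footprintKB - privData, sharedLeft);
        sharedLeft -= sharedData;
        double dramData = a.footprintKB - privData - sharedData;
        lat += (1.0 * privData + 4.0 * sharedData + 40.0 * dramData) /
               a.footprintKB;
    }
    return lat / accels.size();
}

int main() {
    const double budgetKB = 256.0;  // total SRAM area budget (assumed)
    std::vector<Accel> accels = {{96, 0.5}, {128, 0.3}, {64, 0.7}};

    // Exhaustively try how much of the budget to carve out as a shared pool.
    double bestShared = 0.0, bestLat = 1e9;
    for (double sharedKB = 0.0; sharedKB <= budgetKB; sharedKB += 16.0) {
        double privEach = (budgetKB - sharedKB) / accels.size();
        double lat = avgLatency(accels, privEach, sharedKB);
        if (lat < bestLat) { bestLat = lat; bestShared = sharedKB; }
    }
    std::cout << "best shared pool: " << bestShared << " KB, avg latency "
              << bestLat << " cycles\n";
}
```

A real framework would additionally have to model network topology and contention, which this toy deliberately ignores.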


2000, Vol. 8(3), pp. 163-181
Author(s): John Bircsak, Peter Craig, RaeLyn Crowell, Zarka Cvetanovic, Jonathan Harris, ...

This paper describes extensions to OpenMP that implement the data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures. Writing efficient parallel programs for NUMA architectures, which combine characteristics of shared-memory and distributed-memory architectures, requires that the programmer control both the placement of data in memory and the placement of the computations that operate on that data. Optimal performance is obtained when computations run on processors that have fast access to the data those computations need. OpenMP -- designed for shared-memory architectures -- does not by itself address these issues. The extensions to OpenMP Fortran presented here are largely drawn from High Performance Fortran. The paper describes some of the techniques the Compaq Fortran compiler uses to generate efficient code based on these extensions, covers some additional compiler optimizations, and concludes with preliminary results.
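The Compaq directives themselves are Fortran-specific and not reproduced here. Instead, the sketch below illustrates the same placement concern using only standard C++ and OpenMP via the common first-touch policy, under which the operating system backs a page with memory on the NUMA node of the thread that first writes it; initializing data with the same static loop schedule later used for computation therefore co-locates each thread with its data. This is a standard technique, not the paper's directive syntax.

```cpp
// numa_first_touch.cpp -- standard OpenMP first-touch placement illustration;
// NOT the HPF-style directive extensions described in the paper.
// Build with: g++ -O2 -fopenmp numa_first_touch.cpp
#include <omp.h>
#include <cstdio>

int main() {
    const long long n = 1LL << 26;
    // new[] leaves doubles uninitialized, so no page is touched yet; a
    // std::vector would value-initialize and pre-touch every page on the
    // allocating thread, defeating the demonstration.
    double* a = new double[n];
    double* b = new double[n];

    // First touch: each thread writes its own static chunk, so on a
    // first-touch NUMA OS the backing pages land on that thread's node.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) { a[i] = 1.0; b[i] = 2.0; }

    // Same static schedule for the compute loop: each thread operates on
    // the data it initialized, i.e., on node-local memory.
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+ : sum)
    for (long long i = 0; i < n; ++i) sum += a[i] * b[i];

    std::printf("sum = %.1f (threads: %d)\n", sum, omp_get_max_threads());
    delete[] a;
    delete[] b;
}
```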



2015, Vol. 25(09n10), pp. 1739-1741
Author(s): Daniel Adornes, Dalvan Griebler, Cleverson Ledur, Luiz Gustavo Fernandes

MapReduce was originally proposed as a suitable and efficient approach for analyzing and processing large amounts of data. Since then, many research efforts have contributed MapReduce implementations for both distributed and shared memory architectures. Nevertheless, different architectural levels require different optimization strategies to achieve high performance, and these strategies have in turn produced very different MapReduce programming interfaces across those implementations. This paper presents research notes on coding productivity when developing MapReduce applications for distributed and shared memory architectures. As a case study, we introduce our current research on a unified MapReduce domain-specific language with code generation for Hadoop and Phoenix++, which achieves coding productivity gains ranging from 41.84% up to 94.71% without significant performance losses (below 3%) compared to those frameworks.
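The unified DSL itself is not shown in the abstract, so the sketch below only illustrates the underlying shared-memory MapReduce pattern that backends such as Phoenix++ expose: a map phase over per-thread private tables followed by a reduce phase that merges them. The word-count task, `mapChunk`, and all other names here are illustrative assumptions, not the authors' interface or the Phoenix++ API.

```cpp
// mini_mapreduce.cpp -- minimal shared-memory MapReduce word count (C++17).
// A sketch of the programming model only; neither the authors' DSL nor Phoenix++.
#include <algorithm>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

using Counts = std::map<std::string, long>;

// Map phase: one worker tokenizes its slice of the input into a private
// table, so no locking is needed while counting.
Counts mapChunk(const std::vector<std::string>& docs, size_t begin, size_t end) {
    Counts local;
    for (size_t d = begin; d < end; ++d) {
        std::istringstream in(docs[d]);
        std::string w;
        while (in >> w) ++local[w];
    }
    return local;
}

int main() {
    std::vector<std::string> docs = {"the quick brown fox", "the lazy dog",
                                     "the fox jumps over the dog"};
    size_t nw = std::max<size_t>(
        1, std::min<size_t>(std::thread::hardware_concurrency(), docs.size()));

    // Launch one worker per slice; each fills its own partial table.
    std::vector<Counts> partial(nw);
    std::vector<std::thread> workers;
    for (size_t t = 0; t < nw; ++t) {
        size_t begin = docs.size() * t / nw;
        size_t end = docs.size() * (t + 1) / nw;
        workers.emplace_back(
            [&, t, begin, end] { partial[t] = mapChunk(docs, begin, end); });
    }
    for (auto& w : workers) w.join();

    // Reduce phase: merge the per-worker tables, summing counts per key.
    Counts total;
    for (const auto& p : partial)
        for (const auto& [word, n] : p) total[word] += n;

    for (const auto& [word, n] : total)
        std::cout << word << ": " << n << "\n";
}
```

Frameworks differ mainly in how they shard the intermediate tables and schedule the merge; the fixed map/reduce skeleton above is what a unified DSL can generate for either backend.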



2003, Vol. 14(2), pp. 166-180
Author(s): D.J. Sorin, J.L. Lemon, D.L. Eager, M.K. Vernon





