Design Space Optimization of Shared Memory Architecture in Accelerator-rich Systems

2021, Vol. 26(4), pp. 1-31
Author(s): Mitali Sinha, Gade Sri Harsha, Pramit Bhattacharyya, Sujay Deb

Shared memory architectures, as opposed to private-only memories, provide a viable alternative for meeting the ever-increasing memory requirements of multi-accelerator systems while achieving high performance under stringent area and energy constraints. However, indiscriminate memory sharing degrades performance due to network contention and the latency of accessing shared memory. We propose the Accelerator Shared Memory (ASM) framework, which derives an optimal private/shared memory configuration and shared data allocation under a system's resource and network constraints. Evaluations show that ASM delivers up to 34.35% and 31.34% improvements in performance and energy, respectively, over baseline systems.
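The abstract does not spell out how ASM searches the design space, but the core idea of splitting a fixed SRAM budget between private scratchpads and a shared pool can be illustrated with a toy exhaustive search. The sketch below is a hypothetical cost model built for illustration only: the `Accel` struct, the latency constants, and the 256 KB budget are all assumptions, not details from the paper.

```cpp
// toy_asm_split.cpp -- hypothetical sketch, NOT the ASM algorithm from the paper.
#include <algorithm>
#include <iostream>
#include <vector>

struct Accel {
    double footprintKB;  // working-set size (assumed)
    double sharedFrac;   // fraction of the working set that is shareable (assumed)
};

// Toy latency model: private-SRAM hits cost 1 cycle, shared-pool hits pay a
// 4-cycle network penalty, and spills to DRAM cost 40 cycles. All constants
// are illustrative assumptions, not measured values.
double avgLatency(const std::vector<Accel>& accels, double privPerAccelKB,
                  double sharedPoolKB) {
    double lat = 0.0;
    double sharedLeft = sharedPoolKB;
    for (const auto& a : accels) {
        double privData = std::min((1.0 - a.sharedFrac) * a.footprintKB,
                                   privPerAccelKB);
        double sharedData = std::min(a.footprintKB - privData, sharedLeft);
        sharedLeft -= sharedData;
        double dramData = a.footprintKB - privData - sharedData;
        lat += (1.0 * privData + 4.0 * sharedData + 40.0 * dramData) /
               a.footprintKB;
    }
    return lat / accels.size();
}

int main() {
    const double budgetKB = 256.0;  // total SRAM area budget (assumed)
    std::vector<Accel> accels = {{96, 0.5}, {128, 0.3}, {64, 0.7}};

    // Exhaustively try how much of the budget to carve out as a shared pool.
    double bestShared = 0.0, bestLat = 1e9;
    for (double sharedKB = 0.0; sharedKB <= budgetKB; sharedKB += 16.0) {
        double privEach = (budgetKB - sharedKB) / accels.size();
        double lat = avgLatency(accels, privEach, sharedKB);
        if (lat < bestLat) { bestLat = lat; bestShared = sharedKB; }
    }
    std::cout << "best shared pool: " << bestShared << " KB, avg latency "
              << bestLat << " cycles\n";
}
```

A real framework would additionally have to model network topology and contention, which this toy deliberately ignores.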


2000, Vol. 8(3), pp. 163-181
Author(s): John Bircsak, Peter Craig, RaeLyn Crowell, Zarka Cvetanovic, Jonathan Harris, ...

This paper describes extensions to OpenMP that implement the data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures. Writing efficient parallel programs for NUMA architectures, which combine characteristics of shared-memory and distributed-memory architectures, requires that the programmer control both the placement of data in memory and the placement of the computations that operate on that data. Optimal performance is obtained when computations run on processors that have fast access to the data those computations need. OpenMP -- designed for shared-memory architectures -- does not by itself address these issues. The extensions to OpenMP Fortran presented here are largely drawn from High Performance Fortran. The paper describes some of the techniques the Compaq Fortran compiler uses to generate efficient code based on these extensions, covers some additional compiler optimizations, and concludes with preliminary results.
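The Compaq directives themselves are Fortran-specific and not reproduced here. Instead, the sketch below illustrates the same placement concern using only standard C++ and OpenMP via the common first-touch policy, under which the operating system backs a page with memory on the NUMA node of the thread that first writes it; initializing data with the same static loop schedule later used for computation therefore co-locates each thread with its data. This is a standard technique, not the paper's directive syntax.

```cpp
// numa_first_touch.cpp -- standard OpenMP first-touch placement illustration;
// NOT the HPF-style directive extensions described in the paper.
// Build with: g++ -O2 -fopenmp numa_first_touch.cpp
#include <omp.h>
#include <cstdio>

int main() {
    const long long n = 1LL << 26;
    // new[] leaves doubles uninitialized, so no page is touched yet; a
    // std::vector would value-initialize and pre-touch every page on the
    // allocating thread, defeating the demonstration.
    double* a = new double[n];
    double* b = new double[n];

    // First touch: each thread writes its own static chunk, so on a
    // first-touch NUMA OS the backing pages land on that thread's node.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) { a[i] = 1.0; b[i] = 2.0; }

    // Same static schedule for the compute loop: each thread operates on
    // the data it initialized, i.e., on node-local memory.
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+ : sum)
    for (long long i = 0; i < n; ++i) sum += a[i] * b[i];

    std::printf("sum = %.1f (threads: %d)\n", sum, omp_get_max_threads());
    delete[] a;
    delete[] b;
}
```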



2015, Vol. 25(09n10), pp. 1739-1741
Author(s): Daniel Adornes, Dalvan Griebler, Cleverson Ledur, Luiz Gustavo Fernandes

MapReduce was originally proposed as a suitable and efficient approach for analyzing and processing large amounts of data. Since then, many research efforts have contributed MapReduce implementations for both distributed and shared memory architectures. Nevertheless, different architectural levels require different optimization strategies to achieve high performance, and these strategies have in turn produced very different MapReduce programming interfaces across those implementations. This paper presents research notes on coding productivity when developing MapReduce applications for distributed and shared memory architectures. As a case study, we introduce our current research on a unified MapReduce domain-specific language with code generation for Hadoop and Phoenix++, which achieves coding productivity gains ranging from 41.84% up to 94.71% without significant performance losses (below 3%) compared to those frameworks.
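The unified DSL itself is not shown in the abstract, so the sketch below only illustrates the underlying shared-memory MapReduce pattern that backends such as Phoenix++ expose: a map phase over per-thread private tables followed by a reduce phase that merges them. The word-count task, `mapChunk`, and all other names here are illustrative assumptions, not the authors' interface or the Phoenix++ API.

```cpp
// mini_mapreduce.cpp -- minimal shared-memory MapReduce word count (C++17).
// A sketch of the programming model only; neither the authors' DSL nor Phoenix++.
#include <algorithm>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

using Counts = std::map<std::string, long>;

// Map phase: one worker tokenizes its slice of the input into a private
// table, so no locking is needed while counting.
Counts mapChunk(const std::vector<std::string>& docs, size_t begin, size_t end) {
    Counts local;
    for (size_t d = begin; d < end; ++d) {
        std::istringstream in(docs[d]);
        std::string w;
        while (in >> w) ++local[w];
    }
    return local;
}

int main() {
    std::vector<std::string> docs = {"the quick brown fox", "the lazy dog",
                                     "the fox jumps over the dog"};
    size_t nw = std::max<size_t>(
        1, std::min<size_t>(std::thread::hardware_concurrency(), docs.size()));

    // Launch one worker per slice; each fills its own partial table.
    std::vector<Counts> partial(nw);
    std::vector<std::thread> workers;
    for (size_t t = 0; t < nw; ++t) {
        size_t begin = docs.size() * t / nw;
        size_t end = docs.size() * (t + 1) / nw;
        workers.emplace_back(
            [&, t, begin, end] { partial[t] = mapChunk(docs, begin, end); });
    }
    for (auto& w : workers) w.join();

    // Reduce phase: merge the per-worker tables, summing counts per key.
    Counts total;
    for (const auto& p : partial)
        for (const auto& [word, n] : p) total[word] += n;

    for (const auto& [word, n] : total)
        std::cout << word << ": " << n << "\n";
}
```

Frameworks differ mainly in how they shard the intermediate tables and schedule the merge; the fixed map/reduce skeleton above is what a unified DSL can generate for either backend.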



2003, Vol. 14(2), pp. 166-180
Author(s): D.J. Sorin, J.L. Lemon, D.L. Eager, M.K. Vernon





