A unified accelerator design for LiDAR SLAM algorithms for low-end FPGAs

Author(s): Keisuke Sugiura, Hiroki Matsutani
Author(s): Rachit Nigam, Sachille Atapattu, Samuel Thomas, Zhijing Li, Theodore Bauer, et al.

Author(s): Weiqun Zhang, Andrew Myers, Kevin Gott, Ann Almgren, John Bell

Block-structured adaptive mesh refinement (AMR) provides the basis for the temporal and spatial discretization strategy for a number of Exascale Computing Project applications in the areas of accelerator design, additive manufacturing, astrophysics, combustion, cosmology, multiphase flow, and wind plant modeling. AMReX is a software framework that provides a unified infrastructure with the functionality these and other AMR applications need to run effectively and efficiently on machines ranging from laptops to exascale architectures. AMR reduces the computational cost and memory footprint compared to a uniform mesh while preserving accurate descriptions of the different physical processes in complex multiphysics algorithms. AMReX supports algorithms that solve systems of partial differential equations in simple or complex geometries, as well as algorithms that use particles and/or particle–mesh operations to represent component physical processes. In this article, we discuss the core elements of the AMReX framework, such as data containers and iterators, as well as several specialized operations that meet the needs of the application projects. We also highlight the strategy the AMReX team is pursuing to achieve highly performant code across a range of accelerator-based architectures for a variety of applications.
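
The data containers and iterators mentioned above are exposed through AMReX's C++ API. As a minimal sketch of that pattern (assuming the MultiFab/MFIter idiom shown in AMReX's public tutorials; the domain size, grid size, and fill value here are illustrative, not taken from the article):

    #include <AMReX.H>
    #include <AMReX_MultiFab.H>

    int main(int argc, char* argv[])
    {
        amrex::Initialize(argc, argv);
        {
            // Illustrative 64^3 domain, chopped into grids of at most 32^3 cells.
            amrex::Box domain(amrex::IntVect(0), amrex::IntVect(63));
            amrex::BoxArray ba(domain);
            ba.maxSize(32);
            amrex::DistributionMapping dm(ba);

            // MultiFab: a core AMReX data container holding one floating-point
            // component per cell, with no ghost cells.
            amrex::MultiFab mf(ba, dm, 1, 0);

            // MFIter: iterate over the grids owned by this MPI rank; ParallelFor
            // runs the loop body on the device when AMReX is built for a GPU.
            for (amrex::MFIter mfi(mf); mfi.isValid(); ++mfi) {
                const amrex::Box& bx = mfi.validbox();
                amrex::Array4<amrex::Real> const& a = mf.array(mfi);
                amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE (int i, int j, int k) {
                    a(i, j, k) = 1.0;
                });
            }
        }
        amrex::Finalize();
        return 0;
    }

The same loop body compiles to CPU or GPU code depending on the build configuration, which reflects the performance-portability strategy the abstract describes.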


Author(s): J. Shanks, J. Barley, S. Barrett, M. Billing, G. Codner, et al.

2021, Vol 20 (5s), pp. 1-20
Author(s): Hyungmin Cho

Depthwise convolutions are widely used in convolutional neural networks (CNNs) targeting mobile and embedded systems. Depthwise convolution layers reduce the computational load and the number of parameters compared to conventional convolution layers. Many deep neural network (DNN) accelerators adopt an architecture that exploits the high data-reuse factor of DNN computations, such as a systolic array. However, depthwise convolutions have a low data-reuse factor and under-utilize the processing elements (PEs) in systolic arrays. In this paper, we present a DNN accelerator design called RiSA, which provides a novel mechanism that boosts PE utilization for depthwise convolutions on a systolic array with minimal overhead. In addition, the PEs in systolic arrays can be used efficiently only if the data items (tensors) are arranged in the desired layout. Typical DNN accelerators provide various types of PE interconnects or additional modules to flexibly rearrange data items and manage data movement during DNN computations. RiSA provides a lightweight set of tensor management tasks within the PE array itself, eliminating the need for an additional tensor-reshaping module. Using this embedded tensor reshaping, RiSA supports various DNN models, including convolutional neural networks and natural language processing models, while maintaining high area efficiency. Compared to Eyeriss v2, RiSA improves the area and energy efficiency of MobileNet-V1 inference by 1.91× and 1.31×, respectively.
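
To make the compute and parameter reduction concrete, here is a back-of-the-envelope comparison between a standard convolution and its depthwise-separable counterpart (the layer dimensions are illustrative, MobileNet-like choices, not figures from the paper):

    #include <cstdio>

    int main()
    {
        // Hypothetical layer: 56x56 feature map, 128 input and output
        // channels, 3x3 kernel (MobileNet-like, illustrative only).
        const long long H = 56, W = 56, Cin = 128, Cout = 128, K = 3;

        // Standard convolution: every output channel reads every input channel.
        const long long std_params = K * K * Cin * Cout;
        const long long std_macs   = H * W * std_params;

        // Depthwise separable = depthwise (one KxK filter per channel)
        //                     + pointwise (1x1) convolution.
        const long long dws_params = K * K * Cin + Cin * Cout;
        const long long dws_macs   = H * W * dws_params;

        std::printf("standard : %lld params, %lld MACs\n", std_params, std_macs);
        std::printf("separable: %lld params, %lld MACs (%.1fx fewer)\n",
                    dws_params, dws_macs,
                    static_cast<double>(std_macs) / static_cast<double>(dws_macs));
        return 0;
    }

The roughly 8x drop in multiply-accumulates is precisely what starves a systolic array sized for standard convolutions: each depthwise output channel reuses only its own input channel, so most PEs sit idle unless the array adds a utilization mechanism such as the one RiSA proposes.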


2021
Author(s): Jeff Jun Zhang, Nicolas Bohm Agostini, Shihao Song, Cheng Tan, Ankur Limaye, et al.

Author(s): A. Mondelli, C. Chang, A. Drobot, K. Ko, A. Mankofsky, et al.
