A unified accelerator design for LiDAR SLAM algorithms for low-end FPGAs

Author(s): Keisuke Sugiura, Hiroki Matsutani
Author(s): Rachit Nigam, Sachille Atapattu, Samuel Thomas, Zhijing Li, Theodore Bauer, et al.

Author(s): Weiqun Zhang, Andrew Myers, Kevin Gott, Ann Almgren, John Bell

Block-structured adaptive mesh refinement (AMR) provides the basis for the temporal and spatial discretization strategy for a number of Exascale Computing Project applications in the areas of accelerator design, additive manufacturing, astrophysics, combustion, cosmology, multiphase flow, and wind plant modeling. AMReX is a software framework that provides a unified infrastructure with the functionality these and other AMR applications need to run effectively and efficiently on machines ranging from laptops to exascale architectures. AMR reduces the computational cost and memory footprint compared to a uniform mesh while preserving accurate descriptions of the different physical processes in complex multiphysics algorithms. AMReX supports algorithms that solve systems of partial differential equations in simple or complex geometries, as well as algorithms that use particles and/or particle–mesh operations to represent component physical processes. In this article, we discuss the core elements of the AMReX framework, such as data containers and iterators, as well as several specialized operations that meet the needs of the application projects. We also highlight the strategy the AMReX team is pursuing to achieve highly performant code across a range of accelerator-based architectures for a variety of applications.
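
The data containers and iterators mentioned above are exposed through AMReX's C++ API. As a minimal sketch of that pattern (assuming the MultiFab/MFIter idiom shown in AMReX's public tutorials; the domain size, grid size, and fill value here are illustrative, not taken from the article):

    #include <AMReX.H>
    #include <AMReX_MultiFab.H>

    int main(int argc, char* argv[])
    {
        amrex::Initialize(argc, argv);
        {
            // Illustrative 64^3 domain, chopped into grids of at most 32^3 cells.
            amrex::Box domain(amrex::IntVect(0), amrex::IntVect(63));
            amrex::BoxArray ba(domain);
            ba.maxSize(32);
            amrex::DistributionMapping dm(ba);

            // MultiFab: a core AMReX data container holding one floating-point
            // component per cell, with no ghost cells.
            amrex::MultiFab mf(ba, dm, 1, 0);

            // MFIter: iterate over the grids owned by this MPI rank; ParallelFor
            // runs the loop body on the device when AMReX is built for a GPU.
            for (amrex::MFIter mfi(mf); mfi.isValid(); ++mfi) {
                const amrex::Box& bx = mfi.validbox();
                amrex::Array4<amrex::Real> const& a = mf.array(mfi);
                amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE (int i, int j, int k) {
                    a(i, j, k) = 1.0;
                });
            }
        }
        amrex::Finalize();
        return 0;
    }

The same loop body compiles to CPU or GPU code depending on the build configuration, which reflects the performance-portability strategy the abstract describes.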


Author(s): J. Shanks, J. Barley, S. Barrett, M. Billing, G. Codner, et al.

2021, Vol 20 (5s), pp. 1-20
Author(s): Hyungmin Cho

Depthwise convolutions are widely used in convolutional neural networks (CNNs) targeting mobile and embedded systems. Depthwise convolution layers reduce the computational load and the number of parameters compared to conventional convolution layers. Many deep neural network (DNN) accelerators adopt an architecture that exploits the high data-reuse factor of DNN computations, such as a systolic array. However, depthwise convolutions have a low data-reuse factor and under-utilize the processing elements (PEs) in systolic arrays. In this paper, we present a DNN accelerator design called RiSA, which provides a novel mechanism that boosts PE utilization for depthwise convolutions on a systolic array with minimal overhead. In addition, the PEs in systolic arrays can be used efficiently only if the data items (tensors) are arranged in the desired layout. Typical DNN accelerators provide various types of PE interconnects or additional modules to flexibly rearrange data items and manage data movement during DNN computations. RiSA provides a lightweight set of tensor management tasks within the PE array itself, eliminating the need for an additional tensor-reshaping module. Using this embedded tensor reshaping, RiSA supports various DNN models, including convolutional neural networks and natural language processing models, while maintaining high area efficiency. Compared to Eyeriss v2, RiSA improves the area and energy efficiency of MobileNet-V1 inference by 1.91× and 1.31×, respectively.
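
To make the compute and parameter reduction concrete, here is a back-of-the-envelope comparison between a standard convolution and its depthwise-separable counterpart (the layer dimensions are illustrative, MobileNet-like choices, not figures from the paper):

    #include <cstdio>

    int main()
    {
        // Hypothetical layer: 56x56 feature map, 128 input and output
        // channels, 3x3 kernel (MobileNet-like, illustrative only).
        const long long H = 56, W = 56, Cin = 128, Cout = 128, K = 3;

        // Standard convolution: every output channel reads every input channel.
        const long long std_params = K * K * Cin * Cout;
        const long long std_macs   = H * W * std_params;

        // Depthwise separable = depthwise (one KxK filter per channel)
        //                     + pointwise (1x1) convolution.
        const long long dws_params = K * K * Cin + Cin * Cout;
        const long long dws_macs   = H * W * dws_params;

        std::printf("standard : %lld params, %lld MACs\n", std_params, std_macs);
        std::printf("separable: %lld params, %lld MACs (%.1fx fewer)\n",
                    dws_params, dws_macs,
                    static_cast<double>(std_macs) / static_cast<double>(dws_macs));
        return 0;
    }

The roughly 8x drop in multiply-accumulates is precisely what starves a systolic array sized for standard convolutions: each depthwise output channel reuses only its own input channel, so most PEs sit idle unless the array adds a utilization mechanism such as the one RiSA proposes.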


2021
Author(s): Jeff Jun Zhang, Nicolas Bohm Agostini, Shihao Song, Cheng Tan, Ankur Limaye, et al.

Author(s): A. Mondelli, C. Chang, A. Drobot, K. Ko, A. Mankofsky, et al.
