Platform Generation for Edge AI Devices with Custom Hardware Accelerators

Author(s):  
Leon Hielscher ◽  
Alexander Bloeck ◽  
Alexander Viehl ◽  
Sebastian Reiter ◽  
Marc Staiger ◽  
...  
Electronics ◽  
2019 ◽  
Vol 8 (6) ◽  
pp. 641 ◽  
Author(s):  
Miguel Rivera-Acosta ◽  
Susana Ortega-Cisneros ◽  
Jorge Rivera

This paper presents a platform that automatically generates custom hardware accelerators for convolutional neural networks (CNNs) implemented in field-programmable gate array (FPGA) devices. It includes a user interface for configuring and managing these accelerators. The platform performs every step needed to design and test a CNN accelerator: it takes a CNN architecture description at both the layer and internal-parameter levels, trains the described architecture with any dataset, and generates the configuration files the platform requires. With these files, it can synthesize the register-transfer level (RTL) description and program the customized CNN accelerator into the FPGA device for testing, making it possible to generate custom CNN accelerators quickly and easily. All processes except the CNN architecture description are fully automated and carried out by the platform, which manages third-party software to train the CNN and to synthesize and program the generated RTL. The platform has been tested by implementing several state-of-the-art CNN architectures for freely available datasets such as MNIST, CIFAR-10, and STL-10.
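To make the idea of a layer-level architecture description concrete, the following is a minimal sketch in Python. It is not the platform's actual input format, which the abstract does not specify: the `Conv` and `Pool` classes, their fields, and the `feature_map_shapes` helper are all hypothetical names chosen for illustration. The sketch assumes unpadded convolutions and non-overlapping pooling, and derives the feature-map size after each layer, which is the kind of information an accelerator generator would need before sizing buffers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv:
    """Hypothetical convolution layer: square kernel, no padding."""
    filters: int
    kernel: int
    stride: int = 1


@dataclass(frozen=True)
class Pool:
    """Hypothetical non-overlapping pooling layer with a square window."""
    size: int


def feature_map_shapes(input_shape, layers):
    """Return the (height, width, channels) shape after each layer."""
    h, w, c = input_shape
    shapes = []
    for layer in layers:
        if isinstance(layer, Conv):
            # Unpadded convolution: out = (in - kernel) // stride + 1
            h = (h - layer.kernel) // layer.stride + 1
            w = (w - layer.kernel) // layer.stride + 1
            c = layer.filters
        elif isinstance(layer, Pool):
            # Non-overlapping pooling divides each spatial dimension
            h //= layer.size
            w //= layer.size
        else:
            raise TypeError(f"unsupported layer: {layer!r}")
        shapes.append((h, w, c))
    return shapes


# Illustrative architecture for 28x28 grayscale inputs (the MNIST image size)
layers = [Conv(filters=6, kernel=5), Pool(2), Conv(filters=16, kernel=5), Pool(2)]
shapes = feature_map_shapes((28, 28, 1), layers)
```

For this illustrative architecture, `shapes` ends at a 4x4x16 feature map, so a following fully connected layer would see 256 inputs.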


Author(s):  
S. Mahlke ◽  
R. Ravindran ◽  
M. Schlansker ◽  
R. Schreiber ◽  
T. Sherwood

2021 ◽  
Vol 7 ◽  
pp. e330
Author(s):  
Massimiliano Fasi ◽  
Nicholas J. Higham ◽  
Mantas Mikaitis ◽  
Srikara Pranesh

We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4, and Ampere A100 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: (1) accurately simulate NVIDIA tensor cores on conventional hardware; (2) understand the differences between results produced by code that uses tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and (3) build custom hardware whose behavior matches that of NVIDIA tensor cores. As part of this work we provide a test suite that can be easily adapted to test newer versions of the NVIDIA tensor cores, as well as similar accelerators from other vendors as they become available. Moreover, we identify a non-monotonicity issue affecting floating-point multi-operand adders if the intermediate results are not normalized after each step.
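The flavor of such a numerical experiment can be sketched on a CPU. The example below is not taken from the authors' test suite and does not run on tensor cores: it simulates, in plain Python, a dot product whose inputs are binary16 values and whose accumulator is kept in either binary16 or binary32. The helper names (`round_fp16`, `round_fp32`, `dot`) are hypothetical. The inputs are chosen so that the two accumulator precisions give different answers, which is how a probe of this kind can reveal the precision of intermediate results.

```python
import struct


def round_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product that rounds every product and partial sum with round_acc."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_acc(acc + round_acc(x * y))
    return acc


# Both inputs are exactly representable in binary16.
a = [1.0, 2.0 ** -12]
b = [1.0, 1.0]

# binary16 has spacing 2**-10 just above 1.0, so adding 2**-12 is rounded away.
result_fp16_acc = dot(a, b, round_fp16)

# binary32 has a 24-bit significand, so 1 + 2**-12 is kept exactly.
result_fp32_acc = dot(a, b, round_fp32)
```

Here `result_fp16_acc` is exactly 1.0 while `result_fp32_acc` is 1 + 2^-12. Observing which of the two a real device returns for the same inputs would indicate which accumulator precision it uses, under the assumption that it rounds to nearest.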


1989 ◽  
Vol 6 (3) ◽  
pp. 77
Author(s):  
A.P. Ambler

Information ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 14
Author(s):  
Aluizio Rocha Neto ◽  
Thiago P. Silva ◽  
Thais Batista ◽  
Flávia C. Delicato ◽  
Paulo F. Pires ◽  
...  

In smart city scenarios, the proliferation of monitoring cameras scattered across public spaces has posed many challenges to network and processing infrastructure. A few dozen cameras are enough to saturate the city’s backbone. In addition, most smart city applications require a real-time response from the system in charge of processing such large-scale video streams. Finding a missing person using facial recognition technology is one such application: it requires immediate action at the place where that person is. In this paper, we tackle these challenges by presenting a distributed system for video analytics designed to leverage edge computing capabilities. Our approach encompasses architecture, methods, and algorithms for: (i) dividing the burdensome processing of large-scale video streams into various machine learning tasks; and (ii) deploying these tasks as a workflow of data processing in edge devices equipped with hardware accelerators for neural networks. We also propose the reuse of nodes running tasks shared by multiple applications, e.g., facial recognition, thus improving the system’s processing throughput. Simulations showed that, with our algorithm to distribute the workload, a workflow is processed about 33% faster than with a naive approach.
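The reuse of nodes for tasks shared by several applications can be illustrated with a small sketch. This is not the authors' distribution algorithm, which the abstract does not describe: it is a simple greedy least-loaded placement, and the task names, cost values, node names, and the `plan` function are all hypothetical. The point it shows is that a task appearing in more than one workflow is placed once and then reused.

```python
def plan(workflows, nodes):
    """Place each distinct task on the least-loaded node.

    workflows: mapping of application name -> list of (task, cost) pairs
    nodes: list of edge node names
    Returns (placement, load), where placement maps task -> node.
    """
    load = {node: 0 for node in nodes}
    placement = {}
    for tasks in workflows.values():
        for task, cost in tasks:
            if task in placement:
                # Task already runs somewhere: reuse that node
                continue
            # Ties are broken by the order of `nodes`
            node = min(nodes, key=lambda n: load[n])
            placement[task] = node
            load[node] += cost
    return placement, load


# Two hypothetical applications sharing video decoding and face detection
workflows = {
    "missing_person": [("decode", 2), ("face_detect", 3), ("face_match", 4)],
    "access_control": [("decode", 2), ("face_detect", 3), ("badge_check", 1)],
}
placement, load = plan(workflows, ["edge-a", "edge-b"])
```

With these inputs, `decode` and `face_detect` are each placed once even though both applications need them, so the total load is 10 cost units rather than 15.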


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 391
Author(s):  
Luca Bigazzi ◽  
Stefano Gherardini ◽  
Giacomo Innocenti ◽  
Michele Basso

In this paper, solutions for precise maneuvering of small (e.g., 350-class) autonomous Unmanned Aerial Vehicles (UAVs) are designed and implemented through smart modifications of inexpensive mass-market technologies. The considered class of vehicles has a limited payload, and therefore only a limited number of sensors and computing devices can be installed on board. To make the prototype capable of moving autonomously along a fixed trajectory, a “cyber-pilot”, able to replace the human operator on demand, has been implemented on an embedded control board. This cyber-pilot overrides the commands by means of a custom hardware signal mixer. The drone is able to localize itself in the environment without ground assistance by using a camera, possibly mounted on a 3 Degrees Of Freedom (DOF) gimbal suspension. A computer vision system processes the video stream, identifying land markers with known absolute position and orientation. This information is fused with accelerations from a 6-DOF Inertial Measurement Unit (IMU) to generate a “virtual sensor” which provides refined estimates of the pose, the absolute position, the speed and the angular velocities of the drone. Due to the importance of this sensor, several fusion strategies have been investigated. The resulting data are finally fed to a control algorithm featuring a number of uncoupled digital PID controllers, which work to drive the displacement from the desired trajectory to zero.
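The last step, a set of uncoupled digital PID controllers, can be sketched as follows. This is a textbook discrete PID form, not the controller from the paper: the gains, the time step, the per-axis dictionary, and the `PID` class itself are assumptions made for illustration. "Uncoupled" is interpreted here as one independent controller per axis, each seeing only its own displacement error.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (gains are illustrative)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float) -> float:
        """Return the control output for the current error sample."""
        self.integral += error * dt                    # rectangular integration
        derivative = (error - self.prev_error) / dt    # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# One independent controller per axis; no axis sees another axis's error
controllers = {axis: PID(kp=2.0, ki=1.0, kd=0.5) for axis in ("x", "y", "z")}

# Hypothetical displacement from the desired trajectory, in meters
displacement = {"x": 1.0, "y": 0.0, "z": -1.0}
dt = 0.5  # seconds between samples (assumed)

first = {a: controllers[a].update(displacement[a], dt) for a in controllers}
second = {a: controllers[a].update(displacement[a], dt) for a in controllers}
```

On the first sample the derivative term reacts to the jump from zero error, so the x-axis output is 3.5; on the second sample the error is unchanged, the derivative term vanishes, and the output is 3.0 with a growing integral contribution.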

