Accelerating Event Detection with DGCNN and FPGAs

Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1666
Author(s):  
Zhe Han ◽  
Jingfei Jiang ◽  
Linbo Qiao ◽  
Yong Dou ◽  
Jinwei Xu ◽  
...  

Recently, Deep Neural Networks (DNNs) have been widely used in natural language processing. However, DNNs are often computation-intensive and memory-expensive, which makes them difficult to deploy in the real world. To address this problem, we proposed a hardware-friendly network model based on the dilated gated convolutional neural network (DGCNN). We further expanded the word representations and the depth of the network to improve the performance of the model. We replaced the Sigmoid function with one better suited to hardware computation, without loss, and we quantized the network weights and activations to compress the network size. We then proposed the first Field-Programmable Gate Array (FPGA)-based event detection accelerator built on this model. Its fully pipelined architecture significantly reduced latency. We implemented the accelerator on the Xilinx XCKU115 FPGA. The experimental results show that our model obtains the highest F1-score of 84.6% on the ACE 2005 corpus. Meanwhile, the accelerator achieved 95.2 giga operations per second (GOP/s) in performance and 13.4 GOP/s/W in energy efficiency, which are 17 and 158 times higher, respectively, than those of the Graphics Processing Unit (GPU).
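
As a rough illustration of the two hardware-oriented steps mentioned in the abstract above, the sketch below pairs a piecewise-linear "hard sigmoid" with symmetric 8-bit quantization. The abstract does not say which function replaced the Sigmoid or which bit width and quantization scheme were used, so both choices, along with the names hard_sigmoid, quantize_symmetric and dequantize, are assumptions made purely for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Reference logistic sigmoid, shown only for comparison."""
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x: float) -> float:
    """Piecewise-linear stand-in: clip(0.25 * x + 0.5, 0, 1).

    Assumed replacement (not taken from the paper). In fixed-point
    hardware it needs only a shift, an add and a clamp.
    """
    return min(1.0, max(0.0, 0.25 * x + 0.5))


def quantize_symmetric(values, bits=8):
    """Map floats to signed integers that share one scale factor.

    Assumed scheme: symmetric, per-tensor, round-to-nearest.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(x, round(sigmoid(x), 4), hard_sigmoid(x))
    weights = [1.0, -0.5, 0.25, 0.03]
    codes, scale = quantize_symmetric(weights)
    print(codes, dequantize(codes, scale))
```

With round-to-nearest, each recovered value differs from the original by at most half a quantization step, which is why this kind of scheme can shrink storage with little numerical change when the value range is well covered.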

Electronics ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 884
Author(s):  
Stefano Rossi ◽  
Enrico Boni

Methods of increasing complexity are currently being proposed for ultrasound (US) echographic signal processing. Graphics Processing Unit (GPU) resources allowing massive exploitation of parallel computing are ideal candidates for these tasks. Many high-performance US instruments, including open scanners like ULA-OP 256, have an architecture based only on Field-Programmable Gate Arrays (FPGAs) and/or Digital Signal Processors (DSPs). This paper proposes the implementation of the embedded NVIDIA Jetson Xavier AGX module on board ULA-OP 256. The system architecture was revised to allow the introduction of a new Peripheral Component Interconnect Express (PCIe) communication channel, while maintaining backward compatibility with all other embedded computing resources already on board. Moreover, the Input/Output (I/O) peripherals of the module make the ultrasound system independent, freeing the user from the need to use an external controlling PC.


Micromachines ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 838
Author(s):  
Dong Hyun Hwang ◽  
Chang Yeop Han ◽  
Hyun Woo Oh ◽  
Seung Eun Lee

Artificial intelligence algorithms need an external computing device such as a graphics processing unit (GPU) because of their computational complexity. To run artificial intelligence algorithms on embedded devices, many studies have proposed lightweight algorithms and dedicated accelerators. In this paper, we propose the ASimOV framework, which optimizes artificial intelligence algorithms and generates Verilog hardware description language (HDL) code for executing them on a field-programmable gate array (FPGA). To verify ASimOV, we explore the performance space of k-NN algorithms and generate Verilog HDL code to demonstrate a k-NN accelerator on an FPGA. Our contribution is an end-to-end pipeline that optimizes an artificial intelligence algorithm for a specific dataset through simulation and then generates the corresponding accelerator.


Author(s):  
Themistoklis Giitsidis ◽  
Nikolaos I Dourvas ◽  
Georgios Ch Sirakoulis

In this paper we present a model based on the parallel computational tool of cellular automata (CA) capable of simulating the process of disembarking in a small airplane seat layout, corresponding to the Airbus A320/Boeing 737 layout, in search of ways to make it faster and safer under normal evacuation conditions as well as in emergency scenarios. The proposed model is highly customizable: its parameters include the number of exits, the walking speed of passengers (which depends on their sex, age and height), and the effects of retrieving and carrying luggage. Additionally, the presence of obstacles in the aisles and the emergence of panic are considered in detail as parameters whose values can be varied to shed light on the disembarking and emergency evacuation processes. The simulation results were compared to existing aircraft disembarking and evacuation times and indicate the efficacy of the proposed model in investigating and revealing passenger attributes during these processes in all the examined cases. Moreover, we parallelized our code to run on a graphics processing unit (GPU) using CUDA, speeding up the simulation process. Finally, in order to present a fully dynamical anticipative real-time system helpful for decision-making, we implemented the proposed CA model in a field programmable gate array (FPGA) device and recreated the results given by the software simulations in a fraction of the time. We then compared the performance results of a sequential software implementation, the implementation running on a GPU, and the hardware implementation, demonstrating the acceleration that results from the parallel CA implementation in specific hardware.
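
To give a feel for the kind of update rule a cellular automaton applies, here is a deliberately tiny sketch of a single-file aisle with one exit. It is not the authors' model, which is described as covering a full seat layout, several exits, passenger attributes, luggage, obstacles and panic. The function names step and disembark_steps, and the rule itself, are hypothetical.

```python
def step(aisle):
    """Apply one synchronous update to a single-file aisle.

    aisle[i] is True when cell i holds a passenger; the exit is at index 0.
    Every cell is updated from the same old state, which is what makes the
    rule easy to run in parallel on a GPU or in FPGA logic.
    """
    new = [False] * len(aisle)
    for i, occupied in enumerate(aisle):
        if not occupied:
            continue
        if i == 0:
            continue              # passenger at the exit leaves the aircraft
        if not aisle[i - 1]:
            new[i - 1] = True     # cell ahead was free: advance one cell
        else:
            new[i] = True         # cell ahead was occupied: wait in place
    return new


def disembark_steps(aisle):
    """Count the updates needed until the aisle is empty."""
    steps = 0
    while any(aisle):
        aisle = step(aisle)
        steps += 1
    return steps


if __name__ == "__main__":
    state = [True, True, False, True]
    while any(state):
        print("".join("P" if cell else "." for cell in state))
        state = step(state)
```

Because a passenger only moves into a cell that was empty at the start of the step, no two passengers can ever claim the same cell, so the update needs no conflict resolution.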


Electronics ◽  
2019 ◽  
Vol 8 (3) ◽  
pp. 281
Author(s):  
Bing Liu ◽  
Danyin Zou ◽  
Lei Feng ◽  
Shou Feng ◽  
Ping Fu ◽  
...  

The Convolutional Neural Network (CNN) has been used in many fields, such as image classification, face detection, and speech recognition, and has achieved remarkable results. Compared to GPU (graphics processing unit) and ASIC implementations, an FPGA (field programmable gate array)-based CNN accelerator has great advantages due to its low power consumption and reconfigurability. However, the FPGA’s extremely limited resources and the CNN’s large number of parameters and high computational complexity pose great challenges to the design. Based on the ZYNQ heterogeneous platform, and using the roofline model to balance resource and bandwidth constraints, the CNN accelerator we designed can accelerate both standard convolution and depthwise separable convolution with high hardware resource utilization. The accelerator can handle network layers of different scales through parameter configuration, and it maximizes bandwidth and achieves full pipelining by using a data stream interface and a ping-pong on-chip cache. The experimental results show that the accelerator designed in this paper achieves 17.11 GOPS for 32-bit floating point while also supporting depthwise separable convolution, which gives it clear advantages over other designs.
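
The two ideas named in the abstract above can be made concrete with a little arithmetic. The sketch below counts multiply-accumulate operations for a standard and a depthwise separable convolution layer and evaluates the textbook roofline bound, min(peak compute, bandwidth × operational intensity). The function names and every number in the usage section are hypothetical and are not taken from the paper.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depthwise pass followed by a 1 x 1 pointwise pass."""
    depthwise = h * w * c_in * k * k      # one k x k filter per input channel
    pointwise = h * w * c_in * c_out      # 1 x 1 convolution mixes channels
    return depthwise + pointwise


def roofline_gops(peak_gops, bandwidth_gb_s, ops_per_byte):
    """Attainable throughput under the roofline model.

    Performance is capped either by the compute peak or by how fast
    memory can feed the compute units, whichever is lower.
    """
    return min(peak_gops, bandwidth_gb_s * ops_per_byte)


if __name__ == "__main__":
    # Hypothetical layer: 56 x 56 output, 64 -> 128 channels, 3 x 3 kernel.
    std = standard_conv_macs(56, 56, 64, 128, 3)
    sep = depthwise_separable_macs(56, 56, 64, 128, 3)
    print("standard MACs:", std)
    print("separable MACs:", sep)
    print("ratio:", sep / std)            # equals 1/c_out + 1/k**2

    # Hypothetical platform: 100 GOPS peak, 4 GB/s off-chip bandwidth.
    for intensity in (2.0, 10.0, 50.0):
        print(intensity, roofline_gops(100.0, 4.0, intensity))
```

The ratio of separable to standard cost simplifies to 1/c_out + 1/k², so the separable form does far less arithmetic per byte moved. That is one reason an accelerator serving both layer types has to balance compute resources against bandwidth.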


2007 ◽  
Author(s):  
Fredrick H. Rothganger ◽  
Kurt W. Larson ◽  
Antonio Ignacio Gonzales ◽  
Daniel S. Myers

2021 ◽  
Vol 22 (10) ◽  
pp. 5212
Author(s):  
Andrzej Bak

A key question confronting computational chemists concerns the preferable ligand geometry that fits complementarily into the receptor pocket. Typically, the postulated ‘bioactive’ 3D ligand conformation is constructed as a ‘sophisticated guess’ (unnecessarily geometry-optimized) mirroring the pharmacophore hypothesis, sometimes based on an erroneous prerequisite. Hence, the 4D-QSAR scheme and its ‘dialects’ have been practically implemented as a higher level of model abstraction that allows the examination of multiple molecular conformations, orientations and protonation representations. Nearly a quarter of a century has passed since the eminent work of Hopfinger appeared on the stage; therefore, the natural question arises whether the 4D-QSAR approach is still appealing to the scientific community. With no intention to be comprehensive, a review of the current state of the art in the field of receptor-independent (RI) and receptor-dependent (RD) 4D-QSAR methodology is provided, with a brief examination of the ‘mainstream’ algorithms. In fact, a myriad of 4D-QSAR methods have been implemented and applied practically for a diverse range of molecules. It seems that the 4D-QSAR approach has been experiencing a promising renaissance of interest that might be fuelled by the rising power of graphics processing unit (GPU) clusters applied to full-atom MD-based simulations of protein-ligand complexes.

