Accelerating Event Detection with DGCNN and FPGAs

Zhe Han; Jingfei Jiang; Linbo Qiao; Yong Dou; Jinwei Xu; Zhigang Kan

doi:10.3390/electronics9101666

Electronics ◽

10.3390/electronics9101666 ◽

2020 ◽

Vol 9 (10) ◽

pp. 1666

Author(s):

Zhe Han ◽

Jingfei Jiang ◽

Linbo Qiao ◽

Yong Dou ◽

Jinwei Xu ◽

...

Keyword(s):

Language Processing ◽

Event Detection ◽

Graphics Processing Unit ◽

Network Size ◽

Processing Unit ◽

Sigmoid Function ◽

Pipelined Architecture ◽

Proposed Model ◽

Field Programmable ◽

Graphics Processing

Recently, Deep Neural Networks (DNNs) have been widely used in natural language processing. However, DNNs are often computation-intensive and memory-expensive, which makes deploying them in the real world very difficult. To solve this problem, we proposed a network model based on the dilated gated convolutional neural network (DGCNN), which is very hardware-friendly. We further expanded the word representations and the depth of the network to improve the performance of the model. We replaced the Sigmoid function to make it more friendly for hardware computation without loss, and we quantized the network weights and activations to compress the network size. We then proposed the first FPGA (Field Programmable Gate Array)-based event detection accelerator, built on the proposed model. With its fully pipelined architecture, the accelerator significantly reduced latency. We implemented the accelerator on the Xilinx XCKU115 FPGA. The experimental results show that our model obtains the highest F1-score of 84.6% on the ACE 2005 corpus. Meanwhile, the accelerator achieved 95.2 giga operations per second (GOP/s) in performance and 13.4 GOP/s/W in energy efficiency, which are 17 and 158 times higher, respectively, than those of the Graphics Processing Unit (GPU).
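The two compression steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which function replaces the Sigmoid or which quantization scheme is used, so everything here is an assumption: `hard_sigmoid` is a common piecewise-linear stand-in (an approximation, not necessarily the authors' replacement), and `quantize` is a plain symmetric uniform quantizer with a hypothetical 8-bit default.

```python
import math


def sigmoid(x: float) -> float:
    """Reference logistic function; needs an exponential and a division."""
    return 1.0 / (1.0 + math.exp(-x))


def hard_sigmoid(x: float) -> float:
    """Piecewise-linear stand-in: clamp(0.25 * x + 0.5, 0, 1).

    Assumption: this is one common hardware-friendly substitute, not
    necessarily the function used in the paper. The multiply by 0.25 can
    be a 2-bit right shift in fixed-point hardware.
    """
    return min(1.0, max(0.0, 0.25 * x + 0.5))


def quantize(values, num_bits=8):
    """Symmetric uniform quantization to signed integers.

    Returns the integer codes and the scale needed to map them back.
    """
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(x, round(sigmoid(x), 4), hard_sigmoid(x))
    weights = [0.5, -1.0, 0.25, 0.03]
    codes, scale = quantize(weights)
    print(codes, dequantize(codes, scale))
```

In this sketch the quantization error per value is bounded by half the scale, which is the usual trade-off between bit width and accuracy that such a compression step has to manage.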

Embedded GPU Implementation for High-Performance Ultrasound Imaging

Electronics ◽

10.3390/electronics10080884 ◽

2021 ◽

Vol 10 (8) ◽

pp. 884

Author(s):

Stefano Rossi ◽

Enrico Boni

Keyword(s):

High Performance ◽

Graphics Processing Unit ◽

Digital Signal ◽

Processing Unit ◽

Embedded Computing ◽

Field Programmable ◽

Peripheral Component Interconnect ◽

Programmable Gate Arrays ◽

Graphics Processing ◽

Signal Processors

Methods of increasing complexity are currently being proposed for ultrasound (US) echographic signal processing. Graphics Processing Unit (GPU) resources allowing massive exploitation of parallel computing are ideal candidates for these tasks. Many high-performance US instruments, including open scanners like ULA-OP 256, have an architecture based only on Field-Programmable Gate Arrays (FPGAs) and/or Digital Signal Processors (DSPs). This paper proposes the implementation of the embedded NVIDIA Jetson Xavier AGX module on board ULA-OP 256. The system architecture was revised to allow the introduction of a new Peripheral Component Interconnect Express (PCIe) communication channel, while maintaining backward compatibility with all other embedded computing resources already on board. Moreover, the Input/Output (I/O) peripherals of the module make the ultrasound system independent, freeing the user from the need to use an external controlling PC.

ASimOV: A Framework for Simulation and Optimization of an Embedded AI Accelerator

Micromachines ◽

10.3390/mi12070838 ◽

2021 ◽

Vol 12 (7) ◽

pp. 838

Author(s):

Dong Hyun Hwang ◽

Chang Yeop Han ◽

Hyun Woo Oh ◽

Seung Eun Lee

Keyword(s):

Artificial Intelligence ◽

Graphics Processing Unit ◽

Processing Unit ◽

Verilog HDL ◽

Computing Device ◽

Simulation And Optimization ◽

Performance Space ◽

Field Programmable ◽

Hardware Description ◽

Graphics Processing

Artificial intelligence algorithms typically need an external computing device such as a graphics processing unit (GPU) because of their computational complexity. To run artificial intelligence algorithms on an embedded device, many studies have proposed lightweight artificial intelligence algorithms and artificial intelligence accelerators. In this paper, we propose the ASimOV framework, which optimizes artificial intelligence algorithms and generates Verilog hardware description language (HDL) code for executing them on a field programmable gate array (FPGA). To verify ASimOV, we explore the performance space of k-NN algorithms and generate Verilog HDL code to demonstrate the k-NN accelerator on an FPGA. Our contribution is to provide the artificial intelligence algorithm as an end-to-end pipeline, to ensure through simulation that it is optimized for a specific dataset, and to generate an artificial intelligence accelerator at the end.
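For readers unfamiliar with the algorithm used to verify the framework, the following is a minimal Python sketch of k-NN classification. It is not ASimOV code: the function name `knn_predict`, the squared Euclidean distance, and the majority vote are illustrative assumptions, and the abstract does not specify which k-NN variant or distance metric the framework explores.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs. Squared Euclidean
    distance is used, which preserves the ordering of true Euclidean
    distance while avoiding a square root.
    """
    def sq_dist(item):
        features, _ = item
        return sum((a - b) ** 2 for a, b in zip(features, query))

    nearest = sorted(train, key=sq_dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [((0, 0), "a"), ((0, 1), "a"),
            ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
    print(knn_predict(data, (0.2, 0.3)))
    print(knn_predict(data, (5.0, 5.0)))
```

The distance computations for different training points are independent of each other, which is one reason k-NN maps naturally onto parallel hardware.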

Parallel implementation of aircraft disembarking and emergency evacuation based on cellular automata

The International Journal of High Performance Computing Applications ◽

10.1177/1094342015584533 ◽

2016 ◽

Vol 31 (2) ◽

pp. 134-151 ◽

Cited By ~ 12

Author(s):

Themistoklis Giitsidis ◽

Nikolaos I Dourvas ◽

Georgios Ch Sirakoulis

Keyword(s):

Cellular Automata ◽

Graphics Processing Unit ◽

Parallel Implementation ◽

Emergency Evacuation ◽

Processing Unit ◽

Real Time System ◽

Proposed Model ◽

CUDA Programming ◽

Field Programmable ◽

Emergency Scenarios

In this paper we present a model based on the parallel computational tool of cellular automata (CA) capable of simulating the process of disembarking in a small airplane seat layout, corresponding to the Airbus A320/Boeing 737 layout, in search of ways to make it faster and safer under normal evacuation conditions as well as in emergency scenarios. The proposed model is highly customizable: the number of exits, the walking speed of passengers (depending on their sex, age and height), and the effects of retrieving and carrying luggage can all be adjusted. Additionally, the presence of obstacles in the aisles and the emergence of panic are considered in detail as parameters whose values can be varied in order to shed light on the disembarking and emergency evacuation processes. The simulation results were compared to existing aircraft disembarking and evacuation times and indicate the efficacy of the proposed model in investigating and revealing passenger attributes during these processes in all the examined cases. Moreover, we parallelized our code to run on a graphics processing unit (GPU) using the CUDA programming language, speeding up the simulation process. Finally, in order to present a fully dynamical anticipative real-time system helpful for decision-making, we implemented the proposed CA model in a field programmable gate array (FPGA) device and reproduced the results given by the software simulations in a fraction of the time. We then exported and compared the performance results of a sequential software implementation, the implementation running on a GPU, and a hardware implementation, demonstrating the acceleration that results from the parallel CA implementation in specific hardware.
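To show why a cellular automaton lends itself to parallel hardware, here is a deliberately tiny Python sketch. It is a toy one-dimensional aisle, not the paper's model: the single exit at index 0, the rule "advance one cell if the cell ahead is empty", and the names `step` and `steps_to_empty` are all assumptions made for illustration, and none of the paper's parameters (walking speed, luggage, obstacles, panic) are represented.

```python
def step(aisle):
    """One synchronous CA update of a 1D aisle.

    aisle[i] is 1 if a passenger occupies cell i, else 0. Cell 0 is next
    to the exit. Every cell's new state depends only on the current state
    of itself and its neighbours, so all cells could be updated in parallel.
    """
    nxt = [0] * len(aisle)
    for i, occupied in enumerate(aisle):
        if not occupied:
            continue
        if i == 0:
            continue              # passenger at the exit leaves the aircraft
        if aisle[i - 1] == 0:
            nxt[i - 1] = 1        # cell ahead is free: advance one cell
        else:
            nxt[i] = 1            # blocked: wait in place this step
    return nxt


def steps_to_empty(aisle):
    """Count synchronous updates until no passenger remains."""
    steps = 0
    while any(aisle):
        aisle = step(aisle)
        steps += 1
    return steps


if __name__ == "__main__":
    state = [1, 0, 1, 1]
    while any(state):
        print(state)
        state = step(state)
    print(state)
```

Because the update reads only the old state and writes only the new one, each cell can be assigned to its own GPU thread or its own block of FPGA logic, which is the property the parallel implementations described above rely on.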

An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution

Electronics ◽

10.3390/electronics8030281 ◽

2019 ◽

Vol 8 (3) ◽

pp. 281 ◽

Cited By ~ 11

Author(s):

Bing Liu ◽

Danyin Zou ◽

Lei Feng ◽

Shou Feng ◽

Ping Fu ◽

...

Keyword(s):

Graphics Processing Unit ◽

Processing Unit ◽

Network Layers ◽

Ping Pong ◽

Parameter Configuration ◽

Field Programmable ◽

Hardware Resource ◽

Roofline Model ◽

On Chip ◽

Graphics Processing

The Convolutional Neural Network (CNN) has been used in many fields, such as image classification, face detection, and speech recognition, and has achieved remarkable results. Compared to GPU (graphics processing unit) and ASIC designs, an FPGA (field programmable gate array)-based CNN accelerator has great advantages due to its low power consumption and reconfigurability. However, the FPGA's extremely limited resources and the CNN's large number of parameters and high computational complexity pose great challenges to the design. Based on the ZYNQ heterogeneous platform, and using the roofline model to balance resource and bandwidth constraints, the CNN accelerator we designed can accelerate both standard convolution and depthwise separable convolution with high hardware resource utilization. The accelerator can handle network layers of different scales through parameter configuration, and it maximizes bandwidth and achieves full pipelining by using a data stream interface and a ping-pong on-chip cache. The experimental results show that the accelerator designed in this paper achieves 17.11 GOPS for 32-bit floating point while also supporting depthwise separable convolution, which is an obvious advantage over other designs.
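Two ideas in this abstract reduce to short formulas, sketched below in Python. The function names and the example layer sizes are hypothetical and not taken from the paper. The multiply-accumulate (MAC) counts assume stride 1 and an output feature map of size H x W; the roofline bound is the standard min(peak compute, bandwidth x operational intensity).

```python
def conv_macs(k, c_in, c_out, h, w):
    """MACs for a standard k x k convolution producing an h x w output."""
    return k * k * c_in * c_out * h * w


def separable_macs(k, c_in, c_out, h, w):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in * h * w      # one k x k filter per input channel
    pointwise = c_in * c_out * h * w      # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


def attainable_gops(peak_gops, bandwidth_gbytes_per_s, ops_per_byte):
    """Roofline bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_gops, bandwidth_gbytes_per_s * ops_per_byte)


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 32 -> 64 channels, 56x56 output.
    std = conv_macs(3, 32, 64, 56, 56)
    sep = separable_macs(3, 32, 64, 56, 56)
    print(std, sep, sep / std)            # ratio equals 1/c_out + 1/k**2
    print(attainable_gops(100.0, 10.0, 2.0))
    print(attainable_gops(100.0, 10.0, 50.0))
```

The ratio of the two MAC counts simplifies to 1/c_out + 1/k^2, so a 3x3 separable layer needs roughly one eighth to one ninth of the arithmetic of its standard counterpart. Less arithmetic per byte moved also lowers the operational intensity, which is why a bandwidth-aware analysis such as the roofline model matters for this kind of layer.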

Fast iterative solvers for large compressed-sparse row linear systems on graphics processing unit

Pollack Periodica ◽

10.1556/pollack.10.2015.1.1 ◽

2015 ◽

Vol 10 (1) ◽

pp. 3-18 ◽

Cited By ~ 1

Author(s):

Frédéric Magoulès ◽

Abal-Kassim Cheik Ahamed ◽

Roman Putanowicz

Keyword(s):

Linear Systems ◽

Graphics Processing Unit ◽

Iterative Solvers ◽

Processing Unit ◽

Compressed Sparse Row ◽

Graphics Processing

Performance Analysis and Optimization of Graphics Processing Unit

SSRN Electronic Journal ◽

10.2139/ssrn.3350249 ◽

2019 ◽

Author(s):

Lokendra Singh Umrao ◽

Jay Prakash Pandey

Keyword(s):

Performance Analysis ◽

Graphics Processing Unit ◽

Processing Unit ◽

Graphics Processing

Implementing wide baseline matching algorithms on a graphics processing unit.

10.2172/921737 ◽

2007 ◽

Author(s):

Fredrick H. Rothganger ◽

Kurt W. Larson ◽

Antonio Ignacio Gonzales ◽

Daniel S. Myers

Keyword(s):

Graphics Processing Unit ◽

Processing Unit ◽

Wide Baseline Matching ◽

Graphics Processing

Two Decades of 4D-QSAR: A Dying Art or Staging a Comeback?

International Journal of Molecular Sciences ◽

10.3390/ijms22105212 ◽

2021 ◽

Vol 22 (10) ◽

pp. 5212

Author(s):

Andrzej Bak

Keyword(s):

Molecular Conformation ◽

Graphics Processing Unit ◽

Processing Unit ◽

Diverse Range ◽

Current State ◽

GPU Clusters ◽

Pharmacophore Hypothesis ◽

Rising Power ◽

Graphics Processing ◽

Ligand Conformation

A key question confronting computational chemists concerns the preferable ligand geometry that fits complementarily into the receptor pocket. Typically, the postulated ‘bioactive’ 3D ligand conformation is constructed as a ‘sophisticated guess’ (unnecessarily geometry-optimized) mirroring the pharmacophore hypothesis, sometimes based on an erroneous prerequisite. Hence, the 4D-QSAR scheme and its ‘dialects’ have been practically implemented as a higher level of model abstraction that allows the examination of multiple molecular conformations, orientations and protonation representations. Nearly a quarter of a century has passed since the eminent work of Hopfinger appeared on the stage; therefore, the natural question arises whether the 4D-QSAR approach is still appealing to the scientific community. With no intention to be comprehensive, a review of the current state of the art in the field of receptor-independent (RI) and receptor-dependent (RD) 4D-QSAR methodology is provided with a brief examination of the ‘mainstream’ algorithms. In fact, a myriad of 4D-QSAR methods have been implemented and applied practically for a diverse range of molecules. It seems that the 4D-QSAR approach has been experiencing a promising renaissance of interest that might be fuelled by the rising power of graphics processing unit (GPU) clusters applied to full-atom MD-based simulations of protein-ligand complexes.

Parallelization of Global Sequence Alignment on Graphics Processing Unit

2020 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI) ◽

10.1109/ccci49893.2020.9256747 ◽

2020 ◽

Author(s):

Kailash W. Kalare ◽

Mohammad S. Obaidat ◽

Jitendra V. Tembhurne ◽

Chandrashekhar Meshram ◽

Kuei-Fang Hsiao

Keyword(s):

Sequence Alignment ◽

Graphics Processing Unit ◽

Processing Unit ◽

Graphics Processing

Graphics processing unit acceleration of the island model genetic algorithm using the CUDA programming platform

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.6286 ◽

2021 ◽

Author(s):

Dylan M. Janssen ◽

Wayne Pullan ◽

Alan Wee‐Chung Liew

Keyword(s):

Genetic Algorithm ◽

Graphics Processing Unit ◽

Island Model ◽

Processing Unit ◽

CUDA Programming ◽

Graphics Processing
