AERO: A 1.28 MOP/s/LUT Reconfigurable Inference Processor for Recurrent Neural Networks in a Resource-Limited FPGA

Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1249
Author(s):  
Jinwon Kim ◽  
Jiho Kim ◽  
Tae-Hwan Kim

This study presents AERO, a resource-efficient reconfigurable inference processor for recurrent neural networks (RNNs). AERO is programmable to perform inference on RNN models of various types. Its design is based on an instruction-set architecture specialized for the primitive vector operations that compose the dataflows of RNN models. A versatile vector-processing unit (VPU) performs every vector operation, achieving high resource efficiency. To keep resource usage low, multiplication in the VPU is carried out using an approximation scheme, and the activation functions are realized with reduced lookup tables. We developed a prototype inference system based on AERO using a resource-limited field-programmable gate array, on which the functionality of AERO was verified extensively for inference tasks across several RNN models of different types. The resource efficiency of AERO reaches 1.28 MOP/s/LUT, 1.3 times higher than the previous state-of-the-art result.
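The abstract does not say which approximation scheme the VPU multiplier uses. One common low-cost choice for FPGA multipliers is Mitchell's logarithmic approximation, sketched below purely as an illustration; the function name and the tolerance shown are our assumptions, not details from the paper:

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a*b with Mitchell's logarithmic multiplication.

    log2 of each operand is linearized as k + x, where k is the integer
    part (position of the leading 1 bit) and x the normalized remainder.
    The products of the fractional parts are dropped, so the result is
    always <= the exact product, with a worst-case error of about 11%.
    """
    if a == 0 or b == 0:
        return 0
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    x1 = a / (1 << k1) - 1.0  # fractional part of log2(a), linearized
    x2 = b / (1 << k2) - 1.0  # fractional part of log2(b), linearized
    if x1 + x2 < 1.0:
        return round((1 << (k1 + k2)) * (1.0 + x1 + x2))
    return round((1 << (k1 + k2 + 1)) * (x1 + x2))
```

Powers of two multiply exactly (the fractional parts are zero), which is why log-based approximate multipliers are attractive on LUT-limited devices.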

2020 ◽  
Vol 6 (2) ◽  
Author(s):  
Dmitry Amelin ◽  
Ivan Potapov ◽  
Josep Cardona Audí ◽  
Andreas Kogut ◽  
Rüdiger Rupp ◽  
...  

Abstract This paper reports on the evaluation of recurrent and convolutional neural networks as real-time grasp phase classifiers for future control of neuroprostheses for people with high spinal cord injury. A field-programmable gate array was chosen as the implementation platform due to its form factor and its ability to perform the parallel computations specific to the selected neural networks. Three different phases of two grasp patterns, plus the additional open-hand pattern, were predicted from surface electromyography (EMG) signals (i.e., seven classes in total). Across seven healthy subjects, the CNN (convolutional neural network) achieved a mean accuracy of 85.23% (standard deviation 4.77%) at 112 µs per prediction, and the RNN (recurrent neural network) 83.30% (standard deviation 4.36%) at 40 µs per prediction.


2021 ◽  
Author(s):  
Rishit Dagli ◽  
Süleyman Eken

Abstract Recent increases in computational power and the development of specialized architectures have made it possible to perform machine learning, especially inference, on the edge. OpenVINO is a toolkit based on convolutional neural networks that facilitates fast-track development of computer vision algorithms and deep learning neural networks into vision applications, and enables their easy heterogeneous execution across hardware platforms. Smart queue management can be key to the success of any sector. In this paper, we focus on edge deployments to make the Smart Queuing System (SQS) accessible to all, providing the ability to run it on inexpensive devices. The queuing system's deep learning algorithms can thus run on pre-existing computers that a retail store, public transportation facility, or factory may already possess, considerably reducing the cost of deploying such a system. SQS demonstrates how to create a video AI solution on the edge. We validate our results by testing it on multiple edge devices, namely a CPU, an integrated edge graphics processing unit (iGPU), a vision processing unit (VPU), and field-programmable gate arrays (FPGAs). Experimental results show that deploying an SQS on the edge is very promising.


2021 ◽  
Author(s):  
A. A. Gladkikh ◽  
M. O. Komakhin ◽  
A. V. Simankov ◽  
D. A. Uzenkov

The main problem in applying recurrent neural networks to the classification of processor architectures is that a plain recurrent network lacks blocks that memorize, and take into account at each subsequent step, the results of earlier steps. To solve this problem, the authors propose a strategy based on the mechanism of gated recurrent units. Each neuron of such a network has a memory cell, which stores the previous state, and several gates. The update gate determines how much information is retained from the previous state and how much is taken from the previous layer. The reset gate determines how much information about previous states is discarded. The purpose of the work is to improve the identification of a processor architecture from the code of executable files built for that processor, by creating methods, algorithms, and programs that are invariant to the constant data (strings, constants, header sections, data sections, padding) contained in executable files. The paper discusses the features of recurrent neural networks using the example of classifying processor architecture from the executable code of compiled binaries. The machine code of various processor architectures used in modern computing is briefly reviewed. Recurrent neural networks are proposed because of their advantages in speed and accuracy on classification problems. It is noted that, to improve the classification results and enable practical use, a larger training sample is needed for each class, and the number of classes should be expanded.
The proposed method, based on a neural network with gated recurrent units, has been implemented in a software package that processes digital data from executable files for various processor architectures, in particular at the initial stage of a security audit of embedded systems, in order to determine the set of technical tools applicable to analysis at subsequent stages. Conclusions are drawn about the measured performance metrics of the algorithm and the possibility of extending its functionality without changing the architecture of the software package.
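The gating mechanism described in this abstract matches the standard gated recurrent unit (GRU). As an illustration only (scalar weights, omitting biases, and not the authors' implementation), one GRU step can be sketched as:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step for a scalar input x and scalar previous state h.

    z (update gate): how much of the old state is kept versus how much
      of the new candidate is blended in.
    r (reset gate): how much of the old state is allowed to influence
      the candidate, i.e. how much past information is discarded.
    """
    z = sigmoid(Wz * x + Uz * h)
    r = sigmoid(Wr * x + Ur * h)
    h_cand = math.tanh(Wh * x + Uh * (r * h))
    return z * h + (1.0 - z) * h_cand
```

With all weights at zero, both gates sit at 0.5 and the candidate is 0, so the state is simply halved each step; training moves the gates away from this neutral point.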


Electronics ◽  
2019 ◽  
Vol 8 (3) ◽  
pp. 295 ◽  
Author(s):  
Min Zhang ◽  
Linpeng Li ◽  
Hai Wang ◽  
Yan Liu ◽  
Hongbo Qin ◽  
...  

Field-programmable gate arrays (FPGAs) are widely considered a promising platform for convolutional neural network (CNN) acceleration. However, the large number of parameters in CNNs places heavy computing and memory burdens on FPGA-based implementations. To solve this problem, this paper proposes an optimized compression strategy and realizes an FPGA-based accelerator for CNNs. Firstly, a reversed-pruning strategy is proposed that reduces the number of parameters of AlexNet by a factor of 13× without accuracy loss on the ImageNet dataset; peak-pruning is further introduced to achieve better compressibility, and quantization yields another 4× reduction with negligible loss of accuracy. Secondly, efficient storage techniques, which aim to reduce the overall cache overhead of the convolutional and fully connected layers, are presented. Finally, the effectiveness of the proposed strategy is verified with an accelerator implemented on a Xilinx ZCU104 evaluation board. By improving existing pruning techniques and the storage format of sparse data, we significantly reduce the size of AlexNet by 28×, from 243 MB to 8.7 MB. In addition, our accelerator achieves an overall performance of 9.73 fps for the compressed AlexNet. Compared with central processing unit (CPU) and graphics processing unit (GPU) platforms, our implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, with 822.0× and 15.8× improvements in energy efficiency, respectively. This compression strategy provides a reference for other neural network applications, including CNNs, long short-term memory (LSTM), and recurrent neural networks (RNNs).
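The abstract does not detail the reversed-pruning or peak-pruning rules (and the realized 28× total is smaller than the raw 13× × 4× product, plausibly because storing sparse indices adds overhead). As a generic illustration of the underlying idea only, a one-shot magnitude-pruning pass over a flat weight list might look like this; the function name and threshold rule are our assumptions:

```python
def prune_by_magnitude(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of weights.

    Classic one-shot magnitude pruning: sort by |w|, pick the cutoff so
    that `sparsity * len(weights)` entries fall at or below it, and zero
    them. Real pipelines then fine-tune to recover any lost accuracy.
    """
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

The zeroed entries need not be stored densely, which is where a compact sparse storage format (as in the paper) turns sparsity into an actual memory saving.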


Author(s):  
Shiwei Liu ◽  
Iftitahu Ni’mah ◽  
Vlado Menkovski ◽  
Decebal Constantin Mocanu ◽  
Mykola Pechenizkiy

Abstract Recurrent neural networks (RNNs) have achieved state-of-the-art performance on various applications. However, RNNs tend to be memory-bandwidth limited in practice and require long training and inference times. These problems are at odds with training and deploying RNNs on resource-limited devices, where the memory and floating-point operations (FLOPs) budgets are strictly constrained. Conventional model compression techniques usually focus on reducing inference costs and operate on a costly pre-trained model. Recently, dynamic sparse training has been proposed to accelerate the training process by training sparse neural networks directly from scratch. However, previous sparse training techniques are mainly designed for convolutional neural networks and multi-layer perceptrons. In this paper, we introduce a method to train intrinsically sparse RNN models with a fixed number of parameters and FLOPs throughout training. We demonstrate state-of-the-art sparse performance with long short-term memory and recurrent highway networks on widely used tasks: language modeling and text classification. We use these results to argue that, contrary to the general belief that training a sparse neural network from scratch leads to worse performance than a dense network, sparse training with adaptive connectivity can usually achieve better performance than dense models for RNNs.
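The abstract does not give the authors' exact connectivity-update rule. A minimal sketch in the spirit of dynamic sparse training (a SET-style prune-and-regrow step that keeps the parameter count, and hence the FLOPs budget, fixed) could look like the following; the names, the magnitude-based pruning criterion, and the uniform regrowth distribution are all our assumptions:

```python
import random

def prune_and_regrow(weights: list[float], frac: float, seed: int = 0) -> list[float]:
    """One SET-style topology update on a flat weight list (in place).

    Drops the weakest `frac` of the currently active (nonzero)
    connections, then regrows the same number at randomly chosen
    inactive positions with small random values, so the number of
    active parameters is unchanged after the update.
    """
    rng = random.Random(seed)
    active = [i for i, w in enumerate(weights) if w != 0.0]
    k = max(1, int(len(active) * frac))
    # Prune: zero the k active connections with the smallest magnitude.
    for i in sorted(active, key=lambda i: abs(weights[i]))[:k]:
        weights[i] = 0.0
    # Regrow: activate k currently-zero positions with small init values.
    inactive = [i for i, w in enumerate(weights) if w == 0.0]
    for i in rng.sample(inactive, k):
        weights[i] = rng.uniform(-0.01, 0.01)
    return weights
```

Repeating this between training epochs lets the network search over sparse topologies without ever instantiating the dense model.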


SPE Journal ◽  
2021 ◽  
pp. 1-21
Author(s):  
Dung T. Phan ◽  
Chao Liu ◽  
Murtadha J. AlTammar ◽  
Yanhui Han ◽  
Younane N. Abousleiman

Summary Selection of a safe mud weight is crucial in drilling operations to reduce costly wellbore-instability problems. Advanced physics models and their analytical solutions for mud-weight-window computation are available but still demanding in terms of central-processing-unit (CPU) time. This paper presents an artificial-intelligence (AI) solution for predicting time-dependent safe mud-weight windows and very refined polar charts in real time. The AI agents are trained and tested on data generated from a time-dependent coupled (poroelastic) analytical solution, because numerical solutions are prohibitively slow. Different AI techniques, including linear regression, decision trees, random forests, extra trees, the adaptive neuro-fuzzy inference system (ANFIS), and neural networks, are evaluated to select the most suitable one. The results show that neural networks perform best and are capable of predicting time-dependent mud-weight windows and polar charts as accurately as the analytical solution with 1/1,000 of the computer time, making them very applicable to real-time drilling operations. The trained neural networks achieve a mean squared error (MSE) of 0.0352 and a coefficient of determination (R2) of 0.9984 for collapse mud weights, and an MSE of 0.0072 and an R2 of 0.9998 for fracturing mud weights, on test data sets. The neural networks are statistically guaranteed to predict mud weights within 5% and 10% of the analytical solutions with probabilities up to 0.986 and 0.997, respectively, for collapse mud weights, and up to 0.9992 and 0.9998, respectively, for fracturing mud weights. Their time performance is significantly faster and less demanding in computing capacity than the analytical solution, consistently showing three-orders-of-magnitude speedups in computational speed tests. 
The AI solution is integrated into a deployed wellbore-stability analyzer, which is used to demonstrate the AI’s performance and advantages through three case studies.
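The reported MSE and R2 figures follow the standard definitions. For reference, a minimal sketch of both metrics (not the authors' evaluation code) is:

```python
def mse(y_true: list[float], y_pred: list[float]) -> float:
    """Mean squared error: average of squared prediction residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 means perfect prediction; 0.0 means no better than predicting
    the mean of y_true.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

An R2 of 0.9984, as reported for collapse mud weights, means the network leaves only 0.16% of the variance in the analytical solution unexplained.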

