Rethinking arithmetic for deep neural networks

We consider efficiency in the implementation of deep neural networks. Hardware accelerators are gaining interest as machine learning becomes one of the drivers of high-performance computing. In these accelerators, the directed graph describing a neural network can be implemented as a directed graph describing a Boolean circuit. We make this observation precise, leading naturally to an understanding of practical neural networks as discrete functions, and show that the so-called binarized neural networks are functionally complete. In general, our results suggest that it is valuable to consider Boolean circuits as neural networks, leading to the question of which circuit topologies are promising. We argue that continuity is central to generalization in learning, explore the interaction between data coding, network topology, and node functionality for continuity, and pose some open questions for future research. As a first step to bridging the gap between continuous and Boolean views of neural network accelerators, we present some recent results from our work on LUTNet, a novel Field-Programmable Gate Array inference approach. Finally, we conclude with additional possible fruitful avenues for research bridging the continuous and discrete views of neural networks. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
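
The functional completeness of binarized neural networks can be illustrated with a minimal sketch. It assumes a neuron whose inputs and weights lie in {-1, +1}, a sign activation, +1 read as "true", and a constant +1 input standing in for a bias. These encodings and all function names are assumptions made for illustration and are not taken from the article. Under them, a single neuron computes NAND, from which any other Boolean function can be composed.

```python
from itertools import product


def binarized_neuron(inputs, weights):
    """Sign of the dot product of +/-1 inputs with +/-1 weights."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= 0 else -1


def nand(a, b):
    # The third input is a constant +1 that plays the role of a bias.
    # With three +/-1 terms the sum is always odd, so it is never zero.
    return binarized_neuron((a, b, 1), (-1, -1, 1))


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


# Truth table of the single-neuron NAND over all +/-1 input pairs.
truth_table = {(a, b): nand(a, b) for a, b in product((-1, 1), repeat=2)}
```

Because NAND alone is functionally complete, the derived `not_`, `and_`, and `or_` gates are small networks built from copies of the same neuron.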

A Unified FPGA Virtualization Framework for General-Purpose Deep Neural Networks in the Cloud

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3480170 ◽

2022 ◽

Vol 15 (3) ◽

pp. 1-31

Author(s):

Shulin Zeng ◽

Guohao Dai ◽

Hanbo Sun ◽

Jun Liu ◽

Shiyao Li ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

High Performance ◽

Deep Neural Networks ◽

Cost Effective ◽

General Purpose ◽

The Other ◽

Private Cloud ◽

Single Task ◽

Other Hand

INFerence-as-a-Service (INFaaS) has become a primary workload in the cloud. However, existing FPGA-based Deep Neural Network (DNN) accelerators are mainly optimized for the fastest speed of a single task, while the multi-tenancy of INFaaS has not been explored yet. As the demand for INFaaS keeps growing, simply increasing the number of FPGA-based DNN accelerators is not cost-effective, while merely sharing these single-task optimized DNN accelerators in a time-division multiplexing way could lead to poor isolation and high performance loss for INFaaS. On the other hand, current cloud-based DNN accelerators have excessive compilation overhead, especially when scaling out to multi-FPGA systems for multi-tenant sharing, leading to unacceptable compilation costs for both offline deployment and online reconfiguration. Existing solutions are therefore far from providing efficient and flexible FPGA virtualization for public and private cloud scenarios. To solve these problems, we propose a unified virtualization framework for general-purpose deep neural networks in the cloud, enabling multi-tenant sharing for both Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) accelerators on a single FPGA. Isolation is enabled by introducing a two-level instruction dispatch module and a multi-core based hardware resources pool. Such designs provide isolated and runtime-programmable hardware resources, which in turn lead to performance isolation for multi-tenant sharing. To overcome the heavy re-compilation overheads, a tiling-based instruction frame package design and a two-stage static-dynamic compilation are proposed. Only the lightweight runtime information is re-compiled, with ∼1 ms overhead, thus guaranteeing the private cloud’s performance. Finally, extensive experimental results show that the proposed virtualized solutions achieve up to 3.12× and 6.18× higher throughput in the private cloud compared with the static CNN and RNN baseline designs, respectively.

NU-LiteNet: Mobile Landmark Recognition using Convolutional Neural Networks

ECTI Transactions on Computer and Information Technology (ECTI-CIT) ◽

10.37936/ecti-cit.2019131.165074 ◽

2019 ◽

Vol 13 (1) ◽

pp. 21-28 ◽

Cited By ~ 1

Author(s):

Chakkrit Termritthikun ◽

Paisarn Muneesawang

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Network ◽

Mobile Devices ◽

Mobile Phones ◽

High Performance ◽

Deep Neural Networks ◽

Landmark Recognition ◽

Research Problems ◽

Model Size

The growth of high-performance mobile devices has resulted in more research into on-device image recognition. The main research problems are the latency and accuracy of automatic recognition, which remain obstacles to real-world use. Although recently developed deep neural networks can achieve accuracy comparable to that of a human user, some of them are still too slow. This paper describes the development of the architecture of a new convolutional neural network model, NU-LiteNet. For this, SqueezeNet was developed further to reduce the model size to a degree suitable for smartphones. The model size of NU-LiteNet was therefore 2.6 times smaller than that of SqueezeNet. The model outperformed other Convolutional Neural Network (CNN) models for mobile devices (e.g., SqueezeNet and MobileNet) with accuracies of 81.15% and 69.58% on the Singapore and Paris landmark datasets, respectively. The shortest execution time of 0.7 seconds per image was recorded with NU-LiteNet on mobile phones.

Architecture Analysis of an FPGA-Based Hopfield Neural Network

Advances in Artificial Neural Systems ◽

10.1155/2014/602325 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Miguel Angelo de Abreu de Sousa ◽

Edson Lemos Horta ◽

Sergio Takeo Kofuji ◽

Emilio Del-Moral-Hernandez

Keyword(s):

Neural Network ◽

Neural Networks ◽

High Performance ◽

Neural Model ◽

Hopfield Neural Network ◽

Network Capacity ◽

Chip Area ◽

Characteristics Analysis ◽

Field Programmable ◽

Implementation Methodology

Interconnections between electronic circuits and neural computation have been a heavily researched topic in the machine learning field, motivated by several practical requirements, including decreasing training and operation times in high-performance applications and reducing cost, size, and energy consumption for autonomous or embedded developments. Field programmable gate array (FPGA) hardware shows some inherent features typically associated with neural networks, such as parallel processing, modular execution, and dynamic adaptation, and work on different types of FPGA-based neural networks has been presented in recent years. This paper aims to address different aspects of architectural characteristics analysis on a Hopfield Neural Network implemented in FPGA, such as maximum operating frequency and chip-area occupancy according to the network capacity. Also, the FPGA implementation methodology, which does not employ multipliers in the architecture developed for the Hopfield neural model, is presented in detail.
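
To illustrate why a Hopfield network can be evaluated without multipliers, here is a minimal sketch assuming bipolar neuron states (+1/-1) and integer Hebbian weights. Under that assumption, each weight-times-state product is just the weight or its negation, so the local field reduces to additions and subtractions. The function names and the Hebbian training rule are illustrative choices, not the architecture described in the paper.

```python
def train_hebbian(patterns):
    """Integer Hebbian weights with a zero diagonal for bipolar patterns."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    # p[i] * p[j] is +1 when the two bits agree, -1 otherwise.
                    weights[i][j] += 1 if p[i] == p[j] else -1
    return weights


def update_neuron(weights, state, i):
    """Local field of neuron i using only add/subtract, then a sign threshold."""
    field = 0
    for j, s in enumerate(state):
        if s == 1:
            field += weights[i][j]
        else:
            field -= weights[i][j]
    return 1 if field >= 0 else -1


def recall(weights, state, sweeps=5):
    """Asynchronous updates, one neuron at a time, for a fixed number of sweeps."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            state[i] = update_neuron(weights, state, i)
    return state


stored = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hebbian([stored])
noisy = [-1] + stored[1:]  # flip the first bit
recovered = recall(weights, noisy)
```

In hardware terms, the `if s == 1` branch corresponds to selecting between an adder and a subtractor, which is one plausible way a multiplier-free datapath could be organized.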

StochasticNet in StochasticNet

Journal of Computational Vision and Imaging Systems ◽

10.15353/vsnl.v2i1.106 ◽

2016 ◽

Vol 2 (1) ◽

Author(s):

Mohammad Javad Shafiee ◽

Paul Fieguth ◽

Alexander Wong

Keyword(s):

Neural Network ◽

Neural Networks ◽

High Performance ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Low Cost ◽

Network Architectures ◽

Modeling Accuracy ◽

Random Graph Theory ◽

Performance Computing

Deep neural networks have been shown to outperform conventional state-of-the-art approaches in several structured prediction applications. While high-performance computing devices such as GPUs have made developing very powerful deep neural networks possible, it is not feasible to run these networks on low-cost, low-power computing devices such as embedded CPUs or even embedded GPUs. As such, there has been a lot of recent interest in producing efficient deep neural network architectures that can be run on small computing devices. Motivated by this, the idea of StochasticNets was introduced, where deep neural networks are formed by leveraging random graph theory. It has been shown that StochasticNet can form new networks with 2X or 3X architectural efficiency while maintaining modeling accuracy. Motivated by these promising results, here we investigate the idea of StochasticNet in StochasticNet (SiS), where highly efficient deep neural networks with Network in Network (NiN) architectures are formed in a stochastic manner. Such networks have an intertwining structure composed of convolutional layers and micro neural networks to boost the modeling accuracy. The experimental results show that SiS can form deep neural networks with NiN architectures that have 4X greater architectural efficiency with only a 2% drop in accuracy for the CIFAR10 dataset. The results are even more promising for the SVHN dataset, where SiS formed deep neural networks with NiN architectures that have 11.5X greater architectural efficiency with only a 1% decrease in modeling accuracy.

Deep Neural Network for Visual Stimulus-Based Reaction Time Estimation Using the Periodogram of Single-Trial EEG

Sensors ◽

10.3390/s20216090 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6090

Author(s):

Mohammad Samin Nur Chowdhury ◽

Arindam Dutta ◽

Matthew Kyle Robison ◽

Chris Blais ◽

Gene Arnold Brewer ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Reaction Time ◽

Visual Stimulus ◽

High Performance ◽

Deep Neural Networks ◽

Robust Regression ◽

Correlation Coefficients ◽

Time Estimation ◽

Single Trial

Multiplexed deep neural networks (DNN) have engendered high-performance predictive models that are gaining popularity for decoding brain waves, extensively collected in the form of electroencephalogram (EEG) signals. In this paper, we introduce what is, to the best of our knowledge, the first DNN-based generalized approach to estimate reaction time (RT) using the periodogram representation of single-trial EEG in a visual stimulus-response experiment with 48 participants. We have designed a Fully Connected Neural Network (FCNN) and a Convolutional Neural Network (CNN) to predict and classify RTs for each trial. Though deep neural networks are widely known for classification applications, by cascading FCNN/CNN with the Random Forest model we designed a robust regression-based estimator to predict RT. With the FCNN model, the accuracies obtained for binary and 3-class classification were 93% and 76%, respectively, which further improved with the use of CNN (94% and 78%, respectively). The regression-based approach predicted RTs with correlation coefficients (CC) of 0.78 and 0.80 for FCNN and CNN, respectively. Investigating further, we found that the left central as well as parietal and occipital lobes were crucial for predicting RT, with significant activities in the theta and alpha frequency bands.
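
As a rough illustration of the periodogram representation mentioned above, the following sketch computes a one-sided periodogram of a single synthetic trial with a direct DFT and sums power in the theta and alpha bands. The sampling rate, the synthetic 10 Hz signal, the band edges (4 to 8 Hz and 8 to 13 Hz, a common convention), and all function names are assumptions for illustration. The paper's actual preprocessing is not described in the abstract.

```python
import cmath
import math


def periodogram(signal, fs):
    """One-sided periodogram (power spectral density) of a real signal."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        p = abs(coeff) ** 2 / (fs * n)
        # Double every bin except DC and, for even n, the Nyquist bin,
        # to account for the discarded negative frequencies.
        if k != 0 and not (n % 2 == 0 and k == n // 2):
            p *= 2
        freqs.append(k * fs / n)
        power.append(p)
    return freqs, power


def band_power(freqs, power, low, high):
    """Approximate power in [low, high) Hz by summing bins times bin width."""
    width = freqs[1] - freqs[0]
    return sum(p for f, p in zip(freqs, power) if low <= f < high) * width


fs = 128  # hypothetical sampling rate in Hz
trial = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]  # 1 s, 10 Hz tone
freqs, power = periodogram(trial, fs)
peak_hz = freqs[power.index(max(power))]
theta = band_power(freqs, power, 4, 8)
alpha = band_power(freqs, power, 8, 13)
```

A vector such as `power`, or a handful of band powers per electrode, is the kind of fixed-length feature that could be fed to a fully connected or convolutional network. For real recordings, a library routine such as `scipy.signal.periodogram` would be the more practical choice than this direct DFT.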

ExpDNN: Explainable Deep Neural Network

10.21203/rs.3.rs-299913/v1 ◽

2021 ◽

Author(s):

Chi-Hua Chen

Keyword(s):

Neural Network ◽

Neural Networks ◽

Pattern Recognition ◽

Feature Extraction ◽

High Performance ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Regression Method ◽

Linear Regression Method ◽

Absolute Value

In recent years, deep neural networks have been applied to obtain high performance in prediction, classification, and pattern recognition. However, the weights in these deep neural networks are difficult to explain. Although a linear regression method can provide explainable results, the method is not suitable in the case of input interaction. Therefore, an explainable deep neural network (ExpDNN) with explainable layers is proposed to obtain explainable results in the case of input interaction. Three cases were given to evaluate the proposed ExpDNN, and the results showed that the absolute value of the weight in an explainable layer can be used to explain the weight of the corresponding input for feature extraction.

Memory Requirement Reduction of Deep Neural Networks for Field Programmable Gate Arrays Using Low-Bit Quantization of Parameters

2020 28th European Signal Processing Conference (EUSIPCO) ◽

10.23919/eusipco47968.2020.9287739 ◽

2021 ◽

Author(s):

Niccolo Nicodemo ◽

Gaurav Naithani ◽

Konstantinos Drossos ◽

Tuomas Virtanen ◽

Roberto Saletti

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Field Programmable Gate Arrays ◽

Memory Requirement ◽

Gate Arrays ◽

Field Programmable ◽

Programmable Gate Arrays

Deep neural networks using a single neuron: folded-in-time architecture using feedback-modulated delay loops

Nature Communications ◽

10.1038/s41467-021-25427-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Florian Stelzer ◽

André Röhm ◽

Raul Vicente ◽

Ingo Fischer ◽

Serhiy Yanchuk

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Network ◽

Single Neuron ◽

Deep Neural Networks ◽

Back Propagation ◽

Local Network ◽

Multiple Time ◽

Learning Tools ◽

Back Propagation Algorithm

Deep neural networks are among the most widely applied machine learning tools showing outstanding performance in a broad range of tasks. We present a method for folding a deep neural network of arbitrary size into a single neuron with multiple time-delayed feedback loops. This single-neuron deep neural network comprises only a single nonlinearity and appropriately adjusted modulations of the feedback signals. The network states emerge in time as a temporal unfolding of the neuron’s dynamics. By adjusting the feedback-modulation within the loops, we adapt the network’s connection weights. These connection weights are determined via a back-propagation algorithm, where both the delay-induced and local network connections must be taken into account. Our approach can fully represent standard Deep Neural Networks (DNN), encompasses sparse DNNs, and extends the DNN concept toward dynamical systems implementations. The new method, which we call Folded-in-time DNN (Fit-DNN), exhibits promising performance in a set of benchmark tasks.

Location- and Person-Independent Activity Recognition with WiFi, Deep Neural Networks, and Reinforcement Learning

ACM Transactions on Internet of Things ◽

10.1145/3424739 ◽

2021 ◽

Vol 2 (1) ◽

pp. 1-25

Author(s):

Yongsen Ma ◽

Sheheryar Arshad ◽

Swetha Muniraju ◽

Eric Torkildson ◽

Enrico Rantala ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Reinforcement Learning ◽

Activity Recognition ◽

Deep Neural Networks ◽

State Machine ◽

Recognition Algorithm ◽

The State ◽

Neural Architecture ◽

Learning Agent

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from historical classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. The proposed design has 97% average accuracy when testing devices and persons are not seen during training. The proposed design is also evaluated on two public datasets, with accuracies of 80% and 83%. The proposed design needs very little human effort for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.

An efficient pruning scheme of deep neural networks for Internet of Things applications

EURASIP Journal on Advances in Signal Processing ◽

10.1186/s13634-021-00744-4 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Chen Qi ◽

Shibo Shen ◽

Rongpeng Li ◽

Zhifeng Zhao ◽

Qing Liu ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Internet Of Things ◽

Deep Neural Networks ◽

Computational Cost ◽

Superior Performance ◽

Compact Structure ◽

Resource Limited ◽

Benchmark Datasets ◽

Iot Devices

Deep neural networks (DNNs) have been rapidly deployed to realize a number of functionalities such as sensing, imaging, classification, and recognition. However, the computationally intensive nature of DNNs makes them difficult to apply on resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs by uncovering a more compact structure and learning the effective weights therein, without compromising the expressive capability of DNNs. In particular, our algorithm achieves efficient end-to-end training that directly transforms a redundant neural network into a compact one with a specifically targeted compression rate. We comprehensively evaluate our approach on various representative benchmark datasets and compare it with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme reduces its FLOPs (floating-point operations) and number of parameters by 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.
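
To make the idea of pruning to a targeted compression rate concrete, here is a minimal sketch of plain magnitude pruning. This is a simpler stand-in technique and not the end-to-end training scheme proposed in the paper. The function name, the example weights, and the 75% target are all hypothetical.

```python
def magnitude_prune(weights, prune_fraction):
    """Zero out the smallest-magnitude weights so that roughly
    prune_fraction of them are removed."""
    k = int(len(weights) * prune_fraction)
    if k == 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.15, -0.4]
pruned = magnitude_prune(weights, 0.75)  # hypothetical 75% target
remaining = sum(1 for w in pruned if w != 0.0)
```

In practice, a pruned network is usually fine-tuned afterwards to recover accuracy. For scale, removing 94.1% of parameters, as reported for VGG on CIFAR-10, leaves 5.9% of them, which is roughly a 17-fold reduction.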
