High performance accelerators for deep neural networks: A review

2021 ◽  
Author(s):  
Mohd Saqib Akhoon ◽  
Shahrel A. Suandi ◽  
Abdullah Alshahrani ◽  
Abdul‐Malik H. Y. Saad ◽  
Fahad R. Albogamy ◽  
...  
2020 ◽  
Author(s):  
Soma Nonaka ◽  
Kei Majima ◽  
Shuntaro C. Aoki ◽  
Yukiyasu Kamitani

Summary: Achievement of human-level image recognition by deep neural networks (DNNs) has spurred interest in whether and how DNNs are brain-like. Both DNNs and the visual cortex perform hierarchical processing, and correspondence has been shown between hierarchical visual areas and DNN layers in representing visual features. Here, we propose the brain hierarchy (BH) score as a metric to quantify the degree of hierarchical correspondence, based on the decoding of individual DNN unit activations from human brain activity. We find that BH scores for 29 pretrained DNNs with varying architectures are negatively correlated with image recognition performance, indicating that recently developed high-performance DNNs are not necessarily brain-like. Experimental manipulations of DNN models suggest that a relatively simple feedforward architecture with broad spatial integration is critical to brain-like hierarchy. Our method provides new ways of designing DNNs and understanding the brain in light of their representational homology.
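The hierarchical-correspondence idea above can be sketched as a rank correlation: for each DNN unit, take the visual area whose activity best decodes it, then correlate unit depth with that area's position in the visual hierarchy. This is a hypothetical simplification, not the authors' exact BH-score formulation, and the function name is mine.

```python
# Hedged sketch of a BH-score-like metric: Spearman-style rank
# correlation between each unit's layer index and the rank of the
# brain area that best decodes that unit. Assumes distinct values
# (no tie handling), which the real method would need to address.

def brain_hierarchy_score(layer_of_unit, best_area_of_unit):
    """Rank correlation between DNN layer depth and best-decoding area."""
    n = len(layer_of_unit)

    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    rx, ry = ranks(layer_of_unit), ranks(best_area_of_unit)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

A perfectly hierarchical model (deeper units decoded by higher areas) scores 1.0; an inverted hierarchy scores −1.0, and a "not brain-like" high-performance DNN would land somewhere low in between.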


2019 ◽  
Vol 14 (09) ◽  
pp. P09014-P09014 ◽  
Author(s):  
N. Nottbeck ◽  
C. Schmitt ◽  
V. Büscher

2020 ◽  
Vol 41 (2) ◽  
pp. 022404
Author(s):  
Chunyou Su ◽  
Sheng Zhou ◽  
Liang Feng ◽  
Wei Zhang

2022 ◽  
Vol 15 (3) ◽  
pp. 1-31
Author(s):  
Shulin Zeng ◽  
Guohao Dai ◽  
Hanbo Sun ◽  
Jun Liu ◽  
Shiyao Li ◽  
...  

INFerence-as-a-Service (INFaaS) has become a primary workload in the cloud. However, existing FPGA-based Deep Neural Network (DNN) accelerators are mainly optimized for the fastest speed of a single task, while the multi-tenancy of INFaaS has not yet been explored. As demand for INFaaS keeps growing, simply increasing the number of FPGA-based DNN accelerators is not cost-effective, while merely sharing these single-task-optimized accelerators through time-division multiplexing leads to poor isolation and high performance loss. Moreover, current cloud-based DNN accelerators incur excessive compilation overhead, especially when scaling out to multi-FPGA systems for multi-tenant sharing, leading to unacceptable compilation costs for both offline deployment and online reconfiguration. They are therefore far from providing efficient and flexible FPGA virtualization for public and private cloud scenarios. To solve these problems, we propose a unified virtualization framework for general-purpose deep neural networks in the cloud, enabling multi-tenant sharing of both Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) accelerators on a single FPGA. Isolation is enabled by a two-level instruction dispatch module and a multi-core-based hardware resource pool. These designs provide isolated and runtime-programmable hardware resources, which in turn yield performance isolation for multi-tenant sharing. To overcome the heavy re-compilation overhead, we propose a tiling-based instruction frame package design and a two-stage static-dynamic compilation. Only lightweight runtime information is re-compiled, with ∼1 ms overhead, thus guaranteeing performance in the private cloud. Finally, extensive experimental results show that the proposed virtualized solutions achieve up to 3.12× and 6.18× higher throughput in the private cloud compared with the static CNN and RNN baseline designs, respectively.
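The two-stage static-dynamic compilation idea can be sketched as follows. All names and data layouts here are illustrative assumptions, not the paper's actual toolchain: the expensive tiling into instruction frames happens once offline, and per-tenant deployment only patches lightweight runtime fields, which is why online reconfiguration can stay in the millisecond range.

```python
# Hypothetical sketch of two-stage static-dynamic compilation.
# Frame fields ("op", "tenant", "base_addr") are invented for
# illustration and do not reflect the framework's real ISA.

def static_compile(layer_shapes, tile=16):
    """Offline stage: expensive tiling into instruction frame packages."""
    frames = []
    for (h, w) in layer_shapes:
        for th in range(0, h, tile):
            for tw in range(0, w, tile):
                frames.append({"op": "conv_tile", "row": th, "col": tw,
                               "tenant": None, "base_addr": None})
    return frames

def dynamic_compile(frames, tenant_id, base_addr):
    """Online stage: patch only lightweight runtime fields per tenant."""
    return [{**f, "tenant": tenant_id, "base_addr": base_addr + i}
            for i, f in enumerate(frames)]
```

The point of the split is that `dynamic_compile` touches a fixed, small set of fields per frame, so re-targeting the same compiled frames to a new tenant or memory region avoids rerunning the tiling stage.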


2019 ◽  
Vol 4 (4) ◽  

Detection of skin cancer involves several stages of examination, the first being visual diagnosis, followed by dermoscopic analysis, a biopsy, and histopathological examination. The classification of skin lesions in this first step is critical and challenging, as classes differ only in minute details of lesion appearance. Deep convolutional neural networks (CNNs) have great potential in multicategory image-based classification by considering coarse-to-fine image features. This study aims to demonstrate how to classify skin lesions, in particular melanoma, using a CNN trained on data sets with disease labels. We developed and trained our own CNN model using a subset of the images from the International Skin Imaging Collaboration (ISIC) Dermoscopic Archive. To test the performance of the proposed model, we used a different subset of images from the same archive as the test set. Our model is trained to classify images into two categories, malignant melanoma and nevus, and is shown to achieve excellent classification results, with high test accuracy (91.16%) and high performance as measured by various metrics. Our study demonstrates the potential of using deep neural networks to assist in the early detection of melanoma and thereby improve the patient survival rate from this aggressive skin cancer.
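For a binary melanoma-vs-nevus classifier, the "various metrics" beyond accuracy typically include sensitivity and specificity, which matter clinically because a missed melanoma is far costlier than a false alarm. A minimal sketch of that evaluation step, assuming a label encoding of 1 = malignant melanoma and 0 = nevus (the abstract does not specify one):

```python
# Hedged sketch of binary evaluation metrics from a confusion matrix.
# Label encoding (1 = melanoma, 0 = nevus) is an assumption.

def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # sensitivity: fraction of true melanomas the model catches
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # specificity: fraction of nevi correctly left alone
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

Reporting sensitivity alongside the headline 91.16% accuracy guards against a model that reaches high accuracy simply by favoring the majority class.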


Author(s):  
Chakkrit Termritthikun ◽  
Paisarn Muneesawang

The growth of high-performance mobile devices has spurred research into on-device image recognition. The main research problems have been the latency and accuracy of automatic recognition, which remain obstacles to real-world usage. Although recently developed deep neural networks can achieve accuracy comparable to that of a human, some of them are still too slow. This paper describes the development of the architecture of a new convolutional neural network model, NU-LiteNet. Starting from SqueezeNet, the design reduces the model size to a degree suitable for smartphones; the resulting NU-LiteNet model is 2.6 times smaller than SqueezeNet. The model outperformed other Convolutional Neural Network (CNN) models for mobile devices (e.g., SqueezeNet and MobileNet), with accuracies of 81.15% and 69.58% on the Singapore and Paris landmark datasets, respectively. The shortest execution time recorded with NU-LiteNet on mobile phones was 0.7 seconds per image.


Author(s):  
Anand Venkat ◽  
Tharindu Rusira ◽  
Raj Barik ◽  
Mary Hall ◽  
Leonard Truong

Deep neural networks (DNNs) have demonstrated effectiveness in many domains, including object recognition, speech recognition, natural language processing, and health care. Typically, the computations involved in DNN training and inference are time-consuming and require efficient implementations. Existing frameworks such as TensorFlow, Theano, Torch, Cognitive Toolkit (CNTK), and Caffe treat Graphics Processing Units (GPUs) as the status quo devices for DNN execution, leaving Central Processing Units (CPUs) behind. Moreover, existing frameworks forgo or limit cross-layer optimization opportunities that could significantly improve performance by reducing data movement through the memory hierarchy. In this article, we describe an alternative approach called SWIRL, a compiler that provides high-performance CPU implementations for DNNs. SWIRL is built on top of LATTE, an existing domain-specific language (DSL) for DNNs. SWIRL separates the DNN specification from its schedule using predefined transformation recipes for tensors and layers commonly found in DNNs. These recipes synergize with DSL constructs to generate high-quality fused, vectorized, and parallelized code for CPUs. On an Intel Xeon Platinum 8180M CPU, SWIRL achieves performance comparable to TensorFlow integrated with MKL-DNN: on average 1.00× of TensorFlow inference and 0.99× of TensorFlow training. It also outperforms the original LATTE compiler, on average by 1.22× for inference and 1.30× for training.
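The specification/schedule separation that SWIRL builds on can be sketched in miniature. The function names below are illustrative, not SWIRL's actual API: the *specification* states what a layer computes, while a *recipe* independently decides how it is executed (here, loop tiling), so the same specification can be retargeted to different CPUs without changing the math.

```python
# Hedged sketch of specification vs. schedule for a dense layer.
# Both functions compute y = x . W; only the loop structure differs.

def fully_connected(x, w):
    """Specification: what the layer computes (straightforward dot products)."""
    return [sum(xi * wij for xi, wij in zip(x, col))
            for col in zip(*w)]

def tiled_recipe(x, w, tile=2):
    """Schedule: identical math, restructured into cache-friendly tiles."""
    n_out = len(w[0])
    y = [0.0] * n_out
    for j0 in range(0, n_out, tile):          # iterate over output tiles
        for i, xi in enumerate(x):            # reuse each input across a tile
            for j in range(j0, min(j0 + tile, n_out)):
                y[j] += xi * w[i][j]
    return y
```

Because the recipe changes only iteration order, a compiler can verify both forms are equivalent and pick whichever schedule vectorizes and parallelizes best on the target CPU.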


2020 ◽  
Vol 125 ◽  
pp. 70-82 ◽  
Author(s):  
Yukuan Yang ◽  
Lei Deng ◽  
Shuang Wu ◽  
Tianyi Yan ◽  
Yuan Xie ◽  
...  
