High performance accelerators for deep neural networks: A review

2021 ◽  
Author(s):  
Mohd Saqib Akhoon ◽  
Shahrel A. Suandi ◽  
Abdullah Alshahrani ◽  
Abdul‐Malik H. Y. Saad ◽  
Fahad R. Albogamy ◽  
...  
2020 ◽  
Author(s):  
Soma Nonaka ◽  
Kei Majima ◽  
Shuntaro C. Aoki ◽  
Yukiyasu Kamitani

Summary: Achievement of human-level image recognition by deep neural networks (DNNs) has spurred interest in whether and how DNNs are brain-like. Both DNNs and the visual cortex perform hierarchical processing, and correspondence has been shown between hierarchical visual areas and DNN layers in representing visual features. Here, we propose the brain hierarchy (BH) score as a metric to quantify the degree of hierarchical correspondence, based on the decoding of individual DNN unit activations from human brain activity. We find that BH scores for 29 pretrained DNNs with varying architectures are negatively correlated with image recognition performance, indicating that recently developed high-performance DNNs are not necessarily brain-like. Experimental manipulations of DNN models suggest that a relatively simple feedforward architecture with broad spatial integration is critical to brain-like hierarchy. Our method provides new ways of designing DNNs and understanding the brain in light of their representational homology.
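The hierarchical-correspondence idea above can be sketched as a rank correlation: for each DNN unit, take the visual area whose activity best decodes it, then correlate unit depth with that area's position in the visual hierarchy. This is a hypothetical simplification, not the authors' exact BH-score formulation, and the function name is mine.

```python
# Hedged sketch of a BH-score-like metric: Spearman-style rank
# correlation between each unit's layer index and the rank of the
# brain area that best decodes that unit. Assumes distinct values
# (no tie handling), which the real method would need to address.

def brain_hierarchy_score(layer_of_unit, best_area_of_unit):
    """Rank correlation between DNN layer depth and best-decoding area."""
    n = len(layer_of_unit)

    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    rx, ry = ranks(layer_of_unit), ranks(best_area_of_unit)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

A perfectly hierarchical model (deeper units decoded by higher areas) scores 1.0; an inverted hierarchy scores −1.0, and a "not brain-like" high-performance DNN would land somewhere low in between.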


2019 ◽  
Vol 14 (09) ◽  
pp. P09014-P09014 ◽  
Author(s):  
N. Nottbeck ◽  
C. Schmitt ◽  
V. Büscher

2020 ◽  
Vol 41 (2) ◽  
pp. 022404
Author(s):  
Chunyou Su ◽  
Sheng Zhou ◽  
Liang Feng ◽  
Wei Zhang

2022 ◽  
Vol 15 (3) ◽  
pp. 1-31
Author(s):  
Shulin Zeng ◽  
Guohao Dai ◽  
Hanbo Sun ◽  
Jun Liu ◽  
Shiyao Li ◽  
...  

INFerence-as-a-Service (INFaaS) has become a primary workload in the cloud. However, existing FPGA-based Deep Neural Network (DNN) accelerators are mainly optimized for the fastest speed of a single task, while the multi-tenancy of INFaaS has not yet been explored. As demand for INFaaS keeps growing, simply increasing the number of FPGA-based DNN accelerators is not cost-effective, while merely sharing these single-task-optimized accelerators through time-division multiplexing leads to poor isolation and high performance loss. Moreover, current cloud-based DNN accelerators incur excessive compilation overhead, especially when scaling out to multi-FPGA systems for multi-tenant sharing, leading to unacceptable compilation costs for both offline deployment and online reconfiguration. They are therefore far from providing efficient and flexible FPGA virtualization for public and private cloud scenarios. To solve these problems, we propose a unified virtualization framework for general-purpose deep neural networks in the cloud, enabling multi-tenant sharing of both Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) accelerators on a single FPGA. Isolation is enabled by a two-level instruction dispatch module and a multi-core-based hardware resource pool. These designs provide isolated and runtime-programmable hardware resources, which in turn yield performance isolation for multi-tenant sharing. To overcome the heavy re-compilation overhead, we propose a tiling-based instruction frame package design and a two-stage static-dynamic compilation. Only lightweight runtime information is re-compiled, with ∼1 ms overhead, thus guaranteeing performance in the private cloud. Finally, extensive experimental results show that the proposed virtualized solutions achieve up to 3.12× and 6.18× higher throughput in the private cloud compared with the static CNN and RNN baseline designs, respectively.
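The two-stage static-dynamic compilation idea can be sketched as follows. All names and data layouts here are illustrative assumptions, not the paper's actual toolchain: the expensive tiling into instruction frames happens once offline, and per-tenant deployment only patches lightweight runtime fields, which is why online reconfiguration can stay in the millisecond range.

```python
# Hypothetical sketch of two-stage static-dynamic compilation.
# Frame fields ("op", "tenant", "base_addr") are invented for
# illustration and do not reflect the framework's real ISA.

def static_compile(layer_shapes, tile=16):
    """Offline stage: expensive tiling into instruction frame packages."""
    frames = []
    for (h, w) in layer_shapes:
        for th in range(0, h, tile):
            for tw in range(0, w, tile):
                frames.append({"op": "conv_tile", "row": th, "col": tw,
                               "tenant": None, "base_addr": None})
    return frames

def dynamic_compile(frames, tenant_id, base_addr):
    """Online stage: patch only lightweight runtime fields per tenant."""
    return [{**f, "tenant": tenant_id, "base_addr": base_addr + i}
            for i, f in enumerate(frames)]
```

The point of the split is that `dynamic_compile` touches a fixed, small set of fields per frame, so re-targeting the same compiled frames to a new tenant or memory region avoids rerunning the tiling stage.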


2019 ◽  
Vol 4 (4) ◽  

Detection of skin cancer involves several stages of examination, the first being visual diagnosis, followed by dermoscopic analysis, a biopsy, and histopathological examination. The classification of skin lesions in this first step is critical and challenging, as classes differ only in minute details of lesion appearance. Deep convolutional neural networks (CNNs) have great potential in multicategory image-based classification by considering coarse-to-fine image features. This study aims to demonstrate how to classify skin lesions, in particular melanoma, using a CNN trained on data sets with disease labels. We developed and trained our own CNN model using a subset of the images from the International Skin Imaging Collaboration (ISIC) Dermoscopic Archive. To test the performance of the proposed model, we used a different subset of images from the same archive as the test set. Our model is trained to classify images into two categories, malignant melanoma and nevus, and is shown to achieve excellent classification results, with high test accuracy (91.16%) and high performance as measured by various metrics. Our study demonstrates the potential of using deep neural networks to assist in the early detection of melanoma and thereby improve the patient survival rate from this aggressive skin cancer.
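For a binary melanoma-vs-nevus classifier, the "various metrics" beyond accuracy typically include sensitivity and specificity, which matter clinically because a missed melanoma is far costlier than a false alarm. A minimal sketch of that evaluation step, assuming a label encoding of 1 = malignant melanoma and 0 = nevus (the abstract does not specify one):

```python
# Hedged sketch of binary evaluation metrics from a confusion matrix.
# Label encoding (1 = melanoma, 0 = nevus) is an assumption.

def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # sensitivity: fraction of true melanomas the model catches
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # specificity: fraction of nevi correctly left alone
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

Reporting sensitivity alongside the headline 91.16% accuracy guards against a model that reaches high accuracy simply by favoring the majority class.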


Author(s):  
Chakkrit Termritthikun ◽  
Paisarn Muneesawang

The growth of high-performance mobile devices has spurred research into on-device image recognition. The main research problems have been the latency and accuracy of automatic recognition, which remain obstacles to real-world usage. Although recently developed deep neural networks can achieve accuracy comparable to that of a human, some of them are still too slow. This paper describes the development of the architecture of a new convolutional neural network model, NU-LiteNet. Starting from SqueezeNet, the design reduces the model size to a degree suitable for smartphones; the resulting NU-LiteNet model is 2.6 times smaller than SqueezeNet. The model outperformed other Convolutional Neural Network (CNN) models for mobile devices (e.g., SqueezeNet and MobileNet), with accuracies of 81.15% and 69.58% on the Singapore and Paris landmark datasets, respectively. The shortest execution time recorded with NU-LiteNet on mobile phones was 0.7 seconds per image.


Author(s):  
Anand Venkat ◽  
Tharindu Rusira ◽  
Raj Barik ◽  
Mary Hall ◽  
Leonard Truong

Deep neural networks (DNNs) have demonstrated effectiveness in many domains, including object recognition, speech recognition, natural language processing, and health care. Typically, the computations involved in DNN training and inference are time-consuming and require efficient implementations. Existing frameworks such as TensorFlow, Theano, Torch, Cognitive Toolkit (CNTK), and Caffe treat Graphics Processing Units (GPUs) as the status quo devices for DNN execution, leaving Central Processing Units (CPUs) behind. Moreover, existing frameworks forgo or limit cross-layer optimization opportunities that could significantly improve performance by reducing data movement through the memory hierarchy. In this article, we describe an alternative approach called SWIRL, a compiler that provides high-performance CPU implementations for DNNs. SWIRL is built on top of LATTE, an existing domain-specific language (DSL) for DNNs. SWIRL separates the DNN specification from its schedule using predefined transformation recipes for tensors and layers commonly found in DNNs. These recipes synergize with DSL constructs to generate high-quality fused, vectorized, and parallelized code for CPUs. On an Intel Xeon Platinum 8180M CPU, SWIRL achieves performance comparable to TensorFlow integrated with MKL-DNN: on average 1.00× of TensorFlow inference and 0.99× of TensorFlow training. It also outperforms the original LATTE compiler, on average by 1.22× for inference and 1.30× for training.
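The specification/schedule separation that SWIRL builds on can be sketched in miniature. The function names below are illustrative, not SWIRL's actual API: the *specification* states what a layer computes, while a *recipe* independently decides how it is executed (here, loop tiling), so the same specification can be retargeted to different CPUs without changing the math.

```python
# Hedged sketch of specification vs. schedule for a dense layer.
# Both functions compute y = x . W; only the loop structure differs.

def fully_connected(x, w):
    """Specification: what the layer computes (straightforward dot products)."""
    return [sum(xi * wij for xi, wij in zip(x, col))
            for col in zip(*w)]

def tiled_recipe(x, w, tile=2):
    """Schedule: identical math, restructured into cache-friendly tiles."""
    n_out = len(w[0])
    y = [0.0] * n_out
    for j0 in range(0, n_out, tile):          # iterate over output tiles
        for i, xi in enumerate(x):            # reuse each input across a tile
            for j in range(j0, min(j0 + tile, n_out)):
                y[j] += xi * w[i][j]
    return y
```

Because the recipe changes only iteration order, a compiler can verify both forms are equivalent and pick whichever schedule vectorizes and parallelizes best on the target CPU.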


2020 ◽  
Vol 125 ◽  
pp. 70-82 ◽  
Author(s):  
Yukuan Yang ◽  
Lei Deng ◽  
Shuang Wu ◽  
Tianyi Yan ◽  
Yuan Xie ◽  
...  
