Hardware-Aware Neural Architecture Search: Survey and Taxonomy

Author(s):  
Hadjer Benmeziane ◽  
Kaoutar El Maghraoui ◽  
Hamza Ouarnoughi ◽  
Smail Niar ◽  
Martin Wistuba ◽  
...  

There is no doubt that making AI mainstream by bringing powerful yet power-hungry deep neural networks (DNNs) to resource-constrained devices requires an efficient co-design of algorithms, hardware, and software. The increased popularity of DNN applications deployed on a wide variety of platforms, from tiny microcontrollers to data centers, has raised multiple questions and challenges related to the constraints introduced by the hardware. In this survey on hardware-aware neural architecture search (HW-NAS), we present some of the answers proposed in the literature for the following questions: "Is it possible to build an efficient DL model that meets the latency and energy constraints of tiny edge devices?" and "How can we reduce the trade-off between the accuracy of a DL model and its ability to be deployed on a variety of platforms?" The survey provides a new taxonomy of HW-NAS and assesses hardware cost estimation strategies. We also highlight the challenges and limitations of existing approaches as well as potential future directions. We hope that this survey will help fuel research towards efficient deep learning.
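To make the HW-NAS setting concrete, the following is a minimal sketch of a hardware-aware search objective: candidate architectures are scored by predicted accuracy subject to a latency budget. The search space, the accuracy proxy, the analytical latency model, and the random-search loop are all illustrative assumptions, not methods taken from the surveyed papers; real HW-NAS systems typically use learned predictors, lookup tables, or on-device measurements for the hardware cost.

# Minimal sketch of a hardware-aware NAS objective (illustrative only).
# The accuracy and latency estimators are hypothetical stand-ins for a
# trained predictor or on-device measurement.
import random

SEARCH_SPACE = {
    "depth": [8, 12, 16],      # number of layers
    "width": [16, 32, 64],     # channels per layer
    "kernel": [3, 5],          # convolution kernel size
}

LATENCY_BUDGET_MS = 20.0       # assumed edge-device constraint


def estimate_accuracy(arch):
    # Hypothetical proxy: bigger models score higher, with saturation.
    size = arch["depth"] * arch["width"] * arch["kernel"]
    return 1.0 - 1.0 / (1.0 + size / 500.0)


def estimate_latency_ms(arch):
    # Hypothetical analytical cost model.
    return 0.01 * arch["depth"] * arch["width"] * arch["kernel"]


def hw_aware_score(arch):
    if estimate_latency_ms(arch) > LATENCY_BUDGET_MS:
        return float("-inf")   # hard constraint: reject over-budget models
    return estimate_accuracy(arch)


def random_search(n_samples=200, seed=0):
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_samples):
        arch = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = hw_aware_score(arch)
        if score > best_score:
            best, best_score = arch, score
    return best, best_score


if __name__ == "__main__":
    arch, score = random_search()
    print("best architecture:", arch, "score:", round(score, 3))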

Security in resource-constrained devices has drawn great attention from researchers in recent years. To enable secure transmission of critical information on such devices, lightweight cryptography algorithms have come to the fore. KLEIN is a popular lightweight block cipher used to address these issues. In this paper, different architectures of the KLEIN block cipher are presented. One of the designs improves throughput efficiency at the expense of a larger area. To realize these designs, pipeline registers are placed at different positions in the datapath. The proposed design transforms the input data into protected output at 2414.13 Mbps on the xc5vlx50t-3ff1136 device. In addition, the second design completes one or more rounds in a single clock cycle, yielding energy-efficient and high-throughput implementations. This allows the trade-off between area and speed to be analyzed for high-speed applications. Moreover, the proposed designs show that increasing the area of the cipher implementation results in a higher rate of plaintext-to-ciphertext transformation. All results are simulated and verified for various device families with the Xilinx ISE design suite.
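The area/speed trade-off described above follows from simple throughput arithmetic: an iterated (round-serial) datapath needs one clock per round, while an unrolled, pipelined datapath accepts a new block every cycle once the pipeline is full. The sketch below illustrates this relationship; the block size and round count follow the KLEIN specification, but the clock frequency is a placeholder assumption, not a figure reported in the paper.

# Back-of-the-envelope throughput model for an iterated vs. pipelined
# block-cipher datapath. The clock frequency is an assumed placeholder.
BLOCK_BITS = 64          # KLEIN operates on 64-bit blocks
ROUNDS = 12              # KLEIN-64 uses 12 rounds


def throughput_mbps(f_clk_mhz, cycles_per_block):
    """Throughput in Mbit/s for a datapath that accepts a new block
    every `cycles_per_block` clock cycles."""
    return BLOCK_BITS * f_clk_mhz / cycles_per_block


# Iterated design: one round per clock, one block every ROUNDS cycles
# (small area, lower throughput).
print("iterated :", throughput_mbps(f_clk_mhz=150.0, cycles_per_block=ROUNDS))

# Fully unrolled, pipelined design: a new block enters every cycle once
# the pipeline is full (larger area, higher throughput).
print("pipelined:", throughput_mbps(f_clk_mhz=150.0, cycles_per_block=1))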


Informatica ◽  
2017 ◽  
Vol 28 (1) ◽  
pp. 193-214 ◽  
Author(s):  
Tung-Tso Tsai ◽  
Sen-Shan Huang ◽  
Yuh-Min Tseng

Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4496
Author(s):  
Vlad Pandelea ◽  
Edoardo Ragusa ◽  
Tommaso Apicella ◽  
Paolo Gastaldo ◽  
Erik Cambria

Emotion recognition, among other natural language processing tasks, has greatly benefited from the use of large transformer models. Deploying these models on resource-constrained devices, however, is a major challenge due to their computational cost. In this paper, we show that combining large transformers, used as high-quality feature extractors, with simple hardware-friendly classifiers based on linear separators can achieve competitive performance while allowing real-time inference and fast training. Various solutions, including batch and online sequential learning, are analyzed. Additionally, our experiments show that latency and performance can be further improved via dimensionality reduction and pre-training, respectively. The resulting system is implemented on two types of edge devices, namely an edge accelerator and two smartphones.
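A minimal sketch of the "frozen transformer as feature extractor plus hardware-friendly linear classifier" recipe follows. The backbone name, the toy sentences, and the use of scikit-learn's LogisticRegression as the linear separator (in place of the paper's online sequential learner) are illustrative assumptions; PCA stands in for the dimensionality-reduction step mentioned above.

# Sketch: frozen transformer features + cheap linear classifier.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "distilbert-base-uncased"   # assumed backbone

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
backbone = AutoModel.from_pretrained(MODEL_NAME)
backbone.eval()                          # frozen: no fine-tuning on device


@torch.no_grad()
def extract_features(texts):
    """Mean-pooled last-layer hidden states as fixed-size sentence features."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = backbone(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()  # (B, H)


# Toy emotion-labelled sentences (placeholders for a real dataset).
train_texts = ["I am so happy today!", "This is terrible news.",
               "What a wonderful surprise.", "I feel awful about this."]
train_labels = [1, 0, 1, 0]   # 1 = positive emotion, 0 = negative

X_train = extract_features(train_texts)

# Optional dimensionality reduction to cut classifier latency on-device.
pca = PCA(n_components=2).fit(X_train)
X_small = pca.transform(X_train)

# Simple linear separator: cheap to train and to run on edge hardware.
clf = LogisticRegression(max_iter=1000).fit(X_small, train_labels)

test = extract_features(["I love this!"])
print("prediction:", clf.predict(pca.transform(test)))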


2021 ◽  
Vol 2 (1) ◽  
pp. 1-25
Author(s):  
Yongsen Ma ◽  
Sheheryar Arshad ◽  
Swetha Muniraju ◽  
Eric Torkildson ◽  
Enrico Rantala ◽  
...  

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of the CSI data. The state machine learns temporal dependency information from past classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations and orientations, and multiple persons. It achieves 97% average accuracy when the testing devices and persons are not seen during training, and accuracies of 80% and 83% on two public datasets. The design requires very little human effort for ground-truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
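The two-stage pipeline described above (a 2D CNN classifying a CSI window, followed by a 1D CNN "state machine" that smooths the recent history of class probabilities) can be sketched as below. All layer sizes, the number of activity classes, the history length, and the input shapes are illustrative assumptions, and the RL-based architecture-search agent is omitted.

# Sketch of the recognition CNN + state-machine CNN pipeline.
import torch
import torch.nn as nn

NUM_CLASSES = 6     # assumed number of activities
HISTORY_LEN = 10    # assumed number of past predictions fed to the state machine


class CSIRecognizer(nn.Module):
    """2D CNN over a (time x subcarrier) CSI window."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):                 # x: (B, 1, T, S)
        z = self.features(x).flatten(1)   # (B, 32)
        return self.classifier(z)         # (B, num_classes)


class StateMachine(nn.Module):
    """1D CNN over the last HISTORY_LEN per-class probability vectors."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(num_classes, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, num_classes),
        )

    def forward(self, history):           # history: (B, num_classes, HISTORY_LEN)
        return self.net(history)


if __name__ == "__main__":
    csi = torch.randn(4, 1, 128, 64)                          # fake CSI windows
    probs = CSIRecognizer()(csi).softmax(dim=-1)              # per-window predictions
    history = probs.unsqueeze(-1).repeat(1, 1, HISTORY_LEN)   # fake history buffer
    print(StateMachine()(history).shape)                      # (4, NUM_CLASSES)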


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jeonghyuk Park ◽  
Yul Ri Chung ◽  
Seo Taek Kong ◽  
Yeong Won Kim ◽  
Hyunho Park ◽  
...  

There have been substantial efforts in using deep learning (DL) to diagnose cancer from digital images of pathology slides. Existing algorithms typically operate by training deep neural networks either specialized in specific cohorts or on an aggregate of all cohorts when only a few images are available for the target cohort. A trade-off between decreasing the number of models and their cancer detection performance was evident in our experiments with The Cancer Genome Atlas dataset, with the former approach achieving higher performance at the cost of having to acquire large datasets from the cohort of interest. Constructing annotated datasets for individual cohorts is extremely time-consuming, and the acquisition cost of such datasets grows linearly with the number of cohorts. Another issue associated with developing cohort-specific models is the difficulty of maintenance: all cohort-specific models may need to be adjusted when a new DL algorithm is to be used, where training even a single model may require a non-negligible amount of computation, or when more data is added to some cohorts. To resolve the sub-optimal behavior of a universal cancer detection model trained on an aggregate of cohorts, we investigated how cohorts can be grouped to augment a dataset without increasing the number of models linearly with the number of cohorts. This study introduces several metrics that measure the morphological similarities between cohort pairs and demonstrates how these metrics can be used to control the trade-off between performance and the number of models.
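The cohort-grouping idea can be illustrated with a small sketch: summarize each cohort by a feature representation, compute a pairwise similarity matrix, and cluster cohorts so that one detection model can serve each group. The centroid-distance metric, the random features, and the TCGA cohort codes used as labels below are placeholders; the paper's actual morphological similarity metrics are defined in the source work.

# Sketch: pairwise cohort similarity and hierarchical grouping.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Fake per-cohort slide features: cohort_name -> (n_slides, feature_dim)
cohorts = {name: rng.normal(loc=i % 3, size=(50, 16))
           for i, name in enumerate(["BRCA", "LUAD", "COAD", "STAD", "PRAD", "KIRC"])}

# Summarize each cohort by its feature centroid (a simple proxy metric).
names = list(cohorts)
centroids = np.stack([cohorts[n].mean(axis=0) for n in names])

# Pairwise distances between cohort summaries.
dist = squareform(pdist(centroids, metric="euclidean"))
print("pairwise cohort distance matrix:\n", np.round(dist, 2))

# Group similar cohorts; each group would share one detection model,
# keeping the number of models well below the number of cohorts.
groups = fcluster(linkage(pdist(centroids), method="average"),
                  t=3, criterion="maxclust")
for g in sorted(set(groups)):
    print(f"group {g}:", [n for n, gi in zip(names, groups) if gi == g])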

