Data-Efficient Sensor Upgrade Path Using Knowledge Distillation

Deep neural networks have achieved state-of-the-art performance in image classification. Due to this success, deep learning is now also being applied to other data modalities such as multispectral images, lidar and radar data. However, successfully training a deep neural network requires a large dataset. Therefore, transitioning to a new sensor modality (e.g., from regular camera images to multispectral camera images) might result in a drop in performance, due to the limited availability of data in the new modality. This might hinder the adoption rate and time to market for new sensor technologies. In this paper, we present an approach to leverage the knowledge of a teacher network that was trained using the original data modality to improve the performance of a student network on a new data modality, a technique known in the literature as knowledge distillation. By applying knowledge distillation to the problem of sensor transition, we can greatly speed up this process. We validate this approach using a multimodal version of the MNIST dataset. Especially when little data is available in the new modality (i.e., 10 images), training with additional teacher supervision results in increased performance, with the student network scoring a test set accuracy of 0.77, compared to an accuracy of 0.37 for the baseline. We also explore two extensions to the default method of knowledge distillation, which we evaluate on a multimodal version of the CIFAR-10 dataset: an annealing scheme for the hyperparameter α and selective knowledge distillation. Of these two, the first yields the best results. Choosing the optimal annealing scheme results in an increase in test set accuracy of 6%. Finally, we apply our method to the real-world use case of skin lesion classification.
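The abstract does not define α or the annealing scheme, so the following is only a minimal sketch under stated assumptions: α is taken to weight the teacher (soft-target) term against the ground-truth term, the annealing is taken to be linear over epochs, and the temperature parameter follows the common distillation formulation rather than anything stated above. The names `distillation_loss` and `linear_alpha` are hypothetical. In the sensor-transition setting, the teacher logits would come from the original modality and the student logits from the paired sample in the new modality.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits, label, alpha, temperature=2.0):
    """Weighted sum of a soft (teacher) term and a hard (label) term.

    Assumption: alpha weights the teacher term; alpha = 0 ignores the teacher.
    """
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft = cross_entropy(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return alpha * soft + (1.0 - alpha) * hard


def linear_alpha(epoch, num_epochs, alpha_start=1.0, alpha_end=0.0):
    """Hypothetical linear annealing of alpha from alpha_start to alpha_end."""
    if num_epochs <= 1:
        return alpha_end
    fraction = epoch / (num_epochs - 1)
    return alpha_start + fraction * (alpha_end - alpha_start)
```

With this schedule, the student relies on the teacher early in training and shifts toward the ground-truth labels later; other schedules (or the opposite direction) are equally plausible readings of the abstract.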


Relieving the Incompatibility of Network Representation and Classification for Long-Tailed Data Distribution

Computational Intelligence and Neuroscience ◽

10.1155/2021/6702625 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Hao Hu ◽

Mengya Gao ◽

Mingsheng Wu

Keyword(s):

Large Scale ◽

Deep Neural Networks ◽

State Of The Art ◽

Data Distribution ◽

Distribution Problem ◽

Imbalanced Dataset ◽

Network Representation ◽

Knowledge Distillation ◽

Rare Classes ◽

And Training

In real-world scenarios, data often have a long-tailed distribution, and training deep neural networks on such an imbalanced dataset has become a great challenge. The main problem caused by a long-tailed data distribution is that common classes dominate the training results, leading to very low accuracy on the rare classes. Recent work focuses on improving the network representation ability to overcome the long-tailed problem but ignores adapting the network classifier to the long-tailed case, which causes an “incompatibility” between the network representation and the network classifier. In this paper, we use knowledge distillation to solve the long-tailed data distribution problem and optimize the network representation and classifier simultaneously. We propose multiexperts knowledge distillation with class-balanced sampling to jointly learn a high-quality network representation and classifier. A channel activation-based knowledge distillation method is also proposed to improve the performance further. State-of-the-art performance on several large-scale long-tailed classification datasets shows the superior generalization of our method.
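Class-balanced sampling is named above but not specified. One common form, shown here as a sketch and not necessarily the authors' variant, first picks a class uniformly at random and then picks one example from that class, so rare classes are drawn as often as common ones. The function name `class_balanced_sampler` is hypothetical.

```python
import random
from collections import defaultdict


def class_balanced_sampler(labels, num_samples, seed=0):
    """Return dataset indices drawn so that every class is equally likely.

    Assumption: sampling is with replacement, which repeats rare-class
    examples more often than common-class ones.
    """
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)
    classes = sorted(indices_by_class)
    picks = []
    for _ in range(num_samples):
        chosen_class = rng.choice(classes)          # uniform over classes
        picks.append(rng.choice(indices_by_class[chosen_class]))  # uniform within class
    return picks
```

For a dataset with 95 examples of one class and 5 of another, roughly half of the drawn indices would belong to the rare class.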


RTN: Reparameterized Ternary Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5912 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4780-4787

Author(s):

Yuhang Li ◽

Xin Dong ◽

Sai Qian Zhang ◽

Haoli Bai ◽

Yuanpeng Chen ◽

...

Keyword(s):

Deep Neural Networks ◽

State Of The Art ◽

Hardware Acceleration ◽

Field Programmable Gate Arrays ◽

Accuracy Improvement ◽

Gate Arrays ◽

Resource Limited ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Speed Up

To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which offer tremendous speed-up and memory savings through quantized activations and weights. We first raise three overlooked issues in extremely low-bit networks: the squashing range of quantized values, gradient vanishing during backpropagation, and the unexploited hardware acceleration of ternary networks. By reparameterizing the quantized activation and weight vectors with a full-precision scale and offset for a fixed ternary vector, we decouple the range and magnitude from the direction to mitigate the above problems. The learnable scale and offset can automatically adjust the range of quantized values and the sparsity without gradient vanishing. A novel encoding and computation pattern is designed to support efficient computing for our reparameterized ternary network (RTN). Experiments on ResNet-18 for ImageNet demonstrate that the proposed RTN finds a much better trade-off between bitwidth and accuracy and achieves up to 26.76% relative accuracy improvement compared with state-of-the-art methods. Moreover, we validate the proposed computation pattern on Field Programmable Gate Arrays (FPGA), where it brings 46.46× and 89.17× savings in power and area compared with full-precision convolution.
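The abstract gives no formula for the reparameterization, so the sketch below assumes one plausible reading: a fixed ternary vector t with entries in {-1, 0, +1} is mapped to scale · t + offset, where scale and offset are full-precision values that would be learned during training. The threshold-based `ternarize` step and all function names are illustrative assumptions, not the paper's procedure.

```python
def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1 using a magnitude threshold (assumed rule)."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def reparameterize(ternary, scale, offset):
    """Apply a full-precision scale and offset to a fixed ternary vector."""
    return [scale * t + offset for t in ternary]


def sparsity(ternary):
    """Fraction of entries in the ternary vector that are exactly zero."""
    return sum(1 for t in ternary if t == 0) / len(ternary)
```

Under this reading, the ternary vector carries only the direction, while the scale and offset control the range of the values actually used in computation.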


A Novel Automate Python Edge-to-Edge: From Automated Generation on Cloud to User Application Deployment on Edge of Deep Neural Networks for Low Power IoT Systems FPGA-Based Acceleration

Sensors ◽

10.3390/s21186050 ◽

2021 ◽

Vol 21 (18) ◽

pp. 6050

Author(s):

Tarek Belabed ◽

Vitor Ramos Gomes da Silva ◽

Alexandre Quenon ◽

Carlos Valderamma ◽

Chokri Souani

Keyword(s):

Neural Networks ◽

Design Methodology ◽

Deep Neural Networks ◽

State Of The Art ◽

Automated Generation ◽

Application Deployment ◽

Speed Up ◽

Learning Software ◽

High Level ◽

Novel Design

Deploying Deep Neural Networks (DNNs) for IoT Edge applications requires strong skills in hardware and software. In this paper, a novel, fully automated design framework for Edge applications is proposed to perform such a deployment on System-on-Chips. Based on a high-level Python interface that mimics the leading Deep Learning software frameworks, it offers an easy way to implement a hardware-accelerated DNN on an FPGA. To do this, our design methodology covers three main phases: (a) customization, where the user specifies the optimizations needed on each DNN layer; (b) generation, where the framework generates on the Cloud the necessary binaries for both the FPGA and software parts; and (c) deployment, where the SoC on the Edge receives the resulting files used to program the FPGA and the related Python libraries for user applications. Among the case studies, an optimized DNN for the MNIST database runs more than 60× faster than a software version on the ZYNQ 7020 SoC while consuming less than 0.43 W. A comparison with state-of-the-art frameworks demonstrates that our methodology offers the best trade-off between throughput, power consumption, and system cost.


DiffChaser: Detecting Disagreements for Deep Neural Networks

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/800 ◽

2019 ◽

Cited By ~ 14

Author(s):

Xiaofei Xie ◽

Lei Ma ◽

Haijun Wang ◽

Yuekang Li ◽

Yang Liu ◽

...

Keyword(s):

Deep Neural Network ◽

Deep Neural Networks ◽

State Of The Art ◽

Optimization Procedure ◽

Black Box ◽

Massive Data ◽

Test Set ◽

Testing Framework ◽

Development Lifecycle ◽

Black Box Testing

Platform migration and customization have become an indispensable part of the deep neural network (DNN) development lifecycle. A high-precision but complex DNN trained in the cloud on massive data and powerful GPUs often goes through an optimization phase (e.g., quantization, compression) before deployment to a target device (e.g., a mobile device). A test set that effectively uncovers the disagreements between a DNN and its optimized variant provides feedback to debug and further enhance the optimization procedure. However, a minor inconsistency between a DNN and its optimized version is often hard to detect and easily bypasses the original test set. This paper proposes DiffChaser, an automated black-box testing framework to detect untargeted/targeted disagreements between version variants of a DNN. We demonstrate 1) its effectiveness by comparing it with state-of-the-art techniques, and 2) its usefulness in real-world DNN product deployment involving quantization and optimization.


Review on biomass feedstocks, pyrolysis mechanism and physicochemical properties of biochar: State-of-the-art framework to speed up vision of circular bioeconomy

Journal of Cleaner Production ◽

10.1016/j.jclepro.2021.126645 ◽

2021 ◽

Vol 297 ◽

pp. 126645

Author(s):

Gajanan Sampatrao Ghodake ◽

Surendra Krushna Shinde ◽

Avinash Ashok Kadam ◽

Rijuta Ganesh Saratale ◽

Ganesh Dattatraya Saratale ◽

...

Keyword(s):

Physicochemical Properties ◽

State Of The Art ◽

Pyrolysis Mechanism ◽

Biomass Feedstocks ◽

Speed Up


Fissure Ridges: A Reappraisal of Faulting and Travertine Deposition (Travitonics)

Geosciences ◽

10.3390/geosciences11070278 ◽

2021 ◽

Vol 11 (7) ◽

pp. 278

Author(s):

Andrea Brogi ◽

Enrico Capezzuoli ◽

Volkan Karabacak ◽

Mehmet Cihat Alcicek ◽

Lianchao Luo

Keyword(s):

State Of The Art ◽

Tectonic Setting ◽

Original Data ◽

Apical Part ◽

Thermal Waters ◽

Growth Mechanisms ◽

Depositional Facies ◽

Geothermal Fluids ◽

Tectonic Features ◽

Travertine Deposition

The mechanical discontinuities in the upper crust (i.e., faults and related fractures) lead to the uprising of geothermal fluids to the Earth’s surface. If the fluids are enriched in Ca²⁺ and HCO₃⁻, masses of CaCO₃ (i.e., travertine deposits) can form, mainly due to CO₂ leakage from the thermal waters. Among these, fissure-ridge-type deposits are peculiar travertine bodies made of bedded carbonate that dips gently to steeply away from the apical part, where a central fissure is located, corresponding to the fracture trace intersecting the substratum. These morpho-tectonic features are the most useful deposits for tectonic and paleoseismological investigation, as their development is contemporaneous with the activity of the faults, which enhances the permeability that guarantees the circulation of fluids and their emergence. Therefore, the fissure ridge architecture sheds light on the interplay among fault activity, travertine deposition, and ridge evolution, providing key geochronologic constraints because travertine can be dated by different radiometric methods. In recent years, studies dealing with travertine fissure ridges have considerably improved and provide a large amount of information. In this paper, we report the state of the art of knowledge on this topic, refining the literature data as well as adding original data, focusing mainly on the fissure ridge morphology, internal architecture, depositional facies, growth mechanisms, the tectonic setting in which the fissure ridges develop, and the advantages of using fissure ridges for neotectonic and seismotectonic studies.


Balanced Sparsity for Efficient DNN Inference on GPU

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015676 ◽

2019 ◽

Vol 33 ◽

pp. 5676-5683 ◽

Cited By ~ 3

Author(s):

Zhuliang Yao ◽

Shijie Cao ◽

Wencong Xiao ◽

Chen Zhang ◽

Lanshun Nie

Keyword(s):

Deep Neural Networks ◽

General Purpose ◽

Coarse Grained ◽

Efficient Computation ◽

Model Accuracy ◽

Sparse Model ◽

Model Inference ◽

Fine Grained ◽

Practical Inference ◽

Speed Up

In trained deep neural networks, unstructured pruning can reduce redundant weights to lower storage cost. However, it requires customized hardware to speed up practical inference. Another trend accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity to prune or regularize consecutive weights for efficient computation, but this method often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, to achieve high model accuracy efficiently on commercial hardware. Our approach adapts to the high-parallelism property of GPUs, showing great potential for sparsity in the wide deployment of deep learning services. Experimental results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPU while retaining the same high model accuracy as fine-grained sparsity.


Approaches for Multilingual Phone Recognition in Code-switched and Non-code-switched Scenarios Using Indian Languages

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3437256 ◽

2021 ◽

Vol 20 (4) ◽

pp. 1-19

Author(s):

Manjunath K. E. ◽

Srinivasa Raghavan K. M. ◽

K. Sreenivasa Rao ◽

Dinesh Babu Jayagopi ◽

V. Ramasubramanian

Keyword(s):

Deep Neural Networks ◽

State Of The Art ◽

Window Size ◽

Recognition System ◽

Error Rates ◽

Indian Languages ◽

International Phonetic Alphabet ◽

Phone Recognition ◽

Front End ◽

Recognition Systems

In this study, we evaluate and compare two different approaches for multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach uses a front-end Language Identification (LID) system to switch to a monolingual phone recognizer (LID-Mono), trained individually on each of the languages present in the multilingual dataset. In the second approach, a common multilingual phone-set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted using the Kannada and Urdu languages. In the first approach, LID is performed using state-of-the-art i-vectors. Both monolingual and multilingual phone recognition systems are trained using Deep Neural Networks. The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. It is found that the performance of the Multi-PRS approach is superior to that of the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used to perform LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, as well as using the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and LID errors cannot be recovered. The Multi-PRS system, by contrast, does not require front-end LID switching and is designed around the common multilingual phone-set derived from several languages, so it is not constrained by the accuracy of the LID system. It therefore performs effectively on code-switched and non-code-switched speech, offering lower Phone Error Rates than the LID-Mono system.


Robust CNN Compression Framework for Security-Sensitive Embedded Systems

Applied Sciences ◽

10.3390/app11031093 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1093

Author(s):

Jeonghyun Lee ◽

Sangkyun Lee

Keyword(s):

Embedded Systems ◽

Optimization Problem ◽

State Of The Art ◽

Classification Problems ◽

Proximal Gradient Method ◽

Knowledge Distillation ◽

New Type ◽

Adversarial Examples ◽

Adversarial Training ◽

Memory Efficient

Convolutional neural networks (CNNs) have achieved tremendous success in solving complex classification problems. Motivated by this success, various compression methods have been proposed for downsizing CNNs to deploy them on resource-constrained embedded systems. However, a new type of vulnerability of compressed CNNs, known as adversarial examples, has been discovered recently. It is critical for security-sensitive systems because adversarial examples can cause CNNs to malfunction and can be crafted easily in many cases. In this paper, we propose a compression framework to produce compressed CNNs that are robust against such adversarial examples. To achieve this goal, our framework uses both pruning and knowledge distillation with adversarial training. We formulate our framework as an optimization problem and provide a solution algorithm based on the proximal gradient method, which is more memory-efficient than the popular ADMM-based compression approaches. In experiments, we show that our framework can improve the trade-off between adversarial robustness and compression rate compared to the existing state-of-the-art adversarial pruning approach.
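The abstract does not state which regularizer the proximal gradient method is applied to. As an illustration only, the sketch below assumes an ℓ1 penalty, whose proximal operator is soft-thresholding: after an ordinary gradient step, small weights are set exactly to zero, which is what makes the method usable for pruning. The function names and the ℓ1 choice are assumptions, not the authors' formulation.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |value|: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def proximal_gradient_step(weights, grads, learning_rate, l1_strength):
    """One proximal gradient update for a loss plus an assumed l1 penalty.

    Gradient step on the smooth loss, then soft-thresholding with
    threshold = learning_rate * l1_strength.
    """
    threshold = learning_rate * l1_strength
    return [
        soft_threshold(w - learning_rate * g, threshold)
        for w, g in zip(weights, grads)
    ]
```

Only the current weights and gradients are needed per step, with no auxiliary or dual copies of the parameters, which is consistent with the memory advantage over ADMM claimed above.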


Communication Failure Resilient Distributed Neural Network for Edge Devices

Electronics ◽

10.3390/electronics10141614 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1614

Author(s):

Jonghun Jeong ◽

Jong Sung Park ◽

Hoeseok Yang

Keyword(s):

Neural Network ◽

Neural Networks ◽

High Performance ◽

State Of The Art ◽

Wearable Devices ◽

Communication Failure ◽

Canadian Institute ◽

Multiple Devices ◽

Knowledge Distillation ◽

Partitioning Technique

Recently, the need to run high-performance neural networks (NNs) has been increasing even in resource-constrained embedded systems such as wearable devices. However, due to the high computational and memory requirements of NN applications, it is typically infeasible to execute them on a single device. Instead, it has been proposed to run a single NN application cooperatively on top of multiple devices, a so-called distributed neural network. In a distributed neural network, the workloads of a single big NN application are distributed over multiple tiny devices. While the computation overhead can be effectively alleviated by this approach, existing distributed NN techniques, such as MoDNN, still suffer from heavy traffic between the devices and vulnerability to communication failures. To eliminate such large communication overheads, a knowledge distillation based distributed NN, called Network of Neural Networks (NoNN), was proposed, which partitions the filters in the final convolutional layer of the original NN into multiple independent subsets and derives smaller NNs out of each subset. However, NoNN also has limitations: the partitioning result may be unbalanced, and it considerably compromises the correlation between filters in the original NN, which may result in unacceptable accuracy degradation in case of communication failure. In this paper, to overcome these issues, we propose to enhance the partitioning strategy of NoNN in two aspects. First, we enhance the redundancy of the filters that are used to derive multiple smaller NNs by means of averaging, to increase the immunity of the distributed NN to communication failure. Second, we propose a novel partitioning technique, modified from Eigenvector-based partitioning, to preserve the correlation between filters as much as possible while keeping a consistent number of filters distributed to each device. Through extensive experiments with the CIFAR-100 (Canadian Institute For Advanced Research-100) dataset, we observe that the proposed approach maintains high inference accuracy (over 70%, a 1.53× improvement over the state-of-the-art approach), on average, even when half of the eight devices in a distributed NN fail to deliver their partial inference results.
