Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students

We focus on the problem of training a deep neural network in generations. The procedure is as follows: to optimize the target network (student), another network (teacher) with the same architecture is trained first and then used to provide part of the supervision signal in the next stage. While this strategy leads to higher accuracy, many aspects (e.g., why teacher-student optimization helps) still need further exploration. This paper studies the problem from the perspective of controlling the strictness with which the teacher network is trained. Existing approaches mostly use a hard distribution (e.g., one-hot vectors) in training, leading to a strict teacher that itself has high accuracy; we argue that the teacher needs to be more tolerant, although this often implies lower accuracy. The implementation is simple: a single extra loss term is added to the teacher network, encouraging a few secondary classes to emerge and complement the primary class. Consequently, the teacher provides a milder supervision signal (a less peaked distribution), which allows the student to learn from inter-class similarity and potentially lowers the risk of over-fitting. Experiments are performed on standard image classification tasks (CIFAR100 and ILSVRC2012). Although the teacher network is less powerful, the students show persistent growth in ability and eventually achieve higher classification accuracy than competing methods. Model ensemble and transfer feature extraction experiments also verify the effectiveness of our approach.
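The role of a "less peaked" teacher distribution can be illustrated with a generic soft-target loss. The following is a minimal sketch, not the paper's formulation: the function names, the mixing weight `lam`, and the example logits are all hypothetical, and the extra loss term the authors add to the teacher is not reproduced here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Cross-entropy against a one-hot (hard) target."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def student_loss(student_logits, teacher_logits, label, lam=0.5):
    """Mix hard-label supervision with the teacher's soft distribution.

    lam is a hypothetical mixing weight, not a value from the paper.
    """
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    return (1 - lam) * cross_entropy(s, label) + lam * kl_divergence(t, s)

# Hypothetical logits: a strict teacher is sharply peaked on class 0,
# a tolerant teacher leaves visible mass on secondary classes.
strict_teacher = [10.0, 0.0, 0.0]
tolerant_teacher = [2.0, 1.0, 0.0]
student = [1.5, 0.5, 0.0]

print(softmax(strict_teacher))
print(softmax(tolerant_teacher))
print(student_loss(student, strict_teacher, label=0))
print(student_loss(student, tolerant_teacher, label=0))
```

With the strict teacher, the soft target is nearly one-hot, so it adds little beyond the hard label. The tolerant teacher's distribution carries information about which secondary classes resemble the primary one.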


Deep Scattering Spectra with Deep Neural Networks for Acoustic Scene Classification Tasks

Chinese Journal of Electronics ◽

10.1049/cje.2019.07.006 ◽

2019 ◽

Vol 28 (6) ◽

pp. 1177-1183

Author(s):

Pengyuan Zhang ◽

Hangting Chen ◽

Haichuan Bai ◽

Qingsheng Yuan

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Scene Classification ◽

Scattering Spectra ◽

Classification Tasks


Explicit Interaction Model towards Text Classification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016359 ◽

2019 ◽

Vol 33 ◽

pp. 6359-6366 ◽

Cited By ~ 3

Author(s):

Cunxiao Du ◽

Zhaozheng Chen ◽

Fuli Feng ◽

Lei Zhu ◽

Tian Gan ◽

...

Keyword(s):

Language Processing ◽

Text Classification ◽

Deep Neural Networks ◽

Interaction Mechanism ◽

Interaction Model ◽

Classification Task ◽

Fine Grained ◽

Word Level ◽

Benchmark Datasets ◽

Classification Tasks

Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in text classification compared to shallow models. Despite their significance, deep models ignore fine-grained classification clues (matching signals between words and classes), since their classifications mainly rely on text-level representations. To address this problem, we introduce an interaction mechanism that incorporates word-level matching signals into the text classification task. In particular, we design a novel framework, the EXplicit interAction Model (dubbed EXAM), equipped with the interaction mechanism. We evaluate the proposed approach on several benchmark datasets covering both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the code and parameter settings to facilitate further research.
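The idea of word-level matching signals between words and classes can be sketched as an interaction matrix of dot products. This is an illustrative sketch only: the vectors are made up, and summing the signals per class is an assumed aggregation, not necessarily the one EXAM uses.

```python
def interaction_matrix(word_vecs, class_vecs):
    """Entry [i][j] is the dot product of word i with class j."""
    return [
        [sum(w * c for w, c in zip(word, cls)) for cls in class_vecs]
        for word in word_vecs
    ]

def class_scores(matrix):
    """Aggregate word-level signals into one score per class.

    A plain sum is an assumption made for illustration.
    """
    return [sum(column) for column in zip(*matrix)]

# Hypothetical 2-dimensional embeddings for three words and two classes.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_vecs = [[1.0, 0.0], [0.0, 2.0]]

matrix = interaction_matrix(word_vecs, class_vecs)
print(matrix)
print(class_scores(matrix))
```

Unlike a single text-level representation, the matrix keeps one matching signal per word and class, so the contribution of each word to each class stays visible until the aggregation step.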


Towards Robust Text Classification with Semantics-Aware Recurrent Neural Architecture

Machine Learning and Knowledge Extraction ◽

10.3390/make1020034 ◽

2019 ◽

Vol 1 (2) ◽

pp. 575-589 ◽

Cited By ~ 1

Author(s):

Blaž Škrlj ◽

Jan Kralj ◽

Nada Lavrač ◽

Senja Pollak

Keyword(s):

Text Mining ◽

Language Processing ◽

Text Classification ◽

Deep Neural Networks ◽

Semantic Knowledge ◽

Text Documents ◽

Neural Architecture ◽

Classification Tasks ◽

And Gender ◽

Semantic Resources

Deep neural networks are becoming ubiquitous in text mining and natural language processing, but semantic resources, such as taxonomies and ontologies, are yet to be fully exploited in a deep learning setting. This paper presents an efficient semantic text mining approach that converts semantic information related to a given set of documents into a set of novel features used for learning. The proposed Semantics-aware Recurrent deep Neural Architecture (SRNA) enables the system to learn simultaneously from the semantic vectors and from the raw text documents. We test the effectiveness of the approach on three text classification tasks: news topic categorization, sentiment analysis, and gender profiling. The experiments show that the proposed approach outperforms the approach without semantic knowledge, with the highest accuracy gain (up to 10%) achieved on short document fragments.


Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation

2020 International Conference on Machine Vision and Image Processing (MVIP) ◽

10.1109/mvip49855.2020.9116923 ◽

2020 ◽

Author(s):

Sajjad Abbasi ◽

Mohsen Hajabdollahi ◽

Nader Karimi ◽

Shadrokh Samavi

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Teacher Student ◽

Knowledge Distillation


Inter-Class Angular Loss for Convolutional Neural Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013894 ◽

2019 ◽

Vol 33 ◽

pp. 3894-3901 ◽

Cited By ~ 1

Author(s):

Le Hui ◽

Xiang Li ◽

Chen Gong ◽

Meng Fang ◽

Joey Tianyi Zhou ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Deep Neural Networks ◽

Learning Difficulties ◽

Feature Space ◽

Superior Performance ◽

Strongly Correlated ◽

Discriminative Ability ◽

Practical Applications ◽

Classification Tasks

Convolutional Neural Networks (CNNs) have shown great power in various classification tasks and have achieved remarkable results in practical applications. However, existing networks largely ignore the fact that different pairs of classes are not equally hard to discriminate. For instance, in the CIFAR-10 dataset, distinguishing cats from dogs is usually harder than distinguishing horses from ships. By carefully studying the behavior of CNN models during training, we observe that the confusion level of two classes is strongly correlated with their angular separability in the feature space: the larger the inter-class angle, the lower the confusion. Based on this observation, we propose a novel loss function dubbed "Inter-Class Angular Loss" (ICAL), which explicitly models the class correlation and can be directly applied to many existing deep networks. By minimizing the proposed ICAL, the networks can effectively discriminate examples in similar classes by enlarging the angle between their corresponding class vectors. Thorough experimental results on a series of vision and non-vision datasets confirm that ICAL critically improves the discriminative ability of various representative deep neural networks and yields superior performance to the original networks with conventional softmax loss.
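The notion of angular separability between class vectors can be made concrete with a short sketch. The class vectors below are hypothetical, and the mean pairwise cosine is only a stand-in penalty that shrinks as angles grow; it is not the ICAL formula, which the abstract does not state.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angle_between(u, v):
    """Angle in radians; the cosine is clamped to guard against rounding."""
    return math.acos(max(-1.0, min(1.0, cosine(u, v))))

def mean_pairwise_cosine(class_vectors):
    """Stand-in penalty: lower values mean larger inter-class angles."""
    pairs = list(combinations(class_vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Hypothetical class vectors: "cat" and "dog" lie close together,
# "ship" points in a very different direction.
cat, dog, ship = [1.0, 0.2], [0.9, 0.4], [-0.2, 1.0]

print(angle_between(cat, dog))
print(angle_between(cat, ship))
print(mean_pairwise_cosine([cat, dog, ship]))
```

In this toy setup the cat-dog angle is much smaller than the cat-ship angle, which matches the abstract's observation that small inter-class angles go with high confusion.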


Application of recurrent and deep neural networks in classification tasks

Revista Gestão & Tecnologia ◽

10.20397/2177-6652/2020.v20i3.1709 ◽

2020 ◽

Vol 20 (3) ◽

pp. 59-79

Author(s):

Lidio Mauro Lima De Campos ◽

Danilo Souza Duarte

Keyword(s):

Neural Networks ◽

Multilayer Perceptron ◽

Deep Neural Networks ◽

Classification Tasks

Artificial Neural Networks (ANNs) have been used to solve a variety of problems, including those that involve decision-making. In this scope, the objective of this research is to present a tool that supports the decision process for selecting wine cultivars and evaluating cars, using multilayer perceptron, deep, and recurrent ANNs. Their effectiveness and best convergence are verified by means of cross-validation. The results indicate the efficiency of the technique for both problems: the generalization capacity of the ANNs tested on the wine dataset averaged 85.58% with the 3-layer architecture, 86.58% with the deep network, and 93.53% with the recurrent network, and on the car evaluation dataset it averaged 93.71% with the recurrent network.


Deep neural networks are more accurate than humans at detecting sexual orientation from facial images.

10.31234/osf.io/hv28a ◽

2017 ◽

Cited By ~ 9

Author(s):

Yilun Wang ◽

Michal Kosinski

Keyword(s):

Neural Networks ◽

Sexual Orientation ◽

Gay Men ◽

Deep Neural Networks ◽

Prediction Models ◽

Facial Features ◽

Facial Morphology ◽

Men And Women ◽

Lower Accuracy ◽

Facial Images

We show that faces contain much more information about sexual orientation than can be perceived and interpreted by the human brain. We used deep neural networks to extract features from 35,326 facial images. These features were entered into a logistic regression aimed at classifying sexual orientation. Given a single facial image, a classifier could correctly distinguish between gay and heterosexual men in 81% of cases, and in 74% of cases for women. Human judges achieved much lower accuracy: 61% for men and 54% for women. The accuracy of the algorithm increased to 91% and 83%, respectively, given five facial images per person. Facial features employed by the classifier included both fixed (e.g., nose shape) and transient facial features (e.g., grooming style). Consistent with the prenatal hormone theory of sexual orientation, gay men and women tended to have gender-atypical facial morphology, expression, and grooming styles. Prediction models aimed at gender alone allowed for detecting gay males with 57% accuracy and gay females with 58% accuracy. Those findings advance our understanding of the origins of sexual orientation and the limits of human perception. Additionally, given that companies and governments are increasingly using computer vision algorithms to detect people’s intimate traits, our findings expose a threat to the privacy and safety of gay men and women.


Network Approximation using Tensor Sketching

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/321 ◽

2018 ◽

Cited By ~ 1

Author(s):

Shiva Prasad Kasiviswanathan ◽

Nina Narodytska ◽

Hongxia Jin

Keyword(s):

Neural Networks ◽

Language Processing ◽

Network Architecture ◽

Deep Neural Networks ◽

Network Architectures ◽

Effective Parameters ◽

Unified Framework ◽

Design Changes ◽

Target Network ◽

Fully Connected

Deep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. In this paper, we study a fundamental question that arises when designing deep network architectures: given a target network architecture, can we design a "smaller" network architecture that "approximates" the operation of the target network? The question is, in part, motivated by the challenge of parameter reduction (compression) in modern deep neural networks, as the ever-increasing storage and memory requirements of these networks pose a problem in resource-constrained environments. In this work, we focus on deep convolutional neural network architectures and propose a novel randomized tensor sketching technique that we use to develop a unified framework for approximating the operation of both the convolutional and fully connected layers. By applying the sketching technique along different tensor dimensions, we design changes to the convolutional and fully connected layers that substantially reduce the number of effective parameters in a network. We show that the resulting smaller network can be trained directly and has a classification accuracy comparable to that of the original network.


Neural Stochastic Differential Equations with Neural Processes Family Members for Uncertainty Estimation in Deep Learning

Sensors ◽

10.3390/s21113708 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3708

Author(s):

Yongguang Wang ◽

Shuzhen Yao

Keyword(s):

Model Uncertainty ◽

Deep Neural Networks ◽

Predictive Accuracy ◽

Uncertainty Estimation ◽

General Situation ◽

System Perspective ◽

One Dimensional ◽

Differential Equation Models ◽

Neural Processes ◽

Classification Tasks

Existing neural stochastic differential equation models, such as SDE-Net, can quantify the uncertainties of deep neural networks (DNNs) from a dynamical system perspective. SDE-Net is either dominated by its drift net with in-distribution (ID) data to achieve good predictive accuracy, or dominated by its diffusion net with out-of-distribution (OOD) data to generate high diffusion for characterizing model uncertainty. However, it does not consider the general situation in a wider field, such as ID data with noise or high missing rates in practice. In order to effectively deal with noisy ID data for credible uncertainty estimation, we propose a vNPs-SDE model, which firstly applies variants of neural processes (NPs) to deal with the noisy ID data, following which the completed ID data can be processed more effectively by SDE-Net. Experimental results show that the proposed vNPs-SDE model can be implemented with convolutional conditional neural processes (ConvCNPs), which have the property of translation equivariance, and can effectively handle the ID data with missing rates for one-dimensional (1D) regression and two-dimensional (2D) image classification tasks. Alternatively, vNPs-SDE can be implemented with conditional neural processes (CNPs) or attentive neural processes (ANPs), which have the property of permutation invariance, and exceeds vanilla SDE-Net in multidimensional regression tasks.


Universal adversarial attacks on deep neural networks for medical image classification

BMC Medical Imaging ◽

10.1186/s12880-020-00530-y ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Hokuto Hirano ◽

Akinori Minagi ◽

Kazuhiro Takemoto

Keyword(s):

Neural Networks ◽

Image Classification ◽

Clinical Diagnosis ◽

Medical Image ◽

Deep Neural Networks ◽

Careful Consideration ◽

Specific Class ◽

High Stake ◽

Classification Tasks ◽

Medical Image Classification

Background: Deep neural networks (DNNs) are widely investigated in medical image classification to achieve automated support for clinical diagnosis. It is necessary to evaluate the robustness of medical DNN tasks against adversarial attacks, as high-stake decision-making will be made based on the diagnosis. Several previous studies have considered simple adversarial attacks. However, the vulnerability of DNNs to more realistic and higher-risk attacks, such as universal adversarial perturbation (UAP), which is a single perturbation that can induce DNN failure in most classification tasks, has not been evaluated yet. Methods: We focus on three representative DNN-based medical image classification tasks (i.e., skin cancer, referable diabetic retinopathy, and pneumonia classifications) and investigate the vulnerability of seven model architectures to UAPs. Results: We demonstrate that DNNs are vulnerable both to nontargeted UAPs, which cause a task failure resulting in an input being assigned an incorrect class, and to targeted UAPs, which cause the DNN to classify an input into a specific class. The almost imperceptible UAPs achieved > 80% success rates for nontargeted and targeted attacks. The vulnerability to UAPs depended very little on the model architecture. Moreover, we discovered that adversarial retraining, which is known to be an effective method for adversarial defenses, increased DNNs’ robustness against UAPs in only very few cases. Conclusion: Unlike previous assumptions, the results indicate that DNN-based clinical diagnosis is easier to deceive because of adversarial attacks. Adversaries can cause failed diagnoses at lower costs (e.g., without consideration of data distribution); moreover, they can affect the diagnosis. The effects of adversarial defenses may not be limited. Our findings emphasize that more careful consideration is required in developing DNNs for medical imaging and their practical applications.
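The difference between nontargeted and targeted success rates for a single universal perturbation can be sketched on a toy model. Everything here is assumed for illustration: the linear classifier, the inputs, the perturbation `delta`, and the exact success-rate definitions, which follow the abstract's wording rather than the paper's evaluation code.

```python
def predict(weights, x):
    """Toy linear classifier: index of the class with the largest score."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return scores.index(max(scores))

def apply_uap(x, delta):
    """Add the same (universal) perturbation to an input."""
    return [xi + di for xi, di in zip(x, delta)]

def nontargeted_success_rate(weights, inputs, labels, delta):
    """Fraction of perturbed inputs assigned an incorrect class."""
    hits = sum(
        predict(weights, apply_uap(x, delta)) != y
        for x, y in zip(inputs, labels)
    )
    return hits / len(inputs)

def targeted_success_rate(weights, inputs, delta, target):
    """Fraction of perturbed inputs classified into the target class."""
    hits = sum(predict(weights, apply_uap(x, delta)) == target for x in inputs)
    return hits / len(inputs)

# Hypothetical two-class model and data; one delta is shared by all inputs.
weights = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[1.0, 0.2], [0.9, 0.1], [0.1, 1.0]]
labels = [0, 0, 1]
delta = [0.0, 1.0]

print(nontargeted_success_rate(weights, inputs, labels, delta))
print(targeted_success_rate(weights, inputs, delta, target=1))
```

The third input already belongs to class 1, so pushing it further toward class 1 counts as a targeted success but not as a nontargeted one. This is why the two rates can differ for the same perturbation.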
