Towards a high robust neural network via feature matching

International Journal of Multimedia Information Retrieval ◽

10.1007/s13735-021-00219-0 ◽

2021 ◽

Author(s):

Jian Li ◽

Yanming Guo ◽

Songyang Lao ◽

Yulun Wu ◽

Liang Bai ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

Feature Matching ◽

Feature Vector ◽

State Of The Art ◽

Model Performance ◽

Image Features ◽

Classification Systems ◽

Adversarial Attack

Image classification systems have been found vulnerable to adversarial attacks, which are imperceptible to humans but can easily fool deep neural networks. Recent research indicates that regularizing the network by introducing randomness can greatly improve the model’s robustness against adversarial attacks, but the randomness module normally involves complex calculations and numerous additional parameters and seriously degrades model performance on clean data. In this paper, we propose a feature matching module to regularize the network. Specifically, our model learns a feature vector for each category and imposes additional restrictions on image features. Then, the similarity between image features and category features is used as the basis for classification. Our method does not introduce any network parameters beyond those of the undefended model and can be easily integrated into any neural network. Experiments on the CIFAR10 and SVHN datasets show that our proposed module effectively improves accuracy on both clean and perturbed data in comparison with state-of-the-art defense methods and outperforms the L2P method by 6.3% and 24% on clean and perturbed data, respectively, using the ResNet-V2(18) architecture.
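
The classification step described in this abstract (comparing an image feature against one learned vector per category) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the similarity measure, the training loss, or the feature dimension, so the use of cosine similarity, the helper names `cosine_similarity` and `classify`, and the toy vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(image_feature, category_features):
    """Pick the category whose feature vector is most similar to the image feature."""
    scores = {
        label: cosine_similarity(image_feature, vec)
        for label, vec in category_features.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores

# Hypothetical category vectors; in the paper these would be learned.
category_features = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "ship": [0.0, 0.0, 1.0],
}

# Hypothetical image feature, as a backbone network might produce.
image_feature = [0.9, 0.2, 0.1]

label, scores = classify(image_feature, category_features)
print(label, scores)
```

Because the decision depends only on similarities to the category vectors, a classifier of this shape needs no extra output layer, which is consistent with the abstract's claim that no additional network parameters are introduced.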

Interpolation Consistency Training for Semi-supervised Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/504 ◽

2019 ◽

Cited By ~ 39

Author(s):

Vikas Verma ◽

Alex Lamb ◽

Juho Kannala ◽

Yoshua Bengio ◽

David Lopez-Paz

Keyword(s):

Neural Network ◽

Neural Networks ◽

Supervised Learning ◽

Deep Neural Networks ◽

State Of The Art ◽

Data Distribution ◽

Network Architectures ◽

Low Density ◽

Decision Boundary ◽

Classification Problems

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training deep neural networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
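
The consistency term described in this abstract can be sketched on a toy model as follows. The sketch rests on several assumptions not found in the abstract: a linear-plus-softmax classifier stands in for the deep network, the same model produces both sides of the comparison, the mixing coefficient `lam` is fixed rather than sampled, and mean squared error is used as the distance. All names (`predict`, `mix`, `ict_consistency_loss`) are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, weights):
    """Toy classifier: linear layer followed by softmax (stand-in for a DNN)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)

def mix(a, b, lam):
    """Linear interpolation lam * a + (1 - lam) * b, element-wise."""
    return [lam * ai + (1.0 - lam) * bi for ai, bi in zip(a, b)]

def ict_consistency_loss(u1, u2, lam, weights):
    """Mean squared gap between f(mix(u1, u2)) and mix(f(u1), f(u2))."""
    pred_at_mix = predict(mix(u1, u2, lam), weights)
    mix_of_preds = mix(predict(u1, weights), predict(u2, weights), lam)
    gaps = [(p - q) ** 2 for p, q in zip(pred_at_mix, mix_of_preds)]
    return sum(gaps) / len(gaps)

# Hypothetical weights and two unlabeled points.
weights = [[1.0, 0.0], [0.0, 1.0]]
u1, u2 = [2.0, 0.0], [0.0, 2.0]

loss = ict_consistency_loss(u1, u2, 0.3, weights)
print(loss)
```

In training, a term like this would be added to the supervised loss on the labeled examples, so that unlabeled pairs also shape the decision boundary.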

Modular Dynamic Neural Network: A Continual Learning Architecture

Applied Sciences ◽

10.3390/app112412078 ◽

2021 ◽

Vol 11 (24) ◽

pp. 12078

Author(s):

Daniel Turner ◽

Pedro J. S. Cardoso ◽

João M. F. Rodrigues

Keyword(s):

Neural Network ◽

Neural Networks ◽

Feature Extraction ◽

Deep Neural Networks ◽

State Of The Art ◽

Simple Task ◽

Dynamic Neural Network ◽

Main Components ◽

Over Time ◽

Continual Learning

Learning to recognize a new object after having learned to recognize other objects may be a simple task for a human, but not for machines. The present go-to approaches for teaching a machine to recognize a set of objects are based on deep neural networks (DNN). So, intuitively, the solution for teaching new objects on the fly to a machine should be a DNN. The problem is that the trained DNN weights used to classify the initial set of objects are extremely fragile, meaning that any change to those weights can severely damage the capacity to perform the initial recognitions; this phenomenon is known as catastrophic forgetting (CF). This paper presents a new DNN-based continual learning (CL) architecture that can deal with CF, the modular dynamic neural network (MDNN). The presented architecture consists of two main components: (a) the ResNet50-based feature extraction component as the backbone; and (b) the modular dynamic classification component, which consists of multiple sub-networks and progressively builds itself up in a tree-like structure that rearranges itself as it learns over time in such a way that each sub-network can function independently. The main contribution of the paper is a new architecture that is strongly based on its modular dynamic training feature. This modular structure allows new classes to be added while only altering specific sub-networks, so that previously known classes are not forgotten. Tests on the CORe50 dataset showed results above the state of the art for CL architectures.

A Comparison of Deep Learning Methods for Timbre Analysis in Polyphonic Automatic Music Transcription

Electronics ◽

10.3390/electronics10070810 ◽

2021 ◽

Vol 10 (7) ◽

pp. 810

Author(s):

Carlos Hernandez-Olivan ◽

Ignacio Zay Pinilla ◽

Carlos Hernandez-Lopez ◽

Jose R. Beltran

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

High Impact ◽

Critical Problem ◽

Music Transcription ◽

Automatic Music Transcription ◽

Music Information ◽

Method Show

Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is approached with deep neural networks, the variety of timbres of different instruments can be an issue that has not yet been studied in depth. The goal of this work is to address AMT by analyzing how timbre affects monophonic transcription in a first approach based on the CREPE neural network, and then to improve the results by performing polyphonic music transcription with different timbres in a second approach based on the Deep Salience model, which performs polyphonic transcription based on the Constant-Q Transform. The results of the first method show that the timbre and envelope of the onsets have a high impact on the AMT results, and the second method shows that the developed model is less dependent on the strength of the onsets than other state-of-the-art models that deal with AMT on piano sounds, such as Google Magenta Onsets and Frames (OaF). Our polyphonic transcription model for non-piano instruments outperforms the state-of-the-art model; for bass instruments, for example, it achieves an F-score of 0.9516 versus 0.7102. In our latest experiment we also show how adding an onset detector to our model can outperform the results given in this work.

ThriftyNets: Convolutional Neural Networks with Tiny Parameter Budget

IoT ◽

10.3390/iot2020012 ◽

2021 ◽

Vol 2 (2) ◽

pp. 222-235

Author(s):

Guillaume Coiffier ◽

Ghouthi Boukli Hacene ◽

Vincent Gripon

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Convolutional Neural Network ◽

Spatial Resolution ◽

Network Architecture ◽

Deep Neural Networks ◽

State Of The Art ◽

Feature Maps ◽

Neural Network Architecture

Deep neural networks are state-of-the-art in a large number of challenges in machine learning. However, to reach the best performance they require a huge pool of parameters. Indeed, typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, whereas the spatial resolution of inputs is decreased through downsampling operations. This means that most of the parameters lie in the final layers, while a large portion of the computations are performed by a small fraction of the total parameters in the first layers. In an effort to use every parameter of a network to its maximum, we propose a new convolutional neural network architecture, called ThriftyNet. In ThriftyNet, only one convolutional layer is defined and used recursively, leading to a maximal parameter factorization. In addition, normalization, non-linearities, downsamplings and shortcuts ensure sufficient expressivity of the model. ThriftyNet achieves competitive performance on a tiny parameter budget, exceeding 91% accuracy on CIFAR-10 with less than 40 k parameters in total, 74.3% on CIFAR-100 with less than 600 k parameters, and 67.1% on ImageNet ILSVRC 2012 with no more than 4.15 M parameters. However, the proposed method typically requires more computations than existing counterparts.

Tri-net for Semi-Supervised Deep Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/278 ◽

2018 ◽

Cited By ~ 11

Author(s):

Dong-Dong Chen ◽

Wei Wang ◽

Wei Gao ◽

Zhi-Hua Zhou

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Error Rate ◽

Deep Neural Network ◽

Deep Neural Networks ◽

State Of The Art ◽

Fine Tuning ◽

Learning Methods ◽

Model Initialization

Deep neural networks have witnessed great successes in various real applications, but they require a large amount of labeled data for training. In this paper, we propose tri-net, a deep neural network which is able to use massive unlabeled data to help learning with limited labeled data. We consider model initialization, diversity augmentation and pseudo-label editing simultaneously. In our work, we utilize output smearing to initialize modules, use fine-tuning on labeled data to augment diversity and eliminate unstable pseudo-labels to alleviate the influence of suspicious pseudo-labeled data. Experiments show that our method achieves the best performance in comparison with state-of-the-art semi-supervised deep learning methods. In particular, it achieves an 8.30% error rate on CIFAR-10 by using only 4000 labeled examples.

The FaceChannel: A Fast and Furious Deep Neural Network for Facial Expression Recognition

SN Computer Science ◽

10.1007/s42979-020-00325-6 ◽

2020 ◽

Vol 1 (6) ◽

Author(s):

Pablo Barros ◽

Nikhil Churamani ◽

Alessandra Sciutti

Keyword(s):

Neural Network ◽

Neural Networks ◽

Facial Expression ◽

Facial Expression Recognition ◽

Deep Neural Networks ◽

State Of The Art ◽

Facial Features ◽

Expression Recognition ◽

Current State ◽

Benchmark Datasets

Current state-of-the-art models for automatic facial expression recognition (FER) are based on very deep neural networks that are effective but rather expensive to train. Given the dynamic conditions of FER, this characteristic hinders the use of such models for general affect recognition. In this paper, we address this problem by formalizing the FaceChannel, a light-weight neural network that has far fewer parameters than common deep neural networks. We introduce an inhibitory layer that helps to shape the learning of facial features in the last layer of the network, thus improving performance while reducing the number of trainable parameters. To evaluate our model, we perform a series of experiments on different benchmark datasets and demonstrate how the FaceChannel achieves performance comparable to, if not better than, the current state of the art in FER. Our experiments include cross-dataset analysis, to estimate how our model behaves under different affective recognition conditions. We conclude our paper with an analysis of how FaceChannel learns and adapts the learned facial features towards the different datasets.

Image Hashtag Recommendations Using a Voting Deep Neural Network and Associative Rules Mining Approach

Entropy ◽

10.3390/e22121351 ◽

2020 ◽

Vol 22 (12) ◽

pp. 1351

Author(s):

Tomasz Hachaj ◽

Justyna Miazga

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Rapid Development ◽

Classification Problem ◽

Image Features ◽

Source Codes ◽

Confidence Threshold ◽

Social Media Platforms

Hashtag-based image descriptions are a popular approach for labeling images on social media platforms. In practice, images are often described by more than one hashtag. Due to the rapid development of deep neural networks specialized in image embedding and classification, it is now possible to generate those descriptions automatically. In this paper we propose a novel Voting Deep Neural Network with Associative Rules Mining (VDNN-ARM) algorithm that can be used to solve multi-label hashtag recommendation problems. VDNN-ARM is a machine learning approach that utilizes an ensemble of deep neural networks to generate image features, which are then classified into potential hashtag sets. Proposed hashtags are then filtered by a voting schema. The remaining hashtags might be included in a final set of recommended hashtags by application of associative rules mining, which explores dependencies in certain hashtag groups. Our approach is evaluated on the HARRISON benchmark dataset as a multi-label classification problem. The highest values of our evaluation parameters, including precision, recall, and accuracy, have been obtained for VDNN-ARM with a confidence threshold of 0.95. VDNN-ARM outperforms state-of-the-art algorithms, including VGG-Object + VGG-Scene precision by 17.91% as well as ensemble–FFNN (intersection) recall by 32.33% and accuracy by 27.00%. Both the dataset and all source code we implemented for this research are available for download, and our results can be reproduced.

Leveraging the Bhattacharyya coefficient for uncertainty quantification in deep neural networks

Neural Computing and Applications ◽

10.1007/s00521-021-05789-y ◽

2021 ◽

Author(s):

Pieter Van Molle ◽

Tim Verbelen ◽

Bert Vankeirsbilck ◽

Jonas De Vylder ◽

Bart Diricx ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

Use Case ◽

Bhattacharyya Coefficient ◽

Output Uncertainty ◽

Novel Approach ◽

Benchmark Datasets ◽

Network Approaches

Modern deep learning models achieve state-of-the-art results for many tasks in computer vision, such as image classification and segmentation. However, their adoption into high-risk applications, e.g. automated medical diagnosis systems, happens at a slow pace. One of the main reasons for this is that regular neural networks do not capture uncertainty. To assess uncertainty in classification, several techniques have been proposed that cast neural network approaches in a Bayesian setting. Amongst these techniques, Monte Carlo dropout is by far the most popular. This particular technique estimates the moments of the output distribution through sampling with different dropout masks. The output uncertainty of a neural network is then approximated as the sample variance. In this paper, we highlight the limitations of such a variance-based uncertainty metric and propose a novel approach. Our approach is based on the overlap between output distributions of different classes. We show that our technique leads to a better approximation of the inter-class output confusion. We illustrate the advantages of our method using benchmark datasets. In addition, we apply our metric to skin lesion classification, a real-world use case, and show that this yields promising results.
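
The two uncertainty measures contrasted in this abstract can be illustrated as follows: the sample variance of one class's outputs, and the overlap between the output distributions of two classes, measured here with the Bhattacharyya coefficient BC(p, q) = Σ √(pᵢ qᵢ) named in the paper's title. This is a sketch under assumptions: the abstract does not say how the output distributions are estimated, so the histogram binning, the helper names, and the sampled values below are hypothetical and are not taken from the paper.

```python
import math

def histogram(samples, bins=10):
    """Normalized histogram of values in [0, 1] (assumed density estimate)."""
    counts = [0] * bins
    for s in samples:
        idx = min(int(s * bins), bins - 1)  # clamp s == 1.0 into the last bin
        counts[idx] += 1
    n = len(samples)
    return [c / n for c in counts]

def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions: 0 = disjoint, 1 = identical."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def sample_variance(samples):
    """Unbiased sample variance, the variance-based uncertainty measure."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)

# Hypothetical softmax outputs for two classes over five stochastic
# forward passes (e.g. with different dropout masks).
class_a = [0.55, 0.62, 0.45, 0.52, 0.65]
class_b = [0.45, 0.38, 0.55, 0.48, 0.35]

variance_a = sample_variance(class_a)
overlap = bhattacharyya_coefficient(histogram(class_a), histogram(class_b))
print(variance_a, overlap)
```

The variance looks at a single class in isolation, whereas the overlap is high only when the sampled outputs of two classes actually occupy the same range, which is the inter-class confusion the abstract refers to.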

A Tensor Space Model-Based Deep Neural Network for Text Classification

Applied Sciences ◽

10.3390/app11209703 ◽

2021 ◽

Vol 11 (20) ◽

pp. 9703

Author(s):

Han-joon Kim ◽

Pureum Lim

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Text Classification ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Classification Systems ◽

Support Vector ◽

Tensor Space ◽

Space Model

Most text classification systems use machine learning algorithms; among these, naïve Bayes and support vector machine algorithms adapted to handle text data afford reasonable performance. Recently, given developments in deep learning technology, several scholars have used deep neural networks (recurrent and convolutional neural networks) to improve text classification. However, deep learning-based text classification has not greatly improved performance compared to that of conventional algorithms. This is because a textual document is essentially expressed only as a vector, albeit with word dimensions, which compromises the inherent semantic information, even if the vector is appropriately transformed to add conceptual information. To solve this ‘loss of term senses’ problem, we develop a concept-driven deep neural network based upon our semantic tensor space model. The semantic tensor used for text representation features a dependency between the term and the concept; we use this to develop three deep neural networks for text classification. We perform experiments using three standard document corpora, and we show that our proposed methods are superior to both traditional and more recent learning methods.

Adversarial Attacks for Deep Learning-Based Infrared Object Detection

Journal of the Korea Institute of Military Science and Technology ◽

10.9766/kimst.2021.24.6.591 ◽

2021 ◽

Vol 24 (6) ◽

pp. 591-601

Author(s):

Hoseong Kim ◽

Jaeguk Hyun ◽

Hyunjung Yoo ◽

Chunho Kim ◽

Hyunho Jeon

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Object Detection ◽

Image Recognition ◽

Rapid Growth ◽

Deep Neural Networks ◽

State Of The Art ◽

Visible Image ◽

Adversarial Attack

Recently, infrared object detection (IOD) has been extensively studied due to the rapid growth of deep neural networks (DNN). Adversarial attacks using imperceptible perturbations can dramatically deteriorate the performance of DNNs. However, most work on adversarial attacks focuses on visible image recognition (VIR), and there are few methods for IOD. We propose deep learning-based adversarial attacks for IOD by extending several state-of-the-art adversarial attacks for VIR. We validate our claim through comprehensive experiments on two challenging IOD datasets, FLIR and MSOD.
