BBW: a batch balance wrapper for training deep neural networks on extremely imbalanced datasets with few minority samples

Applied Intelligence ◽

10.1007/s10489-021-02623-9 ◽

2021 ◽

Author(s):

Jingzhao Hu ◽

Hao Zhang ◽

Yang Liu ◽

Richard Sutcliffe ◽

Jun Feng

Keyword(s):

Neural Networks ◽

Learning Process ◽

Deep Neural Networks ◽

Imbalanced Data ◽

Parameter Tuning ◽

Classification Performance ◽

Imbalanced Datasets ◽

Sample Distribution ◽

Network Layers ◽

Additional Processing

In recent years, Deep Neural Networks (DNNs) have achieved excellent performance on many tasks, but it is very difficult to train good models from imbalanced datasets. Creating balanced batches either by majority data down-sampling or by minority data up-sampling can solve the problem in certain cases. However, it may lead to learning process instability and overfitting. In this paper, we propose the Batch Balance Wrapper (BBW), a novel framework which can adapt a general DNN to be well trained from extremely imbalanced datasets with few minority samples. In BBW, two extra network layers are added to the start of a DNN. The layers prevent overfitting of minority samples and improve the expressiveness of the sample distribution of minority samples. Furthermore, Batch Balance (BB), a class-based sampling algorithm, is proposed to make sure the samples in each batch are always balanced during the learning process. We test BBW on three well-known extremely imbalanced datasets with few minority samples. The maximum imbalance ratio reaches 1167:1 with only 16 positive samples. Compared with existing approaches, BBW achieves better classification performance. In addition, BBW-wrapped DNNs are 16.39 times faster than unwrapped DNNs. Moreover, BBW does not require data preprocessing or additional hyper-parameter tuning, operations that may require additional processing time. The experiments prove that BBW can be applied to common applications of extremely imbalanced data with few minority samples, such as the classification of EEG signals and medical images.
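
The abstract describes Batch Balance only as a class-based sampling algorithm that keeps every batch balanced; it gives no further detail. The sketch below is therefore not the authors' algorithm but one minimal way such a sampler could look, assuming equal per-class quotas and sampling with replacement so that a class with very few samples can still fill its share. The function name `balanced_batches` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def balanced_batches(labels, batch_size, num_batches, seed=0):
    """Yield lists of sample indices holding the same number of samples per class.

    Illustrative assumption: classes are drawn with replacement, so a tiny
    minority class is repeated rather than exhausted.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    classes = sorted(by_class)
    per_class = batch_size // len(classes)
    if per_class == 0:
        raise ValueError("batch_size must be at least the number of classes")

    for _ in range(num_batches):
        batch = []
        for cls in classes:
            # Draw the class quota with replacement.
            batch.extend(rng.choices(by_class[cls], k=per_class))
        rng.shuffle(batch)  # avoid a fixed class order inside the batch
        yield batch


# Usage: 100 majority samples (label 0) and 3 minority samples (label 1).
labels = [0] * 100 + [1] * 3
for batch in balanced_batches(labels, batch_size=8, num_batches=2):
    minority = sum(1 for i in batch if labels[i] == 1)
    print(len(batch), minority)
```

Repeating the same few minority samples in every batch is exactly the up-sampling risk the abstract mentions, which is why BBW pairs the sampler with extra input layers.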


Representing Deep Neural Networks Latent Space Geometries with Graphs

Algorithms ◽

10.3390/a14020039 ◽

2021 ◽

Vol 14 (2) ◽

pp. 39

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Objective Function ◽

Learning Process ◽

Deep Neural Networks ◽

State Of The Art ◽

The Core ◽

Learning Tasks ◽

Latent Space

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
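
To make the idea of a similarity graph over a batch concrete, here is a small sketch. The abstract does not say which similarity measure or which graph comparison the authors use, so cosine similarity and a squared-difference comparison are assumptions chosen for illustration, and the names `latent_geometry_graph` and `geometry_distance` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def latent_geometry_graph(batch):
    """Weighted adjacency matrix: one node per sample, edges weighted by similarity.

    `batch` is a list of latent vectors taken from one layer for one batch.
    Self-loops are set to zero.
    """
    n = len(batch)
    return [
        [0.0 if i == j else cosine_similarity(batch[i], batch[j]) for j in range(n)]
        for i in range(n)
    ]


def geometry_distance(graph_a, graph_b):
    """Sum of squared differences between two adjacency matrices of equal size."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(graph_a, graph_b)
        for a, b in zip(row_a, row_b)
    )


# Usage: compare the geometry of a teacher layer and a student layer
# computed on the same batch of three inputs.
teacher = latent_geometry_graph([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
student = latent_geometry_graph([[1.0, 0.2], [0.1, 1.0], [0.9, 1.1]])
print(geometry_distance(teacher, student))
```

Under these assumptions, a quantity like `geometry_distance` could serve as an extra loss term for problem (i): the student is pushed to relate the samples of a batch to one another the way the teacher does, even when the two latent spaces have different dimensions.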


Attention-based deep learning networks for identification of human gait using radar micro-Doppler spectrograms

International Journal of Microwave and Wireless Technologies ◽

10.1017/s1759078721000830 ◽

2021 ◽

pp. 1-6

Author(s):

Hannah Garcia Doherty ◽

Roberto Arnaiz Burgueño ◽

Roeland P. Trommel ◽

Vasileios Papanastasiou ◽

Ronny I. A. Harmanny

Keyword(s):

Neural Networks ◽

Feature Vector ◽

Classification Performance ◽

Input Image ◽

Human Gait ◽

Learning Networks ◽

Class Label ◽

Deep Convolutional Neural Networks ◽

Network Layers ◽

Feature Dimension

Identification of human individuals within a group of 39 persons using micro-Doppler (μ-D) features has been investigated. Deep convolutional neural networks with two different training procedures have been used to perform classification. Visualization of the inner network layers revealed the sections of the input image most relevant when determining the class label of the target. A convolutional block attention module is added to provide a weighted feature vector in the channel and feature dimension, highlighting the relevant μ-D feature-filled areas in the image and improving classification performance.


Semiotic Aggregation in Deep Learning

Entropy ◽

10.3390/e22121365 ◽

2020 ◽

Vol 22 (12) ◽

pp. 1365

Author(s):

Bogdan Muşat ◽

Răzvan Andonie

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Decision Model ◽

Deep Neural Networks ◽

Neural Model ◽

Network Layers ◽

Saliency Maps ◽

Spatial Entropy ◽

Insight Into

Convolutional neural networks utilize a hierarchy of neural network layers. The statistical aspects of information concentration in successive layers can provide insight into the feature abstraction process. We analyze the saliency maps of these layers from the perspective of semiotics, also known as the study of signs and sign-using behavior. In computational semiotics, this aggregation operation (known as superization) is accompanied by a decrease of spatial entropy: signs are aggregated into supersigns. Using spatial entropy, we compute the information content of the saliency maps and study the superization processes which take place between successive layers of the network. In our experiments, we visualize the superization process and show how the obtained knowledge can be used to explain the neural decision model. In addition, we attempt to optimize the architecture of the neural model employing a semiotic greedy technique. To the best of our knowledge, this is the first application of computational semiotics in the analysis and interpretation of deep neural networks.


Part-of-Speech Tagging via Deep Neural Networks for Northern-Ethiopic Languages

Information Technology And Control ◽

10.5755/j01.itc.49.4.26808 ◽

2020 ◽

Vol 49 (4) ◽

pp. 482-494

Author(s):

Jurgita Kapočiūtė-Dzikienė ◽

Senait Gebremichael Tesfagergish

Keyword(s):

Neural Network ◽

Neural Networks ◽

Language Processing ◽

Deep Neural Networks ◽

Short Term Memory ◽

Parameter Tuning ◽

Feed Forward Neural Network ◽

Pos Tagging ◽

Part Of Speech ◽

Pos Tagger

Deep Neural Networks (DNNs) have proven to be especially successful in the area of Natural Language Processing (NLP) and Part-Of-Speech (POS) tagging, which is the process of mapping words to their corresponding POS labels depending on the context. Despite recent development of language technologies, low-resourced languages (such as Tigrinya, an East African language) have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We have selected Tigrinya as the testbed example and have tested state-of-the-art DL approaches seeking to build the most accurate POS tagger. We have evaluated DNN classifiers (Feed Forward Neural Network – FFNN, Long Short-Term Memory method – LSTM, Bidirectional LSTM, and Convolutional Neural Network – CNN) on top of neural word2vec word embeddings with a small training corpus known as the Nagaoka Tigrinya Corpus. To determine the best DNN classifier type, its architecture and its hyper-parameter set, both manual and automatic hyper-parameter tuning have been performed. The BiLSTM method proved to be the most suitable for our task: it achieved the highest accuracy, 92%, which is 65% above the random baseline.


Granular Classification for Imbalanced Datasets: A Minkowski Distance-Based Method

Algorithms ◽

10.3390/a14020054 ◽

2021 ◽

Vol 14 (2) ◽

pp. 54

Author(s):

Chen Fu ◽

Jianhua Yang

Keyword(s):

Imbalanced Data ◽

Main Idea ◽

Fuzzy Rule ◽

Classification Performance ◽

Distance Measures ◽

Minkowski Distance ◽

Imbalanced Datasets ◽

Minority Class ◽

Information Granules ◽

Practical Applications

The problem of classification for imbalanced datasets is frequently encountered in practical applications. The data to be classified in this problem are skewed, i.e., the samples of one class (the minority class) are far fewer than those of the other classes (the majority classes). When dealing with imbalanced datasets, most classifiers encounter a common limitation: they often obtain better classification performance on the majority classes than on the minority class. To alleviate this limitation, a fuzzy rule-based modeling approach using information granules is proposed in this study. Information granules, as entities derived and abstracted from data, can be used to describe and capture the characteristics (distribution and structure) of data from both majority and minority classes. Since the geometric characteristics of information granules depend on the distance measures used in the granulation process, the main idea of this study is to construct information granules on each class of imbalanced data using Minkowski distance measures and then to establish the classification models by using “If-Then” rules. The experimental results involving synthetic and publicly available datasets show that the proposed Minkowski distance-based method can produce information granules with a series of geometric shapes and construct granular models with satisfactory classification performance for imbalanced datasets.
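
For reference, the Minkowski distance of order p between two points x and y is (sum over i of |x_i - y_i|^p)^(1/p). The sketch below implements this standard definition only; it is not the granulation procedure of the paper, and the function name `minkowski_distance` is chosen here for illustration.

```python
import math


def minkowski_distance(x, y, p):
    """Minkowski distance of order p (p >= 1) between two equal-length points.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance, and
    p = math.inf the Chebyshev distance (largest coordinate difference).
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    if p < 1:
        raise ValueError("p must be at least 1 for a valid distance")

    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Usage: the same pair of points under three orders.
origin, point = (0.0, 0.0), (3.0, 4.0)
for order in (1, 2, math.inf):
    print(order, minkowski_distance(origin, point, order))
```

The order p controls the shape of the set of points within a fixed distance of a center: a diamond for p = 1, a circle for p = 2, and a square for p = infinity in two dimensions. This is consistent with the abstract's remark that different distance measures yield granules with a series of geometric shapes.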


Group-Wise Dynamic Dropout Based on Latent Semantic Variations

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6782 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11229-11236

Author(s):

Zhiwei Ke ◽

Zhiwei Wen ◽

Weicheng Xie ◽

Yi Wang ◽

Linlin Shen

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Semantic Information ◽

State Of The Art ◽

Classification Performance ◽

Network Robustness ◽

Feature Detectors ◽

Data Points ◽

Adversarial Examples ◽

Public Datasets

Dropout regularization has been widely used in various deep neural networks to combat overfitting. It works by training a network to be more robust on information-degraded data points for better generalization. Conventional dropout and its variants are often applied to individual hidden units in a layer to break up co-adaptations of feature detectors. In this paper, we propose an adaptive dropout that reduces co-adaptations in a group-wise manner, using coarse semantic information to improve feature discriminability. In particular, we show that adjusting the dropout probability based on local feature densities can not only improve the classification performance significantly but also enhance the network robustness against adversarial examples in some cases. The proposed approach was evaluated in comparison with the baseline and several state-of-the-art adaptive dropouts on four public datasets: Fashion-MNIST, CIFAR-10, CIFAR-100 and SVHN.


Sharing Residual Units Through Collective Tensor Factorization To Improve Deep Neural Networks

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/88 ◽

2018 ◽

Cited By ~ 6

Author(s):

Yunpeng Chen ◽

Xiaojie Jin ◽

Bingyi Kang ◽

Jiashi Feng ◽

Shuicheng Yan

Keyword(s):

Neural Networks ◽

Network Architecture ◽

Deep Neural Networks ◽

Tensor Decomposition ◽

Classification Performance ◽

Model Parameters ◽

Tensor Factorization ◽

Unified Framework ◽

Benchmark Datasets ◽

Basic Network

The residual unit and its variations are widely used in building very deep neural networks for alleviating optimization difficulty. In this work, we revisit the standard residual function as well as several of its successful variants and propose a unified framework based on tensor Block Term Decomposition (BTD) to explain these apparently different residual functions from the tensor decomposition view. With the BTD framework, we further propose a novel basic network architecture, named the Collective Residual Unit (CRU). CRU further enhances parameter efficiency of deep residual neural networks by sharing core factors derived from collective tensor factorization over the involved residual units. It enables efficient knowledge sharing across multiple residual units, reduces the number of model parameters, lowers the risk of over-fitting, and provides better generalization ability. Extensive experimental results show that our proposed CRU network brings outstanding parameter efficiency: it achieves comparable classification performance with ResNet-200 while using a model size as small as ResNet-50 on the ImageNet-1k and Places365-Standard benchmark datasets.


Multimodal Multi-tasking for Skin Lesion Classification Using Deep Neural Networks

10.1007/978-3-030-90439-5_3 ◽

2021 ◽

pp. 27-38

Author(s):

Rafaela Carvalho ◽

João Pedrosa ◽

Tudor Nedelcu

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Deep Learning ◽

Skin Lesion ◽

Deep Neural Networks ◽

Classification Performance ◽

Additional Information ◽

Deep Learning Model ◽

Lesion Classification

Skin cancer is one of the most common types of cancer and, with its increasing incidence, accurate early diagnosis is crucial to improve prognosis of patients. In the process of visual inspection, dermatologists follow specific dermoscopic algorithms and identify important features to provide a diagnosis. This process can be automated as such characteristics can be extracted by computer vision techniques. Although deep neural networks can extract useful features from digital images for skin lesion classification, performance can be improved by providing additional information. The extracted pseudo-features can be used as input (multimodal) or output (multi-tasking) to train a robust deep learning model. This work investigates the multimodal and multi-tasking techniques for more efficient training, given the single optimization of several related tasks in the latter, and generation of better diagnosis predictions. Additionally, the role of lesion segmentation is also studied. Results show that multi-tasking improves learning of beneficial features which lead to better predictions, and pseudo-features inspired by the ABCD rule provide readily available helpful information about the skin lesion.


Efficiency of Various Time-Frequency Representations in Deep Neural Network based Passive Sonar Target Classifiers

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1662.029420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1908-1918

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

Target Recognition ◽

Classification Performance ◽

Low Frequencies ◽

Acceptable Error ◽

Time Frequency ◽

Passive Sonar ◽

Spectral Components

Passive acoustic target classification is an exceptionally challenging problem due to the complex phenomena associated with the channel and the relatively low Signal to Noise Ratio (SNR) manifested by the pervasive ambient noise field. Inspired by the overwhelming success of Deep Neural Networks (DNNs) in many such hard problems, a carefully crafted network specifically for target recognition application has been employed in this work. Although deep neural networks can learn characteristic features or representations directly from the raw observations, domain specific intermediate representations can mitigate the computational requirements as well as the sample complexity required to achieve an acceptable error rate in prediction. As the sonar target records are essentially a time series, spectro-temporal representations can make the intricate relationship between time and spectral components more explicit. In a passive sonar target recognition scenario, since most of the defining spectral components reside at the lower part of the spectrum, a nonlinear dilated spectral scale having an emphasis on low frequencies is highly desirable. This can be easily achieved using a filterbank based time-frequency decomposition, which allows more filters to be positioned at the desired frequency ranges of interest. In this work, a rigorous analysis of the performance of time-frequency representations initialized at various frequency scales is conducted, both independently and in combination. A convolutional neural network based spectro-temporal feature learner has been utilized as the initial layers, while a deep stack of Long Short Term Memories (LSTMs) with residual connections has been used for learning the intricate temporal relationships hidden in the intermediate representations. From the experimental results it can be observed that a linear scale spectrogram achieves an accuracy of 92.4% and 90.2% for the validation and test sets respectively in the single feature configuration, whereas the gammatone spectrogram attains an accuracy of about 96.7% and 96.1% respectively for the same sets. In a multi-feature setup, however, the accuracy reaches up to 97.3% and 96.6% respectively, which reveals that a combination of properly initialized intermediate representations can improve the classification performance significantly.


A Learning Framework for Medical Image-Based Intelligent Diagnosis from Imbalanced Datasets

10.3233/shti210801 ◽

2021 ◽

Author(s):

Tetiana Biloborodova ◽

Inna Skarga-Bandurova ◽

Mark Koverha ◽

Illia Skarha-Bandurov ◽

Yelyzaveta Yevsieieva

Keyword(s):

Image Classification ◽

Predictive Models ◽

Medical Image ◽

Imbalanced Data ◽

Classification Performance ◽

Data Reuse ◽

Imbalanced Datasets ◽

Learning Framework ◽

Class Distribution ◽

Medical Image Classification

Medical image classification and diagnosis based on machine learning has made significant achievements and gradually penetrated the healthcare industry. However, medical data characteristics such as relatively small datasets for rare diseases or imbalance in class distribution for rare conditions significantly restrain their adoption and reuse. Imbalanced datasets lead to difficulties in learning and obtaining accurate predictive models. This paper follows the FAIR paradigm and proposes a technique for the alignment of class distribution, which improves image classification performance on imbalanced data and ensures data reuse. The experiments on the acne disease dataset indicate that the proposed framework outperforms the baselines and achieves up to a 5% improvement in image classification.
