SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification
In this paper, we present SpecAugment++, a novel data augmentation method for deep neural network based acoustic scene classification (ASC). Unlike other popular data augmentation methods such as SpecAugment and mixup, which operate only on the input space, SpecAugment++ is applied to both the input space and the hidden space of the deep neural network to enhance the input and the intermediate feature representations. For an intermediate hidden state, the augmentation techniques consist of masking blocks of frequency channels and masking blocks of time frames, which improve generalization by enabling a model to attend not only to the most discriminative parts of the feature, but also to the feature as a whole. Apart from using zeros for masking, we also examine two approaches for masking based on the use of other samples within the mini-batch, which help introduce noise to the networks to make them more discriminative for classification. The experimental results on the DCASE 2018 Task1 dataset and the DCASE 2019 Task1 dataset show that our proposed method obtains 3.6% and 4.7% accuracy gains, respectively, over a strong baseline without augmentation (i.e., CP-ResNet), and outperforms other previous data augmentation methods.
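As an illustration of the masking idea described above, the following is a minimal NumPy sketch, not the authors' implementation. It masks one block of frequency channels and one block of time frames in a hidden state of assumed shape (batch, channels, frequency, time). The function name `mask_hidden_state`, the mode names `zero`, `cut`, and `mix`, the width parameters, and the use of a single mask shared across the mini-batch are all assumptions made for this example. In particular, `cut` (copy the block from another sample) and `mix` (average the block with another sample's) are only one plausible reading of the two mini-batch based approaches, which the abstract does not specify.

```python
import numpy as np


def mask_hidden_state(h, max_freq_width=8, max_time_width=8, mode="zero", rng=None):
    """Mask one frequency block and one time block of a hidden state.

    h is assumed to have shape (batch, channels, freq, time).
    mode (hypothetical names):
      "zero": fill the masked blocks with zeros
      "cut":  fill them with the same region of another sample in the batch
      "mix":  fill them with the average of the sample and another sample
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, _, n_freq, n_time = h.shape
    out = h.copy()
    # Pair every sample with a randomly chosen sample from the same mini-batch.
    partner = h[rng.permutation(batch)]

    for axis, size, max_width in ((2, n_freq, max_freq_width),
                                  (3, n_time, max_time_width)):
        # Random block width in [0, max_width] and a start that keeps it in range.
        width = int(rng.integers(0, min(max_width, size) + 1))
        start = int(rng.integers(0, size - width + 1))
        block = [slice(None)] * 4
        block[axis] = slice(start, start + width)
        block = tuple(block)

        if mode == "zero":
            out[block] = 0.0
        elif mode == "cut":
            out[block] = partner[block]
        elif mode == "mix":
            out[block] = 0.5 * (h[block] + partner[block])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out
```

In a network, such a function would be called during training only, on the input spectrogram and on the outputs of selected intermediate layers; which layers to augment and how wide the masks should be are choices the abstract leaves open.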