Generating Robust Audio Adversarial Examples with Temporal Dependency

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2020/438

2020

Author(s):

Hongting Zhang

Pan Zhou

Qiben Yan

Xiao-Yang Liu

Keyword(s):

Speech Recognition

Automatic Speech Recognition

Defense Mechanisms

User Study

State Of The Art

Temporal Structure

Human Perception

Experimental Results

Low Intensity

Adversarial Examples

Audio adversarial examples, imperceptible to humans, have been constructed to attack automatic speech recognition (ASR) systems. However, the adversarial examples generated by existing approaches usually contain noticeable noise, especially during periods of silence and pauses. Moreover, the added noise often breaks the temporal dependency of the original audio, which state-of-the-art defense mechanisms can easily detect. In this paper, we propose a new Iterative Proportional Clipping (IPC) algorithm that preserves temporal dependency in audio to generate more robust adversarial examples. We are motivated by the observation that temporal dependency in audio has a significant effect on human perception. Following this observation, we leverage a proportional clipping strategy to reduce noise during low-intensity periods. Experimental results and a user study both suggest that the generated adversarial examples contain significantly less human-perceptible noise and resist defenses based on temporal structure.
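The proportional clipping idea described above can be illustrated with a minimal sketch. This is not the authors' IPC algorithm: it assumes that "proportional" means bounding each perturbation sample by a fixed ratio of the original sample's magnitude, and it replaces the attack loss on a real ASR model with a hypothetical `grad_fn` callback. The names `proportional_clip`, `iterative_proportional_clipping`, `ratio`, and `step` are illustrative only.

```python
from typing import Callable, List


def proportional_clip(delta: List[float], audio: List[float], ratio: float) -> List[float]:
    """Clip each perturbation sample to [-ratio * |x|, ratio * |x|].

    Assumption: "proportional" is read as a per-sample bound that scales
    with the magnitude of the original audio sample x. Where the audio is
    silent (x == 0), the bound is 0 and no perturbation survives.
    """
    clipped = []
    for d, x in zip(delta, audio):
        bound = ratio * abs(x)
        clipped.append(max(-bound, min(bound, d)))
    return clipped


def iterative_proportional_clipping(
    audio: List[float],
    grad_fn: Callable[[List[float]], List[float]],
    ratio: float = 0.1,
    step: float = 0.01,
    iterations: int = 100,
) -> List[float]:
    """Hypothetical attack loop: signed-gradient step, then proportional clip.

    grad_fn stands in for the gradient of an attack loss with respect to
    the adversarial audio; no real ASR model is involved in this sketch.
    """
    delta = [0.0] * len(audio)
    for _ in range(iterations):
        adversarial = [x + d for x, d in zip(audio, delta)]
        grad = grad_fn(adversarial)
        # Move each sample against the sign of its gradient (loss descent).
        delta = [d - step * ((g > 0) - (g < 0)) for d, g in zip(delta, grad)]
        # Re-project onto the amplitude-proportional bound after every step.
        delta = proportional_clip(delta, audio, ratio)
    return [x + d for x, d in zip(audio, delta)]


if __name__ == "__main__":
    audio = [0.0, 0.5, -0.8, 0.0, 0.2]  # toy waveform with two silent samples
    target = [1.0] * len(audio)
    # Toy loss: squared distance to a target signal; its gradient is 2 * (adv - target).
    toy_grad = lambda adv: [2 * (a - t) for a, t in zip(adv, target)]
    print(iterative_proportional_clipping(audio, toy_grad, ratio=0.1, step=0.01, iterations=50))
```

Because the bound scales with |x|, samples that are silent in the original recording receive no perturbation at all in this sketch, which matches the stated goal of reducing noise during low-intensity periods. A fixed absolute bound (as in a plain L-infinity clip) would instead allow the same amount of noise in silences as in loud speech.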

Performance vs. hardware requirements in state-of-the-art automatic speech recognition

EURASIP Journal on Audio Speech and Music Processing

10.1186/s13636-021-00217-4

2021

Vol 2021 (1)

Author(s):

Alexandru-Lucian Georgescu

Alessandro Pappalardo

Horia Cucu

Michaela Blott

Keyword(s):

Speech Recognition

Automatic Speech Recognition

State Of The Art

Decision Makers

Computing Power

Trade Off

Speech Features

Commercial Applications

Guided Tour

Embedded Applications

The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, which modeled hand-crafted speech features with probabilistic frameworks and generated phone posteriors, to end-to-end (E2E) systems, which translate the raw waveform directly into words using a single deep neural network (DNN). Transcription accuracy increased greatly, and ASR technology has been integrated into many commercial applications. However, few existing ASR technologies are suitable for embedded applications, because of the hard constraints such applications impose on computing power and memory usage. This overview paper serves as a guided tour through the recent literature on speech recognition and compares the most popular ASR implementations. The comparison emphasizes the trade-off between ASR performance and hardware requirements, to help decision makers choose the system that best fits their embedded application. To the best of our knowledge, this is the first study to provide this kind of trade-off analysis for state-of-the-art ASR systems.

Evaluation of the effectiveness and efficiency of state-of-the-art features and models for automatic speech recognition error detection

Journal of Big Data

10.1186/s40537-020-00391-w

2021

Vol 8 (1)

Author(s):

Asmaa El Hannani

Rahhal Errattahi

Fatima Zahra Salmam

Thomas Hain

Hassan Ouahmane

Keyword(s):

Speech Recognition

Automatic Speech Recognition

Error Detection

State Of The Art

Rapid Development

Unified Framework

Human Machine Interaction

Detection Analysis

Extensive Evaluation

Effectiveness And Efficiency

Speech-based human-machine interaction and natural language understanding applications have seen rapid development and wide adoption over the last few decades. This has led to a proliferation of studies that investigate error detection and classification in Automatic Speech Recognition (ASR) systems. However, these studies use different data sets and evaluation protocols, which makes direct comparison of the proposed approaches (e.g., features and models) difficult. In this paper, we perform an extensive evaluation of the effectiveness and efficiency of state-of-the-art approaches in a unified framework for both error detection and error type classification. We make three primary contributions: (1) we compare our Variant Recurrent Neural Network (V-RNN) model with three other state-of-the-art neural models and show that the V-RNN model is the most effective classifier for ASR error detection in terms of accuracy and speed; (2) we compare four feature settings, corresponding to different categories of predictor features, and show that generic features are particularly suitable for real-time ASR error detection applications; and (3) we examine the generalization ability of our error detection framework and perform a detailed post-detection analysis to understand which recognition errors are difficult to detect.

Adversarial Examples for Automatic Speech Recognition: Attacks and Countermeasures

IEEE Communications Magazine

10.1109/mcom.2019.1900006

2019

Vol 57 (10)

pp. 120-126

Cited By ~ 6

Author(s):

Shengshan Hu

Xingcan Shang

Zhan Qin

Minghui Li

Qian Wang

...

Keyword(s):

Speech Recognition

Automatic Speech Recognition

Adversarial Examples

Evaluating automatic speech recognition systems in comparison with human perception results using distinctive feature measures

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

10.1109/icassp.2017.7953270

2017

Cited By ~ 2

Author(s):

Xiang Kong

Jeung-Yoon Choi

Stefanie Shattuck-Hufnagel

Keyword(s):

Speech Recognition

Automatic Speech Recognition

Distinctive Feature

Human Perception

Recognition Systems

Audio-visual automatic speech recognition and related bimodal speech technologies: A review of the state-of-the-art and open problems

2009 IEEE Workshop on Automatic Speech Recognition & Understanding

10.1109/asru.2009.5373530

2009

Cited By ~ 2

Author(s):

Gerasimos Potamianos

Keyword(s):

Speech Recognition

Automatic Speech Recognition

State Of The Art

The State

Open Problems

Sanitizing hidden activations for improving adversarial robustness of convolutional neural networks

Journal of Intelligent & Fuzzy Systems

10.3233/jifs-210371

2021

pp. 1-11

Author(s):

Tianshi Mu

Kequan Lin

Huabing Zhang

Jian Wang

Keyword(s):

Neural Networks

Deep Learning

Convolutional Neural Networks

State Of The Art

Black Box

Experimental Results

Amplification Effect

Wide Range

Adversarial Examples

Deep learning is gaining significant traction in a wide range of areas. However, recent studies have demonstrated that deep learning has a critical weakness: adversarial examples. Because deep learning models are black boxes that lack transparency, it is difficult to explain why adversarial examples exist and hard to defend against them. This study focuses on improving the adversarial robustness of convolutional neural networks. We first use visualization to explore how adversarial examples behave inside the network. We find that adversarial examples produce perturbations in hidden activations, which form an amplification effect that fools the network. Motivated by this observation, we propose an approach, termed sanitizing hidden activations, that helps the network correctly recognize adversarial examples by eliminating or reducing the perturbations in hidden activations. To demonstrate the effectiveness of our approach, we conduct experiments on three widely used datasets, MNIST, CIFAR-10, and ImageNet, and compare it with state-of-the-art defense techniques. The experimental results show that our sanitizing approach generalizes better across different kinds of attacks and can effectively improve the adversarial robustness of convolutional neural networks.

Learning robust features by extended generative stochastic networks

International Journal of Modeling Simulation and Scientific Computing

10.1142/s1793962318500046

2018

Vol 09 (01)

pp. 1850004

Author(s):

Da Teng

Xiao Song

Guanghong Gong

Junhua Zhou

Keyword(s):

Neural Networks

Object Recognition

Deep Neural Networks

State Of The Art

Random Noise

Stochastic Networks

Experimental Results

Feedforward Networks

Adversarial Examples

Art Performance

Deep neural networks have achieved state-of-the-art performance on many object recognition tasks, but they are vulnerable to small adversarial perturbations. In this paper, several extensions of generative stochastic networks (GSNs) are proposed to improve the robustness of neural networks to random noise and adversarial perturbations. Experimental results show that, compared with the standard GSN method, the extensions using adversarial examples, lateral connections, and feedforward networks can improve the performance of GSNs by making the models more resistant to overfitting and noise.

Automatic Speech Recognition System for Tonal Languages: State-of-the-Art Survey

Archives of Computational Methods in Engineering

10.1007/s11831-020-09414-4

2020

Author(s):

Jaspreet Kaur

Amitoj Singh

Virender Kadyan

Keyword(s):

Speech Recognition

Automatic Speech Recognition

State Of The Art

Recognition System

Speech Recognition System

Automatic Speech Recognition System

Imperio: Robust Over-the-Air Adversarial Examples for Automatic Speech Recognition Systems

Annual Computer Security Applications Conference

10.1145/3427228.3427276

2020

Author(s):

Lea Schönherr

Thorsten Eisenhofer

Steffen Zeiler

Thorsten Holz

Dorothea Kolossa

Keyword(s):

Speech Recognition

Automatic Speech Recognition

Adversarial Examples

Recognition Systems

Nexus DNN for Speech and Speaker Recognition

International Journal of Engineering and Advanced Technology - Regular Issue

10.35940/ijeat.b2963.129219

2019

Vol 9 (2)

pp. 2004-2007

Keyword(s):

Speech Recognition

Open Source

Automatic Speech Recognition

Speaker Recognition

Unified Model

Experimental Results

Combined Model

Close Relationship

Over the years, much effort has gone into improving the recognition accuracy of automatic speech recognition (ASR) and speaker recognition (SRE), and many different technologies have been developed. Given the close relationship between these two tasks, researchers have proposed different ways of transferring techniques developed for one task to the other. In this paper, an open-source experimental framework is proposed for speech and speaker recognition. A unified model, Nexus-DNN, is then developed and trained jointly for speech and speaker recognition. Experimental results show that the combined model can effectively perform both ASR and SRE tasks.