Nexus DNN for Speech and Speaker Recognition

Over the years, considerable effort has gone into improving the accuracy of automatic speech recognition (ASR) and speaker recognition (SRE), and many different technologies have been developed. Given the close relationship between these two tasks, researchers have proposed various ways to apply techniques developed for one task to the other. In this paper, an open-source experimental framework is proposed for speech and speaker recognition. A unified model, Nexus-DNN, is then developed and trained jointly for speech and speaker recognition. Experimental results show that the combined model can effectively perform both ASR and SRE tasks.

Author(s):  
Hongting Zhang ◽  
Pan Zhou ◽  
Qiben Yan ◽  
Xiao-Yang Liu

Audio adversarial examples, imperceptible to humans, have been constructed to attack automatic speech recognition (ASR) systems. However, the adversarial examples generated by existing approaches usually contain noticeable noise, especially during periods of silence and pauses. Moreover, the added noise often breaks the temporal dependency property of the original audio, a break that state-of-the-art defense mechanisms can easily detect. In this paper, we propose a new Iterative Proportional Clipping (IPC) algorithm that preserves temporal dependency in audio to generate more robust adversarial examples. We are motivated by the observation that temporal dependency in audio has a significant effect on human perception. Following this observation, we use a proportional clipping strategy to reduce noise during low-intensity periods. Experimental results and a user study both suggest that the generated adversarial examples significantly reduce human-perceptible noise and resist defenses based on temporal structure.
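The abstract does not give the IPC update rule, so the following is only a minimal sketch of what a proportional clipping step could look like, assuming the perturbation at each sample is bounded by a fixed fraction of the original sample's magnitude. The function name `proportional_clip` and the `ratio` parameter are hypothetical and are not taken from the paper.

```python
def proportional_clip(original, perturbation, ratio=0.1):
    """Clip each perturbation sample to within ratio * |original sample|.

    Assumption (not from the paper): the bound scales linearly with the
    magnitude of the original audio, so silent samples receive no noise.
    """
    clipped = []
    for x, d in zip(original, perturbation):
        bound = ratio * abs(x)  # per-sample bound; zero where the audio is silent
        clipped.append(max(-bound, min(bound, d)))
    return clipped


# Toy waveform with one silent sample and a constant-magnitude perturbation.
audio = [0.0, 0.5, -1.0, 0.02]
delta = [0.3, 0.3, 0.3, -0.3]
clipped = proportional_clip(audio, delta, ratio=0.1)
```

Under this assumption the perturbation vanishes where the audio is silent and grows only where the signal is loud enough to mask it, which matches the stated goal of reducing noise during low-intensity periods.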


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Hui Wang ◽  
Fei Gao ◽  
Yue Zhao ◽  
Li Yang ◽  
Jianjian Yue ◽  
...  

In this paper, we propose incorporating local attention into WaveNet-CTC to improve the performance of Tibetan speech recognition in multitask learning. As the number of tasks increases, for example with simultaneous Tibetan speech content recognition, dialect identification, and speaker recognition, the speech recognition accuracy of a single WaveNet-CTC model decreases. Inspired by the attention mechanism, we introduce local attention to automatically tune the weights of feature frames within a window and to pay different amounts of attention to context information for multitask learning. The experimental results show that our method improves speech recognition accuracy for all Tibetan dialects in three-task learning compared with the baseline model. Furthermore, our method significantly improves the accuracy for the low-resource dialect by 5.11% against the dialect-specific model.
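The idea of weighting feature frames within a window can be illustrated with a small sketch. It assumes dot-product scoring and a symmetric window around each frame; the abstract specifies neither, and the names `local_attention` and `half_window` are hypothetical.

```python
import math


def local_attention(frames, half_window=2):
    """Return, for each frame, a softmax-weighted average of nearby frames.

    Assumptions (not from the paper): scores are dot products between the
    current frame and each frame in a symmetric window around it.
    """
    outputs = []
    for t, query in enumerate(frames):
        lo = max(0, t - half_window)
        hi = min(len(frames), t + half_window + 1)
        window = frames[lo:hi]
        # Score every frame in the window against the current frame.
        scores = [sum(q * k for q, k in zip(query, key)) for key in window]
        # Numerically stable softmax over the window.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Context vector: weighted sum of the window's frames.
        context = [
            sum(w * key[i] for w, key in zip(weights, window))
            for i in range(len(query))
        ]
        outputs.append(context)
    return outputs


# Three toy 2-dimensional feature frames.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts = local_attention(frames, half_window=1)
```

Each output frame has the same dimensionality as the input, so a layer like this could in principle sit between feature extraction and the recognition network without changing downstream shapes.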


2021 ◽  
Author(s):  
Lotte Weerts ◽  
Claudia Clopath ◽  
Dan F. M. Goodman

Automatic speech recognition (ASR) software has been suggested as a candidate model of the human auditory system thanks to dramatic improvements in performance in recent years. To test this hypothesis, we compared several state-of-the-art ASR systems to results from humans on a barrage of standard psychoacoustic experiments. While some systems showed qualitative agreement with humans in some tests, in others all tested systems diverged markedly from humans. In particular, none of the models used spectral invariance, temporal fine structure or speech periodicity in a similar way to humans. We conclude that none of the tested ASR systems are yet ready to act as a strong proxy for human speech recognition. However, we note that the more recent systems with better performance also tend to better match human results, suggesting that continued cross-fertilisation of ideas between human and automatic speech recognition may be fruitful. Our software is released as an open-source toolbox to allow researchers to assess future ASR systems or add additional psychoacoustic measures.


Author(s):  
Peter A. Heeman ◽  
Rebecca Lunsford ◽  
Andy McMillin ◽  
J. Scott Yaruss

Author(s):  
Manoj Kumar ◽  
Daniel Bone ◽  
Kelly McWilliams ◽  
Shanna Williams ◽  
Thomas D. Lyon ◽  
...  
