Enhancing quality and accuracy of speech recognition system by using multimodal audio-visual speech signal

Author(s): Eslam E. El Maghraby, Amr M. Gody, M. Hesham Farouk
Author(s): Keshav Sinha, Rasha Subhi Hameed, Partha Paul, Karan Pratap Singh

In recent years, advances in voice-based authentication have led to numerous forensic voice authentication technologies. For verification, the speech reference model is collected from various open-source clusters. This chapter focuses primarily on an automatic speech recognition (ASR) technique that stores, retrieves, and processes the data in a scalable manner. Various conventional techniques exist for speech recognition, such as BWT, SVD, and MFCC, but their efficiency degrades when they are applied to automatic speech recognition. To overcome this problem, the authors propose a speech recognition system using E-SVD, D3-MFCC, and dynamic time warping (DTW). D3-MFCC captures the important qualities of the speech signal while discarding unimportant and distracting features.
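
The abstract does not describe how the authors apply DTW, so the following is only a minimal sketch of the textbook dynamic-programming form of the algorithm, not the chapter's implementation. It assumes each utterance has already been converted to a sequence of per-frame feature vectors (for example, MFCC-style coefficients); the function name `dtw_distance` and the choice of Euclidean frame distance are illustrative assumptions.

```python
from math import inf


def dtw_distance(a, b):
    """Return the accumulated DTW cost between two sequences of feature vectors.

    Assumption: `a` and `b` are lists of equal-dimension numeric vectors,
    one vector per frame. This is the unconstrained textbook recurrence.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being compared.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two toy one-dimensional "utterances"; the second repeats a frame,
# as a slower speaker might.
reference = [[0.0], [1.0], [2.0]]
slower = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(reference, slower))
```

Because the recurrence allows one frame to align with several frames of the other sequence, two utterances that differ only in speaking rate can still receive a low cost, which is the property that makes DTW useful for comparing a test utterance against stored reference templates.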

