Joint acoustic factor learning for robust deep neural network based automatic speech recognition

Author(s):  
Souvik Kundu ◽  
Gautam Mantena ◽  
Yanmin Qian ◽  
Tian Tan ◽  
Marc Delcroix ◽  
...

2018 ◽  
Vol 1 (3) ◽  
pp. 28 ◽  
Author(s):  
Jeih-weih Hung ◽  
Jung-Shan Lin ◽  
Po-Jen Wu

In recent decades, researchers have focused on developing noise-robust methods to compensate for noise effects in automatic speech recognition (ASR) systems and to enhance their performance. In this paper, we propose a feature-based noise-robust method that employs a novel data analysis technique—robust principal component analysis (RPCA). In the proposed scenario, RPCA is employed to process a noise-corrupted speech feature matrix, and the resulting sparse partition is shown to reveal speech-dominant characteristics. One apparent advantage of using RPCA for enhancing noise robustness is that no prior knowledge about the noise is required. The proposed RPCA-based method is evaluated on the Aurora-4 database task using a state-of-the-art deep neural network (DNN) architecture for the acoustic models. The evaluation results indicate that the newly proposed method significantly improves recognition accuracy over the original speech features, and that it can be cascaded with mean normalization (MN), mean and variance normalization (MVN), and relative spectral (RASTA) processing—three well-known and widely used feature robustness algorithms—to achieve better performance than each individual method alone.
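The core operation described above is decomposing a feature matrix M into a low-rank part L (slowly varying, noise-dominant structure) and a sparse part S (speech-dominant deviations). As a rough illustration of the idea, not the authors' implementation, the sketch below solves the standard principal component pursuit problem with an inexact augmented-Lagrangian iteration; all parameter defaults (lambda, mu, rho) are common heuristics, not values from the paper:

```python
import numpy as np

def rpca(M, lam=None, tol=1e-7, max_iter=500):
    """Split M into low-rank L and sparse S via principal component
    pursuit (inexact augmented Lagrange multiplier iteration)."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # standard PCP weight
    mu = m * n / (4.0 * np.abs(M).sum())      # common initial penalty
    rho = 1.5                                 # penalty growth factor
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    norm_M = np.linalg.norm(M, 'fro')
    for _ in range(max_iter):
        # Singular-value thresholding gives the low-rank estimate
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Elementwise soft thresholding gives the sparse estimate
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # Dual ascent on the constraint M = L + S
        R = M - L - S
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R, 'fro') / norm_M < tol:
            break
    return L, S
```

In the paper's setting, M would be a matrix of noisy speech features (e.g., log-mel spectra over time), and the sparse partition S would be passed on as the speech-dominant feature, possibly cascaded with MN, MVN, or RASTA.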


2019 ◽  
Vol 20 (11) ◽  
pp. 686-695
Author(s):  
Yin Shuai ◽  
A. S. Yuschenko

The article discusses a dialogue-based control system for manipulation robots. The basic methods of automatic speech recognition, speech understanding, dialogue management, and voice-response synthesis in dialogue systems are analyzed. Three types of dialogue management are considered: "system initiative", "user initiative", and "combined initiative". An object-oriented dialogue control system for a robot, based on the theory of finite state machines and using a deep neural network, is proposed. The main distinction of the proposed system is the separate implementation of the dialogue process and the robot's actions, which keeps the interaction close to the pace of natural dialogue. This design allows the system to automatically correct the speech recognition result and the robot's actions according to the current task. Such correction may be necessary because of a user's accent, noise in the working environment, or incorrect voice commands. The correction process consists of three stages and operates in a special mode and a general mode. The special mode allows users to control the manipulator directly with voice commands; the general mode extends users' capabilities, allowing them to obtain additional information in real time. At the first stage, continuous speech recognition (real-time voice-to-text conversion) is performed with a deep neural network that accounts for the accents and speaking rates of different users. At the second stage, the speech recognition result is corrected by managing the dialogue on the basis of finite automata theory. At the third stage, the robot's actions are corrected depending on the robot's operating state and the dialogue management process.
To realize a natural dialogue between users and robots, a small database of possible dialogues is created and various training data are used. In the experiments, the dialogue system controls a KUKA manipulator (KRC4 controller) to place a desired block in a specified location; the system is implemented in Python using the RoboDK software. The processes and results of experiments confirming the operability of the interactive robot control system are presented. A fairly high recognition accuracy (92 %) and an automatic speech recognition rate close to the rate of natural speech were obtained.
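The second and third stages described above (correcting the recognized command, then gating the robot's action through the dialogue state) can be illustrated with a small finite-state sketch. This is a hypothetical toy, not the authors' system: the command vocabulary, states, and replies are invented for illustration, and the correction step is approximated here with string similarity rather than a trained model:

```python
import difflib

class DialogueFSM:
    """Toy finite-state dialogue manager with recognition correction.

    Command names and states are hypothetical; a real system would
    drive a manipulator instead of returning reply strings."""

    COMMANDS = ("pick", "place", "stop", "help")

    def __init__(self):
        self.state = "idle"

    def correct(self, word):
        # Stage 2: snap a possibly misrecognized word (accent, noise)
        # to the closest known command, or None if nothing is close.
        hits = difflib.get_close_matches(word.lower(), self.COMMANDS,
                                         n=1, cutoff=0.6)
        return hits[0] if hits else None

    def step(self, utterance):
        cmd = self.correct(utterance)
        if cmd is None:
            return "please repeat"                    # ask to re-speak
        if cmd == "help":
            return "available: pick, place, stop"     # general mode
        if cmd == "stop":
            self.state = "idle"
            return "stopped"
        # Stage 3: only actions valid in the current robot state fire.
        transitions = {
            ("idle", "pick"): ("holding", "picking block"),
            ("holding", "place"): ("idle", "placing block"),
        }
        key = (self.state, cmd)
        if key in transitions:
            self.state, reply = transitions[key]
            return reply
        return f"cannot {cmd} in state {self.state}"
```

Separating the dialogue automaton from the action executor, as the article proposes, means a misheard or invalid command is caught and clarified in the dialogue loop without ever reaching the manipulator.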


Author(s):  
Khe Chai Sim ◽  
Yanmin Qian ◽  
Gautam Mantena ◽  
Lahiru Samarakoon ◽  
Souvik Kundu ◽  
...  
