Augmented Latent Features of Deep Neural Network-Based Automatic Speech Recognition for Motor-Driven Robots

2020 ◽  
Vol 10 (13) ◽  
pp. 4602
Author(s):  
Moa Lee ◽  
Joon-Hyuk Chang

Speech recognition for intelligent robots suffers from performance degradation due to ego-noise. Ego-noise is generated by the motors, fans, and mechanical parts inside the robot, especially when the robot moves or shakes its body. To overcome the problems caused by ego-noise, we propose a robust speech recognition algorithm that uses the robot's motor-state information as an auxiliary feature. For this, we use two deep neural networks (DNNs). First, we design latent features using a bottleneck layer, an internal layer with fewer hidden units than the other layers, to represent whether the motor is operating or not. The latent features that best capture the motor-state information are generated by feeding the motor data and acoustic features into the first DNN. Second, once the motor-state-dependent latent features have been produced by the first DNN, the second DNN, which handles acoustic modeling, receives these latent features as input along with the acoustic features. We evaluated the proposed system on the LibriSpeech database. The proposed network enables efficient compression of the acoustic and motor-state information, and the resulting word error rate (WER) is lower than that of a conventional speech recognition system.
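A minimal sketch of the two-network arrangement described above, assuming PyTorch; the layer sizes, feature dimensions, and class names (MotorStateBottleneck, AcousticModel) are illustrative placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MotorStateBottleneck(nn.Module):
    """First network: compresses acoustic + motor-state input into a small
    bottleneck layer whose activations serve as auxiliary latent features."""
    def __init__(self, acoustic_dim=40, motor_dim=4, bottleneck_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(acoustic_dim + motor_dim, 256), nn.ReLU(),
            nn.Linear(256, bottleneck_dim), nn.ReLU(),   # bottleneck layer
        )
        self.classifier = nn.Linear(bottleneck_dim, 2)   # motor on / off

    def forward(self, acoustic, motor):
        z = self.encoder(torch.cat([acoustic, motor], dim=-1))
        return z, self.classifier(z)

class AcousticModel(nn.Module):
    """Second network: receives acoustic features concatenated with the
    bottleneck features and predicts senone (HMM state) posteriors."""
    def __init__(self, acoustic_dim=40, bottleneck_dim=16, num_senones=2000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(acoustic_dim + bottleneck_dim, 512), nn.ReLU(),
            nn.Linear(512, num_senones),
        )

    def forward(self, acoustic, latent):
        return self.net(torch.cat([acoustic, latent], dim=-1))

# usage sketch: latent features from the first DNN feed the second DNN
frontend, am = MotorStateBottleneck(), AcousticModel()
acoustic = torch.randn(8, 40)   # one batch of acoustic feature frames
motor = torch.randn(8, 4)       # synchronized motor-state readings
latent, _ = frontend(acoustic, motor)
senone_logits = am(acoustic, latent.detach())
```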

2010 ◽  
Vol 44-47 ◽  
pp. 1422-1426
Author(s):  
Mei Juan Gao ◽  
Zhi Xin Yang

In this paper, based on a study of two speech recognition algorithms, two designs of a speech recognition system are presented to realize an isolated-word speech recognition mobile robot control system based on an ARM9 processor. The speech recognition process includes speech signal pre-processing, feature extraction, pattern matching, and post-processing. Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) are the two most common feature parameters. Analysis and comparison of the two show that MFCC is more robust to noise than LPCC, so MFCC is selected as the feature parameter. Dynamic time warping (DTW) and the hidden Markov model (HMM) are both commonly used recognition algorithms. Given the different characteristics of DTW and HMM, two different programs were designed for the mobile robot control system, and the recognition accuracy and speed of the two systems were analyzed and compared.
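A minimal sketch of the MFCC-plus-DTW branch, assuming librosa for feature extraction; the frame settings and template-matching setup are illustrative, not the paper's implementation.

```python
import numpy as np
import librosa

def mfcc_features(path, n_mfcc=13):
    """Load a waveform and return its MFCC sequence (frames x coefficients)."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two MFCC sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# usage sketch: match an utterance against per-command templates
# templates = {"forward": mfcc_features("forward.wav"), ...}
# query = mfcc_features("utterance.wav")
# command = min(templates, key=lambda w: dtw_distance(query, templates[w]))
```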


2009 ◽  
Vol 2 (4) ◽  
pp. 67-80 ◽  
Author(s):  
Mohamed Ali ◽  
Moustafa Elshafei ◽  
Mansour Al-Ghamdi ◽  
Husni Al-Muhtaseb

Phonetic dictionaries are essential components of large-vocabulary speaker-independent speech recognition systems. This paper presents a rule-based technique for generating phonetic dictionaries for a large-vocabulary Arabic speech recognition system. The system uses conventional Arabic pronunciation rules and common pronunciation rules of Modern Standard Arabic, as well as some common dialectal cases. The paper explains these rules in detail and gives their formal mathematical formulation. The rules were used to generate a dictionary for a 5.4-hour corpus of broadcast news. The rules and the phone set were tested and evaluated on an Arabic speech recognition system that was trained on 4.3 hours of the 5.4-hour Arabic broadcast news corpus and tested on the remaining 1.1 hours. The phonetic dictionary contains 23,841 pronunciation entries corresponding to about 14,232 words. The language model contains both bigrams and trigrams. The word error rate (WER) was 9.0%.
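A minimal sketch of the rule-based idea in the abstract above: each rule maps a grapheme pattern to phones, and rules are applied greedily left to right. The toy symbols and rules below are illustrative only, not the paper's Arabic rule set.

```python
# Toy grapheme-to-phoneme rules in the spirit of a rule-based dictionary
# generator. The symbols and rules are illustrative, not the paper's set.
RULES = [
    ("sh", ["SH"]),          # digraph handled before single letters
    ("aa", ["AE", "AE"]),    # long vowel
    ("a",  ["AE"]),
    ("b",  ["B"]),
    ("s",  ["S"]),
    ("l",  ["L"]),
    ("m",  ["M"]),
]

def word_to_phones(word):
    """Greedy left-to-right rule application: the longest matching rule wins."""
    phones, i = [], 0
    while i < len(word):
        for pattern, mapped in sorted(RULES, key=lambda r: -len(r[0])):
            if word.startswith(pattern, i):
                phones.extend(mapped)
                i += len(pattern)
                break
        else:
            i += 1  # skip characters no rule covers
    return phones

def build_dictionary(words):
    """One pronunciation entry per word; real systems emit variants too."""
    return {w: word_to_phones(w) for w in words}

print(build_dictionary(["salaam", "shams"]))
```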


2013 ◽  
Vol 416-417 ◽  
pp. 1156-1159
Author(s):  
Bo Nian Yi

Speech recognition technology is one of the hottest and most promising new information technologies in the world. This paper studies speech pre-processing and the extraction of MFCC feature parameters, constructs a speech keyword recognition algorithm built around the VQ and HMM models, uses MATLAB to complete the training and simulation of the algorithm, and presents an FPGA-based voice recognition design together with the simulation and implementation of its hardware and software. This work lays the foundation for FPGA-based speech recognition and control.
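A minimal sketch of the VQ side of such a VQ/HMM keyword recognizer, assuming scikit-learn's KMeans as the codebook trainer; in this setup the codeword index of each frame becomes the discrete observation symbol fed to a per-keyword HMM. The dimensions and data below are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(mfcc_frames, codebook_size=64):
    """Learn a VQ codebook over pooled MFCC frames from the training set."""
    return KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(mfcc_frames)

def quantize(codebook, mfcc_sequence):
    """Map each frame to its nearest codeword index (discrete HMM symbol)."""
    return codebook.predict(mfcc_sequence)

# usage sketch: the symbol sequences would train one discrete HMM per keyword
train_frames = np.random.randn(5000, 13)     # pooled MFCC frames (placeholder)
utterance = np.random.randn(120, 13)         # one utterance's MFCC sequence
codebook = train_codebook(train_frames)
symbols = quantize(codebook, utterance)      # e.g. array([17, 3, 3, 42, ...])
```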


2019 ◽  
Vol 55 (2) ◽  
pp. 183-209
Author(s):  
Danijel Koržinek ◽  
Krzysztof Wołk ◽  
Łukasz Brocki ◽  
Krzysztof Marasek

Abstract This paper describes an automatic transcription system for the Polish Newsreel, a collection of mid-to-late 20th-century news segments presented in audio and video form. They are characterized by archaic language and poor audio quality, which makes them a demanding problem for speech recognition systems. Acoustic and language models had to be retrained using data from in-domain corpora, and during adaptation experiments were carried out to select the optimal adaptation parameters. The experiments showed that adapting the speech recognition system to a narrow and clearly defined domain significantly improves its performance. The final word error rate obtained for this domain was 10.97%.
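One common form of the language-model adaptation described above is interpolating an in-domain model with a general one and tuning the interpolation weight on held-out data; a minimal unigram-level sketch, with the probability tables and weight grid purely illustrative.

```python
import math

def interpolate(p_indomain, p_general, lam):
    """Linear interpolation of two LM probabilities with weight lam."""
    return lam * p_indomain + (1.0 - lam) * p_general

def dev_perplexity(dev_words, lm_in, lm_gen, lam, floor=1e-8):
    """Perplexity of the interpolated model on a held-out word list."""
    logp = sum(math.log(max(interpolate(lm_in.get(w, 0.0), lm_gen.get(w, 0.0), lam), floor))
               for w in dev_words)
    return math.exp(-logp / len(dev_words))

# usage sketch: pick the interpolation weight that minimizes dev perplexity
lm_in = {"kronika": 0.01, "filmowa": 0.008, "the": 0.0001}     # toy in-domain unigrams
lm_gen = {"kronika": 0.0001, "filmowa": 0.0001, "the": 0.05}   # toy general unigrams
dev = ["kronika", "filmowa", "the"]
best = min((dev_perplexity(dev, lm_in, lm_gen, lam), lam) for lam in (0.1, 0.3, 0.5, 0.7, 0.9))
print("best interpolation weight:", best[1])
```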


Author(s):  
Xiaoli Lu ◽  
Mohd Asif Shah

Background: Human-computer interaction through natural-language conversational interfaces plays a vital role in improving the usability of computers. Speech recognition technology allows a machine to understand human language, and a speech recognition algorithm is used to achieve this function. Methodology: Based on fundamental theoretical research on speech signals, this paper establishes an HMM model, applies speech collection, recognition, and related methods, simulates the system in MATLAB, and then ports the recognition system to ARM for debugging and running, realizing HMM-based embedded speech recognition on the ARM platform. Conclusion: The HMM-based embedded speaker-independent continuous English speech recognition system achieves high recognition accuracy and fast recognition speed.
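A minimal sketch of the HMM evaluation step that such a recognizer relies on: the scaled forward algorithm scoring an observation sequence against a word model. The toy parameters below are illustrative, not the paper's models.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.

    obs: sequence of observation symbol indices
    pi:  initial state distribution, shape (N,)
    A:   state transition matrix, shape (N, N)
    B:   emission matrix, shape (N, M)
    """
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha /= scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha /= scale
    return loglik

# usage sketch: recognition picks the word model with the highest score
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.0, 1.0]])           # left-to-right toy model
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]) # 3 discrete symbols
print(forward_log_likelihood([0, 1, 2, 2], pi, A, B))
```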


2021 ◽  
Vol 2066 (1) ◽  
pp. 012046
Author(s):  
Yuanyi Chen

Abstract As one of the core algorithms of machine vision, the moving-image multi-label recognition algorithm has received extensive attention from researchers in recent years and has been widely used in cutting-edge fields such as the PaddlePaddle deep learning framework, video surveillance, intelligent robots, and unmanned aerial vehicles. However, existing recognition algorithms do not fully satisfy practical applications in everyday life and production: because of the complexity of the deployment environment, they often only offer specific solutions to specific problems, and there is no universal algorithm suitable for all kinds of complex environments. The purpose of this paper is to study a multi-label recognition algorithm for moving images based on the PaddlePaddle platform. The research analyzes the spatial deployment of multiple labels for moving images and the multi-label recognition algorithm, with the aim of further improving the label reading rate and recognition reliability of moving images on the PaddlePaddle platform. It first analyzes several key factors that affect the performance of a UHF recognition system, considers improvements to the platform's moving-image multi-label recognition algorithm from the two aspects of space diversity and frequency diversity, finally settles on a multi-label space diversity scheme, and introduces an optimized multi-label recognition algorithm to improve recognition efficiency. Experimental data show that the reading rate reaches 0.907 when 300 labels are identified, and that the reading rate approaches 1 when the number of labels exceeds 300, which verifies the suitability of the proposed algorithm for multi-label recognition of moving images on the PaddlePaddle platform.


2021 ◽  
Author(s):  
Yuji Miao ◽  
Yanan Huang ◽  
Zhenjing Da

Abstract In order to improve the effectiveness of English speech recognition, this paper uses digital methods and combines them with the practical requirements of English speech feature recognition to improve the underlying digital algorithm. It applies a fuzzy recognition algorithm to analyze English speech features, examines the shortcomings of traditional algorithms, proposes a fuzzy digitized English speech recognition algorithm, and builds an English speech feature recognition model on this basis. In addition, the paper performs time-frequency analysis of chaotic signals and speech signals, removes noise from the English speech features to improve their recognition, and builds an English speech feature recognition system based on digital methods. Finally, grouping experiments are conducted using students' English pronunciation as input, and the experimental results are collected to test the performance of the system. The results show that the proposed method has a certain effect.
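The time-frequency noise-removal step mentioned above is commonly realized with a short-time Fourier transform followed by spectral subtraction; a minimal sketch assuming SciPy, with the noise-estimation window and spectral floor chosen purely for illustration.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs=16000, noise_seconds=0.25):
    """Estimate the noise spectrum from the leading frames and subtract it."""
    f, t, X = stft(x, fs=fs, nperseg=512)
    mag, phase = np.abs(X), np.angle(X)
    noise_frames = int(noise_seconds * fs / 256)          # 256 = hop (nperseg // 2)
    noise_mag = mag[:, :max(noise_frames, 1)].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, 0.05 * mag)   # keep a spectral floor
    _, y = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
    return y

# usage sketch on a synthetic noisy tone
fs = 16000
t = np.arange(fs) / fs
noisy = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(fs)
enhanced = spectral_subtraction(noisy, fs)
```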


2019 ◽  
Vol 2 (2) ◽  
pp. 149-153
Author(s):  
Zulkarnaen Hatala

This paper presents a procedure for developing an automatic speech recognition (ASR) system for the online recognition case. The procedure builds an ASR system quickly and efficiently using the Hidden Markov Toolkit (HTK). The practical steps are laid out clearly for implementing a small-vocabulary ASR system, using Indonesian digit recognition as the example case. Several techniques for improving performance are also explained, such as handling noise, handling multiple pronunciations, and applying Principal Component Analysis. The final result is reported as a word error rate
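The word error rate reported by such a system is the word-level edit distance between hypothesis and reference divided by the reference length; a minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# usage sketch with Indonesian digits
print(word_error_rate("satu dua tiga empat", "satu dua tika empat"))  # 0.25
```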


2019 ◽  
Vol 8 (2S11) ◽  
pp. 2350-2352

In automatic speech recognition, the dissimilarity between the recognized word sequence and its ground truth across different channels is measured with the word error rate (WER), the standard evaluation metric. In the single-channel (1ch) track, the model is trained without any preprocessing, while work on multichannel end-to-end automatic speech recognition suggests that the multichannel front end can be integrated into a deep neural network based system, leading to a range of experimental results. Because WER is not directly differentiable, it is pertinent to adopt a gradient-based objective function for the encoder-decoder model, as demonstrated in the CHiME-4 system. In this study, we examine whether a sequence-level evaluation metric is a fair choice for optimizing encoder-decoder models, for which many training algorithms are designed to reduce sequence-level error. The study incorporates the scoring of multiple hypotheses at the decoding stage to push the decoding result toward the optimum, so that the mismatch between the training and evaluation objectives is reduced as far as feasible. The study concludes that this form of voice recognition is the most effective for adaptation.
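A minimal sketch of the sequence-level idea above: score an N-best list, turn the scores into a posterior with a softmax, and compute the expected word-error count, which is differentiable in the scores even though WER itself is not. The hypotheses and scores are illustrative; this is not the CHiME-4 recipe itself.

```python
import numpy as np

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance."""
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i, j] = min(d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1]),
                          d[i - 1, j] + 1, d[i, j - 1] + 1)
    return d[len(ref), len(hyp)]

def expected_error(reference, hypotheses, scores):
    """Posterior-weighted error count over an N-best list."""
    scores = np.asarray(scores, dtype=float)
    posterior = np.exp(scores - scores.max())
    posterior /= posterior.sum()
    errors = np.array([edit_distance(reference.split(), h.split()) for h in hypotheses])
    return float(posterior @ errors)

# usage sketch: the loss rewards moving probability mass toward low-error hypotheses
ref = "turn on the kitchen light"
nbest = ["turn on the kitchen light", "turn on a kitchen light", "turn of the kitchen light"]
print(expected_error(ref, nbest, scores=[2.1, 1.4, 0.3]))
```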

