Stacked auto-encoder for ASR error detection and word error rate prediction

Author(s):  
Shahab Jalalvand ◽  
Daniele Falavigna


Kerntechnik ◽  
2021 ◽  
Vol 86 (6) ◽  
pp. 470-477
Author(s):  
M. Farcasiu ◽  
C. Constantinescu

Abstract This paper provides the empirical basis to support predictions of Human Factors Engineering (HFE) influences in Human Reliability Analysis (HRA). Several methods were analyzed to identify HFE concepts in their treatment of Performance Shaping Factors (PSFs): the Technique for Human Error Rate Prediction (THERP), Human Cognitive Reliability (HCR), the Cognitive Reliability and Error Analysis Method (CREAM), the Success Likelihood Index Method (SLIM), Standardized Plant Analysis Risk – Human Reliability Analysis (SPAR-H), A Technique for Human Event Analysis (ATHEANA), and Man-Machine-Organization System Analysis (MMOSA). In addition, the Human Performance Investigation Process (HPIP) for event occurrences was used to identify other PSFs needed in HFE. In this way, human error probability can be reduced, and its evaluation provides information for error detection and recovery. The HFE analysis model developed using BHEP values (maximum and pessimistic) rests on the simplifying assumption that all specific circumstances of HFE characteristics are equally important and exert the same degree of influence on human performance. This model is incorporated into the PSA through the HRA methodology. Finally, the relationships between task analysis and HFE, i.e., between potential human errors and design requirements, are clarified.
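The equal-importance assumption above can be made concrete with a minimal sketch in the spirit of SPAR-H, where a basic human error probability (BHEP) is scaled by one multiplier per PSF. All names and numeric values below are illustrative assumptions, not figures from the paper:

```python
# Hypothetical equal-weight HFE influence model (SPAR-H style): a basic human
# error probability (BHEP) is scaled by the product of PSF multipliers. The
# paper's equal-importance assumption maps to giving every HFE circumstance
# the same multiplier value. All numbers here are invented for illustration.

def adjusted_hep(bhep: float, psf_multipliers: list[float]) -> float:
    """Scale a basic HEP by the product of PSF multipliers, capped at 1.0."""
    hep = bhep
    for m in psf_multipliers:
        hep *= m
    return min(hep, 1.0)

# Equal-influence assumption: every HFE characteristic gets the same multiplier.
n_factors = 4
equal_multiplier = 2.0                     # illustrative multiplier
bhep_max, bhep_pessimistic = 1e-3, 1e-2    # illustrative BHEP values

print(adjusted_hep(bhep_max, [equal_multiplier] * n_factors))          # 0.016
print(adjusted_hep(bhep_pessimistic, [equal_multiplier] * n_factors))  # 0.16
```

Capping at 1.0 reflects that a probability cannot exceed certainty, which matters when several pessimistic multipliers compound.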


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3063
Author(s):  
Aleksandr Laptev ◽  
Andrei Andrusenko ◽  
Ivan Podluzhny ◽  
Anton Mitrofanov ◽  
Ivan Medennikov ◽  
...  

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions for direct on-device use has become crucial. For on-device speech recognition tasks, researchers and industry prefer end-to-end ASR systems, as they can be made resource-efficient while maintaining higher quality than hybrid systems. However, building end-to-end models requires a significant amount of speech data. Personalization, which mainly means handling out-of-vocabulary (OOV) words, is another challenging task associated with speech assistants. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in the Babel Turkish and Babel Georgian tasks. We propose a method of dynamic acoustic unit augmentation based on the Byte Pair Encoding with dropout (BPE-dropout) technique. The method non-deterministically tokenizes utterances to extend the tokens' contexts and to regularize their distribution for the model's recognition of unseen words. It also reduces the need to search for an optimal subword vocabulary size. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative word error rate (WER) and 25% relative F-score) at no additional computational cost. Owing to the use of BPE-dropout, our monolingual Turkish Conformer has achieved a competitive result with 22.2% character error rate (CER) and 38.9% WER, which is close to the best published multilingual system.
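The BPE-dropout mechanism this augmentation builds on can be sketched briefly: during tokenization, each learned merge is skipped with probability p, so the same word yields different subword segmentations across training passes. The toy merge table below is a made-up assumption; real systems learn merges from data:

```python
import random

# Minimal sketch of BPE-dropout tokenization: apply learned BPE merges in
# priority order, but skip each candidate merge with probability `p`. With
# p > 0 the same word can segment differently on each call, which is the
# source of the acoustic-unit augmentation described above.

def bpe_dropout(word: str, merges: list[tuple[str, str]], p: float,
                rng: random.Random) -> list[str]:
    tokens = list(word)
    for a, b in merges:                        # merges in learned priority order
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b and rng.random() >= p:
                tokens[i:i + 2] = [a + b]      # apply the merge in place
            else:
                i += 1
    return tokens

merges = [("l", "o"), ("lo", "w"), ("e", "r")]   # toy, hand-written merge table
rng = random.Random(0)
print(bpe_dropout("lower", merges, 0.0, rng))    # p=0 is plain BPE: ['low', 'er']
print(bpe_dropout("lower", merges, 0.5, rng))    # stochastic segmentation
```

With p = 0 this reduces to deterministic BPE; with p = 1 no merges fire and the word stays character-level, so p interpolates between the two unit inventories.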


Author(s):  
Cosmin Munteanu ◽  
Gerald Penn ◽  
Ron Baecker ◽  
Elaine Toms ◽  
David James
ReCALL ◽  
2004 ◽  
Vol 16 (1) ◽  
pp. 173-188 ◽  
Author(s):  
YASUSHI TSUBOTA ◽  
MASATAKE DANTSUJI ◽  
TATSUYA KAWAHARA

We have developed an English pronunciation learning system which estimates the intelligibility of Japanese learners' speech and ranks their errors from the viewpoint of improving their intelligibility to native speakers. Error diagnosis is particularly important in self-study since students tend to spend time on aspects of pronunciation that do not noticeably affect intelligibility. As a preliminary experiment, the speech of seven Japanese students was scored from 1 (hardly intelligible) to 5 (perfectly intelligible) by a linguistic expert. We also computed their error rates for each skill. We found that each intelligibility level is characterized by its distribution of error rates. Thus, we modeled each intelligibility level in accordance with its error rate. Error priority was calculated by comparing students' error rate distributions with that of the corresponding model for each intelligibility level. As non-native speech is acoustically broader than the speech of native speakers, we developed an acoustic model to perform automatic error detection using speech data obtained from Japanese students. As for supra-segmental error detection, we categorized errors frequently made by Japanese students and developed a separate acoustic model for that type of error detection. Pronunciation learning using this system involves two phases. In the first phase, students experience virtual conversation through video clips. They receive an error profile based on pronunciation errors detected during the conversation. Using the profile, students are able to grasp characteristic tendencies in their pronunciation errors which in effect lower their intelligibility. In the second phase, students practise correcting their individual errors using words and short phrases. They then receive information regarding the errors detected during this round of practice and instructions for correcting the errors. We have begun using this system in a CALL class at Kyoto University. 
We have evaluated system performance through the use of questionnaires and analysis of speech data logged in the server, and will present our findings in this paper.
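The core idea of modeling each intelligibility level by its error-rate distribution, then ranking a learner's errors against the matched level, can be sketched as follows. The per-level profiles, distance measure, and the learner's rates below are invented assumptions for illustration; the paper derives its models from expert-scored Japanese learner speech:

```python
import math

# Hypothetical sketch: each intelligibility level (1-5) is modeled by a vector
# of typical error rates, one entry per pronunciation skill. A learner is
# assigned the nearest level, and error priority ranks skills by how far the
# learner exceeds that level's typical rate. All values are illustrative.

LEVEL_PROFILES = {            # hypothetical mean error rate per skill, per level
    1: [0.60, 0.55, 0.50],
    3: [0.30, 0.25, 0.20],
    5: [0.05, 0.05, 0.10],
}

def nearest_level(error_rates: list[float]) -> int:
    """Assign the level whose profile is closest (Euclidean) to the learner."""
    return min(LEVEL_PROFILES,
               key=lambda lvl: math.dist(error_rates, LEVEL_PROFILES[lvl]))

def error_priority(error_rates: list[float], level: int) -> list[tuple[int, float]]:
    """Rank skills by how far the learner exceeds the level's typical rate."""
    profile = LEVEL_PROFILES[level]
    gaps = [(skill, r - m) for skill, (r, m)
            in enumerate(zip(error_rates, profile))]
    return sorted(gaps, key=lambda g: g[1], reverse=True)

learner = [0.35, 0.20, 0.22]
lvl = nearest_level(learner)            # closest profile is level 3
print(lvl, error_priority(learner, lvl))
```

Ranking by the gap to the matched level, rather than by raw error rate, is what directs students toward the errors that most lower their intelligibility.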


2020 ◽  
Vol 6 (12) ◽  
pp. 141
Author(s):  
Abdelrahman Abdallah ◽  
Mohamed Hamada ◽  
Daniyar Nurseitov

This article considers the task of handwritten text recognition using attention-based encoder–decoder networks trained on the Kazakh and Russian languages. We have developed a novel deep neural network model based on a fully gated CNN, supported by multiple bidirectional gated recurrent units (BGRU) and attention mechanisms to manipulate sophisticated features, achieving 0.045 Character Error Rate (CER), 0.192 Word Error Rate (WER), and 0.253 Sequence Error Rate (SER) on the first test dataset, and 0.064 CER, 0.24 WER, and 0.361 SER on the second test dataset. Our proposed model is the first to address handwriting recognition for the Kazakh and Russian languages. Our results confirm the importance of our proposed Attention-Gated-CNN-BGRU approach for training handwritten text recognition and indicate that it can lead to statistically significant improvements (p-value < 0.05) in sensitivity (recall) over the test datasets. The proposed method's performance was evaluated using handwritten text databases in three languages: English, Russian, and Kazakh. It demonstrates better results on the Handwritten Kazakh and Russian (HKR) dataset than other well-known models.
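The three metrics quoted throughout these abstracts (CER, WER, SER) all reduce to edit distance at different granularities: characters, whitespace tokens, and whole sequences. The following is a standard textbook definition, not code from any of the papers:

```python
# Standard definitions of the error-rate metrics cited above. CER and WER are
# Levenshtein edit distance normalized by reference length (characters vs.
# words); SER is the fraction of sequences with any error at all.

def edit_distance(ref, hyp) -> int:
    """Classic Levenshtein distance via row-by-row dynamic programming."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur_row = [i]
        for j, h in enumerate(hyp, 1):
            cur_row.append(min(prev_row[j] + 1,               # deletion
                               cur_row[j - 1] + 1,            # insertion
                               prev_row[j - 1] + (r != h)))   # substitution
        prev_row = cur_row
    return prev_row[-1]

def cer(ref: str, hyp: str) -> float:
    return edit_distance(ref, hyp) / len(ref)

def wer(ref: str, hyp: str) -> float:
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def ser(refs: list[str], hyps: list[str]) -> float:
    return sum(r != h for r, h in zip(refs, hyps)) / len(refs)

print(cer("kitten", "sitting"))           # 3 edits / 6 chars = 0.5
print(wer("the cat sat", "the bat sat"))  # 1 substitution / 3 words
```

Note that relative improvements, as quoted in the Sensors abstract ("at least 6% relative WER"), are ratios of two such absolute rates, not differences.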

