Pronunciation Variation
Recently Published Documents


TOTAL DOCUMENTS

77
(FIVE YEARS 6)

H-INDEX

11
(FIVE YEARS 0)

Author(s):  
Yanhua Long ◽  
Shuang Wei ◽  
Jie Lian ◽  
Yijie Li

Abstract Code-switching (CS) refers to the use of more than one language within a single utterance. It presents great challenges to automatic speech recognition (ASR) because of the switching within an utterance, the pronunciation variation of embedded-language words, and severe training-data sparsity. This paper focuses on the Mandarin-English CS ASR task. We aim to handle the pronunciation variation and alleviate the data sparsity of code-switches by using pronunciation augmentation methods. An English-to-Mandarin mix-language phone mapping approach is first proposed to obtain a language-universal CS lexicon. Based on this lexicon, an acoustic data-driven lexicon learning framework is further proposed to learn new pronunciations that cover the accents, mispronunciations, and pronunciation variations of the embedded English words. Experiments are performed on real CS ASR tasks. The effectiveness of the proposed methods is examined on conventional hybrid as well as recent end-to-end speech recognition systems. Experimental results show that both the learned phone mapping and the augmented pronunciations significantly improve the performance of code-switching speech recognition.
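The mix-language phone mapping described above can be pictured as a simple table-driven rewrite of lexicon entries. The sketch below is illustrative only: the phone symbols and the specific English-to-Mandarin correspondences are invented placeholders, not the mapping learned in the paper.

```python
# Hypothetical mapping from English (ARPAbet-style) phones onto a
# Mandarin-style phone inventory, so embedded English words can be
# transcribed with one language-universal phone set.
EN_TO_CMN_PHONE_MAP = {
    "AE": "a",   # English /ae/ approximated by Mandarin /a/
    "IY": "i",
    "UW": "u",
    "SH": "sh",
    "NG": "ng",
}

def map_pronunciation(en_phones):
    """Rewrite an English phone sequence with Mandarin phones,
    keeping any phone that has no mapping unchanged."""
    return [EN_TO_CMN_PHONE_MAP.get(p, p) for p in en_phones]

# e.g. a lexicon entry "SHE  SH IY" becomes ["sh", "i"]
print(map_pronunciation(["SH", "IY"]))
```

A data-driven lexicon learning step would then add further pronunciation variants for each mapped entry based on acoustic evidence.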


2021 ◽  
Vol 7 (3) ◽  
pp. 136-143
Author(s):  
Balakrishnan Sivakumar ◽  
Praveen Kadakola Biligirirangaiah

To improve recognition performance, the accuracy of the transcription used in training is very important. In continuous speech, pronunciation variation is an essential characteristic of different speakers: over-emphasized or inadequately emphasized words can result in waveform misalignment at sub-word unit boundaries. Such deviations in articulation lead to misalignment when the speech is compared against the pronunciation dictionary, so deletion or insertion of sub-word units becomes necessary; this happens because the transcription of each utterance is not precise. This paper presents corrections to the transcription at the sub-word level using acoustic cues present in the waveform. The transcription of a word is corrected using sentence-level transcriptions with reference to the phonemes that make up the word. Specifically, it is shown that vowels are either deleted or inserted. To support the proposed argument, errors in continuous speech are validated using machine learning and signal-processing tools. An automatic data-driven annotator exploiting the inferences drawn from the analysis is used to correct transcription errors. The results show that corrected pronunciations lead to a higher likelihood for training utterances in the TIMIT corpus.


2020 ◽  
Vol 54 (4) ◽  
pp. 975-998
Author(s):  
Eiman Alsharhan ◽  
Allan Ramsay

Abstract Research in Arabic automatic speech recognition (ASR) is constrained by datasets of limited size, and of highly variable content and quality. Arabic-language resources vary in the attributes that affect language resources in other languages (noise, channel, speaker, genre), but also vary significantly in the dialect and level of formality of the spoken Arabic they capture. Many languages suffer similar levels of cross-dialect and cross-register acoustic variability, but these effects have been under-studied. This paper is an experimental analysis of the interaction between classical ASR corpus-compensation methods (feature selection, data selection, gender-dependent acoustic models) and the dialect-dependent/register-dependent variation among Arabic ASR corpora. The first interaction studied is that between acoustic recording quality and discrete pronunciation variation. Discrete pronunciation variation can be compensated for by using grapheme-based instead of phone-based acoustic models, and by filtering out speakers with insufficient training data; the latter technique also helps to compensate for poor recording quality, which is further compensated for by eliminating delta-delta acoustic features. All three techniques together reduce Word Error Rate (WER) by between 3.24% and 5.35%. The second aspect of dialect and register variation considered is variation in the fine-grained acoustic pronunciations of each phoneme in the language. Experimental results show that gender and dialect are the principal components of variation in speech; therefore, building gender- and dialect-specific models leads to substantial decreases in WER.
To further explore the degree of acoustic difference between the phone models required for each Arabic dialect, cross-dialect experiments are conducted to measure how far apart the dialects are acoustically, in order to make a better-informed decision about the minimal number of recognition systems needed to cover all dialectal Arabic. Finally, the research addresses an important question: how much training data is needed to build efficient speaker-independent ASR systems? This includes developing learning curves to find out how large the training set must be to achieve acceptable performance.


2020 ◽  
pp. 72-79
Author(s):  
Ibrahim El-Henawy ◽  
Marwa Abo-Elazm

Arabic is a phonetically complex language, and creating an accurate speech recognition system for it is a challenging task. The phonetic dictionary is an essential component of an automatic speech recognition (ASR) system. Pronunciation variations in Arabic are tangible and have been investigated widely using data-driven or knowledge-based approaches. Phonological rules are used to derive the pronunciation of each word accurately, reducing the mismatch between the actual phoneme realization of the spoken words and the ASR dictionary. Several studies of Arabic ASR systems have been conducted using different numbers of phonological rules. In this paper we focus on the rules that handle within-word pronunciation variation and cross-word pronunciation variation. The experimental results indicate that handling within-word pronunciation variation with phonological rules does not enhance recognition performance, but using these rules to handle cross-word variation provides good performance.
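Applying phonological rules to generate pronunciation variants can be sketched as pattern rewriting over a phoneme transcription. The rules below are invented placeholders to show the mechanism (a "#" marks a word boundary), not the rule set used in the paper.

```python
import re

# (pattern, replacement) pairs over a space-separated phoneme string.
# Both rules here are illustrative cross-word rules only.
CROSS_WORD_RULES = [
    (r"(\w+) # \1", r"\1:"),   # merge identical consonants across the boundary (gemination)
    (r"n # b", r"m # b"),      # place assimilation of /n/ before /b/
]

def apply_rules(phones):
    """Return the transcription after applying each rewrite rule."""
    for pattern, replacement in CROSS_WORD_RULES:
        phones = re.sub(pattern, replacement, phones)
    return phones

print(apply_rules("b a n # b a"))   # "b a m # b a"
```

A variant generated this way would be added to the ASR dictionary as an alternative pronunciation of the word pair, alongside the canonical one.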


2018 ◽  
Vol 10 (4) ◽  
pp. 111-119 ◽  
Author(s):  
Jeong-Uk Bang ◽  
Sang-Hun Kim ◽  
Oh-Wook Kwon

2015 ◽  
Vol 10 (3) ◽  
pp. 313-338 ◽  
Author(s):  
Clara Cohen

A small but growing body of research on English and Dutch has found that the pronunciation of affixes in a word form is sensitive to paradigmatic probability, i.e., the probability of using that form rather than other words in the same morphological paradigm. Yet it remains unclear (a) how paradigmatic probability is best measured; (b) whether an increase in paradigmatic probability leads to phonetic enhancement or reduction; and (c) by what mechanism paradigmatic probability can affect pronunciation. The current work examines pronunciation variation of Russian verbal agreement suffixes. I show that there are two distinct patterns of variation, corresponding to two different measures of paradigmatic probability. One measure, pairwise paradigmatic probability, is associated with a pronunciation pattern that resembles phonetic enhancement. The second measure, lexeme paradigmatic probability, can show enhancement effects, but can also yield reduction effects more similar to those of contextual probability. I propose that these two patterns can be explained in an exemplar model of lexical storage. Reduction effects are the consequence of faster retrieval and encoding of an articulatory target, while effects that resemble enhancement result when the pronunciation target of each member of a pair of competing word forms is shifted towards the more frequent of the two.
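The two probability measures contrasted above can be made concrete with corpus counts. The sketch below uses made-up counts for a hypothetical verb paradigm; the form labels and numbers are illustrative only, not data from the study.

```python
# Invented token counts for the agreement forms of one verb lexeme.
paradigm_counts = {"3sg": 120, "3pl": 30, "1sg": 25, "2sg": 10}

def pairwise_probability(form, competitor, counts):
    """P(form | {form, competitor}): relative frequency against a
    single competing member of the paradigm."""
    return counts[form] / (counts[form] + counts[competitor])

def lexeme_probability(form, counts):
    """P(form | lexeme): relative frequency over the whole paradigm."""
    return counts[form] / sum(counts.values())

print(pairwise_probability("3sg", "3pl", paradigm_counts))  # 0.8
print(lexeme_probability("3sg", paradigm_counts))           # ~0.649
```

The two measures can rank the same form differently, which is why the study can tie each one to a distinct pronunciation pattern.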

