Articulatory Features for Multilingual Phone Recognition

The paper considers some articulatory features of allophones of the vowel /i/ in the Altai-Kizhi dialect of the Altai language (as spoken in the locality of Ust-Kan, Altai), visualized by magnetic resonance imaging (MRI). Altai-Kizhi is the central dialect on which the Altai literary language is based. In Altai, each rural locality represents a unique dialect, and V. V. Radlov emphasized the importance of studying them. Speech sounds of the /i/-type in the dialects of the Altai language are realized mainly as front variants with different degrees of openness. In written Altai, the symbol “и” denotes a narrow front non-labialized vowel; some variants of the Altai vowel /i/ are central-back, differing in this respect from the Russian vowel /i/. Experimental data on the territorial dialects of the Altai-Kizhi dialect, obtained from six native speakers (d1-d6) and taking into account the speakers' differing inherent palate heights, show both what the speakers' articulatory bases have in common (clearly expressed frontness) and how they differ (variable openness).


Approaches for Multilingual Phone Recognition in Code-switched and Non-code-switched Scenarios Using Indian Languages

ACM Transactions on Asian and Low-Resource Language Information Processing ◽ 10.1145/3437256 ◽ 2021 ◽ Vol 20 (4) ◽ pp. 1-19

Author(s): Manjunath K. E. ◽ Srinivasa Raghavan K. M. ◽ K. Sreenivasa Rao ◽ Dinesh Babu Jayagopi ◽ V. Ramasubramanian

Keyword(s): Deep Neural Networks ◽ State Of The Art ◽ Window Size ◽ Recognition System ◽ Error Rates ◽ Indian Languages ◽ International Phonetic Alphabet ◽ Phone Recognition ◽ Front End ◽ Recognition Systems

In this study, we evaluate and compare two approaches to multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach uses a front-end Language Identification (LID) system to switch to a monolingual phone recognizer (LID-Mono); one such recognizer is trained for each language in the multilingual dataset. In the second approach, a common multilingual phone-set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted using Kannada and Urdu. In the first approach, LID is performed using state-of-the-art i-vectors. Both the monolingual and the multilingual phone recognition systems are trained using Deep Neural Networks. The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. The Multi-PRS approach is found to outperform the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used to perform LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, and also by using the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and LID errors cannot be recovered. The Multi-PRS system, by contrast, requires no front-end LID switching and is designed around a common multilingual phone-set derived from several languages, so it is not constrained by the accuracy of the LID system. It therefore performs effectively on both code-switched and non-code-switched speech, offering lower Phone Error Rates than the LID-Mono system.
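The dependence of LID-Mono on window size can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' system: each frame carries its true language and phone in place of acoustic features, `toy_lid` is a majority vote rather than an i-vector classifier, and the recognizers are lookup stubs rather than Deep Neural Networks. All function names, the language codes and the phone labels are hypothetical. The Phone Error Rate is computed in the usual way, as the edit distance between hypothesis and reference divided by the reference length.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Toy stand-in for acoustic features: (true language, true phone) per frame.
Frame = Tuple[str, str]


def phone_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Levenshtein distance (substitutions, deletions, insertions) / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(reference)


def lid_mono_decode(
    frames: Sequence[Frame],
    window: int,
    identify_language: Callable[[Sequence[Frame]], str],
    recognizers: Dict[str, Callable[[Sequence[Frame]], List[str]]],
) -> List[str]:
    """Run LID on each window, then decode that window with the chosen monolingual recognizer."""
    phones: List[str] = []
    for start in range(0, len(frames), window):
        segment = frames[start:start + window]
        language = identify_language(segment)
        phones.extend(recognizers[language](segment))
    return phones


def toy_lid(segment: Sequence[Frame]) -> str:
    """Hypothetical LID: majority vote over the frames' true language labels."""
    return Counter(lang for lang, _ in segment).most_common(1)[0][0]


def make_toy_recognizer(language: str) -> Callable[[Sequence[Frame]], List[str]]:
    """Hypothetical monolingual recognizer: correct on its own language, '<unk>' otherwise."""
    def recognize(segment: Sequence[Frame]) -> List[str]:
        return [phone if lang == language else "<unk>" for lang, phone in segment]
    return recognize


# Toy code-switched utterance: four Kannada frames followed by two Urdu frames.
utterance: List[Frame] = [
    ("kn", "a"), ("kn", "k"), ("kn", "i"), ("kn", "t"), ("ur", "s"), ("ur", "u"),
]
reference = [phone for _, phone in utterance]
recognizers = {"kn": make_toy_recognizer("kn"), "ur": make_toy_recognizer("ur")}

per_short = phone_error_rate(reference, lid_mono_decode(utterance, 2, toy_lid, recognizers))
per_full = phone_error_rate(reference, lid_mono_decode(utterance, len(utterance), toy_lid, recognizers))
```

In this toy, a full-utterance window assigns the whole utterance to the majority language, so the two Urdu frames are decoded by the wrong recognizer and the error is never recovered. A recognizer built on a shared phone-set would skip the routing step altogether. The numbers come from the toy data only and say nothing about the paper's reported results.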
