Joint versus independent phonological feature models within CRF phone recognition

English is used as a lingua franca in most parts of the world (Ozaki, 2011). However, the problems associated with learning English are country-specific (Nagamine, 2011), because most difficulties in foreign language learning arise from L1 interference (Flege, 1995). Since this study focuses on the acoustic analysis of a phonological feature of Pakistan English (PakE), we briefly outline the historical background. Pakistan is a linguistically rich country in which more than 70 languages are spoken (Rahman, 1996). Saraiki, Balochi, Sindhi, Punjabi and Pashto are its major indigenous languages, spoken by more than 90% of the population. When Pakistan came into being in 1947, it inherited English from the British colonial rulers as the language of education, law, the judiciary and the media. The British had also used English in India for official correspondence, so English became an effective tool and symbol of power in the subcontinent, and its people take pride in learning it. Although the colonial period has ended and the British have returned to their homeland, English remains the language of the ruling elite in Pakistan and India.


Approaches for Multilingual Phone Recognition in Code-switched and Non-code-switched Scenarios Using Indian Languages

ACM Transactions on Asian and Low-Resource Language Information Processing, Vol. 20 (4), 2021, pp. 1-19. DOI: 10.1145/3437256

Author(s): Manjunath K. E., Srinivasa Raghavan K. M., K. Sreenivasa Rao, Dinesh Babu Jayagopi, V. Ramasubramanian

Keyword(s): Deep Neural Networks; State Of The Art; Window Size; Recognition System; Error Rates; Indian Languages; International Phonetic Alphabet; Phone Recognition; Front End; Recognition Systems

In this study, we evaluate and compare two approaches to multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach (LID-Mono) uses a front-end Language Identification (LID) system to switch the input to a monolingual phone recognizer, one of which is trained individually for each language in the multilingual dataset. In the second approach, a common multilingual phone set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted using Kannada and Urdu. In the first approach, LID is performed using state-of-the-art i-vectors. Both the monolingual and multilingual phone recognition systems are trained using Deep Neural Networks. The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. The Multi-PRS approach is found to outperform the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used to perform LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, as well as by using the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and LID errors cannot be recovered. In contrast, the Multi-PRS system needs no front-end LID switching and is built on a common multilingual phone set derived from several languages, so it is not constrained by LID accuracy. It therefore performs effectively on both code-switched and non-code-switched speech, offering lower Phone Error Rates than the LID-Mono system.
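The abstract reports results as Phone Error Rates (PER) and varies the LID window from 500 ms to 5.0 s. The Python sketch below illustrates both quantities under stated assumptions: PER follows the common Levenshtein-based definition (substitutions + deletions + insertions, divided by the reference length), and window durations are converted to frame counts assuming a fixed 10 ms frame shift and non-overlapping windows. The function names `phone_error_rate` and `lid_windows` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER = (substitutions + deletions + insertions) / len(ref),
    computed as the Levenshtein distance between two phone sequences."""
    if not ref:
        raise ValueError("reference must contain at least one phone")
    # Row 0 of the DP table: distance from an empty ref to hyp[:j] is j.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference phone r
                curr[j - 1] + 1,         # insertion of hypothesis phone h
                prev[j - 1] + (r != h),  # substitution (free if phones match)
            )
        prev = curr
    return prev[-1] / len(ref)


def lid_windows(num_frames: int, window_ms: int,
                frame_shift_ms: int = 10) -> List[Tuple[int, int]]:
    """Split an utterance into consecutive, non-overlapping LID windows.

    Assumption (not from the paper): a fixed frame shift, 10 ms by default.
    Returns (start, end) frame indices; the last window may be shorter.
    A window at least as long as the utterance yields a single
    full-utterance window."""
    frames_per_window = max(1, window_ms // frame_shift_ms)
    return [(start, min(start + frames_per_window, num_frames))
            for start in range(0, num_frames, frames_per_window)]


if __name__ == "__main__":
    # One substitution (t -> d) and one insertion (a): 2 edits / 3 phones.
    print(phone_error_rate(["k", "a", "t"], ["k", "a", "d", "a"]))
    # A 1.2 s utterance (120 frames) with 500 ms windows.
    print(lid_windows(120, 500))
```

In a LID-Mono setup of the kind the abstract describes, each window would be classified by the LID system and decoded by the recognizer for the predicted language, so a misclassified window is decoded with the wrong phone inventory; a Multi-PRS setup would skip the windowing step entirely.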
