Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments

In this study, we evaluate and compare two approaches to multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach uses a front-end Language Identification (LID) system to switch to a monolingual phone recognizer (LID-Mono) trained individually on each language in the multilingual dataset. In the second approach, a common multilingual phone set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted on Kannada and Urdu. In the first approach, LID is performed using state-of-the-art i-vectors. Both the monolingual and multilingual phone recognition systems are trained using Deep Neural Networks. The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. The Multi-PRS approach is found to outperform the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used to perform LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, and by using the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and LID errors cannot be recovered. The Multi-PRS system, by contrast, requires no front-end LID switching and is built on a common multilingual phone set derived from several languages, so it is not constrained by LID accuracy. It therefore performs effectively on both code-switched and non-code-switched speech, offering lower Phone Error Rates than the LID-Mono system.

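The abstract above compares systems by Phone Error Rate but does not define it. A minimal sketch follows, assuming the common definition (Levenshtein edit distance between the reference and hypothesised phone sequences, divided by the reference length). The function name `phone_error_rate` and the example phone strings are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def phone_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Edit distance between two phone sequences, divided by the reference length.

    Assumes the usual PER definition: (substitutions + deletions + insertions)
    divided by the number of reference phones.
    """
    if not reference:
        raise ValueError("reference must contain at least one phone")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis phones.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]  # aligning i reference phones with nothing costs i deletions
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phone missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra phone in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


if __name__ == "__main__":
    ref = ["k", "a", "n", "a"]
    hyp = ["k", "a", "m", "a"]  # one substituted phone
    print(phone_error_rate(ref, hyp))
```

Because the denominator is the reference length, a hypothesis with many inserted phones can score above 1.0 under this definition.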

Lexicography in endangered language communities

The Cambridge Handbook of Endangered Languages ◽

10.1017/cbo9780511975981.017 ◽

1999 ◽

pp. 337-353 ◽

Cited By ~ 4

Author(s):

Ulrike Mosel

Keyword(s):

Endangered Language


Who can speak Lenape in Pennsylvania? Authentication and language learning in an endangered language community of practice

Language & Communication ◽

10.1016/j.langcom.2015.04.003 ◽

2016 ◽

Vol 47 ◽

pp. 124-134 ◽

Cited By ~ 6

Author(s):

Miranda Weinberg ◽

Haley De Korne

Keyword(s):

Language Learning ◽

Community Of Practice ◽

Language Community ◽

Endangered Language


Language shift, bilingualism and the future of Britain's Celtic languages

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2010.0051 ◽

2010 ◽

Vol 365 (1559) ◽

pp. 3855-3864 ◽

Cited By ~ 45

Author(s):

Anne Kandler ◽

Roman Unger ◽

James Steele

Keyword(s):

Census Data ◽

Language Shift ◽

Past Century ◽

Transitional State ◽

Endangered Language ◽

Basic Model ◽

Vernacular Language ◽

Celtic Languages ◽

Shift Dynamics ◽

Demographic Trajectories

‘Language shift’ is the process whereby members of a community in which more than one language is spoken abandon their original vernacular language in favour of another. The historical shifts to English by Celtic language speakers of Britain and Ireland are particularly well-studied examples for which good census data exist for the most recent 100–120 years in many areas where Celtic languages were once the prevailing vernaculars. We model the dynamics of language shift as a competition process in which the numbers of speakers of each language (both monolingual and bilingual) vary as a function both of internal recruitment (as the net outcome of birth, death, immigration and emigration rates of native speakers), and of gains and losses owing to language shift. We examine two models: a basic model in which bilingualism is simply the transitional state for households moving between alternative monolingual states, and a diglossia model in which there is an additional demand for the endangered language as the preferred medium of communication in some restricted sociolinguistic domain, superimposed on the basic shift dynamics. Fitting our models to census data, we successfully reproduce the demographic trajectories of both languages over the past century. We estimate the rates of recruitment of new Scottish Gaelic speakers that would be required each year (for instance, through school education) to counteract the ‘natural wastage’ as households with one or more Gaelic speakers fail to transmit the language to the next generation informally, for different rates of loss during informal intergenerational transmission.

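The "basic model" in the abstract above treats bilingualism as a transitional state between two monolingual states. The sketch below is a toy illustration of that idea only and is not the authors' equations. It assumes three population shares (monolingual in the endangered language, bilingual, monolingual in the dominant language), two hypothetical rate constants (`to_bilingual`, `to_dominant`), one-way transitions and forward-Euler integration. It omits the internal recruitment terms (birth, death, migration) and the diglossia extension that the abstract mentions.

```python
def simulate_shift(x0, b0, y0, to_bilingual, to_dominant, years, dt=1.0):
    """Forward-Euler integration of a toy three-state language shift model.

    x: share of monolingual speakers of the endangered language
    b: share of bilingual speakers
    y: share of monolingual speakers of the dominant language
    The rate constants are illustrative assumptions, not fitted values.
    """
    x, b, y = x0, b0, y0
    trajectory = [(x, b, y)]
    steps = int(round(years / dt))
    for _ in range(steps):
        # Monolinguals become bilingual through contact with anyone who
        # speaks the dominant language (bilinguals and dominant monolinguals).
        flow_x_to_b = to_bilingual * x * (b + y)
        # Bilinguals drop the endangered language through contact with
        # dominant-language monolinguals.
        flow_b_to_y = to_dominant * b * y
        x -= flow_x_to_b * dt
        b += (flow_x_to_b - flow_b_to_y) * dt
        y += flow_b_to_y * dt
        trajectory.append((x, b, y))
    return trajectory


if __name__ == "__main__":
    path = simulate_shift(0.5, 0.3, 0.2, to_bilingual=0.05, to_dominant=0.05, years=100)
    for year in (0, 50, 100):
        x, b, y = path[year]
        print(f"year {year}: x={x:.3f} b={b:.3f} y={y:.3f}")
```

Since every flow leaves one group and enters another, the three shares always sum to their initial total. Fitting a model like this to census data, as the abstract describes, would require the authors' actual equations and data.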

Grammar of Skolt Saami

10.33341/sus.14 ◽

2015 ◽

Author(s):

Timothy Feist

Keyword(s):

Acoustic Analysis ◽

Speech Community ◽

Vowel Length ◽

Descriptive Grammar ◽

Case Marking ◽

Endangered Language ◽

The Subject ◽

Auxiliary Verb ◽

Northeast Finland

Skolt Saami is a Finno-Ugric language spoken primarily in northeast Finland by fewer than 300 people. The aim of this descriptive grammar is to provide an overview of all the major grammatical aspects of the language. It comprises descriptions of Skolt Saami phonology, morphophonology, morphology, morphosyntax and syntax. A compilation of interlinearised texts is provided in Chapter 11. Skolt Saami is a phonologically complex language, displaying contrastive vowel length, consonant gradation, suprasegmental palatalisation and vowel height alternations. It is also well known for being one of the few languages to display three distinctive degrees of quantity; indeed, this very topic has already been the subject of an acoustic analysis (McRobbie-Utasi 1999). Skolt Saami is also a morphologically complex language. Nominals in Skolt Saami belong to twelve different inflectional classes. They inflect for number and nine grammatical cases and may also mark possession, giving rise to over seventy distinct forms. Verbs belong to four different inflectional classes and inflect for person, number, tense and mood. Inflection is marked by suffixes, many of which are fused morphemes. Other typologically interesting features of the language, which are covered in this grammar, include (i) the existence of distinct predicative and attributive forms of adjectives, (ii) the case-marking of subject and object nominals which have cardinal numerals as determiners, and (iii) the marking of negation with a negative auxiliary verb. Skolt Saami is a seriously endangered language and it is thus hoped that this grammar will serve both as a tool for linguistic researchers and as an impetus to the speech community in any future revitalisation efforts.


‘They are asking me why I am speaking Gagauz’: family language practices and the level of linguistic (in)security of adolescents speaking an endangered language

Journal of Multilingual and Multicultural Development ◽

10.1080/01434632.2022.2026365 ◽

2022 ◽

pp. 1-17

Author(s):

Gülin Dağdeviren-Kırmızı ◽

Kayhan İnan

Keyword(s):

Language Practices ◽

Endangered Language ◽

Family Language


Documenting the Ikpana interrogative system

Journal of African Languages and Linguistics ◽

10.1515/jall-2021-2016 ◽

2021 ◽

Vol 42 (1) ◽

pp. 63-100

Author(s):

Jason Kandybowicz ◽

Bertille Baron Obi ◽

Philip T. Duncan ◽

Hironori Katsuda

Keyword(s):

Comprehensive Treatment ◽

Left Periphery ◽

Long Distance ◽

Southeastern Part ◽

Endangered Language ◽

Research Questions ◽

Volta Region ◽

Movement Asymmetry ◽

Syntactic Properties

This article provides a comprehensive treatment of the interrogative system of Ikpana (ISO 639-3: lgq), an endangered language spoken in the southeastern part of Ghana’s Volta region. The article features a description and analysis of both the morphosyntax and intonation of questions in the language. Polar questions in Ikpana are associated with dedicated prosodic patterns and may be segmentally marked. As for wh- interrogatives, Ikpana allows for optional wh- movement. Interrogative expressions may appear clause-internally in their base-generated positions or in the left periphery followed by one of two optionally droppable particles with distinct syntactic properties. In this way, wh- movement structures are either focus-marked constructions or cleft structures depending on the accompanying particle. We identify an interesting wh- movement asymmetry – unlike all other wh- movement structures, ‘how’ questions may not be formed via the focus-marked or cleft strategy. We document a number of other attested wh- structures in the language, including long-distance wh- movement, partial wh- movement, long-distance wh- in-situ, and multiple wh- questions. We argue that by allowing our documentation efforts to be shaped and guided by theoretically driven research questions, we reach deeper levels of description than would have been possible if approached from a purely descriptive-documentary perspective.