Is there a bilingual disadvantage for word segmentation? A computational modeling approach

2021 ◽  
pp. 1-28
Author(s):  
Laia FIBLA ◽  
Nuria SEBASTIAN-GALLES ◽  
Alejandrina CRISTIA

Since there are no systematic pauses delimiting words in speech, the problem of word segmentation is formidable even for monolingual infants. We use computational modeling to assess whether word segmentation is substantially harder in a bilingual than a monolingual setting. Seven algorithms representing different cognitive approaches to segmentation are applied to transcriptions of naturalistic input to young children, carefully processed to generate perfectly matched monolingual and bilingual corpora. We vary the overlap in phonology and lexicon experienced by modeling exposure to languages that are more similar (Catalan and Spanish) or more different (English and Spanish). We find that the greatest variation in performance is due to different segmentation algorithms and the second greatest to language, with bilingualism having effects that are smaller than both algorithm and language effects. Implications of these computational results for experimental and modeling approaches to language acquisition are discussed.

2021 ◽  
Author(s):  
Georgia Loukatou ◽  
Sabine Stoll ◽  
Damián Ezequiel Blasi ◽  
Alejandrina Cristia

How can infants detect where words or morphemes start and end in the continuous stream of speech? Previous computational studies have investigated this question mainly for English, where morpheme and word boundaries are often isomorphic. Yet in many languages, words are often multimorphemic, such that word and morpheme boundaries do not align. Our study employed corpora of two languages that differ in the complexity of inflectional morphology, Chintang (Sino-Tibetan) and Japanese (in Experiment 1), as well as corpora of artificial languages ranging in morphological complexity, as measured by the ratio and distribution of morphemes per word (in Experiments 2 and 3). We used two baselines and three conceptually diverse word segmentation algorithms, two of which rely purely on sublexical information using distributional cues, and one that builds a lexicon. The algorithms’ performance was evaluated on both word- and morpheme-level representations of the corpora. Segmentation results were better for the morphologically simpler languages than for the morphologically more complex languages, in line with the hypothesis that languages with greater inflectional complexity could be more difficult to segment into words. We further show that the effect of morphological complexity is relatively small, compared to that of algorithm and evaluation level. We therefore recommend that infant researchers look for signatures of the different segmentation algorithms and strategies, before looking for differences in infant segmentation landmarks across languages varying in complexity.
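To make the idea of a sublexical, distributional segmentation strategy concrete, here is a minimal sketch of one well-known member of that family: placing a boundary wherever the forward transitional probability between adjacent syllables drops below a threshold. This is an illustration only. The abstract does not say which algorithms were used, so the function names `train_tp` and `segment`, the toy corpus, and the threshold of 0.75 are all assumptions made for this example.

```python
from collections import Counter

def train_tp(utterances):
    """Estimate forward transitional probabilities P(next | current) over syllables."""
    first_counts = Counter()   # how often a syllable is followed by anything
    pair_counts = Counter()    # how often a specific syllable pair occurs
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            first_counts[a] += 1
            pair_counts[(a, b)] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(utt, tp, threshold):
    """Split an utterance wherever the transitional probability is below threshold."""
    if not utt:
        return []
    words, current = [], [utt[0]]
    for a, b in zip(utt, utt[1:]):
        # Unseen pairs get probability 0.0, so they always trigger a boundary.
        if tp.get((a, b), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Hypothetical toy corpus: unsegmented utterances given as syllable lists.
corpus = [
    ["ba", "by", "do", "ggy"],
    ["do", "ggy", "ki", "tty"],
    ["ki", "tty", "ba", "by"],
    ["ba", "by", "ki", "tty"],
    ["do", "ggy", "ba", "by"],
    ["ki", "tty", "do", "ggy"],
]
tp = train_tp(corpus)
result = segment(["ba", "by", "do", "ggy"], tp, threshold=0.75)
print(result)  # ['baby', 'doggy']
```

In this toy corpus, within-word transitions have probability 1.0 and cross-word transitions 0.5, so any threshold between the two recovers the words. Real child-directed corpora are far noisier, which is one reason the choice of algorithm can matter more than the language being segmented.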


2017 ◽  
Author(s):  
Jess Sullivan ◽  
Kathryn Davidson ◽  
Shirlene Wade ◽  
David Barner

When acquiring language, children must not only learn the meanings of words, but also how to interpret them in context. For example, children must learn both the logical semantics of the scalar quantifier some and its pragmatically enriched meaning: ‘some but not all’. Some studies have shown that this “scalar implicature” that some implies ‘some but not all’ poses a challenge even to nine-year-olds, while others find success by age three. We asked whether reports of children’s early successes might be due to the computation of exclusion inferences (like contrast or mutual exclusivity) rather than an ability to compute scalar implicatures. We found that young children (N=214; ages 4;0-7;11) sometimes prefer to compute symmetrical exclusion inferences rather than asymmetric scalar inferences when interpreting quantifiers. This suggests that some apparent successes in computing scalar implicature can actually be explained by less sophisticated exclusion inferences.


Entropy ◽  
2020 ◽  
Vol 22 (3) ◽  
pp. 275
Author(s):  
Igor A. Bessmertny ◽  
Xiaoxi Huang ◽  
Aleksei V. Platonov ◽  
Chuqiao Yu ◽  
Julia A. Koroleva

Search engines are able to find documents containing patterns from a query. This approach can be used for alphabetic languages such as English. However, Chinese is highly dependent on context. A significant problem in Chinese text processing is the absence of spaces between words, so the text must be segmented into words before any other processing. Algorithms for Chinese text segmentation should consider context; that is, the word segmentation process depends on other ideograms. As the existing segmentation algorithms are imperfect, we have considered an approach to build the context from all possible n-grams surrounding the query words. This paper proposes a quantum-inspired approach to rank Chinese text documents by their relevance to the query. In particular, this approach uses Bell’s test, which measures the quantum entanglement of two words within the context. The contexts of words are built using the hyperspace analogue to language (HAL) algorithm. Experiments conducted in three domains demonstrated that the proposed approach provides acceptable results.
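As a rough illustration of how HAL builds word contexts, the sketch below computes a distance-weighted co-occurrence table with a sliding window: a token that appears d positions before the focus token contributes a weight of window - d + 1. It follows the general HAL scheme, not the paper's own implementation. The function name `hal_matrix`, the window size, the use of single characters as tokens, and the sample string are assumptions for this example, and the Bell test step is not shown.

```python
from collections import defaultdict

def hal_matrix(tokens, window=2):
    """Build HAL-style weighted counts of tokens that precede each focus token."""
    matrix = defaultdict(lambda: defaultdict(int))
    for i, focus in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break  # ran off the start of the text
            # Closer neighbours get larger weights: distance 1 -> window, distance window -> 1.
            matrix[focus][tokens[j]] += window - d + 1
    return matrix

# Hypothetical input: each character of a short string is treated as one token.
tokens = list("北京大学")
hal = hal_matrix(tokens, window=2)
for focus in tokens:
    print(focus, dict(hal[focus]))
```

Each row of the table can then serve as a context vector for its focus token. Working over characters or n-grams instead of pre-segmented words is one way to sidestep an imperfect segmenter, in the spirit of the approach the abstract describes.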


PLoS ONE ◽  
2017 ◽  
Vol 12 (6) ◽  
pp. e0178381 ◽  
Author(s):  
Hung Yi Kristal Kaan ◽  
Adelene Y. L. Sim ◽  
Siew Kim Joyce Tan ◽  
Chandra Verma ◽  
Haiwei Song

2010 ◽  
Vol 38 (1) ◽  
pp. 56-60 ◽  
Author(s):  
TANIA S. ZAMUNER

Within the subfields of linguistics, traditional approaches tend to examine different phenomena in isolation. As Stoel-Gammon (this issue) correctly states, there is little interaction between the subfields. However, for a more comprehensive understanding of language acquisition in general and, more specifically, lexical and phonological development, we must consider relations between multiple subfields. That is, by examining the interactions between these subfields, a greater understanding of lexical and phonological development can emerge. For instance, the interaction between phonology, syntax and semantics is demonstrated in recent work looking at how phonological patterns can provide a basis for inferring a word's lexical category (such as nouns and verbs) (Christiansen, Onnis & Hockema, 2009; Lany & Saffran, 2010).


Physiology ◽  
2017 ◽  
Vol 32 (3) ◽  
pp. 234-245 ◽  
Author(s):  
Joanna L. James ◽  
Lawrence W. Chamley ◽  
Alys R. Clark

The utero-placental circulation links the maternal and fetal circulations during pregnancy, ensuring adequate gas and nutrient exchange, and consequently fetal growth. However, our understanding of this circulatory system remains incomplete. Here, we discuss how the utero-placental circulation is established, how it changes dynamically during pregnancy, and how this may impact on pregnancy success, highlighting how we may address knowledge gaps through advances in imaging and computational modeling approaches.


2017 ◽  
Vol 24 ◽  
Author(s):  
Catherine E. Snow

The lessons I have learned over the last many years seem always to come in pairs – a lesson about the findings that brings with it a lesson about life as a researcher... Lesson 1. Even as a doctoral student, I believed that the sorts of social interactions young children had with adults supported language acquisition. In 1971, when I completed my dissertation, that was a minority view, and one ridiculed by many. I was, unfortunately, deflected from a full-on commitment to research on the relationship between social environment and language development for many years by the general atmosphere of disdain for such claims. In the intervening years, of course, evidence to support the claim has accumulated, and now it is generally acknowledged that a large part of the variance among children in language skills can be explained by their language environments. This consensus might have been achieved earlier had I and others been braver about pursuing it.


2021 ◽  
Vol 6 (1) ◽  
pp. 213
Author(s):  
Julia Nee

Long-format speech environment (LFSE) recordings are increasingly used to understand language acquisition among young children (Casillas & Cristia 2019). But in language revitalization, older children are sometimes the largest demographic acquiring a language. In Teotitlán del Valle, Mexico, older children have participated in Zapotec language revitalization workshops since 2017. To better understand how these children use language, and to probe whether the language workshops impact language use, I invited learners to collect LFSE recordings. This study addresses two main questions: (1) what methodological challenges emerge when children ages 6-12 collect LFSE data?; and (2) what do the data suggest about the effects of the Zapotec workshops? I argue that, while creating LFSE recordings with older children presents methodological challenges, the results are useful in highlighting the importance of not only teaching language skills, but of creating spaces where learners are comfortable using the Zapotec language.

