Does morphological complexity affect word segmentation? Evidence from computational modeling

2021 ◽  
Author(s):  
Georgia Loukatou ◽  
Sabine Stoll ◽  
Damián Ezequiel Blasi ◽  
Alejandrina Cristia

How can infants detect where words or morphemes start and end in the continuous stream of speech? Previous computational studies have investigated this question mainly for English, where morpheme and word boundaries are often isomorphic. Yet in many languages, words are often multimorphemic, such that word and morpheme boundaries do not align. Our study employed corpora of two languages that differ in the complexity of inflectional morphology, Chintang (Sino-Tibetan) and Japanese (in Experiment 1), as well as corpora of artificial languages ranging in morphological complexity, as measured by the ratio and distribution of morphemes per word (in Experiments 2 and 3). We used two baselines and three conceptually diverse word segmentation algorithms, two of which rely purely on sublexical information using distributional cues, and one that builds a lexicon. The algorithms' performance was evaluated on both word- and morpheme-level representations of the corpora. Segmentation results were better for the morphologically simpler languages than for the morphologically more complex languages, in line with the hypothesis that languages with greater inflectional complexity could be more difficult to segment into words. We further show that the effect of morphological complexity is relatively small compared to that of algorithm and evaluation level. We therefore recommend that infant researchers look for signatures of the different segmentation algorithms and strategies before looking for differences in infant segmentation landmarks across languages varying in complexity.
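Evaluating the same output against word-level and morpheme-level gold standards can yield quite different scores. The sketch below illustrates this with token precision, recall, and F-score, a common choice for segmentation studies; the abstract does not name the metric, so treat the metric, the helper names `to_spans` and `token_scores`, and the romanized Japanese example as illustrative assumptions rather than the authors' evaluation code.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int]


def to_spans(units: Sequence[str]) -> Set[Span]:
    """Map a segmented utterance (a list of units) to the (start, end) offsets of its units."""
    spans: Set[Span] = set()
    pos = 0
    for unit in units:
        spans.add((pos, pos + len(unit)))
        pos += len(unit)
    return spans


def token_scores(predicted: List[List[str]], gold: List[List[str]]) -> Tuple[float, float, float]:
    """Token precision, recall and F-score pooled over a corpus of utterances.

    A predicted token counts as correct only if both of its edges coincide
    with the edges of a gold token."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold corpora must have the same number of utterances")
    hits = n_pred = n_gold = 0
    for pred_utt, gold_utt in zip(predicted, gold):
        if "".join(pred_utt) != "".join(gold_utt):
            raise ValueError("predicted and gold utterances must spell the same string")
        pred_spans, gold_spans = to_spans(pred_utt), to_spans(gold_utt)
        hits += len(pred_spans & gold_spans)
        n_pred += len(pred_spans)
        n_gold += len(gold_spans)
    precision = hits / n_pred if n_pred else 0.0
    recall = hits / n_gold if n_gold else 0.0
    f_score = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_score


# Hypothetical romanized Japanese example: one predicted segmentation,
# scored against word-level and morpheme-level gold standards.
predicted = [["kore", "tabe", "masita"]]
word_gold = [["kore", "tabemasita"]]
morpheme_gold = [["kore", "tabe", "masi", "ta"]]
print(token_scores(predicted, word_gold))      # ≈ (0.333, 0.5, 0.4)
print(token_scores(predicted, morpheme_gold))  # ≈ (0.667, 0.5, 0.571)
```

Here the predicted boundary inside the verb counts as an error at the word level but as a hit at the morpheme level, so the same output gains precision when scored against morphemes.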

Cognition ◽  
2022 ◽  
Vol 220 ◽  
pp. 104960
Author(s):  
Georgia Loukatou ◽  
Sabine Stoll ◽  
Damian Blasi ◽  
Alejandrina Cristia

2013 ◽  
Vol 41 (2) ◽  
pp. 439-461 ◽  
Author(s):  
F. NIHAN KETREZ

Previous studies on the role of vowel harmony in word segmentation are based on artificial languages in which harmonic cues reliably signal word boundaries. In this corpus study of data available in CHILDES, we investigated whether natural languages provide learners with reliable segmentation cues similar to the artificially created ones. We observed that in harmonic languages (child-directed speech to thirty-five Turkish and three Hungarian children), but not in non-harmonic ones (child-directed speech to one Farsi and four Polish children), harmonic vowel sequences are more likely to appear within words, whereas non-harmonic ones mostly appear across word boundaries. This suggests that natural harmonic languages provide learners with regular cues that could potentially be used for word segmentation alongside other cues.
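The core tabulation in such a corpus study can be sketched as counting consecutive vowel pairs by position (within a word or across a word boundary) and by harmony. This minimal sketch assumes lowercase, space-separated transcripts and a simplified Turkish backness harmony that ignores rounding harmony; the names `vowel_pairs` and `harmony_table` and the two example utterances are hypothetical and are not Ketrez's coding scheme.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Simplified Turkish backness classes (assumption: rounding harmony is ignored).
FRONT = set("eiöü")
BACK = set("aıou")
VOWELS = FRONT | BACK


def vowel_pairs(utterance: str) -> Iterator[Tuple[str, str, bool]]:
    """Yield (v1, v2, crosses_boundary) for each pair of consecutive vowels.

    Words are assumed to be separated by spaces in a lowercase transcript."""
    prev_vowel = None
    crossed = False
    for ch in utterance:
        if ch == " ":
            crossed = True  # a word boundary separates the previous vowel from the next one
        elif ch in VOWELS:
            if prev_vowel is not None:
                yield prev_vowel, ch, crossed
            prev_vowel, crossed = ch, False


def harmony_table(utterances: Iterable[str]) -> Counter:
    """Count vowel pairs by position (within/across words) and harmony (backness agreement)."""
    counts: Counter = Counter()
    for utt in utterances:
        for v1, v2, crossed in vowel_pairs(utt):
            harmonic = (v1 in FRONT) == (v2 in FRONT)
            position = "across" if crossed else "within"
            counts[(position, "harmonic" if harmonic else "disharmonic")] += 1
    return counts


# "okula gel" (come to school), "evde kal" (stay at home)
table = harmony_table(["okula gel", "evde kal"])
print(table)  # 3 within-word harmonic pairs, 2 across-word disharmonic pairs
```

Comparing the within/across proportions of harmonic pairs between a harmonic and a non-harmonic language is what would reveal whether harmony carries boundary information.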


2021 ◽  
pp. 1-28
Author(s):  
Laia FIBLA ◽  
Nuria SEBASTIAN-GALLES ◽  
Alejandrina CRISTIA

Since there are no systematic pauses delimiting words in speech, the problem of word segmentation is formidable even for monolingual infants. We use computational modeling to assess whether word segmentation is substantially harder in a bilingual than a monolingual setting. Seven algorithms representing different cognitive approaches to segmentation are applied to transcriptions of naturalistic input to young children, carefully processed to generate perfectly matched monolingual and bilingual corpora. We vary the overlap in phonology and lexicon experienced by modeling exposure to languages that are more similar (Catalan and Spanish) or more different (English and Spanish). We find that the greatest variation in performance is due to different segmentation algorithms and the second greatest to language, with bilingualism having effects that are smaller than both algorithm and language effects. Implications of these computational results for experimental and modeling approaches to language acquisition are discussed.


Entropy ◽  
2020 ◽  
Vol 22 (3) ◽  
pp. 275
Author(s):  
Igor A. Bessmertny ◽  
Xiaoxi Huang ◽  
Aleksei V. Platonov ◽  
Chuqiao Yu ◽  
Julia A. Koroleva

Search engines are able to find documents containing patterns from a query. This approach works for alphabetic languages such as English, where words are separated by spaces. Chinese, however, is highly dependent on context. A significant problem in Chinese text processing is the absence of blanks between words, so the text must be segmented into words before any other processing. Algorithms for Chinese text segmentation should consider context; that is, how a word is segmented depends on the surrounding ideograms. As existing segmentation algorithms are imperfect, we consider an approach that builds the context from all possible n-grams surrounding the query words. This paper proposes a quantum-inspired approach to ranking Chinese text documents by their relevance to the query. In particular, this approach uses Bell's test, which measures the quantum entanglement of two words within the context. The contexts of words are built using the hyperspace analogue to language (HAL) algorithm. Experiments conducted in three domains demonstrated that the proposed approach provides acceptable results.
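The HAL step mentioned above builds word contexts from a sliding window in which nearer neighbours receive larger weights. A minimal sketch of the standard HAL weighting (weight `window - d + 1` for a word `d` positions away) follows; the window size, the helper names `hal_matrix` and `hal_vector`, and the toy token list are assumptions, and the paper's n-gram context construction and Bell's-test ranking are not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def hal_matrix(tokens: Sequence[str], window: int = 5) -> Dict[str, Dict[str, float]]:
    """Directional HAL counts: m[target][ctx] accumulates weight (window - d + 1)
    each time ctx occurs d positions before target, for 1 <= d <= window."""
    m: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break
            m[target][tokens[j]] += window - d + 1  # closer neighbours weigh more
    return m


def hal_vector(m: Dict[str, Dict[str, float]], word: str, vocab: List[str]) -> List[float]:
    """Context vector of `word`: weights of preceding words (its row)
    followed by weights of following words (its column)."""
    before = [m.get(word, {}).get(c, 0.0) for c in vocab]
    after = [m.get(c, {}).get(word, 0.0) for c in vocab]
    return before + after


tokens = "a b c a b".split()
m = hal_matrix(tokens, window=2)
print(hal_vector(m, "a", ["a", "b", "c"]))  # [0.0, 1.0, 2.0, 0.0, 4.0, 1.0]
```

For Chinese input, `tokens` would be characters or candidate n-grams rather than pre-segmented words, which is how such contexts can sidestep an imperfect segmenter.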


2017 ◽  
Author(s):  
Caitlin Garcia ◽  
Gina Iozzo ◽  
Katie Lamirato ◽  
James Ledoux ◽  
Jesse Mu ◽  
...  

We replicated Experiment 1 of Saffran, Newport, and Aslin (1996), "Word segmentation: The role of distributional cues," Journal of Memory and Language, 35, 606-621, as part of a multi-year project to replicate every published adult statistical word segmentation study. Despite a much larger sample than the original (101 subjects vs. 24), evidence of successful segmentation was weak and mixed, and none of the item or condition effects replicated. We consider whether this is more likely to be a failure of replication or a failure of generalization (e.g., to a different population).
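The distributional cue at stake in these studies is the forward transitional probability (TP) between syllables, which is high inside words and lower across word boundaries in a continuous stream. The sketch below builds such a stream and segments it at low-TP transitions; the four trisyllabic words, the no-immediate-repeat constraint, the fixed 0.5 threshold, and all function names are hypothetical choices for illustration, not the stimuli or the analysis of the original or the replication.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical trisyllabic "words"; every syllable occurs in exactly one word.
WORDS = ["bimotu", "kelapi", "dagoru", "nifesa"]


def make_stream(words: Sequence[str], n_tokens: int, seed: int = 0) -> List[str]:
    """Concatenate randomly chosen words (no immediate repeats) into a pause-free syllable stream."""
    rng = random.Random(seed)
    stream: List[str] = []
    prev = None
    for _ in range(n_tokens):
        word = rng.choice([w for w in words if w != prev])
        stream.extend(word[i:i + 2] for i in range(0, len(word), 2))  # CV syllables
        prev = word
    return stream


def forward_tps(syllables: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Forward transitional probability P(b | a) = count(ab) / count(a)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(syllables: Sequence[str], tps: Dict[Tuple[str, str], float],
            threshold: float = 0.5) -> List[str]:
    """Posit a word boundary wherever the forward TP falls below `threshold`."""
    words: List[str] = []
    current = [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


stream = make_stream(WORDS, n_tokens=600)
tps = forward_tps(stream)
print(tps[("bi", "mo")])                  # within-word transition: always 1.0 here
print(sorted(set(segment(stream, tps))))  # ideally the four words
```

Because each word is followed by one of three others, across-word TPs hover around 1/3, which is why a learner (or model) sensitive to TP dips could recover the words.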


1999 ◽  
Vol 122 (2) ◽  
pp. 138-146 ◽  
Author(s):  
L. Nguyen ◽  
C. Quentin ◽  
W. Lee ◽  
S. Bayyuk ◽  
S. A. Bidstrup-Allen ◽  
...  

This paper presents, discusses, and compares results from experimental and computational studies of the plastic encapsulation process for a 144-lead TQFP package. The experimental results were obtained using an instrumented molding press, while the computational predictions were obtained using newly developed software for modeling transfer molding processes. The emphasis is on validating the software, which was done mainly by comparing the computational results with the corresponding experimental measurements of pressure, temperature, and flow-front advancement in the cavities and runners. The experimental and computational results were found to be in good agreement, especially for the flow-front shapes and locations. [S1043-7398(00)00502-8]


2016 ◽  
Vol 35 (1) ◽  
pp. 99-119 ◽  
Author(s):  
Vaclav Brezina ◽  
Gabriele Pallotti

Morphological complexity (MC) is a relatively new construct in second language acquisition (SLA). After critically discussing existing approaches to calculating MC in first- and second-language acquisition research, this article presents a new operationalization of the construct, the Morphological Complexity Index (MCI). The MCI is applied in two case studies based on argumentative written texts produced by native and non-native speakers of Italian and English. Study 1 shows that morphological complexity varies between native and non-native speakers of Italian, and that it is significantly lower in learners with lower proficiency levels. The MCI is strongly correlated with proficiency, measured with a C-test, and also shows significant correlations with other measures of linguistic complexity, such as lexical diversity and sentence length. Quite a different picture emerges from Study 2, on advanced English learners. Here, morphological complexity remains constant across natives and non-natives, and is not significantly correlated with other text complexity measures. These results suggest that morphological complexity in texts is a function of both the speakers' proficiency and the specific language under investigation; for linguistic systems with a relatively simple inflectional morphology, such as English, learners soon reach a threshold level after which inflectional diversity remains constant.


2021 ◽  
Vol 12 ◽  
Author(s):  
Mingjing Chen ◽  
Yongsheng Wang ◽  
Bingjie Zhao ◽  
Xin Li ◽  
Xuejun Bai

In alphabetic writing systems (such as English), the spaces between words mark word boundaries, and the basic unit of reading is distinguished during visual-level processing. This visual-level information about word boundaries facilitates reading. Chinese is an ideographic language whose text contains no inter-word spaces marking word boundaries. Previous studies have shown that the basic processing unit in Chinese reading is also the word. However, findings remain inconsistent regarding whether inserting spaces between words in Chinese text improves reading performance. Researchers have proposed that there may be a trade-off between format familiarity and the facilitation effect of inter-word spaces. To test this, the present study manipulated format familiarity by reversing the Chinese reading direction to right-to-left in Experiments 1 and 2. Experiment 1 examined whether inter-word spaces facilitated Chinese reading in an unfamiliar format: 40 native Chinese undergraduates read Chinese sentences from right to left in four format conditions. The results showed faster reading speed and shorter total reading time for the inter-word spaced format. Building on this finding, Experiment 2 examined whether the facilitation effect of inter-word spaces would diminish or disappear once format familiarity had improved: 40 native Chinese undergraduates who had not participated in Experiment 1 read Chinese sentences from right to left in four format conditions after ten days of reading training. Total reading time and reading speed did not differ significantly between the inter-word spaced and unspaced formats, suggesting that the facilitation effect of inter-word spaces in Chinese reading had become smaller.
The combined results of the two experiments suggest that there is indeed a trade-off between format familiarity and the facilitation of word segmentation, which supports the assumption of previous studies.


2019 ◽  
Author(s):  
Fabio Trecca ◽  
Dorthe Bleses ◽  
Anders Højen ◽  
Thomas O. Madsen ◽  
Morten H. Christiansen

Research has suggested that Danish-learning children lag behind in early language acquisition. The phenomenon has been attributed to the opaque phonetic structure of Danish, which features an unusually large number of non-consonantal sounds (i.e., vowels and semivowels/glides). The large proportion of vocalic sounds in speech is thought to provide fewer cues to word segmentation and to make language processing harder, thus hindering the acquisition process. In this study, we explored whether the presence of vocalic sounds at word boundaries impedes real-time speech processing in 24-month-old Danish-learning children, compared to word boundaries that are marked by consonantal sounds. Using eye-tracking, we tested children's real-time comprehension of known consonant-initial and vowel-initial words presented in either a consonant-final or a vowel-final carrier phrase, resulting in the four boundary types C#C, C#V, V#C, and V#V. Our results showed that the presence of vocalic sounds around a word boundary, especially before it, impedes processing of Danish child-directed sentences.


2003 ◽  
Vol 12 (06) ◽  
pp. 783-804 ◽  
Author(s):  
GERGELY TÍMÁR ◽  
KRISTÓF KARACS ◽  
CSABA REKECZKY

This report describes analogic algorithms used in the preprocessing and segmentation phases of offline handwriting recognition. A segmentation-based handwriting recognition approach is discussed, i.e., the system attempts to segment words into their constituent letters. To improve speed, the cellular neural network (CNN) algorithms used here rely, whenever possible, on dynamic, wavefront propagation-based methods instead of morphological operators embedded in iterative algorithms. The system first locates the handwritten lines in the page image and corrects their skew as necessary. It then searches for the words within the lines and corrects the skew at the word level as well. A novel trigger wave-based word segmentation algorithm is presented, which operates on the skeletons of words. Sample results of experiments conducted on a database of 25 handwritten pages are presented, along with suggestions for future development.
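A binary trigger wave spreads from seed pixels through connected foreground pixels and stops at background, which is what makes it useful for isolating strokes. The sketch below is a discrete, sequential analogue of that propagation (equivalently, a breadth-first flood fill restricted to foreground pixels); the function name `trigger_wave`, the 8-connectivity, and the toy image are assumptions, and it does not reproduce the authors' CNN templates or their skeleton-based word segmentation.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def trigger_wave(image: List[List[int]], seeds: Set[Pixel], max_steps: int = 10_000) -> Set[Pixel]:
    """Discrete, sequential analogue of a binary trigger wave.

    Starting from the seed pixels, the active region grows by one ring of
    8-connected neighbours per step, but only into foreground (1) pixels,
    and stops when the wavefront can no longer advance."""
    rows, cols = len(image), len(image[0])
    active = {(r, c) for r, c in seeds if image[r][c] == 1}
    front = set(active)
    for _ in range(max_steps):
        new_front: Set[Pixel] = set()
        for r, c in front:
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and image[nr][nc] == 1 and (nr, nc) not in active):
                        new_front.add((nr, nc))
        if not new_front:
            break  # wavefront has stopped: the whole connected stroke is active
        active |= new_front
        front = new_front
    return active


# Two strokes separated by an empty column; the wave started at (0, 0)
# fills the left stroke only.
image = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
]
print(sorted(trigger_wave(image, {(0, 0)})))  # [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]
```

On CNN hardware every cell updates in parallel, so the wave costs time proportional to the stroke's extent rather than to repeated whole-image morphological passes, which is the speed argument made in the abstract.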

