word boundaries
Recently Published Documents

TOTAL DOCUMENTS: 174 (five years: 27)
H-INDEX: 20 (five years: 1)

2021 ◽  
Author(s):  
Georgia Loukatou ◽  
Sabine Stoll ◽  
Damián Ezequiel Blasi ◽  
Alejandrina Cristia

How can infants detect where words or morphemes start and end in the continuous stream of speech? Previous computational studies have investigated this question mainly for English, where morpheme and word boundaries are often isomorphic. Yet in many languages, words are often multimorphemic, such that word and morpheme boundaries do not align. Our study employed corpora of two languages that differ in the complexity of inflectional morphology, Chintang (Sino-Tibetan) and Japanese (Experiment 1), as well as corpora of artificial languages ranging in morphological complexity, as measured by the ratio and distribution of morphemes per word (Experiments 2 and 3). We used two baselines and three conceptually diverse word segmentation algorithms: two that rely purely on sublexical, distributional cues, and one that builds a lexicon. The algorithms’ performance was evaluated on both word- and morpheme-level representations of the corpora. Segmentation results were better for the morphologically simpler languages than for the morphologically more complex ones, in line with the hypothesis that languages with greater inflectional complexity could be more difficult to segment into words. We further show that the effect of morphological complexity is relatively small compared to that of algorithm and evaluation level. We therefore recommend that infant researchers look for signatures of the different segmentation algorithms and strategies before looking for differences in infant segmentation landmarks across languages varying in complexity.
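The abstract does not name its implementations. As an illustration only, one classic purely sublexical, distributional-cue strategy of the kind the study evaluates is transitional-probability segmentation; the minimal sketch below (toy syllable corpus and threshold are invented) posits a word boundary wherever the forward transitional probability between adjacent syllables drops below a cutoff:

```python
from collections import Counter

def segment_by_tp(utterances, threshold=0.5):
    """Place a word boundary wherever the forward transitional
    probability P(next | current) between adjacent syllables falls
    below `threshold`. A toy stand-in for sublexical,
    distributional-cue segmentation; not the study's actual code."""
    unigrams, bigrams = Counter(), Counter()
    for utt in utterances:
        unigrams.update(utt)
        bigrams.update(zip(utt, utt[1:]))
    segmented = []
    for utt in utterances:
        words, current = [], [utt[0]]
        for a, b in zip(utt, utt[1:]):
            tp = bigrams[(a, b)] / unigrams[a]
            if tp < threshold:  # low predictability -> word boundary
                words.append(current)
                current = []
            current.append(b)
        words.append(current)
        segmented.append(words)
    return segmented

# Invented toy corpus: utterances as syllable lists, where
# "ba-da" and "ku-pi" recur as cohesive units.
corpus = [["ba", "da", "ku", "pi"],
          ["ku", "pi", "ba", "da"],
          ["ba", "da", "ba", "da"]]
print(segment_by_tp(corpus))
```

On this toy corpus the within-word transitions (ba→da, ku→pi) have high transitional probability and survive, while low-probability transitions are split, so the first utterance comes out as [["ba", "da"], ["ku", "pi"]].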


2021 ◽  
Vol 12 ◽  
Author(s):  
Gaisha Oralova ◽  
Victor Kuperman

Given that Chinese writing conventions lack inter-word spacing, understanding whether and how readers of Chinese segment regular unspaced Chinese writing into words is an important question for theories of reading. This study examined the processing consequences of introducing spaces into written Chinese sentences at varying positions based on native-speaker consensus. The consensus measure for every character transition in our stimulus sentences was the percentage of raters who placed a word boundary at that position. The eye movements of native readers of Chinese were recorded while they silently read original unspaced sentences and their experimentally manipulated counterparts for comprehension. We introduced two types of spaced sentences: one with spaces inserted at every probable word boundary (heavily spaced), and another with spaces placed only at highly probable word boundaries (lightly spaced). Linear mixed-effects regression models showed that heavily spaced sentences took the same time to read as unspaced ones, despite shortened fixation times on individual words (Experiment 1). In contrast, reading times for lightly spaced sentences and words were shorter than those for unspaced ones (Experiment 2). Thus, spaces proved advantageous, but only when introduced at highly probable word boundaries. We discuss the methodological and theoretical implications of these findings.
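The consensus measure described here (the fraction of raters placing a boundary at each character transition) is straightforward to compute; the sketch below is a hypothetical illustration, with invented rater data rather than the study's materials:

```python
def boundary_consensus(ratings):
    """Given each rater's set of boundary positions (indices of
    character transitions) for one sentence, return, per transition,
    the fraction of raters who placed a word boundary there."""
    n_raters = len(ratings)
    n_transitions = max(max(r) for r in ratings if r) + 1
    return [sum(pos in r for r in ratings) / n_raters
            for pos in range(n_transitions)]

# Three hypothetical raters segmenting a 5-character sentence
# (4 character transitions, indexed 0..3).
raters = [{1, 3}, {1}, {1, 3}]
consensus = boundary_consensus(raters)
print(consensus)  # prints [0.0, 1.0, 0.0, 0.6666666666666666]

# The two spacing conditions then correspond to two thresholds on
# this measure: every probable boundary vs. only highly probable ones.
heavily = [p for p, c in enumerate(consensus) if c > 0.0]
lightly = [p for p, c in enumerate(consensus) if c >= 0.9]
```

Under these invented thresholds, `heavily` keeps transitions 1 and 3, while `lightly` keeps only transition 1, mirroring the heavily vs. lightly spaced manipulation.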


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Cheng Feng

This paper proposes a segmented, combined English text measurement method based on two sets of orthogonal linear image sensors and one area image sensor. The method combines the advantages of linear and area image sensors for long- and short-distance English text measurement and can continuously perform high-precision English text tracking over a large range of viewing distances. Based on this method, a segmented English text measurement system is designed and built. The paper presents a method for extracting English word boundaries based on semantic segmentation, to solve the problem of global positioning and horizontal initialization of English reading text. A semantic segmentation method based on fully convolutional networks (FCN) is analyzed and the target classification is defined. The classic FCN framework and model, fine-tuned with manually annotated data, achieved good segmentation results. For the definition and extraction of English word boundaries, a piecewise linear model measures the projection confidence of each word boundary point, yielding an overall observation of the word boundary. When the observation confidence is high enough, horizontal positioning is obtained by weighted matching against the word boundaries marked in the high-precision image. The paper concludes that English reading software can help learners learn English to a certain extent, and that such software is an effective supplement to blended-learning classrooms. Based on an analysis of learners and teaching content, an English teaching model built on blended learning with English reading software is designed. Experiments show that English reading software helps learners learn English, both expanding their vocabulary and broadening their horizons.


Author(s):  
Sabine Zerbian ◽  
Frank Kügler

The article analyses violations of the Obligatory Contour Principle (OCP) above the word level in Tswana, a Southern Bantu language, by investigating the realization of adjacent lexical high tones across word boundaries. The results show that across word boundaries, downstep (i.e. a lowering of the second in a series of adjacent high tones) only takes place within a phonological phrase. A phonological phrase break blocks downstep, even when the necessary tonal configuration is met. A phrase-based account is adopted to explain the occurrence of downstep. Our study confirms a pattern previously reported for the closely related language Southern Sotho and provides controlled, empirical data from Tswana, based on read speech from twelve speakers, analysed auditorily by two annotators as well as acoustically.


Author(s):  
Jack Isaac Rabinovitch

Through a corpus of five pre-Qin (before 221 BCE) texts, this paper argues that the authors of both prose and poetry in Classical Chinese were sensitive to OCP violations across word boundaries, and changed diction and used marked word order to avoid creating pseudogeminates across words. The frequencies of bigrams that result in pseudogeminates are compared to the predicted frequency of pseudogeminates across the corpus. The paper finds that pseudogeminates are significantly (p < 0.00001) rarer than expected under randomization. Furthermore, by analyzing these texts under multiple possible phonological reconstructions, the paper suggests that post-codas (segments which were present in Old Chinese but were elided during the process of tonogenesis between Old Chinese and Middle Chinese) were most likely still present in the Chinese of the texts’ writers. Evidence comes from the consistency of OCP avoidance across all tones of Chinese when post-codas are assumed, and the lack of such consistency when they are not.
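The comparison of observed to randomization-predicted pseudogeminate frequency can be illustrated with a small permutation test; the sketch below is a simplified stand-in for the paper's method, with syllables reduced to invented (onset, coda) pairs:

```python
import random

def pseudogeminate_count(syllables):
    """Count adjacent syllable pairs where the first syllable's coda
    matches the second's onset, i.e. a pseudogeminate forms across
    the word boundary. Syllables here are (onset, coda) tuples."""
    return sum(a[1] == b[0] for a, b in zip(syllables, syllables[1:]))

def permutation_p(syllables, n=5000, seed=0):
    """One-sided permutation test: the fraction of random orderings
    whose pseudogeminate count is <= the observed count. A small p
    means the attested text has fewer pseudogeminates than chance
    predicts, i.e. the authors plausibly avoided them."""
    rng = random.Random(seed)
    observed = pseudogeminate_count(syllables)
    shuffled = list(syllables)
    hits = 0
    for _ in range(n):
        rng.shuffle(shuffled)
        if pseudogeminate_count(shuffled) <= observed:
            hits += 1
    return observed, hits / n

# Invented toy "text" with no attested pseudogeminates, although
# several shuffled orderings would create them.
text = [("k", "t"), ("p", "n"), ("t", "k"), ("m", "ŋ"), ("n", "p")]
obs, p = permutation_p(text)
print(f"observed={obs}, p={p:.3f}")
```

A real analysis would operate over full phonological reconstructions and test each tone class separately, as the abstract describes; this sketch only shows the shape of the randomization comparison.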


2021 ◽  
Author(s):  
Catarina Realinho ◽  
Rita Gonçalves ◽  
Helena Moniz ◽  
Isabel Trancoso
