The Acquisition of Noun and Verb Categories by Bootstrapping From a Few Known Words: A Computational Model

2021 ◽  
Vol 12 ◽  
Author(s):  
Perrine Brusini ◽  
Olga Seminck ◽  
Pascal Amsili ◽  
Anne Christophe

While many studies have shown that toddlers are able to detect syntactic regularities in speech, the learning mechanism that allows them to do so remains largely unclear. In this article, we use computational modeling to assess the plausibility of a context-based learning mechanism for the acquisition of nouns and verbs. We hypothesize that infants can assign basic semantic features, such as “is-an-object” and/or “is-an-action,” to the very first words they learn, then use these words, the semantic seed, to ground proto-categories of nouns and verbs. The contexts in which these words occur would then be exploited to bootstrap the noun and verb categories: unknown words are assigned to the class that has been observed most frequently in the corresponding context. To test our hypothesis, we designed a series of computational experiments that used French corpora of child-directed speech and different sizes of semantic seed. We partitioned these corpora into training and test sets: the model extracted the two-word contexts of the seed from the training sets, then used them to predict the syntactic category of content words from the test sets. This very simple algorithm proved highly efficient in a categorization task: even the smallest semantic seed (only 8 nouns and 1 verb known) yielded very high precision (~90% for new nouns; ~80% for new verbs). Recall, in contrast, was low for small seeds and increased with seed size. Interestingly, we observed that the contexts used most often by the model featured function words, which is in line with what we know about infants' language development. Crucially, for the learning method we evaluated here, all initialization hypotheses (a semantic seed and the ability to analyse contexts) are plausible and fit the developmental literature. While this experiment cannot prove that this learning mechanism is indeed used by infants, it demonstrates the feasibility of a realistic learning hypothesis by using an algorithm that relies on very few computational and memory resources. Altogether, this supports the idea that a probabilistic, context-based mechanism can be very efficient for the acquisition of syntactic categories in infants.
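The mechanism described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the context is taken to be the two words preceding the target, sentence starts are padded with a hypothetical `<s>` token, ties go to whichever category was counted first, and the toy sentences and seed are invented rather than drawn from the corpora.

```python
from collections import Counter, defaultdict

def train_contexts(sentences, seed):
    """Count how often each two-word left context precedes a seed noun or verb."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        padded = ["<s>", "<s>"] + sentence  # hypothetical padding token
        for i in range(2, len(padded)):
            word = padded[i]
            if word in seed:
                context = (padded[i - 2], padded[i - 1])
                counts[context][seed[word]] += 1
    return counts

def categorize(context, counts):
    """Return the category seen most often in this context, or None if unseen."""
    if context not in counts:
        return None  # no prediction: this is what keeps recall low for small seeds
    return counts[context].most_common(1)[0][0]

# Invented toy data, for illustration only.
seed = {"ballon": "noun", "chien": "noun", "manger": "verb"}
training = [
    ["tu", "veux", "le", "ballon"],
    ["regarde", "le", "chien"],
    ["il", "va", "manger"],
    ["elle", "va", "manger"],
]
counts = train_contexts(training, seed)

# An unknown word after "veux le" is guessed to be a noun; after "il va", a verb.
guess_noun = categorize(("veux", "le"), counts)
guess_verb = categorize(("il", "va"), counts)
```

Returning `None` for unseen contexts mirrors the reported pattern: the model abstains when it has no evidence, so precision can stay high while recall depends on how many contexts the seed has covered.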

1974 ◽  
Vol 26 (1) ◽  
pp. 15-25 ◽  
Author(s):  
Kate Loewenthal ◽  
Graham Gibbs

These experiments examine the relationship between subjects' familiarity judgements of words of similar (low) frequency and their recall or recognition of these words. The expected relationship between familiarity and recall was well confirmed, as was the less expected relationship between familiarity and recognition. An analysis of the vocabulary acquisition process led to more specific predictions about performance on delayed, as compared with immediate, retention tests. The most crucial of these predictions was that words which are familiar, but whose meanings are not known, are remembered by tagging sets of phonological (as opposed to semantic) features, leading to good immediate recall but poor delayed recall, and a greater likelihood of acoustic confusions following a delay. Some support was obtained for these predictions. However, subjects showed unexpectedly good retention of unknown words and it was felt that tagging alone does not account for all the findings.


Author(s):  
Вера Александровна Фролова

This article discusses the lexical and semantic features of postpositions in modern German and the problem of their functioning in the language. Since postpositions form a special group of function words that are needed to create and express syntactic connections within a phrase or sentence, the article gives a detailed description of the positional features of German postpositions, examines their combinability and functional load, and analyzes their specific meanings in text. In modern Chuvash (the mother tongue of many students who learn under the bilingual conditions of the national republic), postpositions are far more prevalent than in German, the foreign language, and play a crucial role: because there is no agreement between head words and their dependents, postpositions are the principal means of creating syntactic connections in Chuvash phrases. The article pays special attention to the positional similarity of postpositions in languages as structurally different as German and Chuvash, since German is often taught in the Chuvash Republic to bilingual students. Linguistic interference, which most often has a negative impact on the learning of new material, can have a positive effect under Chuvash-Russian bilingualism when the learning tasks are formulated correctly.


2008 ◽  
Vol 14 (4) ◽  
pp. 527-546 ◽  
Author(s):  
TASANAWAN SOONKLANG ◽  
ROBERT I. DAMPER ◽  
YANNICK MARCHAND

Abstract. Automatic pronunciation of unknown words (i.e., those not in the system dictionary) is a difficult problem in text-to-speech (TTS) synthesis. Currently, many data-driven approaches have been applied to the problem, as a backup strategy for those cases where dictionary matching fails. The difficulty of the problem depends on the complexity of spelling-to-sound mappings according to the particular writing system of the language. Hence, the degree of success achieved varies widely across languages but also across dictionaries, even for the same language with the same method. Further, the sizes of the training and test sets are an important consideration in data-driven approaches. In this paper, we study the variation of letter-to-phoneme transcription accuracy across seven European languages with twelve different lexicons. We also study the relationship between the size of dictionary and the accuracy obtained. The largest dictionaries of each language have been partitioned into ten approximately equal-sized subsets and combined to give ten different-sized test sets. In view of its superior performance in previous work, the transcription method used is pronunciation by analogy (PbA). Best results are obtained for Spanish, generally believed to have a very regular (‘shallow’) orthography, and poorest results for English, a language whose irregular spelling system is legendary. For those languages for which multiple dictionaries were available (i.e., French and English), results were found to vary across dictionaries. For the relationship between dictionary size and transcription accuracy, we find that performance grows monotonically as dictionary size grows. However, the performance gain decelerates (tends to saturate) as the dictionary increases in size; the relation can simply be described by a logarithmic regression, one parameter of which (α) can be taken as quantifying the depth of orthography of a language. We find that α for a language is significantly correlated with transcription performance on a small dictionary (approximately 10,000 words) for that language, but less so for asymptotic performance. This may be because our measure of asymptotic performance is unreliable, being extrapolated from the fitted logarithmic regression.
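As a rough illustration of the logarithmic regression mentioned in this abstract, the sketch below fits accuracy = slope · ln(size) + intercept by ordinary least squares. The functional form and the parameter names `slope` and `intercept` are assumptions: the abstract does not state the exact equation or say which parameter is α. The data points are synthetic, generated from a known curve, and are not results from the paper.

```python
import math

def fit_log_regression(sizes, accuracies):
    """Least-squares fit of accuracy = slope * ln(size) + intercept."""
    xs = [math.log(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic dictionary sizes and accuracies (illustrative only),
# generated from accuracy = 5 * ln(size) + 20.
sizes = [1_000, 5_000, 10_000, 50_000, 100_000]
accuracies = [5.0 * math.log(s) + 20.0 for s in sizes]

slope, intercept = fit_log_regression(sizes, accuracies)
```

Because the fitted curve keeps rising with ln(size), any asymptotic value read from it is an extrapolation, which is consistent with the authors' caution about that measure.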


2020 ◽  
Vol 10 (12) ◽  
pp. 389
Author(s):  
Elfrieda H. Hiebert ◽  
Yukie Toyama ◽  
Robin Irey

This study describes the features of words known and unknown by first graders of different proficiency levels in six instances of an oral reading fluency assessment: three in winter and three in spring. A sample of 411 students was placed into four groups (very high, high, middle, and low) based on their median correct words per minute in spring. Each word in the assessment was coded on 11 features: numbers of phonemes, letters, syllables, blends, morphemes, percentages of multisyllabic and of morphologically complex words, concreteness, age of acquisition, decodability, and U function. Words were classified as known if more than 50% of the students within a group were able to correctly read those words. Features of known and unknown words were contrasted for all but the highest group, which made no errors, at each point in time. An analysis of the patterns of known words across groups from winter to spring shows that students followed a similar general progression in the number and type of words recognized. The most prominent feature of unknown words in winter and spring for the middle group of students was the presence of multiple syllables. The lowest-performing group of students continued to be limited by word length and frequency in their recognition of words, but on both features, their proficiency increased from winter to spring. The discussion addresses several critical issues, most notably the relationship of words in oral reading assessments to the word recognition curriculum of many beginning reading programs.
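The classification rule stated in this abstract (a word counts as known when more than 50% of the students in a group read it correctly) can be sketched as follows. The function name, the data layout, and the example responses are hypothetical and do not come from the study.

```python
def classify_words(responses, threshold=0.5):
    """Label each word 'known' if the share of correct readings exceeds the threshold.

    responses maps a word to a list of booleans, one per student in the group
    (True means the student read the word correctly).
    """
    labels = {}
    for word, results in responses.items():
        share_correct = sum(results) / len(results)
        labels[word] = "known" if share_correct > threshold else "unknown"
    return labels

# Invented responses for one group, for illustration only.
group_responses = {
    "cat": [True, True, False],                 # 2 of 3 correct
    "because": [True, False, False, False],     # 1 of 4 correct
    "jump": [True, False],                      # exactly half correct
}
labels = classify_words(group_responses)
```

The comparison is strict, so a word read correctly by exactly half of a group is labeled unknown, matching the "more than 50%" wording.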


Diagnostica ◽  
2019 ◽  
Vol 65 (4) ◽  
pp. 193-204
Author(s):  
Johannes Baltasar Hessler ◽  
David Brieber ◽  
Johanna Egle ◽  
Georg Mandler ◽  
Thomas Jahn

Abstract. The Auditive Wortlisten Lerntest (AWLT; Auditory Wordlist Learning Test) is part of the test set Cognitive Functions Dementia (CFD) within the Vienna Test System (Wiener Testsystem, WTS). The AWLT was developed according to neurolinguistic criteria in order to reduce interactions between the cognitive status of test takers and the linguistic properties of the learning list. Using a sample of healthy participants (N = 44) and patients with Alzheimer's dementia (N = 44) matched for age, education, and gender, repeated-measures ANOVAs were used to examine the extent to which this design goal was achieved. In addition, the ability of the AWLT's main variables to discriminate between these groups was examined. Interactions with small effect sizes occurred between linguistic properties and diagnosis. The main variables separated patients from healthy participants with large effect sizes. The AWLT appears to be linguistically fairer than similar instruments while offering comparable differential validity.


Author(s):  
Angela A. Manginelli ◽  
Franziska Geringswald ◽  
Stefan Pollmann

When distractor configurations are repeated over time, visual search becomes more efficient, even if participants are unaware of the repetition. This contextual cueing is a form of incidental, implicit learning. One might therefore expect that contextual cueing does not (or only minimally) rely on working memory resources. This, however, is debated in the literature. We investigated contextual cueing under either a visuospatial or a nonspatial (color) visual working memory load. We found that contextual cueing was disrupted by the concurrent visuospatial, but not by the color working memory load. A control experiment ruled out that unspecific attentional factors of the dual-task situation disrupted contextual cueing. Visuospatial working memory may be needed to match current display items with long-term memory traces of previously learned displays.


Author(s):  
Wim De Neys ◽  
Niki Verschueren

Abstract. The Monty Hall Dilemma (MHD) is an intriguing example of the discrepancy between people’s intuitions and normative reasoning. This study examines whether the notorious difficulty of the MHD is associated with limitations in working memory resources. Experiments 1 and 2 examined the link between MHD reasoning and working memory capacity. Experiment 3 tested the role of working memory experimentally by burdening the executive resources with a secondary task. Results showed that participants who solved the MHD correctly had a significantly higher working memory capacity than erroneous responders. Correct responding also decreased under secondary task load. Findings indicate that working memory capacity plays a key role in overcoming salient intuitions and selecting the correct switching response during MHD reasoning.
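For readers unfamiliar with why switching is the normative response, the short enumeration below works through the standard three-door version of the problem. It assumes the usual rules, which the abstract does not spell out: the car is equally likely to be behind any door, and the host always opens a goat door other than the contestant's pick before offering the switch.

```python
from fractions import Fraction
from itertools import product

def monty_hall_win_probabilities():
    """Enumerate all (car position, first pick) pairs for the three-door game."""
    stay_wins = 0
    switch_wins = 0
    total = 0
    for car, pick in product(range(3), repeat=2):
        total += 1
        if pick == car:
            # The first pick was right, so staying wins and switching loses.
            stay_wins += 1
        else:
            # The host must open the only other goat door,
            # so the remaining closed door hides the car.
            switch_wins += 1
    return Fraction(stay_wins, total), Fraction(switch_wins, total)

p_stay, p_switch = monty_hall_win_probabilities()
```

Under these assumptions, staying wins in 3 of the 9 equally likely cases and switching wins in the other 6, so switching doubles the chance of winning.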

