lexical disambiguation
Recently Published Documents


Total documents: 50 (five years: 7)
H-index: 8 (five years: 0)

Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2938
Author(s):  
Minho Kim ◽  
Hyuk-Chul Kwon

Supervised disambiguation using a large amount of corpus data delivers better performance than other word sense disambiguation methods. However, large-scale, sense-tagged corpora are hard to construct because doing so is costly and time-consuming. Unsupervised disambiguation, by contrast, is relatively easy to implement, although most such efforts have not performed satisfactorily. A primary reason for the performance degradation of unsupervised disambiguation is that the semantic occurrence probability of ambiguous words is not available. Hence, a data deficiency problem occurs when determining the dependency between words. This paper proposes an unsupervised disambiguation method that uses a prior probability estimate based on the Korean WordNet. The proposed method performs better than supervised disambiguation. In the Korean WordNet, all words have semantic characteristics similar to those of their related words. Thus, it is assumed that the dependency between words is the same as the dependency between their related words. This resolves the data deficiency problem: the dependency between words is determined by calculating the χ² statistic between related words. Moreover, to obtain the same effect as using the semantic occurrence probability as the prior probability, as is done in supervised disambiguation, semantically related words of the ambiguous vocabulary are obtained and used as prior probability data. An experiment was conducted with Korean, English, and Chinese to evaluate the performance of the proposed lexical disambiguation method. The proposed method outperformed supervised disambiguation methods even though it is based on unsupervised disambiguation (using a knowledge-based approach).
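
The abstract above does not spell out how the χ² statistic is computed, so the following is only a minimal sketch of the standard Pearson χ² test of independence on a 2x2 co-occurrence table, which is one common way to score the dependency between two words. The function names, the way counts are collected, and the example numbers are all assumptions made for illustration; they are not taken from the paper.

```python
def chi_square_2x2(o11: int, o12: int, o21: int, o22: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table.

    o11: contexts containing both word A and word B
    o12: contexts containing A but not B
    o21: contexts containing B but not A
    o22: contexts containing neither word
    """
    n = o11 + o12 + o21 + o22
    row1, row2 = o11 + o12, o21 + o22
    col1, col2 = o11 + o21, o12 + o22
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        # A zero marginal means the statistic is undefined; treat as no evidence.
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


def chi_square_from_counts(both: int, a_total: int, b_total: int, n_contexts: int) -> float:
    """Build the 2x2 table from raw counts (hypothetical helper).

    both:       number of contexts in which A and B co-occur
    a_total:    number of contexts containing A
    b_total:    number of contexts containing B
    n_contexts: total number of contexts in the corpus
    """
    o11 = both
    o12 = a_total - both
    o21 = b_total - both
    o22 = n_contexts - a_total - b_total + both
    return chi_square_2x2(o11, o12, o21, o22)


if __name__ == "__main__":
    # Invented counts: two words that always appear together.
    print(chi_square_from_counts(both=20, a_total=20, b_total=20, n_contexts=40))
    # Invented counts: two words that co-occur exactly as often as chance predicts.
    print(chi_square_2x2(10, 10, 10, 10))
```

A larger value indicates a stronger departure from independence. Under the assumption described in the abstract, such a score would be computed between the related words of a word pair (for example, WordNet neighbors) rather than between the sparse pair itself.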


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0259987
Author(s):  
Ehab W. Hermena ◽  
Sana Bouamama ◽  
Simon P. Liversedge ◽  
Denis Drieghe

In Arabic, a predominantly consonantal script that features a high incidence of lexical ambiguity (heterophonic homographs), glyph-like marks called diacritics supply vowel information that clarifies how each consonant should be pronounced, and thereby disambiguate the pronunciation of consonantal strings. Diacritics are typically omitted from print except in situations where a particular homograph is not sufficiently disambiguated by the surrounding context. In three experiments we investigated whether the presence of disambiguating diacritics on target homographs modulates word frequency, length, and predictability effects during reading. In all experiments, the subordinate representation of the target homographs was instantiated by the diacritics (in the diacritized conditions), and by the context subsequent to the target homographs. The results replicated the effects of word frequency (Experiment 1), word length (Experiment 2), and predictability (Experiment 3). However, there was no evidence that diacritics-based disambiguation modulated these effects in the current study. Rather, diacritized targets in all experiments attracted longer first pass and later (go past and/or total fixation count) processing. These costs are suggested to be a manifestation of the subordinate bias effect. Furthermore, in all experiments, the diacritics-based disambiguation facilitated later sentence processing, relative to when the diacritics were absent. The reported findings expand existing knowledge about processing of diacritics, their contribution towards lexical ambiguity resolution, and sentence processing.


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248170
Author(s):  
Michael C. W. Yip

The present study examined whether working memory contributes to the lexical disambiguation process through an activation mechanism or an inhibition mechanism. We recruited sixty native Cantonese listeners to participate in two experimental tasks: (a) a Cantonese-version reading span task to measure their working memory (WM) capacity and (b) a standard cross-modal priming task to measure lexical disambiguation time. The results revealed that (1) the mechanism underlying the disambiguation process seemed to favor an inhibition account and (2) the frequency of the individual meanings of the ambiguous words and the number of their meanings might interact with WM capacity during lexical access, particularly for the low-WM span group.


2020 ◽  
Author(s):  
Timo Benjamin Roettger ◽  
Michael Franke ◽  
Jennifer Cole

Real-time speech comprehension is challenging because communicatively relevant information is distributed throughout the entire utterance. In five mouse-tracking experiments on German and American English, we probe whether listeners, in principle, use early intonational information to anticipate upcoming referents. Listeners had to select a speaker-intended referent with their mouse, guided by intonational cues, which allowed them to anticipate by moving their hand toward the referent prior to lexical disambiguation. While German listeners (Exps. 1-3) seemed to ignore early pitch cues, American English listeners (Exps. 4-5) were in principle able to use these early pitch cues to anticipate upcoming referents. However, many listeners showed no indication of doing so. These results suggest that there are important positional asymmetries in the way intonational information is integrated, with early information receiving less attention than later cues in the utterance. Open data, scripts, and materials can be retrieved here: https://osf.io/xf8be/.


Author(s):  
Dan Tufiș ◽  
Radu Ion

One of the fundamental tasks in natural-language processing is the morpho-lexical disambiguation of words occurring in text. Over the last twenty years or so, approaches to part-of-speech tagging based on machine learning techniques have been developed or ported to provide high-accuracy morpho-lexical annotation for an increasing number of languages. Due to recent increases in computing power, together with improvements in tagging technology and the extension of language typologies, part-of-speech tags have become significantly more complex. The need to address multilinguality more directly in the web environment has created a demand for interoperable, harmonized morpho-lexical descriptions across languages. Given the large number of morpho-lexical descriptors for a morphologically complex language, one has to consider ways to avoid the data sparseness threat in standard statistical tagging, yet ensure that full lexicon information is available for each word form in the output. The chapter overviews the current major approaches to part-of-speech tagging.


2017 ◽  
Author(s):  
Ignatius Ezeani ◽  
Mark Hepple ◽  
Ikechukwu Onyenwe

2016 ◽  
Vol 3 (2) ◽  
pp. 15-26
Author(s):  
Joanna Błaszczak ◽  
Dorota Klimek-Jankowska

This paper is a contribution to a long-standing debate between constructionist, lexicalist, and emergentist schools of thought on the question of what determines the category of lexically ambiguous words whose meanings belong to different syntactic categories (e.g., duck, walk). In the lexicalist view, part-of-speech information is stored in the mental lexicon. According to the syntax-first (or constructionist) view, the ambiguous word is assigned to the syntactic category NOUN or VERB solely on the basis of the morphosyntactic frame in which it occurs, irrespective of its meaning. In contrast, the emergentist view assumes an interaction of many constraints (semantic and syntactic) in which semantic constraints are weaker than syntactic constraints in the resolution of word-class ambiguities: semantic context only favors one of the meanings of an ambiguous word without excluding its competitors, whereas syntactic context supports one meaning by ruling out the alternative interpretation. We intend to provide an overview of recent psycholinguistic studies focusing on the processing of word-class ambiguities in order to show that the syntax-first approach is too restrictive while the emergentist view is too permissive. The key point seems to be that, when grammatical category-ambiguous words are processed, the constraints are not all available at the same time and in competition; rather, different sources of information can be predicted to affect lexical disambiguation at different stages of processing.

