Indonesian Stemmer for Ambiguous Word based on Context

Author(s):  
Bunyamin
Arief Fatchul Huda
Arie Ardiyanti Suryani
Author(s):  
Christine Chiarello ◽  
Kim Cannon
Lorie Richards
Lisa Maxfield

2012
Vol 2012
pp. 1-8
Author(s):  
Anis Zouaghi ◽  
Mounir Zrigui
Georges Antoniadis
Laroussi Merhbene

We propose a new approach for determining the appropriate sense of Arabic words. To do so, we propose an algorithm based on information retrieval measures that identifies the context of use closest to the sentence containing the word to be disambiguated. The contexts of use are sets of sentences, each indicating a particular sense of the ambiguous word. These contexts are generated using the words that define the senses of the ambiguous word, an exact string-matching algorithm, and the corpus. We use information retrieval measures (Harman, Croft, and Okapi) combined with the Lesk algorithm to select the correct sense from among the candidate senses.
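The abstract does not give the scoring formulas, so what follows is only a minimal Python sketch of the general idea under stated assumptions. Each candidate sense is assumed to come with a definition and a few context-of-use sentences. The context sentences are ranked against the target sentence with Okapi BM25, using the common variant with `+1` inside the IDF logarithm so that scores stay non-negative, and a simplified Lesk overlap with the sense definition is added to each sense's best BM25 score. The Harman and Croft measures are omitted, the additive combination is an assumption, and all function names and the toy English "bank" data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real Arabic text would first need
    # normalization and light stemming (assumption, not shown here).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized doc in `docs` for a tokenized `query`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of docs containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in set(query):
            if tf[term] == 0:
                continue
            # "+1" inside the log keeps IDF positive even for very common terms.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores


def lesk_overlap(sentence_tokens, definition_tokens):
    # Simplified Lesk: count of distinct words shared with the sense definition.
    return len(set(sentence_tokens) & set(definition_tokens))


def disambiguate(sentence, senses):
    """Pick a sense; `senses` maps sense_id -> (definition, [context sentences])."""
    query = tokenize(sentence)
    # Pool all context sentences so IDF is computed over the whole collection.
    labels, docs = [], []
    for sense_id, (_, contexts) in senses.items():
        for ctx in contexts:
            labels.append(sense_id)
            docs.append(tokenize(ctx))
    # Keep the closest context of use (highest BM25 score) for each sense.
    best = {}
    for sense_id, s in zip(labels, bm25_scores(query, docs)):
        best[sense_id] = max(best.get(sense_id, 0.0), s)

    def total(sense_id):
        definition = tokenize(senses[sense_id][0])
        return best.get(sense_id, 0.0) + lesk_overlap(query, definition)

    return max(senses, key=total)


# Hypothetical toy data: two senses of English "bank".
senses = {
    "bank/finance": (
        "institution that holds money deposits",
        ["she deposited money at the bank", "the bank approved the loan"],
    ),
    "bank/river": (
        "sloping land beside a river",
        ["they fished from the river bank", "grass grew on the bank of the stream"],
    ),
}

print(disambiguate("he opened an account to deposit money at the bank", senses))
# → bank/finance
```

Taking the maximum over a sense's context sentences mirrors the idea of finding the single closest context of use; summing over them would instead favor senses that simply have more contexts.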


2021
pp. 174702182199000
Author(s):  
Pilar Ferré
Juan Haro
Daniel Huete-Pérez
Isabel Fraga

There is substantial evidence that affectively charged words (e.g., party or gun) are processed differently from neutral words (e.g., pen), although there are also inconsistent findings in the field. Some lexical or semantic variables might explain such inconsistencies, due to the possible modulation of affective word processing by these variables. The aim of the present study was to examine the extent to which affective word processing is modulated by semantic ambiguity. We conducted a large lexical decision study including semantically ambiguous words (e.g., cataract) and semantically unambiguous words (e.g., terrorism), analysing the extent to which reaction times (RTs) were influenced by their affective properties. The findings revealed a valence effect in which positive valence made RTs faster, whereas negative valence slowed them. The valence effect diminished as the semantic ambiguity of words increased. This decrease did not affect all ambiguous words, but was observed mainly in ambiguous words with incongruent affective meanings. These results highlight the need to consider the affective properties of the distinct meanings of ambiguous words in research on affective word processing.


2020
Vol 10 (11)
pp. 3904
Author(s):  
Van-Hai Vu
Quang-Phuoc Nguyen
Joon-Choul Shin
Cheol-Young Ock

Machine translation (MT) has recently attracted much research on various advanced techniques (e.g., statistical and deep learning-based methods) and has achieved strong results for widely used languages. However, MT research involving low-resource languages such as Korean often suffers from a lack of openly available bilingual language resources. In this research, we built extensive, openly available parallel corpora for training MT models, named the Ulsan parallel corpora (UPC). Currently, UPC contains two parallel corpora: a Korean-English dataset with over 969 thousand sentence pairs and a Korean-Vietnamese dataset with over 412 thousand sentence pairs. Furthermore, the high rate of homographs in Korean causes word ambiguity problems in MT. To address this problem, we developed a word-sense annotation system, named UTagger, based on a combination of sub-word conditional probability and knowledge-based methods. We applied UTagger to UPC and used these corpora to train both statistical MT and deep learning-based neural MT systems. The experimental results demonstrated that high-quality MT systems, as measured by Bilingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) scores, can be built using UPC. Both UPC and UTagger are freely available for download and use.
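The abstract does not describe UTagger's model, so the following is only a hypothetical sketch of the "conditional probability" half of such a system. It assumes a sense-tagged corpus in which each homograph occurrence is paired with its surrounding sub-word (morpheme) tokens, and it picks the sense maximizing P(sense) times the product of P(token | sense), a naive-Bayes-style rule with add-alpha smoothing. The knowledge-based component is not modeled. The class name, sense labels, and Korean training examples (배 as "ship" vs. "pear") are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


class HomographTagger:
    """Naive-Bayes-style sense chooser: argmax_s P(s) * prod_t P(t | s).

    Hypothetical illustration only; not UTagger's actual model.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                         # add-alpha smoothing constant
        self.sense_counts = Counter()              # occurrences of each sense
        self.token_counts = defaultdict(Counter)   # context-token counts per sense
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (sense_label, [context sub-word tokens])."""
        for sense, tokens in examples:
            self.sense_counts[sense] += 1
            self.token_counts[sense].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_score(self, sense, tokens):
        total = sum(self.sense_counts.values())
        score = math.log(self.sense_counts[sense] / total)  # log prior P(sense)
        denom = sum(self.token_counts[sense].values()) + self.alpha * len(self.vocab)
        for t in tokens:
            # Smoothed conditional probability P(t | sense).
            score += math.log((self.token_counts[sense][t] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.sense_counts, key=lambda s: self.log_score(s, tokens))


# Hypothetical sense-tagged contexts for the homograph 배 (ship / pear).
tagger = HomographTagger().fit([
    ("배/ship", ["바다", "항구", "타"]),   # sea, port, ride
    ("배/ship", ["선장", "바다"]),         # captain, sea
    ("배/pear", ["과일", "먹"]),           # fruit, eat
    ("배/pear", ["과일", "달"]),           # fruit, sweet
])

print(tagger.predict(["바다", "타"]))  # → 배/ship
print(tagger.predict(["과일", "먹"]))  # → 배/pear
```

Working in log space avoids numerical underflow when a context contains many tokens, and smoothing keeps an unseen context token from zeroing out a sense.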


2021
Vol 11 (1)
pp. 9-24
Author(s):  
Saki Amano

This paper examines the term futsūgo (common language) over two periods. The first period (1880s-1894) was concerned with education, but it aimed to establish an everyday, commonplace language and script familiar to the populace. By the 1890s, however, the policy of Europeanization was being reconsidered, and national consciousness was on the rise. In the second period (1894-early 1900s), which began with the Sino-Japanese War, rising national consciousness, directed at strengthening both the literary and the military arts, brought a desire to establish an artificially unified language with artificial rules that would unify the populace and the nation. The natural shift from the populace's everyday, commonplace language to a unified national language became possible through linguistic logic, that is, through the mediation of terminology, as seen in the single (but ambiguous) word futsūgo.


2016
Vol 38 (2)
pp. 457-475
Author(s):  
Juan Haro
Pilar Ferré
Roger Boada
Josep Demestre

This study presents semantic ambiguity norms for 530 Spanish words. Two subjective measures of semantic ambiguity and two subjective measures of the relatedness of ambiguous word meanings were collected. In addition, two objective measures of semantic ambiguity were included. Subjective ratings were also obtained for relevant lexicosemantic variables such as concreteness, familiarity, emotional valence, arousal, and age of acquisition. In sum, the database overcomes several limitations of the published databases of Spanish ambiguous words: in particular, the scarcity of ambiguity measures, the lack of measures of relatedness between the meanings of ambiguous words, and the absence of a set of unambiguous words. It should therefore be very helpful for researchers interested in exploring semantic ambiguity, as well as for those using semantically ambiguous words to study language processing in clinical populations.

