Gold standard datasets for evaluating word sense disambiguation programs

1998 ◽  
Vol 12 (4) ◽  
pp. 453-472 ◽  
Author(s):  
Adam Kilgarriff
Author(s):  
Edoardo Barba ◽  
Luigi Procopio ◽  
Caterina Lacerra ◽  
Tommaso Pasini ◽  
Roberto Navigli

Recently, generative approaches have been used effectively to provide definitions of words in their context. However, the opposite, i.e., generating a usage example given one or more words along with their definitions, has not yet been investigated. In this work, we introduce the novel task of Exemplification Modeling (ExMod), along with a sequence-to-sequence architecture and a training procedure for it. Starting from a set of (word, definition) pairs, our approach is capable of automatically generating high-quality sentences which express the requested semantics. As a result, we can drive the creation of sense-tagged data which cover the full range of meanings in any inventory of interest, and their interactions within sentences. Human annotators agree that the sentences generated are as fluent and semantically coherent with the input definitions as the sentences in manually annotated corpora. Indeed, when employed as training data for Word Sense Disambiguation, our examples enable the current state of the art to be outperformed, and higher results to be achieved than when using gold-standard datasets only. We release the pretrained model, the dataset and the software at https://github.com/SapienzaNLP/exmod.


2020 ◽  
Vol 4 (3) ◽  
pp. 778
Author(s):  
Valentino Rossi Fierdaus ◽  
Moch Arif Bijaksana ◽  
Widi Astuti

WordNet is a collection of synonym sets (synsets), each consisting of words that share the same meaning. The development of the Indonesian WordNet aims to build an application that can store and display the relations between words. A synonym set is a set of one or more words that have a similar meaning or a synonym relation, drawn from the Indonesian Thesaurus. In previous studies, synsets were built using several approaches, one of which used clustering to produce synsets and WSD (Word Sense Disambiguation). This research aims to discover the semantic similarities between words in the Indonesian Thesaurus automatically, and to assess the performance of the Agglomerative Hierarchical Clustering method for building Indonesian synsets. Performance is evaluated using the F-measure against the gold standard.
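
The abstract does not say how the F-measure is computed over synsets. As a rough illustration only, the sketch below assumes a pairwise scheme: two words count as synonyms if they share a synset, and precision and recall are taken over those word pairs. The function names and the toy Indonesian words are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def synonym_pairs(synsets):
    """Return the set of unordered word pairs that share at least one synset."""
    pairs = set()
    for synset in synsets:
        # Sort so each pair has a canonical order, e.g. ("agung", "besar").
        for a, b in combinations(sorted(set(synset)), 2):
            pairs.add((a, b))
    return pairs


def pairwise_f_measure(predicted, gold):
    """Compute (precision, recall, F1) over synonym pairs.

    Assumption: pairwise evaluation; the paper may use a different scheme.
    """
    pred_pairs = synonym_pairs(predicted)
    gold_pairs = synonym_pairs(gold)
    true_positives = len(pred_pairs & gold_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: gold-standard synsets versus clustered synsets.
gold = [["besar", "agung", "raya"], ["kecil", "mungil"]]
predicted = [["besar", "agung"], ["raya", "kecil", "mungil"]]

precision, recall, f1 = pairwise_f_measure(predicted, gold)
print(precision, recall, f1)
```

In this toy case the clustering places "raya" in the wrong synset, so two of the four predicted pairs match the gold standard and two of the four gold pairs are recovered.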


Author(s):  
Manuel Ladron de Guevara ◽  
Christopher George ◽  
Akshat Gupta ◽  
Daragh Byrne ◽  
Ramesh Krishnamurti

2017 ◽  
Vol 132 ◽  
pp. 47-61 ◽  
Author(s):  
Yoan Gutiérrez ◽  
Sonia Vázquez ◽  
Andrés Montoyo

2005 ◽  
Vol 12 (5) ◽  
pp. 554-565 ◽  
Author(s):  
Martijn J. Schuemie ◽  
Jan A. Kors ◽  
Barend Mons
