Building Synonym Set for Indonesian WordNet using Commutative Method and Hierarchical Clustering

2020 ◽  
Vol 4 (3) ◽  
pp. 778
Author(s):  
Valentino Rossi Fierdaus ◽  
Moch Arif Bijaksana ◽  
Widi Astuti

WordNet is a collection of synonym sets (synsets), each consisting of words that share the same meaning. The development of Indonesian WordNet aims to build an application that can accommodate and exhibit the relations between words. A synonym set is a set of one or more words that have a similar meaning or a synonym relation, derived from the Indonesian Thesaurus. In previous studies, synsets were constructed with several approaches, one of which was clustering to produce synsets and WSD (Word Sense Disambiguation). This research aims to discover the semantic similarities between words in the Indonesian Thesaurus automatically, and to assess the performance of the Agglomerative Hierarchical Clustering method for the development of Indonesian synsets. Performance is evaluated using the F-measure against a gold standard.
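As a rough illustration of how agglomerative hierarchical clustering can group thesaurus entries into candidate synsets, the sketch below merges the two closest clusters repeatedly until no pair is close enough. It is not the authors' implementation: the toy English entries, the Jaccard distance, the average-linkage rule, and the `max_distance` cut-off are all assumptions made for the example.

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two word sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerative(entries, max_distance):
    """Average-linkage agglomerative clustering over thesaurus entries.

    entries maps a headword to the set of words listed with it.
    Merging stops when the closest pair of clusters is farther apart
    than max_distance (a hypothetical cut-off, not from the paper).
    """
    clusters = [[word] for word in sorted(entries)]

    def avg_dist(c1, c2):
        total = sum(jaccard_distance(entries[x], entries[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters; ties go to the lowest indices.
        dist, i, j = min(
            (avg_dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy entries standing in for thesaurus data.
toy_entries = {
    "big": {"big", "large", "huge"},
    "large": {"large", "big", "huge"},
    "huge": {"huge", "big", "large"},
    "small": {"small", "tiny"},
    "tiny": {"tiny", "small"},
}
candidate_synsets = agglomerative(toy_entries, max_distance=0.5)
```

On this toy input the procedure ends with two clusters, one for the "big" words and one for the "small" words; a real run would compare such clusters with gold-standard synsets.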

Author(s):  
Edoardo Barba ◽  
Luigi Procopio ◽  
Caterina Lacerra ◽  
Tommaso Pasini ◽  
Roberto Navigli

Recently, generative approaches have been used effectively to provide definitions of words in their context. However, the opposite, i.e., generating a usage example given one or more words along with their definitions, has not yet been investigated. In this work, we introduce the novel task of Exemplification Modeling (ExMod), along with a sequence-to-sequence architecture and a training procedure for it. Starting from a set of (word, definition) pairs, our approach is capable of automatically generating high-quality sentences which express the requested semantics. As a result, we can drive the creation of sense-tagged data which cover the full range of meanings in any inventory of interest, and their interactions within sentences. Human annotators agree that the sentences generated are as fluent and semantically-coherent with the input definitions as the sentences in manually-annotated corpora. Indeed, when employed as training data for Word Sense Disambiguation, our examples enable the current state of the art to be outperformed, and higher results to be achieved than when using gold-standard datasets only. We release the pretrained model, the dataset and the software at https://github.com/SapienzaNLP/exmod.


2021 ◽  
Vol 13 (6) ◽  
pp. 40-50
Author(s):  
Abdo Ababor Abafogi ◽  

Language is the main means of communication used by humans. The same word can carry different meanings depending on how it is used in a particular sentence, which is challenging for a computer to understand at a human level. Word Sense Disambiguation (WSD), which aims to identify the correct sense of a given ambiguous word, is a long-standing problem in natural language processing (NLP). Because the major aim of WSD is to accurately understand the sense of a word in a particular context, it can be used for the correct labeling of words in natural language applications. In this paper, I propose a normalized statistical algorithm that performs WSD for the Afaan Oromo language despite morphological analysis. The proposed algorithm can discriminate the senses of an ambiguous word without considering window size, without predefined rules, and without using an annotated dataset for training, which reduces a key challenge for under-resourced languages. The proposed system was tested on 249 sentences and evaluated with precision, recall, and F-measure. The overall effectiveness of the system is 80.76% in F-measure, which suggests that the proposed system is promising for Afaan Oromo, one of the under-resourced languages spoken in East Africa. The algorithm can be extended to semantic text similarity with little or no modification. Furthermore, the future directions put forward can improve the performance of the proposed algorithm.


2020 ◽  
Vol 9 (2) ◽  
pp. 165
Author(s):  
Mubaroq Iqbal ◽  
Moch. Arif Bijaksana ◽  
Widi Astuti

In the development of Indonesian WordNet, the synonym set is an important part that represents the similarity of meaning between words. Synonym sets are built using the Indonesian Thesaurus as the lexical database. Extraction from the Indonesian Thesaurus yields synonym sets whose words share a similar meaning or word sense. In general, WordNet and a dictionary differ in their main focus: a dictionary usually focuses on one word at a time, while WordNet focuses on the meaning of words and their connections with other words. As explained in previous research, synonym sets have been constructed using several approaches, including clustering to produce synonym sets and WSD (Word Sense Disambiguation). In this article, the approach used to produce synonym sets is the ROCK (Robust Clustering Using Links) algorithm, which uses similarity and link values. The resulting synonym sets will then be used for lexical database development. The main focus of this article is therefore to produce synonym sets through the clustering process and to measure their accuracy using the F-measure against a gold standard.
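To make the "similarity and link values" concrete: in ROCK, two items are neighbors when their similarity reaches a threshold, and the link value of a pair is the number of neighbors they have in common. The sketch below computes those link counts only, not the full ROCK merging procedure. The toy entries, the Jaccard similarity, the threshold `theta`, and the choice to count an item as its own neighbor are assumptions for illustration, not details taken from the article.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbors(items, theta):
    """Map each key to the keys whose similarity to it is at least theta.

    Each key counts as its own neighbor here (an assumption of this sketch).
    """
    return {
        p: {q for q in items if jaccard(items[p], items[q]) >= theta}
        for p in items
    }


def link_counts(items, theta):
    """Number of common neighbors for every unordered pair of keys."""
    nbrs = neighbors(items, theta)
    return {
        (p, q): len(nbrs[p] & nbrs[q])
        for p, q in combinations(sorted(items), 2)
    }


# Hypothetical toy entries standing in for thesaurus data.
toy_entries = {
    "big": {"big", "large", "huge"},
    "large": {"large", "big", "huge"},
    "huge": {"huge", "big", "large"},
    "small": {"small", "tiny"},
    "tiny": {"tiny", "small"},
}
links = link_counts(toy_entries, theta=0.5)
```

Pairs with many links (here "big" and "large") are good merge candidates, while pairs with no links (here "big" and "small") are kept apart even before any similarity is consulted directly.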


2015 ◽  
Vol 22 (5) ◽  
pp. 319-362
Author(s):  
Hiroyuki Shinnou ◽  
Masaki Murata ◽  
Kiyoaki Shirai ◽  
Fumiyo Fukumoto ◽  
Sanae Fujita ◽  
...  

2021 ◽  
Vol 10 (2) ◽  
pp. 145-151
Author(s):  
Nisrina Arintia Maghfiroh ◽  
Galih Wasis Wicaksono ◽  
Christian Sri Kusuma Aditya

Automatic news summarization is the task of extracting the essence of a news article without losing the important meaning it contains. Several methods can be used for automatic news summarization, one of which is the Lexical Chain method. This method performs well in text summarization by determining the highest chain. However, it has a weakness: it cannot identify ambiguous words in news sentences. To address this weakness, this study complements the Lexical Chain method with Word Sense Disambiguation to identify ambiguous words. The study uses 100 news articles about Covid-19 taken from the most popular online news portals. Summarization accuracy is tested using Recall-Oriented Understudy for Gisting Evaluation (ROUGE). Three evaluation measures are used: precision, recall, and F-measure. The evaluation yields an average precision of 0.62, recall of 0.20, and F-measure of 0.30.
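The three reported figures can be related through the usual balanced F-measure, the harmonic mean of precision and recall. The short check below assumes that definition (the abstract does not state which F variant was used) and applies it to the reported averages; since an average of per-document F-scores need not equal the F-score of the averages, this is only a plausibility check.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported averages from the abstract.
f1 = f_measure(0.62, 0.20)
print(round(f1, 2))
```

The low recall dominates the harmonic mean, which is why the F-measure sits much closer to 0.20 than to 0.62.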

