Memory-based Morphological Analysis and Part-of-speech Tagging of Arabic

Author(s): Antal van den Bosch, Erwin Marsi, Abdelhadi Soudi

2017, Vol 68 (2), pp. 396-403
Author(s): Hana Žižková

Abstract: Compound adverbs pose an interesting problem for Automatic Morphological Analysis (AMA). In Czech, compound adverbs are formed by compounding existing words of different parts of speech without any change in their form, so a characteristic sign of a compound adverb is that it can always be decomposed again. Compound adverbs may be written as one word, but a multiword form sometimes coexists. A word that originally belonged to a different part of speech gains an adverbial meaning and becomes an adverb. This article presents the results of a corpus probe aimed at mapping expressions that are demonstrably compound adverbs but were either not recognized by AMA or incorrectly tagged by AMA as another part of speech. Analysis of data obtained from the Czech National Corpus (ČNK) SYN v3 shows that the unrecognized and incorrectly tagged units can be divided into several groups. Knowledge of these groups makes it possible to refine part-of-speech tagging by AMA. The corpus probe examined units written in accordance with the current codification as well as substandard units.


Author(s): Nindian Puspa Dewi, Ubaidi Ubaidi

POS tagging is the foundation for developing text processing for a language. In this study we examine how using a lexicon and accounting for morphological changes in words affect the choice of the correct tag for a word. Rules based on word morphology, such as prefixes, suffixes, and infixes, are commonly called lexical rules. This study applies lexical rules produced by a learner using the Brill Tagger algorithm. Madurese is a regional language spoken on Madura Island and several other islands in East Java. The object of this study is Madurese, which has far more affixation variants than Indonesian. In this study, the lexicon is used not only to look up Madurese base words but also as one of the stages of POS tagging. Experiments with the lexicon reached an accuracy of 86.61%, whereas without the lexicon accuracy reached only 28.95%. From this we conclude that the lexicon has a strong effect on POS tagging.
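
The interplay between a lexicon and affix-based lexical rules described in this abstract can be illustrated with a minimal sketch. It is not the authors' system and it is not Brill's transformation-based learner: the rules here are written by hand, whereas a Brill-style learner would induce them from annotated text. The lexicon entries, the tag names (NN, VB, JJ), the affix rules, and the function names are all assumptions made for illustration, and the toy words are Indonesian stand-ins rather than Madurese data from the study.

```python
# Toy lexicon and affix rules: illustrative assumptions, not data from the study.
LEXICON = {"buku": "NN", "makan": "VB", "besar": "JJ"}

# (kind, affix, tag) triples, tried in order. Hand-written here; a Brill-style
# learner would induce comparable lexical rules from a tagged corpus.
LEXICAL_RULES = [
    ("prefix", "ber", "VB"),
    ("suffix", "an", "NN"),
]

DEFAULT_TAG = "NN"  # fallback when neither the lexicon nor any rule applies


def tag_word(word, lexicon=LEXICON, rules=LEXICAL_RULES):
    """Tag one word: lexicon lookup first, then affix rules, then the default."""
    token = word.lower()
    if token in lexicon:
        return lexicon[token]
    for kind, affix, tag in rules:
        # Require some stem to remain so a bare affix is not matched.
        if len(token) <= len(affix):
            continue
        if kind == "prefix" and token.startswith(affix):
            return tag
        if kind == "suffix" and token.endswith(affix):
            return tag
    return DEFAULT_TAG


def tag_sentence(words, lexicon=LEXICON, rules=LEXICAL_RULES):
    """Tag a pre-tokenized sentence, returning (word, tag) pairs."""
    return [(w, tag_word(w, lexicon, rules)) for w in words]
```

In this sketch the word "makan" receives the verb tag from the lexicon, but with an empty lexicon the suffix rule for "-an" would label it a noun. That is one small way to see why removing the lexicon can lower tagging accuracy, although the toy setup says nothing about the size of the effect reported in the abstract.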

