Sindhi Part of Speech Tagging System Using Wordnet

Author(s):  
Javed Ahmed Mahar ◽  
Ghulam Qadir Memon

Author(s):  
Nesreen Mohammad Alsharman ◽  
Inna V. Pivkina

This article describes a new method for generating extractive summaries directly through unigram and bigram extraction. The method uses selective part of speech tagging to extract significant unigrams and bigrams from a set of sentences. The extracted unigrams and bigrams, along with other features, are used to build the final summary. A new selective rule-based part of speech tagging system is developed that concentrates on the parts of speech most important for summarization: nouns, verbs, and adjectives. Other parts of speech, such as prepositions, articles, and adverbs, play a lesser role in determining the meaning of sentences; therefore, they are not considered when choosing significant unigrams and bigrams. The proposed method is tested on two problem domains: the citations and Opinosis data sets. Results show that the proposed method performs better than the TextRank, LexRank, and Edmundson summarization methods. The proposed method is general enough to summarize texts from any domain.
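The selective-tagging idea in the abstract above can be illustrated with a minimal sketch. It assumes a tiny hand-written lexicon (`TOY_LEXICON`, invented here) in place of the article's rule-based tagger, and it assumes that bigrams are formed from adjacent content words after filtering; the abstract does not specify either detail.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> coarse tag); not taken from the article.
TOY_LEXICON = {
    "method": "NOUN", "summary": "NOUN", "summaries": "NOUN",
    "generates": "VERB", "extracts": "VERB",
    "short": "ADJ", "extractive": "ADJ",
}
# Only these parts of speech are kept, as described in the abstract.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}


def content_words(sentence):
    """Return the tokens whose toy tag is noun, verb, or adjective."""
    tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
    return [t for t in tokens if TOY_LEXICON.get(t) in CONTENT_TAGS]


def count_ngrams(sentences):
    """Count unigrams and bigrams over the filtered content words."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = content_words(sentence)
        unigrams.update(words)
        # Assumption: a bigram is a pair of neighbouring content words.
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


sentences = [
    "The method generates short summaries.",
    "The method extracts a short summary.",
]
unigrams, bigrams = count_ngrams(sentences)
print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Function words such as "the" and "a" never enter the counts, so the most frequent items are content-bearing by construction. How the counts are combined with other features to rank sentences is left out, since the abstract does not describe it.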


Author(s):  
Muljono Muljono ◽  
Umriya Afini ◽  
Catur Supriyanto ◽  
Raden Arief Nugroho

Word processing tools are a basic need in learning a language. One of the tools a language learner needs is a part of speech (POS) tagger. While many POS tagging tools for the Indonesian language have been developed, none has been designed specifically for language learners. This paper presents a study on an Indonesian POS tagging system developed as a word processing tool for language learners. We use resources from previous Indonesian POS tagging research, such as MorphInd for morphological analysis and IPOSTagger for part of speech tagging. Objective and subjective tests are employed to evaluate the system. In the objective test, the tagging results of a system model developed from IPOSTagger in combination with MorphInd as the morphological analyzer are compared with the tagging results of the original IPOSTagger system model. The results show that the part of speech tagging accuracy of this system model is higher than that of the other models. For the subjective evaluation, a Mean Opinion Score (MOS) is collected from 24 participating respondents. The MOS results reach 3.61 for test-1, 3.87 for test-2, and 3.72 for test-3. From these results, we expect that this POS tagging system could help language learners in their self-study of Indonesian.


Author(s):  
Robert Östling

This work presents Stagger, a new open-source part of speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part of speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts, investigating its out-of-domain performance.
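For readers unfamiliar with the averaged perceptron named in the abstract above, the following is a minimal sketch of the general technique, not Stagger's implementation. The feature function, the toy English training pairs, and every name in the code are assumptions made for illustration; a real tagger would also use context features, a lexicon, and embeddings.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron whose final weights are averaged over all steps."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = defaultdict(float)  # (feature, tag) -> current weight
        self._totals = defaultdict(float)  # (feature, tag) -> sum of weight over steps
        self._stamps = defaultdict(int)    # (feature, tag) -> step of last change
        self._step = 0

    def predict(self, features):
        scores = {c: 0.0 for c in self.classes}
        for f in features:
            for c in self.classes:
                scores[c] += self.weights.get((f, c), 0.0)
        # Ties are broken by the sorted class order.
        return max(self.classes, key=lambda c: scores[c])

    def update(self, features, gold):
        self._step += 1
        guess = self.predict(features)
        if guess == gold:
            return
        for f in features:
            for tag, delta in ((gold, 1.0), (guess, -1.0)):
                key = (f, tag)
                # Credit the old weight for the steps during which it was in effect.
                self._totals[key] += (self._step - self._stamps[key]) * self.weights[key]
                self._stamps[key] = self._step
                self.weights[key] += delta

    def average(self):
        """Replace each weight by its average over all training steps."""
        for key, w in list(self.weights.items()):
            total = self._totals[key] + (self._step - self._stamps[key]) * w
            self.weights[key] = total / self._step


def features(word):
    # Hypothetical feature set: a bias term, the last two letters, the word itself.
    return ["bias", "suf=" + word[-2:], "w=" + word.lower()]


# Invented toy data, used only to exercise the code.
TOY_DATA = [("running", "VERB"), ("jumping", "VERB"),
            ("table", "NOUN"), ("apple", "NOUN")]


def train(data, epochs=5):
    model = AveragedPerceptron({tag for _, tag in data})
    for _ in range(epochs):
        for word, tag in data:
            model.update(features(word), tag)
    model.average()
    return model


model = train(TOY_DATA)
print(model.predict(features("singing")), model.predict(features("bottle")))
```

Averaging damps the effect of late, noisy updates, which is the usual reason the averaged variant is preferred over the plain perceptron. The timestamp bookkeeping lets the sketch compute the average without summing the full weight vector at every step.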


Author(s):  
Nindian Puspa Dewi ◽  
Ubaidi Ubaidi

POS tagging is the foundation for developing text processing for a language. In this study we examine how the use of a lexicon and morphological changes in words affect the choice of the correct tag for a word. Rules based on word morphology, such as prefixes, suffixes, and infixes, are commonly called lexical rules. This study applies lexical rules learned with the Brill Tagger algorithm. Madurese is a regional language spoken on Madura Island and several other islands in East Java. The object of this study is Madurese, which has far more affixation variants than Indonesian. In this study, the lexicon is used not only to look up Madurese base words but also as one of the stages of POS tagging. Experiments using the lexicon reach an accuracy of 86.61%, whereas without the lexicon accuracy reaches only 28.95%. From this it can be concluded that the lexicon has a strong influence on POS tagging.
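The lexicon-versus-rules comparison in the abstract above can be sketched as follows. This is a deliberate simplification: a Brill tagger learns transformation rules that rewrite an initial tag, whereas the sketch applies fixed affix rules as a fallback after lexicon lookup. The lexicon entries, affix rules, and gold pairs are invented English placeholders, not Madurese data, and the resulting scores say nothing about the paper's figures.

```python
# Hypothetical lexicon and affix rules; placeholders, not Madurese data.
LEXICON = {"book": "NOUN", "read": "VERB", "red": "ADJ"}
# Each rule is (kind, affix, tag); the first matching rule wins.
LEXICAL_RULES = [
    ("suffix", "ing", "VERB"),
    ("suffix", "ly", "ADV"),
    ("prefix", "un", "ADJ"),
]
DEFAULT_TAG = "NOUN"


def tag_word(word, use_lexicon=True):
    """Tag a word by lexicon lookup, then affix rules, then a default tag."""
    w = word.lower()
    if use_lexicon and w in LEXICON:
        return LEXICON[w]
    for kind, affix, tag in LEXICAL_RULES:
        if kind == "suffix" and w.endswith(affix):
            return tag
        if kind == "prefix" and w.startswith(affix):
            return tag
    return DEFAULT_TAG


def accuracy(gold_pairs, use_lexicon=True):
    """Fraction of (word, tag) pairs that tag_word reproduces."""
    correct = sum(tag_word(w, use_lexicon) == t for w, t in gold_pairs)
    return correct / len(gold_pairs)


GOLD = [("book", "NOUN"), ("read", "VERB"), ("red", "ADJ"), ("reading", "VERB")]
print(accuracy(GOLD, use_lexicon=True), accuracy(GOLD, use_lexicon=False))
```

Running both settings on the same gold pairs mirrors the shape of the experiment: words without a distinctive affix can only be tagged correctly when the lexicon supplies their tag.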

