Part of Speech and Gramset Tagging Algorithms for Unknown Words Based on Morphological Dictionaries of the Veps and Karelian Languages

Author(s):  
Andrew Krizhanovsky ◽  
Natalia Krizhanovskaya ◽  
Irina Novak
1996 ◽  
Vol 2 (2) ◽  
pp. 111-136 ◽  
Author(s):  
ANDREI MIKHEEV

Words unknown to the lexicon present a substantial problem to part-of-speech tagging. In this paper we present a technique for fully unsupervised acquisition of rules which guess possible parts of speech for unknown words. This technique does not require specially prepared training data, and uses instead the lexicon supplied with a tagger and word frequencies collected from a raw corpus. Three complementary sets of word-guessing rules are statistically induced: prefix morphological rules, suffix morphological rules and ending-guessing rules. The acquisition process is closely tied to a guessing-rule evaluation methodology dedicated solely to the performance of part-of-speech guessers. Using the proposed technique, a guessing-rule induction experiment was performed on the Brown Corpus data, and the resulting rule-sets, which showed highly competitive performance, were compared with the state of the art. To evaluate the impact of the word-guessing component on overall tagging performance, it was integrated into a stochastic and a rule-based tagger and applied to texts with unknown words.
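
The idea of ending-guessing rules can be illustrated with a minimal Python sketch. It is not the procedure from the paper: it only counts which tags co-occur with each word ending in a lexicon and omits the statistical scoring and raw-corpus frequencies the abstract mentions. The lexicon format, the function names `induce_ending_rules` and `guess_tags`, and the parameters `max_len` and `min_count` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def induce_ending_rules(lexicon, max_len=3, min_count=2):
    """Map each word ending (1..max_len chars) to counts of tags seen with it.

    lexicon: dict mapping word -> set of POS tags (hypothetical format).
    """
    counts = defaultdict(Counter)
    for word, tags in lexicon.items():
        for n in range(1, max_len + 1):
            if len(word) > n:  # keep at least one character outside the ending
                for tag in tags:
                    counts[word[-n:]][tag] += 1
    # Keep only endings backed by at least min_count (word, tag) observations.
    return {end: c for end, c in counts.items() if sum(c.values()) >= min_count}

def guess_tags(word, rules, max_len=3):
    """Return candidate tags from the longest matching ending, most frequent first."""
    for n in range(min(max_len, len(word) - 1), 0, -1):
        ending = word[-n:]
        if ending in rules:
            return [tag for tag, _ in rules[ending].most_common()]
    return []  # no rule applies

# Toy lexicon, invented for this example only.
toy_lexicon = {
    "walking": {"VBG"}, "talking": {"VBG"}, "building": {"VBG", "NN"},
    "quickly": {"RB"}, "slowly": {"RB"},
}
rules = induce_ending_rules(toy_lexicon)
print(guess_tags("blorping", rules))  # candidate tags for an unseen word
```

Preferring the longest matching ending is one simple design choice; a real guesser would rank rules by an estimated accuracy score instead.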


Author(s):  
Tobias Schnabel ◽  
Hinrich Schütze

We present FLORS, a new part-of-speech tagger for domain adaptation. FLORS uses robust representations that work especially well for unknown words and for known words with unseen tags. FLORS is simpler and faster than previous domain adaptation methods, yet it has significantly better accuracy than several baselines.


1997 ◽  
Vol 29 (4) ◽  
pp. 531-553 ◽  
Author(s):  
Paula J. Schwanenflugel ◽  
Steven A. Stahl ◽  
Elisabeth L. McFalls

The experiment investigated the development of vocabulary knowledge in elementary school children as a function of story reading for partially known and unknown words. Fourth graders participated in a vocabulary checklist in which they provided definitions or sentences for words they knew (known words) and checked off words they did not know the meaning of but were familiar with (partially known words). Children then read stories containing some of these words. The remaining words served as a control. Vocabulary growth was small but even for both partially known and unknown words. However, the characteristics of the words being learned themselves (particularly, part of speech and concreteness) were more important in determining this growth than aspects of the texts.


2020 ◽  
Vol 2 (2) ◽  
pp. 71-83
Author(s):  
Mohammad Mursyit ◽  
Aji Prasetya Wibawa ◽  
Ilham Ari Elbaith Zaeni ◽  
Harits Ar Rosyid

Part of Speech Tagging, or POS Tagging, is the process of automatically assigning a label to each word in a sentence. This study uses the Hidden Markov Model (HMM) algorithm for POS Tagging and handles unknown words with the Most Probable POS-Tag. The dataset consists of 10 short stories in Javanese, totaling 10,180 words annotated with a Javanese tagset. POS Tagging is run under two scenarios. The first uses the HMM algorithm with no treatment for unknown words. The second uses HMM together with the Most Probable POS-Tag for unknown words. The first scenario yields an accuracy of 45.5% and the second an accuracy of 70.78%. The Most Probable POS-Tag improves POS Tagging accuracy but does not always produce correct labels. It also removes zero-valued probabilities from the HMM. The results indicate that POS Tagging with the Hidden Markov Model is influenced by the treatment of unknown words, by the vocabulary, and by the relationships between word labels in the dataset.
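
The zero-probability problem and the fallback can be sketched in Python. The abstract does not define the Most Probable POS-Tag precisely, so this sketch assumes it means treating an out-of-vocabulary word as if it were emitted by the most frequent tag in the training data. The function names, the fallback values 1.0 and 0.0, and the placeholder tokens are assumptions, not details taken from the study.

```python
from collections import Counter, defaultdict

def train_emissions(tagged_words):
    """Count tag frequencies and per-tag word frequencies from (word, tag) pairs."""
    tag_counts = Counter()
    emit_counts = defaultdict(Counter)
    vocab = set()
    for word, tag in tagged_words:
        tag_counts[tag] += 1
        emit_counts[tag][word] += 1
        vocab.add(word)
    return tag_counts, emit_counts, vocab

def emission_prob(word, tag, tag_counts, emit_counts, vocab):
    """P(word | tag) for a tag seen in training, with an assumed unknown-word fallback."""
    if word in vocab:
        return emit_counts[tag][word] / tag_counts[tag]
    # Assumption: an unknown word is attributed to the most frequent training tag,
    # so at least one tag keeps a non-zero emission probability.
    most_probable_tag = tag_counts.most_common(1)[0][0]
    return 1.0 if tag == most_probable_tag else 0.0

# Placeholder tokens and tags, invented for this example only.
data = [("w1", "NN"), ("w2", "NN"), ("w3", "VB")]
tag_counts, emit_counts, vocab = train_emissions(data)
print(emission_prob("w1", "NN", tag_counts, emit_counts, vocab))
print(emission_prob("unseen", "NN", tag_counts, emit_counts, vocab))
```

Without the fallback, every tag would give an unknown word an emission probability of zero, and any tag path through that word would score zero in Viterbi decoding.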


2019 ◽  
Author(s):  
Zied Baklouti

Natural Language Processing (NLP) is a branch of machine learning that gives machines the ability to decode human languages. Part-of-speech tagging (POS tagging) is a preprocessing task that requires an annotated corpus. Rule-based and stochastic methods have shown strong results for POS tag prediction. In this work, I built a mathematical model based on Hidden Markov structures and obtained high accuracy in extracting ingredients from recipe text, outperforming traditional methods that do not account for unknown words.

