A Template-Based Approach for Tagging Non-Vocalized Arabic Nouns

Hashem Alsharif;

doi:10.52132/ajrsp.e.2021.32.1

Academic Journal of Research and Scientific Publishing ◽

10.52132/ajrsp.e.2021.32.1 ◽

2021 ◽

Vol 3 (32) ◽

pp. 05-35

Author(s):

Hashem Alsharif ◽

Keyword(s):

Linear Part ◽

Arabic Language ◽

Arabic Text ◽

Rule Based ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Pos Tagger ◽

Log Linear ◽

Speech Tagging

There exist no corpora of Arabic nouns. Furthermore, nouns appear in many different forms in any Arabic text. Tagging the nouns in an Arabic text also makes it possible to determine whether each sentence starts with a noun or a verb. Part-of-Speech (POS) tagging is the task of labeling each word in a sentence with its appropriate category, called a tag (here Noun, Verb, or Article). In this thesis, we attempt to tag non-vocalized Arabic text. The proposed POS tagger for Arabic text searches for each word of the text in our lists of verbs and articles; nouns are found by eliminating verbs and articles. Our hypothesis is that a word in the text that is not found in our lists is a noun. The comparison is repeated for each word until all words have been tagged. To apply our method, we prepared a list of Arabic verbs and articles totaling 112 million entries, which we use in the comparisons to test our hypothesis. To evaluate the method, we used 78,245 pre-tagged words from "The Quranic Arabic Corpus" and compared our template-based tagging approach with AraMorph, a rule-based tagger, and with the Stanford Log-linear Part-Of-Speech Tagger. AraMorph tagged 40% of the words correctly and the Stanford Log-linear Part-Of-Speech Tagger 68%, while our method tagged 68,501 words correctly (88%).
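The elimination rule described in this abstract can be illustrated with a short sketch. It is not the author's implementation: the tiny transliterated word lists, the tag labels, and the function names `tag_by_elimination` and `accuracy` are hypothetical stand-ins for the much larger lists and the evaluation the abstract mentions.

```python
# Hypothetical, tiny closed lists (transliterated for readability).
# The abstract describes far larger lists of verbs and articles.
VERBS = {"kataba", "qala", "dhahaba"}
ARTICLES = {"fi", "min", "ila", "wa"}


def tag_by_elimination(words, verbs=VERBS, articles=ARTICLES):
    """Tag a word as Verb or Article if it is listed, otherwise as Noun."""
    tagged = []
    for word in words:
        if word in verbs:
            tag = "Verb"
        elif word in articles:
            tag = "Article"
        else:
            tag = "Noun"  # default reached by elimination
        tagged.append((word, tag))
    return tagged


def accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Assumed example sentence and gold tags, for illustration only.
sentence = ["dhahaba", "alwalad", "ila", "almadrasa"]
predicted = [tag for _, tag in tag_by_elimination(sentence)]
gold = ["Verb", "Noun", "Article", "Noun"]
score = accuracy(predicted, gold)
```

The same accuracy ratio applied to the reported counts, 68,501 correct out of 78,245, gives roughly 0.875, which matches the rounded 88% in the abstract.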

Punjabi Pos Tagger: Rule Based and HMM

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i7/0106 ◽

2017 ◽

Vol 7 (7) ◽

pp. 193

Author(s):

Umrinderpal Singh ◽

Vishal Goyal

Keyword(s):

Information Retrieval ◽

Language Processing ◽

State Of The Art ◽

Input Word ◽

Rule Based ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Unseen Data ◽

Pos Tagger ◽

Speech Tagging

A Part of Speech tagger assigns a tag to every input word in a given sentence. The tags cover the parts of speech of a particular language, such as noun, pronoun, verb, adjective, and conjunction, and each of these may have subcategories. Part of Speech tagging is a basic preprocessing task for most Natural Language Processing (NLP) applications, such as Information Retrieval, Machine Translation, and Grammar Checking. The task belongs to a larger set of problems, namely sequence labeling problems. Part of Speech tagging for Punjabi is not widely explored territory. We discuss Rule Based and HMM based Part of Speech taggers for Punjabi and compare the accuracies of the two approaches. The system is developed using 35 different standard part of speech tags. We evaluate our system on unseen data and obtain state-of-the-art accuracy of 93.3%.

Part of speech tagging for Arabic text based radial basis function

Journal of Discrete Mathematical Sciences and Cryptography ◽

10.1080/09720529.2021.2014148 ◽

2021 ◽

Vol 24 (8) ◽

pp. 2443-2459

Author(s):

Osama R. Shahin ◽

Rady El Rwelli

Keyword(s):

Radial Basis Function ◽

Basis Function ◽

Arabic Text ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Radial Basis ◽

Speech Tagging

Pengaruh Part of Speech Tagging Berbasis Aturan dan Distribusi Probabilitas Maximum Entropy untuk Bahasa Jawa Krama

Jurnal Buana Informatika ◽

10.24002/jbi.v7i4.764 ◽

2016 ◽

Vol 7 (4) ◽

Author(s):

Hafiz Ridha Pramudita ◽

Ema Utami ◽

Armadyah Amborowati

Keyword(s):

Maximum Entropy ◽

Syntactic Category ◽

Rule Based ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Speech Tagging ◽

Local Languages

Abstract. Javanese is one of the local languages of Indonesia and is used by most of the population of Indonesia. The language has a complex grammar because it embodies values of politeness, which are expressed through words that convey courtesy, known as raos alus. Every word in Javanese belongs to a certain part of speech, as in other languages. Part of Speech (POS) tagging is the process of assigning a syntactic category, such as noun, verb, or adjective, to every word in a document or text, and it is an important part of the field of Natural Language Processing (NLP). This study examined POS tagging with Maximum Entropy and Rule Based methods for Javanese Krama (High Javanese), using the OpenNLP library to compute the maximum entropy model. The results show that Maximum Entropy and Rule Based methods can be used for POS tagging of Javanese Krama, with a highest accuracy of 97.67%.

Keywords: POS Tagging, NLP, Maximum Entropy, Rule Based, Javanese Krama Language

A Scalable Solution for Rule-Based Part-of-Speech Tagging on Novel Hardware Accelerators

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining ◽

10.1145/3219819.3219889 ◽

2018 ◽

Cited By ~ 7

Author(s):

Elaheh Sadredini ◽

Deyuan Guo ◽

Chunkun Bo ◽

Reza Rahimi ◽

Kevin Skadron ◽

...

Keyword(s):

Hardware Accelerators ◽

Rule Based ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

PENENTUAN KELAS KATA PADA PART OF SPEECH TAGGING KATA AMBIGU BAHASA INDONESIA

JISKA (Jurnal Informatika Sunan Kalijaga) ◽

10.14421/jiska.2018.23-05 ◽

2018 ◽

Vol 2 (3) ◽

pp. 157

Author(s):

Ahmad Subhan Yazid ◽

Agung Fatwanto

Keyword(s):

Language Processing ◽

Word Class ◽

Rule Based ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Ambiguous Words ◽

Computer Science Faculty ◽

Speech Tagging ◽

Bahasa Indonesia

Indonesian holds a fundamental role in communication. Ambiguity is a problem in its machine learning implementation. In Natural Language Processing, Part of Speech (POS) tagging helps to reduce this problem. This study uses the Rule Based method to determine the best word class for ambiguous words in Indonesian. The research follows these stages: knowledge inventory, algorithm design, implementation, testing, analysis, and conclusions. The initial data is an Indonesian corpus developed by the Language department of the Faculty of Computer Science, University of Indonesia. The data is then processed and presented descriptively according to certain rules and specifications. The result is a POS tagging algorithm comprising 71 rules, expressed in flowchart and descriptive sentence notation. According to the test results, the algorithm successfully labeled 92 of 100 tested words (92%). The results of the implementation are influenced by the availability of rules, word class tagsets, and corpus data.

Robust Part-of-speech Tagging of Arabic Text

10.18653/v1/w15-3222 ◽

2015 ◽

Cited By ~ 2

Author(s):

Hanan Aldarmaki ◽

Mona Diab

Keyword(s):

Arabic Text ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

POS Tagging Bahasa Indonesia Dengan HMM dan Rule Based

Jurnal Informatika ◽

10.21460/inf.2012.82.125 ◽

2013 ◽

Vol 8 (2) ◽

Cited By ~ 1

Author(s):

Kathryn Widhiyanti ◽

Agus Harjoko

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Word Class ◽

Rule Based ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Class Labelling ◽

Speech Tagging

This research conducts Part of Speech tagging (POS-tagging) for text in the Indonesian language, in support of other processes for digitising natural language, e.g. Indonesian text parsing. POS-tagging is an automated process of labelling the word class of each word in a sentence (Jurafsky and Martin, 2000). The central issue is how to obtain accurate word class labels at the sentence level. The authors propose a method that combines a Hidden Markov Model with a Rule Based method. The expected outcome is better accuracy in word class labelling than that obtained by using the Hidden Markov Model alone. The labels produced by the Hidden Markov Model are refined by validating them against rules that are composed automatically from the corpus used. In experiments on several documents, the Hidden Markov Model produced a highest accuracy of 100% for text identical to text in the corpus. For text that differs from the referenced corpus but uses words contained in the corpus, the highest accuracy was 92.2%.
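One way to picture the refinement step is the sketch below. The abstract does not say what form the automatically composed rules take, so the rule used here is purely an assumption: a word that occurs in the corpus may only carry a tag it was seen with, and a disallowed tag is replaced by that word's most frequent corpus tag. The names `build_lexicon` and `refine`, the sample words, and the tag labels are hypothetical.

```python
from collections import Counter, defaultdict


def build_lexicon(tagged_corpus):
    """Derive word -> tag counts automatically from (word, tag) pairs."""
    lexicon = defaultdict(Counter)
    for word, tag in tagged_corpus:
        lexicon[word][tag] += 1
    return lexicon


def refine(tagged_words, lexicon):
    """Apply the assumed rule to the output of a baseline tagger (e.g. an HMM)."""
    refined = []
    for word, tag in tagged_words:
        seen = lexicon.get(word)  # None for words absent from the corpus
        if seen and tag not in seen:
            # Tag never observed for this word: fall back to its
            # most frequent corpus tag.
            tag = seen.most_common(1)[0][0]
        refined.append((word, tag))
    return refined


# Assumed toy corpus and assumed baseline output, for illustration only.
corpus = [("makan", "VERB"), ("makan", "VERB"), ("buku", "NOUN"),
          ("bisa", "MODAL"), ("bisa", "NOUN")]
baseline_output = [("saya", "PRON"), ("makan", "NOUN"),
                   ("buku", "NOUN"), ("bisa", "NOUN")]
refined_output = refine(baseline_output, build_lexicon(corpus))
```

In this toy run only `makan` changes, because `NOUN` was never observed for it; the unknown word `saya` and the ambiguous word `bisa` keep the baseline tags.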

A Rule-Based Approach for Marathi Part-of-Speech Tagging

10.1007/978-981-16-4177-0_76 ◽

2021 ◽

pp. 773-785

Author(s):

P. Kadam Vaishali ◽

Khandale Kalpana ◽

C. Namrata Mahender

Keyword(s):

Rule Based ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Rule Based Approach ◽

Speech Tagging

Building Balinese Part-of-Speech Tagger Using Hidden Markov Model (HMM)

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2020.v09.i02.p18 ◽

2020 ◽

Vol 9 (2) ◽

pp. 303

Author(s):

I Gde Made Hendra Pradiptha ◽

Ngurah Agus Sanjaya ER

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Probabilistic Approach ◽

Word Class ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Fast Processing ◽

Pos Tagger ◽

Speech Tagging

Part-of-Speech tagging, or word class labeling, is the process of assigning a word class to each word in a sentence. Previous research on POS taggers, especially for Indonesian, has used various approaches and obtained high accuracy. However, not many researchers have built a POS tagger for Balinese. In this article, we build a POS tagger for Balinese using a probabilistic approach, specifically the Hidden Markov Model (HMM). HMM is selected to deal with ambiguity since it gives higher accuracy and fast processing time. We used k-fold cross-validation (with k = 10) on a tagged corpus of around 3,669 tokens with 21 tags. Based on the experiments conducted, the HMM method obtained an accuracy of 68.56%.
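For readers unfamiliar with HMM tagging, the following is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is a generic textbook formulation, not the authors' system: the add-one smoothing, the English toy corpus, the tag names, and the function names `train_hmm`, `log_prob`, and `viterbi` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from tagged sentences."""
    transitions = defaultdict(Counter)  # previous tag -> next-tag counts
    emissions = defaultdict(Counter)    # tag -> word counts
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab


def log_prob(counter, key, num_outcomes):
    """Add-one smoothed log probability of key under the given counts."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + num_outcomes))


def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for words."""
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome for unknown words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions["<s>"], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], tag, n_tags) + emit, p)
                for p in tags
            )
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Assumed toy training data, for illustration only.
toy_corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(toy_corpus)
predicted_tags = viterbi(["the", "cat", "runs"], *model)
```

In a 10-fold cross-validation like the one the abstract mentions, training and decoding of this kind would be repeated ten times, each time holding out a different tenth of the tagged corpus for evaluation.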

Fuzzy network model for part-of-speech tagging under small training data

Natural Language Engineering ◽

10.1017/s1351324996001258 ◽

1996 ◽

Vol 2 (2) ◽

pp. 95-110 ◽

Cited By ~ 5

Author(s):

JAE-HOON KIM ◽

GIL CHANG KIM

Keyword(s):

Network Model ◽

Hidden Markov ◽

Training Data ◽

Rule Based ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Network Approaches ◽

Fuzzy Network ◽

Speech Tagging ◽

Better Than

Recently, most part-of-speech tagging approaches, such as rule-based, probabilistic and neural network approaches, have shown very promising results. In this paper, we are particularly interested in probabilistic approaches, which usually require lots of training data to get reliable probabilities. We alleviate such a restriction of probabilistic approaches by introducing a fuzzy network model to provide a method for estimating more reliable parameters of a model under a small amount of training data. Experiments with the Brown corpus show that the performance of the fuzzy network model is much better than that of the hidden Markov model under a limited amount of training data.
