Improving Brill's tagger lexical and transformation rule for Afaan Oromo language

10.7287/peerj.preprints.1225 ◽

2015 ◽

Author(s):

Abraham G Ayana

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Transformation Rule ◽

Initial State ◽

Training Corpus ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Speech Tagging

Natural Language Processing (NLP), a discipline within the field of Artificial Intelligence (AI), refers to human-like processing of language. The ultimate goal of NLP research, to parse and understand language, has not yet been fully achieved. For this reason, much research in NLP has focused on intermediate tasks that make sense of some of the structure inherent in language without requiring complete understanding. One such task is part-of-speech (POS) tagging, or simply tagging. The lack of a standard part-of-speech tagger for Afaan Oromo is a main obstacle for researchers in machine translation, spell checking, dictionary compilation, and automatic sentence parsing and construction. Although several works have addressed POS tagging for Afaan Oromo, the performance of the tagger has not yet been sufficiently improved. Hence, the aim of this thesis is to improve the lexical and transformation rules of Brill's tagger for Afaan Oromo POS tagging using a sufficiently large training corpus. Accordingly, Afaan Oromo literature on grammar and morphology was reviewed to understand the nature of the language and to identify possible tags. As a result, 26 broad tags were identified, and 17,473 words from around 1,100 sentences, containing 6,750 distinct words, were tagged for training and testing; 258 of these sentences were taken from previous work. Because only a few ready-made standard corpora exist, manually tagging a corpus for this work was challenging, and the preparation of a standard corpus is therefore recommended. Transformation-based error-driven learning was adapted for Afaan Oromo part-of-speech tagging. Different experiments were conducted for the rule-based approach, with 20% of the data held out for testing, and the results were compared with the previously adapted Brill's tagger. The previously adapted Brill's tagger shows an accuracy of 80.08%, whereas the improved Brill's tagger reaches 95.6%, an improvement of 15.52 percentage points. It is therefore found that the size of the training corpus, the rule-generating system in the lexical rule learner, and, moreover, the use of an Afaan Oromo HMM tagger as the initial-state tagger have a significant effect on the improvement of the tagger.
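
The transformation-based error-driven learning mentioned in this abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy English words, the tag names, the dictionary-lookup initial tagger (the thesis reports using an HMM tagger instead), and the single rule template "change tag a to b when the previous tag is c" are all assumptions chosen for brevity.

```python
from collections import Counter

def initial_tags(words, lexicon, default="NN"):
    """Initial-state tagger (assumed here): dictionary lookup with a default tag."""
    return [lexicon.get(w, default) for w in words]

def apply_rule(tags, rule):
    """Rule (a, b, c): change tag a to b wherever the previous tag is c."""
    a, b, c = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == a and tags[i - 1] == c:
            out[i] = b
    return out

def learn_rules(current, gold, max_rules=10):
    """Greedily learn rules, each time taking the largest net error reduction."""
    rules = []
    for _ in range(max_rules):
        scores = Counter()
        # Count the errors each candidate rule would fix.
        for i in range(1, len(gold)):
            if current[i] != gold[i]:
                scores[(current[i], gold[i], current[i - 1])] += 1
        # Subtract the correct tags each candidate rule would break.
        for i in range(1, len(gold)):
            if current[i] == gold[i]:
                for rule in list(scores):
                    if rule[0] == current[i] and rule[2] == current[i - 1]:
                        scores[rule] -= 1
        if not scores:
            break
        best, gain = max(scores.items(), key=lambda kv: kv[1])
        if gain <= 0:
            break
        current = apply_rule(current, best)
        rules.append(best)
    return rules, current

# Toy data (hypothetical): "can" is a noun after a determiner, a modal otherwise.
words = ["the", "can", "rusted", "I", "can", "go"]
gold = ["DT", "NN", "VBD", "PRP", "MD", "VB"]
lexicon = {"the": "DT", "can": "MD", "rusted": "VBD", "I": "PRP", "go": "VB"}

start = initial_tags(words, lexicon)
rules, final = learn_rules(start, gold)
print(rules)
print(final == gold)
```

The learner stops as soon as no candidate rule yields a positive net gain, which is the usual stopping criterion for this family of methods; a real tagger would use many more rule templates and respect sentence boundaries.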

Arabic Part-of-Speech Tagger, an Approach Based on Neural Network Modelling

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.29.14009 ◽

2018 ◽

Vol 7 (2.29) ◽

pp. 742

Author(s):

Rabab Ali Abumalloh ◽

Hasan Muaidi Al-Serhan ◽

Othman Bin Ibrahim ◽

Waheeb Abu-Ulbeh

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Part Of Speech Tagging ◽

Pos Tagging ◽

Part Of Speech ◽

Artificial Neural ◽

Speech Tagging

POS tagging has gained the interest of researchers in computational linguistics in recent years. Part-of-speech tagging systems automatically assign the proper grammatical tag, or morpho-syntactic category label, to every word in a corpus according to its appearance in the text. POS tagging serves as a fundamental and preliminary step in linguistic analysis and can help in developing many natural language processing applications, such as word processing systems, spell checkers, dictionary building, and parsing systems. The Arabic language has also gained the interest of researchers, which has led to increasing demand for Arabic natural language processing systems. Artificial neural networks have been applied in many applications, such as speech recognition and part-of-speech prediction, but they are considered a new approach in part-of-speech tagging. In this research, we developed an Arabic POS tagger using an artificial neural network. A corpus of 20,620 words, manually assigned the appropriate tags, was developed and used to train the artificial neural network and to test the tagger's overall performance. The accuracy of the developed tagger reaches 89.04% on the testing dataset and 98.94% on the training dataset. Combining the two datasets, the accuracy rate for the whole system is 96.96%.
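
The three accuracy figures in this abstract can be related by simple arithmetic. The sketch below assumes that the combined accuracy is a word-weighted average of the training and testing accuracies; the abstract does not state the split, so the fraction computed here is only what that assumption implies, and the function names are illustrative.

```python
def combined_accuracy(acc_train, acc_test, train_fraction):
    """Word-weighted average of training and testing accuracy (assumed model)."""
    return train_fraction * acc_train + (1 - train_fraction) * acc_test

def implied_train_fraction(acc_train, acc_test, acc_all):
    """Solve the weighted-average equation above for the training fraction."""
    return (acc_all - acc_test) / (acc_train - acc_test)

# Figures reported in the abstract, in percent.
p = implied_train_fraction(98.94, 89.04, 96.96)
print(round(p, 4))
print(round(combined_accuracy(98.94, 89.04, p), 2))
```

Under this assumption, the reported numbers are consistent with roughly four fifths of the words being used for training and one fifth for testing.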

Improving performance of natural language processing part-of-speech tagging on clinical narratives through domain adaptation

Journal of the American Medical Informatics Association ◽

10.1136/amiajnl-2012-001453 ◽

2013 ◽

Vol 20 (5) ◽

pp. 931-939 ◽

Cited By ~ 16

Author(s):

Jeffrey P Ferraro ◽

Hal Daumé ◽

Scott L DuVall ◽

Wendy W Chapman ◽

Henk Harkema ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Domain Adaptation ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Aspect Based Sentiments from Tweets using Co-Ranking Multi-Modal Natural Language Processing Methodologies

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6305.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 1061-1068

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Detection System ◽

Word Segmentation ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Sentiment Detection ◽

Analysis System ◽

Speech Tagging

Nowadays, people spend much of their time on social sites, especially Twitter, posting many tweets every day. The posted tweets are used by many users to learn about particular applications, products, and other search engine queries. Emotions and sentiments are derived from the posted tweets and used to obtain opinions about a particular event. Many traditional sentiment detection systems have been developed, but they fail to analyze huge volumes of tweets, and online content with temporal patterns is also difficult to analyze. To overcome these issues, a co-ranking multi-modal natural language processing based sentiment analysis system was developed to detect emotions from posted tweets. Initially, tweets about different events are collected from social sites and processed with natural language procedures such as stemming, lemmatization, part-of-speech tagging, word segmentation, and parsing to obtain the words related to the posted tweets, from which the sentiments are derived. A co-ranking process is then applied to the extracted emotions to effectively obtain the opinion related to a particular event. The efficiency of the system is then examined using experimental results and discussion. The introduced system recognizes the sentiments from tweets with 98.80% accuracy.

Novel Text Steganography Using Natural Language Processing and Part-of-Speech Tagging

IETE Journal of Research ◽

10.1080/03772063.2018.1491807 ◽

2018 ◽

Vol 66 (3) ◽

pp. 384-395 ◽

Cited By ~ 4

Author(s):

Barnali Gupta Banik ◽

Samir Kumar Bandyopadhyay

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Text Steganography ◽

Speech Tagging

Natural language processing in support of decision-making: phrases and part-of-speech tagging

Information Processing & Management ◽

10.1016/s0306-4573(00)00061-3 ◽

2001 ◽

Vol 37 (6) ◽

pp. 769-787 ◽

Cited By ~ 6

Author(s):

Robert M. Losee

Keyword(s):

Decision Making ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

Natural language processing for similar languages, varieties, and dialects: A survey

Natural Language Engineering ◽

10.1017/s1351324920000492 ◽

2020 ◽

Vol 26 (6) ◽

pp. 595-612

Author(s):

Marcos Zampieri ◽

Preslav Nakov ◽

Yves Scherrer

Keyword(s):

Natural Language Processing ◽

Data Collection ◽

Natural Language ◽

Machine Translation ◽

Computational Methods ◽

Language Processing ◽

Language Varieties ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

There has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim to improve the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, with focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects.

Part-of-Speech Tagging Enhancement to Natural Language Processing for Thai Wh-Question Classification with Deep Learning

Heliyon ◽

10.1016/j.heliyon.2021.e08216 ◽

2021 ◽

pp. e08216

Author(s):

Saranlita Chotirat ◽

Phayung Meesad

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Question Classification ◽

Speech Tagging

Bibliometric and Geographical Analysis of Cell Death Related Literature

10.1101/035204 ◽

2015 ◽

Author(s):

Vijaykumar Yogesh Muley ◽

Anne Hahn ◽

Pravin Paikrao

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Scientific Community ◽

Official Language ◽

Lexical Diversity ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Recent Developments ◽

Speech Tagging

Natural language processing continues to gain importance in a thriving scientific community that communicates its latest results at such a frequency that following the most recent developments, even in a specific field, cannot be managed by human readers alone. Here we summarize and compare the publishing activity of previous years on a distinct topic across several countries, addressing not only publishing frequency and history, but also stylistic characteristics that are accessible by means of natural language processing. Though there are no profound differences in sentence length or lexical diversity among countries, writing styles as approached through part-of-speech tagging are similar among countries that share a history or an official language, or that are spatially close.
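
The stylistic measures named in this abstract, sentence length and lexical diversity, can be sketched as follows. The abstract does not say how either was computed; the punctuation-based sentence splitter and the type-token ratio used here are common simple choices and are assumptions, as is the sample text.

```python
import re

def sentence_lengths(text):
    """Number of whitespace-separated words in each sentence (naive splitter)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def type_token_ratio(text):
    """Lexical diversity as distinct words divided by total words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Hypothetical sample text.
sample = "Cells die. Cells divide and cells die."
print(sentence_lengths(sample))
print(type_token_ratio(sample))
```

The type-token ratio is sensitive to text length, so comparisons across corpora usually need equal-sized samples.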

A FORMULA TO CALCULATE PRUNING THRESHOLD FOR THE PART-OF-SPEECH TAGGING PROBLEM

Vietnam Journal of Science and Technology ◽

10.15625/2525-2518/54/3a/11959 ◽

2018 ◽

Vol 54 (3A) ◽

pp. 64

Author(s):

Nguyen Chi Hieu

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Semantic Information ◽

Viterbi Algorithm ◽

Wall Street ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

The Wall Street Journal ◽

Speech Tagging

Accurately tagging the words in a text is a very important task in natural language processing. It can support parsing, contribute to resolving polysemous words, and help access semantic information. One crucial factor in statistical POS (part-of-speech) tagging approaches is processing time. In this paper, we propose an approach to calculate the pruning threshold, which can be applied in the Viterbi algorithm of a hidden Markov model for tagging texts. An experiment on 1,000,000 words of the tagged Wall Street Journal corpus showed that the proposed solution is satisfactory.
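
To show where a pruning threshold enters Viterbi decoding, here is a minimal sketch in log space. It does not reproduce the paper's formula: `threshold` is simply a fixed number supplied by the caller, and the two-tag model with its probabilities is invented for illustration. The sketch also assumes every word can be emitted by at least one surviving state.

```python
import math

NEG_INF = float("-inf")

def prune(beam, threshold):
    """Keep only states whose score is within `threshold` of the best score."""
    best = max(score for score, _ in beam.values())
    return {s: v for s, v in beam.items() if v[0] >= best - threshold}

def viterbi_pruned(obs, states, log_start, log_trans, log_emit, threshold):
    """Viterbi decoding over a pruned set of states; returns the best tag path."""
    beam = {}
    for s in states:
        score = log_start[s] + log_emit[s].get(obs[0], NEG_INF)
        if score > NEG_INF:
            beam[s] = (score, [s])
    beam = prune(beam, threshold)
    for word in obs[1:]:
        nxt = {}
        for s in states:
            emit = log_emit[s].get(word, NEG_INF)
            if emit == NEG_INF:
                continue
            # Best surviving predecessor for state s.
            prev, (score, path) = max(
                beam.items(), key=lambda kv: kv[1][0] + log_trans[kv[0]][s]
            )
            nxt[s] = (score + log_trans[prev][s] + emit, path + [s])
        beam = prune(nxt, threshold)
    return max(beam.values(), key=lambda v: v[0])[1]

def logs(table):
    """Convert a dictionary of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical two-tag model.
states = ["N", "V"]
log_start = logs({"N": 0.8, "V": 0.2})
log_trans = {"N": logs({"N": 0.3, "V": 0.7}), "V": logs({"N": 0.6, "V": 0.4})}
log_emit = {"N": logs({"fish": 0.7, "swim": 0.3}), "V": logs({"fish": 0.2, "swim": 0.8})}

exact = viterbi_pruned(["fish", "swim"], states, log_start, log_trans, log_emit, math.inf)
fast = viterbi_pruned(["fish", "swim"], states, log_start, log_trans, log_emit, 1.0)
print(exact, fast)
```

An infinite threshold keeps every state and gives ordinary Viterbi decoding; a smaller threshold discards unlikely states early, trading a risk of missing the best path for less work per word.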
