Exploring multinomial naïve Bayes for Yorùbá text document classification

I.I. Ayogu

doi:10.4314/njt.v39i2.23

Nigerian Journal of Technology ◽

10.4314/njt.v39i2.23 ◽

2020 ◽

Vol 39 (2) ◽

pp. 528-535

Author(s):

I.I. Ayogu

Keyword(s):

English Language ◽

Naive Bayes ◽

Naïve Bayes ◽

Document Classification ◽

Text Documents ◽

Bayes Model ◽

Text Document ◽

Text Document Classification ◽

Yoruba Language ◽

Language Text

The recent growth of Nigerian-language text online motivates this paper, which considers the problem of classifying text documents written in the Yorùbá language into one of a few pre-designated classes. Text document classification/categorization research is well established for English and many other languages; this is not so for Nigerian languages. This paper evaluated the performance of a multinomial naïve Bayes (NB) model learned on a research dataset consisting of 100 samples of text each from the business, sporting, entertainment, technology and political domains, separately on unigram, bigram and trigram features obtained using the bag-of-words (BoW) representation approach. Results show that the model's performance on unigram and bigram features is comparable, and that both are significantly better than a model learned on trigram features. The results generally indicate that the NB algorithm could be applied in practice to the classification of text documents written in the Yorùbá language.

Keywords: Supervised learning, text classification, Yorùbá language, text mining, BoW representation
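The setup described in this abstract, a multinomial naïve Bayes model trained separately on unigram, bigram and trigram bag-of-words features, can be illustrated with a minimal sketch. This is not the author's implementation: the class name, the whitespace tokenizer, the add-one smoothing and the toy English sentences are all assumptions made for illustration, and the paper's Yorùbá dataset is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNB:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, n=1):
        self.n = n  # n-gram order: 1 = unigram, 2 = bigram, 3 = trigram

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)  # n-gram counts per class
        for doc, label in zip(docs, labels):
            self.feature_counts[label].update(ngrams(doc, self.n))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        # n-grams never seen in training are ignored.
        feats = [f for f in ngrams(doc, self.n) if f in self.vocab]
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_docs[label] / self.total_docs)
            score += sum(math.log((counts[f] + 1) / denom) for f in feats)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy English placeholder data (hypothetical, not from the paper).
train_docs = [
    "the team won the match",
    "the striker scored a late goal",
    "the bank raised interest rates",
    "shares fell as the market closed",
]
train_labels = ["sport", "sport", "business", "business"]

unigram_model = MultinomialNB(n=1).fit(train_docs, train_labels)
bigram_model = MultinomialNB(n=2).fit(train_docs, train_labels)
```

Changing `n` is the only difference between the unigram, bigram and trigram variants. Higher-order n-grams are sparser, so with only a few hundred training documents far fewer test n-grams will have been seen in training.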

Performance Comparison and Optimization of Text Document Classification using k-NN and Naïve Bayes Classification Techniques

Procedia Computer Science ◽

10.1016/j.procs.2017.10.017 ◽

2017 ◽

Vol 116 ◽

pp. 107-112 ◽

Cited By ~ 9

Author(s):

Zulfany Erlisa Rasjid ◽

Reina Setiawan

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Document Classification ◽

Performance Comparison ◽

Classification Techniques ◽

Text Document ◽

Naive Bayes Classification ◽

Naïve Bayes Classification ◽

Text Document Classification

AUTOMATIC SUBJECT LABELING IN DOCUMENTS BY USING ONTOLOGY AND GRAPH DATABASES

Journal of Science and Technology - IUH ◽

10.46242/jst-iuh.v38i02.292 ◽

2020 ◽

Vol 38 (02) ◽

Author(s):

TẠ DUY CÔNG CHIẾN

Keyword(s):

Machine Learning ◽

Language Processing ◽

Data Science ◽

Document Classification ◽

Graph Database ◽

Graph Databases ◽

Text Documents ◽

Domain Specific ◽

Text Document ◽

Text Document Classification

Ontologies have been used in many applications in recent years, such as information retrieval, information extraction, and text document classification. The purpose of a domain-specific ontology is to enrich the identification of concepts and their interrelationships. In our research, we use an ontology to specify a set of generic subjects (concepts) that characterize the domain, as well as their definitions and interrelationships. This paper introduces a system for labeling the subjects of text documents based on the differential layers of a domain-specific ontology, which contains the information and vocabularies related to the computer domain. A document can contain several subjects, such as data science, database, and machine learning. The subjects in text document classification are determined based on the differential layers of the domain-specific ontology. We combine the methodologies of Natural Language Processing with the domain ontology to determine the subjects in a text document. To increase performance, we use a graph database to store and access the ontology. In addition, the paper evaluates our proposed algorithm against some other methods. Experimental results show that our proposed algorithm yields performance significantly

A Novel Approach for Ontology-Based Feature Vector Generation for Web Text Document Classification

International Journal of Software Innovation ◽

10.4018/ijsi.2018010101 ◽

2018 ◽

Vol 6 (1) ◽

pp. 1-10 ◽

Cited By ~ 7

Author(s):

Mohamed K. Elhadad ◽

Khaled M. Badran ◽

Gouda I. Salama

Keyword(s):

Feature Vector ◽

Text Processing ◽

Principal Component ◽

Document Classification ◽

Text Documents ◽

Lexical Categories ◽

Text Document ◽

Novel Approach ◽

Text Document Classification ◽

Traditional Approaches

Extracting the feature vector used in mining tasks (classification, clustering, etc.) is considered the most important task for enhancing text processing capabilities. This paper proposes a novel approach to building the feature vector used in the web text document classification process, adding semantics to the generated feature vector. The approach exploits the hierarchical structure of the WordNet ontology to eliminate from the generated feature vector meaningless words that have no semantic relation with any of the WordNet lexical categories; this reduces the feature vector size without losing information about the text. It also enriches the feature vector by concatenating each word with its corresponding WordNet lexical category. For mining tasks, the Vector Space Model (VSM) is used to represent text documents and Term Frequency Inverse Document Frequency (TFIDF) is used as the term weighting technique. The proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach, and against an ontology-based reduction technique without the process of adding semantics to the generated feature vector, using several experiments with five different classifiers (SVM, JRIP, J48, Naive Bayes, and kNN). The experimental results reveal the effectiveness of the authors' proposed approach against other traditional approaches in achieving better classification accuracy, F-measure, precision, and recall.
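The two steps this abstract describes, dropping words that have no lexical category and concatenating each remaining word with its category, might look roughly like the sketch below. The small dictionary is a hypothetical stand-in for a real WordNet lookup, the `word|category` joining format is an assumption, and the category labels are illustrative only; the authors' actual procedure is not given in the abstract.

```python
# Hypothetical stand-in for a WordNet lookup: word -> lexical category.
# A real system would query WordNet instead of using this hand-made table.
LEXICAL_CATEGORY = {
    "bank": "noun.group",
    "loan": "noun.possession",
    "run": "verb.motion",
}


def enrich(tokens, lexicon=LEXICAL_CATEGORY):
    """Drop tokens with no lexical category; tag the rest as 'word|category'."""
    return [f"{t}|{lexicon[t]}" for t in tokens if t in lexicon]


features = enrich(["the", "bank", "xyzzy", "loan"])
```

Here `the` and `xyzzy` are removed because the lookup has no category for them, which shrinks the feature vector, while `bank` and `loan` each become a single enriched feature.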

A Novel Approach for Ontology-Based Dimensionality Reduction for Web Text Document Classification

International Journal of Software Innovation ◽

10.4018/ijsi.2017100104 ◽

2017 ◽

Vol 5 (4) ◽

pp. 44-58 ◽

Cited By ~ 1

Author(s):

Mohamed K. Elhadad ◽

Khaled M. Badran ◽

Gouda I. Salama

Keyword(s):

Dimensionality Reduction ◽

Feature Vector ◽

Text Processing ◽

Principal Component ◽

Vital Role ◽

Document Classification ◽

Weighting Method ◽

Text Documents ◽

Text Document ◽

Text Document Classification

Reducing the dimensionality of the feature vector plays a vital role in enhancing text processing capabilities; it aims to reduce the size of the feature vector used in mining tasks (classification, clustering, etc.). This paper proposes an efficient approach to reducing the size of the feature vector for the web text document classification process. The approach uses the WordNet ontology, exploiting its hierarchical structure, to eliminate from the generated feature vector words that have no relation with any of the WordNet lexical categories; this reduces the feature vector size without losing information about the text. For mining tasks, the Vector Space Model (VSM) is used to represent text documents and Term Frequency Inverse Document Frequency (TFIDF) is used as the term weighting method. The proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach using several experiments. The experimental results reveal the effectiveness of the authors' proposed approach against other traditional approaches in achieving better classification accuracy, F-measure, precision, and recall.
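As a rough illustration of the Vector Space Model with TFIDF weighting mentioned in this abstract, the sketch below builds one weighted vector per document. The abstract does not say which TFIDF variant the authors used; the formula here (length-normalized term frequency times the natural log of N divided by document frequency) and the function name are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors): one TF-IDF vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(doc_freq)
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        vectors.append([tf[t] / length * idf[t] for t in vocab])
    return vocab, vectors


# Toy tokenized documents (hypothetical).
docs = [["web", "text", "text"], ["web", "page"]]
vocab, vectors = tfidf_vectors(docs)
```

Each vector has one dimension per vocabulary term, so removing words from the vocabulary before this step, as the ontology-based filter does, directly shortens every document vector. A term that occurs in every document, such as `web` here, receives a weight of zero under this variant.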

Improving a SVM Meta-classifier for Text Documents by using Naive Bayes

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2010.3.2487 ◽

2010 ◽

Vol 5 (3) ◽

pp. 351 ◽

Cited By ~ 4

Author(s):

Daniel Morariu ◽

Radu Crețulescu ◽

Lucian Vințan

Keyword(s):

Classification Accuracy ◽

Text Categorization ◽

Naive Bayes ◽

Naïve Bayes ◽

Experimental Results ◽

Text Documents ◽

Text Document ◽

Bayes Theory ◽

Individual Classifier ◽

Component Classifier

Text categorization is the problem of classifying text documents into a set of predefined classes. In this paper, we investigated two approaches: a) developing a classifier for text documents based on naive Bayes theory and b) integrating this classifier into a meta-classifier in order to increase classification accuracy. The basic idea is to learn a meta-classifier that selects the best component classifier for each data point. The experimental results show that combining classifiers can significantly improve classification accuracy and that our improved meta-classification strategy gives better results than each individual classifier. For Reuters2000 text documents we obtained classification accuracies of up to 93.87%.

Twitter Sentiment Analysis towards COVID-19 Vaccines in the Philippines Using Naïve Bayes

Information ◽

10.3390/info12050204 ◽

2021 ◽

Vol 12 (5) ◽

pp. 204

Author(s):

Charlyn Villavicencio ◽

Julio Jerison Macrohon ◽

X. Alphonse Inbaraj ◽

Jyh-Horng Jeng ◽

Jer-Guang Hsieh

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Data Science ◽

Naive Bayes ◽

The Philippines ◽

Naïve Bayes ◽

Social Networking Site ◽

Bayes Model ◽

The Government ◽

Processing Techniques

A year into the COVID-19 pandemic and one of the longest recorded lockdowns in the world, the Philippines received its first delivery of COVID-19 vaccines on 1 March 2021 through WHO’s COVAX initiative. A month into the inoculation of all frontline health professionals and other priority groups, the authors of this study gathered data on the sentiment of Filipinos regarding the Philippine government’s efforts using the social networking site Twitter. Natural language processing techniques were applied to understand the general sentiment, which can help the government in analyzing its response. The tweets were annotated with sentiment labels and used to train a Naïve Bayes model that classifies English and Filipino language tweets into positive, neutral, and negative polarities, using the RapidMiner data science software. The model achieved 81.77% accuracy, which exceeds the accuracy of recent sentiment analysis studies using Twitter data from the Philippines.

Comparison Of Naive Bayes And Support Vector Machine Classifiers On Document Classification

2018 IEEE 7th Global Conference on Consumer Electronics (GCCE) ◽

10.1109/gcce.2018.8574785 ◽

2018 ◽

Cited By ~ 4

Author(s):

Zun Hlaing Moe ◽

Thida San ◽

Mie Mie Khin ◽

Hlaing May Tin

Keyword(s):

Support Vector Machine ◽

Naive Bayes ◽

Naïve Bayes ◽

Document Classification ◽

Support Vector

Varying Naïve Bayes Models With Applications to Classification of Chinese Text Documents

Journal of Business and Economic Statistics ◽

10.1080/07350015.2014.903086 ◽

2014 ◽

Vol 32 (3) ◽

pp. 445-456 ◽

Cited By ~ 3

Author(s):

Guoyu Guan ◽

Jianhua Guo ◽

Hansheng Wang

Keyword(s):

Chinese Text ◽

Naive Bayes ◽

Naïve Bayes ◽

Text Documents

Hybrid Neural Architecture for Intelligent Recommender System Classification Unit Design

Intelligent Techniques in Recommendation Systems ◽

10.4018/978-1-4666-2542-6.ch010 ◽

2013 ◽

pp. 192-213

Author(s):

Emmanuel Buabin

Keyword(s):

Recommender System ◽

Document Classification ◽

Research Field ◽

Neural Systems ◽

Fully Integrated ◽

Text Document ◽

Unit Design ◽

Boosting Algorithms ◽

New Research ◽

Text Document Classification

The objective is the design of a classification unit for an intelligent recommender system using hybrid neural techniques. In particular, a neuroscience-based hybrid neural classifier by Buabin (2011a) is introduced, explained, and examined for its potential in real-world text document classification on the ModApte version of the Reuters news text corpus. The model so described (termed Hy-RNC) is fully integrated with a novel boosting algorithm to support text document classification. Hy-RNC outperforms existing works and opens up an entirely new research field in the area of machine learning. The main contribution of this book chapter is a step-by-step approach to modeling the hybrid system using underlying concepts such as boosting algorithms, recurrent neural networks, and hybrid neural systems. Results attained in the experiments show impressive performance by the hybrid neural classifier even with a minimal number of neurons in its constituent structures.

A Novel Inherent Distinguishing Feature Selector for Highly Skewed Text Document Classification

Arabian Journal for Science and Engineering ◽

10.1007/s13369-020-04763-5 ◽

2020 ◽

Vol 45 (12) ◽

pp. 10471-10491

Author(s):

Muhammad Sajid Ali ◽

Kashif Javed

Keyword(s):

Document Classification ◽

Text Document ◽

Feature Selector ◽

Text Document Classification
