Improving Techniques for Naïve Bayes Text Classifiers

Handbook of Research on Text and Web Mining Technologies ◽

10.4018/978-1-59904-990-8.ch007 ◽

2010 ◽

pp. 111-127

Author(s):

Han-joon Kim

Keyword(s):

Text Classification ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Systems ◽

Classification Model ◽

Learning Approaches ◽

Learning Framework ◽

The Em Algorithm ◽

Meta Learning ◽

Text Classifiers

This chapter introduces two practical techniques for improving Naïve Bayes text classifiers, which are widely used for text classification. Naïve Bayes is regarded as a practical text classification algorithm because of its simple classification model, reasonable classification accuracy, and easily updated classification model. Researchers therefore have a strong incentive to improve Naïve Bayes by combining it with meta-learning approaches such as EM (Expectation Maximization) and Boosting. The EM approach combines Naïve Bayes with the EM algorithm, while the Boosting approach uses Naïve Bayes as the base classifier in the AdaBoost algorithm. Both approaches use a special uncertainty measure suited to Naïve Bayes learning. Within the Naïve Bayes learning framework, these approaches are expected to be practical solutions to the shortage of training documents in text classification systems.
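The EM approach described in this abstract can be illustrated with a minimal sketch of the generic semi-supervised scheme for multinomial Naïve Bayes: labeled documents keep fixed labels, unlabeled documents receive soft labels from the current model (E-step), and the model is re-estimated from both (M-step). This is the textbook EM combination, not necessarily the chapter's exact procedure; it does not implement the chapter's uncertainty measure or the Boosting variant. The function names, add-one smoothing, fixed iteration count, and toy spam/ham documents are assumptions for illustration.

```python
import math
from collections import Counter


def train_nb(docs, memberships, classes, vocab, alpha=1.0):
    """M-step: estimate log class priors and log word probabilities from
    (possibly fractional) class memberships; memberships[i][c] is the
    weight with which document i counts toward class c."""
    prior = {c: alpha for c in classes}
    word_counts = {c: Counter() for c in classes}
    for doc, weight in zip(docs, memberships):
        for c in classes:
            prior[c] += weight[c]
            for word in doc:
                word_counts[c][word] += weight[c]
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_lik = {}
    for c in classes:
        # add-alpha (Laplace) smoothing over the shared vocabulary
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def posterior(doc, model, classes):
    """E-step for one document: P(c | doc) under the multinomial model.
    Words outside the training vocabulary are ignored."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
              for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp_scores.values())
    return {c: e / z for c, e in exp_scores.items()}


def classify(doc, model, classes):
    probs = posterior(doc, model, classes)
    return max(classes, key=probs.get)


def em_naive_bayes(labeled, unlabeled, classes, iterations=10):
    """Semi-supervised NB: labeled docs keep hard labels, unlabeled docs
    get soft labels that are re-estimated on every iteration."""
    vocab = {w for doc, _ in labeled for w in doc} | {w for doc in unlabeled for w in doc}
    docs = [doc for doc, _ in labeled]
    hard = [{c: float(c == y) for c in classes} for _, y in labeled]
    model = train_nb(docs, hard, classes, vocab)  # initialize from labeled data only
    for _ in range(iterations):
        soft = [posterior(doc, model, classes) for doc in unlabeled]     # E-step
        model = train_nb(docs + unlabeled, hard + soft, classes, vocab)  # M-step
    return model


# Hypothetical toy data
classes = ["spam", "ham"]
labeled = [("cheap pills buy now".split(), "spam"),
           ("meeting agenda for monday".split(), "ham")]
unlabeled = ["buy cheap now".split(), "monday meeting notes".split()]
model = em_naive_bayes(labeled, unlabeled, classes)
print(classify("buy pills".split(), model, classes))  # prints "spam"
```

The unlabeled documents add word counts to whichever class currently explains them best, which is how EM lets a small labeled set borrow evidence from unlabeled text.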

Statistical Analysis of Public Sentiment on the Ghanaian Government: A Machine Learning Approach

Advances in Human-Computer Interaction ◽

10.1155/2021/5561204 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

John Andoh ◽

Louis Asiedu ◽

Anani Lotsi ◽

Charlotte Chapman-Wardy

Keyword(s):

Machine Learning ◽

Text Classification ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Classification Systems ◽

Support Vector ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier

Gathering public opinion from the Internet and Internet-based applications such as Twitter has become popular in recent times, as it provides decision-makers with uncensored public views on products, government policies, and programs. Through natural language processing and machine learning techniques, unstructured data from these sources can be analyzed using traditional statistical learning. A persistent challenge in machine-learning-based sentiment classification is the sheer amount of available data, which makes it difficult to train the learning algorithms in a feasible time and eventually degrades their classification accuracy. The effect of training data size on classification tasks is therefore especially important. This study statistically assessed the performance of the Naive Bayes, support vector machine (SVM), and random forest algorithms on a sentiment text classification task. It also investigated the conditions, such as data size, number of trees, and kernel type, under which each algorithm performed best. The study collected Twitter data from Ghanaian users containing sentiments about the Ghanaian Government. The data were preprocessed, manually labeled by the researcher, and then used to train the three algorithms, which are among the most popular learning algorithms and have had considerable success in diverse fields. The Naive Bayes classifier was judged the best algorithm for the task, outperforming the other two with an accuracy of 99%, an F1 score of 86.51%, and a Matthews correlation coefficient of 0.9906. It also performed well with increasing data sizes. The Naive Bayes classifier is recommended as viable for sentiment text classification, especially in text classification systems that work with Big Data.
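Accuracy, F1 score, and the Matthews correlation coefficient (MCC) weigh errors differently, so the figures reported above are not interchangeable. A minimal sketch of how each is computed from a binary confusion matrix follows. The study's task may involve more than two sentiment classes and averaged variants of F1 and MCC, which this binary sketch does not reproduce; the function name and the counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 and Matthews correlation coefficient
    (MCC) from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells, so it stays informative on imbalanced classes
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts, not taken from the study
print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
```

For these counts accuracy and F1 are both 0.8 while MCC is 0.6, since MCC ranges from -1 to 1 rather than 0 to 1.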

The Simply Implement of Effective Naive Bayes Web News Text Classification Model

Statistics and Applications ◽

10.12677/sa.2014.31005 ◽

2014 ◽

Vol 03 (01) ◽

pp. 30-35 ◽

Cited By ~ 1

Author(s):

致晖吴

Keyword(s):

Text Classification ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Web News

An Optimized E-Lecture Video Retrieval based on Machine Learning Classification

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f9114.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4820-4827

Keyword(s):

Machine Learning ◽

Text Classification ◽

Naive Bayes ◽

Video Retrieval ◽

Naïve Bayes ◽

Classification Algorithm ◽

Classification Model ◽

Support Vector ◽

Machine Learning Classification ◽

E Learning

The advent of the internet has led to the colossal growth of e-learning frameworks. The efficiency of such systems, however, relies on effective and fast content-based retrieval approaches. This paper presents a methodology for efficient search and retrieval of lecture videos based on a Machine Learning (ML) text classification algorithm. The text transcript is generated exclusively from the audio content extracted from the video lectures. The transcript is used for summary and keyword extraction, the output of which is used to train the ML text classification model. An optimized search is then achieved using the trained ML model. The system's performance is compared by training it with the Naive Bayes, Support Vector Machine, and Logistic Regression algorithms, and evaluated by the precision, recall, F-score, and accuracy of the search for each classifier. The system trained with the Naive Bayes classification algorithm achieved better performance both in terms of time and in the relevance of the search results.

Children’s Activity Classification for Domestic Risk Scenarios Using Environmental Sound and a Bayesian Network

Healthcare ◽

10.3390/healthcare9070884 ◽

2021 ◽

Vol 9 (7) ◽

pp. 884

Author(s):

Antonio García-Domínguez ◽

Carlos E. Galván-Tejada ◽

Ramón F. Brena ◽

Antonio A. Aguileta ◽

Jorge I. Galván-Tejada ◽

...

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Activity Classification ◽

Environmental Sound ◽

Non Invasive ◽

Akaike Criterion ◽

Data Source ◽

Feature Selection Techniques

Children’s healthcare is an important issue, especially the prevention of domestic accidents, which have even been defined as a global health problem. Children’s activity classification generally relies on sensors embedded in children’s clothing, which can produce erroneous measurements if the sensors are damaged or mishandled. A non-invasive data source for a children’s activity classification model makes the monitoring system in which it is applied more reliable. This work proposes using environmental sound as a data source for generating children’s activity classification models, implementing feature selection methods and classification techniques based on Bayesian networks. The focus is on recognizing activities that can potentially trigger domestic accidents, for use in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Models were generated using three classifiers: naive Bayes, semi-naive Bayes, and tree-augmented naive Bayes. Most of the models generated by combining these feature selection methods and classifiers achieve an accuracy greater than 97%, from which we conclude that the proposed approach is effective at recognizing activities that can potentially trigger domestic accidents.

Text classification on mahout with Naïve-Bayes machine learning algorithm

2017 International Artificial Intelligence and Data Processing Symposium (IDAP) ◽

10.1109/idap.2017.8090328 ◽

2017 ◽

Cited By ~ 2

Author(s):

Mehmet Umut Salur ◽

Sezai Tokat ◽

Ibrahim Berkan Aydilek

Keyword(s):

Machine Learning ◽

Text Classification ◽

Naive Bayes ◽

Learning Algorithm ◽

Naïve Bayes ◽

Machine Learning Algorithm

Two feature weighting approaches for naive Bayes text classifiers

Knowledge-Based Systems ◽

10.1016/j.knosys.2016.02.017 ◽

2016 ◽

Vol 100 ◽

pp. 137-144 ◽

Cited By ~ 52

Author(s):

Lungan Zhang ◽

Liangxiao Jiang ◽

Chaoqun Li ◽

Ganggang Kong

Keyword(s):

Naive Bayes ◽

Naïve Bayes ◽

Feature Weighting ◽

Text Classifiers

Text Classification Based on Naive Bayes with Adjusted Weights via Frequency Ratio of Feature Words

2021 International Conference on Computer Technology and Media Convergence Design (CTMCD) ◽

10.1109/ctmcd53128.2021.00063 ◽

2021 ◽

Author(s):

Zhaoyi Guo

Keyword(s):

Text Classification ◽

Frequency Ratio ◽

Naive Bayes ◽

Naïve Bayes

Comparison of Feature Selection Optimization on Naïve Bayes for Airline Passenger Satisfaction Classification

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v5i3.3086 ◽

2021 ◽

Vol 5 (3) ◽

pp. 527-533

Author(s):

Yoga Religia ◽

Amali Amali

Keyword(s):

Feature Selection ◽

Customer Satisfaction ◽

Naive Bayes ◽

Naïve Bayes ◽

Point Of View ◽

Classification Model ◽

Passenger Satisfaction ◽

Airline Passenger ◽

Bayes Algorithm

The quality of an airline's services cannot be measured from the company's point of view; it must be seen from the point of view of customer satisfaction. Data mining techniques make it possible to predict airline customer satisfaction with a classification model. The Naïve Bayes algorithm has demonstrated outstanding classification accuracy, but its independence assumption is currently rarely discussed. Some literature suggests using attribute weighting to relax the independence assumption, which can be done through feature selection with particle swarm optimization (PSO) and a genetic algorithm (GA). This study compared PSO and GA optimization of Naïve Bayes for classifying the Airline Passenger Satisfaction data taken from www.kaggle.com. In testing, the best performance came from the Naïve Bayes model with PSO optimization, with an accuracy of 86.13%, a precision of 87.90%, a recall of 87.29%, and an AUC of 0.923.
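The attribute-weighting idea mentioned in this abstract can be sketched as a categorical Naïve Bayes whose per-feature log-likelihood terms are multiplied by weights. Restricting the weights to 0 or 1 turns weighting into feature selection, which is the search space a PSO or GA would explore. This is a minimal sketch only: the PSO/GA search loop is not shown, and the function names, add-one smoothing, and toy airline records are hypothetical rather than the study's Kaggle data or setup.

```python
import math
from collections import Counter, defaultdict


def fit_categorical_nb(X, y):
    """Count class frequencies and per-class value frequencies of each feature."""
    class_counts = Counter(y)
    values = [sorted({row[j] for row in X}) for j in range(len(X[0]))]
    counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[(c, j)][v] += 1
    return class_counts, values, counts


def predict_weighted(model, row, weights, alpha=1.0):
    """Attribute-weighted NB: log P(c) + sum_j weights[j] * log P(x_j | c).
    A weight of 0 removes feature j from the decision entirely."""
    class_counts, values, counts = model
    n = sum(class_counts.values())
    scores = {}
    for c in sorted(class_counts):
        score = math.log(class_counts[c] / n)
        for j, v in enumerate(row):
            # add-alpha smoothing over the feature's observed values
            p = (counts[(c, j)][v] + alpha) / (class_counts[c] + alpha * len(values[j]))
            score += weights[j] * math.log(p)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical records: (travel class, seat type)
X = [("business", "window"), ("business", "window"), ("business", "aisle"),
     ("economy", "aisle"), ("economy", "aisle")]
y = ["satisfied", "satisfied", "satisfied", "dissatisfied", "dissatisfied"]
model = fit_categorical_nb(X, y)
print(predict_weighted(model, ("economy", "aisle"), weights=[1, 1]))  # prints "dissatisfied"
print(predict_weighted(model, ("economy", "aisle"), weights=[0, 0]))  # prints "satisfied"
```

With every weight at 0 the prediction falls back to the class prior, which shows why zeroing a weight is equivalent to dropping that feature; a PSO or GA would evaluate many such weight vectors and keep the one with the best validation score.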

An Improved FloatBoost Algorithm for Naïve Bayes Text Classification

Advances in Web-Age Information Management - Lecture Notes in Computer Science ◽

10.1007/11563952_15 ◽

2005 ◽

pp. 162-171 ◽

Cited By ~ 2

Author(s):

Xiaoming Liu ◽

Jianwei Yin ◽

Jinxiang Dong ◽

Memon Abdul Ghafoor

Keyword(s):

Text Classification ◽

Naive Bayes ◽

Naïve Bayes

Predicting Student’s Performance Using Machine Learning Algorithm

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1209 ◽

2021 ◽

pp. 53-58

Author(s):

Sheela Rani P ◽

Dhivya S ◽

Dharshini Priya M ◽

Dharmila Chowdary A

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Prediction Model ◽

Naive Bayes ◽

Learning Algorithm ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbors

Machine learning is a new research discipline that uses knowledge to improve learning, optimize the training process, and develop the environment in which learning takes place. There are two kinds of machine learning approaches, supervised and unsupervised, which are used to extract knowledge that helps decision-makers take appropriate interventions in the future. This paper presents a model for predicting students' academic performance, and the factors that influence it, using supervised machine learning algorithms such as support vector machine, KNN (k-nearest neighbors), Naïve Bayes, and logistic regression. The results of the various algorithms are compared, showing that the support vector machine and Naïve Bayes perform well, achieving higher accuracy than the other algorithms. The final prediction model in this paper has fairly high prediction accuracy. The objective is not only to predict students' future performance but also to provide the best technique for finding the most impactful features that influence students while they study.
