Toward Optimal Feature Selection in Naive Bayes for Text Categorization

In many applications, one can define a large set of features to support the classification task at hand. At test time, however, these become prohibitively expensive to evaluate, and only a small subset of features is used, often selected for their information-theoretic value. For threshold-based, Naive Bayes classifiers, recent work has suggested selecting features that maximize the expected robustness of the classifier, that is, the expected probability it maintains its decision after seeing more features. We propose the first algorithm to compute this expected same-decision probability for general Bayesian network classifiers, based on compiling the network into a tractable circuit representation. Moreover, we develop a search algorithm for optimal feature selection that utilizes efficient incremental circuit modifications. Experiments on Naive Bayes, as well as more general networks, show the efficacy and distinct behavior of this decision-making approach.

Download Full-text

Model-Based Book Recommender Systems using Naïve Bayes enhanced with Optimal Feature Selection

Proceedings of the 2019 8th International Conference on Software and Computer Applications - ICSCA '19 ◽

10.1145/3316615.3316727 ◽

2019 ◽

Cited By ~ 1

Author(s):

Thi Thanh Sang Nguyen

Keyword(s):

Feature Selection ◽

Recommender Systems ◽

Naive Bayes ◽

Naïve Bayes ◽

Model Based ◽

Optimal Feature Selection ◽

Optimal Feature

Download Full-text

Improving Text Categorization by Multicriteria Feature Selection

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2005.p0570 ◽

2005 ◽

Vol 9 (5) ◽

pp. 570-575

Author(s):

Son Doan ◽

◽

Susumu Horiguchi ◽

Keyword(s):

Feature Selection ◽

Natural Language ◽

Text Categorization ◽

Naive Bayes ◽

Naïve Bayes ◽

Experimental Results ◽

Benchmark Data ◽

Bayes Algorithm

Text categorization involves assigning a natural language document to one or more predefined classes. One of the most interesting issues is feature selection. We propose an approach using multicriteria ranking of eatures, a new procedure for feature selection, and apply these to text categorization. Experimental results dealing with Reuters-21578 and 20Newsgroups benchmark data and the naive Bayes algorithm show that our proposal outperforms conventional feature selection in text categorization performance.

Download Full-text

Optimal feature selection algorithm based on quantum-inspired clone genetic strategy in text categorization

Proceedings of the first ACM/SIGEVO Summit on Genetic and Evolutionary Computation - GEC '09 ◽

10.1145/1543834.1543946 ◽

2009 ◽

Cited By ~ 2

Author(s):

Hao Chen ◽

Beiji Zou

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Optimal Feature Selection ◽

Optimal Feature

Download Full-text

A Novel Feature Selection Technique for Text Classification Using Naïve Bayes

International Scholarly Research Notices ◽

10.1155/2014/717092 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 20

Author(s):

Subhajit Dey Sarkar ◽

Saptarsi Goswami ◽

Aman Agarwal ◽

Javed Aktar

Keyword(s):

Feature Selection ◽

Text Classification ◽

Text Categorization ◽

Naive Bayes ◽

Feature Selection Method ◽

Search Space ◽

Selection Method ◽

Naïve Bayes ◽

Training Data ◽

Feature Selection Technique

With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the oldest and most popular classifiers. On one hand, implementation of naïve Bayes is simple and, on the other hand, this also requires fewer amounts of training data. From the literature review, it is found that naïve Bayes performs poorly compared to other classifiers in text classification. As a result, this makes the naïve Bayes classifier unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method based on firstly a univariate feature selection and then feature clustering, where we use the univariate feature selection method to reduce the search space and then apply clustering to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods like greedy search based wrapper or CFS.

Download Full-text

Analysis of Sentiment of Moving a National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1942 ◽

2020 ◽

Vol 4 (3) ◽

pp. 504-512

Author(s):

Faried Zamachsari ◽

Gabriel Vangeran Saragih ◽

Susafa'ati ◽

Windu Gata

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Selection ◽

Public Opinion ◽

Naive Bayes ◽

Naïve Bayes ◽

Capital City ◽

Support Vector ◽

National Capital ◽

Bayes Algorithm

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. When the poverty rate is still high and the country's finances are difficult to be a factor in disapproval of the relocation of the national capital. Twitter as one of the popular social media, is used by the public to express these opinions. How is the tendency of community responses related to the move of the National Capital and how to do public opinion sentiment analysis related to the move of the National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine to get the highest accuracy value is the goal in this study. Sentiment analysis data will take from public opinion using Indonesian from Twitter social media tweets in a crawling manner. Search words used are #IbuKotaBaru and #PindahIbuKota. The stages of the research consisted of collecting data through social media Twitter, polarity, preprocessing consisting of the process of transform case, cleansing, tokenizing, filtering and stemming. The use of feature selection to increase the accuracy value will then enter the ratio that has been determined to be used by data testing and training. The next step is the comparison between the Support Vector Machine and Naive Bayes methods to determine which method is more accurate. In the data period above it was found 24.26% positive sentiment 75.74% negative sentiment related to the move of a new capital city. Accuracy results using Rapid Miner software, the best accuracy value of Naive Bayes with Feature Selection is at a ratio of 9:1 with an accuracy of 88.24% while the best accuracy results Support Vector Machine with Feature Selection is at a ratio of 5:5 with an accuracy of 78.77%.

Download Full-text

Application of GA Feature Selection on Naive Bayes, Random Forest and SVM for Credit Card Fraud Detection

2020 International Conference on Decision Aid Sciences and Application (DASA) ◽

10.1109/dasa51403.2020.9317228 ◽

2020 ◽

Author(s):

Yakub K. Saheed ◽

Moshood A. Hambali ◽

Micheal O. Arowolo ◽

Yinusa A. Olasupo

Keyword(s):

Feature Selection ◽

Random Forest ◽

Credit Card ◽

Naive Bayes ◽

Fraud Detection ◽

Naïve Bayes ◽

Credit Card Fraud

Download Full-text

Children’s Activity Classification for Domestic Risk Scenarios Using Environmental Sound and a Bayesian Network

Healthcare ◽

10.3390/healthcare9070884 ◽

2021 ◽

Vol 9 (7) ◽

pp. 884

Author(s):

Antonio García-Domínguez ◽

Carlos E. Galván-Tejada ◽

Ramón F. Brena ◽

Antonio A. Aguileta ◽

Jorge I. Galván-Tejada ◽

...

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Activity Classification ◽

Environmental Sound ◽

Non Invasive ◽

Akaike Criterion ◽

Data Source ◽

Feature Selection Techniques

Children’s healthcare is a relevant issue, especially the prevention of domestic accidents, since it has even been defined as a global health problem. Children’s activity classification generally uses sensors embedded in children’s clothing, which can lead to erroneous measurements for possible damage or mishandling. Having a non-invasive data source for a children’s activity classification model provides reliability to the monitoring system where it is applied. This work proposes the use of environmental sound as a data source for the generation of children’s activity classification models, implementing feature selection methods and classification techniques based on Bayesian networks, focused on the recognition of potentially triggering activities of domestic accidents, applicable in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Likewise, models were generated using three classifiers: naive Bayes, semi-naive Bayes and tree-augmented naive Bayes. The generated models, combining the methods of feature selection and the classifiers used, present accuracy of greater than 97% for most of them, with which we can conclude the efficiency of the proposal of the present work in the recognition of potentially detonating activities of domestic accidents.

Download Full-text

High-Accuracy Power Quality Disturbance Classification Using the Adaptive ABC-PSO as Optimal Feature Selection Algorithm

Energies ◽

10.3390/en14051238 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1238

Author(s):

Supanat Chamchuen ◽

Apirat Siritaratiwat ◽

Pradit Fuangfoo ◽

Puripong Suthisopapan ◽

Pirat Khunkitti

Keyword(s):

Feature Selection ◽

Power Quality ◽

Distribution System ◽

Classification Accuracy ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Electrical Distribution ◽

Power Quality Disturbance ◽

Optimal Feature Selection ◽

Optimal Feature

Power quality disturbance (PQD) is an important issue in electrical distribution systems that needs to be detected promptly and identified to prevent the degradation of system reliability. This work proposes a PQD classification using a novel algorithm, comprised of the artificial bee colony (ABC) and the particle swarm optimization (PSO) algorithms, called “adaptive ABC-PSO” as the feature selection algorithm. The proposed adaptive technique is applied to a combination of ABC and PSO algorithms, and then used as the feature selection algorithm. A discrete wavelet transform is used as the feature extraction method, and a probabilistic neural network is used as the classifier. We found that the highest classification accuracy (99.31%) could be achieved through nine optimally selected features out of all 72 extracted features. Moreover, the proposed PQD classification system demonstrated high performance in a noisy environment, as well as the real distribution system. When comparing the presented PQD classification system’s performance to previous studies, PQD classification accuracy using adaptive ABC-PSO as the optimal feature selection algorithm is considered to be at a high-range scale; therefore, the adaptive ABC-PSO algorithm can be used to classify the PQD in a practical electrical distribution system.

Download Full-text

Optimal feature selection for machine learning based intrusion detection system by exploiting attribute dependence

Materials Today Proceedings ◽

10.1016/j.matpr.2021.04.643 ◽

2021 ◽

Author(s):

Ghanshyam Prasad Dubey ◽

Dr. Rakesh Kumar Bhujade

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Optimal Feature Selection ◽

Selection For ◽

Optimal Feature

Download Full-text