Feature Selection Based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification Using Naïve Bayes

Feature selection for text classification with Naïve Bayes

Expert Systems with Applications ◽

10.1016/j.eswa.2008.06.054 ◽

2009 ◽

Vol 36 (3) ◽

pp. 5432-5435 ◽

Cited By ~ 261

Author(s):

Jingnian Chen ◽

Houkuan Huang ◽

Shengfeng Tian ◽

Youli Qu

Keyword(s):

Feature Selection ◽

Text Classification ◽

Naive Bayes ◽

Naïve Bayes ◽

Selection For

A new feature selection score for multinomial naive Bayes text classification based on KL-divergence

10.3115/1219044.1219068 ◽

2004 ◽

Cited By ~ 13

Author(s):

Karl-Michael Schneider

Keyword(s):

Feature Selection ◽

Text Classification ◽

Naive Bayes ◽

Naïve Bayes ◽

Kl Divergence ◽

New Feature

Feature Selection Approach for Twitter Sentiment Analysis and Text Classification Based on Chi-Square and Naïve Bayes

Advances in Intelligent Systems and Computing - International Conference on Applications and Techniques in Cyber Security and Intelligence ATCI 2018 ◽

10.1007/978-3-319-98776-7_30 ◽

2018 ◽

pp. 281-298 ◽

Cited By ~ 1

Author(s):

S. Paudel ◽

P. W. C. Prasad ◽

Abeer Alsadoon ◽

MD. Rafiqul Islam ◽

Amr Elchouemi

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Text Classification ◽

Naive Bayes ◽

Naïve Bayes ◽

Chi Square ◽

Selection Approach ◽

Feature Selection Approach

Discrimination-Based Feature Selection for Multinomial Naïve Bayes Text Classification

Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead - Lecture Notes in Computer Science ◽

10.1007/11940098_15 ◽

2006 ◽

pp. 149-156 ◽

Cited By ~ 1

Author(s):

Jingbo Zhu ◽

Huizhen Wang ◽

Xijuan Zhang

Keyword(s):

Feature Selection ◽

Text Classification ◽

Naive Bayes ◽

Naïve Bayes ◽

Selection For

A Novel Feature Selection Technique for Text Classification Using Naïve Bayes

International Scholarly Research Notices ◽

10.1155/2014/717092 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 20

Author(s):

Subhajit Dey Sarkar ◽

Saptarsi Goswami ◽

Aman Agarwal ◽

Javed Aktar

Keyword(s):

Feature Selection ◽

Text Classification ◽

Text Categorization ◽

Naive Bayes ◽

Feature Selection Method ◽

Search Space ◽

Selection Method ◽

Naïve Bayes ◽

Training Data ◽

Feature Selection Technique

With the proliferation of unstructured data, text classification (or text categorization) has found many applications, including topic classification, sentiment analysis, authorship identification, and spam detection. Many classification algorithms are available, and naïve Bayes remains one of the oldest and most popular. It is simple to implement and requires relatively little training data. The literature, however, reports that naïve Bayes performs poorly compared with other classifiers in text classification. This makes the naïve Bayes classifier unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: a univariate feature selection step first reduces the search space, and feature clustering is then applied to select relatively independent feature sets. We demonstrate the effectiveness of the method through a thorough evaluation and comparison over 13 datasets. The resulting performance improvement makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform traditional methods such as greedy-search-based wrappers and CFS.
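
The two-step idea in this abstract (univariate filtering, then picking relatively independent features) can be illustrated with a small sketch. This is not the authors' algorithm: the chi-square score, the greedy correlation check used as a stand-in for feature clustering, the function names, the thresholds, and the toy data are all assumptions made for illustration.

```python
from math import sqrt

def chi2_score(feature, labels):
    """Chi-square statistic of one binary feature against binary labels."""
    n = len(labels)
    a = sum(1 for f, y in zip(feature, labels) if f and y)      # present, positive
    b = sum(1 for f, y in zip(feature, labels) if f and not y)  # present, negative
    c = sum(1 for f, y in zip(feature, labels) if not f and y)  # absent, positive
    d = n - a - b - c                                           # absent, negative
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def two_step_select(columns, labels, top_k, max_corr):
    """Step 1: keep the top_k features by chi-square score.
    Step 2 (hypothetical stand-in for clustering): walk the ranked list and
    keep a feature only if it is weakly correlated with every feature kept so far."""
    scores = {name: chi2_score(col, labels) for name, col in columns.items()}
    ranked = sorted(scores, key=lambda name: (-scores[name], name))[:top_k]
    selected = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[s])) < max_corr for s in selected):
            selected.append(name)
    return selected

# Made-up term-presence columns for six documents (1 = term occurs).
columns = {
    "goal":  [1, 1, 1, 0, 0, 0],
    "match": [1, 1, 1, 0, 0, 0],  # redundant copy of "goal"
    "vote":  [0, 0, 1, 1, 1, 1],
    "the":   [1, 1, 1, 1, 1, 1],  # uninformative, occurs everywhere
}
labels = [1, 1, 1, 0, 0, 0]
selected = two_step_select(columns, labels, top_k=3, max_corr=0.9)
print(selected)
```

In this toy run the uninformative term is dropped by the univariate step, and the redundant term is dropped by the second step because it duplicates a feature that was already kept. Removing strongly dependent features is relevant to naïve Bayes because the classifier assumes features are conditionally independent given the class.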

Study on the Method of Feature Selection Based on Hybrid Model for Text Classification

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.433-440.2881 ◽

2012 ◽

Vol 433-440 ◽

pp. 2881-2886 ◽

Cited By ~ 2

Author(s):

Run Zhi Li ◽

Yang Sen Zhang

Keyword(s):

Feature Selection ◽

Hybrid Model ◽

Text Classification ◽

Naive Bayes ◽

Naïve Bayes ◽

Selection Models ◽

Bayes Model ◽

Naïve Bayes Model

In this paper, we study how to combine feature selection models in text classification and present a method that builds a hybrid model for feature selection. The hybrid model combines the advantages of four feature selection models (DF, MI, IG, and CHI). We then use a Naive Bayes model as the classifier to verify the effect of the hybrid feature selection model. Experiments show that the hybrid model is correct and effective and achieves good performance in text classification.
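
The four scores named in this abstract can all be computed from a 2x2 term/class contingency table. The sketch below uses their common textbook definitions (document frequency, pointwise mutual information, information gain, chi-square). The abstract does not say how the paper combines them, so the mean-rank fusion in `hybrid_rank`, the function names, and the counts are assumptions for illustration only.

```python
from math import log2

def entropy(*counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c) if total else 0.0

def term_scores(a, b, c, d):
    """Scores for one term and one class from a 2x2 table:
    a = docs in class with term,    b = docs outside class with term,
    c = docs in class without term, d = docs outside class without term."""
    n = a + b + c + d
    df = a + b                                                    # document frequency
    mi = log2(a * n / ((a + c) * (a + b))) if a else float("-inf")  # pointwise MI
    ig = (entropy(a + c, b + d)                                   # class entropy
          - (df / n) * entropy(a, b)                              # minus entropy given term
          - ((c + d) / n) * entropy(c, d))                        # minus entropy given no term
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    chi = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return {"DF": df, "MI": mi, "IG": ig, "CHI": chi}

def hybrid_rank(tables):
    """Hypothetical fusion rule: order terms by their mean rank over the four scores."""
    scores = {t: term_scores(*tab) for t, tab in tables.items()}
    mean_rank = {t: 0.0 for t in tables}
    for metric in ("DF", "MI", "IG", "CHI"):
        ordered = sorted(tables, key=lambda t: (-scores[t][metric], t))
        for rank, t in enumerate(ordered):
            mean_rank[t] += rank / 4
    return sorted(tables, key=lambda t: (mean_rank[t], t))

# Made-up counts for 100 documents, 50 of them in the target class.
tables = {
    "goal": (40, 5, 10, 45),  # frequent and class-specific
    "the":  (50, 50, 0, 0),   # occurs in every document
    "rare": (1, 0, 49, 50),   # occurs once, in the target class
}
ranking = hybrid_rank(tables)
print(ranking)
```

On these counts DF alone ranks the ubiquitous term first and MI alone ranks the one-off term first, while the fused ranking puts the class-specific term on top, which is the kind of complementary behavior a hybrid model aims to exploit.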

Klasifikasi Tahap Kematangan Pisang Ambon Berdasarkan Warna Menggunakan Naive Bayes [Classification of Ambon Banana Ripeness Stages Based on Color Using Naive Bayes]

PIKSEL : Penelitian Ilmu Komputer Sistem Embedded and Logic ◽

10.33558/piksel.v5i2.268 ◽

2018 ◽

Vol 5 (2) ◽

pp. 60-67 ◽

Cited By ~ 1

Author(s):

Dwi Yulianto ◽

Retno Nugroho Whidhiasih ◽

Maimunah Maimunah

Keyword(s):

Naive Bayes ◽

Fruit Production ◽

Naïve Bayes ◽

Primary Data ◽

Banana Fruit ◽

Bayes Method ◽

Classification Image ◽

Average Accuracy ◽

The Government

Banana fruit is a commodity that contributes substantially to national and international fruit production. The government, through the National Standardization Agency, establishes standards to maintain the quality of bananas. The purpose of this project is to classify the maturity stages of Ambon bananas based on the color index using the Naïve Bayes method, in accordance with SNI 7422:2009. Naive Bayes is used in the classification process by comparing the probability values generated from the predictor variable values of each model to determine the maturity stage of an Ambon banana. The data are primary data consisting of 105 images of Ambon bananas. Of the 3 models, each built from different predictor variables, two achieved the same highest average accuracy of 90.48%: the 2nd model, which has 9 variables (r, g, b, v, *a, *b, entropy, energy, and homogeneity), and the 3rd model, which has 7 variables (r, g, b, v, *a, entropy, and homogeneity). Keywords: banana maturity, classification, image processing
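
The classification step described here, comparing per-class probability values computed from numeric predictor variables, can be sketched as follows. The abstract does not state which likelihood model was used, so the Gaussian likelihood, the function names, and the three-value (r, g, b) toy samples are assumptions and not the study's data or implementation.

```python
from math import log, pi

def fit_gaussian_nb(samples, labels):
    """Estimate, per class, the prior and each feature's mean and variance."""
    model = {}
    for cls in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A tiny constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[cls] = (n / len(labels), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        log_likelihood = sum(
            -0.5 * log(2 * pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        return log(prior) + log_likelihood
    return max(model, key=log_posterior)

# Made-up mean (r, g, b) values per image; not taken from the paper.
samples = [(90, 160, 60), (100, 170, 70), (95, 150, 65),
           (210, 190, 80), (220, 200, 90), (215, 195, 70)]
labels = ["unripe", "unripe", "unripe", "ripe", "ripe", "ripe"]

model = fit_gaussian_nb(samples, labels)
print(predict(model, (98, 165, 66)), predict(model, (212, 198, 85)))
```

Further variables such as entropy or homogeneity would simply be additional entries in each feature tuple; each contributes one more term to the log-likelihood sum.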

Analysis of Sentiment of Moving a National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1942 ◽

2020 ◽

Vol 4 (3) ◽

pp. 504-512

Author(s):

Faried Zamachsari ◽

Gabriel Vangeran Saragih ◽

Susafa'ati ◽

Windu Gata

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Selection ◽

Public Opinion ◽

Naive Bayes ◽

Naïve Bayes ◽

Capital City ◽

Support Vector ◽

National Capital ◽

Bayes Algorithm

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's difficult finances are factors in disapproval of the relocation of the national capital. Twitter, as one of the popular social media platforms, is used by the public to express these opinions. This study aims to determine the tendency of community responses to the move of the national capital and to carry out sentiment analysis of public opinion on the move using feature selection with the Naive Bayes algorithm and Support Vector Machine, in order to obtain the highest accuracy. The sentiment analysis data are Indonesian-language public opinions collected by crawling tweets from Twitter. The search terms used are #IbuKotaBaru and #PindahIbuKota. The research stages consist of collecting data from Twitter, polarity labeling, and preprocessing, which comprises case transformation, cleansing, tokenizing, filtering, and stemming. Feature selection is applied to increase accuracy, and the data are then split into training and testing sets at predetermined ratios. The next step is a comparison between the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data period studied, 24.26% of the sentiment was positive and 75.74% was negative regarding the move to a new capital city. Using RapidMiner software, the best accuracy of Naive Bayes with feature selection was 88.24% at a ratio of 9:1, while the best accuracy of Support Vector Machine with feature selection was 78.77% at a ratio of 5:5.
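
The evaluation described here, training a Naive Bayes text classifier on one part of the tokenized tweets and measuring accuracy on the rest, can be sketched in plain Python. The study itself used RapidMiner, so this is not its pipeline: the multinomial model with add-one smoothing, the unshuffled ordered split, the function names, and the English toy tokens are all assumptions for illustration.

```python
from collections import Counter, defaultdict
from math import log

def train_multinomial_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def classify(model, tokens):
    """Pick the class with the highest log-posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())

    def score(cls):
        total_words = sum(word_counts[cls].values())
        s = log(class_docs[cls] / total_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                s += log((word_counts[cls][w] + 1) / (total_words + len(vocab)))
        return s

    return max(class_docs, key=score)

def holdout_accuracy(docs, labels, train_fraction):
    """Train on the first train_fraction of the data, test on the remainder."""
    cut = int(len(docs) * train_fraction)
    model = train_multinomial_nb(docs[:cut], labels[:cut])
    test = list(zip(docs[cut:], labels[cut:]))
    return sum(classify(model, d) == y for d, y in test) / len(test)

# Made-up, already preprocessed tokens; not data from the study.
docs = [["support", "new", "capital"], ["reject", "costly", "move"],
        ["good", "new", "capital"], ["reject", "move", "poverty"],
        ["costly", "poverty"], ["support", "move"],
        ["reject", "costly"], ["support", "capital"]]
labels = ["positive", "negative", "positive", "negative",
          "negative", "positive", "negative", "positive"]

accuracy = holdout_accuracy(docs, labels, train_fraction=0.75)
print(accuracy)
```

A 9:1 ratio as reported in the abstract would correspond to `train_fraction=0.9` in this sketch. On real data the documents would normally be shuffled before splitting, and accuracy would be compared across several ratios.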

Application of GA Feature Selection on Naive Bayes, Random Forest and SVM for Credit Card Fraud Detection

2020 International Conference on Decision Aid Sciences and Application (DASA) ◽

10.1109/dasa51403.2020.9317228 ◽

2020 ◽

Author(s):

Yakub K. Saheed ◽

Moshood A. Hambali ◽

Micheal O. Arowolo ◽

Yinusa A. Olasupo

Keyword(s):

Feature Selection ◽

Random Forest ◽

Credit Card ◽

Naive Bayes ◽

Fraud Detection ◽

Naïve Bayes ◽

Credit Card Fraud

Children’s Activity Classification for Domestic Risk Scenarios Using Environmental Sound and a Bayesian Network

Healthcare ◽

10.3390/healthcare9070884 ◽

2021 ◽

Vol 9 (7) ◽

pp. 884

Author(s):

Antonio García-Domínguez ◽

Carlos E. Galván-Tejada ◽

Ramón F. Brena ◽

Antonio A. Aguileta ◽

Jorge I. Galván-Tejada ◽

...

Keyword(s):

Feature Selection ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Model ◽

Activity Classification ◽

Environmental Sound ◽

Non Invasive ◽

Akaike Criterion ◽

Data Source ◽

Feature Selection Techniques

Children’s healthcare is a relevant issue, especially the prevention of domestic accidents, which has even been defined as a global health problem. Children’s activity classification generally uses sensors embedded in children’s clothing, which can lead to erroneous measurements due to possible damage or mishandling. A non-invasive data source for a children’s activity classification model adds reliability to the monitoring system in which it is applied. This work proposes the use of environmental sound as a data source for generating children’s activity classification models, implementing feature selection methods and classification techniques based on Bayesian networks, focused on recognizing activities that can potentially trigger domestic accidents, for use in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Models were generated using three classifiers: naive Bayes, semi-naive Bayes, and tree-augmented naive Bayes. Most of the generated models, combining the feature selection methods and the classifiers, achieve accuracy greater than 97%, from which we conclude that the proposed approach is effective in recognizing activities that can potentially trigger domestic accidents.
