Feature Selection Based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification Using Naïve Bayes

Author(s):  
Viviana Molano ◽  
Carlos Cobos ◽  
Martha Mendoza ◽  
Enrique Herrera-Viedma ◽  
Milos Manic
2009 ◽  
Vol 36 (3) ◽  
pp. 5432-5435 ◽  
Author(s):  
Jingnian Chen ◽  
Houkuan Huang ◽  
Shengfeng Tian ◽  
Youli Qu

2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Subhajit Dey Sarkar ◽  
Saptarsi Goswami ◽  
Aman Agarwal ◽  
Javed Aktar

With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the oldest and most popular classifiers. On one hand, implementation of naïve Bayes is simple and, on the other hand, this also requires fewer amounts of training data. From the literature review, it is found that naïve Bayes performs poorly compared to other classifiers in text classification. As a result, this makes the naïve Bayes classifier unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method based on firstly a univariate feature selection and then feature clustering, where we use the univariate feature selection method to reduce the search space and then apply clustering to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods like greedy search based wrapper or CFS.


2012 ◽  
Vol 433-440 ◽  
pp. 2881-2886 ◽  
Author(s):  
Run Zhi Li ◽  
Yang Sen Zhang

In this paper, we study on the problem of how to combine feature selection models in text classification ,and present a method through build the hybrid model for feature selection ,this hybrid model combined with advantage of four feature selection models (DF,MI, IG, CHI), then we use the Naive Bayes model as classifier to verify the effect of the hybrid feature selelction model ,and experiments shows that the hybrid model is correct and effective and get good performance in text classification.


2018 ◽  
Vol 5 (2) ◽  
pp. 60-67 ◽  
Author(s):  
Dwi Yulianto ◽  
Retno Nugroho Whidhiasih ◽  
Maimunah Maimunah

ABSTRACT   Banana fruit is a commodity that contributes a great value to both national and international fruit production achievement. The government through the National Standardization Agency establishes standards to maintain the quality of bananas. The purpose of this Project is to classify the stages of maturity of Ambon banana base on the color index using Naïve Bayes method in accordance with the regulations of SNI 7422:2009. Naive Bayes is used as a method in the classification process by comparing the probability values generated from the variable value of each model to determine the stage of Ambon banana maturity. The data used is the primary data image of 105 pieces of Ambon banana. By using 3 models which consists of different variables obtained the same greatest average accuracy by using the 2nd model which has 9 variable values (r, g, b, v, * a, * b, entropy, energy, and homogeneity) and the 3rd model has 7 variable values (r, g, b, v , * a, entropy and homogeneity) that is 90.48%.   Keywords: banana maturity, classification, image processing     ABSTRAK   Buah pisang merupakan komoditas yang memberikan kontribusi besar terhadap angka produksi buah nasional maupun internasional. Pemerintah melalui Badan Standarisasi Nasional menetapkan standar untuk buah pisang, menjaga mutu  buah pisang. Tujuan dari penelitian ini adalah klasifikasi tahapan kematangan dari buah pisang ambon berdasarkan indeks warna menggunakan metode Naïve Bayes  sesuai dengan SNI 7422:2009. Naive bayes digunakan sebagai metode dalam proses pengklasifikasian dengan cara membandingkan nilai probabilitas yang dihasilkan dari nilai variabel penduga setiap model untuk menentukan tahap kematangan pisang ambon. Data yang digunakan adalah data primer citra pisang ambon sebanyak 105. Dengan menggunakan 3 buah model yang terdiri dari variabel penduga yang berbeda didapatkan akurasi rata-rata terbesar yang sama yaitu dengan menggunakan model ke-2 yang mempunyai 9 nilai variabel (r, g, b, v, *a, *b, entropi, energi, dan homogenitas) dan model ke-3 yang mempunyai 7 nilai variabel (r, g, b, v, *a, entropi dan homogenitas) yaitu sebesar 90.48%.   Kata Kunci : kematangan pisang,  klasifikasi, pengolahan citra


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. When the poverty rate is still high and the country's finances are difficult to be a factor in disapproval of the relocation of the national capital. Twitter as one of the popular social media, is used by the public to express these opinions. How is the tendency of community responses related to the move of the National Capital and how to do public opinion sentiment analysis related to the move of the National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine to get the highest accuracy value is the goal in this study. Sentiment analysis data will take from public opinion using Indonesian from Twitter social media tweets in a crawling manner. Search words used are #IbuKotaBaru and #PindahIbuKota. The stages of the research consisted of collecting data through social media Twitter, polarity, preprocessing consisting of the process of transform case, cleansing, tokenizing, filtering and stemming. The use of feature selection to increase the accuracy value will then enter the ratio that has been determined to be used by data testing and training. The next step is the comparison between the Support Vector Machine and Naive Bayes methods to determine which method is more accurate. In the data period above it was found 24.26% positive sentiment 75.74% negative sentiment related to the move of a new capital city. Accuracy results using Rapid Miner software, the best accuracy value of Naive Bayes with Feature Selection is at a ratio of 9:1 with an accuracy of 88.24% while the best accuracy results Support Vector Machine with Feature Selection is at a ratio of 5:5 with an accuracy of 78.77%.


Healthcare ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 884
Author(s):  
Antonio García-Domínguez ◽  
Carlos E. Galván-Tejada ◽  
Ramón F. Brena ◽  
Antonio A. Aguileta ◽  
Jorge I. Galván-Tejada ◽  
...  

Children’s healthcare is a relevant issue, especially the prevention of domestic accidents, since it has even been defined as a global health problem. Children’s activity classification generally uses sensors embedded in children’s clothing, which can lead to erroneous measurements for possible damage or mishandling. Having a non-invasive data source for a children’s activity classification model provides reliability to the monitoring system where it is applied. This work proposes the use of environmental sound as a data source for the generation of children’s activity classification models, implementing feature selection methods and classification techniques based on Bayesian networks, focused on the recognition of potentially triggering activities of domestic accidents, applicable in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Likewise, models were generated using three classifiers: naive Bayes, semi-naive Bayes and tree-augmented naive Bayes. The generated models, combining the methods of feature selection and the classifiers used, present accuracy of greater than 97% for most of them, with which we can conclude the efficiency of the proposal of the present work in the recognition of potentially detonating activities of domestic accidents.


Sign in / Sign up

Export Citation Format

Share Document