The Effect of Best First and Spreadsubsample on Selection of a Feature Wrapper With Naïve Bayes Classifier for The Classification of the Ratio of Inpatients

2016 ◽  
Vol 3 (2) ◽  
pp. 139-148
Author(s):  
M Rizky Wijaya ◽  
Ristu Saptono ◽  
Afrizal Doewes

Diabetes can lead to mortality and disability, so patients may need to be readmitted as inpatients to undergo further treatment. Previous research on feature selection with greedy stepwise forward search failed to predict the readmission (inpatient) class, yielding recall and precision of 0 on training splits of 60%, 75%, 80%, and 90%, and suggested handling the class imbalance in the data (6,293 readmitted records versus 64,141 not readmitted). This research aimed to determine the effect of selecting the best model using best first search instead of greedy stepwise forward search, and of data sampling with SpreadSubsample to resolve the class imbalance problem. The data consisted of 70,434 patient records from 130 US hospitals from 1999 to 2008. The methods used were best first search and SpreadSubsample. With best first search, precision was 0.4 and 0.333 on the 75% and 90% training splits, respectively, while with SpreadSubsample both precision and recall increased more substantially. SpreadSubsample therefore had a greater effect on precision and recall than best first search.
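The two techniques compared in this study can be sketched in Python. This is a minimal illustration, not the authors' Weka setup: `spread_subsample` is a hypothetical helper modeled loosely on the idea behind Weka's SpreadSubsample filter (randomly undersampling larger classes until no class exceeds a maximum spread relative to the smallest class), and `best_first_selection` is a hypothetical forward best-first search over feature subsets, where the `evaluate` callback stands in for the wrapper's Naïve Bayes evaluation (for example, cross-validated accuracy). Unlike greedy stepwise forward search, best first keeps an open list of previously generated subsets, so it can backtrack when an expansion fails to improve; here it stops after `max_stale` non-improving expansions, an assumed parameter.

```python
import heapq
import random
from collections import defaultdict
from typing import Callable, FrozenSet, Hashable, List, Sequence, Tuple


def spread_subsample(labels: Sequence[Hashable], max_spread: float = 1.0,
                     seed: int = 0) -> List[int]:
    """Return indices of a random undersample in which no class has more than
    max_spread times as many examples as the smallest class (1.0 = uniform)."""
    if max_spread < 1.0:
        raise ValueError("max_spread must be >= 1.0")
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    cap = int(min(len(idx) for idx in by_class.values()) * max_spread)
    rng = random.Random(seed)
    kept: List[int] = []
    for idx in by_class.values():
        # Keep small classes whole; randomly undersample classes above the cap.
        kept.extend(idx if len(idx) <= cap else rng.sample(idx, cap))
    return sorted(kept)


def best_first_selection(n_features: int,
                         evaluate: Callable[[FrozenSet[int]], float],
                         max_stale: int = 5) -> Tuple[FrozenSet[int], float]:
    """Forward best-first search over feature subsets (higher score is better)."""
    start: FrozenSet[int] = frozenset()
    best_subset, best_score = start, evaluate(start)
    # Heap entries are (-score, tie_breaker, subset) so the best subset pops first.
    open_list = [(-best_score, 0, start)]
    visited = {start}
    counter, stale = 1, 0
    while open_list and stale < max_stale:
        _, _, subset = heapq.heappop(open_list)
        improved = False
        for f in range(n_features):
            child = subset | {f}
            if f in subset or child in visited:
                continue
            visited.add(child)
            score = evaluate(child)
            heapq.heappush(open_list, (-score, counter, child))
            counter += 1
            if score > best_score:
                best_subset, best_score, improved = child, score, True
        # Count consecutive expansions that did not beat the best score so far.
        stale = 0 if improved else stale + 1
    return best_subset, best_score
```

With the counts above (6,293 readmitted versus 64,141 not readmitted), a spread of 1.0 would keep 6,293 records from each class.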

Author(s):  
Yarma Agustya Dewi Utami ◽  
Volvo Sihombing ◽  
Muhammad Halmi Dar

Sentiment analysis is an important and actively developing research topic. It is used to determine whether a person's opinion on a problem or object tends to be negative or positive. The main purpose of this research is to identify public sentiment toward the Full Day School policy in comments on the Facebook Page of the Ministry of Education and Culture of the Republic of Indonesia, and to evaluate the performance of the Naïve Bayes Classifier (NBC) algorithm. The results indicate that negative public sentiment toward the Full Day School policy is higher than positive or neutral sentiment. The highest accuracy, 80%, was achieved by the Naïve Bayes Classifier with trigram feature selection on the model trained with 300 training data. This simulation shows that both the amount of training data and the feature selection used in the NBC algorithm affect accuracy. Meanwhile, simulation results on 10 test data using 5 different NBC and Lexicon algorithms also show that most Facebook users who expressed opinions through comments held more negative than positive or neutral sentiment toward the Full Day School policy proposed by the Indonesian Minister of Education and Culture.
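A Naïve Bayes classifier over trigram features can be sketched as a multinomial model with Laplace smoothing. This is an illustration, not the study's implementation: it assumes word-level trigrams (the abstract does not say whether trigrams are word- or character-based), lowercased whitespace tokenization, and smoothing constant `alpha = 1.0`; the class name `NaiveBayesNgram` and the English toy sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple


def word_ngrams(text: str, n: int = 3) -> List[str]:
    """Lowercase, split on whitespace, and join each run of n tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayesNgram:
    """Multinomial Naive Bayes over word n-grams with Laplace smoothing."""

    def __init__(self, n: int = 3, alpha: float = 1.0):
        self.n = n
        self.alpha = alpha

    def fit(self, docs: List[Tuple[str, str]]) -> "NaiveBayesNgram":
        self.class_docs = Counter(label for _, label in docs)
        self.n_docs = len(docs)
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        for text, label in docs:
            self.counts[label].update(word_ngrams(text, self.n))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.class_docs}
        return self

    def log_posterior(self, text: str, label: str) -> float:
        # log P(class) + sum of log P(n-gram | class), smoothed by alpha.
        score = math.log(self.class_docs[label] / self.n_docs)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for gram in word_ngrams(text, self.n):
            score += math.log((self.counts[label][gram] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_docs, key=lambda c: self.log_posterior(text, c))


train = [
    ("the policy is very good for students", "positive"),
    ("the policy is very bad for students", "negative"),
    ("school all day is very bad", "negative"),
]
model = NaiveBayesNgram().fit(train)
```

Working in log space avoids underflow when many n-gram probabilities are multiplied.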


Author(s):  
Joshua G. McNeil ◽  
Brian Y. Lattimer

Robotic firefighting is an area of increasing focus as a way of limiting firefighters' exposure to hazardous environments. A suppression system must incorporate multiple functionalities to allow closed-loop firefighting control. One area of development is classifying water spray as a way of correcting errors between suppressant placement and fire location. An IR vision system capable of identifying water is presented. Image segmentation is performed, followed by a process that classifies regions of interest as water or non-water objects. A probabilistic classification method, the Naïve Bayes classifier, was applied to a varied dataset covering different water temperatures and sprays. Objects were segmented using frame differencing with image intensity and difference thresholds. Segments were manually labeled to create a training dataset. On a separate test dataset, the classifier's precision, recall, F-measure, and G-measure for classifying water objects ranged from 86.1% to 97.4%, with water classification alone achieving 94.2% to 97.4% accuracy.
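The segmentation step (frame differencing with an intensity threshold and a difference threshold, followed by extracting regions of interest) can be sketched in plain Python. This is an illustration under assumptions, not the authors' pipeline: frames are hypothetical 2-D lists of IR intensities, a pixel is kept when it is at least `intensity_thresh` and has changed by at least `diff_thresh` since the previous frame, and regions are 4-connected components. Because water at different temperatures may appear warmer or cooler than the background, a real system might apply the intensity threshold differently; the resulting regions would then be passed to the Naïve Bayes classifier.

```python
from collections import deque
from typing import List, Tuple

Frame = List[List[float]]


def segment_by_frame_difference(prev: Frame, curr: Frame,
                                intensity_thresh: float,
                                diff_thresh: float) -> List[List[bool]]:
    """Mark pixels that are bright enough now AND changed enough since prev.
    Assumption: 'bright enough' means intensity >= intensity_thresh."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same shape")
    return [
        [c >= intensity_thresh and abs(c - p) >= diff_thresh
         for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]


def connected_regions(mask: List[List[bool]]) -> List[List[Tuple[int, int]]]:
    """Group foreground pixels into 4-connected regions via breadth-first search."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions
```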


Author(s):  
Р.И. Кузьмич ◽  
А.А. Ступина ◽  
М.И. Цепкова ◽  
С.Н. Ежеманская

An approach is proposed for selecting important features in the classification of observations. The approach builds logical rules (patterns) using the method of logical analysis of data and takes into account how frequently each feature is used when forming these patterns for a specific classification task.
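The idea of ranking features by how often they appear in logical patterns can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes already-binarized observations, enumerates by brute force all "pure" positive patterns (conjunctions of up to `max_degree` literals that cover at least one positive and no negative observation), and then counts feature occurrences. Practical logical analysis of data implementations usually restrict to prime patterns and use more efficient generation; the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from typing import List, Sequence, Tuple

# A pattern is a conjunction of literals: ((feature_index, required_value), ...)
Pattern = Tuple[Tuple[int, int], ...]


def covers(pattern: Pattern, obs: Sequence[int]) -> bool:
    """True if the binary observation satisfies every literal of the pattern."""
    return all(obs[f] == v for f, v in pattern)


def pure_patterns(positives: List[Sequence[int]], negatives: List[Sequence[int]],
                  max_degree: int = 2) -> List[Pattern]:
    """Enumerate conjunctions of up to max_degree literals that cover at least
    one positive and no negative observation."""
    n_features = len(positives[0])
    found: List[Pattern] = []
    for degree in range(1, max_degree + 1):
        for feats in combinations(range(n_features), degree):
            for values in product((0, 1), repeat=degree):
                pattern = tuple(zip(feats, values))
                if (any(covers(pattern, x) for x in positives)
                        and not any(covers(pattern, x) for x in negatives)):
                    found.append(pattern)
    return found


def feature_frequency(patterns: List[Pattern]) -> Counter:
    """Count how many patterns use each feature; higher counts rank first."""
    return Counter(f for pattern in patterns for f, _ in pattern)
```

Calling `feature_frequency(...).most_common()` then yields the features ordered by how often the patterns rely on them.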


Author(s):  
A. A. Artemyev ◽  
E. A. Kazachkov ◽  
S. N. Matyugin ◽  
V. V. Sharonov

This paper considers the problem of classifying surface water objects, e.g. ships of different classes, in visible-spectrum images using convolutional neural networks. A technique for building a database of images of surface water objects and a dedicated training dataset for the classifier is presented. A method for constructing and training a convolutional neural network is described. The dependence of the probability of correct recognition on the number of classes and on which specific classes of surface water objects are selected is analysed. Results of recognizing different sets of classes are presented.
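The probability of correct recognition analysed here can be estimated from a confusion matrix. This is a minimal sketch with hypothetical helper names; it assumes rows are true classes and columns are predicted classes, and the example matrix is made up for illustration. Comparing these values across networks trained on different class sets is one way to study how the number and choice of classes affect recognition.

```python
from typing import List, Sequence


def correct_recognition_probability(confusion: Sequence[Sequence[int]]) -> float:
    """Overall fraction of test images assigned to their true class
    (sum of the diagonal divided by the total count)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


def per_class_recognition(confusion: Sequence[Sequence[int]]) -> List[float]:
    """For each true class (row), the fraction recognized correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(confusion)]


# Hypothetical two-class example: rows = true class, columns = predicted class.
example = [[8, 2],
           [1, 9]]
```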


Kilat ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 103-114
Author(s):  
Arini - Arini ◽  
Luh Kesuma Wardhani ◽  
Dimas - Octaviano

Ahead of the 2019 election year, many mass campaigns were conducted through social media networks, one of them Twitter. One online campaign that is very popular among the public is the campaign with the hashtag #2019GantiPresiden. Sentiment analysis of the #2019GantiPresiden hashtag requires a classifier and a robust feature selection method that achieve high accuracy. One such combination is the Naive Bayes Classifier (NBC) with Character Tri-Gram or Term-Frequency feature selection, which produced fairly high accuracy in previous research. The purpose of this study was to implement the Naive Bayes Classifier (NBC) algorithm with each of these feature selection methods and to compare the accuracy obtained with each. The authors used observation to collect data and ran simulations. Using 1,000 tweets from the hashtag #2019GantiPresiden collected on 15 September 2018, the authors divided the data into two sets, 950 tweets as training data and 50 tweets as test data, with labels assigned using a Lexicon-Based sentiment method. The study showed that the accuracy of the Naïve Bayes Classifier (NBC) was 76% with Character Tri-Gram feature selection and 74% with Term-Frequency feature selection, indicating that Character Tri-Gram feature selection performed better than Term-Frequency.
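The two feature representations compared in the study can be illustrated as follows. This is a minimal sketch with hypothetical helpers (`char_trigrams`, `term_frequency`, `shared_features`), not the authors' implementation; lowercasing and whitespace tokenization are assumptions. It also shows one plausible (unverified) reason character trigrams can help on tweets: a misspelled or elongated word still shares most of its trigrams with the standard spelling, whereas whole-term features do not match at all.

```python
from collections import Counter


def char_trigrams(text: str) -> Counter:
    """Count every run of 3 consecutive characters (spaces included)."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def term_frequency(text: str) -> Counter:
    """Count whitespace-separated terms."""
    return Counter(text.lower().split())


def shared_features(a: Counter, b: Counter) -> int:
    """Size of the multiset intersection of two feature counts."""
    return sum((a & b).values())
```

For example, "presiden" and the variant "presidenn" share 6 character trigrams but no whole terms.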


2014 ◽  
Vol 543-547 ◽  
pp. 3614-3620
Author(s):  
Zhi Qiang Li ◽  
De Quan Yang ◽  
Yuan Tan ◽  
Yuan Ping Zou

For attribute-weighted naive Bayesian classification algorithms, the choice of weights directly affects the classification results. On this basis, the drawbacks of TF-IDF feature selection for sentiment classification of microblogs are analyzed, and an improved algorithm named TF-D(t)-CHI is proposed, which uses statistical calculation to obtain the degree of correlation between feature words and classes. It characterizes the distribution of each feature item across classes by its variance, which addresses the problem that short texts contain few feature words while high-frequency feature words receive excessively high weights. Experimental results indicate that naive Bayesian classification using TF-D(t)-CHI for feature selection and weight calculation gives better results in sentiment classification of microblogs.
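The CHI component named in TF-D(t)-CHI is presumably the standard chi-square statistic for term-class association, and the abstract says the distribution of a feature across classes is measured by variance. The sketch below shows only these two standard building blocks with hypothetical helper names; how the paper combines TF, D(t), and CHI into a single weight is not given here, so no combined formula is attempted. In the 2x2 table, A counts documents in class c containing term t, B documents outside c containing t, C documents in c without t, and D documents outside c without t, giving chi-square = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)).

```python
import statistics
from typing import Sequence


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square association between a term and a class from a 2x2 table:
    a = in class & has term, b = not in class & has term,
    c = in class & lacks term, d = not in class & lacks term."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def class_distribution_variance(per_class_counts: Sequence[float]) -> float:
    """Population variance of a term's frequency across classes; an uneven
    spread (high variance) suggests the term discriminates between classes."""
    return float(statistics.pvariance(per_class_counts))
```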


2017 ◽  
Vol 4 (2) ◽  
pp. 179-190
Author(s):  
Wandha Budhi Trihanto ◽  
Riza Arifudin ◽  
Much Aziz Muslim

Journals are a relevant form of serial literature that can support researchers in their work. Journals are now available to library users in two formats: print and digital. However, the growing number of published journals has not been accompanied by a corresponding growth in the information and knowledge retrieved from these documents. TF-IDF is a fast and efficient text mining method for extracting informative words from a document. It combines two weighting concepts: the frequency with which a word appears in a particular document and the inverse frequency of the documents that contain the word. The journal titles are then analyzed with the Naïve Bayes Classifier method. The purpose of the research is to build a web-based information retrieval system that helps classify Indonesian journal titles and identify trends among them. The resulting system classifies journal titles in the Indonesian language with an accuracy of 90.6% and an error rate of 9.4%. The most frequent category, and thus the dominant trend among the classified titles, was decision support systems, at 24.7%.
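The two weighting concepts described above can be combined as in the following sketch. It uses one common TF-IDF variant (term count divided by document length, times the natural log of N over document frequency); the abstract does not specify the exact variant, so this choice, the function name `tf_idf`, and the sample tokenized titles are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term in each tokenized document by tf * idf, where
    tf = count / document length and idf = ln(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({t: (c / length) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return weights


titles = [["sistem", "pendukung", "keputusan"], ["sistem", "informasi"]]
```

A term that appears in every title, such as "sistem" here, receives weight 0, while rarer terms such as "keputusan" receive positive weights.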

