NAIVE BAYES CLASSIFIER, DECISION TREE AND ADABOOST ENSEMBLE ALGORITHM – ADVANTAGES AND DISADVANTAGES

Author(s):  
Neli Kalcheva ◽  
Maya Todorova ◽  
Ginka Marinova ◽  
...  

The purpose of the publication is to analyse popular classification algorithms in machine learning. The following classifiers were studied: Naive Bayes Classifier, Decision Tree and AdaBoost Ensemble Algorithm. Their advantages and disadvantages are discussed. Research shows that there is no comprehensive universal method or algorithm for classification in machine learning. Each method or algorithm works well depending on the specifics of the task and the data used.

2020 ◽  
Vol 17 (1) ◽  
pp. 37-42
Author(s):  
Yuris Alkhalifi ◽  
Ainun Zumarniansyah ◽  
Rian Ardianto ◽  
Nila Hardi ◽  
Annisa Elfina Augustia

Non-Cash Food Assistance, or Bantuan Pangan Non-Tunai (BPNT), is government food assistance given to Beneficiary Families (KPM) every month through an electronic account that can be used only to buy food at the Electronic Shop of the Mutual Assistance Joint Business Group of the Hope Family Program (e-Warong KUBE PKH) or from food traders working with Bank Himbara. Distribution of BPNT remains problematic for village officials, in particular those of Desa Wanasari, who must decide which households are eligible to receive it (poor) and which are not (not poor). Data mining can support these decisions. This study compares two algorithms, the Naive Bayes Classifier and the C4.5 Decision Tree. The sample consists of data on 200 heads of household, split for validation into 90% training data and 10% test data. The proposed model was built in RapidMiner and evaluated with a confusion matrix to determine which of the two methods is more accurate. The Naive Bayes Classifier reached an accuracy of 98.89% and the C4.5 Decision Tree 95.00%. In this study, therefore, the Naive Bayes Classifier is the more accurate algorithm, by a margin of 3.89 percentage points.
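The evaluation step described above can be sketched as follows. This is a minimal illustration of how accuracy, precision and recall are read off a binary confusion matrix, not the study's RapidMiner workflow. The class name `ConfusionMatrix` and the counts are hypothetical, since the abstract does not report the underlying matrix; only the two accuracy figures come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary task; 'poor' is treated as the positive class."""
    tp: int  # poor households predicted poor
    fp: int  # not-poor households predicted poor
    fn: int  # poor households predicted not poor
    tn: int  # not-poor households predicted not poor

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)


# Hypothetical counts for a 20-record test set (10% of 200 records).
# These are NOT the study's confusion matrix, which the abstract does not give.
example = ConfusionMatrix(tp=9, fp=1, fn=0, tn=10)
print(f"accuracy={example.accuracy():.2%}")

# Gap between the two accuracies reported in the abstract, in percentage points.
gap = round(98.89 - 95.00, 2)
print(f"gap={gap} percentage points")
```

With a test set this small, each misclassified record moves accuracy by five percentage points, which is worth keeping in mind when comparing the two classifiers.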


Author(s):  
Mingtao Wu ◽  
Vir V. Phoha ◽  
Young B. Moon ◽  
Amith K. Belman

3D printing, or additive manufacturing, is a key technology for future manufacturing systems. However, 3D printing systems have a unique vulnerability: the infill can be altered without affecting the exterior. To detect malicious infill defects in the 3D printing process, this paper 1) investigates malicious defects in the 3D printing process, 2) extracts features from simulated 3D printing process images, and 3) conducts an image classification experiment with one group of non-defect infill images and one group of defect infill training images from the 3D printing process. The images are captured layer by layer from the top view of the software simulation preview. The data extracted from the images are input to two machine learning algorithms, the Naive Bayes Classifier and J48 Decision Trees. The results show that the Naive Bayes Classifier achieves a classification accuracy of 85.26% and J48 Decision Trees an accuracy of 95.51%.


2017 ◽  
Vol 5 (8) ◽  
pp. 260-266
Author(s):  
Subhankar Manna ◽  
Malathi G.

The healthcare industry collects a huge amount of unclassified data every day. For effective diagnosis and decision making, hidden patterns in these data need to be discovered. One such dataset is associated with a group of metabolic diseases that vary greatly in their range of attributes. The objective of this paper is to classify the diabetes dataset using classification techniques such as Naive Bayes, ID3 and k-means classification. The secondary objective is to study the performance of the classification algorithms used in this work. We propose to implement the classification algorithms using an R package. This work uses the Diabetes 130-US hospitals for years 1999-2008 Data Set, imported from the UCI Machine Learning Repository. Motivation/Background: Naïve Bayes is a probabilistic classifier based on Bayes' theorem. It provides a useful perspective for understanding many algorithms. In this paper, the Bayesian algorithm shows high accuracy when applied to the diabetes dataset. It assumes that the variables are independent of each other. We also construct a decision tree from the diabetes dataset: an attribute is selected for testing at each node of the tree-like graph, each branch represents an outcome of the test, and each leaf node holds a class label. This technique separates observations into branches to construct the tree, splitting it recursively in a process called recursive partitioning. Decision trees are widely used in various areas because they are adequate for the distribution of a dataset. For example, the ID3 (decision tree) algorithm yields a result indicating whether or not a patient has diabetes. Method: We use Naïve Bayes for probabilistic classification and ID3 for the decision tree. Results: The dataset relates to diabetes. It has 18 columns, such as Races, Gender, Take_metformin, Take_repaglinide, Insulin, Body_mass_index and Self_reported_health, and 623 rows.
The Naive Bayes Classifier algorithm is used to obtain the probability of having diabetes or not. Here Diabetes is the class attribute of the diabetes dataset. It takes two values, “Yes” and “No”, and the dataset holds personal information about the patient, such as Races, Gender, Take_metformin, Take_repaglinide, Insulin, Body_mass_index and Self_reported_health. For each attribute value we obtain one probability for “Yes” and one for “No”. For example, for Gender, Female has 0.4964 for “No” and 0.5581 for “Yes”, and Male has 0.5035 for “No” and 0.4418 for “Yes”. Conclusions: In this paper two algorithms were implemented, the Naive Bayes Classifier algorithm and the ID3 algorithm. The Naive Bayes Classifier algorithm was used to predict the probability of having diabetes, and the ID3 algorithm to generate a decision tree.
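To make the Gender example concrete, the sketch below applies Bayes' rule to the four figures quoted in the abstract. It assumes those figures are class-conditional probabilities P(Gender | Diabetes), which is plausible because they sum to roughly 1 within each class, but the abstract does not say so explicitly. The class priors are not reported, so the equal priors used here are purely hypothetical, and the function name `posterior` is illustrative.

```python
# Figures quoted in the abstract, read as P(Gender | Diabetes class).
likelihood = {
    "Female": {"No": 0.4964, "Yes": 0.5581},
    "Male": {"No": 0.5035, "Yes": 0.4418},
}

# Hypothetical priors: the abstract does not report P(Diabetes).
equal_prior = {"No": 0.5, "Yes": 0.5}


def posterior(gender: str, prior: dict) -> dict:
    """Return P(class | gender) via Bayes' rule for a single attribute."""
    unnormalised = {c: prior[c] * likelihood[gender][c] for c in prior}
    evidence = sum(unnormalised.values())
    return {c: value / evidence for c, value in unnormalised.items()}


female = posterior("Female", equal_prior)
male = posterior("Male", equal_prior)
print("Female:", female)
print("Male:", male)
```

With several attributes, the naive independence assumption means the per-attribute likelihoods are simply multiplied together before normalising.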


With the growing volume of spam messages, effective methods for spam detection are in demand. The growth of mobile phones and smartphones has led to a drastic increase in SMS spam. The advancement and the clean process of the mobile messaging channel have attracted hackers to carry out attacks through SMS messages. This leads to fraudulent use of other people's accounts and transactions, resulting in loss of service and profit to the owners. Against this background, this paper focuses on predicting spam SMS messages. The SMS Spam Message Detection dataset from the Kaggle machine learning repository is used for the prediction analysis. The analysis of spam message detection is carried out in four ways. Firstly, the distribution of the target variable, Spam Type, in the dataset is identified and represented graphically. Secondly, the top word features of the spam and ham messages are extracted using Count Vectorizer and displayed as spam and ham word clouds. Thirdly, the extracted count-vectorized features of the SMS Spam Message Detection dataset are fitted to various classifiers: KNN, Random Forest, Linear SVM, AdaBoost, Kernel SVM, Logistic Regression, Gaussian Naive Bayes, Decision Tree, Extra Tree, Gradient Boosting and Multinomial Naive Bayes. Finally, performance is analyzed using the metrics accuracy, F-score, precision and recall. The implementation is done in Python with Spyder under Anaconda Navigator. Experimental results show that the Multinomial Naive Bayes classifier achieves effective prediction, with a precision of 0.98, recall of 0.98, F-score of 0.98 and accuracy of 98.20%.
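The combination of word counts and Multinomial Naive Bayes described above can be sketched with the standard library alone. This is a toy stand-in, not the paper's pipeline: the four training messages are invented, the whitespace tokeniser is a simplification of a count vectoriser, and the Laplace smoothing parameter `alpha` is an assumed default.

```python
import math
from collections import Counter


def train(messages):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for text, label in messages:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def predict(text, model, alpha=1.0):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            smoothed = word_counts[label][word] + alpha
            score += math.log(smoothed / (total_words + alpha * len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy messages, for illustration only.
toy_messages = [
    ("win a free prize now", "spam"),
    ("free entry claim prize", "spam"),
    ("are we meeting for lunch", "ham"),
    ("see you at home now", "ham"),
]
model = train(toy_messages)
print(predict("claim your free prize", model))
print(predict("see you at lunch", model))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied.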


2022 ◽  
Vol 07 (01) ◽  
Author(s):  
Ramakrishna Hegde ◽  

The researcher explains the process of determining scholarship awards for students using a supervised machine learning algorithm, namely the Naïve Bayes algorithm. In addition, the paper includes a short description of the naïve Bayes classifier used by the authors. It explains the significance of the training data set and the test data set in machine learning techniques. Machine learning is now a widely used technique in the IT industry. It is a very effective tool for many fields, such as education, IT and even business. In this paper, the researcher attempts to determine the scholarship result status of college students automatically, using the naïve Bayes classifier algorithm based on the student's academic performance, communication skills, grasping power, IHS, income, time management, regularity, etc. A scholarship gives a student strength and self-confidence. It also indirectly boosts the student's performance. Scholarships are usually provided by governments or government organizations. It is essential for students to recognize their own potential early in their academic career so that they can foster its growth; receiving attention from an employer or organization helps students take this step. Students can apply for scholarships based on eligibility criteria (such as caste category, annual income, etc.). Scholarships are issued based on merit, student performance and career-specific criteria. Different scholarship schemes are provided for students based on distinct eligibility criteria. Using a naïve Bayes classifier, the researcher obtained a result with an accuracy of 96.7% and an error of 3.3%. The scholarship status of each student was displayed as yes or no.


2015 ◽  
Vol 50 (4) ◽  
pp. 293-296 ◽  
Author(s):  
D Chaki ◽  
A Das ◽  
MI Zaber

The classification of heart disease patients is of great importance in cardiovascular disease diagnosis. Researchers have used numerous data mining techniques to aid health care professionals in the diagnosis of heart disease, and many algorithms have been proposed for this task in the past few years. In this paper, we study different supervised machine learning techniques for the classification of heart disease data and perform a procedural comparison of them. We use the C4.5 decision tree classifier, a naïve Bayes classifier, and a Support Vector Machine (SVM) classifier over a large set of heart disease data. The data used in this study is the Cleveland Clinic Foundation Heart Disease Data Set, available at the UCI Machine Learning Repository. We found that the SVM outperformed both the naïve Bayes and the C4.5 classifiers, giving the best accuracy by correctly classifying the highest number of instances. We also found that the naïve Bayes classifier achieved competitive performance even though the assumption of normality of the data is strongly violated.
Bangladesh J. Sci. Ind. Res. 50(4), 293-296, 2015


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Anunchai Assawamakin ◽  
Supakit Prueksaaroon ◽  
Supasak Kulawonganunchai ◽  
Philip James Shaw ◽  
Vara Varavithya ◽  
...  

Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly. Here, a novel two-step machine-learning framework is presented to address this need. First, a Naïve Bayes estimator is used to rank features, of which the top-ranked will most likely contain the most informative features for prediction of the underlying biological classes. The top-ranked features are then used in a Hidden Naïve Bayes classifier to construct a classification prediction model from these filtered attributes. In order to obtain the minimum set of the most informative biomarkers, the bottom-ranked features are successively removed from the Naïve Bayes-filtered feature list one at a time, and the classification accuracy of the Hidden Naïve Bayes classifier is checked for each pruned feature set. The performance of the proposed two-step Bayes classification framework was tested on different types of -omics datasets, including gene expression microarray, single nucleotide polymorphism microarray (SNP array), and surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) proteomic data. The proposed two-step Bayes classification framework performed as well as and, in some cases, outperformed other classification methods in terms of prediction accuracy, minimum number of classification markers, and computational time.
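The pruning loop in the second step can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: `evaluate` is a hypothetical stand-in for the accuracy of the Hidden Naïve Bayes classifier on a given feature set, the feature names are invented, and the rule of keeping the smallest set that matches the best accuracy seen is one plausible reading of "minimum set of the most informative biomarkers".

```python
from typing import Callable, List, Sequence, Tuple


def prune_ranked_features(
    ranked: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Drop the bottom-ranked feature one at a time and keep the smallest
    feature set that reaches the best accuracy seen (assumed criterion)."""
    best_set = list(ranked)
    best_acc = evaluate(best_set)
    current = list(ranked)
    while len(current) > 1:
        current = current[:-1]  # remove the lowest-ranked feature
        acc = evaluate(current)
        if acc >= best_acc:  # ties favour the smaller set
            best_set, best_acc = list(current), acc
    return best_set, best_acc


def toy_accuracy(features: Sequence[str]) -> float:
    """Hypothetical scorer: only g1 and g2 carry signal, others add noise."""
    signal = 0.45 * ("g1" in features) + 0.45 * ("g2" in features)
    noise = 0.01 * sum(f not in ("g1", "g2") for f in features)
    return signal - noise


selected, accuracy = prune_ranked_features(["g1", "g2", "g3", "g4"], toy_accuracy)
print(selected, accuracy)
```

In practice `evaluate` would wrap a cross-validated classifier, so each pruning step costs one full model evaluation.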

