A comparison of three discrete methods for classification of heart disease data

2015 ◽  
Vol 50 (4) ◽  
pp. 293-296 ◽  
Author(s):  
D Chaki ◽  
A Das ◽  
MI Zaber

The classification of heart disease patients is of great importance in cardiovascular disease diagnosis. Researchers have used numerous data mining techniques to aid health care professionals in diagnosing heart disease, and many algorithms have been proposed for this task in the past few years. In this paper, we study different supervised machine learning techniques for the classification of heart disease data and compare them procedurally. We apply the C4.5 decision tree classifier, a naïve Bayes classifier, and a Support Vector Machine (SVM) classifier to a large set of heart disease data. The data used in this study is the Cleveland Clinic Foundation Heart Disease Data Set, available at the UCI Machine Learning Repository. We found that SVM outperformed both the naïve Bayes and C4.5 classifiers, achieving the best accuracy by correctly classifying the highest number of instances. We also found that the naïve Bayes classifier achieved competitive performance even though the assumption of normality of the data is strongly violated. Bangladesh J. Sci. Ind. Res. 50(4), 293-296, 2015
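The remark about the normality assumption can be made concrete with a minimal sketch of a Gaussian naïve Bayes classifier, which models every numeric feature within each class as a normal distribution. The abstract does not say which naïve Bayes variant was used, so the Gaussian form is an assumption here, and the function names and the tiny data set below are made up for illustration and are not taken from the Cleveland data.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        # A small floor keeps every variance strictly positive.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, 1e-9)
            for col, m in zip(zip(*members), means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior under the normal model."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, made-up rows: [age, resting blood pressure]; labels are hypothetical.
X = [[45, 120], [50, 125], [62, 150], [67, 160]]
y = ["no_disease", "no_disease", "disease", "disease"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [48, 122]))
```

If a feature is far from normally distributed, the per-class densities above are poor estimates, yet the predicted class can still be right as long as the ranking of the class scores is preserved.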

2021 ◽  
Vol 2 (2) ◽  
pp. 101-107
Author(s):  
Akhmad Muzaki ◽  
Arita Witanti

The 2020 regional elections, held in the midst of the COVID-19 pandemic, have drawn growing attention both in the real world and in cyberspace, especially on the social media platform Twitter. Twitter has been widely used by various communities in recent years and is one of the media that represent the public response to public issues. Ahead of a general election (PEMILU), some parties, namely academics, intellectuals, or even political opponents, usually want to know the public sentiment or response to the issue. Because the implementation of the local elections is highly polemical in the community, this study analyzes tweets that discuss a public issue, namely the 2020 elections amid the COVID-19 pandemic. The analysis uses the classification of tweets containing public sentiment about the issue. The classification methods used in this research are the Naive Bayes Classifier (NBC) and the Support Vector Machine (SVM). The Naive Bayes Classifier is combined with features weighted using probability. Tweets in this study are classified based on a combination of two classes, namely a sentiment class and a category class. The sentiment class consists of positive and negative. Test results on the built application show that Naive Bayes delivers better accuracy than the Support Vector Machine. Overall, the Naive Bayes method performs well at classifying tweets, with an accuracy rate of 92.2%.
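As a rough illustration of probability-based word weighting in a Naive Bayes text classifier, the sketch below trains a multinomial model with add-one (Laplace) smoothing. The abstract does not describe the authors' exact features or preprocessing, so the function names, the smoothing choice, and the four example tweets are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs is a list of (tokens, label) pairs; returns the counts needed later."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab

def predict_nb(model, tokens):
    """Pick the label with the highest log prior plus summed log word likelihoods."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing avoids a zero probability for unseen pairs.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up example tweets, already tokenized by whitespace.
train = [
    ("election safe well organized".split(), "positive"),
    ("good turnout safe voting".split(), "positive"),
    ("election risky during pandemic".split(), "negative"),
    ("crowded voting risky bad".split(), "negative"),
]
model = train_nb(train)
print(predict_nb(model, "safe election".split()))
```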


With the increasing use of mobile technology, information is being duplicated in huge volumes. Because of this volume of duplicated messages, identifying spam messages has become a challenging task. The growth of mobile usage has led to instant communication taking place only through messages, which in turn allows hackers and unauthorized users to spread and misuse spam messages. Identifying spam messages is a research problem for mobile service providers that want to attract more customers and retain them. Against this background, this paper focuses on identifying and predicting spam and ham messages. The SMS Spam Message Detection dataset from the Kaggle machine learning repository is used for the prediction analysis. The identification of spam and ham messages proceeds as follows. Firstly, the distribution of the target variable, namely spam or ham, is identified and depicted as a graph. Secondly, the essential tokens responsible for spam and ham messages are identified using the hashing vectorizer and portrayed as spam and ham word clouds. Thirdly, the hash-vectorized SMS Spam Message Detection dataset is fitted to various classifiers: the AdaBoost classifier, Extra Tree classifier, KNN classifier, Random Forest classifier, Linear SVM classifier, Kernel SVM classifier, Logistic Regression classifier, Gaussian Naive Bayes classifier, Decision Tree classifier, Gradient Boosting classifier, and Multinomial Naive Bayes classifier. The classifier models are evaluated using the performance metrics Accuracy, Precision, Recall, and FScore. The implementation is done in Python in the Anaconda Spyder environment. Experimental results show that the Linear Support Vector Machine classifier achieved effective performance indicators, with a precision of 0.98, recall of 0.98, FScore of 0.98, and accuracy of 98.71%.
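The four evaluation metrics named above can be computed from the counts of true and false positives and negatives. The following is a minimal sketch in plain Python; the function name and the six example labels are hypothetical and unrelated to the paper's reported figures.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute precision, recall, F-score and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # The F-score is the harmonic mean of precision and recall.
    f_score = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }

# Made-up labels for six messages.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```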


2020 ◽  
Author(s):  
Mohimenul Karim ◽  
Md. Rashid Abid

Specific gene regions in DNA, such as COI (cytochrome c oxidase I) in animals, have been defined as DNA barcodes, and many studies have shown that they can be used as identifiers to distinguish species. The standard length of a DNA barcode is approximately 650 bp. However, because of challenges in sequencing technologies or the unavailability of high-quality genomic DNA, it is not always possible to obtain the full barcode sequence of an organism. Recent studies therefore suggest that mini-barcodes can contribute well to the species identification process. Among the various methods proposed for the identification task, supervised machine learning methods have been shown to be effective. In this study, we analyze the effect of different barcode lengths on species identification from the perspective of supervised machine learning and suggest a general approximation of the mini-barcode length required. We implemented a Naïve Bayes classifier as our model and demonstrated the effectiveness of mini-barcodes by showing how accuracy responds as the length of the DNA barcode sequences varies.
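One way to picture an experiment that varies barcode length is to truncate each sequence before extracting features for the classifier. The abstract does not state how sequences were encoded, so the use of overlapping k-mer counts below is an assumption, and the function names and the short sequence fragment are invented for illustration.

```python
from collections import Counter

def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in a DNA sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def truncated_features(sequence, barcode_length, k=3):
    """Keep only the first `barcode_length` bases, then count k-mers."""
    return kmer_counts(sequence[:barcode_length], k)

# Made-up fragment, far shorter than a real barcode of roughly 650 bp.
fragment = "ACGTACGTTAGC"
for length in (6, 9, 12):
    print(length, dict(truncated_features(fragment, length)))
```

Feature vectors built this way for several lengths could then be passed to the same classifier, so that accuracy can be compared across lengths.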


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
John Andoh ◽  
Louis Asiedu ◽  
Anani Lotsi ◽  
Charlotte Chapman-Wardy

Gathering public opinions on the Internet and Internet-based applications like Twitter has become popular in recent times, as it provides decision-makers with uncensored public views on products, government policies, and programs. Through natural language processing and machine learning techniques, unstructured data from these sources can be analyzed using traditional statistical learning. A persistent challenge in machine-learning-based sentiment classification is the abundance of available data, which makes it difficult to train the learning algorithms in feasible time. This eventually degrades the classification accuracy of the algorithms. Given this, the effect of training data size in classification tasks cannot be overemphasized. This study statistically assessed the performance of Naive Bayes, support vector machine (SVM), and random forest algorithms on a sentiment text classification task. The research also investigated the optimal conditions, such as varying data sizes, numbers of trees, and kernel types, under which each of the respective algorithms performed best. The study collected Twitter data from Ghanaian users which contained sentiments about the Ghanaian Government. The data was preprocessed, manually labeled by the researcher, and then used to train the aforementioned algorithms. These algorithms are three of the most popular learning algorithms and have had considerable success in diverse fields. The Naive Bayes classifier was adjudged the best algorithm for the task, as it outperformed the other two machine learning algorithms with an accuracy of 99%, F1 score of 86.51%, and Matthews correlation coefficient of 0.9906. The algorithm also performed well with increasing data sizes. The Naive Bayes classifier is recommended as viable for sentiment text classification, especially for text classification systems which work with Big Data.
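For readers unfamiliar with the Matthews correlation coefficient, the sketch below computes it for the binary case from the four confusion-matrix counts. The function name and the example counts are hypothetical and are not derived from the study's data.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC: +1 is perfect, 0 is chance level, -1 is total disagreement."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is 0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0

# Made-up confusion-matrix counts.
print(matthews_corrcoef(tp=2, tn=2, fp=1, fn=1))
```

Unlike accuracy, this score uses all four counts, so it stays low when a classifier does well on only one of the two classes.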


Electronics ◽  
2021 ◽  
Vol 10 (17) ◽  
pp. 2083
Author(s):  
Sylwia Rapacz ◽  
Piotr Chołda ◽  
Marek Natkaniec

The paper elaborates on how text analysis influences classification, a key part of the spam-filtering process. The authors propose a multistage meta-algorithm for checking classifier performance. The algorithm allows for the fast selection of the best-performing classifiers as well as for the analysis of higher-dimensionality data; the latter is especially important when analyzing large datasets. The meta-algorithm applies cross-validation between different datasets for supervised learning. To illustrate its operation, the paper compares three machine-learning methods that allow a user to classify e-mails as desirable (ham) or potentially harmful (spam) messages: k-nearest neighbours (k-NN), support vector machines (SVM), and the naïve Bayes classifier (NB). The methods are simple, but the results show that they are powerful enough. The research leads to the conclusion that the multinomial naïve Bayes classifier can be an excellent weapon in the fight against the constantly increasing amount of spam messages. It was also confirmed that the proposed solution gives very accurate results.
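Cross-validation between different datasets can be read as training on one dataset and testing on another for every ordered pair. That reading is an assumption, since the abstract does not spell out the procedure. The sketch below follows it with a hypothetical 1-nearest-neighbour classifier; the dataset names and feature vectors are made up.

```python
from itertools import permutations

def cross_dataset_scores(datasets, fit, predict):
    """Train on one dataset and test on another, for every ordered pair."""
    scores = {}
    for train_name, test_name in permutations(datasets, 2):
        X_train, y_train = datasets[train_name]
        X_test, y_test = datasets[test_name]
        model = fit(X_train, y_train)
        hits = sum(predict(model, x) == y for x, y in zip(X_test, y_test))
        scores[(train_name, test_name)] = hits / len(y_test)
    return scores

def fit_1nn(X, y):
    """A 1-nearest-neighbour 'model' is just the stored training pairs."""
    return list(zip(X, y))

def predict_1nn(model, x):
    """Return the label of the closest stored vector (squared Euclidean distance)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda pair: dist(pair[0], x))[1]

# Made-up numeric feature vectors standing in for two e-mail corpora.
datasets = {
    "corpus_a": ([[0.0, 0.1], [0.9, 1.0]], ["ham", "spam"]),
    "corpus_b": ([[0.1, 0.0], [1.0, 0.8], [0.2, 0.2]], ["ham", "spam", "ham"]),
}
scores = cross_dataset_scores(datasets, fit_1nn, predict_1nn)
print(scores)
```

Passing `fit` and `predict` as arguments lets the same loop score several classifiers, which is one plausible way to select the best performer quickly.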


2021 ◽  
Vol 2 (2) ◽  
pp. 96-104
Author(s):  
Reynalda Nabila Cikania

Halodoc is a telemedicine-based healthcare application that connects patients with health practitioners such as doctors, pharmacies, and laboratories. Halodoc users have left both positive and negative comments. This indicates public concern for the Halodoc application, so it is necessary to analyze the sentiment of the comments about the Halodoc application service, especially during the COVID-19 pandemic, in order to improve the service. The Naïve Bayes Classifier (NBC) and Support Vector Machine (SVM) algorithms are used to analyze the sentiment of users of Halodoc's telemedicine service application. Of 5,687 reviews, 12.33% were classified as negative and 87.67% as positive, which means that positive reviews outnumber negative ones. The Naive Bayes Classifier achieved an accuracy of 87.77% with an AUC value of 57.11% and a G-Mean of 40.08%, while the SVM algorithm with an RBF kernel achieved an accuracy of 86.1% with an AUC value of 60.149% and a G-Mean of 49.311%. Based on the accuracy value of the model, the author concludes that the SVM model with an RBF kernel is better than NBC at classifying the sentiment of user reviews of the Halodoc telemedicine service.
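Because the reviews are strongly imbalanced, accuracy and G-Mean can rank classifiers differently. The sketch below computes both from confusion-matrix counts, taking G-Mean as the geometric mean of sensitivity and specificity, which is the usual definition and is assumed here. The function name and both sets of counts are invented to show the effect and are not the Halodoc results.

```python
import math

def accuracy_and_g_mean(tp, tn, fp, fn):
    """Return (accuracy, G-Mean), where G-Mean = sqrt(sensitivity * specificity)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, math.sqrt(sensitivity * specificity)

# Made-up counts for 88 positive and 12 negative reviews.
# A classifier that labels every review as positive:
always_positive = accuracy_and_g_mean(tp=88, tn=0, fp=12, fn=0)
# A classifier that also recognizes most negative reviews:
balanced = accuracy_and_g_mean(tp=70, tn=9, fp=3, fn=18)
print(always_positive, balanced)
```

In this toy case the first classifier has the higher accuracy but a G-Mean of zero, because it never identifies a negative review.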

