Analysis on Ensemble Methods for the Prediction of Cardiovascular Disease

2021 ◽  
Vol 11 (10) ◽  
pp. 2529-2537
Author(s):  
C. Murale ◽  
M. Sundarambal ◽  
R. Nedunchezhian

Coronary heart disease (CHD) is one of the dominant sources of death and morbidity worldwide. The identification of cardiac disease in clinical review is considered one of the main problems. As the amount of data grows, interpretation and retrieval become even more complex. In addition, ensemble learning prediction models have become an important factor in this area of study. The prime aim of this paper is to forecast CHD accurately. It offers a modern paradigm for the prediction of cardiovascular disease using processes such as pre-processing, feature detection, feature selection, and classification. Pre-processing is first performed using the ordinal encoding technique, and statistical and higher-order features are extracted using the Fisher algorithm. Record and attribute reduction is then performed, in which principal component analysis plays an extensive part in addressing the "curse of dimensionality." Lastly, prediction is carried out by different ensemble models (SVM, Gaussian Naïve Bayes, random forest, K-nearest neighbor, logistic regression, decision tree, and multilayer perceptron) that take the dimension-reduced features as input. Finally, the reliability of the proposed work is compared on these success metrics and its superiority is confirmed. From the analysis, Naïve Bayes, with an accuracy of 98.4%, performs better than the other ensemble algorithms.
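A minimal sketch of the kind of pipeline this abstract describes, on synthetic data (the risk-factor features and label rule are hypothetical stand-ins, not the paper's dataset): ordinal encoding of categorical records, PCA for dimension reduction, and a soft-voting ensemble of several of the named classifiers.

```python
# Hypothetical sketch: ordinal encoding -> PCA -> soft-voting ensemble.
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic categorical risk-factor records and a binary CHD-like label.
X_cat = rng.choice(["low", "medium", "high"], size=(400, 8))
y = ((X_cat == "high").sum(axis=1) + rng.integers(0, 3, 400) > 4).astype(int)

X = OrdinalEncoder().fit_transform(X_cat)          # pre-processing step
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",                                  # average class probabilities
)
# PCA reduces the encoded attributes before the ensemble sees them.
model = make_pipeline(StandardScaler(), PCA(n_components=4), ensemble)
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
print(f"held-out accuracy: {accuracy:.3f}")
```

The Fisher feature-scoring step is omitted here; the sketch only shows how encoding, reduction, and voting compose into one pipeline.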

Author(s):  
Amir Ahmad ◽  
Hamza Abujabal ◽  
C. Aswani Kumar

A classifier ensemble is a combination of diverse and accurate classifiers. Generally, a classifier ensemble performs better than any single classifier in the ensemble. Naive Bayes classifiers are simple but popular classifiers for many applications. As it is difficult to create diverse naive Bayes classifiers, naive Bayes ensembles are not very successful. In this paper, we propose Random Subclasses (RS) ensembles for naive Bayes classifiers. In the proposed method, new subclasses for each class are created using a 1-Nearest Neighbor (1-NN) framework that uses randomly selected points from the training data. A classifier considers each subclass as a class of its own. As the method used to create subclasses is random, diverse datasets are generated. Each classifier in an ensemble learns on one dataset from the pool of diverse datasets. Diverse training datasets ensure diverse classifiers in the ensemble. The new subclasses create easy-to-learn decision boundaries, which in turn produce accurate naive Bayes classifiers. We developed two variants of RS: in the first variant, RS(2), two subclasses per class were created, whereas in the second variant, RS(4), four subclasses per class were created. We studied the performance of these methods against other popular ensemble methods using naive Bayes as the base classifier. RS(4) outperformed the other popular ensemble methods. A detailed study was carried out to understand the behavior of RS ensembles.
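The RS idea can be sketched as follows, under stated assumptions (this is a reading of the abstract, not the authors' code): for each class, pick random seed points, relabel every training point of that class with the subclass of its nearest seed (a 1-NN assignment), train one naive Bayes model per random relabeling, and map predicted subclasses back to their parent class before voting.

```python
# Sketch of RS(2): random seeds define subclasses via 1-NN assignment.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris

def random_subclass_labels(X, y, n_sub=2, rng=None):
    rng = rng or np.random.default_rng()
    new_y = np.empty(len(y), dtype=int)
    parent = {}                       # subclass id -> original class
    next_id = 0
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        seeds = X[rng.choice(idx, size=n_sub, replace=False)]
        # 1-NN: assign each point of class c to its nearest random seed
        d = np.linalg.norm(X[idx, None, :] - seeds[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for s in range(n_sub):
            parent[next_id + s] = c
        new_y[idx] = next_id + assign
        next_id += n_sub
    return new_y, parent

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
votes = np.zeros((len(y), len(np.unique(y))))
for _ in range(10):                   # 10 diverse ensemble members
    sub_y, parent = random_subclass_labels(X, y, n_sub=2, rng=rng)
    clf = GaussianNB().fit(X, sub_y)  # subclasses treated as classes
    pred = np.array([parent[s] for s in clf.predict(X)])
    votes[np.arange(len(y)), pred] += 1
accuracy = (votes.argmax(axis=1) == y).mean()
print(f"RS(2) ensemble training accuracy: {accuracy:.3f}")
```

Because the seeds differ on every iteration, each member sees a differently relabeled dataset, which is what supplies the diversity the abstract argues plain naive Bayes ensembles lack.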


2020 ◽  
Vol 4 (1) ◽  
pp. 28-36
Author(s):  
Azminuddin I. S. Azis ◽  
Budy Santoso ◽  
Serwin

The Naïve Bayes (NB) algorithm is still among the top ten data mining algorithms because of its simplicity, efficiency, and performance. To handle classification of numerical data, the Gaussian distribution and kernel approaches can be applied to NB (GNB and KNB). However, NB classification treats attributes as independent, an assumption that does not hold in many cases. The absolute correlation coefficient can determine correlations between attributes and works on numerical attributes, so it can be applied for attribute weighting in GNB (ACW-NB). Furthermore, because the performance of NB does not improve on large datasets, ACW-NB can serve as the classifier in a local learning model, where other classification methods, such as K-Nearest Neighbor (K-NN), well known in local learning, can be used to obtain the sub-dataset for ACW-NB training. To reduce noise/bias, missing-value replacement and data normalization can also be applied. The proposed method is termed "LL-KNN ACW-NB (Local Learning K-Nearest Neighbor in Absolute Correlation Weighted Naïve Bayes)," with the objective of improving the performance of NB (GNB and KNB) in classifying numerical data. The results of this study indicate that LL-KNN ACW-NB is able to improve the performance of NB, with an average accuracy of 91.48%, 1.92% better than GNB and 2.86% better than KNB.
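One plausible reading of the attribute-weighting step (an assumption on my part, not the paper's exact formulation) is to scale each attribute's Gaussian log-likelihood contribution by the absolute correlation between that attribute and the class label:

```python
# Sketch: absolute-correlation-weighted Gaussian naive Bayes (ACW-NB-like).
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
# weight_j = |Pearson correlation between attribute j and the label|
w = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])

# class-conditional Gaussian parameters, as in plain GNB
mu = np.array([X[y == c].mean(axis=0) for c in classes])
var = np.array([X[y == c].var(axis=0) + 1e-9 for c in classes])
log_prior = np.log(np.bincount(y) / len(y))

def gaussian_logpdf(x, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def predict(X):
    # weighted log posterior: log P(c) + sum_j w_j * log N(x_j | mu_cj, var_cj)
    scores = np.stack([
        log_prior[c] + (w * gaussian_logpdf(X, mu[c], var[c])).sum(axis=1)
        for c in classes
    ], axis=1)
    return scores.argmax(axis=1)

accuracy = (predict(X) == y).mean()
print(f"weighted-NB training accuracy: {accuracy:.3f}")
```

The local-learning (LL-KNN) part would first select a K-NN neighborhood of the query point and fit this weighted model on that sub-dataset only; the sketch above shows just the weighting.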


2021 ◽  
Vol 10 (5) ◽  
pp. 2530-2538
Author(s):  
Pulung Nurtantio Andono ◽  
Eko Hari Rachmawanto ◽  
Nanna Suryana Herman ◽  
Kunio Kondo

The orchid is an ornamental plant with a variety of types, where each type has its own characteristics in the form of different shapes and colors. Here, we chose the support vector machine (SVM), Naïve Bayes, and k-nearest neighbor (KNN) algorithms, which output the predicted type as text. This system aims to assist the community in recognizing orchid plants by type. We used more than 2250 images for training and 1500 for testing, covering 15 types. The test results show the impact of comparing the three supervised algorithms with and without feature extraction and across several distance values. We used SVM with linear, polynomial, and Gaussian kernels, while KNN operated with K ranging from 1 to 11. The experimental results identify the linear kernel as the best classifier, and the extraction process increased accuracy. Compared with Naïve Bayes at 66% and the best KNN (K=1, d=1) at 98%, SVM had better accuracy. SVM-GLCM-HSV performed better than SVM-HSV alone, achieving 98.13% and 93.06% respectively, both with the linear kernel. On the other side, a combination of SVM and KNN yielded the highest accuracy of the algorithms selected here.
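To make the GLCM part of the feature pipeline concrete, here is a minimal numpy sketch of a gray-level co-occurrence matrix for horizontally adjacent pixels (distance d=1) and two classic texture features derived from it; the actual system also uses HSV color features and feeds everything to the SVM.

```python
# Minimal GLCM sketch: co-occurrence of horizontally adjacent gray levels.
import numpy as np

def glcm(img, levels=4):
    """Normalized co-occurrence counts of horizontal neighbor pairs."""
    m = np.zeros((levels, levels))
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        m[a, b] += 1
    return m / m.sum()                 # joint probabilities

img = np.array([[0, 0, 1, 1],          # toy 4-level "image"
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
p = glcm(img)
i, j = np.indices(p.shape)
contrast = ((i - j) ** 2 * p).sum()    # penalizes abrupt level changes
energy = (p ** 2).sum()                # uniformity of the texture
print(f"contrast={contrast:.4f} energy={energy:.4f}")
```

A full implementation would average such features over several offsets and angles before concatenating them with the HSV color statistics as the SVM input vector.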


Data mining usually refers to the discovery of specific patterns or the analysis of data from a large dataset. Classification is an efficient data mining technique in which the classes into which data are classified are predefined using existing datasets. Classifying medical records by their symptoms using computerized methods, and storing the predicted information in digital format, is of great importance in the diagnosis of various diseases in the medical field. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. Here the data mining classification algorithms are compared on their accuracy in finding the exact data according to the diagnosis report, and on their execution rate, to identify how fast the records are classified. The classification algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier, and K-Nearest Neighbor (KNN), to determine which algorithm is best suited for classifying any kind of medical dataset. Datasets such as Breast Cancer, Iris, and Hypothyroid are used to determine which of the three algorithms classifies the datasets with the highest accuracy in finding the records of patients with particular health problems. The experimental results, presented as tables and graphs, show the performance and importance of the Naïve Bayes, C4.5, and K-Nearest Neighbor algorithms. From the performance outcomes of the three algorithms, C4.5 performs considerably better than Naïve Bayes and K-Nearest Neighbor.
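A small reproduction of this kind of comparison can be run with scikit-learn; note this is an illustrative sketch, with an entropy-based decision tree standing in for C4.5 (scikit-learn does not ship C4.5 itself) and the bundled Iris dataset standing in for the paper's datasets.

```python
# Cross-validated comparison of NB, a C4.5-like tree, and KNN on Iris.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
models = {
    "Naive Bayes": GaussianNB(),
    "C4.5-like tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}
# 5-fold cross-validation gives a fairer accuracy estimate than one split.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Timing each `cross_val_score` call (e.g. with `time.perf_counter`) would reproduce the execution-rate comparison the paper also reports.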


Author(s):  
Rajni Rajni ◽  
Amandeep Amandeep

<p>Diabetes is a major concern all over the world, and it is increasing at a fast pace. The goal of this paper is to predict, at an early stage and without any test, the probability that a person is at risk of diabetes; this would have a great impact on quality of life. The datasets are the Pima Indians diabetes dataset, which consists of 768 records, and the Cleveland coronary illness dataset. Though a number of solutions are available for extracting information from huge datasets and predicting the possibility of having diabetes, the accuracy of their mining process is far from satisfactory. To achieve the highest accuracy, the issue of zero probability, which naïve Bayes analysis generally faces, needs to be addressed suitably. The proposed framework, RB-Bayes, aims to extract the required information with high accuracy, survive the problem of zero probability, and compare accuracy with other methods such as Support Vector Machine, Naive Bayes, and K-Nearest Neighbor. We calculate the mean to handle missing data and calculate the probability for yes (positive) and no (negative); the higher of the two values decides the class for the tuple. Naïve Bayes is mostly used in text classification. The outcomes on the Pima Indians diabetes dataset demonstrate that the proposed methodology enhances precision compared with other supervised procedures. The accuracy of the proposed methodology on the large dataset is 72.9%.</p>
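The two ingredients the abstract highlights, mean imputation for missing values and the zero-probability problem in naive Bayes, can be shown in a few lines (the attribute values and class below are illustrative, not from the paper's data; Laplace smoothing is the standard fix, which RB-Bayes may or may not use):

```python
# Mean imputation plus the zero-probability problem and add-one smoothing.
import numpy as np

# mean imputation: replace NaNs in a numeric column with the column mean
col = np.array([5.0, np.nan, 7.0, 6.0])
col = np.where(np.isnan(col), np.nanmean(col), col)   # NaN -> 6.0

# categorical likelihood P(value | class) with and without smoothing
values = np.array(["high", "high", "low"])  # attribute values seen in class "yes"
domain = ["high", "low", "medium"]

def likelihood(v, values, domain, alpha):
    # alpha=0: raw frequency; alpha=1: Laplace (add-one) smoothing
    return ((values == v).sum() + alpha) / (len(values) + alpha * len(domain))

p_unsmoothed = likelihood("medium", values, domain, alpha=0)  # unseen -> 0
p_smoothed = likelihood("medium", values, domain, alpha=1)    # unseen -> 1/6
print(col, p_unsmoothed, p_smoothed)
```

Without smoothing, a single unseen attribute value zeroes out the entire class posterior product, which is exactly the failure mode the abstract says must be addressed.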


2019 ◽  
Vol 4 (1) ◽  
Author(s):  
Deny Haryadi ◽  
Rila Mandala

The price of palm oil can rise, fall, or stay the same from day to day because of factors that influence it, such as the prices of other vegetable oils (soybean oil and canola oil), the world crude oil price, and the real exchange rate between the US dollar and the currencies of producer countries (rupiah, ringgit, and the Canadian dollar) or consumer countries (rupee). A sufficiently accurate prediction of the palm oil price is therefore needed so that investors can profit according to the plans they have made. The aim of this study is to compare the accuracy, precision, and recall produced by the Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor algorithms in solving the problem of predicting palm oil prices for investment. Based on the test results of this study, the Support Vector Machine algorithm achieved the highest accuracy, precision, and recall compared with the Naïve Bayes and K-Nearest Neighbor algorithms. The highest accuracy in this study was 82.46%, with the highest precision at 86% and the highest recall at 89.06%.
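Framing price prediction as classification, as this study does, can be sketched as follows; the features here are synthetic stand-ins for the drivers the abstract names (other vegetable-oil prices, crude oil, exchange rates), and the label rule is invented for illustration.

```python
# Sketch: price-movement classification compared across SVM, NB, and KNN.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(42)
# synthetic stand-ins: soybean oil, canola oil, crude oil, exchange rate
X = rng.normal(size=(500, 4))
# hypothetical label "price up", driven mainly by the first two features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
results = {}
for name, clf in [("SVM", SVC()), ("NB", GaussianNB()),
                  ("KNN", KNeighborsClassifier())]:
    model = make_pipeline(StandardScaler(), clf)   # scale, then classify
    results[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
print(results)
```

Swapping `score` for `sklearn.metrics.precision_score` and `recall_score` on `model.predict(X_te)` would reproduce the precision and recall comparison the study reports.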


2010 ◽  
Vol 5 (2) ◽  
pp. 133-137 ◽  
Author(s):  
Mohammed J. Islam ◽  
Q. M. Jonathan Wu ◽  
Majid Ahmadi ◽  
Maher A. SidAhmed
