Comparison of Support Vector Machine and Modified Balanced Random Forest in Detecting Diabetes Patients

2021 ◽  
Vol 5 (2) ◽  
pp. 393-399
Author(s):  
Mahendra Dwifebri Purbolaksono ◽  
Muhammad Irvan Tantowi ◽  
Adnan Imam Hidayat ◽  
Adiwijaya Adiwijaya

Diabetes is a metabolic disorder characterized by high blood sugar levels, caused by disorders of the pancreas and insulin. According to data from the Ministry of Health of the Republic of Indonesia, diabetes is the third-largest cause of death in Indonesia, with a percentage of 6.7%. This high death rate motivated the present study, which aims at early detection. The research uses a machine learning approach to classify diabetes patient data and compares Support Vector Machine (SVM) with Modified Balanced Random Forest (MBRF). Both methods were chosen because previous studies showed that they achieve high accuracy; they are compared here to find the best classification model. Several preprocessing methods were used to prepare the data for classification. Every combination of preprocessing steps was applied to both classification methods so that both work from the same dataset. Evaluation used a confusion matrix. In the experiments, the maximum performance was 87.94% with SVM and 97.8% with MBRF.

2019 ◽  
Vol 2 (2) ◽  
pp. 43
Author(s):  
Lalu Mutawalli ◽  
Mohammad Taufan Asri Zaen ◽  
Wire Bagye

In the era of technological disruption in mass communication, social media has become a reference point for gauging public opinion. Social media users produce digital data very rapidly as they express their feelings, mainly by posting statuses and comments. This public activity generates very large data sets, commonly referred to as big data: collections that are very large, complex, and produced quickly, which makes them difficult to handle. Big data can be analyzed with data mining methods to extract knowledge patterns. This study analyzes the sentiment of Twitter users regarding the stabbing of Mr. Wiranto. The sentiment analysis showed that 41% of comments were positive, 29% neutral, and 29% negative. In addition, the data were modeled with a support vector machine algorithm to build a system capable of classifying positive, neutral, and negative connotations. The resulting classification model was tested using a confusion matrix, yielding a precision of 83%, a recall of 80%, and an accuracy of 80%.
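The abstract reports precision, recall, and accuracy derived from a three-class confusion matrix but does not give the matrix itself. As a minimal sketch of how such figures are typically derived, the Python below computes accuracy together with macro-averaged precision and recall. The function name `macro_metrics`, the choice of macro averaging, and the counts in `example` are all assumptions made for illustration; they are not the study's data or method.

```python
def macro_metrics(matrix):
    """Accuracy, macro precision, and macro recall for a square confusion matrix.

    matrix[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precisions.append(matrix[k][k] / predicted_k if predicted_k else 0.0)
        recalls.append(matrix[k][k] / actual_k if actual_k else 0.0)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
    }


# Hypothetical counts. Rows are the true class and columns the predicted
# class, in the order positive, neutral, negative.
example = [
    [40, 5, 5],
    [4, 24, 2],
    [3, 3, 14],
]
report = macro_metrics(example)
print(report)
```

Macro averaging weights every class equally regardless of how many tweets it contains; a study could equally report weighted or per-class values, so the averaging scheme should be checked against the full paper.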


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
M J Espinosa Pascual ◽  
P Vaquero Martinez ◽  
V Vaquero Martinez ◽  
J Lopez Pais ◽  
B Izquierdo Coronel ◽  
...  

Abstract
Introduction: Out of all patients admitted with Myocardial Infarction, 10 to 15% have Myocardial Infarction with Non-Obstructive Coronary Arteries (MINOCA). Classification algorithms based on deep learning substantially exceed traditional diagnostic algorithms. Therefore, numerous machine learning models have been proposed as useful tools for the detection of various pathologies, but to date no study has proposed a diagnostic algorithm for MINOCA.
Purpose: The aim of this study was to estimate the diagnostic accuracy of several automated learning algorithms (Support-Vector Machine [SVM], Random Forest [RF] and Logistic Regression [LR]) to discriminate between people suffering from MINOCA and those with Myocardial Infarction with Obstructive Coronary Artery Disease (MICAD) at the time of admission and before performing a coronary angiography, whether invasive or not.
Methods: A Diagnostic Test Evaluation study was carried out applying the proposed algorithms to a database of 553 consecutive patients admitted to our Hospital with Myocardial Infarction. According to the definitions of the 2016 ESC Position Paper on MINOCA, patients were classified into two groups: MICAD and MINOCA. Out of the total 553 patients, 214 were discarded due to the lack of complete data. The set of machine learning algorithms was trained on 244 patients (training sample: 75%) and tested on 80 patients (test sample: 25%). A total of 64 variables were available for each patient, including demographic, clinical and laboratory features before the angiographic procedure. Finally, the diagnostic precision of each architecture was measured.
Results: The most accurate classification model was the Random Forest algorithm (Specificity [Sp] 0.88, Sensitivity [Se] 0.57, Negative Predictive Value [NPV] 0.93, Area Under the Curve [AUC] 0.85 [CI 0.83–0.88]), followed by the standard Logistic Regression (Sp 0.76, Se 0.57, NPV 0.92, AUC 0.74) and Support-Vector Machine (Sp 0.84, Se 0.38, NPV 0.90, AUC 0.78) (see graph). The variables that contributed the most to discriminating MINOCA from MICAD were the traditional cardiovascular risk factors, biomarkers of myocardial injury, hemoglobin and gender. Results were similar when the 19 patients with Takotsubo syndrome were excluded from the analysis.
Conclusion: A prediction system for diagnosing MINOCA before performing coronary angiographies was developed using machine learning algorithms. Results show higher accuracy of diagnosing MINOCA than conventional statistical methods. This study supports the potential of machine learning algorithms in clinical cardiology. However, further studies are required in order to validate our results.
Funding Acknowledgement: Type of funding sources: None.
Figure: ROC curves of different algorithms
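The results above combine a modest sensitivity with a high negative predictive value, which can look contradictory. The sketch below shows how the three quantities are computed from binary confusion counts. The helper name `diagnostic_metrics` and the counts passed to it are hypothetical, chosen only to illustrate the arithmetic; they are not the study's confusion matrix.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and negative predictive value from binary counts."""
    sensitivity = tp / (tp + fn)   # share of true cases that are detected
    specificity = tn / (tn + fp)   # share of non-cases correctly ruled out
    npv = tn / (tn + fn)           # share of negative predictions that are correct
    return sensitivity, specificity, npv


# Hypothetical counts for a test set in which the positive class is rare.
se, sp, npv = diagnostic_metrics(tp=8, fn=6, tn=58, fp=8)
print(f"Se={se:.2f} Sp={sp:.2f} NPV={npv:.2f}")
```

When the positive class is uncommon, false negatives are few in absolute terms next to the true negatives, so NPV stays high even though a sizeable fraction of true cases is missed.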


2021 ◽  
Vol 9 (1) ◽  
pp. 126-136
Author(s):  
Rahmat Robi Waliyansyah ◽  
Umar Hafidz Asy'ari Hasbullah

Coffee is one of Indonesians' favorite drinks. In Indonesia there are two types of coffee, Arabica and Robusta. Coffee beans are usually classified in a traditional way that depends on the human senses. However, human senses are often inconsistent, because they depend on the person's mental or physical condition at the time, and they yield only qualitative measures. In this study, coffee beans are classified using digital image processing. The parameters come from texture analysis using the Gray Level Co-occurrence Matrix (GLCM) method with four features: Energy, Correlation, Homogeneity, and Contrast. The extracted features are classified using Naïve Bayes, Tree, Support Vector Machine (SVM), and Logistic Regression. The classification models are evaluated with the following metrics: AUC, F1, CA, precision, and recall. The dataset consists of 29 images of Arabica coffee beans and 29 images of Robusta beans. Model accuracy is tested using cross-validation, and the results are evaluated using a confusion matrix. Based on testing and evaluation, the SVM method performs best, with AUC = 1, CA = 0.983, F1 = 0.983, Precision = 0.983, and Recall = 0.983.
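The abstract names four GLCM texture features without defining them. The following is a minimal pure-Python sketch of one common set of definitions, assuming a symmetric, normalized co-occurrence matrix for a horizontal offset of one pixel and taking Energy as the sum of squared matrix entries. Some libraries define Energy as the square root of that sum, and the study's offset, gray-level quantization, and exact formulas are not stated, so every choice here is an assumption. The 4x4 `image` is a toy example, not coffee-bean data.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pixel pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Energy, contrast, homogeneity, and correlation of a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    energy = sum(v * v for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    # Correlation is undefined for a perfectly uniform image (zero variance).
    correlation = (
        sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells) / denom
        if denom > 0
        else float("nan")
    )
    return {
        "energy": energy,
        "contrast": contrast,
        "homogeneity": homogeneity,
        "correlation": correlation,
    }


# Toy 4x4 image with four gray levels (0..3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(glcm(image, levels=4))
print(features)
```

Each image is thereby reduced to a four-element feature vector, which is the kind of input a classifier such as an SVM would receive.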


Author(s):  
L. E. Christovam ◽  
G. G. Pessoa ◽  
M. H. Shimabukuro ◽  
M. L. B. T. Galo

Abstract. Land Use and Land Cover (LULC) information is an important data source for modeling environmental variables, so it is essential to develop high quality LULC maps. The hundreds of continuous spectral bands gathered with hyperspectral sensors provide high spectral detail and consequently confirm hyperspectral remote sensing as an appropriate option for many LULC applications. Despite increased spectral detail, issues like high dimensionality, huge volume of data and redundant information mean that hyperspectral image classification is a complex task. It is therefore essential to develop classification approaches that deal with these issues. Since classification results are directly dependent on the dataset used, it is fundamental to compare and validate the classification approaches on public datasets. With this in mind, and aiming to provide a baseline, four classification models were evaluated on the relatively new hyperspectral HyRANK dataset. The classification models were defined with three well-known classification algorithms: Spectral Angle Mapper (SAM), Support Vector Machine (SVM) and Random Forest (RF). A classification model with SAM and another with RF were defined with the 176 surface reflectance bands. A dimensionality reduction with principal component analysis was carried out, and a classification model with SVM and another with RF were defined using 14 principal components as features. The results show that the SVM and RF algorithms outperformed SAM by far in terms of accuracy, and that RF is slightly better than SVM in this respect. It is also possible to see from the results that the use of principal components as features provided an improvement in the accuracy of the RF and an improvement of 28% in the time spent fitting the classification model.


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7916
Author(s):  
Mingu Kang ◽  
Siho Shin ◽  
Gengjia Zhang ◽  
Jaehyo Jung ◽  
Youn Tae Kim

Examining mental health is crucial for preventing mental illnesses such as depression. This study presents a method for classifying electrocardiogram (ECG) data into four emotional states according to the stress levels using one-against-all and naive Bayes algorithms of a support vector machine. The stress classification criteria were determined by calculating the average values of the R-S peak, R-R interval, and Q-T interval of the ECG data to improve the stress classification accuracy. For the performance evaluation of the stress classification model, confusion matrix, receiver operating characteristic (ROC) curve, and minimum classification error were used. The average accuracy of the stress classification was 97.6%. The proposed model improved the accuracy by 8.7% compared to the previous stress classification algorithm. Quantifying the stress signals experienced by people can facilitate a more effective management of their mental state.


Atmosphere ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 846
Author(s):  
Ilseok Noh ◽  
Hae-Won Doh ◽  
Soo-Ock Kim ◽  
Su-Hyun Kim ◽  
Seoleun Shin ◽  
...  

Spring frosts damage crops that have weakened freezing resistance after germination. We developed a machine learning (ML)-based frost-classification model and optimized it for orchard farming environments. First, logistic regression, decision tree, random forest, and support vector machine models were trained using balanced Korea Meteorological Administration (KMA) Automated Synoptic Observing System (ASOS) frost observation data for March from the last 10 years (2008–2017). Random forest and support vector machine models showed good classification performance and were selected as the main techniques, which were optimized for orchard fields based on initial frost occurrence times. The training period was then extended to March–April for 20 years (2000–2019). Finally, the model was applied to the KMA ASOS frost observation data from March to April 2020, which were not used in the previous steps, and RGB data were extracted by digital cameras installed in an orchard in Gyeonggi-do. The developed model successfully classified 117 of 139 frost observation cases from the domestic ASOS data and 35 of 37 orchard camera observations. The assumption of the initial frost occurrence time for training helped the most in improving the frost-classification model. These results clearly indicate that the frost-classification model using ML has applicable accuracy in orchard farming.


2021 ◽  
Vol 13 (19) ◽  
pp. 3899
Author(s):  
Guanyao Xie ◽  
Simona Niculescu

Land cover/land use (LCLU) is currently a very important topic, especially for coastal areas that connect the land and the coast and tend to change frequently. LCLU plays a crucial role in land and territory planning and management tasks. This study aims to complement information on the types and rates of LCLU multiannual changes with the distributions, rates, and consequences of these changes in the Crozon Peninsula, a highly fragmented coastal area. To evaluate the multiannual change detection (CD) capabilities using high-resolution (HR) satellite imagery, we implemented three remote sensing algorithms: a support vector machine (SVM), a random forest (RF) combined with geographic object-based image analysis techniques (GEOBIA), and a convolutional neural network (CNN), with SPOT 5 and Sentinel 2 data from 2007 and 2018. Accurate and timely CD is the most important aspect of this process. Although all algorithms were indicated as efficient in our study, with accuracy indices between 70% and 90%, the CNN had significantly higher accuracy than the SVM and RF, up to 90%. The inclusion of the CNN significantly improved the classification performance (5–10% increase in the overall accuracy) compared with the SVM and RF classifiers applied in our study. The CNN eliminated some of the confusion that characterizes a coastal area. Through the study of CD results by post-classification comparison (PCC), multiple changes in LCLU could be observed between 2007 and 2018: both the cultivated and non-vegetated areas increased, accompanied by high deforestation, which could be explained by the high rate of urbanization in the peninsula.


2021 ◽  
Author(s):  
Karunakaran Velswamy ◽  
Rajasekar Velswamy ◽  
Iwin Thanakumar Joseph Swamidason

Abstract The healthcare field now produces a huge amount of data, and processing it requires efficient techniques. In this paper, a classification model is developed for heart disease prediction, with attribute selection carried out by a modified bee algorithm. Predicting heart disease with such models helps practitioners make precise decisions about patient health. The heart disease dataset is obtained from the UCI repository. It consists of 76 features, which do not all contribute equally to classification: some attributes contribute a large amount of information and others only a small amount. The modified bee algorithm is used to identify the best subset of features, so that the training phase retains only the features that contribute the most information, which reduces the training time of the classifiers. The experiment is run on the reduced feature subset using the following classifiers: Support Vector Machine, Naive Bayes, Decision Tree, and Random Forest. The experimental results show that the Support Vector Machine classifier provides better classification accuracy, true positive rate, true negative rate, false positive rate, and false negative rate than the Naive Bayes and Random Forest classifiers.


2021 ◽  
Vol 10 (3) ◽  
pp. 346-358
Author(s):  
Sola Fide ◽  
Suparti Suparti ◽  
Sudarno Sudarno

The coronavirus pandemic requires people to carry out activities from home, so internet usage in Indonesia has increased as information is shared through social media. One of the popular social media platforms in Indonesia is TikTok. However, TikTok's popularity cannot be separated from its history in Indonesia, where it was blocked by the government for committing many violations. Each application allows users to provide a review of the application. To determine TikTok users' sentiment, sentiment analysis was carried out to classify reviews into positive and negative sentiments. Classification uses the Support Vector Machine (SVM) with the Radial Basis Function (RBF) kernel, which previous studies found to be the more effective classification algorithm and kernel function. The SVM uses the default gamma of 0.0004255, and the Cost (C) values tested are 0.01, 0.1, 1, 10, 100, and 1000. The results can provide information that can be retrieved using the association method. The steps are data scraping, data preprocessing, sentiment scoring, TF-IDF weighting, classification using the SVM with RBF kernel, and text association. The model is evaluated using a confusion matrix with accuracy and kappa values. The greater the accuracy and kappa, the better the performance of the classification model. The review classification achieved a best accuracy of 90.62% and a best kappa of 81.24%, which counts as an almost perfect classification result. Based on the data association, positive reviews are given because users like and are comfortable with the current version of TikTok, which contains funny videos on fyp. Meanwhile, negative reviews are given because users failed to register and had their accounts blocked, so they asked TikTok to keep making improvements.
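Accuracy and kappa are both read off the confusion matrix, but kappa additionally corrects for the agreement expected by chance. A minimal sketch of that calculation (Cohen's kappa) follows. The function name `accuracy_and_kappa` and the counts in `example` are hypothetical and are not the study's confusion matrix.

```python
def accuracy_and_kappa(matrix):
    """Observed accuracy and Cohen's kappa for a square confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: for each class, product of its row and column marginals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class counts (rows: true class, columns: predicted class).
example = [
    [46, 4],
    [6, 44],
]
acc, kappa = accuracy_and_kappa(example)
print(f"accuracy={acc:.4f} kappa={kappa:.4f}")
```

In this example the two true classes are equally frequent, so chance agreement is 0.5 and kappa reduces to 2 × accuracy − 1. The reported pair of 90.62% accuracy and 81.24% kappa satisfies the same relation, which would be consistent with a balanced evaluation set, although the abstract does not say so.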

