Classification of Radar Returns from the Ionosphere Using SVM, Naïve Bayes, and Random Forest (original title: Klasifikasi Pengembalian Radar dari Ionosfer Menggunakan SVM, Naïve Bayes dan Random Forest)

2021 ◽  
Vol 10 (2) ◽  
pp. 111-117
Author(s):  
Yulia Aryani ◽  
Arie Wahyu Wijayanto

ABSTRACT – Classification is one of the main topics in data mining and machine learning. It groups data whose records carry a class label or target, assigning each record to a particular group. The study of the ionosphere matters for research in many domains, particularly communication systems. Ionospheric research requires classifying radar returns from the ionosphere as useful or not useful. This paper classifies the ionosphere dataset from the UCI Machine Learning Repository using three classification methods: SVM (Support Vector Machine), Naïve Bayes, and Random Forest. The experiments show that accuracy and predictions differ across the methods used. The best accuracy, precision, and recall were obtained with Random Forest at a training-to-test data ratio of 85%, giving a test-set accuracy of 90.57% and a precision of 94.12%. Keywords – Ionosphere; Classification; SVM; Naïve Bayes; Random Forest.
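The abstract reports accuracy, precision, and recall without defining them. As a minimal sketch, assuming a binary problem with one class treated as positive, the three metrics can be computed from predictions as follows. The labels "g" and "b" and the toy predictions are illustrative only and are not the paper's data or results.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN with `positive` treated as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy_precision_recall(y_true, y_pred, positive):
    """Return (accuracy, precision, recall); undefined ratios become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels: "g" = useful radar return, "b" = not useful.
y_true = ["g", "g", "g", "b", "b", "g", "b", "g"]
y_pred = ["g", "g", "b", "b", "g", "g", "b", "g"]
acc, prec, rec = accuracy_precision_recall(y_true, y_pred, positive="g")
```

Precision penalizes false alarms and recall penalizes misses, which is why a paper can report a precision (94.12%) that differs from its accuracy (90.57%) on the same test set.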

Author(s):  
Farshid Bagheri Saravi ◽  
Shadi Moghanian ◽  
Giti Javidi ◽  
Ehsan O Sheybani

Disease-related data collected by physicians, patients, and researchers may seem insignificant at first glance, yet this unorganized data often contains valuable hidden information. Data mining techniques extract patterns so that the data can be classified accurately, and they have often been used to diagnose various diseases. In this study, a machine learning (ML) technique based on distributed computing in Apache Spark is used to diagnose diabetes, or to uncover hidden patterns of the illness, from a large dataset in real time. Implementation results for three ML techniques, Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), in the Apache Spark environment using the Scala programming language and WEKA show that RF is more efficient and faster at diagnosing diabetes in big data.


Author(s):  
Noviyanti Santoso ◽  
Wahyu Wibowo ◽  
Hilda Hikmawati

In data mining, class imbalance is a problematic issue that calls for solutions. This is probably because machine learning algorithms are built on the assumption that the number of instances in each class is balanced, so with imbalanced classes the prediction results may be inappropriate. Solutions offered for class imbalance include oversampling, undersampling, and the synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have disadvantages, so SMOTE is an alternative that overcomes them. Integrating SMOTE into data mining classification methods such as Naive Bayes, Support Vector Machine (SVM), and Random Forest (RF) is expected to improve accuracy. This research found that the SMOTE data gave better accuracy than the original data. Among the three classification methods used, RF gives the highest average AUC, F-measure, and G-means scores.
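The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified sketch of that idea, not the authors' implementation. The function name `smote`, its parameters, and the toy points are assumptions made for illustration, and it expects at least two minority samples.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_synthetic, k=2, seed=0):
    """Create n_synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-dimensional minority-class samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2, seed=42)
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies, unlike plain oversampling, which only duplicates existing rows.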


RSC Advances ◽  
2014 ◽  
Vol 4 (106) ◽  
pp. 61624-61630 ◽  
Author(s):  
N. S. Hari Narayana Moorthy ◽  
Silvia A. Martins ◽  
Sergio F. Sousa ◽  
Maria J. Ramos ◽  
Pedro A. Fernandes

Classification models to predict the solvation free energies of organic molecules were developed using decision tree, random forest and support vector machine approaches and with MACCS fingerprints, MOE and PaDEL descriptors.


With every passing second, the social network community is growing rapidly; because of that, attackers have shown keen interest in these kinds of platforms and want to distribute mischievous content on them. With the focus on introducing new sets of characteristics and features for counteractive measures, a great deal of studies have researched the possibility of lessening malicious activities on social media networks. This research highlights features for identifying spammers on Instagram, and additional features are presented to improve the performance of different machine learning algorithms. The performance of several machine learning algorithms, namely Multilayer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), was evaluated on the machine learning tools RapidMiner and WEKA. The results show that Random Forest (RF) outperformed all other selected machine learning algorithms on both tools. Overall, Random Forest (RF) provided the best results on RapidMiner. These results are useful for researchers who are keen to build machine learning models to detect spamming activities on social network communities.


2020 ◽  
Vol 11 (40) ◽  
pp. 8-23
Author(s):  
Pius MARTHIN ◽  
Duygu İÇEN

Online product reviews have become a valuable source of information that facilitates customer decisions about a particular product. With the wealth of information on users' satisfaction and experiences with a particular drug, pharmaceutical companies use online drug reviews to improve the quality of their products. Machine learning has enabled scientists to train more efficient models that facilitate decision making in various fields. In this manuscript we applied the drug review dataset used by Gräßer, Kallumadi, Malberg, & Zaunseder (2018), freely available from the University of California Irvine (UCI) Machine Learning Repository, to identify the machine learning model that best predicts overall drug performance from users' reviews. Apart from several manipulations done to improve model accuracy, all necessary text analysis procedures were followed, including text cleaning and transformation of texts to numeric format for training machine learning models. Prior to modeling, we obtained overall sentiment scores for the reviews. Customers' reviews were summarized and visualized using a bar plot and a word cloud to explore the most frequent terms. Due to scalability issues, we were able to use only a sample of the dataset: we randomly sampled 15,000 observations from the 161,297-observation training dataset and 10,000 observations from the 53,766-observation testing dataset. Several machine learning models were trained using 10-fold cross-validation performed under stratified random sampling. The trained models include Classification and Regression Trees (CART), a classification tree by C5.0, logistic regression (GLM), Multivariate Adaptive Regression Splines (MARS), Support Vector Machines (SVM) with both radial and linear kernels, and a random forest of classification trees (Random Forest). Model selection was done through a comparison of accuracies and computational efficiency.
The Support Vector Machine (SVM) with a linear kernel performed significantly better than the rest, with an accuracy of 83%. Using only a small portion of the dataset, we attained reasonable accuracy in our models by applying the TF-IDF transformation and the Latent Semantic Analysis (LSA) technique to our term-document matrix (TDM).
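The abstract mentions 10-fold cross-validation under stratified random sampling. A rough sketch of how stratified folds can be built is given below. The helper names `stratified_folds` and `train_test_splits` and the toy labels are assumptions for illustration and do not reproduce the authors' R workflow.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Assign each index to a fold so class proportions stay roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical imbalanced labels: 30 positive and 20 negative reviews.
labels = ["pos"] * 30 + ["neg"] * 20
folds = stratified_folds(labels, n_folds=10, seed=1)
splits = list(train_test_splits(folds))
```

Stratifying keeps each fold's class mix close to that of the full sample, so accuracy estimates from different folds are comparable even when classes are unevenly sized.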


Techno Com ◽  
2021 ◽  
Vol 20 (3) ◽  
pp. 352-361
Author(s):  
Wahyu Nugraha ◽  
Raja Sabaruddin

The number of people with diabetes worldwide continues to rise, with 4.6 million deaths in 2011, and is projected to keep rising globally to 552 million by 2030. Diabetes may be prevented effectively by detecting it early. Data mining and machine learning continue to be developed into reliable tools for building computational models that identify diabetes at an early stage. However, a problem often encountered in analyzing diabetes is class imbalance. Imbalanced classes make prediction difficult for a learning model because the model is dominated by majority-class instances and so neglects predictions for the minority class. In this study we analyze and attempt to overcome the class imbalance problem with a data-level approach, namely data resampling techniques. The experiments use the R language with the ROSE library (version 0.0-4). The Pima Indians dataset was chosen because it is one of the datasets that exhibit class imbalance. The classification models in this study use the C4.5 decision tree, RF (Random Forest), and SVM (Support Vector Machines) algorithms. In the experiments, the SVM classification model with a resampling technique that combines over- and under-sampling performed best, with an AUC (Area Under Curve) value of 0.80.


Author(s):  
Prathima P

Abstract: Falls are a significant national health issue for elderly people, generally resulting in severe injuries when the person lies on the floor for an extended period without aid after a serious fall. Elders therefore need attentive care. A supervised machine learning based fall detection approach using an accelerometer and a gyroscope is devised. The system detects falls by classifying different actions as fall or non-fall events, and the caretaker is alerted as soon as the person falls. The public dataset SisFall, with an efficient set of features, is used to identify falls. The Random Forest (RF) and Support Vector Machine (SVM) machine learning algorithms are employed to detect falls with fewer false alarms. The SVM algorithm obtains the highest accuracy, 99.23%, outperforming the RF algorithm. Keywords: Fall detection, Machine learning, Supervised classification, SisFall, Activities of daily living, Wearable sensors, Random Forest, Support Vector Machine


Author(s):  
Syaifulloh Amien Pandega Perdana ◽  
Teguh Bharata Aji ◽  
Ridi Ferdiana

Customer reviews are opinions on the quality of goods or services as experienced by consumers. They contain information useful to both consumers and providers of goods or services. The large volume of customer reviews available on websites calls for a framework that extracts sentiment automatically. A single customer review often covers many aspects, so Aspect Based Sentiment Analysis (ABSA) must be used to determine the polarity of each aspect. One important task in ABSA is Aspect Category Detection. Machine learning methods for Aspect Category Detection have been widely applied in English-language domains but only rarely in Indonesian-language domains. This paper compares the performance of three machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), on Indonesian-language customer reviews using Term Frequency–Inverse Document Frequency (TF-IDF) as term weighting. The results show that RF outperforms NB and SVM on three different domains, namely restaurants, hotels, and e-commerce, with f1-scores of 84.3%, 85.7%, and 89.3%, respectively.
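TF-IDF term weighting can be illustrated with a small sketch. It assumes the plain textbook variant, tf = count / document length and idf = log(N / df); libraries often use smoothed variants, so their numbers will differ. The function name `tf_idf` and the toy Indonesian reviews are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumes tf = count / document length and idf = log(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy reviews, already lowercased and tokenized.
reviews = [
    "makanan enak pelayanan cepat".split(),
    "kamar bersih pelayanan ramah".split(),
    "pengiriman cepat barang bagus".split(),
]
weights = tf_idf(reviews)
```

Terms that occur in several reviews, such as "pelayanan" (service) here, receive lower weights than terms unique to one review, such as "makanan" (food), which is what makes the weighting useful for telling aspect categories apart.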


Author(s):  
S. Bhaskaran ◽  
Raja Marappan

Abstract: A decision-making system is one of the most important tools in data mining. The data mining field has become a forum where it is necessary to utilize users' interactions, decision-making processes, and overall experience. Nowadays, e-learning is a progressive method of providing online education over the long term, in contrast to the customary face-to-face process of education. Through e-learning, an ever-increasing number of learners have benefited from different programs. However, the great diversity of students on the internet presents new difficulties for conventional one-size-fits-all learning systems, in which a single set of learning resources is given to all learners. The problems and limitations of well-known recommender systems are large variations in the expected absolute error, longer query processing times, and lower accuracy in the final recommendation. The main objectives of this research are the design and analysis of a new transductive support vector machine-based personalized hybrid recommender for public machine learning datasets. The learning experience is captured through the habits of the learners. This research designs and experiments with several new strategies to improve the performance of a hybrid recommender. The modified one-source denoising approach is designed to preprocess the learner dataset. The modified anarchic society optimization strategy is designed to improve the performance measurements. The enhanced and generalized sequential pattern strategy is proposed to mine the sequential patterns of learners. The enhanced transductive support vector machine is developed to evaluate the extracted habits and interests. These new strategies analyze the confidential rate of learners and provide the best recommendation to the learners.
The proposed generalized model is simulated on public machine learning datasets covering movies, music, books, food, merchandise, healthcare, dating, scholarly papers, and open university learning recommendation. The experimental analysis concludes that the enhanced clustering strategy discovers clusters that are based on random size. The proposed recommendation strategies achieve significantly better performance than existing methods in terms of expected absolute error, accuracy, ranking score, recall, and precision. The accuracy on the simulated public datasets lies between 82% and 98%, and the MAE lies between 5% and 19.2%. The simulation results show that the proposed generalized recommender has great strength in improving quality and performance.

