Prediction of chronic kidney disease using the Symmetrical Uncertainty feature reduction method

JNANALOKA ◽

10.36802/jnanaloka.2020.v1-no1-1 ◽

2020 ◽

pp. 1-10

Author(s):

Muhammad Kurniawan

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Random Forest ◽

Serum Creatinine ◽

Cell Volume ◽

Packed Cell Volume ◽

Gradient Boosting ◽

Support Vector ◽

Symmetrical Uncertainty

Data mining is concerned with searching data to discover patterns or knowledge from the data as a whole. It can be used to predict a condition, for example whether or not a person has chronic kidney disease. In this study, the symmetrical uncertainty feature reduction method was combined with the Gradient Boosting, Random Forest, Support Vector Machine, and Naïve Bayes classification algorithms to predict chronic kidney disease. Classification was performed with 24, 12, 6, 5, and 4 attributes. An increase in accuracy was obtained with the Naïve Bayes algorithm when the attributes were reduced from 24 to 12. In addition, Support Vector Machine achieved the best accuracy for every number of attributes, followed by Gradient Boosting, Random Forest, and Naïve Bayes. With 5 attributes, Support Vector Machine and Gradient Boosting still reached an accuracy of 1. These five attributes are hemoglobin, packed cell volume, serum creatinine, albumin, and specific gravity. Attribute reduction can increase accuracy and makes the prediction process easier because fewer attributes are involved. There is not yet
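The abstract does not spell out how the attributes are ranked. Symmetrical uncertainty is commonly defined as SU(X, Y) = 2 · IG(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and IG(X; Y) = H(Y) - H(Y | X) is information gain, so it normalizes information gain to the range [0, 1]. The minimal sketch below ranks features by SU against the class label and keeps the top k. It assumes the attributes are already discretized (numeric attributes such as serum creatinine would need binning first), and the feature names and values are hypothetical toy data, not the paper's dataset or code.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(Y | X): entropy of y inside each group of equal x values, weighted by group size."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / len(x) * entropy(g) for g in groups.values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), where IG(X; Y) = H(Y) - H(Y | X)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: no information to share
    info_gain = hy - conditional_entropy(x, y)
    return 2.0 * info_gain / (hx + hy)


def rank_features(columns, labels, k):
    """Names of the k features with the highest SU against the class labels."""
    scores = {name: symmetrical_uncertainty(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical, already-discretized toy data (not the paper's dataset).
labels = ["ckd", "ckd", "notckd", "notckd"]
columns = {
    "hemo": ["low", "low", "normal", "normal"],  # perfectly separates the classes
    "sg": ["1.010", "1.010", "1.010", "1.020"],  # partly informative
    "noise": ["a", "b", "a", "b"],               # unrelated to the label
}
print(rank_features(columns, labels, k=2))
```

Keeping the top 12, 6, 5, or 4 ranked attributes would correspond to the attribute counts compared in the abstract; how the paper discretized numeric attributes or broke ties is not stated.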

Investigating the use of random forest, gradient boosting machine, support vector machine and their ensemble applied to fault detection

10.26678/abcm.cobem2017.cob17-1600 ◽

2017 ◽

Author(s):

Luis Felipe Nogoseke ◽

Gabriel Herman Bernardim Andrade ◽

Marco Boaretto ◽

Leandro Coelho

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Fault Detection ◽

Gradient Boosting ◽

Support Vector ◽

Gradient Boosting Machine

Algorithmic and data modeling: Will algorithmic modeling improve predictions of traits evaluated on ordinal scales?

10.1101/2020.10.07.329466 ◽

2020 ◽

Author(s):

Zhanyou Xu ◽

Andreomar Kurek ◽

Steven B. Cannon ◽

Williams D. Beavis

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Ridge Regression ◽

Genomic Prediction ◽

Ordinal Data ◽

Prediction Models ◽

Characteristic Curve ◽

Gradient Boosting ◽

Support Vector ◽

Data Types

Selection of markers linked to alleles at quantitative trait loci (QTL) for tolerance to Iron Deficiency Chlorosis (IDC) has not been successful. Genomic selection has been advocated for continuous numeric traits such as yield and plant height. For ordinal data types such as IDC, genomic prediction models have not been systematically compared. The objectives of the research reported in this manuscript were to evaluate the most commonly used genomic prediction method, ridge regression, and its equivalent logistic ridge regression method against algorithmic modeling methods, including random forest, gradient boosting, support vector machine, K-nearest neighbors, Naïve Bayes, and artificial neural network, using the usual comparator metric of prediction accuracy. In addition, we compared the methods using metrics of greater importance for decisions about selecting and culling lines for use in variety development and genetic improvement projects. These metrics include specificity, sensitivity, precision, decision accuracy, and area under the receiver operating characteristic curve. We found that Support Vector Machine provided the best specificity for culling IDC-susceptible lines, while Random Forest genomic prediction (GP) models provided the best combined set of decision metrics for retaining IDC-tolerant and culling IDC-susceptible lines.
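Most of the decision metrics listed above derive from the confusion matrix of a binary classifier, and the area under the ROC curve can be computed as the probability that a randomly chosen positive is scored above a randomly chosen negative. The sketch below assumes labels coded 1 (positive) and 0 (negative) and that both classes are present. Which class counts as positive (for example IDC-tolerant or IDC-susceptible lines) is a modeling choice, and the paper's own "decision accuracy" metric is not reproduced here; only plain accuracy is computed. The example labels and scores are made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def decision_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, precision and accuracy; assumes no denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives that are found
        "specificity": tn / (tn + fp),  # share of actual negatives that are found
        "precision": tp / (tp + fp),    # share of predicted positives that are correct
        "accuracy": (tp + tn) / len(y_true),
    }


def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: predictions are the scores thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
y_pred = [1 if s > 0.5 else 0 for s in scores]
print(decision_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Unlike the threshold-based metrics, AUC uses the ranking of the scores, which is why it can separate models that make the same hard predictions.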

Machine learning models for predicting startup success

Revista de Gestão e Projetos ◽

10.5585/gep.v12i2.18942 ◽

2021 ◽

Vol 12 (2) ◽

pp. 28-55

Author(s):

Fabiano Rodrigues ◽

Francisco Aparecido Rodrigues ◽

Thelma Valéria Rocha Rodrigues

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Decision Tree ◽

Initial Public Offering ◽

Gradient Boosting ◽

Support Vector ◽

Trade Offs ◽

Extreme Gradient Boosting ◽

Public Offering

This study analyzes results obtained with machine learning models for predicting startup success. Success is proxied from the investor's perspective, in which the acquisition of the startup or an IPO (Initial Public Offering) are ways of recovering the investment. The literature review covers startups and funding vehicles, previous studies on predicting startup success with machine learning models, and trade-offs between machine learning techniques. The empirical part is a quantitative study based on secondary data from the American platform Crunchbase, covering startups from 171 countries. The research design selected startups founded between June 2010 and June 2015 and set a prediction window between June 2015 and June 2020 to predict their success. After data preprocessing, the sample comprised 18,571 startups. Six binary classification models were used for prediction: Logistic Regression, Decision Tree, Random Forest, Extreme Gradient Boosting, Support Vector Machine, and Neural Network. In the end, the Random Forest and Extreme Gradient Boosting models performed best on the classification task. By combining machine learning and startups, this article contributes to hybrid research areas that merge the fields of Business Administration and Data Science. It also offers investors a tool for an initial mapping of startups when searching for targets with a higher probability of success.
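The success label described above (founding cohort, prediction window, acquisition or IPO as success) can be sketched as a small labeling function. The `Startup` record, its field names, and the exact day boundaries (half-open intervals starting on 1 June) are assumptions for illustration; the abstract gives only month-level dates and does not describe the Crunchbase schema or the authors' preprocessing.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# Assumed boundaries: half-open intervals starting on 1 June (the abstract gives months only).
COHORT_START, COHORT_END = date(2010, 6, 1), date(2015, 6, 1)
WINDOW_START, WINDOW_END = date(2015, 6, 1), date(2020, 6, 1)


@dataclass
class Startup:
    """Hypothetical record; field names are illustrative, not Crunchbase's schema."""
    name: str
    founded_on: date
    acquired_on: Optional[date] = None
    ipo_on: Optional[date] = None


def in_cohort(s: Startup) -> bool:
    """True if the startup was founded inside the assumed cohort interval."""
    return COHORT_START <= s.founded_on < COHORT_END


def is_success(s: Startup) -> bool:
    """Success = an acquisition or an IPO that happens inside the prediction window."""
    exits = [d for d in (s.acquired_on, s.ipo_on) if d is not None]
    return any(WINDOW_START <= d < WINDOW_END for d in exits)


def build_labels(startups: List[Startup]) -> Dict[str, int]:
    """Binary target (1 = success, 0 = otherwise) for startups in the founding cohort."""
    return {s.name: int(is_success(s)) for s in startups if in_cohort(s)}


startups = [
    Startup("A", date(2012, 3, 1), acquired_on=date(2017, 5, 1)),
    Startup("B", date(2013, 1, 1)),
    Startup("C", date(2009, 1, 1), ipo_on=date(2016, 1, 1)),  # founded before the cohort
    Startup("D", date(2011, 1, 1), ipo_on=date(2021, 1, 1)),  # exit after the window
]
print(build_labels(startups))
```

Separating the founding cohort from a later prediction window keeps the target in the future relative to the founding period, which is the point of the temporal design described in the abstract.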

Recognition of gasoline in fire debris using machine learning: Part I, Application of Random Forest, Gradient Boosting, Support Vector Machine and Naïve Bayes

Forensic Science International ◽

10.1016/j.forsciint.2021.111146 ◽

2021 ◽

pp. 111146 ◽

Author(s):

C. Bogdal ◽

R. Schellenberg ◽

O. Höpli ◽

M. Bovens ◽

M. Lory

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Naive Bayes ◽

Naïve Bayes ◽

Gradient Boosting ◽

Support Vector ◽

Fire Debris ◽

In Fire

Machine Learning in Apache Spark Environment for Diagnosis of Diabetes

10.20944/preprints202111.0200.v1 ◽

2021 ◽

Author(s):

Farshid Bagheri Saravi ◽

Shadi Moghanian ◽

Giti Javidi ◽

Ehsan O Sheybani

Keyword(s):

Machine Learning ◽

Data Mining ◽

Support Vector Machine ◽

Big Data ◽

Random Forest ◽

Apache Spark ◽

Support Vector ◽

Computing Environment ◽

Large Dataset ◽

Related Data

Disease-related data and information collected by physicians, patients, and researchers may seem insignificant at first glance, yet this unorganized data contains valuable, often hidden, information. The task of data mining techniques is to extract patterns that allow the data to be classified accurately, and data mining methods have often been used to diagnose various diseases. In this study, a machine learning (ML) approach based on distributed computing in the Apache Spark environment is used to diagnose diabetes by detecting hidden patterns of the illness in a large dataset in real time. Implementation results for three ML techniques, Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), in the Apache Spark computing environment using the Scala programming language and WEKA show that RF is more efficient and faster at diagnosing diabetes in big data.

Classification of Radar Returns from the Ionosphere Using SVM, Naïve Bayes, and Random Forest

Komputika : Jurnal Sistem Komputer ◽

10.34010/komputika.v10i2.4347 ◽

2021 ◽

Vol 10 (2) ◽

pp. 111-117

Author(s):

Yulia Aryani ◽

Arie Wahyu Wijayanto

Keyword(s):

Machine Learning ◽

Data Mining ◽

Support Vector Machine ◽

Random Forest ◽

Support Vector ◽

Naïve Bayes

ABSTRACT: Classification is one of the main topics in data mining and machine learning. It groups data whose records carry a class label or target, placing each record into a particular group. Studying the ionosphere is important for research in many domains, especially communication systems, and ionospheric research requires classifying radar returns from the ionosphere as useful or not useful. This paper classifies the Ionosphere dataset taken from the UCI Machine Learning Repository using three classification methods: SVM (Support Vector Machine), Naïve Bayes, and Random Forest. The experiments show that each method yields a different level of accuracy and prediction quality. The best accuracy, precision, and recall were obtained with Random Forest: with a training-to-test data ratio of 85%, it reached a test accuracy of 90.57% and a precision of 94.12%. Keywords: Ionosphere; Classification; SVM; Naïve Bayes; Random Forest.
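A hold-out evaluation like the one described above can be sketched with a Gaussian Naïve Bayes classifier. The abstract does not say which Naïve Bayes variant, split procedure, or random seed was used, so the Gaussian likelihood, the shuffled 85/15 split, the small variance-smoothing constant, and the two-class toy data are all assumptions. This is not the authors' code and it does not use the UCI Ionosphere data.

```python
import math
import random
from collections import defaultdict


def train_test_split(rows, labels, train_ratio=0.85, seed=0):
    """Shuffle indices with a fixed seed and split them into training and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_ratio * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


class GaussianNaiveBayes:
    """Naive Bayes with one independent normal distribution per (class, feature)."""

    def fit(self, X, y, var_smoothing=1e-9):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.log_priors, self.stats = {}, {}
        for label, rows in by_class.items():
            self.log_priors[label] = math.log(len(rows) / len(X))
            stats = []
            for column in zip(*rows):  # one column per feature
                mean = sum(column) / len(column)
                # Assumed small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in column) / len(column) + var_smoothing
                stats.append((mean, var))
            self.stats[label] = stats
        return self

    def _log_posterior(self, row, label):
        """Unnormalized log P(label | row) = log prior + sum of Gaussian log-likelihoods."""
        total = self.log_priors[label]
        for v, (mean, var) in zip(row, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return total

    def predict(self, X):
        """Pick the class with the highest unnormalized log posterior for each row."""
        return [max(self.stats, key=lambda c: self._log_posterior(row, c)) for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: two well-separated classes with two numeric features each.
X = [(0.1 * i, 0.2 * i) for i in range(10)] + [(5 + 0.1 * i, 5 - 0.1 * i) for i in range(10)]
y = ["good"] * 10 + ["bad"] * 10
X_train, y_train, X_test, y_test = train_test_split(X, y, train_ratio=0.85)
model = GaussianNaiveBayes().fit(X_train, y_train)
print(len(X_test), accuracy(y_test, model.predict(X_test)))
```

Working in log space avoids underflow when many per-feature likelihoods are multiplied, which matters for datasets with dozens of attributes.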

Sign language dactyl recognition based on machine learning algorithms

Eastern-European Journal of Enterprise Technologies ◽

10.15587/1729-4061.2021.239253 ◽

2021 ◽

Vol 4 (2(112)) ◽

pp. 58-72

Author(s):

Chingiz Kenshimov ◽

Zholdas Buribayev ◽

Yedilkhan Amirgaliyev ◽

Aisulyu Ataniyazova ◽

Askhat Aitimov

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Sign Language ◽

Gesture Recognition ◽

Research Work ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting

In the course of our research, the American, Russian, and Turkish sign languages were analyzed, and a program for recognizing the Kazakh dactyl (fingerspelling) sign language with machine learning methods was implemented. A dataset of 5000 images was formed for each gesture, and gesture recognition algorithms such as Random Forest, Support Vector Machine, and Extreme Gradient Boosting were applied; two data types were combined into one database, which changed the architecture of the system as a whole. The quality of the algorithms was also evaluated. The work was motivated by the fact that existing research on systems for recognizing Kazakh sign dactyls is currently insufficient to represent the language completely. The Kazakh language has specific letters, and the peculiarities of its spelling cause problems when developing recognition systems for Kazakh sign language. The results showed that the Support Vector Machine and Extreme Gradient Boosting algorithms are superior in real-time performance, while the Random Forest algorithm has the highest recognition accuracy: 98.86% for Random Forest, 98.68% for Support Vector Machine, and 98.54% for Extreme Gradient Boosting. The evaluation of the classical algorithms also showed high quality indicators. The practical significance of this work is that gesture recognition with the updated alphabet of the Kazakh language has not been studied before, so these results can be used by other researchers working on recognition of the Kazakh dactyl sign language, as well as by researchers developing the international sign language.

Prediction of active debt in the State of Pernambuco, Brazil

Revista de Engenharia e Pesquisa Aplicada ◽

10.25286/repa.v5i1.1299 ◽

2020 ◽

Vol 5 (1) ◽

pp. 88-95

Author(s):

Álvaro Farias Pinheiro ◽

João Alberto Da Silva Amaral ◽

Geraldo Torres Galindo Neto ◽

José Nilo Martins Sampaio ◽

Wedson Lino Soares

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

The State ◽

Support Vector ◽

Data Mining Techniques ◽

Collection Process ◽

Mining Model

This work applies data mining (DM) techniques to optimize the process of collecting the Active Debt (AD) of the State of Pernambuco, Brazil. We applied the following techniques: Decision Tree (DT), Logistic Regression (LR), Naïve Bayes (NB), and Support Vector Machine (SVM), as well as Random Forest (RF), which is considered an ensemble method. The RF technique obtained better results than all the other classification techniques, reaching higher values in every metric analyzed. Creating a data mining model to choose which debts are likely to be collected successfully can bring benefits to the Pernambuco government. With the RF technique, we obtained values above 85% for the evaluated metrics.

Comparison of Classification Techniques in Data Mining for Bank Direct Marketing

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.201855958 ◽

2018 ◽

Vol 5 (5) ◽

pp. 567 ◽

Author(s):

Irvi Oktanisa ◽

Ahmad Afif Supianto

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Random Forest ◽

Gradient Descent ◽

Naive Bayes ◽

Direct Marketing ◽

Naïve Bayes ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector

Classification is a data mining technique that groups data according to how it relates to labeled sample data. In this paper, we compare 9 classification techniques for classifying customer responses in the Bank Direct Marketing dataset, in order to find out which model is most effective at classifying its target. The techniques used are Support Vector Machine, AdaBoost, Naïve Bayes, Constant, KNN, Tree, Random Forest, Stochastic Gradient Descent, and CN2 Rule. The classification process begins with data preprocessing to remove missing values and select features. 10-fold cross-validation is used in the evaluation stage. Testing showed that the best accuracy was obtained by the Tree, Constant, Naïve Bayes, and Stochastic Gradient Descent models, followed by Random Forest, K-Nearest Neighbor, CN2 Rule, AdaBoost, and Support Vector Machine. Of the four models with the best accuracy, Stochastic Gradient Descent was selected as the best model for this case, with an accuracy of 0.972 and a clearer visualization for classifying the target in the Bank Direct Marketing dataset.
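The 10-fold cross-validation step can be sketched as follows: the shuffled indices are dealt into k folds, each fold serves once as the test set while the rest are used for training, and the fold accuracies are averaged. The `fit_predict` callback and the majority-class baseline in the usage example are hypothetical; the baseline stands in for a "Constant" model (which, in tools such as Orange, predicts the most frequent class), and no claim is made that this matches the authors' setup.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_val_accuracy(fit_predict, X, y, k=10, seed=0):
    """Mean accuracy over k folds; fit_predict(X_train, y_train, X_test) -> predictions."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Hypothetical 'Constant'-style model: always predict the most frequent training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * len(X_test)


# Hypothetical imbalanced toy labels: 18 negatives and 2 positives.
X = list(range(20))
y = [0] * 18 + [1] * 2
print(cross_val_accuracy(majority_baseline, X, y, k=10))
```

On imbalanced data a constant baseline can already score high accuracy, which is one reason a comparison like this one benefits from reporting the baseline alongside the other models.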
