Spatiotemporal Mapping and Monitoring of Mangrove Forests Changes From 1990 to 2019 in the Northern Emirates, UAE Using Random Forest, Kernel Logistic Regression and Naive Bayes Tree Models

The heavy use of ATMs creates openings for fraud by the third parties that help PT. Bank Central Asia Tbk keep its ATMs ready for customer use. Slow and difficult identification of ATM fraud is one of the obstacles PT. Bank Central Asia Tbk faces. To address this problem, the researchers collected 5 datasets and pre-processed them so that they could be used to model and test algorithms. Seven algorithms were compared: decision tree, gradient boosted trees, logistic regression, naive Bayes (kernel), naive Bayes, random forest and random tree. Modeling and testing showed that gradient boosted trees was the best algorithm, with an accuracy of 99.85% and an AUC of 1. The authors attribute this result to how well each tested attribute suits gradient boosted trees, an algorithm that stores and evaluates the existing results. Gradient boosted trees is therefore proposed as the solution to the problem faced by PT. Bank Central Asia Tbk.
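The abstract reports both an accuracy (99.85%) and an AUC (1) for gradient boosted trees, and these two metrics measure different things. The small Python sketch below illustrates the difference. `accuracy` compares thresholded predictions with the true labels. `roc_auc` uses the rank interpretation of AUC: the probability that a randomly chosen fraud case scores higher than a randomly chosen non-fraud case. The function names and toy scores are illustrative assumptions, not the study's pipeline.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of thresholded predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC via its rank interpretation.

    Returns the probability that a randomly chosen positive (label 1, e.g. fraud)
    scores higher than a randomly chosen negative (label 0); ties count as 0.5.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores): a perfect ranking gives AUC = 1.0.
labels = [0, 0, 1, 1]
print(roc_auc(labels, [0.1, 0.2, 0.8, 0.9]))   # 1.0
print(roc_auc(labels, [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 1 therefore means every fraud case was ranked above every non-fraud case. Accuracy can still fall short of 100% if the decision threshold is not placed between the two groups. The pairwise loop is O(n·m), which is fine for illustration but slow on large datasets.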


Spatial prediction of landslide susceptibility using data mining-based kernel logistic regression, naive Bayes and RBFNetwork models for the Long County area (China)

Bulletin of Engineering Geology and the Environment, 2018, Vol 78 (1), pp. 247-266. DOI: 10.1007/s10064-018-1256-z. Cited By ~ 53

Author(s): Wei Chen, Xusheng Yan, Zhou Zhao, Haoyuan Hong, Dieu Tien Bui, ...

Keyword(s): Data Mining, Logistic Regression, Landslide Susceptibility, Naive Bayes, Spatial Prediction, Kernel Logistic Regression, Using Data


Identifying undetected dementia in UK primary care patients: a retrospective case-control study comparing machine-learning and standard epidemiological approaches

BMC Medical Informatics and Decision Making, 2019, Vol 19 (1). DOI: 10.1186/s12911-019-0991-9. Cited By ~ 6

Author(s): Elizabeth Ford, Philip Rooney, Seb Oliver, Richard Hoile, Peter Hurley, ...

Keyword(s): Machine Learning, Logistic Regression, Random Forest, Health Service, Naive Bayes, Case Control, Support Vector, Clinical Practice Research Datalink, Patient Records

Abstract Background Identifying dementia early in time, using real world data, is a public health challenge. As only two-thirds of people with dementia now ultimately receive a formal diagnosis in United Kingdom health systems and many receive it late in the disease process, there is ample room for improvement. The policy of the UK government and National Health Service (NHS) is to increase rates of timely dementia diagnosis. We used data from general practice (GP) patient records to create a machine-learning model to identify patients who have or who are developing dementia, but are currently undetected as having the condition by the GP. Methods We used electronic patient records from Clinical Practice Research Datalink (CPRD). Using a case-control design, we selected patients aged >65y with a diagnosis of dementia (cases) and matched them 1:1 by sex and age to patients with no evidence of dementia (controls). We developed a list of 70 clinical entities related to the onset of dementia and recorded in the 5 years before diagnosis. After creating binary features, we trialled machine learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, support vector machines, random forest and neural networks). We examined the most important features contributing to discrimination. Results The final analysis included data on 93,120 patients, with a median age of 82.6 years; 64.8% were female. The naïve Bayes model performed least well. The logistic regression, support vector machine, neural network and random forest performed very similarly with an AUROC of 0.74. The top features retained in the logistic regression model were disorientation and wandering, behaviour change, schizophrenia, self-neglect, and difficulty managing. Conclusions Our model could aid GPs or health service planners with the early detection of dementia. 
Future work could improve the model by exploring the longitudinal nature of patient data and modelling decline in function over time.
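The Methods paragraph describes turning clinical entities recorded in the 5 years before diagnosis into binary features. Below is a minimal Python sketch of that step. Several details are assumptions for illustration: the entity names, the `(entity, date)` record format, the 5 × 365-day window, and the rule that a control inherits the index date of its matched case. The abstract does not describe CPRD's coding or the authors' extraction logic.

```python
from datetime import date, timedelta

# Assumption: "5 years" approximated as 5 * 365 days (leap days ignored).
LOOKBACK = timedelta(days=5 * 365)


def binary_features(events, index_date, entities, lookback=LOOKBACK):
    """Encode one patient's coded history as a 0/1 vector over `entities`.

    events: iterable of (entity_name, event_date) pairs from the patient record.
    index_date: diagnosis date for a case; for a matched control, assumed to be
        the index date of the case it was matched to.
    An entity scores 1 if it was recorded at least once in
    [index_date - lookback, index_date), i.e. strictly before the index date.
    """
    window_start = index_date - lookback
    seen = {name for name, when in events if window_start <= when < index_date}
    return [1 if entity in seen else 0 for entity in entities]


# Hypothetical three-entity list; the study used 70 clinical entities.
ENTITIES = ["disorientation and wandering", "behaviour change", "self-neglect"]
history = [
    ("behaviour change", date(2015, 3, 1)),              # inside the window
    ("self-neglect", date(2009, 1, 1)),                  # too early
    ("disorientation and wandering", date(2018, 6, 1)),  # after the index date
]
print(binary_features(history, date(2018, 1, 1), ENTITIES))  # [0, 1, 0]
```

Each patient then becomes one fixed-length 0/1 row, which is the input format that logistic regression, naïve Bayes, SVMs, random forests and neural networks all accept.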


IProCAD: Intelligent Prognosis of Coronary Artery Disease Excluding Angiogram in Patient with Stable Angina

International Journal of Innovative Technology and Exploring Engineering - Special Issue, 2020, Vol 9 (5), pp. 2032-2040. DOI: 10.35940/ijitee.e3101.039520

Keyword(s): Machine Learning, Logistic Regression, Heart Disease, Random Forest, Decision Tree, Stable Angina, Naive Bayes, Feature Vector, The Other

Cardiovascular diseases are among the main causes of mortality worldwide. A prediction system with reasonable cost could significantly reduce this death toll in low-income countries such as Bangladesh. For such countries, we propose a machine-learning-backed embedded system that can effectively predict a possible cardiac attack without the high-cost angiogram, using only twelve (12) low-cost features: age, sex, chest pain, blood pressure, cholesterol, blood sugar, ECG results, heart rate, exercise-induced angina, old peak, slope, and history of heart disease. Two heart disease datasets are used: a self-built dataset of NICVD (National Institute of Cardiovascular Disease, Bangladesh) patients, and the UCI (University of California, Irvine) dataset. The overall process comprises four phases: a comprehensive literature review; collection of stable angina patients' data through survey questionnaires at NICVD; manual reduction of the feature vector from 14 to 12 dimensions; and feeding the reduced feature vector to machine learning classifiers to obtain a prediction model for heart disease. The experiments used the NICVD patient data with 12 features, without angiographic disease status, and 10-fold cross-validation. Under these conditions, an Artificial Neural Network (ANN) achieved a classification accuracy of 92.80%. This was better than the other classifiers: Decision Tree (82.50%), Naïve Bayes (85%), Support Vector Machine (SVM) (75%), Logistic Regression (77.50%), and Random Forest (75%). To accommodate the small training and test sets in our experimental environment, we also measured accuracy using the Jackknife method. The accuracies of ANN, Decision Tree, Naïve Bayes, SVM, Logistic Regression and Random Forest were 84.80%, 71%, 75.10%, 75%, 75.33% and 71.42%, respectively.
On the UCI dataset with 12 attributes, the corresponding classifiers achieved accuracies of 91.7%, 76.90%, 86.50%, 76.3%, 67.0% and 67.3%, respectively. The same dataset with 14 attributes, including angiographic status, gave accuracies of 93.5%, 76.7%, 86.50%, 76.8%, 67.7% and 69.6% for the respective classifiers.
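Both evaluation protocols mentioned in the abstract can be expressed as index splits, as in the Python sketch below. It assumes that "Jackknife" here means leave-one-out cross-validation, which is a common usage, although the abstract does not define it. `majority_fit` and `majority_predict` are hypothetical placeholders for whichever classifier is being evaluated.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Folds are contiguous and differ in size by at most one. No shuffling is
    done here; in practice samples are usually shuffled (or stratified) first.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size


def jackknife_indices(n):
    """Leave-one-out: every sample is held out exactly once (k = n)."""
    return k_fold_indices(n, n)


def cross_val_accuracy(X, y, fit, predict, splits):
    """Mean held-out accuracy over the given splits.

    fit(X_train, y_train) -> model and predict(model, X_test) -> labels stand
    in for any classifier (ANN, decision tree, naive Bayes, ...).
    """
    scores = []
    for train, test in splits:
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Toy "classifier" that always predicts the majority training label.
def majority_fit(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def majority_predict(model, X_test):
    return [model] * len(X_test)


print(cross_val_accuracy([0, 1, 2, 3], [1, 1, 1, 0],
                         majority_fit, majority_predict, jackknife_indices(4)))  # 0.75
```

In each round, leave-one-out trains on every sample but one. This is why it is often chosen for small datasets like the NICVD data. The cost is n model fits instead of 10.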


Application of a time-series deep learning model to predict cardiac dysrhythmias in electronic health records

PLoS ONE, 2021, Vol 16 (9), pp. e0239007. DOI: 10.1371/journal.pone.0239007

Author(s): Aixia Guo, Sakima Smith, Yosef M. Khan, James R. Langabeer II, Randi E. Foraker

Keyword(s): Neural Networks, Time Series, Logistic Regression, Deep Learning, Random Forest, Deep Neural Networks, Naive Bayes, Cardiac Dysrhythmias, Electronic Health

Background Cardiac dysrhythmias (CD) affect millions of Americans and are associated with considerable morbidity and mortality. New strategies to combat this growing problem are urgently needed. Objectives Predicting CD from electronic health record (EHR) data would allow earlier diagnosis and treatment of the condition, thus improving overall cardiovascular outcomes. The Guideline Advantage (TGA) is an American Heart Association ambulatory quality clinical data registry of EHR data. It represents 70 clinics distributed throughout the United States (US) and has been used for two purposes: monitoring outpatient prevention and disease management outcome measures across populations, and longitudinal research on the impact of preventive care. Methods For this study, we represented all time-series cardiovascular health (CVH) measures, and the corresponding data collection time points for each patient, as numerical embedding vectors. We then employed a deep learning technique, a long short-term memory (LSTM) model, to predict CD from the vectors of time-series CVH measures using 5-fold cross-validation. We compared its performance with that of deep neural networks, logistic regression, random forest, and Naïve Bayes models. Results The LSTM model outperformed the other, traditional machine learning models and achieved the best prediction performance, as measured by the average area under the receiver operating characteristic curve (AUROC): 0.76 for LSTM, 0.71 for deep neural networks, 0.66 for logistic regression, 0.67 for random forest, and 0.59 for Naïve Bayes. The most influential feature in the LSTM model was blood pressure. Conclusions These findings may be used to prevent CD in the outpatient setting by encouraging appropriate surveillance and management of CVH.


Comparison of Naive Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression Classifiers for Text Reviews Classification

Baltic Journal of Modern Computing, 2017, Vol 5 (2). DOI: 10.22364/bjmc.2017.5.2.05. Cited By ~ 22

Author(s): Tomas Pranckevičius, Virginijus Marcinkevičius

Keyword(s): Logistic Regression, Support Vector Machines, Random Forest, Decision Tree, Naive Bayes, Support Vector, Vector Machines


COMPARATIVE ANALYSIS OF SPEARMAN CORRELATION AND MAXIMAL INFORMATION COEFFICIENT FOR PHISHING WEBSITE FEATURE SELECTION USING MACHINE LEARNING ALGORITHMS

CSRID (Computer Science Research and Its Development Journal), 2021, Vol 12 (2), pp. 107. DOI: 10.22303/csrid.12.2.2020.107-116

Author(s): Jimmy H. Moedjahedy, Arief Setyanto, Komang Aryasa

Keyword(s): Machine Learning, Logistic Regression, Random Forest, Naive Bayes, Total Information, Information Coefficient, Maximal Information Coefficient

[…] that deceive, or by technical means, to steal consumers' personal identity data and financial account credentials. Phishing is designed to lead consumers to phishing websites that trick recipients into divulging financial data such as usernames and passwords. A phishing dataset contains features that can be used to classify whether or not a website is a phishing website. The aim of this study is to compare the results of feature selection using two methods: a combined Maximal Information Coefficient and Total Information Coefficient method, and Spearman correlation. The selected features were tested with five machine learning algorithms: Logistic Regression, Naïve Bayes, J48, AdaBoost M1 and Random Forest. With Random Forest, the combined Maximal Information Coefficient and Total Information Coefficient method achieved an accuracy of 97.25%, outperforming Spearman correlation, which achieved 95.33%.
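Of the two feature-selection methods compared, Spearman correlation is easy to sketch: rank each feature and the label, then take the Pearson correlation of the ranks. The Python below keeps features whose absolute correlation with the phishing label meets a threshold. The feature names and the 0.5 threshold are hypothetical. The MIC/TIC method needs a dedicated estimator and is not reproduced here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no rank information
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def select_features(columns, label, threshold):
    """Keep features whose |rho| with the 0/1 phishing label is >= threshold."""
    return [name for name, col in columns.items()
            if abs(spearman(col, label)) >= threshold]


# Hypothetical feature columns and threshold, for illustration only.
features = {"url_length": [1, 2, 3, 4, 5], "noise": [3, 1, 4, 1, 5]}
is_phishing = [0, 0, 1, 1, 1]
print(select_features(features, is_phishing, threshold=0.5))  # ['url_length']
```

Spearman captures only monotonic relationships. This is one plausible reason why an information-based measure such as MIC, which can also detect non-monotonic dependence, might select a different and sometimes better feature subset.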


Impact of the COVID-19 pandemic on the expression of emotions in social media

Multiple Criteria Decision Making, 2020, Vol 15, pp. 23-35. DOI: 10.22367/mcdm.2020.15.02

Author(s): Debabrata Ghosh

Keyword(s): Social Media, Logistic Regression, Random Forest, Naive Bayes, Support Vector, Emotion Classification, Machine Learning Classification, Expression Of Emotions, The Mind

In the age of social media, thousands of messages are exchanged every second. Analyzing this unstructured data to identify specific emotions is a challenging task. Emotion analysis involves evaluating text and classifying it into emotion classes such as Happy, Sad, Anger, Disgust, Fear and Surprise, as defined by the emotion dimensional models described in psychological theory (www 1; Russell, 2005). The main goal of this paper is to examine the COVID-19 pandemic situation in India and its impact on human emotions. People very often express their state of mind through social media, so analyzing and tracking their emotions can help government and local authorities take the required measures. We analyzed several machine learning classification models with 10-fold cross-validation to find the best ML models for emotion classification: Naïve Bayes, Support Vector Machine, Random Forest Classifier, Decision Tree and Logistic Regression. After hyperparameter tuning, Logistic Regression was the best-suited model, with an accuracy of 77% on the given datasets. We used a supervised ML approach to obtain the expected result. Multiple earlier studies followed similar lines. However, none of them performed a comparative study of different ML techniques or tuned hyperparameters to optimize the results. Moreover, this study uses data from the recent COVID-19 pandemic, which is itself unique. We captured Twitter data for 45 days with the hashtags #COVID19India OR #COVID19. We then analyzed it using Logistic Regression to find out how emotions changed over time based on certain social factors. Keywords: classification, COVID-19, emotion, emotion analysis, Naïve Bayes, Pandemic, Random Forest, SVM.
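Naïve Bayes was one of the models compared for emotion classification; Logistic Regression scored best. Below is a minimal multinomial Naïve Bayes text classifier with Laplace smoothing, written in Python. Whitespace tokenisation, the smoothing constant `alpha=1.0` and the toy training sentences are assumptions. Real tweets would need proper handling of hashtags, mentions and URLs.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; real tweets need more care.
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Collect class priors and per-class word counts (multinomial NB)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, doc):
    """Return the class with the highest log posterior (Laplace-smoothed)."""
    class_counts, word_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for token in tokenize(doc):
            if token not in vocab:
                continue  # words never seen in training are ignored
            score += math.log((word_counts[label][token] + alpha)
                              / (total + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data with two of the emotion classes named above.
docs = ["so happy today", "happy and joyful", "so sad and lonely", "sad news today"]
labels = ["Happy", "Happy", "Sad", "Sad"]
model = train_nb(docs, labels)
print(predict_nb(model, "lonely and sad"))  # Sad
```

The "naive" assumption is that words are conditionally independent given the class, so a document's likelihood is just a product (here, a sum of logs) of per-word probabilities. Violations of this assumption in real text are a common explanation for Naïve Bayes trailing discriminative models such as logistic regression.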


PREDICTING A FOOTBALL PLAYER'S POSITION USING MACHINE LEARNING ALGORITHMS

Zbornik radova Fakulteta tehničkih nauka u Novom Sadu, 2021, Vol 36 (07), pp. 1267-1270. DOI: 10.24867/13be31skiljevic

Author(s): Aleksandar Kovačević, Dragan Škiljević

Keyword(s): Logistic Regression, Random Forest, Naive Bayes, Nearest Neighbors, K Nearest Neighbors

Football is a team sport played between two teams of eleven players each. Although players play in a predetermined position, they can easily switch to another one. In this paper, the best position for a player is predicted from his physical and mental attributes. The main motivation of this work is to make the job of professional football experts easier. The solution would greatly ease the work of coaches whose clubs face many injuries and therefore need to change the team's formation frequently. It would also help make the most of each player's potential. To determine the position a given player should play, this paper uses a dataset with 65 attributes per player. From these attributes, the position is predicted by training the following models: Multinomial Logistic Regression, K-Nearest Neighbors, Random Forest, Gaussian Naive Bayes and Support Vector Machine.
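One of the listed models is K-Nearest Neighbors. A minimal Python sketch of k-NN position prediction follows. The two attributes, their values and the position codes (FW, MF, DF) are hypothetical stand-ins for the study's 65 attributes. Because k-NN is distance-based, attributes on different scales should normally be normalised first; that step is omitted here.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict a label by majority vote among the k nearest training rows.

    Distance is Euclidean (math.dist). Ties in the vote go to the label that
    appears first among the sorted neighbours, i.e. the one with the nearest member.
    """
    neighbours = sorted(zip((math.dist(row, x) for row in train_X), train_y))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical players described by (finishing, tackling) on a 0-100 scale.
X = [[90, 30], [88, 35], [40, 85], [35, 90], [60, 60]]
y = ["FW", "FW", "DF", "DF", "MF"]
print(knn_predict(X, y, [85, 32]))  # FW
print(knn_predict(X, y, [38, 88]))  # DF
```

k-NN has no training phase beyond storing the players, which makes it easy to re-run when a squad changes. The cost is that every prediction scans all stored players.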
