COMPARISON OF MACHINE LEARNING CLASSIFICATION ALGORITHM ON HOTEL REVIEW SENTIMENT ANALYSIS (CASE STUDY: LUMINOR HOTEL PECENONGAN)

Sentiment analysis of hotel reviews is a useful benchmark or reference for making hotel business decisions. However, the review information must first be processed by an algorithm. The purpose of this study is to compare machine learning classification algorithms to determine which gives better accuracy in the analysis of hotel reviews. The algorithms used are k-NN (k-Nearest Neighbor) and NB (Naive Bayes). The calculations give the following accuracy levels: k-NN achieves 60.50% with an AUC value of 0.632, and NB achieves 85.25% with an AUC value of 0.658. These results indicate that NB is the more suitable algorithm for helping business people make accurate decisions in the analysis of hotel reviews.
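
The two figures reported for each classifier, accuracy and AUC, can be computed as in the following minimal sketch. It is not the study's own procedure: the labels, scores, 0.5 threshold, and function names are hypothetical, chosen only to show how each metric is defined.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]  # made-up classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred))
print(auc(y_true, scores))
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank the reviews, which is why the two metrics can disagree about how far apart two classifiers are.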

Assessing the Relation between Mud Components and Rheology for Loss Circulation Prevention Using Polymeric Gels: A Machine Learning Approach

Energies ◽

10.3390/en14051377 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1377

Author(s):

Musaab I. Magzoub ◽

Raj Kiran ◽

Saeed Salehi ◽

Ibnelwaleed A. Hussein ◽

Mustafa S. Nasser

Keyword(s):

Machine Learning ◽

Rheological Properties ◽

Nearest Neighbor ◽

Drilling Fluid ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Wide Range ◽

Machine Learning Approach ◽

Drilling Operations

The traditional way to mitigate loss circulation in drilling operations is to use preventative and curative materials. However, it is difficult to quantify the amount of materials from every possible combination to produce customized rheological properties. In this study, machine learning (ML) is used to develop a framework to identify material composition for loss circulation applications based on the desired rheological characteristics. The relation between the rheological properties and the mud components for polyacrylamide/polyethyleneimine (PAM/PEI)-based mud is assessed experimentally. Four different ML algorithms were implemented to model the rheological data for various mud components at different concentrations and testing conditions. These four algorithms include (a) k-Nearest Neighbor, (b) Random Forest, (c) Gradient Boosting, and (d) AdaBoosting. The Gradient Boosting model showed the highest accuracy (91 and 74% for plastic and apparent viscosity, respectively), which can be further used for hydraulic calculations. Overall, the experimental study presented in this paper, together with the proposed ML-based framework, adds valuable information to the design of PAM/PEI-based mud. The ML models allowed a wide range of rheology assessments for various drilling fluid formulations with a mean accuracy of up to 91%. The case study has shown that with the appropriate combination of materials, reasonable rheological properties could be achieved to prevent loss circulation by managing the equivalent circulating density (ECD).

Diabetes Prediction Using Machine Learning Techniques

Journal of Intelligent Systems with Applications ◽

10.54856/10.54856/jiswa.202112183 ◽

2021 ◽

pp. 150-152

Author(s):

Seyma Kiziltas Koc ◽

Mustafa Yeniad

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

High Performance ◽

Nearest Neighbor ◽

Classification Performance ◽

Machine Learning Techniques ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Machine Learning Classification

Technologies used in the healthcare industry are changing rapidly as technology constantly evolves to improve people's lifestyles. For instance, different technological devices are used for the diagnosis and treatment of diseases. With developing technology, it has been shown that computer systems can diagnose disease. Machine learning algorithms are frequently used tools because of their high performance in the field of health, as in many other fields. The aim of this study is to investigate different machine learning classification algorithms that can be used in the diagnosis of diabetes and to make comparative analyses according to the metrics in the literature. In the study, seven classification algorithms from the literature were used: Logistic Regression, K-Nearest Neighbor, Multilayer Perceptron, Random Forest, Decision Trees, Support Vector Machine, and Naive Bayes. First, the classification performance of the algorithms is compared. These comparisons are based on accuracy, sensitivity, precision, and F1-score. The results showed that the support vector machine algorithm had the highest accuracy, at 78.65%.

Performance Evaluation of Different Machine Learning Classification Algorithms for Disease Diagnosis

International Journal of E-Health and Medical Communications ◽

10.4018/ijehmc.20211101.oa5 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-28

Author(s):

Munder Abdulatef Al-Hashem ◽

Ali Mohammad Alqudah ◽

Qasem Qananwah

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Performance Metrics ◽

Confusion Matrix ◽

Learning Algorithms ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Machine Learning Classification

Knowledge extraction in the healthcare field is a very challenging task because of problems such as noise and imbalanced datasets. These datasets are obtained from clinical studies where uncertainty and variability are common. Lately, a large number of machine learning algorithms have been considered and evaluated to check their validity for use in the medical field. Usually, the classification algorithms are compared against medical experts who specialize in the diagnosis of certain diseases, and an effective methodological evaluation of classifiers is provided by applying performance metrics. The performance metrics include accuracy, sensitivity, and specificity, derived from the confusion matrix of each algorithm. We have used eight well-known machine learning algorithms to evaluate their performance on six different medical datasets. Based on the experimental results, we conclude that the XGBoost and K-Nearest Neighbor classifiers were the best overall across the datasets used and can be used for diagnosing various diseases.
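
How accuracy, sensitivity, and specificity follow from a binary confusion matrix can be sketched as below. The labels and function names are hypothetical and the data is made up for illustration; none of it comes from the paper's six datasets.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Derive the three metrics from the four confusion-matrix cells."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical labels: 1 = disease present, 0 = disease absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(y_true, y_pred)
result = summary_metrics(*counts)
print(counts, result)
```

On imbalanced medical data, sensitivity and specificity are worth reporting separately because a high accuracy can hide poor detection of the minority class.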

Machine Learning Classification and Feature Extraction of Arrhythmic ECG Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3548.079220 ◽

2020 ◽

Vol 9 (2) ◽

pp. 6-12

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Nearest Neighbor ◽

Extraction Process ◽

Support Vector ◽

Ecg Signal ◽

Data Sets ◽

K Nearest Neighbor ◽

Machine Learning Classification ◽

Artificial Neural Network Ann

An electrocardiogram (ECG) records the electrical activity of the heart over a period of time. Detailed information about the condition of the heart is obtained by analyzing the ECG signal. The wavelet transform and the fast Fourier transform are among the methods used to detect cardiac disease. This paper surveys ECG signal analysis and related studies on arrhythmic and non-arrhythmic data. We discuss an efficient feature extraction process for the electrocardiogram in which, based on position and priority, the six best P-QRS-T fragments are studied. The survey examines the outcome of the system using various machine learning classification algorithms for feature extraction and analysis of ECG signals. Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Artificial Neural Network (ANN) are the most important algorithms used for this purpose. Several publicly available data sets are used for arrhythmia analysis; among them, the MIT-BIH ECG-ID database is the most widely used. The drawbacks and limitations are also discussed, and from these, future challenges and concluding remarks are drawn.

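Penerapan Teknik SMOTE untuk Mengatasi Imbalance Class dalam Klasifikasi Objektivitas Berita Online Menggunakan Algoritma KNN (Application of the SMOTE Technique to Address Class Imbalance in Online News Objectivity Classification Using the KNN Algorithm)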

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v3i2.945 ◽

2019 ◽

Vol 3 (2) ◽

pp. 196-201 ◽

Cited By ~ 2

Author(s):

Anis Nikmatul Kasanah ◽

Muladi Muladi ◽

Utomo Pujianto

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Sampling Technique ◽

Online News ◽

Classification Algorithm ◽

Average Increase ◽

K Nearest Neighbor ◽

Amount Of Information ◽

Special System ◽

Average Decrease

The amount of information available as online news needs to be balanced by readers' ability to sort or classify news as subjective or objective. A special system is therefore needed for online news objectivity classification, to help readers distinguish subjective from objective news. This research proposes the development of machine learning techniques to sort news by objectivity automatically, based on the content of the news. The proposed algorithm is K-Nearest Neighbor (KNN). The news samples, obtained from kompas.com by scraping, exhibit class imbalance: the numbers of objective and subjective news items are not balanced, which can affect the performance of the classification algorithm. One technique to overcome class imbalance is the Synthetic Minority Over-sampling Technique (SMOTE). SMOTE generates minority-class data until it matches the amount of majority-class data. This study compares the performance of the KNN algorithm without SMOTE and with SMOTE. Using neighbor values k = 1, 3, 5, 7, and 9, the study found that applying SMOTE improved the accuracy of the KNN algorithm at k = 1 and k = 3, with an average increase of 3.36. At k = 5, 7, and 9, the algorithm showed an average decrease in accuracy of 6.67.
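
The core idea of SMOTE described above, generating synthetic minority samples until the classes are balanced, can be sketched as follows. This is a simplified illustration, not the study's implementation: the two-feature points, the majority size of 7, the function name `smote`, and the choice of `k=2` are all assumptions.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical feature vectors for the minority class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
majority_size = 7  # assumed size of the majority class

new_points = smote(minority, n_new=majority_size - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Each synthetic point lies on a line segment between two real minority samples, so the oversampled class fills in its own region of feature space instead of duplicating existing points. This changes which neighbors KNN sees, which may be why the effect reported above differs across values of k.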

Identification of People with Diabetes Treatment through Lipids Profile Using Machine Learning Algorithms

Healthcare ◽

10.3390/healthcare9040422 ◽

2021 ◽

Vol 9 (4) ◽

pp. 422

Author(s):

Vanessa Alcalá-Rmz ◽

Carlos E. Galván-Tejada ◽

Alejandra García-Hernández ◽

Adan Valladares-Salgado ◽

Miguel Cruz ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Diabetes Treatment ◽

K Nearest Neighbor ◽

The World ◽

Auc Value ◽

Lipids Profile

Diabetes incidence is a growing problem: according to the World Health Organization and the International Diabetes Federation, the number of people with this disease is increasing very fast all over the world. Diabetes treatment is important to prevent the development of several complications, and lipid profile monitoring is also important. For that reason, the aim of this work is to implement machine learning algorithms that are able to classify, based on lipid profile levels, cases, which correspond to patients diagnosed with diabetes who receive diabetes treatment, and controls, which refer to subjects who do not receive diabetes treatment, although some of them have diabetes. Logistic regression, K-nearest neighbor, decision trees, and random forest were implemented, and all of them were evaluated with accuracy, sensitivity, specificity, and AUC-ROC curve metrics. The artificial neural network obtained an accuracy of 0.685 and an AUC value of 0.750; logistic regression achieved an accuracy of 0.729 and an AUC value of 0.795; K-nearest neighbor obtained an accuracy of 0.669 and an AUC value of 0.709; the decision tree reached an accuracy of 0.691 and an AUC value of 0.683; and random forest achieved an accuracy of 0.704 and an AUC value of 0.776. The performance of all models was statistically significant, but the best-performing model for this problem was logistic regression.

Weight-Constrained Neural Networks in Forecasting Tourist Volumes: A Case Study

Electronics ◽

10.3390/electronics8091005 ◽

2019 ◽

Vol 8 (9) ◽

pp. 1005 ◽

Cited By ~ 6

Author(s):

Ioannis E. Livieris ◽

Emmanuel Pintelas ◽

Theodore Kotsilieris ◽

Stavros Stavroyiannis ◽

Panagiotis Pintelas

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Support Vector ◽

Tourist Industry ◽

Box Constraints ◽

K Nearest Neighbor ◽

Tourism Resources

Tourism forecasting is a significant tool in the tourist industry, supporting careful planning and management of tourism resources. Although accurate tourist volume prediction is a very challenging task, reliable and precise predictions offer the opportunity of gaining major profits. Thus, the development and implementation of more sophisticated and advanced machine learning algorithms can be beneficial for the tourism forecasting industry. In this work, we explore the prediction performance of Weight Constrained Neural Networks (WCNNs) for forecasting tourist arrivals in Greece. WCNNs constitute a new machine learning prediction model that is characterized by the application of box-constraints on the weights of the network. Our experimental results indicate that WCNNs outperform classical neural networks and the state-of-the-art regression models: support vector regression, k-nearest neighbor regression, radial basis function neural network, M5 decision tree and Gaussian processes.
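
To illustrate what a box constraint on weights means, the sketch below fits a single linear unit by gradient descent and clips each parameter back into a fixed interval after every update. This projected-gradient scheme is an assumption chosen for simplicity, not the training algorithm used for WCNNs in the paper; the data, bounds, learning rate, and function names are hypothetical.

```python
def clip(value, low, high):
    """Project a scalar onto the interval [low, high]."""
    return max(low, min(high, value))


def train_box_constrained(xs, ys, low=-1.0, high=1.0, lr=0.05, epochs=200):
    """Fit y ~ w*x + b by gradient descent, keeping w and b inside [low, high]."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Gradient step followed by projection back onto the box.
        w = clip(w - lr * grad_w, low, high)
        b = clip(b - lr * grad_b, low, high)
    return w, b


# Made-up data generated from y = 2x + 0.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.5, 2.5, 4.5, 6.5]

w, b = train_box_constrained(xs, ys)
print(w, b)
```

The unconstrained least-squares fit for this data would use w = 2, which lies outside the box, so the constraint is active and the learned weight stays at the upper bound. Bounding weights in this way limits how strongly any single input can drive the output.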

Identification of Cardio Diseases in Modern Healthcare using Machine Learning Classification

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i2.2020 ◽

2021 ◽

Vol 12 (2) ◽

pp. 2356-2365

Author(s):

P. Priya et al.

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Good Accuracy ◽

Heart Diseases ◽

Support Vector ◽

Disease Prediction ◽

K Nearest Neighbor ◽

Coronary Heart Diseases ◽

Machine Learning Classification ◽

Clinical Statistics

The health sector produces a massive quantity of data. This data is not always used to its full extent and is frequently underutilized. Using this large quantity of data, a disease can be detected, predicted, or even cured. Diseases such as heart disease, cancer, tumours, and Alzheimer's disease pose a major threat to humankind. Using machine learning techniques, heart disease can be predicted. Clinical data, including blood pressure, hypertension, diabetes, the number of cigarettes smoked each day, and so forth, are used as input, and these features are modeled for prediction. This model can then be used to predict future clinical data. Algorithms such as Decision Tree, k-Nearest Neighbor, and Support Vector Machine are used. The accuracy of the model using each of the algorithms is calculated. The one with good accuracy is then taken as the model for predicting coronary heart disease.

COMPARISON OF MACHINE LEARNING METHODS IN CLASSIFYING POVERTY IN INDONESIA IN 2018

Jurnal Teknik Informatika (Jutif) ◽

10.20884/1.jutif.2021.2.1.52 ◽

2021 ◽

Vol 2 (1) ◽

pp. 51-56

Author(s):

Pardomuan Robinson Sihombing ◽

Ade Marsinta Arsani

Keyword(s):

Machine Learning ◽

Sampling Method ◽

Nearest Neighbor ◽

Choice Model ◽

Imbalanced Data ◽

K Nearest Neighbor ◽

Learning Methods ◽

Rotation Forest ◽

Machine Learning Classification ◽

Machine Learning Methods

Poverty is still one of the main problems in economic development, besides inequality, unemployment, and economic growth. This study aims to model poverty directly using a discrete choice model, namely the machine learning classification method. The data used are imbalanced, with one of the categories being quite small, so the "both" resampling method is used. In this study, several machine learning methods were applied, including Decision Tree, Naïve Bayes, K-Nearest Neighbor (KNN), and Rotation Forest. The results show that resampling with the "both" method provides optimal results for the four machine learning methods. Judged by accuracy, specificity, sensitivity, AUC, and the highest Kappa coefficient, the best method is KNN. The KNN model has an accuracy of 0.73, a sensitivity of 0.68, a specificity of 0.78, and an AUC of 0.73.
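
The Kappa coefficient mentioned above measures agreement between predictions and true labels after correcting for agreement expected by chance. A minimal sketch for the binary case follows; the confusion-matrix counts and the function name are hypothetical and are not the study's results.

```python
def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa for a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n  # observed agreement (same as accuracy)
    # Chance agreement, from the row and column totals of the matrix.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up counts for illustration only.
kappa = cohen_kappa(tp=40, fp=10, tn=30, fn=20)
print(kappa)
```

A kappa of 0 means the classifier agrees with the labels no more than chance would, and 1 means perfect agreement, which makes it a useful complement to accuracy when classes are imbalanced.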

Breast Cancer Prediction Using Machine Learning

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206457 ◽

2020 ◽

pp. 278-284

Author(s):

Gaurav Singh

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Support Vector ◽

Breast Cancer Dataset ◽

K Nearest Neighbor ◽

Cancer Dataset ◽

Implementation Phase ◽

Machine Learning Classification

Breast cancer is a prevalent cause of death, and it is the only type of cancer that is widespread among women worldwide. The prime objective of this paper is to create a model for predicting breast cancer using various machine learning classification algorithms: k Nearest Neighbor (kNN), Support Vector Machine (SVM), Logistic Regression (LR), and Gaussian Naive Bayes (NB). It further assesses and compares the performance of the classifiers in terms of accuracy, precision, recall, F1-score, and Jaccard index. The breast cancer dataset is publicly available in the UCI Machine Learning Repository. In the implementation phase, the dataset is partitioned into 80% for training and 20% for testing, and the machine learning algorithms are then applied. k Nearest Neighbors achieved a significant performance with respect to all parameters.
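
The 80/20 partition followed by k-nearest-neighbor classification can be sketched as below. This is an illustration under stated assumptions, not the paper's pipeline: the synthetic two-feature data stands in for the UCI dataset, and the function names, `k=3`, and the random seeds are hypothetical.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split off the last test_fraction of them."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = len(rows) - round(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]


def knn_predict(train, point, k=3):
    """Predict by majority vote among the k training rows nearest to point."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic stand-in data: label 0 clustered near (0, 0), label 1 near (5, 5).
rng = random.Random(1)
data = [((rng.gauss(0, 0.5), rng.gauss(0, 0.5)), 0) for _ in range(25)]
data += [((rng.gauss(5, 0.5), rng.gauss(5, 0.5)), 1) for _ in range(25)]

train, test = train_test_split(data)
correct = sum(
    1 for features, label in test if knn_predict(train, features) == label
)
test_accuracy = correct / len(test)
print(len(train), len(test), test_accuracy)
```

Holding out the test rows before any prediction is made keeps the accuracy estimate honest, since kNN would otherwise find each test point among its own neighbors.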
