Booking Prediction Models for Peer-to-peer Accommodation Listings using Logistics Regression, Decision Tree, K-Nearest Neighbor, and Random Forest Classifiers

Author(s):  
Mochammad Agus Afrianto ◽  
Meditya Wasesa

Background: The literature on peer-to-peer accommodation has focused substantially on the price determinants of accommodation listings. Developing prediction models for the demand for accommodation listings is vital in revenue management because accurate price and demand forecasts help determine the best revenue management responses. Objective: This study aims to develop prediction models to determine the booking likelihood of accommodation listings. Methods: Using an Airbnb dataset, we developed four machine learning models, namely Logistic Regression, Decision Tree, K-Nearest Neighbor (KNN), and Random Forest classifiers. We assessed the models on the AUC-ROC score and the model development time, using the ten-fold three-way split and the ten-fold cross-validation procedures. Results: In terms of average AUC-ROC score, the Random Forest classifier outperformed the other evaluated models. In the three-way split procedure, it had a 15.03% higher AUC-ROC score than Decision Tree, 2.93% higher than KNN, and 2.38% higher than Logistic Regression. In the cross-validation procedure, it had a 26.99% higher AUC-ROC score than Decision Tree, 4.41% higher than KNN, and 3.31% higher than Logistic Regression. The Decision Tree model had the lowest AUC-ROC score but also the shortest model development time. Conclusion: Random Forest models performed best in predicting the booking likelihood of accommodation listings. The model can be used by peer-to-peer accommodation owners to improve their revenue management responses.
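The evaluation described in this abstract (AUC-ROC scored over ten folds) can be illustrated with a minimal sketch. It is not the authors' code: the function names `auc_roc` and `k_fold_indices` are hypothetical, the folds are contiguous rather than shuffled, and the AUC is computed by pairwise comparison of scores, which is only practical for small samples.

```python
from typing import List, Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds and return one
    (train_indices, test_indices) pair per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


if __name__ == "__main__":
    # Three of the four positive/negative pairs are ranked correctly.
    print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(len(k_fold_indices(25, k=10)))  # 10
```

In a full comparison, each classifier would be fitted on the training indices of every fold, scored on the test indices with `auc_roc`, and the ten scores averaged per model.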

2021 ◽  
Author(s):  
Zahid Ullah ◽  
Farrukh Saleem ◽  
Mona Jamjoom

BACKGROUND The use of artificial intelligence (AI) has revolutionized every area of life such as business and trade, social and electronic media, education and learning, manufacturing industries, medicine and sciences, and every other sector. The new reforms and advanced technologies of AI have enabled data analysts to transmute raw data generated by these sectors into meaningful insights for an effective decision-making process. Health care is one of the integral sectors where a large amount of data is generated daily, and making effective decisions based on this data is therefore a challenge. This study investigates cases related to childbirth, either by the traditional method of vaginal delivery or by cesarean delivery. Cesarean delivery is performed to save the lives of both mother and fetus when complications related to vaginal birth arise. OBJECTIVE To develop reliable prediction models for a maternity care decision support system to predict the mode of delivery before birth. METHODS This study was conducted in two parts to identify the mode of delivery: first, the existing dataset was enriched; second, previous medical records about the mode of delivery were investigated using machine learning algorithms to extract meaningful insight into unseen cases. To achieve this objective, several prediction models were trained on the original and enriched datasets, namely Decision Tree (DT), Random Forest (RF), AdaBoostM1 (AB), Bagging, and k-Nearest Neighbor (k-NN). RESULTS The prediction models based on enriched data performed well in terms of accuracy, sensitivity, specificity, F-measure, and ROC. Specifically, k-NN performed best with an accuracy of 84.38%, followed by Bagging (83.75%), RF (83.13%), DT (81.25%), and AB (80.63%). 
CONCLUSIONS Enriching the dataset improves the accuracy of the prediction process, which supports maternity care practitioners in making decisions for critical cases. The enriched dataset in its current stage yields better results, but these could improve further if more records with real clinical data were added.


Sebatik ◽  
2020 ◽  
Vol 24 (2) ◽  
Author(s):  
Anifuddin Azis

Indonesia is the country with the second-highest biodiversity in the world after Brazil. Indonesia has about 25,000 plant species and 400,000 species of animals and fish. An estimated 8,500 fish species live in Indonesian waters, or 45% of the world's species, of which about 7,000 are marine fish. Determining the number of these species requires expertise in taxonomy. In practice, identifying a fish species is not easy because it requires specific methods and equipment as well as taxonomic literature. Automated processing of video or image data from aquatic ecosystems has begun to be developed. In this development, detecting and identifying fish species is more challenging than detecting and identifying other objects. Deep learning methods, which have been successful at classifying objects in images, can analyze data directly without a separate feature extraction step. Such a system has parameters, or weights, that serve both as feature extractors and as classifiers. The processed data yields an output that is expected to be as close as possible to the true output. A CNN is a deep learning architecture that can reduce the dimensionality of data without discarding its features. This study develops a hybrid model that uses a CNN (Convolutional Neural Network) to extract features and several classification algorithms to identify fish species. The classification algorithms used in this study are Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbor (KNN), Random Forest, and Backpropagation.


2017 ◽  
Vol 55 (4) ◽  
pp. 681-700 ◽  
Author(s):  
Sangjae Lee ◽  
Joon Yeon Choeh

Purpose The purpose of this paper is to identify important determinants of review helpfulness from product data, review characteristics, and textual characteristics, and to determine the more crucial factors among them using statistical methods. The study also proposes a classification-based review recommender using a decision tree (CRDT) to identify and recommend reviews that have a high level of helpfulness. Design/methodology/approach This study used publicly available data from Amazon.com to construct measures of determinants and helpfulness. The authors collected data about economic transactions on Amazon.com and analyzed the associated review system. The final sample included 10,000 reviews, of which 4,799 were helpful and 5,201 were not. Findings The study selected the more crucial determinants from a comprehensive group of product, reviewer, and textual characteristics using a t-test and logistic regression. The five variables found to be significant in both the t-test and the logistic regression analysis were the total number of reviews for the product, the reviewer’s history macro, the reviewer’s rank, the disclosure of the reviewer’s name, and the length of the review in words. The decision tree method produced decision rules for determining helpfulness from the values of the product data, review characteristics, and textual characteristics. CRDT had a lower prediction error than the k-nearest neighbor (kNN) method and linear multivariate discriminant analysis. CRDT can suggest better determinants that have a greater effect on the degree of helpfulness. Practical implications The factors identified as affecting review helpfulness should be considered in the design of websites, as online retail sites with more helpful reviews can provide greater potential value to customers. 
The results suggest that managers and marketers should better understand customers’ reviews and increase the value to customers by providing enhanced diagnosticity to consumers. Originality/value This study differs from previous studies in that it investigated the determinants holistically, covering product, review, and textual characteristics for classifying helpful reviews, and selected the more crucial determinants from this comprehensive group using a t-test and logistic regression. This study used a decision tree, which has rarely been used in predicting review helpfulness, to provide rules for identifying helpful online reviews.


2021 ◽  
Author(s):  
Hemalatha N ◽  
Akhil Wilson ◽  
Akhil Thankachan

Plastic pollution is one of the most challenging environmental problems, yet life without plastic is hard to imagine. This paper deals with the prediction of plastic-degrading microbes using machine learning. We used Decision Tree, Random Forest, Support Vector Machine, and K-Nearest Neighbor algorithms to predict plastic-degrading microbes. Among the four classifiers, the Random Forest model gave the best accuracy, 99.1%.


MATICS ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 21-27
Author(s):  
Via Ardianto Nugroho ◽  
Derry Pramono Adi ◽  
Achmad Teguh Wibowo ◽  
MY Teguh Sulistyono ◽  
Agustinus Bimo Gumelar

In the container service industry, Terminal Nilam is a customer of PT. BIMA, which specializes in heavy equipment repair and maintenance services. The terminal is a hub for domestic container loading and unloading and has four container cranes serving two ships. The maintenance process for heavy equipment such as the container cranes in operation appears to pay little attention to grouping or classifying the type of maintenance the equipment needs. Over time, the equipment may perform below its best, and this can even lead to workplace accidents. Neglected container crane maintenance can also inflate follow-up maintenance costs. Loading and unloading production targets may fall, and delays to ship berthing schedules become very likely. Machine Learning (ML) can readily eliminate these possibilities. In this study, we designed ML to work by identifying and then grouping the appropriate type of container crane maintenance, namely light or heavy. The ML methods chosen for this study are Random Forest, Support Vector Machine, k-Nearest Neighbor, Naïve Bayes, Logistic Regression, J48, and Decision Tree. The study shows that tree-based ML models succeed in learning container crane maintenance data (numerical and categorical), with J48 performing best at an accuracy and ROC-AUC of up to 99.1%. Our classification takes into account the date of last maintenance, hour meter, breakdown, shutdown, and spare parts.


2019 ◽  
Vol 24 (3) ◽  
pp. 161-170
Author(s):  
Ardea Bagas Wibisono ◽  
Achmad Fahrurozi

Coronary Heart Disease (CHD) is the leading cause of death across all ages after stroke. This has prompted much research on coronary heart disease, including work using computer-based methods. Large volumes of data can be processed through classification with particular algorithms so that results are fast and accurate. Commonly used classification methods include Naïve Bayes, K-Nearest Neighbor, Decision Tree, and Random Forest. Naïve Bayes uses the probabilities of each data item, K-Nearest Neighbor uses distance calculations, Decision Tree uses a decision tree, and Random Forest combines several decision trees. This study aims to compare these four algorithms in classifying coronary heart disease data. The algorithms are compared on performance measures consisting of accuracy, per-class recall, and per-class precision. Each algorithm is tested using cross-validation. Based on a comparison over 300 coronary heart disease data records, the Random Forest algorithm performs better than Naïve Bayes, K-Nearest Neighbor, and Decision Tree at classifying coronary heart disease. Classification with Random Forest achieved a mean accuracy of 85.668%, with recall of 89% for class '1' and 83.6% for class '0', and precision of 85% for class '1' and 85.8% for class '0'.
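The performance measures named in this abstract (accuracy, recall for each class, and precision for each class) can be computed from true and predicted labels as in the sketch below. The function name `per_class_metrics` and the toy labels are illustrative assumptions, not material from the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def per_class_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[float, Dict[Hashable, Dict[str, float]]]:
    """Return overall accuracy and, for every class, its recall and precision."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Count how often each (true label, predicted label) pair occurs.
    pairs = Counter(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)
    report = {}
    for c in classes:
        true_pos = pairs[(c, c)]
        actual = sum(pairs[(c, p)] for p in classes)     # samples whose true class is c
        predicted = sum(pairs[(t, c)] for t in classes)  # samples predicted as c
        report[c] = {
            "recall": true_pos / actual if actual else 0.0,
            "precision": true_pos / predicted if predicted else 0.0,
        }
    return accuracy, report


if __name__ == "__main__":
    # Toy labels: 1 = disease present, 0 = disease absent.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    guess = [1, 1, 0, 0, 0, 1, 0, 1]
    acc, rep = per_class_metrics(truth, guess)
    print(acc)     # 0.75
    print(rep[1])  # {'recall': 0.75, 'precision': 0.75}
```

Under cross-validation, these values would be computed on each test fold and then averaged, which is one way a mean accuracy figure can arise.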


Author(s):  
Fairoz Q. Kareem ◽  
Adnan Mohsin Abdulazeez ◽  
Dathar A. Hasan

Weather forecasting is the process of predicting the state of the atmosphere for certain regions or locations using recent technology. Thousands of years ago, some civilizations tried to foretell the weather by studying the stars and astronomy. Knowing the weather conditions has a direct impact on many fields, such as commerce, agriculture, and aviation. With recent developments in technology, especially in data mining (DM) and machine learning techniques, many researchers have proposed weather forecasting systems based on data mining classification techniques. In this paper, we used neural networks, Naïve Bayes, random forest, and K-nearest neighbor algorithms to build weather forecasting models. These models classify unseen data instances into multiple classes: rain, fog, partly cloudy day, clear day, and cloudy. The model for each algorithm was trained and tested using synoptic data from the Kaggle website. The dataset contains 1,796 instances and 8 attributes. Compared with the other algorithms, the random forest algorithm achieved the best accuracy, 89%. These results indicate that data mining classification algorithms can provide effective tools for weather forecasting.


10.2196/28856 ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. e28856
Author(s):  
Zahid Ullah ◽  
Farrukh Saleem ◽  
Mona Jamjoom ◽  
Bahjat Fakieh

Background The use of artificial intelligence has revolutionized every area of life such as business and trade, social and electronic media, education and learning, manufacturing industries, medicine and sciences, and every other sector. The new reforms and advanced technologies of artificial intelligence have enabled data analysts to transmute raw data generated by these sectors into meaningful insights for an effective decision-making process. Health care is one of the integral sectors where a large amount of data is generated daily, and making effective decisions based on these data is therefore a challenge. In this study, cases related to childbirth either by the traditional method of vaginal delivery or cesarean delivery were investigated. Cesarean delivery is performed to save both the mother and the fetus when complications related to vaginal birth arise. Objective The aim of this study was to develop reliable prediction models for a maternity care decision support system to predict the mode of delivery before childbirth. Methods This study was conducted in 2 parts for identifying the mode of childbirth: first, the existing data set was enriched and second, previous medical records about the mode of delivery were investigated using machine learning algorithms and by extracting meaningful insights from unseen cases. Several prediction models were trained to achieve this objective, such as decision tree, random forest, AdaBoostM1, bagging, and k-nearest neighbor, based on original and enriched data sets. Results The prediction models based on enriched data performed well in terms of accuracy, sensitivity, specificity, F-measure, and receiver operating characteristic curves in the outcomes. Specifically, the accuracy of k-nearest neighbor was 84.38%, that of bagging was 83.75%, that of random forest was 83.13%, that of decision tree was 81.25%, and that of AdaBoostM1 was 80.63%. 
Enrichment of the data set had a good impact on improving the accuracy of the prediction process, which supports maternity care practitioners in making decisions in critical cases. Conclusions Our study shows that enriching the data set improves the accuracy of the prediction process, thereby supporting maternity care practitioners in making informed decisions in critical cases. The enriched data set used in this study yields good results, but this data set can become even better if the records are increased with real clinical data.


2021 ◽  
Author(s):  
Rekha G ◽  
Shanthini B ◽  
Ranjith Kumar V

Heart diseases, or Cardiovascular Diseases (CVDs), have been the main cause of death worldwide in recent years and have become the most dangerous disease in India and the entire world. Data from the UCI repository is used to evaluate the accuracy of machine learning algorithms for predicting heart disease, such as k-nearest neighbor, decision tree, linear regression, and support vector machine. Various indicators, such as chest pain and fast heartbeat, are considered. Applying deep learning techniques requires large datasets, which are not available in medical and clinical research, so surrogate data is generated from the Cleveland dataset. The predicted results show an improvement in classification accuracy. Heart disease is one of the most challenging diseases to diagnose and is the most recognized killer in the present day. This paper uses AI algorithms to predict heart disease, applying various machine learning algorithms such as Support Vector Machine, Random Forest, KNN, Naive Bayes, Decision Tree, and LR.


Author(s):  
Nana Suryana ◽  
Pratiwi Pratiwi ◽  
Rizki Tri Prasetio

The telecommunications industry faces stiff competition among service providers. This competition results in customer churn, that is, customers moving from one service to another. Customer churn is a major problem because it can affect a company's revenue, profitability, service quality, and survival. Identifying early which customers are likely to churn is therefore an effective measure, because it helps companies plan how to retain their customers. Companies usually hold records for only a small number of customers who have left their current service. This lack of data makes customer churn difficult to predict and poses a challenging problem in machine learning. The general purpose of this research is to predict customers who will switch to another service or leave their current one. The specific purpose is to handle data imbalance in customer churn prediction using data-level optimization through a sampling method, namely the Synthetic Minority Over-sampling Technique (SMOTE), combined with algorithm-level optimization through the Boosting technique. In this study, several prediction algorithms, namely random forest, naïve Bayes, decision tree, k-nearest neighbor, and deep learning, are implemented to find the best algorithm after optimization with SMOTE and Boosting. The research method used is CRISP-DM, a data mining research framework for cross-industry research. The results indicate that random forest produces the best accuracy after optimization with SMOTE and Boosting, at 89.19%.
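The data-level step mentioned in this abstract, SMOTE, creates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a simplified sketch of that idea under stated assumptions: the function name `smote_oversample`, the default `k=3`, the fixed seed, and the toy points are all illustrative, and the code is not the authors' pipeline or a library implementation.

```python
import random
from typing import List, Sequence


def smote_oversample(
    minority: Sequence[Sequence[float]], n_new: int, k: int = 3, seed: int = 0
) -> List[List[float]]:
    """Generate n_new synthetic points, each lying on the line segment between
    a randomly chosen minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # Rank the other minority points by squared Euclidean distance to the base point.
        others = [p for i, p in enumerate(minority) if i != base_index]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical churned-customer feature vectors (already scaled to [0, 1]).
    churners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote_oversample(churners, n_new=3):
        print(point)
```

Synthetic points are added only to the training portion of the data; the boosted classifier would then be fitted on the rebalanced training set and evaluated on untouched test data.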

