Using Decision Tree to Predict Response Rates of Consumer Satisfaction, Attitude, and Loyalty Surveys

2019 ◽  
Vol 11 (8) ◽  
pp. 2306 ◽  
Author(s):  
Jian Han ◽  
Miaodan Fang ◽  
Shenglu Ye ◽  
Chuansheng Chen ◽  
Qun Wan ◽  
...  

Response rate has long been a major concern in survey research, which is commonly used in fields such as marketing, psychology, sociology, and public policy. Based on 244 published survey studies on consumer satisfaction, loyalty, and trust, this study aimed to identify predictors of response rates. Results showed that response rates were associated with the mode of data collection (face-to-face > mail/telephone > online), type of survey sponsor (government agencies > universities/research institutions > commercial entities), confidentiality (confidential > non-confidential), direct invitation (yes > no), and cultural orientation (individualism > collectivism). A decision tree regression analysis (using the classification and regression tree (C&RT) algorithm, with 80% of the studies as the training set and 20% as the test set) revealed that a model with all of the above factors attained a linear correlation coefficient of 0.578 between predicted and actual values, higher than that of the traditional linear regression model (0.423). A decision tree analysis (using the C5.0 algorithm, again with 80% of the studies for training and 20% for testing) revealed that a model with all of the above factors attained an overall accuracy of 78.26% in predicting whether a survey had a high (>50%) or low (<50%) response rate. Direct invitation was the most important factor in all three models and had a consistent trend in predicting response rate.
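The regression side of this comparison can be illustrated with a single C&RT-style split. The sketch below is a minimal illustration, not the authors' pipeline: it searches one numeric predictor for the threshold that minimizes the summed squared error of the two child nodes (the regression splitting criterion C&RT uses), then scores test predictions with the Pearson correlation between predicted and actual values, the metric quoted above. The toy data and the function names `best_split` and `pearson_r` are hypothetical.

```python
from math import sqrt
from statistics import mean


def sse(values):
    """Sum of squared deviations from the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Threshold t minimizing SSE(left) + SSE(right) for the rule x <= t.

    Returns (t, mean of left node, mean of right node); needs at least two distinct x values.
    """
    best = None
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right node empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, mean(left), mean(right))
    return best[1:]


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical toy data: x = 1 if respondents were invited directly, y = response rate (%).
x_train, y_train = [0, 0, 0, 1, 1, 1], [30, 35, 40, 60, 65, 70]
threshold, below, above = best_split(x_train, y_train)

# A one-split tree predicts the training mean of the leaf each test case falls into.
x_test, y_test = [0, 1, 1, 0], [33, 62, 71, 38]
preds = [below if xi <= threshold else above for xi in x_test]
print(threshold, below, above, round(pearson_r(preds, y_test), 3))
```

A full tree would repeat this search over every predictor and recurse into each child node until a stopping rule is met.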

SPE Journal ◽  
2018 ◽  
Vol 23 (04) ◽  
pp. 1075-1089 ◽  
Author(s):  
Jared Schuetter ◽  
Srikanta Mishra ◽  
Ming Zhong ◽  
Randy LaFollette (ret.)

Summary: Considerable amounts of data are being generated during the development and operation of unconventional reservoirs. Statistical methods that can provide data-driven insights into production performance are gaining in popularity. Unfortunately, the application of advanced statistical algorithms remains somewhat of a mystery to petroleum engineers and geoscientists. The objective of this paper is to provide some clarity to this issue, focusing on how to build robust predictive models and how to develop decision rules that help identify factors separating good wells from poor performers. The data for this study come from wells completed in the Wolfcamp Shale Formation in the Permian Basin. Data categories used in the study included well location and assorted metrics capturing various aspects of well architecture, well completion, stimulation, and production. Predictive models for the production metric of interest are built using simple regression and other advanced methods such as random forests (RFs), support-vector regression (SVR), gradient-boosting machine (GBM), and multidimensional Kriging. The data-fitting process involves splitting the data into a training set and a test set, building a regression model on the training set and validating it with the test set. Repeated application of a “cross-validation” procedure yields valuable information regarding the robustness of each regression-modeling approach. Furthermore, decision rules that can identify extreme behavior in production wells (i.e., top x% of the wells vs. bottom x%, as ranked by the production metric) are generated using the classification and regression-tree algorithm. The resulting decision tree (DT) provides useful insights regarding what variables (or combinations of variables) can drive production performance into such extreme categories.
The main contributions of this paper are to provide guidelines on how to build robust predictive models, and to demonstrate the utility of DTs for identifying factors responsible for good vs. poor wells.
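Two data-handling steps described in this summary, keeping only the top and bottom x% of wells as classes for the decision tree and repeatedly splitting the data into training and test sets, can be sketched as follows. This is an illustration under stated assumptions: the function names, the 25% cutoff, the fold count, and the production values are hypothetical, since the summary does not give the cutoffs or fold counts actually used.

```python
import random


def label_extremes(production, frac):
    """Label the top and bottom `frac` of wells by production rank.

    Returns a dict {well index: 'top' | 'bottom'}; wells in the middle are left out.
    """
    k = int(len(production) * frac)
    order = sorted(range(len(production)), key=lambda i: production[i])
    labels = {i: "bottom" for i in order[:k]}
    labels.update({i: "top" for i in order[len(order) - k:]})  # empty slice when k == 0
    return labels


def repeated_kfold(n, k, repeats, seed=0):
    """Yield (train_idx, test_idx): k folds per repeat, reshuffled on every repeat."""
    rng = random.Random(seed)
    idx = list(range(n))
    for _ in range(repeats):
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


production = [120, 450, 80, 300, 610, 95, 270, 510]  # hypothetical production metric
print(label_extremes(production, 0.25))
splits = list(repeated_kfold(len(production), k=4, repeats=3))
print(len(splits))
```

Fitting a model on each training part and scoring it on the matching test part, over several reshuffles, gives a distribution of test errors per method rather than a single number, which is what makes the robustness comparison possible.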


2017 ◽  
Vol 08 (04) ◽  
pp. 1022-1030 ◽  
Author(s):  
Richard Boyce ◽  
Jeremy Jao ◽  
Taylor Miller ◽  
Sandra Kane-Gill

Objective: To show the value of text mining for automatically identifying suspected bleeding adverse drug events (ADEs) in the emergency department (ED).
Methods: A corpus of ED admission notes was manually annotated for bleeding ADEs. The notes came from patients ≥ 65 years of age who had an ICD-9 code for bleeding, a hemoglobin value ≤ 8 g/dL, or a transfusion of > 2 units of packed red blood cells. This training corpus was used to develop bleeding ADE algorithms using Random Forest and Classification and Regression Tree (CART). A completely separate set of notes was annotated and used to test the classification performance of the final models using the area under the ROC curve (AUROC).
Results: The best-performing CART model had an AUROC of 0.882 on the training set and 0.827 on the test set. At a sensitivity of 0.679, the model had a specificity of 0.908 and a positive predictive value (PPV) of 0.814. It had a relatively simple and intuitive structure consisting of 13 decision nodes and 14 leaf nodes. Decision path probabilities ranged from 0.041 to 1.0. The best-performing Random Forest model had an AUROC of 0.917 on the training set and 0.859 on the test set. At a sensitivity of 0.274, it had a specificity of 0.986 and a PPV of 0.92.
Conclusion: Both models accurately identify bleeding ADEs using the presence or absence of certain clinical concepts in ED admission notes for older adult patients. The CART model is particularly noteworthy because it does not require significant technical overhead to implement. Future work should seek to replicate the results on a larger test set drawn from another institution.
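The reported operating points (sensitivity, specificity, PPV) and the AUROC follow from standard definitions. A minimal sketch with hypothetical labels and scores, not the study's data: the AUROC is computed in its rank (Mann-Whitney) form, the probability that a randomly chosen positive note scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and PPV for binary labels (1 = bleeding ADE)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def auroc(y_true, scores):
    """Probability that a positive outscores a negative; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.2, 0.1, 0.4]  # e.g. leaf probabilities from a tree
threshold = 0.5  # moving this trades sensitivity against specificity
y_pred = [1 if s >= threshold else 0 for s in scores]
print(confusion_metrics(y_true, y_pred), auroc(y_true, scores))
```

Because a CART model outputs only as many distinct probabilities as it has leaves, its ROC curve has few operating points; that is one reason the abstract reports sensitivity, specificity, and PPV at a single chosen cutoff for each model.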


2021 ◽  
Vol 12 (2) ◽  
Author(s):  
Mohammad Haekal ◽  
Henki Bayu Seta ◽  
Mayanda Mega Santoni

To predict the water quality of the Ciliwung River, data from online monitoring were processed using a data mining method. In this method, the monitoring data were first arranged in Microsoft Excel tables and then processed into a decision tree using the Decision Tree algorithm in the WEKA application. The decision tree method was chosen because it is simpler, easy to understand, and has a very high level of accuracy. A total of 5,476 Ciliwung River water-quality monitoring records were processed. Classification with the decision tree showed that, of these 5,476 records, 1,059 (19.3242%) indicated that the Ciliwung River was Not Polluted and 4,417 (80.6758%) indicated that it was Polluted. The monitoring data were then evaluated using four test options: Use Training Set, Supplied Test Set, Cross-Validation (10 folds), and Percentage Split (66%). All four test options showed a very high level of accuracy, above 99%. From these results, the Ciliwung River can be predicted to be a polluted river with reference to Government Regulation of the Republic of Indonesia No. 82 of 2001, and the use of WEKA with the Decision Tree algorithm to process monitoring data using three parameters (pH, DO, and nitrate) was found to be very accurate and precise. Keywords: river water quality, data mining, decision tree algorithm, WEKA application.
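The abstract does not name the WEKA learner used. Assuming it was J48, WEKA's standard decision-tree learner and an implementation of C4.5, splits are chosen with entropy-based criteria: information gain, normalized into a gain ratio. The sketch below computes both for one candidate threshold on a single parameter. The dissolved-oxygen readings, labels, and 4.5 mg/L threshold are invented for illustration and are not the study's data or the limits in Regulation No. 82/2001.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_scores(values, labels, threshold):
    """Information gain and gain ratio of the binary split value <= threshold."""
    n = len(labels)
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    parts = [p for p in (left, right) if p]  # ignore an empty side
    remainder = sum(len(p) / n * entropy(p) for p in parts)
    gain = entropy(labels) - remainder
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in parts)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Hypothetical dissolved-oxygen readings (mg/L) and pollution labels.
do_mg_l = [2.1, 3.0, 3.5, 4.2, 5.8, 6.5, 7.1, 6.9]
status = ["polluted", "polluted", "polluted", "polluted",
          "not polluted", "not polluted", "not polluted", "polluted"]
print(split_scores(do_mg_l, status, threshold=4.5))
```

The learner evaluates every candidate threshold on every parameter (here pH, DO, and nitrate) and keeps the best-scoring split before recursing into each branch.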


2019 ◽  
Vol 2 (2) ◽  
pp. 92-98
Author(s):  
Hespri Yomeldi ◽  
Moh Roufiq Azmy ◽  
Ryche Pranita

Ship health checks must be carried out before a sailing permit can be issued. They are conducted jointly by the ministries of health and transportation. This activity, commonly known as Port Health Quarantine Clearance (PHQC), takes time: the ship must be inspected and must pay the inspection fee before a sailing permit can be issued. In practice, ships delay PHQC payments, which causes ships to build up in the port, and officers also need more time to issue sailing permits. This in turn affects other port services, such as dwelling time, and can delay scheduled departures. Addressing this problem requires an in-depth study of the trend in late payment for ship health checks, the variables that influence it, and how it can be handled. Using logistic regression and a decision tree built with the Classification and Regression Tree (CART) algorithm, a model is developed that identifies the variables affecting whether a ship delays its PHQC payment.
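The logistic-regression half of this approach models the probability of a late PHQC payment as a sigmoid of a weighted sum of predictors. A minimal sketch, fitted by plain batch gradient descent on the mean log-loss; the two binary predictors and the six records are hypothetical, since the abstract does not list the variables actually used.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def predict_proba(w, xi):
    """w = [bias, w1, w2, ...]; returns the modeled P(late payment | xi)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Plain batch gradient descent on the mean log-loss (no regularization)."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(linear score)
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical predictors: [late payment on previous port call (1/0), handled by an agent (1/0)]
X = [[0, 0], [0, 1], [0, 0], [1, 0], [1, 1], [1, 0]]
y = [0, 0, 0, 1, 1, 1]  # 1 = PHQC payment was late
w = fit_logistic(X, y)
print([round(predict_proba(w, xi), 2) for xi in X])
```

On perfectly separable toy data like this the weights keep growing with more epochs; a real analysis would add regularization or rely on a statistics package's maximum-likelihood fit and standard errors to judge which variables matter.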


2021 ◽  
Author(s):  
Peng Song ◽  
Shengwei Ren ◽  
Yu Liu ◽  
Pei Li ◽  
Qingyan Zeng

Abstract: The aim of this study was to develop a predictive model for subclinical keratoconus (SKC) based on decision tree (DT) algorithms. A total of 194 eyes (105 normal eyes and 89 SKC eyes) were included in this two-center retrospective study. The data were divided into separate training and validation databases. The baseline variables were derived from tomography and biomechanical imaging. DT models were generated in the training database using the chi-square automatic interaction detection (CHAID) and classification and regression tree (CART) algorithms. The CART model's discriminating rules selected, in order, the Belin/Ambrósio deviation (BAD-D), stiffness parameter at first applanation (SPA1), back eccentricity (Becc), and maximum pachymetric progression index, while the CHAID model selected BAD-D, deformation amplitude ratio, SPA1, and Becc. The CART model discriminated between normal and SKC eyes with 92.2% accuracy, higher than the CHAID model (88.3%), BAD-D (82.0%), the Corvis biomechanical index (CBI, 77.3%), and the tomographic and biomechanical index (TBI, 78.1%). In the validation database, the CART model's discriminating performance was confirmed with 92.4% accuracy, compared with 86.4% for the CHAID model. Thus, the CART model using tomography and biomechanical imaging was an excellent model for SKC screening and provided easy-to-understand discriminating rules.
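CHAID, one of the two algorithms compared here, chooses each split with a chi-square test of independence between the candidate predictor's categories and the outcome. The sketch below computes Pearson's chi-square statistic and its degrees of freedom for one contingency table. The counts (normal vs. SKC eyes below and above a hypothetical BAD-D cutoff) are invented. A full CHAID implementation also merges similar categories and uses Bonferroni-adjusted p-values; both steps are omitted here.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Rows: hypothetical BAD-D below / above a cutoff; columns: normal eyes, SKC eyes.
table = [[60, 15],
         [10, 45]]
print(chi_square(table))
```

The predictor whose best grouping gives the most significant statistic becomes the split variable at that node, which is how CHAID arrived at its ordering of BAD-D, deformation amplitude ratio, SPA1, and Becc.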


Foods ◽  
2019 ◽  
Vol 8 (7) ◽  
pp. 274 ◽  
Author(s):  
Mohammed Gagaoua ◽  
Valérie Monteils ◽  
Sébastien Couvreur ◽  
Brigitte Picard

This trial aimed to integrate metadata spread over the farm-to-fork continuum of 110 Protected Designation of Origin (PDO) Maine-Anjou cows, combining two statistical approaches, chemometrics and supervised learning, to identify potential predictors of beef tenderness measured by instrumental Warner-Bratzler shear force (WBSF). Accordingly, 60 variables, including WBSF, belonging to four levels of the continuum (farm, slaughterhouse, muscle, and meat) were analyzed by Partial Least Squares (PLS) and three decision tree methods (C&RT: classification and regression tree; QUEST: quick, unbiased, efficient statistical tree; and CHAID: Chi-squared Automatic Interaction Detection) to select the driving factors of beef tenderness and propose predictive decision tools. PLS retained 24 of the 59 predictor variables, explaining 75% of WBSF variability. Among the 24 variables, six were from the farm level, four from the slaughterhouse level, 11 from the muscle level (mostly protein biomarkers), and three from the meat level. Decision trees applied to the variables retained by the PLS model identified three WBSF classes (Tender (WBSF ≤ 40 N/cm²), Medium (40 N/cm² < WBSF < 45 N/cm²), and Tough (WBSF ≥ 45 N/cm²)), with CHAID as the best decision tree method. The resulting model yielded an overall predictive accuracy of 69.4% using five splitting variables (total collagen, µ-calpain, fiber area, age of weaning, and ultimate pH). Two decision rules therefore lead to tender meat from PDO Maine-Anjou cows: (i) IF (total collagen < 3.6 µg OH-proline/mg) AND (µ-calpain ≥ 169 arbitrary units (AU)) AND (ultimate pH < 5.55) THEN meat was very tender (mean WBSF = 36.2 N/cm², n = 12); or (ii) IF (total collagen < 3.6 µg OH-proline/mg) AND (µ-calpain < 169 AU) AND (age of weaning < 7.75 months) AND (fiber area < 3100 µm²) THEN meat was tender (mean WBSF = 39.4 N/cm², n = 30).
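The WBSF classes and the two tenderness rules reported in this abstract translate directly into code. The sketch encodes only the thresholds stated there; the function names are hypothetical, and animals matched by neither rule return `None` because the abstract does not give the remaining branches of the CHAID tree.

```python
from typing import Optional


def wbsf_class(wbsf_n_cm2: float) -> str:
    """Tender (<= 40), Medium (strictly between 40 and 45), Tough (>= 45); WBSF in N/cm^2."""
    if wbsf_n_cm2 <= 40:
        return "Tender"
    if wbsf_n_cm2 < 45:
        return "Medium"
    return "Tough"


def tenderness_rule(total_collagen: float, mu_calpain: float, ultimate_ph: float,
                    weaning_age_months: float, fiber_area_um2: float) -> Optional[str]:
    """Apply the two reported rules (collagen in ug OH-proline/mg, mu-calpain in AU)."""
    if total_collagen < 3.6:
        if mu_calpain >= 169 and ultimate_ph < 5.55:
            return "very tender (rule i)"
        if mu_calpain < 169 and weaning_age_months < 7.75 and fiber_area_um2 < 3100:
            return "tender (rule ii)"
    return None  # not covered by the two reported rules


print(wbsf_class(36.2), wbsf_class(42.0), wbsf_class(45.0))
print(tenderness_rule(3.2, 180, 5.50, 8.0, 3500))
```

Writing the rules this way makes explicit that both share the collagen condition and then branch on µ-calpain, mirroring the tree structure the abstract describes.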


2020 ◽  
Vol 39 (5) ◽  
pp. 6073-6087
Author(s):  
Meltem Yontar ◽  
Özge Hüsniye Namli ◽  
Seda Yanik

Customer behavior prediction has recently gained importance in the banking sector, as in other sectors. This study proposes a model to predict whether credit card users will pay their debts. With the proposed model, potential non-payment risks can be predicted and the necessary actions taken in time. To predict customers' payment status for the following months, we use Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Tree (CART), and C4.5, which are widely used artificial intelligence and decision tree algorithms. Our dataset includes the records of 10,713 customers obtained from a well-known bank in Taiwan. These records contain customer information such as the amount of credit, gender, education level, marital status, age, past payment records, invoice amount, and amount of credit card payments. We apply cross-validation and hold-out methods to divide the dataset into training and test sets, and then evaluate the algorithms with the proposed performance metrics. We also optimize the algorithms' parameters to improve prediction performance. The results show that the model built with CART, one of the decision tree algorithms, predicts customers' payment status for the next month with high accuracy (about 86%), and that optimizing the algorithm parameters further increases classification accuracy and performance.
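CART grows a classification tree by choosing, at each node, the split that most reduces Gini impurity. The sketch below applies that criterion to one numeric predictor. The feature (months with a delayed payment) and the labels are hypothetical, not drawn from the Taiwanese dataset; a real tree repeats the search over all predictors and recurses into each child.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_gini_split(x, y):
    """Return (threshold, weighted child Gini) for the best rule x <= threshold.

    Returns (None, parent Gini) if no split lowers the impurity.
    """
    n = len(y)
    best = (None, gini(y))
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical: x = months with a delayed payment in the last six; y = 1 if next bill unpaid.
x = [0, 0, 1, 0, 2, 3, 4, 2]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_gini_split(x, y))
```

Parameters such as maximum depth or minimum node size, which the study tuned, control when this recursive search stops.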


Author(s):  
Ying Yao ◽  
Xiaohua Zhao ◽  
Hongji Du ◽  
Yunlong Zhang ◽  
Guohui Zhang ◽  
...  

Both alcohol and fatigue are well known to impair driving performance, so identifying fatigue and drinking status is important. In this study, each of 22 participants completed five driving tests. The control condition, serving as the benchmark among the five tests, was alert driving. The other four test conditions were driving at three blood alcohol content (BAC) levels (0.02%, 0.05%, and 0.08%) and driving in a fatigued state. The driving scenario included straight and curved roads: straight segments connected curves with radii of 200 m, 500 m, and 800 m in both turning directions (left and right). Driving performance indicators, such as the mean and standard deviation of longitudinal speed and lane position, were selected to identify drunk driving and fatigued driving. Road geometry (straight segments, and the radius and direction of curves) was also taken into account. Alert vs. abnormal driving and fatigued vs. drunk driving at various BAC levels were analyzed separately using the Classification and Regression Tree (CART) model, and the significance of each variable for the binary response variable was determined. The results showed that the decision tree could distinguish normal driving from abnormal driving, and fatigued driving from drunk driving, based on vehicle speed and lane position indexes at curves with different radii. The overall classification accuracy was 90.9% for “alert” vs. “abnormal” driving and 94.4% for “fatigued” vs. “drunk” driving. Accuracy was relatively low in distinguishing different BAC levels. This experiment was designed to provide a reference for detecting dangerous driving states.
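The driving-performance indicators named in this abstract are summary statistics computed per road segment before any tree is fitted. A minimal sketch, assuming a hypothetical list of (segment id, speed, lane offset) samples from the simulator; the segment names and values are invented. The abstract does not say whether the sample or population standard deviation was used, so the sample version is assumed here.

```python
from collections import defaultdict
from statistics import mean, stdev


def segment_features(samples):
    """Map segment id -> (mean speed, SD speed, mean lane position, SD lane position).

    Each segment needs at least two samples for the sample standard deviation.
    """
    by_segment = defaultdict(lambda: ([], []))
    for segment, speed_kmh, lane_offset_m in samples:
        speeds, lanes = by_segment[segment]
        speeds.append(speed_kmh)
        lanes.append(lane_offset_m)
    return {
        seg: (mean(s), stdev(s), mean(l), stdev(l))
        for seg, (s, l) in by_segment.items()
    }


# Hypothetical samples: (segment id, longitudinal speed in km/h, lane offset in m)
samples = [
    ("curve_R200_left", 62.0, 0.10), ("curve_R200_left", 58.0, 0.30),
    ("curve_R200_left", 60.0, 0.20), ("straight_1", 80.0, 0.05),
    ("straight_1", 82.0, -0.05),
]
print(segment_features(samples))
```

One row of these features per driver, condition, and segment type (straight, or curve radius and direction) would then form the input table for the CART classifier.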


2013 ◽  
Vol 864-867 ◽  
pp. 2782-2786
Author(s):  
Bao Hua Yang ◽  
Shuang Li

This paper studies a decision-tree-based classification method for remote sensing images. The experimental area is located in Xiangyang District, and the data source is a fusion of 2010 SPOT and TM satellite images. The decision tree classification method is optimized with the RuleGen module and applied to the remote sensing image of the region of interest. The classification accuracy is 95.15% for the maximum likelihood method and 94.82% for CART. The experimental results show that the classification method based on the classification and regression tree performs about as well as the traditional method.
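Classification accuracy for remote sensing maps is usually assessed from a confusion (error) matrix. As a sketch, assuming mapped classes in rows and reference classes in columns (a common convention; the abstract does not describe its accuracy assessment), overall, producer's, and user's accuracy can be computed as below. The land-cover classes and pixel counts are hypothetical.

```python
def accuracy_report(matrix, classes):
    """matrix[i][j] = number of pixels mapped as class i whose reference class is j."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(classes)))
    report = {"overall": correct / total}
    for i, name in enumerate(classes):
        mapped = sum(matrix[i])                    # row total
        reference = sum(row[i] for row in matrix)  # column total
        report[name] = {
            "producer": matrix[i][i] / reference,  # 1 - omission error
            "user": matrix[i][i] / mapped,         # 1 - commission error
        }
    return report


classes = ["water", "built-up", "vegetation"]
matrix = [[48, 2, 0],
          [1, 45, 2],
          [1, 3, 48]]
print(accuracy_report(matrix, classes))
```

Two classifiers with nearly equal overall accuracy, as here, can still differ class by class, which is why producer's and user's accuracies are often reported alongside the overall figure.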


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Mutasem Sh. Alkhasawneh ◽  
Umi Kalthum Ngah ◽  
Lea Tien Tay ◽  
Nor Ashidi Mat Isa ◽  
Mohammad Subhi Al-Batah

This paper proposes a decision tree model for determining the importance of 21 factors causing landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain precipitation, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and factor importance, and are usually represented by an easy-to-interpret, tree-like structure. Four models were created using Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). The 21 factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137570 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. Ten-fold cross-validation was employed to test the models. The highest accuracy was achieved by Exhaustive CHAID (82.0%), compared with CHAID (81.9%), CRT (75.6%), and QUEST (74.0%). Across the four models, the five most important factors were slope angle, distance from drainage, surface area, slope aspect, and cross curvature.
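The 10-fold cross-validation used to test the four models can be sketched with a stratified fold assignment, which keeps the landslide / no-landslide ratio (balanced in this data set) the same in every fold. Stratification, the helper names, and the toy slope-angle data are assumptions; the abstract states only that 10-fold cross-validation was used. Each fold serves once as the test set while a model is trained on the other nine, and the ten test accuracies are averaged.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=42):
    """Assign each sample index to one of k folds, dealing each class round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    fold_of = [0] * len(labels)
    offset = 0
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = (pos + offset) % k
        offset += len(idx)  # continue the round-robin so fold sizes stay balanced
    return fold_of


def cross_val_accuracy(labels, fold_of, k, fit, predict):
    """Mean test accuracy over k folds; `fit(train_idx)` and `predict(model, i)` are caller-supplied."""
    scores = []
    for f in range(k):
        train = [i for i in range(len(labels)) if fold_of[i] != f]
        test = [i for i in range(len(labels)) if fold_of[i] == f]
        model = fit(train)
        hits = sum(predict(model, i) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / k


# Hypothetical toy data: slope angle (degrees) for 50 landslide and 50 non-landslide cells.
slope = [35 + i % 10 for i in range(50)] + [5 + i % 10 for i in range(50)]
labels = [1] * 50 + [0] * 50
folds = stratified_folds(labels, k=10)
acc = cross_val_accuracy(labels, folds, 10,
                         fit=lambda train: 30.0,  # a fixed threshold stands in for a trained tree
                         predict=lambda thr, i: int(slope[i] > thr))
print(acc)
```

In practice `fit` would grow a CHAID, Exhaustive CHAID, CRT, or QUEST tree on the training indices; the harness itself does not change between algorithms, which keeps the comparison fair.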

