Abstract: Data Mining Alternatives to Logistic Regression for Propensity Score Estimation: Neural Networks and Support Vector Machines

2013 ◽  
Vol 48 (1) ◽  
pp. 164-164 ◽  
Author(s):  
Bryan S. B. Keller ◽  
Jee-Seon Kim ◽  
Peter M. Steiner
Author(s):  
Jasleen Kaur ◽  
Khushdeep Dharni

The distinct characteristics of national economies and stock markets make it worthwhile to explore how data mining techniques perform across global indices. Very few previous studies have compared the performance of data mining techniques in diverse markets. The current study adds to the understanding of how the performance of data mining techniques varies across global stock indices. We compared the performance of Neural Networks (NN) and Support Vector Machines (SVM) using two accuracy measures, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), across seven major stock markets. For prediction, technical analysis was applied to selected indicators based on daily index values spanning a period of 12 years. We created 196 data sets covering different time periods for model building (1 year, 2 years, 3 years, 4 years, 6 years and 12 years) for the seven selected stock indices. Based on prediction models built using NN and SVM, the findings indicate a significant difference, for both MAE and RMSE, across the selected global indices. In addition, the MAE and RMSE of models built using NN were greater than those of models built using SVM.
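As an illustration of the two accuracy measures named in this abstract, the following minimal sketch computes MAE and RMSE from their standard definitions. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference; penalizes large errors more."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical index closing values and model predictions (illustrative only).
actual_values = [100.0, 102.0, 104.0, 106.0]
predicted_values = [101.0, 101.0, 106.0, 106.0]

mae_value = mean_absolute_error(actual_values, predicted_values)
rmse_value = root_mean_square_error(actual_values, predicted_values)
print(mae_value, rmse_value)
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data, so a model with a few large misses will look worse on RMSE than on MAE.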


2017 ◽  
Author(s):  
Eelke B. Lenselink ◽  
Niels ten Dijke ◽  
Brandon Bongers ◽  
George Papadatos ◽  
Herman W.T. van Vlijmen ◽  
...  

The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies, and different metrics. In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naive Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective execution. Deep Neural Networks are the top performing classifiers, highlighting the added value of Deep Neural Networks over other more conventional methods. Moreover, the best method ('DNN_PCM') performed significantly better at almost one standard deviation higher than the mean performance. Furthermore, multi-task and PCM implementations were shown to improve performance over single-task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations under the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with unoptimized DNN_PCM). Here, a standardized set to test and evaluate different machine learning algorithms in the context of multi-task learning is offered by providing the data and the protocols.
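For readers unfamiliar with one of the metrics named in this abstract, the following minimal sketch computes the Matthews Correlation Coefficient from binary confusion-matrix counts using its standard definition. The function name and the example counts are hypothetical and are not the study's data or code.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """MCC for a binary classifier, given true/false positive/negative counts.

    Returns a value in [-1, 1]; by common convention 0.0 is returned when
    any marginal total is zero and the ratio is undefined.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for active/inactive compound predictions (illustrative only).
perfect = matthews_corrcoef_from_counts(tp=5, tn=5, fp=0, fn=0)
mixed = matthews_corrcoef_from_counts(tp=6, tn=3, fp=1, fn=2)
print(perfect, mixed)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it a common choice when active and inactive classes are imbalanced.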


2019 ◽  
Vol 6 (4) ◽  
pp. 12-31
Author(s):  
Özge Hüsniye Namlı Dağ

The banking sector, like other service sectors, evolves in response to customers' needs. Knowing those needs and predicting customer behavior are therefore very important for competing in the banking sector. Data mining uncovers relationships and hidden patterns in large data sets. Classification algorithms, one application of data mining, are used very effectively in decision making. In this study, the C4.5 algorithm, a decision tree algorithm widely used in classification problems, is integrated with ensemble machine learning methods in order to increase the efficiency of the algorithms. Data obtained from the direct marketing campaigns of banks in Portugal were used to classify whether customers have term deposit accounts or not. Artificial Neural Networks and Support Vector Machines, as traditional artificial intelligence methods, and Bagging-C4.5 and Boosted-C4.5, as ensemble-decision tree hybrid methods, were used for classification. Bagging-C4.5 achieved better classification performance than the other algorithms used. In this study, the ensemble-decision tree hybrid methods gave better results than artificial neural networks and support vector machines.

