Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

2011, Vol 4 (1)
Author(s): João Maroco, Dina Silva, Ana Rodrigues, Manuela Guerreiro, Isabel Santana, ...
2017
Author(s): Eelke B. Lenselink, Niels ten Dijke, Brandon Bongers, George Papadatos, Herman W.T. van Vlijmen, ...

Abstract: The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies and different metrics.

In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naive Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective execution.

Deep Neural Networks are the top-performing classifiers, highlighting the added value of Deep Neural Networks over other more conventional methods. Moreover, the best method (‘DNN_PCM’) performed significantly better, at almost one standard deviation above the mean performance. Furthermore, multi-task and PCM implementations were shown to improve performance over single-task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations under the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with unoptimized DNN_PCM).

Here, a standardized set to test and evaluate different machine learning algorithms in the context of multitask learning is offered by providing the data and the protocols.
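The abstract names the Matthews Correlation Coefficient as one of its metrics and contrasts random with temporal validation. A minimal Python sketch of both ideas follows, assuming binary labels coded as 0/1 and records stored as `(year, payload)` tuples. The function names, the record layout, and the cutoff year in the usage line are illustrative assumptions, not the study's actual protocol or data.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal count is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def temporal_split(records, cutoff_year):
    """Train on records before cutoff_year, test on the rest.

    `records` is a list of (year, payload) tuples; this layout is a
    hypothetical stand-in for dated bioactivity measurements.
    """
    train = [r for r in records if r[0] < cutoff_year]
    test = [r for r in records if r[0] >= cutoff_year]
    return train, test


# Hypothetical usage with made-up records and an arbitrary cutoff year.
train, test = temporal_split([(2010, "a"), (2013, "b"), (2015, "c")], 2013)
score = matthews_corrcoef([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

Unlike a random split, the temporal split never lets the model see measurements published after the cutoff, which is why the abstract treats it as the more realistic estimate of prospective performance.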


2013, Vol 19 (3), pp. 177
Author(s): Lily Tapak, Hossein Mahjub, Omid Hamidi, Jalal Poorolajal

Author(s): Zhao Yang Dong, Tapan Kumar Saha, Kit Po Wong

This chapter introduces advanced techniques such as artificial neural networks, wavelet decomposition, support vector machines, and data-mining techniques in electricity market demand and price forecasts. It argues that various techniques can offer different advantages in providing satisfactory demand and price signal forecast results for a deregulated electricity market, depending on the specific needs in forecasting. Furthermore, the authors hope that an understanding of these techniques and their application will help the reader to form a comprehensive view of electricity market data analysis needs, not only for traditional time-series-based forecasting but also for the newer correlation-based price spike analysis.


Author(s): M Perzyk, R Biernacki, J Kozlowski

Determination of the most significant manufacturing process parameters using collected past data can be very helpful in solving important industrial problems, such as the detection of root causes of deteriorating product quality, the selection of the most efficient parameters to control the process, and the prediction of breakdowns of machines, equipment, etc. A methodology for determining the relative significance of process variables and possible interactions between them, based on interrogating generalized regression models, is proposed and tested. The performance of several types of data mining tools, such as artificial neural networks, support vector machines, regression trees, classification trees, and a naïve Bayesian classifier, is compared. Also, some simple non-parametric statistical methods, based on an analysis of variance (ANOVA) and contingency tables, are evaluated for comparison purposes. The tests were performed using simulated data sets, with assumed hidden relationships, as well as on real data collected in the foundry industry. It was found that the performance of significance and interaction factors obtained from regression models, and, in particular, neural networks, is satisfactory, while the other methods appeared to be less accurate and/or less reliable.
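One way to picture "interrogating" a fitted regression model for variable significance is to sweep each input across its observed range while holding the others at their mean, then compare how far the output moves. The Python sketch below illustrates that general idea only. It is an assumption about what such an interrogation could look like, not the authors' published procedure, and `relative_significance`, `toy_model`, and the sample data are hypothetical.

```python
from statistics import mean


def relative_significance(model, data, steps=11):
    """Score each input by the output range it induces in `model`.

    `model` is any callable mapping a list of inputs to a number;
    `data` is a list of input rows used to get ranges and a baseline.
    Scores are scaled so the most influential variable gets 1.0.
    """
    n_vars = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_vars)]
    baseline = [mean(col) for col in columns]
    spans = []
    for j, col in enumerate(columns):
        lo, hi = min(col), max(col)
        outputs = []
        for k in range(steps):
            point = list(baseline)  # other variables stay at their mean
            point[j] = lo + (hi - lo) * k / (steps - 1)
            outputs.append(model(point))
        spans.append(max(outputs) - min(outputs))
    largest = max(spans)
    return [s / largest if largest > 0 else 0.0 for s in spans]


def toy_model(x):
    # Hypothetical stand-in for a trained regression model.
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]


data = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.2, 0.9]]
scores = relative_significance(toy_model, data)
```

This one-at-a-time sweep does not capture interactions between variables; detecting those would require varying two or more inputs jointly, which the sketch leaves out.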

