Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

Abstract: The increase of publicly available bioactivity data in recent years has fueled and catalyzed research in chemogenomics, data mining, and modeling approaches. As a direct result, over the past few years a multitude of different methods have been reported and evaluated, such as target fishing, nearest neighbor similarity-based methods, and Quantitative Structure Activity Relationship (QSAR)-based protocols. However, such studies are typically conducted on different datasets, using different validation strategies and different metrics. In this study, different methods were compared using one single standardized dataset obtained from ChEMBL, which is made available to the public, using standardized metrics (BEDROC and Matthews Correlation Coefficient). Specifically, the performance of Naive Bayes, Random Forests, Support Vector Machines, Logistic Regression, and Deep Neural Networks was assessed using QSAR and proteochemometric (PCM) methods. All methods were validated using both a random split validation and a temporal validation, with the latter being a more realistic benchmark of expected prospective execution. Deep Neural Networks are the top performing classifiers, highlighting the added value of Deep Neural Networks over other more conventional methods. Moreover, the best method (‘DNN_PCM’) performed significantly better, at almost one standard deviation above the mean performance. Furthermore, multi-task and PCM implementations were shown to improve performance over single-task Deep Neural Networks. Conversely, target prediction performed almost two standard deviations under the mean performance. Random Forests, Support Vector Machines, and Logistic Regression performed around mean performance. Finally, using an ensemble of DNNs, alongside additional tuning, enhanced the relative performance by another 27% (compared with unoptimized DNN_PCM). Here, a standardized set to test and evaluate different machine learning algorithms in the context of multitask learning is offered by providing the data and the protocols.
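As a rough illustration of two ideas named in the abstract above, the sketch below computes the Matthews Correlation Coefficient from binary labels and splits records by year for a temporal validation. It is not the authors' protocol: the record layout (a dict with a `year` key), the function names, and the cutoff year are all hypothetical choices made for this example.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def temporal_split(records, cutoff_year):
    """Hypothetical temporal split: train on records before the cutoff year,
    test on records from the cutoff year onward."""
    train = [r for r in records if r["year"] < cutoff_year]
    test = [r for r in records if r["year"] >= cutoff_year]
    return train, test


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    print(matthews_corrcoef(*confusion_counts(y_true, y_pred)))
```

Unlike a random split, the temporal split never lets the model see data published after the test period begins, which is why the abstract describes it as the more realistic benchmark.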


Spatial prediction of permafrost occurrence in Sikkim Himalayas using logistic regression, random forests, support vector machines and neural networks

Geomorphology ◽ 10.1016/j.geomorph.2020.107331 ◽ 2020 ◽ Vol 371 ◽ pp. 107331

Author(s): Prashant Baral ◽ M. Anul Haq

Keyword(s): Neural Networks ◽ Logistic Regression ◽ Support Vector Machines ◽ Random Forests ◽ Spatial Prediction ◽ Support Vector ◽ Vector Machines


Development of a model for trauma outcome prediction: a real-data comparison of artificial neural networks, logistic regression and data mining techniques

International Journal of Biomedical Engineering and Technology ◽ 10.1504/ijbet.2012.049327 ◽ 2012 ◽ Vol 10 (1) ◽ pp. 84

Author(s): C. Koukouvinos ◽ C. Parpoula

Keyword(s): Data Mining ◽ Neural Networks ◽ Logistic Regression ◽ Artificial Neural Networks ◽ Outcome Prediction ◽ Real Data ◽ Data Mining Techniques ◽ Data Comparison ◽ Trauma Outcome ◽ Artificial Neural


Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran

Healthcare Informatics Research ◽ 10.4258/hir.2013.19.3.177 ◽ 2013 ◽ Vol 19 (3) ◽ pp. 177 ◽ Cited By ~ 30

Author(s): Lily Tapak ◽ Hossein Mahjub ◽ Omid Hamidi ◽ Jalal Poorolajal

Keyword(s): Data Mining ◽ Real Data ◽ Data Comparison ◽ Mining Methods ◽ Prediction Of Diabetes


1463: Comparative Analysis of Logistic Regression, Support Vector Machines and Artificial Neural Networks for 3D Power Doppler Imaging of Solid Breast Tumors

Ultrasound in Medicine & Biology ◽ 10.1016/j.ultrasmedbio.2009.06.849 ◽ 2009 ◽ Vol 35 (8) ◽ pp. S225

Author(s): Hung-Ting Lin ◽ Yu-Fen Wang ◽ Shou-Tung Chen ◽ Dar-Ren Chen ◽ Shou-Tung Chen ◽ ...

Keyword(s): Neural Networks ◽ Logistic Regression ◽ Artificial Neural Networks ◽ Power Doppler ◽ Breast Tumors ◽ Doppler Imaging ◽ Support Vector ◽ 3D Power Doppler ◽ Vector Machines ◽ Power Doppler Imaging


Artificial Intelligence in Electricity Market Operations and Management

Intelligent Information Technologies ◽ 10.4018/978-1-59904-941-0.ch105 ◽ 2011 ◽ pp. 1821-1840

Author(s): Zhao Yang Dong ◽ Tapan Kumar Saha ◽ Kit Po Wong

Keyword(s): Artificial Intelligence ◽ Data Mining ◽ Neural Networks ◽ Time Series ◽ Electricity Market ◽ Support Vector ◽ Market Demand ◽ Deregulated Electricity Market ◽ Vector Machines ◽ Spike Analysis

This chapter introduces advanced techniques such as artificial neural networks, wavelet decomposition, support vector machines, and data-mining techniques for electricity market demand and price forecasting. It argues that different techniques offer different advantages in providing satisfactory demand and price signal forecasts for a deregulated electricity market, depending on the specific forecasting needs. Furthermore, the authors hope that an understanding of these techniques and their application will help the reader to form a comprehensive view of electricity market data analysis needs, not only for the traditional time-series-based forecast, but also for the newer correlation-based price spike analysis.


Data mining in manufacturing: Significance analysis of process parameters

Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture ◽ 10.1243/09544054jem1182 ◽ 2008 ◽ Vol 222 (11) ◽ pp. 1503-1516 ◽ Cited By ~ 18

Author(s): M Perzyk ◽ R Biernacki ◽ J Kozlowski

Keyword(s): Data Mining ◽ Neural Networks ◽ Process Parameters ◽ Regression Models ◽ Simulated Data ◽ Real Data ◽ Support Vector ◽ Data Sets ◽ Interaction Factors

Determining the most significant manufacturing process parameters from collected past data can be very helpful in solving important industrial problems, such as detecting the root causes of deteriorating product quality, selecting the most efficient parameters to control the process, and predicting breakdowns of machines, equipment, etc. A methodology for determining the relative significance of process variables and the possible interactions between them, based on interrogations of generalized regression models, is proposed and tested. The performance of several types of data mining tools, such as artificial neural networks, support vector machines, regression trees, classification trees, and a naïve Bayesian classifier, is compared. Also, some simple non-parametric statistical methods, based on an analysis of variance (ANOVA) and contingency tables, are evaluated for comparison purposes. The tests were performed using simulated data sets, with assumed hidden relationships, as well as on real data collected in the foundry industry. It was found that the performance of significance and interaction factors obtained from regression models, and, in particular, neural networks, is satisfactory, while the other methods appeared to be less accurate and/or less reliable.
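To make the ANOVA-based comparison baseline mentioned above more concrete, here is a minimal sketch of a one-way ANOVA F-statistic, which could be used to rank a process parameter by how strongly its levels separate a quality response. This is only an assumed reading of the idea, not the authors' procedure: the grouping of the response by parameter level and the function name are hypothetical.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of groups.

    Each group holds the response values observed at one level of a
    (hypothetical) process parameter. A larger F suggests the parameter
    level explains more of the variation in the response.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation between group means, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if ss_within == 0:
        # No within-group scatter: the statistic is unbounded.
        return float("inf")
    return (ss_between / df_between) / (ss_within / df_within)


if __name__ == "__main__":
    # Illustrative response values at three levels of one parameter.
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

Computing this statistic for each candidate parameter and sorting by F gives a simple significance ranking, though it ignores the interactions between parameters that the regression-model approach is designed to capture.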
