Evaluation of Different Machine Learning Methods and Deep-Learning Convolutional Neural Networks for Landslide Detection



A Very Large-Scale Bioactivity Comparison of Deep Learning and Multiple Machine Learning Algorithms for Drug Discovery

10.26434/chemrxiv.12781241.v1 ◽

2020 ◽

Author(s):

Thomas R. Lane ◽

Daniel H. Foil ◽

Eni Minerali ◽

Fabio Urbina ◽

Kimberley M. Zorn ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Drug Discovery ◽

Deep Neural Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies we have applied multiple machine learning algorithms and modeling metrics, and in some cases compared molecular descriptors, to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL to evaluate machine learning methods of interest to them. The largest of these studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint and compared our proprietary software Assay Central™ with random forest, k-nearest neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance was assessed using an array of five-fold cross-validation metrics including area under the curve, F1 score, Cohen’s kappa and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated that Assay Central™ and support vector classification were comparable. Unlike prior studies, which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case, where minimal tuning was performed for any of the methods. If anything, Assay Central™ may have been at a slight advantage, as the activity cutoff for each of the over 5000 datasets, representing over 570,000 unique compounds, was based on Assay Central™ performance, but support vector classification seems to be a strong competitor. We also apply Assay Central™ to prospective predictions for PXR and hERG to further validate these models. This work currently appears to be the largest comparison of machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors and algorithms, as well as further refine methods for evaluating and comparing models.
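
The metrics named in this abstract have standard textbook definitions, which can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function names `binary_metrics` and `auc_from_scores` are hypothetical, the counts and scores are made up for the example, and the abstract does not say how the metrics were aggregated across folds.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """F1, Cohen's kappa and MCC from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"f1": f1, "kappa": kappa, "mcc": mcc}


def auc_from_scores(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    which is fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
    print(auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

In a five-fold setting, one common choice (assumed here, not stated in the abstract) is to compute these values on each held-out fold and average them.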


Detecting a keystone species European aspen in boreal forests with airborne hyperspectral, LiDAR and UAV data with machine learning methods

10.5194/egusphere-egu21-16273 ◽

2021 ◽

Author(s):

Timo Kumpula ◽

Janne Mäyrä ◽

Anton Kuzmin ◽

Arto Viinikka ◽

Sonja Kivinen ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Deep Learning ◽

High Resolution ◽

Boreal Forests ◽

Tree Level ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

European Aspen

Sustainable forest management increasingly highlights the maintenance of biological diversity and requires up-to-date information on the occurrence and distribution of key ecological features in forest environments. Different proxy variables indicating species richness and site quality are essential for efficiently detecting and monitoring forest biodiversity. European aspen (Populus tremula L.) is a minor deciduous tree species of high importance in maintaining biodiversity in boreal forests. Large aspen trees host hundreds of species, many of them classified as threatened. However, accurate fine-scale spatial data on aspen occurrence remain scarce and incomplete. We studied detection of aspen using different remote sensing techniques in Evo, southern Finland. Our study area of 83 km² contains both managed and protected southern boreal forests characterized by Scots pine (Pinus sylvestris L.), Norway spruce (Picea abies (L.) Karst), and birch (Betula pendula and pubescens L.), whereas European aspen has a relatively sparse and scattered occurrence in the area. We collected high-resolution airborne hyperspectral and airborne laser scanning data covering the whole study area and ultra-high-resolution unmanned aerial vehicle (UAV) data with RGB and multispectral sensors from selected parts of the area. We tested the discrimination of aspen from other species at tree level using different machine learning methods (Support Vector Machines, Random Forest, Gradient Boosting Machine) and deep learning methods (3D convolutional neural networks). Airborne hyperspectral and lidar data gave excellent results with machine learning and deep learning classification methods. The highest classification accuracies for aspen ranged from 91% to 92% (F1-score). The most important wavelengths for discriminating aspen from other species included reflectance bands in the red edge range (724–727 nm) and the shortwave infrared (1520–1564 nm and 1684–1706 nm) (Viinikka et al. 2020; Mäyrä et al. 2021). Aspen detection using RGB and multispectral data also gave good results (highest F1-score for aspen = 87%) (Kuzmin et al. 2021). The different remote sensing data enabled production of a spatially explicit map of aspen occurrence in the study area. Information on aspen occurrence and abundance can significantly contribute to biodiversity management and conservation efforts in boreal forests. Our results can be further utilized in upscaling efforts aiming at aspen detection over larger geographical areas using satellite images.


Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines

Data Science and Predictive Analytics ◽

10.1007/978-3-319-72347-1_11 ◽

2018 ◽

pp. 383-422 ◽

Cited By ~ 1

Author(s):

Ivo D. Dinov

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Support Vector Machines ◽

Black Box ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Vector Machines


Single-Cell Phenotype Classification Using Deep Convolutional Neural Networks

CrossRef Listing of Deleted DOIs ◽

10.1177/1087057116631284 ◽

2016 ◽

Vol 21 (9) ◽

pp. 998-1003 ◽

Cited By ~ 42

Author(s):

Oliver Dürr ◽

Beate Sick

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Single Cell ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Misclassification Rate ◽

Support Vector ◽

Learning Methods ◽

Phenotype Classification

Deep learning methods are currently outperforming traditional state-of-the-art computer vision algorithms in diverse applications and recently even surpassed human performance in object recognition. Here we demonstrate the potential of deep learning methods for high-content screening–based phenotype classification. We trained a deep learning classifier in the form of convolutional neural networks with approximately 40,000 publicly available single-cell images from samples treated with compounds from four classes known to lead to different phenotypes. The input data consisted of multichannel images. The construction of appropriate feature definitions was part of the training and was carried out by the convolutional network, without the need for expert knowledge or handcrafted features. We compare our results against the recent state-of-the-art pipeline in which predefined features are extracted from each cell using specialized software and then fed into various machine learning algorithms (support vector machine, Fisher linear discriminant, random forest) for classification. The performance of all classification approaches is evaluated on an untouched test image set with known phenotype classes. Compared to the best reference machine learning algorithm, the misclassification rate is reduced from 8.9% to 6.6%.


Deep learning accurately predicts estrogen receptor status in breast cancer metabolomics data

10.1101/214254 ◽

2017 ◽

Author(s):

Fadhl M Alakwaa ◽

Kumardeep Chaudhary ◽

Lana X Garmire

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Estrogen Receptor ◽

Deep Learning ◽

Support Vector ◽

Integrated Analysis ◽

Learning Method ◽

Learning Methods ◽

Metabolomics Data ◽

Machine Learning Methods

Metabolomics holds promise as a new technology to diagnose highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine learning based classification methods. However, it remains unknown whether deep neural networks, a class of increasingly popular machine learning methods, are suitable for classifying metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 estrogen receptor positive (ER+) and 67 estrogen receptor negative (ER-), to test the accuracy of an autoencoder, a deep learning (DL) framework, as well as six widely used machine learning models, namely Random Forest (RF), Support Vector Machines (SVM), Recursive Partitioning and Regression Trees (RPART), Linear Discriminant Analysis (LDA), Prediction Analysis for Microarrays (PAM), and Generalized Boosted Models (GBM). The DL framework has the highest area under the curve (AUC), 0.93, in classifying ER+/ER- patients, compared to the other six machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals eight commonly enriched significant metabolomics pathways (adjusted P-value < 0.05) that cannot be discovered by other machine learning methods. Among them, the protein digestion & absorption and ATP-binding cassette (ABC) transporter pathways are also confirmed in an integrated analysis of metabolomics and gene expression data in these samples. In summary, the deep learning method shows advantages for metabolomics-based breast cancer ER status classification, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of autoencoder-based deep learning methods in the metabolomics research community for classification.


Machine Learning Methods for Prediction of Food Effects on Bioavailability: A Comparison of Support Vector Machines and Artificial Neural Networks

European Journal of Pharmaceutical Sciences ◽

10.1016/j.ejps.2021.106018 ◽

2021 ◽

pp. 106018

Author(s):

Harriet Bennett-Lenane ◽

Brendan T. Griffin ◽

Joseph P. O'Shea

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Artificial Neural Networks ◽

Support Vector Machines ◽

Support Vector ◽

Food Effects ◽

Learning Methods ◽

Machine Learning Methods ◽

Vector Machines ◽

Artificial Neural


Advanced machine learning methods in psychiatry: an introduction

General Psychiatry ◽

10.1136/gpsych-2020-100197 ◽

2020 ◽

Vol 33 (2) ◽

pp. e100197 ◽

Cited By ~ 2

Author(s):

Tsung-Chin Wu ◽

Zhirou Zhou ◽

Hongyue Wang ◽

Bokai Wang ◽

Tuo Lin ◽

...

Keyword(s):

Mental Health ◽

Machine Learning ◽

Neural Networks ◽

Artificial Neural Networks ◽

Support Vector Machines ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Vector Machines ◽

Artificial Neural

Mental health questions can be tackled through machine learning (ML) techniques. Apart from the two ML methods we introduced in our previous paper, we discuss two more advanced ML approaches in this paper: support vector machines and artificial neural networks. To illustrate how these ML methods have been employed in mental health, recent research applications in psychiatry are reported.


A stacking ensemble deep learning approach to cancer type classification based on TCGA data

Scientific Reports ◽

10.1038/s41598-021-95128-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Mohanad Mohammed ◽

Henry Mwambi ◽

Innocent B. Mboya ◽

Murtada K. Elbashir ◽

Bernard Omolo

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Feature Selection Method ◽

Morphological Characteristics ◽

Support Vector ◽

Cancer Type ◽

Learning Methods ◽

Machine Learning Methods ◽

Proposed Model ◽

Significant Difference

Cancer tumor classification based on morphological characteristics alone has been shown to have serious limitations. Breast, lung, colorectal, thyroid, and ovarian cancers are the most commonly diagnosed cancers among women. Precise classification of cancers into their types is considered a vital problem for cancer diagnosis and therapy. In this paper, we proposed a stacking ensemble deep learning model based on a one-dimensional convolutional neural network (1D-CNN) to perform multi-class classification of the five common cancers among women based on RNASeq data. The RNASeq gene expression data was downloaded from the Pan-Cancer Atlas using the GDCquery function of the TCGAbiolinks package in the R software. We used the least absolute shrinkage and selection operator (LASSO) as the feature selection method. We compared the results of the proposed model, with and without LASSO, with the results of the single 1D-CNN and of machine learning methods that include support vector machines with radial basis function, linear, and polynomial kernels; artificial neural networks; k-nearest neighbors; and bagging trees. The results show that the proposed model, with and without LASSO, performs better than the other classifiers. The results also show that the machine learning methods (SVM-R, SVM-L, SVM-P, ANN, KNN, and bagging trees) perform better with under-sampling than with over-sampling techniques. This is supported by the statistical significance test of accuracy, where the p-values for differences between SVM-R and SVM-P, SVM-R and ANN, and SVM-R and KNN are p = 0.003, p < 0.001, and p < 0.001, respectively. SVM-L also differed significantly from ANN (p = 0.009). Moreover, SVM-P and ANN, and SVM-P and KNN, are significantly different (p < 0.001 in both cases). In addition, ANN and bagging trees, and ANN and KNN, were significantly different, with p < 0.001 and p = 0.004, respectively. Thus, the proposed model can help in the early detection and diagnosis of cancer in women, and hence aid in designing early treatment strategies to improve survival.
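
The under-sampling mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which under-sampling technique was used, so plain random under-sampling is assumed here; the function name `random_undersample` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Balance classes by randomly keeping as many items per class
    as the smallest class contains (assumed technique, see lead-in)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    n_min = min(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        # rng.sample draws without replacement, so no item is duplicated.
        for x in rng.sample(items, n_min):
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


if __name__ == "__main__":
    # Toy data: 6 "breast", 3 "lung", 2 "thyroid" samples.
    labels = ["breast"] * 6 + ["lung"] * 3 + ["thyroid"] * 2
    samples = list(range(len(labels)))
    xs, ys = random_undersample(samples, labels)
    print(Counter(ys))
```

Over-sampling works in the opposite direction, replicating or synthesizing minority-class items until every class matches the largest one.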


Modeling Traders’ Behavior with Deep Learning and Machine Learning Methods: Evidence from BIST 100 Index

Complexity ◽

10.1155/2020/8285149 ◽

2020 ◽

Vol 2020 ◽

pp. 1-16

Author(s):

Afan Hasan ◽

Oya Kalıpsız ◽

Selim Akyokuş

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Financial Market ◽

Performance Metrics ◽

Confusion Matrix ◽

Support Vector ◽

Human Beings ◽

Learning Methods ◽

Technical Indicators ◽

Machine Learning Methods

Although the vast majority of fundamental analysts believe that technical analysts’ estimates and technical indicators used in these analyses are unresponsive, recent research has revealed that both professionals and individual traders are using technical indicators. A correct estimate of the direction of the financial market is a very challenging activity, primarily due to the nonlinear nature of the financial time series. Deep learning and machine learning methods on the other hand have achieved very successful results in many different areas where human beings are challenged. In this study, technical indicators were integrated into the methods of deep learning and machine learning, and the behavior of the traders was modeled in order to increase the accuracy of forecasting of the financial market direction. A set of technical indicators has been examined based on their application in technical analysis as input features to predict the oncoming (one-period-ahead) direction of Istanbul Stock Exchange (BIST100) national index. To predict the direction of the index, Deep Neural Network (DNN), Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) classification techniques are used. The performance of these models is evaluated on the basis of various performance metrics such as confusion matrix, compound return, and max drawdown.
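
Two of the performance measures named above, compound return and max drawdown, can be sketched with their usual definitions. The abstract does not state the exact conventions used in the study, so the formulas below are assumptions, and the function names and sample returns are hypothetical.

```python
def compound_return(period_returns):
    """Total return from chaining per-period simple returns."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(period_returns):
    """Largest peak-to-trough loss of the equity curve, as a fraction
    of the peak. The curve starts at 1.0 before the first period."""
    equity = 1.0
    peak = 1.0
    worst = 0.0
    for r in period_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


if __name__ == "__main__":
    returns = [0.10, -0.20, 0.05]  # made-up per-period returns
    print(compound_return(returns))
    print(max_drawdown(returns))
```

For the sample series, the equity curve goes 1.10, 0.88, 0.924, so the compound return is about -7.6% and the max drawdown is 20% (from 1.10 down to 0.88).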
