A Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Ligand-Target Predictions

10.26434/chemrxiv.11526132.v2 ◽

2020 ◽

Author(s):

Lewis Mervin ◽

Avid M. Afzal ◽

Ola Engkvist ◽

Andreas Bender

Keyword(s):

Target Prediction ◽

Support Vector ◽

Machine Learning Method ◽

Learning Method ◽

Protein Target ◽

Bioactivity Prediction ◽

Vector Machines ◽

Scaling Methods ◽

Data Points ◽

Compound Target

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three such methods, namely Platt Scaling, Isotonic Regression and Venn-ABERS, in calibrating prediction scores for ligand-target prediction models built with the Naïve Bayes, Support Vector Machines and Random Forest algorithms, using bioactivity data available at AstraZeneca (40 million data points (compound-target pairs) across 2112 targets). Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.
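
As a rough illustration of one of the calibration methods named in this abstract, the sketch below fits an isotonic (non-decreasing) mapping from raw classifier scores to probabilities with the pool-adjacent-violators procedure. It is a minimal sketch and not the authors' implementation: the function names `isotonic_fit` and `isotonic_predict`, the step-function prediction rule, and the toy scores and labels are all assumptions made for illustration.

```python
def isotonic_fit(scores, labels):
    """Fit a non-decreasing step function from scores to probabilities.

    Hypothetical helper: pool-adjacent-violators on (score, label) pairs.
    """
    pairs = sorted(zip(scores, labels))
    # Each block holds [sum of labels, count, largest score in the block].
    blocks = []
    for score, label in pairs:
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def isotonic_predict(model, score):
    """Return the calibrated probability for one raw score."""
    for upper, prob in model:
        if score <= upper:
            return prob
    # Scores above the largest training score get the last block's value.
    return model[-1][1]


# Toy data (assumed, not from the study): raw scores and active/inactive labels.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
toy_labels = [0, 1, 0, 1, 1, 1]
toy_model = isotonic_fit(toy_scores, toy_labels)
```

Here the out-of-order labels at scores 0.2 and 0.3 are pooled into one block with mean 0.5, so `isotonic_predict(toy_model, 0.25)` yields 0.5. In practice the mapping would be fitted on held-out data rather than on the data used to train the classifier.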

A Comparison of Scaling Methods to Obtain Calibrated Probabilities of Activity for Ligand-Target Predictions

10.26434/chemrxiv.11526132.v1 ◽

2020 ◽

Author(s):

Lewis Mervin ◽

Avid M. Afzal ◽

Ola Engkvist ◽

Andreas Bender

Keyword(s):

Target Prediction ◽

Support Vector ◽

Machine Learning Method ◽

Learning Method ◽

Protein Target ◽

Bioactivity Prediction ◽

Vector Machines ◽

Scaling Methods ◽

Data Points ◽

Compound Target

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three such methods, namely Platt Scaling, Isotonic Regression and Venn-ABERS, in calibrating prediction scores for ligand-target prediction models built with the Naïve Bayes, Support Vector Machines and Random Forest algorithms, using bioactivity data available at AstraZeneca (40 million data points (compound-target pairs) across 2112 targets). Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.

Advanced Topics

Reproducible Econometrics Using R ◽

10.1093/oso/9780190900663.003.0007 ◽

2019 ◽

pp. 210-226

Author(s):

Jeffrey S. Racine

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Kernel Regression ◽

Support Vector ◽

Machine Learning Method ◽

Learning Method ◽

Vector Machines ◽

Nonparametric Kernel ◽

Nonparametric Kernel Regression

This chapter covers two advanced topics: a machine learning method (support vector machines useful for classification) and nonparametric kernel regression.

Evaluating Downscaling Factors of Microwave Satellite Soil Moisture Based on Machine Learning Method

Remote Sensing ◽

10.3390/rs13010133 ◽

2021 ◽

Vol 13 (1) ◽

pp. 133

Author(s):

Hao Sun ◽

Yajing Cui

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Land Surface ◽

Regional Scale ◽

Support Vector ◽

Machine Learning Method ◽

Learning Method ◽

Feed Forward Neural Network ◽

Comparative Performance ◽

Geographical Factors

Downscaling microwave remotely sensed soil moisture (SM) is an effective way to obtain spatially continuous SM at fine resolution for hydrological and agricultural applications on a regional scale. Downscaling factors and functions are the two basic components of SM downscaling, and the former is particularly important in the era of big data. Using a machine learning method, this study evaluated Land Surface Temperature (LST), Land surface Evaporative Efficiency (LEE), and geographical factors from Moderate Resolution Imaging Spectroradiometer (MODIS) products for downscaling SMAP (Soil Moisture Active and Passive) SM products. The study covers 2015 through the end of 2018 and is located in the central United States. Original SMAP SM and in-situ SM at sparse networks and core validation sites were used as reference. Experimental results indicated that (1) LEE performed comparably to LST as a downscaling factor; (2) adding geographical factors can significantly improve the performance of SM downscaling; (3) integrating LST, LEE, and geographical factors achieved the best performance; (4) using Z-score normalization or hyperbolic-tangent normalization did not change the above conclusions, nor did using support vector regression or feed-forward neural network methods. This study demonstrates that LEE can serve as an alternative to LST for downscaling SM when no LST is available due to cloud contamination. It also provides experimental evidence for adding geographical factors to the downscaling process.
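
The abstract names two normalization schemes without giving formulas. The sketch below shows one common reading of each: the standard Z-score, and a tanh-estimator style mapping into the interval (0, 1). The helper names, the `scale=0.01` constant, and the sample temperature values are assumptions for illustration and may differ from what the study used.

```python
import math
import statistics


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def tanh_normalize(values, scale=0.01):
    """Squash Z-scores into (0, 1) with a hyperbolic tangent.

    The 0.01 scale follows the commonly cited tanh-estimator form;
    it is an assumption here, not a value taken from the study.
    """
    return [0.5 * (math.tanh(scale * z) + 1.0) for z in z_score(values)]


# Hypothetical land surface temperatures in kelvin.
sample_lst = [290.0, 295.0, 300.0]
sample_z = z_score(sample_lst)
sample_tanh = tanh_normalize(sample_lst)
```

Both transforms preserve the ordering of the inputs, which is consistent with the reported finding that the choice between them did not change the conclusions.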

A Learning Method for Robust Support Vector Machines

Advances in Neural Networks – ISNN 2004 - Lecture Notes in Computer Science ◽

10.1007/978-3-540-28647-9_79 ◽

2004 ◽

pp. 474-479

Author(s):

Jun Guo ◽

Norikazu Takahashi ◽

Tetsuo Nishi

Keyword(s):

Support Vector Machines ◽

Support Vector ◽

Learning Method ◽

Vector Machines

Research on enterprise financial economics early warning based on machine learning method

Journal of Computational Methods in Sciences and Engineering ◽

10.3233/jcm-215783 ◽

2021 ◽

pp. 1-11

Author(s):

Jian Yi

Keyword(s):

Machine Learning ◽

Early Warning ◽

Financial Economics ◽

Rapid Development ◽

Support Vector ◽

Machine Learning Method ◽

Learning Method ◽

Combined Model ◽

Svm Algorithm ◽

The Stability

The stability of the economic market is an important factor in the rapid development of the economy, especially for listed companies, whose financial and economic stability affects the stability of the financial market. Accurate early warning of the financial economy of listed enterprises supports the healthy development of both enterprises and financial markets. This paper briefly introduces the support vector machine (SVM) and back-propagation neural network (BPNN) machine learning algorithms. To compensate for the shortcomings of each algorithm, the two were combined and applied to enterprise financial economics early warning. A simulation experiment was carried out in MATLAB on a single SVM-based model, a single BPNN-based model, and the combined SVM-BPNN model. The results show that the combined SVM-BPNN model converges faster and achieves higher precision, recall, and area under the curve (AUC) than either single-algorithm model.

Landslide Susceptibility Mapping Using the Stacking Ensemble Machine Learning Method in Lushui, Southwest China

Applied Sciences ◽

10.3390/app10114016 ◽

2020 ◽

Vol 10 (11) ◽

pp. 4016 ◽

Cited By ~ 3

Author(s):

Xudong Hu ◽

Han Zhang ◽

Hongbo Mei ◽

Dunhui Xiao ◽

Yuanyuan Li ◽

...

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Southwest China ◽

Susceptibility Mapping ◽

Landslide Susceptibility Mapping ◽

Support Vector ◽

Machine Learning Method ◽

Learning Method ◽

Statistical Measures ◽

Ensemble Machine Learning

Landslide susceptibility mapping is considered a prerequisite for landslide prevention and mitigation. However, delineating the spatial occurrence pattern of landslides remains a challenge. This study investigates the potential application of the stacking ensemble learning technique for landslide susceptibility assessment. In particular, support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), and naive Bayes (NB) were selected as base learners for the stacking ensemble method. A resampling scheme and Pearson’s correlation analysis were jointly used to evaluate the importance of these base learners. A total of 388 landslides and 12 conditioning factors in the Lushui area (Southwest China) were used as the dataset for landslide modeling. The landslides were randomly separated into two parts, with 70% used for model training and 30% used for model validation. The models’ performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC) and statistical measures. The results showed that the stacking-based ensemble model achieved improved predictive accuracy compared to the single algorithms, while the SVM-ANN-NB-LR (SANL) model, the SVM-ANN-NB (SAN) model, and the ANN-NB-LR (ANL) model performed equally well, with AUC values of 0.931, 0.940, and 0.932, respectively, for the validation stage. The correlation coefficient between LR and SVM was the highest for all resampling rounds, with a value of 0.72 on average. This indicates that LR and SVM played an almost equal role when the SANL ensemble was applied for landslide susceptibility analysis. Therefore, it is feasible to use the SAN model or the ANL model for the study area. The findings of this study suggest that the stacking ensemble machine learning method is promising for landslide susceptibility mapping in the Lushui area and is capable of targeting areas prone to landslides.

PSO-Based Support Vector Machine with Cuckoo Search Technique for Clinical Disease Diagnoses

The Scientific World JOURNAL ◽

10.1155/2014/548483 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 19

Author(s):

Xiaoyong Liu ◽

Hui Fu

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Cuckoo Search ◽

Disease Diagnosis ◽

Support Vector ◽

Machine Learning Method ◽

Learning Method ◽

Svm Model ◽

Two Stages ◽

Best Parameters

Disease diagnosis is conducted with a machine learning method. We propose a novel machine learning method that hybridizes support vector machine (SVM), particle swarm optimization (PSO), and cuckoo search (CS). The new method consists of two stages: first, a CS-based approach for SVM parameter optimization is developed to find better initial parameters of the kernel function; then PSO is applied to continue SVM training and find the best SVM parameters. Experimental results indicate that the proposed CS-PSO-SVM model achieves better classification accuracy and F-measure than PSO-SVM and GA-SVM. Therefore, we conclude that the proposed method is very efficient compared to previously reported algorithms.

Characterization and identification of lysine crotonylation sites based on machine learning method on both plant and mammalian

Scientific Reports ◽

10.1038/s41598-020-77173-0 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Rulan Wang ◽

Zhuo Wang ◽

Hongfei Wang ◽

Yuxuan Pang ◽

Tzong-Yi Lee

Keyword(s):

Machine Learning ◽

Predictive Performance ◽

Support Vector ◽

Machine Learning Method ◽

Learning Method ◽

Post Translational Modification ◽

Cellular Regulation ◽

Histone Protein ◽

Independent Test ◽

Small Dataset

Lysine crotonylation (Kcr) is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular regulation processes. Several methods have been proposed for the identification of crotonylation. However, most of these methods predict efficiently only on either histone or non-histone proteins. This work therefore aims to give more balanced performance across species, here covering plant (non-histone) and mammalian (histone) proteins. SVM (support vector machine) and RF (random forest) were employed in this study. According to the cross-validation results, the RF classifier based on the EGAAC attribute achieved the best predictive performance, performing competitively with existing methods while being more robust on imbalanced datasets. Moreover, an independent test was carried out, which compared the performance of this study with existing methods based on the same features or the same classifier. The SVM and RF classifiers could achieve best performances with 92% sensitivity, 88% specificity, 90% accuracy, and an MCC of 0.80 in the mammalian dataset, and 77% sensitivity, 83% specificity, 70% accuracy and 0.54 MCC in a relatively small mammalian dataset and a large-scale plant dataset, respectively. A cross-species independent test was also carried out, which demonstrated the species diversity between plant and mammalian data.
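
For readers unfamiliar with the four measures reported in this abstract, the sketch below computes sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC) from a binary confusion matrix. The function name `binary_metrics` and the counts are hypothetical: the counts assume a balanced set of 100 positives and 100 negatives and are not the study's data.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy, and MCC for binary counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # MCC denominator is the root of the product of the four marginal totals.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts: 100 positive and 100 negative sites.
example = binary_metrics(tp=92, fn=8, tn=88, fp=12)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when datasets are imbalanced.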

An Improved Training Algorithm of Support Vector Machines Based on Three Data Points Iteration

2008 International Conference on Computer Science and Information Technology ◽

10.1109/iccsit.2008.91 ◽

2008 ◽

Cited By ~ 1

Author(s):

Li Cunhe ◽

Liu Kangwei ◽

Zhu Lina

Keyword(s):

Support Vector Machines ◽

Support Vector ◽

Training Algorithm ◽

Vector Machines ◽

Data Points
