Advanced Topics

Author(s):  
Jeffrey S. Racine

This chapter covers two advanced topics: a machine learning method (support vector machines useful for classification) and nonparametric kernel regression.

2020 ◽  
Author(s):  
Lewis Mervin ◽  
Avid M. Afzal ◽  
Ola Engkvist ◽  
Andreas Bender

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three calibration methods, namely Platt Scaling, Isotonic Regression and Venn-ABERS, in calibrating prediction scores for ligand-target prediction with the Naïve Bayes, Support Vector Machines and Random Forest algorithms, using bioactivity data available at AstraZeneca (40 million data points (compound-target pairs) across 2112 targets). Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.
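
As a rough illustration of one of the calibration methods named above, the following is a minimal sketch of isotonic regression using the pool-adjacent-violators procedure. It is not the authors' implementation: the function names `isotonic_calibrate` and `predict` and the toy scores and labels are assumptions made for this example, and tied scores are not pooled as a full implementation would do.

```python
def isotonic_calibrate(scores, labels):
    """Fit a non-decreasing step function from raw scores to probabilities.

    Returns a list of (upper_score, probability) steps, sorted by score.
    Hypothetical helper for illustration only.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block: [sum_of_labels, count, largest_score_in_block]
    for score, label in pairs:
        blocks.append([float(label), 1, score])
        # Pool adjacent blocks while the earlier block has the larger mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def predict(model, score):
    """Look up the calibrated probability for a new raw score."""
    for upper, prob in model:
        if score <= upper:
            return prob
    return model[-1][1]  # scores above the training range get the last step


# Toy data (invented): raw classifier scores and binary activity labels.
model = isotonic_calibrate([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 1, 1, 1])
calibrated = predict(model, 0.25)
```

Here the out-of-order labels at scores 0.2 and 0.3 are pooled into one step, so any score between 0.1 and 0.3 maps to the same calibrated probability.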


2021 ◽  
Vol 13 (1) ◽  
pp. 133
Author(s):  
Hao Sun ◽  
Yajing Cui

Downscaling microwave remotely sensed soil moisture (SM) is an effective way to obtain spatially continuous SM at fine resolution for hydrological and agricultural applications on a regional scale. Downscaling factors and functions are the two basic components of SM downscaling, and the former are particularly important in the era of big data. Using machine learning methods, this study evaluated Land Surface Temperature (LST), Land surface Evaporative Efficiency (LEE), and geographical factors from Moderate Resolution Imaging Spectroradiometer (MODIS) products for downscaling SMAP (Soil Moisture Active and Passive) SM products. The study period spans from 2015 to the end of 2018, and the study area is located in the central United States. Original SMAP SM and in-situ SM at sparse networks and core validation sites were used as references. Experimental results indicated that (1) LEE performed comparably to LST as a downscaling factor; (2) adding geographical factors can significantly improve the performance of SM downscaling; (3) integrating LST, LEE, and geographical factors gave the best performance; (4) using Z-score normalization or hyperbolic-tangent normalization did not change the above conclusions, nor did using support vector regression or feed-forward neural network methods. This study demonstrates that LEE can serve as an alternative to LST for downscaling SM when LST is unavailable due to cloud contamination. It also provides experimental evidence for adding geographical factors in the downscaling process.
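
The two normalization schemes mentioned in point (4) can be sketched as follows. This is an illustrative sketch only: the abstract does not give the exact formulas, so the hyperbolic-tangent form used here (the common tanh-estimator form with a 0.01 scale factor) is an assumption, as are the function names and the sample LST values.

```python
import math
import statistics


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-score is undefined for constant input")
    return [(v - mean) / std for v in values]


def tanh_normalize(values, scale=0.01):
    """Map values into (0, 1) with a hyperbolic-tangent squashing.

    Assumed form: 0.5 * (tanh(scale * z) + 1), where z is the z-score.
    """
    return [0.5 * (math.tanh(scale * z) + 1.0) for z in z_score(values)]


# Invented land surface temperatures in kelvin, for illustration only.
lst = [290.0, 295.0, 300.0, 305.0, 310.0]
lst_z = z_score(lst)
lst_tanh = tanh_normalize(lst)
```

Both transforms preserve the ordering of the inputs; the tanh variant additionally bounds the output, which limits the influence of extreme values.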


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yao Huimin

With the development of cloud computing and distributed cluster technology, the concept of big data has been expanded and extended in terms of capacity and value, and machine learning technology has also received unprecedented attention in recent years. Traditional machine learning algorithms cannot solve the problem of effective parallelization, so a parallelized support vector machine based on the Spark big data platform is proposed. Firstly, the big data platform is designed with the Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Secondly, in order to improve the training efficiency of support vector machines on large-scale data, when merging two support vector machines, the “special points” other than support vectors are considered, that is, the points where the nonsupport vectors in one subset violate the training results of the other subset, and a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and the parallelization process of the support vector machine is realized on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallelized support vector machine performs well in speed-up ratio, training time, and prediction accuracy.
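
One possible reading of the "special points" idea is sketched below for two linear models. The abstract does not define "violate" precisely, so this sketch assumes it means falling inside or on the wrong side of the other model's margin, that is, y * (w·x + b) < 1. The function names, the (w, b) model representation, and the toy points are all hypothetical, and no SVM training or Spark code is shown.

```python
def decision(model, x):
    """Linear decision value w.x + b for a model given as (w, b)."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def violates(model, x, y):
    """Assumed criterion: the point lies inside or beyond the model's margin."""
    return y * decision(model, x) < 1.0


def merge_training_set(sv_a, rest_a, model_a, sv_b, rest_b, model_b):
    """Collect both sets of support vectors plus cross-violating points."""
    special_a = [(x, y) for x, y in rest_a if violates(model_b, x, y)]
    special_b = [(x, y) for x, y in rest_b if violates(model_a, x, y)]
    return sv_a + sv_b + special_a + special_b


# Toy setup (invented): two sub-models trained on different subsets.
model_a = ([1.0, 0.0], 0.0)
model_b = ([0.0, 1.0], 0.0)
sv_a = [((1.0, 0.0), 1)]
sv_b = [((0.0, -1.0), -1)]
rest_a = [((3.0, 2.0), 1), ((2.0, -0.5), 1)]     # non-support vectors of subset A
rest_b = [((-2.0, -3.0), -1), ((0.5, -2.0), -1)]  # non-support vectors of subset B

merged = merge_training_set(sv_a, rest_a, model_a, sv_b, rest_b, model_b)
```

Under this reading, the merged set is what a retraining step would see: the support vectors of both sub-models plus only those remaining points that the other sub-model handles poorly.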


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0257901
Author(s):  
Yanjing Bi ◽  
Chao Li ◽  
Yannick Benezeth ◽  
Fan Yang

Phoneme pronunciation is usually considered a basic skill for learning a foreign language. Practicing pronunciation in a computer-assisted way is helpful in a self-directed or long-distance learning environment. Recent research indicates that machine learning is a promising method for building high-performance computer-assisted pronunciation training modalities. Many data-driven classification models, such as support vector machines, back-propagation networks, deep neural networks and convolutional neural networks, are increasingly widely used for this purpose. Yet the acoustic waveforms of phonemes are essentially modulated from the base vibrations of the vocal cords, and this fact makes the predictors collinear, distorting the classification models. A commonly used solution to this issue is to suppress the collinearity of the predictors via the partial least squares regression algorithm, which yields high-quality predictor weighting results via predictor relationship analysis. However, as linear regressors, classifiers of this type possess very simple topological structures, which constrains their universality. To address this issue, this paper presents a heterogeneous phoneme recognition framework that can further benefit phoneme pronunciation diagnostic tasks by combining partial least squares with support vector machines. A French phoneme data set containing 4830 samples is established for the evaluation experiments. The experiments demonstrate that the new method improves the accuracy of the phoneme classifiers by 0.21–8.47% compared with the state of the art at different training data densities.
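
To illustrate how partial least squares compresses collinear predictors, the sketch below computes only the first PLS component for a single response: the weight vector is proportional to X^T y on centered data, and the scores are the projections of the samples onto it. This is a simplified illustration, not the paper's framework; the function names and the toy data are assumptions, and how the paper couples the PLS output to the support vector machine is not described in the abstract.

```python
import math


def center_columns(X):
    """Subtract each column's mean from its entries."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def first_pls_component(X, y):
    """Return (weights, scores) of the first PLS component for one response."""
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    # Covariance direction between each predictor and the response.
    cov = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(len(Xc[0]))]
    norm = math.sqrt(sum(c * c for c in cov))
    if norm == 0:
        raise ValueError("predictors carry no covariance with the response")
    w = [c / norm for c in cov]
    t = [sum(wj * xj for wj, xj in zip(w, row)) for row in Xc]
    return w, t


# Toy data (invented): the second predictor is exactly twice the first.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [-1.0, -1.0, 1.0, 1.0]
weights, pls_scores = first_pls_component(X, y)
```

The two perfectly collinear columns collapse into a single score per sample, and a downstream classifier such as a support vector machine could be trained on such scores instead of the raw predictors.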


Author(s):  
Jian Yi

The stability of the economic market is an important factor in the rapid development of the economy, especially for listed companies, whose financial and economic stability affects the stability of the financial market. Accurate early warning of the financial condition of listed enterprises supports the healthy development of enterprises and financial markets. This paper briefly introduces the support vector machine (SVM) and back-propagation neural network (BPNN) algorithms from machine learning. To make up for the defects of the two algorithms, they were combined and applied to enterprise financial early warning. A simulation experiment was carried out in MATLAB on the single SVM-based model, the single BPNN-based model, and the combined SVM and BPNN model. The results show that the combined SVM and BPNN model converges faster and has higher precision, a higher recall rate, and a larger area under the curve (AUC) than either the single SVM-based model or the single BPNN-based model.
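
The evaluation metrics named above (precision, recall, and AUC) can be computed as in the following minimal sketch. It does not reproduce the paper's models or data: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration. The AUC is computed by pairwise comparison of positive and negative scores, with ties counted as half.

```python
def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): 1 marks a company that later showed financial distress.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
precision, recall = precision_recall(labels, scores)
area = auc(labels, scores)
```

Precision and recall depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.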

