Research on Parallel Support Vector Machine Based on Spark Big Data Platform

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yao Huimin

With the development of cloud computing and distributed cluster technology, the concept of big data has been expanded and extended in terms of capacity and value, and machine learning has received unprecedented attention in recent years. Traditional machine learning algorithms cannot be parallelized effectively, so a parallelized support vector machine based on the Spark big data platform is proposed. Firstly, the big data platform is designed with the Lambda architecture, which is divided into three layers: the Batch Layer, the Serving Layer, and the Speed Layer. Secondly, to improve the training efficiency of support vector machines on large-scale data, the merging of two support vector machines considers not only the support vectors but also the “special points”: points where the non-support vectors of one subset violate the training result of the other subset. On this basis, a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and its parallelization is realized on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. The results show that the proposed parallelized support vector machine performs well in terms of speed-up ratio, training time, and prediction accuracy.
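The subset-merging idea can be sketched in plain scikit-learn as a minimal single-machine stand-in for the Spark implementation; the dataset, partition sizes, and function names below are all invented for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data standing in for two partitions of a Spark RDD (hypothetical setup).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X1, y1, X2, y2 = X[:200], y[:200], X[200:], y[200:]

def train(X, y):
    return SVC(kernel="linear").fit(X, y)

clf1, clf2 = train(X1, y1), train(X2, y2)

def merge_with_special_points(clf_a, X_a, y_a, clf_b):
    # Keep clf_a's support vectors plus the "special points": non-support
    # vectors of subset A that violate the training result of subset B.
    sv_mask = np.zeros(len(X_a), dtype=bool)
    sv_mask[clf_a.support_] = True
    violates = clf_b.predict(X_a) != y_a
    keep = sv_mask | violates
    return X_a[keep], y_a[keep]

Xm1, ym1 = merge_with_special_points(clf1, X1, y1, clf2)
Xm2, ym2 = merge_with_special_points(clf2, X2, y2, clf1)

# Retrain on the much smaller merged set of retained points.
merged = train(np.vstack([Xm1, Xm2]), np.concatenate([ym1, ym2]))
print(round(merged.score(X, y), 2))
```

On Spark, each `train` call would run inside a partition and the merge would happen in a reduce step; this sketch only shows the point-selection logic.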

2019 ◽  
Vol 9 (16) ◽  
pp. 3322 ◽  
Author(s):  
Stephen Dankwa ◽  
Wenfeng Zheng

Machine learning (ML) is the technology that allows a computer system to learn from its environment through iterative processes and improve itself from experience. Machine learning has recently gained massive attention across numerous fields and makes it possible to model data extremely well without relying on strong assumptions about the modeled system; its rise has improved the description of data by providing both engineering solutions and an important benchmark. In this research work, we applied three machine learning algorithms, the Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN), to predict kyphosis disease from biomedical data. At the initial stage of the experiments, we performed 5- and 10-Fold Cross-Validation using Logistic Regression as a baseline model, without grid search, to compare against our ML models. We then evaluated the models and compared their performance based on 5- and 10-Fold Cross-Validation after running grid search on the ML models. For the Support Vector Machine, we experimented with three kernels: Linear, Radial Basis Function (RBF), and Polynomial. After grid search, we observed overall model accuracies between 79% and 85% based on 5-Fold Cross-Validation, and between 77% and 86% based on 10-Fold Cross-Validation. The RF, SVM-RBF, and ANN models each achieved accuracies above 80% under both evaluation schemes, and all three outperformed the baseline model based on 10-Fold Cross-Validation with grid search. Overall, the ANN model outperformed all the other ML models, achieving accuracies of 85.19% and 86.42% based on 5- and 10-Fold Cross-Validation, respectively. We propose that RF, SVM-RBF, and ANN models be used to detect and predict kyphosis disease after a patient has undergone surgery.
We suggest that machine learning be adopted as an essential and critical tool across the full spectrum of biomedical questions.
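The grid-search-plus-cross-validation protocol described above can be sketched with scikit-learn; since the kyphosis dataset is not bundled here, a synthetic stand-in with three features (mirroring the real dataset's Age, Number, and Start columns, an assumption on our part) is used:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the kyphosis dataset (hypothetical substitute).
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=1)

# Grid search over the RBF kernel's C and gamma, scored by 5-fold CV.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Swapping `cv=5` for `cv=10` reproduces the paper's second evaluation scheme; the candidate grids here are invented, not those of the authors.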


2020 ◽  
Vol 22 (26) ◽  
pp. 14976-14982
Author(s):  
Anthony Tabet ◽  
Thomas Gebhart ◽  
Guanglu Wu ◽  
Charlie Readman ◽  
Merrick Pierson Smela ◽  
...  

We evaluate the ability of support-vector machines to predict the equilibrium binding constant of small molecules to cucurbit[7]uril.
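Predicting a binding constant is a regression task, so support-vector regression (SVR) is the natural formulation; the sketch below uses invented molecular descriptors and a synthetic target standing in for log K:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical descriptors -> binding-constant regression (invented data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                       # 5 fake descriptors
y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2]) + rng.normal(scale=0.1, size=100)

# Standardize descriptors, then fit an RBF support-vector regressor.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10)).fit(X, y)
```

Real applications would replace the random matrix with computed chemical descriptors of the guest molecules.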


2011 ◽  
Vol 230-232 ◽  
pp. 625-628
Author(s):  
Lei Shi ◽  
Xin Ming Ma ◽  
Xiao Hong Hu

E-business has grown rapidly in the last decade, generating massive amounts of data on customer purchases, browsing patterns, and preferences. Classification of electronic data plays a pivotal role in mining this valuable information and has thus become one of the most important applications in E-business. Support vector machines are popular and powerful machine learning techniques that offer state-of-the-art performance. Rough set theory is a formal mathematical tool for dealing with incomplete or imprecise information, and one of its important applications is feature selection. In this paper, rough set theory and support vector machines are combined to construct a classification model that classifies E-business data effectively.


2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Nalindren Naicker ◽  
Timothy Adeliyi ◽  
Jeanette Wing

Educational Data Mining (EDM) is a rich research field in computer science. Tools and techniques in EDM are useful for predicting student performance, giving practitioners insights for developing intervention strategies that improve pass rates and increase retention. The performance of state-of-the-art machine learning classifiers is very much dependent on the task at hand. Support vector machines have been used extensively in classification problems; however, the extant literature shows a gap in the application of linear support vector machines as predictors of student performance. The aim of this study was to compare the performance of linear support vector machines with that of state-of-the-art classical machine learning algorithms in order to determine which algorithm best improves the prediction of student performance. In this quantitative study, an experimental research design was used. Experiments were set up using feature selection on a publicly available dataset of 1000 alphanumeric student records. Linear support vector machines, benchmarked against ten classical machine learning algorithms, showed superior performance in predicting student performance. The results of this research showed that features like race, gender, and lunch influence performance in mathematics, whilst access to lunch was the primary factor influencing reading and writing performance.
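A linear-SVM-versus-baseline comparison of the kind described can be sketched with scikit-learn; the 1000-record dataset is replaced by a synthetic stand-in, and logistic regression serves as the baseline (an assumption, since the paper does not name its ten benchmarks here):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 1000 alphanumeric student records (hypothetical).
X, y = make_classification(n_samples=1000, n_features=8, random_state=3)

# 10-fold cross-validated accuracy for the linear SVM and a baseline.
svm_acc = cross_val_score(LinearSVC(max_iter=5000), X, y, cv=10).mean()
base_acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10).mean()
print(round(svm_acc, 3), round(base_acc, 3))
```

On the real data, the categorical features (race, gender, lunch) would first need encoding, e.g. with `OneHotEncoder`.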


2010 ◽  
Vol 07 (01) ◽  
pp. 59-80
Author(s):  
D. CHENG ◽  
S. Q. XIE ◽  
E. HÄMMERLE

Local descriptor matching is the most overlooked of the three stages of the local descriptor process, and this paper proposes a new method for matching local descriptors based on support vector machines. Experimental results show that the developed method is more robust for matching local descriptors across all image transformations considered. The method can be integrated with different local descriptor methods and with different machine learning algorithms, showing that the approach is robust and versatile.
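One way to cast descriptor matching as SVM classification (a hypothetical sketch, not the paper's exact formulation) is to feed the element-wise difference of a descriptor pair to a binary classifier that labels the pair as a true or false correspondence:

```python
import numpy as np
from sklearn.svm import SVC

# Invented 16-dimensional descriptors; real ones would come from SIFT etc.
rng = np.random.default_rng(0)
d = rng.normal(size=(200, 16))                                # reference set
pos = np.abs(d - (d + rng.normal(scale=0.05, size=d.shape)))  # matching pairs
neg = np.abs(d - rng.normal(size=d.shape))                    # random pairs

X = np.vstack([pos, neg])
y = np.array([1] * 200 + [0] * 200)   # 1 = correspondence, 0 = mismatch

# An RBF SVM learns the boundary between small and large difference vectors.
clf = SVC(kernel="rbf").fit(X, y)
```

The same pipeline works unchanged if the difference features come from another descriptor, which mirrors the versatility claim above.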


2021 ◽  
Author(s):  
Igor Miranda ◽  
Gildeberto Cardoso ◽  
Madhurananda Pahar ◽  
Gabriel Oliveira ◽  
Thomas Niesler

Predicting the need for hospitalization due to COVID-19 may help patients seek timely treatment and assist health professionals in monitoring cases and allocating resources. We investigate the use of machine learning algorithms to predict the risk of hospitalization due to COVID-19 from the patient's medical history and self-reported symptoms, regardless of the period in which they occurred. Three datasets containing information on 217,580 patients from three different states in Brazil were used. Decision trees, neural networks, and support vector machines were evaluated, achieving accuracies between 79.1% and 84.7%. Our analysis shows that better performance is achieved in Brazilian states ranked more highly on the official human development index (HDI), suggesting that health facilities with better infrastructure generate less noisy data. One of the models developed in this study has been incorporated into a mobile app that is available for public use.
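The three-model comparison can be sketched in scikit-learn; the symptom features and all hyperparameters below are invented stand-ins, not those of the study:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for self-reported symptoms and history (hypothetical).
X, y = make_classification(n_samples=1000, n_features=12, random_state=4)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=4)

# Train and score the three model families evaluated in the paper.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=4),
    "neural_net": MLPClassifier(max_iter=1000, random_state=4),
    "svm": SVC(kernel="rbf"),
}
scores = {name: m.fit(Xtr, ytr).score(Xte, yte) for name, m in models.items()}
print(scores)
```

On the real data, per-state training would reveal the HDI-related accuracy differences the abstract reports.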


Author(s):  
A.L. Kulikov ◽  
D.I. Bezdushnyi

The development of present-day power systems is associated with the wide use of digital technologies and intelligent algorithms in control and protection systems. This opens up new opportunities to improve relay protection and automation hardware and to develop its design principles. Simulation modeling is becoming a new tool not only for studying power system operation but also for designing new relay protection methods. The use of simulation modeling in combination with machine learning algorithms makes it possible to create fundamentally new types of digital relay protection that are adaptable to a specific protected facility and able to use all available current and voltage measurements to the fullest extent. Machine learning also allows the development of auxiliary selective elements that improve the basic characteristics of existing relay protection algorithms, such as selectivity, sensitivity, and speed of operation. The paper considers an example of designing an auxiliary element that provides selectivity in the backup zone of distance protection. The problem is solved using one of the most widely known machine learning techniques, the method of support vector machines (SVM).
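Such an auxiliary selective element can be sketched as a two-class SVM over simulated measurement features; the two features (standing in for apparent impedance R and X), the cluster locations, and all values below are invented for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical simulated fault cases: two measurement-derived features per
# case, one cluster inside the backup zone and one outside (invented values).
rng = np.random.default_rng(5)
in_zone = rng.normal(loc=[2.0, 6.0], scale=0.5, size=(150, 2))
out_zone = rng.normal(loc=[5.0, 12.0], scale=0.5, size=(150, 2))
X = np.vstack([in_zone, out_zone])
y = np.array([1] * 150 + [0] * 150)   # 1 = fault in backup zone

# The trained SVM acts as the auxiliary selective element.
element = SVC(kernel="rbf").fit(X, y)
print(element.predict([[2.1, 6.2]]))  # point near the in-zone cluster
```

In the paper's workflow, the training cases would come from simulation modeling of the protected facility rather than random draws.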


2020 ◽  
Author(s):  
Mauricio Alberto Ortega-Ruíz ◽  
Cefa Karabağ ◽  
Victor García Garduño ◽  
Constantino Carlos Reyes-Aldasoro

This paper describes a method for estimating residual tumour cellularity (TC) after neoadjuvant treatment (NAT) of advanced breast cancer. TC is currently determined manually by visual inspection by a radiologist, so an automated computation would reduce workload and increase precision and accuracy. TC is estimated as the ratio of tumour area to total image area after NAT. The proposed method computes TC using machine learning techniques trained on morphological parameters of segmented nuclei in order to classify regions of the image as tumour or normal. The data were provided by the 2019 SPIE Breast challenge, which was organized to encourage the development of automated TC computation algorithms. Three algorithms were implemented: Support Vector Machines, Nearest K-means, and Adaptive Boosting (AdaBoost) decision trees. Performance was compared and evaluated based on accuracy, and the best result was obtained with Support Vector Machines. Results obtained by the implemented methods were submitted during the ongoing challenge, with a maximum prediction probability of success of 0.76.
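The TC ratio itself is a one-line computation once regions are classified; a minimal sketch, assuming a binary mask where 1 marks pixels the classifier labeled as tumour (the classification step, fed by nucleus morphology features, is not shown):

```python
import numpy as np

# Hypothetical 100x100 image with a classified tumour region (invented).
mask = np.zeros((100, 100), dtype=int)
mask[20:60, 30:70] = 1                 # 40x40 block labeled as tumour

# TC = tumour area / total image area.
tc = mask.sum() / mask.size
print(tc)                              # 40*40 / 10000 = 0.16
```

In the full pipeline, `mask` would be produced by the SVM applied to morphological parameters of the segmented nuclei.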


Author(s):  
Goutham Cheedella

Handwritten digit recognition is one of the most interesting problems in the field of science and technology, as it is a hard task for machines to recognize digits written by different people. Handwritten digits are often imperfect and come in many different styles, yet their recognition is needed for many real-time purposes. The widely used MNIST dataset contains 60,000 handwritten digit images for training, and many machine learning algorithms are used to classify such images. This paper presents an in-depth analysis of the accuracy and performance of Support Vector Machine (SVM), Neural Network (NN), and Decision Tree (DT) algorithms using Microsoft Azure ML Studio.
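A local sketch of the same comparison, using scikit-learn's small 8x8 digits set as a stand-in for MNIST (and scikit-learn in place of Azure ML Studio):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# The bundled digits set: 8x8 images, a small stand-in for MNIST.
X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Held-out accuracy of an RBF SVM versus a decision tree.
svm_acc = SVC(kernel="rbf").fit(Xtr, ytr).score(Xte, yte)
dt_acc = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr).score(Xte, yte)
print(round(svm_acc, 3), round(dt_acc, 3))
```

On this small set the SVM typically outperforms the tree, which matches the general pattern such comparisons report; exact MNIST figures would require the full dataset.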

