Support vector machines for big data analysis

With the advance of information and electronic technology, especially physical information systems, cloud computing systems, and social services, big data is becoming ubiquitous, bringing benefits to people while also posing huge challenges. In the era of big data, data sets keep growing in scale. Traditional data analysis methods can no longer cope with large-scale data sets, and extracting the hidden information behind big data, especially in e-commerce, has become a key factor in competition among enterprises. We use a support vector machine method based on parallel computing to analyze the data. First, the training samples are divided into several working subsets by a self-organizing map (SOM) neural network classification method. The training results of each working set are then merged, so that massive-data prediction and analysis problems can be handled quickly. This paper argues that big data offers flexibility of expansion and a quality assessment system, so it is meaningful to use big data in place of the two-sided approach to quality assessment. Finally, given the excellent performance of parallel support vector machines in data mining and analysis, we apply this method to big data analysis in e-commerce. The results show that parallel support vector machines can handle large-scale data sets, and that for dirty-data problems the effective rate increased by at least 70%.
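The partitioning step described in the abstract above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a one-dimensional SOM, and the function names (`train_som`, `best_unit`, `partition`), the learning-rate schedule, and the neighbourhood width are all hypothetical choices. Each resulting subset could then be handed to a separate SVM trainer.

```python
import math
import random

def best_unit(units, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    return min(range(len(units)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(units[j], x)))

def train_som(samples, n_units, epochs=20, lr0=0.5, seed=0):
    """Fit a 1-D self-organizing map and return its unit weight vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise every unit from a randomly chosen sample.
    units = [list(rng.choice(samples)) for _ in range(n_units)]
    sigma0 = max(n_units / 2.0, 1.0)      # initial neighbourhood width
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood
            bmu = best_unit(units, x)
            for j, w in enumerate(units):
                # Units near the best-matching unit move further towards x.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return units

def partition(samples, labels, units):
    """Assign each labelled sample to the working subset of its closest unit."""
    subsets = [[] for _ in units]
    for x, y in zip(samples, labels):
        subsets[best_unit(units, x)].append((x, y))
    return subsets
```

Every sample lands in exactly one working subset, so the subsets can be trained independently. Some subsets may be empty or contain a single class, which a real pipeline would need to handle before training.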

Exploring Support Vector Machines for Big Data Analyses

10.1145/3494885.3494891 ◽

2021 ◽

Author(s):

Siyang Lu ◽

Yihong Chen ◽

Xiaolin Zhu ◽

Ziyi Wang ◽

Yangjun Ou ◽

...

Keyword(s):

Big Data ◽

Support Vector Machines ◽

Support Vector ◽

Data Analyses ◽

Vector Machines

Research on Parallel Support Vector Machine Based on Spark Big Data Platform

Scientific Programming ◽

10.1155/2021/7998417 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yao Huimin

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Big Data ◽

Support Vector Machines ◽

Cross Validation ◽

Machine Learning Algorithms ◽

Support Vector ◽

Lambda Architecture ◽

Vector Machines ◽

Data Platform

With the development of cloud computing and distributed cluster technology, the concept of big data has expanded in both capacity and value, and machine learning has received unprecedented attention in recent years. Traditional machine learning algorithms do not parallelize effectively, so a parallel support vector machine based on the Spark big data platform is proposed. First, the big data platform is designed with the Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Second, to improve the training efficiency of support vector machines on large-scale data, the merging of two support vector machines also considers "special points" other than the support vectors, namely the non-support vectors in one subset that violate the training result of the other subset, and a cross-validation merging algorithm is proposed. Then, a parallel support vector machine based on cross-validation is proposed, and its parallelization is implemented on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallel support vector machine performs well in speed-up ratio, training time, and prediction accuracy.
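The merging idea in the abstract above can be illustrated with a small single-process sketch. It is not the paper's Spark implementation, and several details are assumptions made for illustration: the SVM is linear and trained with a simple Pegasos-style subgradient method, a "support vector" is taken to be any point with margin at most 1, and a "violating" point is a non-support vector whose margin under the other subset's model is below 1. All function names are hypothetical.

```python
import random

def train_linear_svm(data, lam=0.01, epochs=200, seed=0):
    """Pegasos-style trainer; data is a list of (x, y) with y in {-1, +1}."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    w, b, t = [0.0] * dim, 0.0, 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x, y in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            m = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [(1.0 - eta * lam) * wi for wi in w]   # regularisation shrink
            if m < 1.0:                                # hinge loss is active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
    return w, b

def margin(model, x, y):
    """Functional margin y * (w . x + b) of a labelled point under a model."""
    w, b = model
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def support_vectors(model, data, tol=1e-3):
    """Points on or inside the margin of the model trained on their own subset."""
    return [p for p in data if margin(model, *p) <= 1.0 + tol]

def violators(model, data):
    """Points that fall inside the margin of the *other* subset's model."""
    return [p for p in data if margin(model, *p) < 1.0]

def merge_and_retrain(a, b):
    """Merge two subset models by retraining on support vectors plus violators."""
    model_a, model_b = train_linear_svm(a), train_linear_svm(b)
    keep = []
    for own, own_model, other_model in ((a, model_a, model_b),
                                        (b, model_b, model_a)):
        svs = support_vectors(own_model, own)
        non_svs = [p for p in own if p not in svs]
        keep += svs + violators(other_model, non_svs)
    if not keep:            # fallback (assumption): retrain on everything
        keep = a + b
    return train_linear_svm(keep), keep
```

The retraining set is usually much smaller than the union of the two subsets, which is where the efficiency gain would come from. In a distributed setting the two initial models would be trained on different workers.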

Combining Support Vector Machines and the t-statistic for Gene Selection in DNA Microarray Data Analysis

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-642-13672-6_6 ◽

2010 ◽

pp. 55-62

Author(s):

Tao Yang ◽

Vojislav Kecman ◽

Longbing Cao ◽

Chengqi Zhang

Keyword(s):

Data Analysis ◽

Support Vector Machines ◽

DNA Microarray ◽

Microarray Data ◽

Gene Selection ◽

Microarray Data Analysis ◽

Support Vector ◽

DNA Microarray Data ◽

Vector Machines

Big-data driven building retrofitting: An integrated Support Vector Machines and Fuzzy C-means clustering method

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/588/4/042013 ◽

2020 ◽

Vol 588 ◽

pp. 042013

Author(s):

Weizhuo Lu ◽

Kailun Feng

Keyword(s):

Big Data ◽

Support Vector Machines ◽

Data Driven ◽

Support Vector ◽

Clustering Method ◽

Fuzzy C Means ◽

Vector Machines ◽

Fuzzy C Means Clustering

Fuzzy support vector machines for biomedical data analysis

2005 IEEE International Conference on Granular Computing ◽

10.1109/grc.2005.1547251 ◽

2005 ◽

Cited By ~ 3

Author(s):

X. Chen ◽

R. Harrison ◽

Y.-Q. Zhang

Keyword(s):

Data Analysis ◽

Support Vector Machines ◽

Support Vector ◽

Biomedical Data ◽

Biomedical Data Analysis ◽

Vector Machines ◽

Fuzzy Support Vector Machines

Grapevine variety identification using “Big Data” collected with miniaturized spectrometer combined with support vector machines and convolutional neural networks

Computers and Electronics in Agriculture ◽

10.1016/j.compag.2019.104855 ◽

2019 ◽

Vol 163 ◽

pp. 104855 ◽

Cited By ~ 2

Author(s):

Armando M. Fernandes ◽

Andrei B. Utkin ◽

José Eiras-Dias ◽

Jorge Cunha ◽

José Silvestre ◽

...

Keyword(s):

Neural Networks ◽

Big Data ◽

Support Vector Machines ◽

Convolutional Neural Networks ◽

Variety Identification ◽

Support Vector ◽

Grapevine Variety ◽

Vector Machines

Acoustic Diversity Classification Using Machine Learning Techniques: Towards Automated Marine Big Data Analysis

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213020600118 ◽

2020 ◽

Vol 29 (03n04) ◽

pp. 2060011

Author(s):

Emna Hachicha Belghith ◽

François Rioult ◽

Medjber Bouzidi

Keyword(s):

Machine Learning ◽

Big Data ◽

Data Analysis ◽

Big Data Analysis ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Techniques ◽

Acoustic Diversity ◽

Marine Data

In recent years, big data has become an emerging trend that increasingly attracts the attention of the R&D community in several fields (e.g., image processing, database engineering, data mining, artificial intelligence). Marine data is one of the fields accommodating this growth, hence the appearance of the marine big data paradigm, which advocates monitoring to assess human impact on marine data. Nonetheless, support for acoustic sound classification is missing in such environments, in particular support that takes into account the diversity of such data (i.e., sounds of living undersea species, sounds of human activities, and sounds of environmental effects). To overcome this issue, we propose an approach that efficiently enables acoustic diversity classification using machine learning techniques. The aim is automated support for marine big data analysis. We have conducted a set of experiments on a real marine dataset to validate our approach and show its effectiveness and efficiency. To do so, three machine learning techniques are employed: (i) classic machine learning models (i.e., k-nearest neighbor and support vector machine), (ii) deep learning based on convolutional neural networks, and (iii) transfer learning based on the reuse of pretrained models.