Large-Scale Data Classification Based on Ball Vector Machine

2013 ◽  
Vol 312 ◽  
pp. 771-776
Author(s):  
Min Juan Zheng ◽  
Guo Jian Cheng ◽  
Fei Zhao

The quadratic programming problem in the standard support vector machine (SVM) algorithm has high time and space complexity on large-scale problems, which becomes a bottleneck in SVM applications. The Ball Vector Machine (BVM) converts the quadratic programming problem of the traditional SVM into a minimum enclosing ball (MEB) problem. The solution of the quadratic program can then be obtained indirectly by solving the MEB problem, which significantly reduces both time and space complexity. Experiments on five large-scale, high-dimensional data sets show that the BVM achieves accuracy comparable to the standard SVM while training faster and requiring less memory.
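The MEB reduction the abstract describes can be illustrated with the simple Bădoiu–Clarkson iteration for an approximate minimum enclosing ball; this is a generic MEB sketch, not the BVM training procedure itself, and all names and parameters here are illustrative:

```python
import numpy as np

def approx_meb(points, iterations=500):
    """Approximate the minimum enclosing ball of a point set with the
    Badoiu-Clarkson iteration: repeatedly move the current center a
    shrinking step toward the farthest point."""
    c = points[0].astype(float)
    for i in range(1, iterations + 1):
        far = points[np.argmax(np.linalg.norm(points - c, axis=1))]
        c += (far - c) / (i + 1)          # step size shrinks as 1/(i+1)
    r = np.linalg.norm(points - c, axis=1).max()
    return c, r

pts = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]])
center, radius = approx_meb(pts)          # exact MEB is center (1, 0), radius 1
```

Each iteration touches every point once, so the cost per step is linear in the number of points, which is the source of the time and space savings the abstract claims over a quadratic program.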

2020 ◽  
Vol 10 (19) ◽  
pp. 6979
Author(s):  
Minho Ryu ◽  
Kichun Lee

Support vector machines (SVMs) are well-known classifiers due to their superior classification performance. An SVM is defined by a hyperplane that separates two classes with the largest margin. Computing the hyperplane, however, requires solving a quadratic programming problem, whose storage cost grows with the square of the number of training sample points and whose time complexity is, in general, proportional to the cube of that number. It is therefore worth studying how to reduce the training time of SVMs without compromising performance, so that large-scale SVM problems remain tractable. In this paper, we propose a novel data reduction method that shortens training time by combining decision trees with relative support distance. We apply this new concept, relative support distance, to select good support vector candidates in each partition generated by the decision trees. The selected support vector candidates improve the training speed for large-scale SVM problems. In experiments, we demonstrate that our approach significantly reduces training time while maintaining good classification performance compared with existing approaches.
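The partition-then-select idea can be sketched as follows. The abstract does not define relative support distance, so the selection criterion below (distance to the nearest opposite-class point within each leaf) is a rough stand-in, and all function names and parameters are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

def reduce_training_set(X, y, max_leaf_nodes=8, keep_per_leaf=20):
    """Partition the data with a shallow decision tree, then keep only
    the points in each leaf closest to the other class -- a stand-in for
    the paper's 'relative support distance' criterion."""
    tree = DecisionTreeClassifier(max_leaf_nodes=max_leaf_nodes,
                                  random_state=0).fit(X, y)
    leaves = tree.apply(X)                 # leaf index of every sample
    keep = []
    for leaf in np.unique(leaves):
        idx = np.where(leaves == leaf)[0]
        Xl, yl = X[idx], y[idx]
        if len(np.unique(yl)) < 2:         # pure leaf: keep a small sample
            keep.extend(idx[:keep_per_leaf])
            continue
        # distance of each point to its nearest opposite-class neighbor
        d = np.array([np.linalg.norm(Xl[yl != yl[i]] - Xl[i], axis=1).min()
                      for i in range(len(idx))])
        keep.extend(idx[np.argsort(d)[:keep_per_leaf]])
    return np.array(keep)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
sel = reduce_training_set(X, y)            # far fewer than 2000 points
svm = SVC(kernel="linear").fit(X[sel], y[sel])
```

The SVM then trains only on the retained candidates, which is where the cubic-time saving comes from: points deep inside a class region rarely become support vectors and can be discarded up front.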


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yixue Zhu ◽  
Boyue Chai

With the development of increasingly advanced information and electronic technology, particularly physical information systems, cloud computing systems, and social services, big data is becoming ubiquitous, creating benefits for people while also posing huge challenges. Moreover, with the advent of the big data era, data sets keep growing in scale, and traditional data analysis methods can no longer handle them; mining the hidden information behind big data, especially in the field of e-commerce, has become a key factor in competition among enterprises. We analyze such data with a support vector machine method based on parallel computing. First, the training samples are divided into several working subsets using the SOM self-organizing neural network; the training results of the working subsets are then merged, so that massive data prediction and analysis problems can be handled quickly. This paper argues that big data offers scalability and supports a quality-assessment system, so using big data to offset the one-sidedness of quality assessment is meaningful. Finally, given the excellent performance of parallel support vector machines in data mining and analysis, we apply this method to big data analysis in e-commerce. The research results show that parallel support vector machines can handle large-scale data sets; even in the presence of dirty data, the effective rate is at least 70%.
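The partition-train-merge scheme can be sketched in a cascade style. KMeans stands in below for the SOM network the abstract names, and the merge step (pool the support vectors of each working subset and retrain once) is one common choice, not necessarily the paper's; all names and parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def partitioned_svm(X, y, n_parts=4):
    """Cluster the data into working subsets (KMeans as a stand-in for
    SOM), train an SVM on each subset independently (these fits could
    run in parallel), then merge the support vectors and retrain once."""
    parts = KMeans(n_clusters=n_parts, n_init=10,
                   random_state=0).fit_predict(X)
    sv_idx = []
    for p in range(n_parts):
        idx = np.where(parts == p)[0]
        if len(np.unique(y[idx])) < 2:     # one-class subset: keep as-is
            sv_idx.extend(idx)
            continue
        sub = SVC(kernel="linear").fit(X[idx], y[idx])
        sv_idx.extend(idx[sub.support_])   # support vectors of this subset
    sv_idx = np.array(sv_idx)
    return SVC(kernel="linear").fit(X[sv_idx], y[sv_idx])

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)
model = partitioned_svm(X, y)
```

Because each subset problem is small, the per-subset quadratic programs are cheap, and only the much smaller merged set is solved at full precision.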


1989 ◽  
Vol 21 (1) ◽  
pp. 99-114 ◽  
Author(s):  
A Nagurney ◽  
Referee H K Chen

In this paper a quadratic programming problem is considered. It contains, as special cases, formulations of constrained matrix problems with unknown row and column totals, and classical spatial price equilibrium problems with congestion. An equilibration algorithm of the relaxation type is introduced for the problem. It decomposes the system into subproblems which, in turn, can be solved exactly, even in the presence of upper bounds. Computational experience with several large-scale examples is also provided. This work establishes the equivalence between constrained matrix problems and spatial price equilibrium problems, which had been postulated but heretofore not demonstrated.
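The flavor of a relaxation-type equilibration scheme, where each subproblem is solved exactly even under upper bounds, can be shown on a small box-constrained quadratic program. This is a generic cyclic coordinate relaxation, not the paper's algorithm; the problem data are illustrative:

```python
import numpy as np

def box_qp_relaxation(Q, c, lo, hi, sweeps=200):
    """Relaxation (cyclic coordinate descent) for
        min 0.5*x'Qx - c'x   subject to   lo <= x <= hi.
    Each one-dimensional subproblem is solved in closed form and then
    clipped to its bounds, so every subproblem is solved exactly even
    in the presence of upper bounds."""
    n = len(c)
    x = np.clip(np.zeros(n), lo, hi)
    for _ in range(sweeps):
        for i in range(n):
            # exact minimizer over x[i] with the other coordinates fixed:
            # Q[i,i]*x[i] = c[i] - sum_{j != i} Q[i,j]*x[j]
            r = c[i] - Q[i] @ x + Q[i, i] * x[i]
            x[i] = np.clip(r / Q[i, i], lo[i], hi[i])
    return x

Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite
c = np.array([1.0, 1.0])
x = box_qp_relaxation(Q, c, lo=np.zeros(2), hi=np.ones(2))
```

For this data the unconstrained optimum already lies inside the box, so the relaxation converges to the solution of Qx = c; when a bound is active, the clipping step keeps each subproblem solution exact.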


2021 ◽  
Vol 25 (2) ◽  
pp. 265-281
Author(s):  
Junyou Ye ◽  
Zhixia Yang ◽  
Zhilin Li

We present a novel kernel-free regressor, called quadratic hyper-surface kernel-free least squares support vector regression (QLSSVR), for regression problems. The task of this approach is to find a quadratic function as the regression function, obtained by solving a quadratic programming problem with equality constraints. In fact, the new model only needs to solve a system of linear equations to reach the optimal solution, rather than a quadratic programming problem. Therefore, compared with standard support vector regression, our approach is much more efficient, being kernel-free and requiring only the solution of a set of linear equations. Numerical results illustrate that our approach outperforms other existing regression approaches in terms of regression criteria and CPU time.
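The "kernel-free, linear-system-only" idea can be sketched by fitting an explicit quadratic surface with regularized least squares; this is a simplified stand-in for QLSSVR (the paper's exact objective and constraints are not reproduced), and all names and parameters are illustrative:

```python
import numpy as np

def quad_features(X):
    """Explicit quadratic map: [x_i*x_j (i<=j), x_i, 1]. The quadratic
    hyper-surface is fitted directly, with no kernel function."""
    n, d = X.shape
    quads = [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(quads + [X, np.ones(n)])

def qlssvr_fit(X, y, lam=1e-3):
    """Least-squares fit of the quadratic surface: the optimum solves
    one linear system (Z'Z + lam*I) w = Z'y -- no quadratic program."""
    Z = quad_features(X)
    A = Z.T @ Z + lam * np.eye(Z.shape[1])
    return np.linalg.solve(A, Z.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = X[:, 0] ** 2 + 2 * X[:, 1] - 1        # a truly quadratic target
w = qlssvr_fit(X, y)
pred = quad_features(X) @ w
```

Solving one dense linear system costs O(m³) in the (small) number of quadratic features m, rather than scaling with the number of training samples the way a kernel QP does, which is the source of the claimed CPU-time advantage.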


2020 ◽  
pp. 1-11
Author(s):  
Jingwen Hou

At present, online education evaluation models are insufficient when dealing with small-scale evaluation data sets. In order to discriminate a learner's learning state, this paper further studies machine learning methods for online teaching and introduces an adaptive learning rate and a momentum term to improve the gradient descent method of the BP neural network, thereby improving the model's convergence rate. Moreover, this study proposes a deep neural network model to deal with complex, high-dimensional, large-scale data sets. In the supervised prediction stage, support vector regression is used as the predictor, mapping complex nonlinear relationships into a high-dimensional space to obtain a relationship that is linear, analogous to the low-dimensional case. In addition, both small-scale teaching quality evaluation data sets and large-scale data sets are fed into the model for experiments. Finally, the proposed model is compared with other shallow models. The results show that the model proposed in this research is effective and advantageous for evaluating teaching quality in universities and for processing large-scale data sets.


Author(s):  
Hao Liu ◽  
Satoshi Oyama ◽  
Masahito Kurihara ◽  
Haruhiko Sato

Clustering is an important tool for data analysis, and many clustering techniques have been proposed over the years. Among them are density-based clustering methods, which have several benefits: the number of clusters is not required before carrying out clustering, the detected clusters can have arbitrary shapes, and outliers can be detected and removed. Recently, density-based algorithms were extended with fuzzy set theory, which has made these algorithms more robust. However, density-based clustering algorithms usually require O(n²) time, where n is the number of points in the data set, implying that they are not suitable for large-scale data sets. In this paper, a novel clustering algorithm called landmark fuzzy neighborhood DBSCAN (landmark FN-DBSCAN) is proposed. Landmarks are used to represent a subset of the input data set, which makes the algorithm efficient on large-scale data sets. We give a theoretical analysis of time and space complexity, showing that both are linear in the size of the data set. The experiments show that landmark FN-DBSCAN is much faster than FN-DBSCAN and provides very good clustering quality.
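The landmark idea, achieving linear time by clustering only a small representative subset and propagating labels to the rest, can be sketched as follows. Plain DBSCAN stands in for the fuzzy-neighborhood variant the paper uses, and the random landmark choice and all parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def landmark_dbscan(X, n_landmarks=50, eps=0.5, min_samples=3, seed=0):
    """Landmark sketch: run DBSCAN on a small random sample of
    'landmarks', then give every point the label of its nearest
    landmark. Both passes are linear in the number of points n."""
    rng = np.random.default_rng(seed)
    lm = X[rng.choice(len(X), n_landmarks, replace=False)]
    lm_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(lm)
    # nearest landmark for every point (label -1 marks noise landmarks)
    nearest = np.argmin(np.linalg.norm(X[:, None] - lm[None], axis=2), axis=1)
    return lm_labels[nearest]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (300, 2)),    # blob 1
               rng.normal(5, 0.3, (300, 2))])   # blob 2
labels = landmark_dbscan(X)
```

Only the landmark-landmark distances incur the quadratic cost, and the landmark count is a constant independent of n, which is what makes both time and space linear in the data set size.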

