Cluster Reduction Support Vector Machine for Large-Scale Data Set Classification

Big data is a current trend with significant impacts on information technologies. In big data applications, one of the most pressing issues is handling large-scale data sets, which often require computation resources provided by public cloud services. How to analyze big data efficiently has become a major challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has proved more efficient than the traditional SVM in processing large-scale data. In addition, a soft margin method is proposed to adjust the excursion of the separation margin and to remain effective in the gray zone, where the distribution of the data and the separation margin between classes become hard to describe.
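
The SSVM mentioned above replaces the non-smooth plus function max(x, 0) in the SVM objective with a smooth approximation, so that gradient-based solvers apply. The sketch below is a minimal pure-Python illustration, assuming the standard linear SSVM formulation of Lee and Mangasarian with smoothing p(x, α) = x + (1/α)·log(1 + e^(−αx)). The names train_ssvm, nu, alpha, lr and iters are illustrative; plain gradient descent stands in for the Newton-Armijo solver used in the original SSVM work, and the interval-regression and soft-margin extensions described in the abstract are not modeled.

```python
import math


def smooth_plus(x, alpha):
    """SSVM smoothing of max(x, 0): p(x, a) = x + log(1 + exp(-a*x)) / a,
    evaluated in a numerically stable way for large |a*x|."""
    z = alpha * x
    if z > 0:
        return x + math.log1p(math.exp(-z)) / alpha
    return math.log1p(math.exp(z)) / alpha


def sigmoid(z):
    """Stable logistic function; dp(x, a)/dx equals sigmoid(a*x)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_ssvm(X, y, nu=1.0, alpha=5.0, lr=0.005, iters=2000):
    """Minimise (nu/2) * sum_i p(r_i, alpha)^2 + (||w||^2 + gamma^2) / 2,
    with r_i = 1 - y_i * (w . x_i - gamma), by plain gradient descent."""
    d = len(X[0])
    w = [0.0] * d
    gamma = 0.0
    for _ in range(iters):
        gw = w[:]      # gradient of the regulariser ||w||^2 / 2
        gg = gamma     # gradient of gamma^2 / 2
        for xi, yi in zip(X, y):
            r = 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) - gamma)
            # d/dr of (nu/2) * p(r)^2 is nu * p(r) * sigmoid(alpha * r)
            coef = nu * smooth_plus(r, alpha) * sigmoid(alpha * r)
            for j in range(d):
                gw[j] -= coef * yi * xi[j]   # dr/dw_j = -y_i * x_ij
            gg += coef * yi                  # dr/dgamma = y_i
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        gamma -= lr * gg
    return w, gamma


def predict(w, gamma, x):
    """Classify x by the sign of w . x - gamma."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) - gamma >= 0 else -1
```

On a small hypothetical separable set, the learned plane classifies every training point correctly:

```python
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, gamma = train_ssvm(X, y)
print([predict(w, gamma, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```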

An online incremental learning support vector machine for large-scale data

Neural Computing and Applications ◽

10.1007/s00521-011-0793-1 ◽

2012 ◽

Vol 22 (5) ◽

pp. 1023-1035 ◽

Cited By ~ 39

Author(s):

Jun Zheng ◽

Furao Shen ◽

Hongjun Fan ◽

Jinxi Zhao

Keyword(s):

Support Vector Machine ◽

Incremental Learning ◽

Large Scale ◽

Support Vector ◽

Learning Support ◽

Large Scale Data ◽

Online Incremental Learning ◽

Scale Data

Research on Large Scale Data Set Processing Based on SVM

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.216.738 ◽

2011 ◽

Vol 216 ◽

pp. 738-741

Author(s):

Yue E Chen ◽

Bai Li Ren

Keyword(s):

Large Scale ◽

Support Vector ◽

Simulation Experiments ◽

Data Set ◽

Training Time ◽

Training Support ◽

Large Scale Data ◽

Vector Machines ◽

Speech Classification ◽

Scale Data

SVMs have achieved very good results on classification, regression, and density estimation problems in machine learning and have been successfully applied to practical problems such as text recognition and speech classification; their main drawback, however, is long training time. A new reduction strategy is proposed for training support vector machines. The method converges quickly without sacrificing the learning machine's generalization performance, and simulation experiments demonstrate its feasibility and effectiveness.

An Online Incremental Learning Support Vector Machine for Large-scale Data

Artificial Neural Networks – ICANN 2010 - Lecture Notes in Computer Science ◽

10.1007/978-3-642-15822-3_9 ◽

2010 ◽

pp. 76-81 ◽

Cited By ~ 4

Author(s):

Jun Zheng ◽

Hui Yu ◽

Furao Shen ◽

Jinxi Zhao

Keyword(s):

Support Vector Machine ◽

Incremental Learning ◽

Large Scale ◽

Support Vector ◽

Learning Support ◽

Large Scale Data ◽

Online Incremental Learning ◽

Scale Data

An Evaluation Model and Benchmark for Parallel Computing Frameworks

Mobile Information Systems ◽

10.1155/2018/3890341 ◽

2018 ◽

Vol 2018 ◽

pp. 1-14 ◽

Cited By ~ 2

Author(s):

Weibei Fan ◽

Zhijie Han ◽

Ruchuan Wang

Keyword(s):

Support Vector Machine ◽

Parallel Computing ◽

Comparative Evaluation ◽

Large Scale ◽

Evaluation Model ◽

Performance Model ◽

Support Vector ◽

Large Scale Data ◽

Performance Evaluation Model ◽

Scale Data

MARS and Spark are two popular parallel computing frameworks that are widely used for large-scale data analysis. In this paper, we first propose a performance evaluation model based on the support vector machine (SVM) to analyze the performance of parallel computing frameworks. We then present representative results from a set of analyses with the proposed analytical performance model and perform a comparative evaluation of MARS and Spark using representative workloads, considering factors such as performance and scalability. The experiments show that our evaluation model predicts execution time more accurately than multifactor linear regression (MLR) and also provides resource consumption requirements. Finally, we conduct benchmark experiments on MARS and Spark. MARS outperforms Spark in both throughput and speedup when executing logistic regression and Bayesian classification, because its large number of GPU threads supports higher parallelism. The benchmarks also show that Spark has lower latency than MARS across the four benchmarks.
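
The comparison above is stated in terms of throughput and speedup. The abstract gives no formulas, so the sketch below assumes the textbook definitions: speedup as baseline time divided by parallel time, parallel efficiency as speedup per worker, and throughput as work completed per second. The function names and the sample timings are hypothetical, not figures from the paper.

```python
def speedup(t_baseline, t_parallel):
    """Speedup S = T_baseline / T_parallel (> 1 means the parallel run is faster)."""
    return t_baseline / t_parallel


def efficiency(t_baseline, t_parallel, workers):
    """Parallel efficiency E = S / p for p workers (1.0 = perfect linear scaling)."""
    return speedup(t_baseline, t_parallel) / workers


def throughput(items, seconds):
    """Work completed per unit time, e.g. input records per second."""
    return items / seconds


# Hypothetical timings: a 120 s baseline job finishes in 15 s on 16 workers
# and processes 2,000,000 records in 40 s.
print(speedup(120.0, 15.0))          # 8.0
print(efficiency(120.0, 15.0, 16))   # 0.5
print(throughput(2_000_000, 40.0))   # 50000.0
```

Efficiency is included because a high speedup obtained with many GPU threads can still mean low per-thread utilization; which of these views a benchmark emphasizes affects how the frameworks compare.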

Application of Data Mining Technology under K-means Algorithm Combined with BIM Technology in Management Engineering

International Journal of Advanced Information and Communication Technology ◽

10.46532/ijaict-2020030 ◽

2020 ◽

pp. 141-147

Author(s):

Jun Wang ◽

Zhan Chen

Keyword(s):

Support Vector Machine ◽

Large Scale ◽

Clustering Algorithm ◽

Facility Management ◽

Support Vector ◽

Detection Time ◽

Improved Method ◽

Large Scale Data ◽

Operation And Maintenance ◽

Scale Data

Data mining based on the K-means algorithm is combined with BIM (Building Information Modeling) technology and applied to management engineering to support project management personnel. Method: The K-means clustering algorithm is combined with a support vector machine (SVM): the SVM provides high accuracy in anomaly detection, while K-means divides the SVM problem into blocks. The study also analyzes the differing needs of facility management staff and defines the content and level of detail required for the BIM model, which meets the data requirements for operation and maintenance while avoiding the waste caused by excessive modeling. Result: Compared with traditional SVMs, the improved algorithm achieves a higher detection rate and a lower false alarm rate, and it shortens detection time on large-scale data, providing an effective method for anomaly detection in sensor networks and for processing large-scale data sets. The improved method increases detection accuracy by 8.13% and reduces the false alarm rate by 89.08%. In detection time, it improves on the traditional method by 3.82 s, a factor of 4.67. Conclusion: The structural health monitoring system can monitor data efficiently and accurately, and BIM can provide rich operation and maintenance data that effectively improve the efficiency of facility management.
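
The core idea above, using K-means to split the problem into blocks so that the SVM works on smaller pieces, can be sketched as follows. This is a hypothetical reading: the abstract does not say how blocks are trained or combined, so the sketch assumes one linear SVM per cluster (trained by full-batch subgradient descent on the hinge loss, on data centred at the cluster centroid) and routes each query to the SVM of its nearest centroid. All function names and parameters (k, lam, lr, epochs) are illustrative.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(X, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        far = max(X, key=lambda x: min(sq_dist(x, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(x, centroids[c])) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Full-batch subgradient descent on (lam/2)||w||^2 + mean hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge term is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def train_blocked_svm(X, y, k):
    """Partition with k-means, then fit one SVM per block on centred data."""
    centroids, labels = kmeans(X, k)
    models = []
    for c, mu in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:  # single-class (or empty) block: constant vote
            models.append(("const", ys[0] if ys else 1))
            continue
        Xs = [[a - m for a, m in zip(X[i], mu)] for i in idx]
        models.append(("svm", train_linear_svm(Xs, ys)))
    return centroids, models


def predict_blocked(centroids, models, x):
    """Route x to its nearest centroid and apply that block's classifier."""
    c = min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
    kind, model = models[c]
    if kind == "const":
        return model
    w, b = model
    xc = [a - m for a, m in zip(x, centroids[c])]
    return 1 if dot(w, xc) + b >= 0 else -1
```

A usage example on a hypothetical two-block data set, where the class boundary runs horizontally in one block and vertically in the other:

```python
X = [(0, 1), (1, 1), (0, -1), (1, -1), (11, 10), (11, 11), (9, 10), (9, 11)]
y = [1, 1, -1, -1, 1, 1, -1, -1]
centroids, models = train_blocked_svm(X, y, k=2)
print(predict_blocked(centroids, models, (0.5, 3)))   # 1
print(predict_blocked(centroids, models, (8, 10.5)))  # -1
```

Each per-block SVM sees only its own cluster, so training cost grows with block size rather than with the full data set; the trade-off is that the overall decision boundary is only piecewise linear and depends on the quality of the clustering.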

ProGen: Provenance database generator for large-scale data set

Journal of Computer Applications ◽

10.3724/sp.j.1087.2008.02737 ◽

2009 ◽

Vol 28 (11) ◽

pp. 2737-2740

Author(s):

Xiao ZHANG ◽

Shan WANG ◽

Na LIAN

Keyword(s):

Large Scale ◽

Data Set ◽

Large Scale Data ◽

Scale Data

Matrix-based Kernel Principal Component analysis for large-scale data set

2009 International Joint Conference on Neural Networks ◽

10.1109/ijcnn.2009.5178692 ◽

2009 ◽

Cited By ~ 3

Author(s):

Weiya Shi ◽

Yue-Fei Guo ◽

Xiangyang Xue

Keyword(s):

Principal Component Analysis ◽

Large Scale ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Data Set ◽

Large Scale Data ◽

Scale Data

Support Vector Machines in Big Data Classification: A Systematic Literature Review

10.21203/rs.3.rs-663359/v1 ◽

2021 ◽

Author(s):

Mohammad Hassan Almaspoor ◽

Ali Safaei ◽

Afshin Salajegheh ◽

Behrouz Minaei-Bidgoli

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Scale ◽

Support Vector ◽

Research Areas ◽

Large Scale Data ◽

Training Samples ◽

Big Data Classification ◽

Scale Data

Classification is one of the most important and widely used tasks in machine learning; its purpose is to create, from a set of training samples, a rule for assigning data to pre-existing categories. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising classification methods in machine learning. With the advent of big data, many machine learning methods have been challenged by the characteristics of big data. The standard SVM was proposed for batch learning, in which all data are available at the same time. The SVM has high time complexity: increasing the number of training samples intensifies the demand for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and to large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for adapting the SVM to online conditions and large-scale data, discusses how these methods might be employed to classify big data, and proposes research areas for future studies. Given its advantages, the SVM can be among the first options for big data classification. For this purpose, appropriate data preprocessing techniques should be developed to convert data into a form suitable for learning, and existing frameworks for parallel and distributed processing should be employed so that SVMs become scalable and suitable for online use, and thus able to handle big data.
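
The review contrasts batch SVM training with online learning, where samples arrive one at a time. As one concrete illustration (not a method attributed to the review), the sketch below uses a Pegasos-style stochastic subgradient update for a linear SVM. The class name OnlineLinearSVM, the parameter lam, and the omission of a bias term and of Pegasos's optional projection step are simplifying assumptions. Each update costs O(d) time and memory regardless of how many samples have already been seen, which is the property that makes such methods attractive for large-scale and streaming data.

```python
class OnlineLinearSVM:
    """Pegasos-style online linear SVM (no bias term) for a data stream.

    Each call to update() sees one labelled sample and takes a stochastic
    subgradient step on (lam/2)||w||^2 + max(0, 1 - y * w.x) with step 1/(lam*t).
    """

    def __init__(self, dim, lam=0.1):
        self.w = [0.0] * dim
        self.lam = lam
        self.t = 0  # number of samples processed so far

    def decision(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        self.t += 1
        eta = 1.0 / (self.lam * self.t)
        violated = y * self.decision(x) < 1  # margin check uses the old w
        # Regulariser step: shrink w toward zero.
        self.w = [(1.0 - eta * self.lam) * wi for wi in self.w]
        if violated:
            # Hinge-loss step: move w toward y * x.
            self.w = [wi + eta * y * xi for wi, xi in zip(self.w, x)]

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1
```

Usage on a hypothetical stream that replays six labelled points twenty times:

```python
X = [(2, 1), (-2, -1), (1, 2), (-1, -2), (3, 3), (-3, -3)]
Y = [1, -1, 1, -1, 1, -1]
svm = OnlineLinearSVM(dim=2, lam=0.1)
for _ in range(20):
    for x, y in zip(X, Y):
        svm.update(x, y)
print([svm.predict(x) for x in X])  # [1, -1, 1, -1, 1, -1]
```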
