An iterative method for classification of binary data

Information and Inference: A Journal of the IMA ◽

10.1093/imaiai/iaaa003 ◽

2020 ◽

Author(s):

Denali Molitor ◽

Deanna Needell

Keyword(s):

Binary Data ◽

Large Scale ◽

Support Vector ◽

Large Scale Data ◽

Classification Framework ◽

Vector Machines ◽

Inference Methods ◽

Compressed Data ◽

Scale Data

Abstract: In today’s data-driven world, storing, processing and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference methods for analyzing compressed data are necessary. Building on a recently designed simple framework for classification using binary data, we demonstrate that one can improve classification accuracy of this approach through iterative applications whose output serves as input to the next application. As a side consequence, we show that the original framework can be used as a data preprocessing step to improve the performance of other methods, such as support vector machines. For several simple settings, we showcase the ability to obtain theoretical guarantees for the accuracy of the iterative classification method. The simplicity of the underlying classification framework makes it amenable to theoretical analysis.


Support Vector Machines in Big Data Classification: A Systematic Literature Review

10.21203/rs.3.rs-663359/v1 ◽

2021 ◽

Author(s):

Mohammad Hassan Almaspoor ◽

Ali Safaei ◽

Afshin Salajegheh ◽

Behrouz Minaei-Bidgoli

Keyword(s):

Machine Learning ◽

Big Data ◽

Large Scale ◽

Support Vector ◽

Research Areas ◽

Large Scale Data ◽

Training Samples ◽

Big Data Classification ◽

Scale Data

Abstract: Classification is one of the most important and widely used tasks in machine learning; its purpose is to create a rule, based on a set of training samples, for assigning data to pre-existing categories. Employed successfully in many scientific and engineering areas, the Support Vector Machine (SVM) is among the most promising methods of classification in machine learning. With the advent of big data, many machine learning methods have been challenged by big data characteristics. The standard SVM was proposed for batch learning, in which all data are available at the same time. The SVM has a high time complexity, i.e., increasing the number of training samples intensifies the need for computational resources and memory. Hence, many attempts have been made to adapt the SVM to online learning conditions and to large-scale data. This paper focuses on the analysis, identification, and classification of existing methods for adapting the SVM to online conditions and large-scale data. These methods might be employed to classify big data and to propose research areas for future studies. Considering its advantages, the SVM can be among the first options for classifying big data. For this purpose, appropriate techniques should be developed for data preprocessing in order to convert data into an appropriate form for learning. The existing frameworks for parallel and distributed processing should also be employed so that SVMs can be made scalable and properly online, and thus able to handle big data.


Influencing Factors of e-Commerce Enterprise Development Based on Mobile Computing Big Data Analysis

Wireless Communications and Mobile Computing ◽

10.1155/2021/8750111 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Yixue Zhu ◽

Boyue Chai

Keyword(s):

Big Data ◽

Data Analysis ◽

Large Scale ◽

Big Data Analysis ◽

Support Vector ◽

Data Sets ◽

Large Scale Data ◽

Vector Machines ◽

Physical Information ◽

Scale Data

With the development of increasingly advanced information technology and electronic technology, especially with regard to physical information systems, cloud computing systems, and social services, big data is becoming widely visible, creating benefits for people while at the same time posing huge challenges. In addition, with the advent of the era of big data, the scale of data sets is getting larger and larger. Traditional data analysis methods can no longer handle large-scale data sets, and digging out the hidden information behind big data, especially in the field of e-commerce, has become a key factor in competition among enterprises. We use a support vector machine method based on parallel computing to analyze the data. First, the training samples are divided into several working subsets through the SOM self-organizing neural network classification method. Finally, the training results of each working set are merged, so as to quickly deal with the problem of massive data prediction and analysis. This paper proposes that big data has the flexibility of expansion and a quality assessment system, so it is meaningful to replace the double-sidedness of quality assessment with big data. Finally, considering the excellent performance of parallel support vector machines in data mining and analysis, we apply this method to the big data analysis of e-commerce. The research results show that parallel support vector machines can solve the problem of processing large-scale data sets. The emergence of dirty data problems has increased the effective rate by at least 70%.


Research on Large Scale Data Set Processing Based on SVM

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.216.738 ◽

2011 ◽

Vol 216 ◽

pp. 738-741

Author(s):

Yue E Chen ◽

Bai Li Ren

Keyword(s):

Large Scale ◽

Support Vector ◽

Simulation Experiments ◽

Data Set ◽

Training Time ◽

Training Support ◽

Large Scale Data ◽

Vector Machines ◽

Speech Classification ◽

Scale Data

The SVM has achieved very good results on classification, regression and density estimation problems in machine learning and has been successfully applied to practical problems such as text recognition and speech classification, but its long training time is a major drawback. A new reduction strategy is proposed for training support vector machines. The method converges quickly without losing the learning machine’s generalization performance, and the results of simulation experiments show its feasibility and effectiveness.


Sparse Reductions for Fixed-Size Least Squares Support Vector Machines on Large Scale Data

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-642-37453-1_14 ◽

2013 ◽

pp. 161-173 ◽

Cited By ~ 9

Author(s):

Raghvendra Mall ◽

Johan A. K. Suykens

Keyword(s):

Support Vector Machines ◽

Least Squares ◽

Large Scale ◽

Support Vector ◽

Fixed Size ◽

Large Scale Data ◽

Vector Machines ◽

Scale Data


Fast and scalable support vector clustering for large-scale data analysis

Knowledge and Information Systems ◽

10.1007/s10115-013-0724-9 ◽

2014 ◽

Vol 43 (2) ◽

pp. 281-310 ◽

Cited By ~ 9

Author(s):

Yuan Ping ◽

Yun Feng Chang ◽

Yajian Zhou ◽

Ying Jie Tian ◽

Yi Xian Yang ◽

...

Keyword(s):

Data Analysis ◽

Large Scale ◽

Support Vector ◽

Support Vector Clustering ◽

Large Scale Data ◽

Vector Clustering ◽

Scale Data


Cluster Reduction Support Vector Machine for Large-Scale Data Set Classification

2008 IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application ◽

10.1109/paciia.2008.43 ◽

2008 ◽

Author(s):

Guangxi Chen ◽

Yan Cheng ◽

Jian Xu

Keyword(s):

Support Vector Machine ◽

Large Scale ◽

Support Vector ◽

Data Set ◽

Large Scale Data ◽

Scale Data ◽

Cluster Reduction


Classification of large-scale data and data batch stream with forward stagewise algorithm

Journal of the Korean Data and Information Science Society ◽

10.7465/jkdi.2014.25.6.1283 ◽

2014 ◽

Vol 25 (6) ◽

pp. 1283-1291

Author(s):

Young Joo Yoon

Keyword(s):

Large Scale ◽

Large Scale Data ◽

Scale Data


Analyzing Big Data with the Hybrid Interval Regression Methods

The Scientific World Journal ◽

10.1155/2014/243921 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8 ◽

Cited By ~ 1

Author(s):

Chia-Hui Huang ◽

Keng-Chieh Yang ◽

Han-Ying Kao

Keyword(s):

Support Vector Machine ◽

Big Data ◽

Information Technologies ◽

Large Scale ◽

Cloud Services ◽

Support Vector ◽

Large Scale Data ◽

Big Data Applications ◽

Smooth Support Vector Machine ◽

Scale Data

Big data is a current trend with significant impacts on information technologies. In big data applications, one of the main concerns is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently has become a big challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has been shown to be more efficient than the traditional SVM in processing large-scale data. In addition, the soft margin method is proposed to modify the excursion of the separation margin and to be effective in the gray zone, where the distribution of the data and the separation margin between classes become hard to describe.
