An online classification algorithm for large scale data streams: iGNGSVM

In an enterprise setting, a major challenge for any data mining operation is managing data streams or feeds, both data and metadata, to ensure a stable and certifiably accurate flow of data. Data feeds in this environment can be complex, numerous and opaque. The management of frequently changing data and metadata presents a considerable challenge. In this paper, we articulate the technical issues involved in the task of managing enterprise data and propose a multi-disciplinary solution, derived from fields such as knowledge engineering and statistics, to understand, standardize, and automate information acquisition and quality management in preparation for enterprise mining.


Efficient subspace clustering of large-scale data streams with misses

2016 Annual Conference on Information Science and Systems (CISS), 2016
DOI: 10.1109/ciss.2016.7460569
Cited by: ~4

Author(s): Panagiotis A. Traganitis, Georgios B. Giannakis

Keyword(s): Data Streams, Large Scale, Subspace Clustering, Large Scale Data, Scale Data


A kernel fused perceptron for the online classification of large-scale data

Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (BigMine '12), 2012
DOI: 10.1145/2351316.2351332

Author(s): Huijun He, Mingmin Chi, Wenqiang Zhang

Keyword(s): Large Scale, Large Scale Data, Online Classification, Scale Data


Semisupervised local preserving embedding algorithm based on maximum margin criterion for large-scale data streams

Concurrency and Computation: Practice and Experience, Vol. 29 (19), pp. e4246, 2017
DOI: 10.1002/cpe.4246
Cited by: ~1

Author(s): Chao Tan, Genlin Ji

Keyword(s): Data Streams, Large Scale, Maximum Margin, Large Scale Data, Maximum Margin Criterion, Scale Data


E-Commerce data classification in the cloud environment based on bayesian algorithm

Journal of Intelligent & Fuzzy Systems, 2020, pp. 1-8
DOI: 10.3233/jifs-189421

Author(s): Bing Xu

Keyword(s): Large Scale, High Efficiency, Feature Selection Method, Data Classification, Classification Algorithm, Testing Time, Bayesian Algorithm, Large Scale Data, Distributed Platform, Scale Data

E-commerce transactions generate large amounts of data, and classifying this data effectively is a current research hotspot. An improved feature selection method was proposed based on the characteristics of the Bayesian classification algorithm. Because training and testing on large-scale data take a long time on a single computer, a data classification algorithm based on Naive Bayes was designed and implemented on the Hadoop distributed platform. Experimental results showed that the improved algorithm effectively increased classification accuracy and that the parallel Bayesian classification algorithm was highly efficient, making it suitable for processing and analyzing massive data.
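The abstract describes parallelizing Naive Bayes training on Hadoop but does not give the implementation or the details of its improved feature selection method. As an illustration only, here is a minimal sketch assuming a multinomial Naive Bayes model over tokenized records: each data shard is counted independently (standing in for a map step), partial counts are summed (standing in for a reduce step), and prediction uses Laplace-smoothed log probabilities. Plain Python functions replace Hadoop mappers and reducers, no feature selection is performed, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from functools import reduce


def map_shard(shard):
    """Hypothetical 'map' step: count labels and (label, token) pairs in one shard."""
    class_counts = Counter()
    token_counts = Counter()
    for tokens, label in shard:
        class_counts[label] += 1
        for tok in tokens:
            token_counts[(label, tok)] += 1
    return class_counts, token_counts


def reduce_counts(a, b):
    """Hypothetical 'reduce' step: merge two partial results by summing counts."""
    return a[0] + b[0], a[1] + b[1]


def train(shards):
    """Combine per-shard counts into one multinomial Naive Bayes model."""
    class_counts, token_counts = reduce(reduce_counts, map(map_shard, shards))
    vocab = {tok for (_, tok) in token_counts}
    totals = Counter()  # total token count per label
    for (label, _), n in token_counts.items():
        totals[label] += n
    return class_counts, token_counts, totals, vocab


def predict(model, tokens):
    """Return the label with the highest log posterior (Laplace smoothing)."""
    class_counts, token_counts, totals, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / n_docs)  # log prior
        for tok in tokens:
            # add-one smoothed log likelihood of each token under this label
            score += math.log(
                (token_counts[(label, tok)] + 1) / (totals[label] + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data split into two shards, as if stored on two nodes (hypothetical).
shards = [
    [(["cheap", "phone", "sale"], "electronics"), (["novel", "author"], "books")],
    [(["laptop", "sale"], "electronics"), (["author", "poetry"], "books")],
]
model = train(shards)
print(predict(model, ["sale", "phone"]))  # → electronics
print(predict(model, ["author"]))         # → books
```

Because the per-shard counts are simply added together, the shards can be processed in any order or in parallel, which is the property that makes count-based Naive Bayes training a natural fit for a map/reduce platform.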


Parallel Strategy for the Large-Scale Data Streams Processing

Proceedings of the 2016 International Conference on Computer Engineering and Information Systems, 2016
DOI: 10.2991/ceis-16.2016.45

Author(s): Ya-Juan Yuan, Guo-Jie Ma

Keyword(s): Data Streams, Large Scale, Large Scale Data, Scale Data, Parallel Strategy
