An Active Learning Based LDA Algorithm for Large-Scale Data Classification

2016 ◽  
Vol 9 (11) ◽  
pp. 29-36
Author(s):  
Xu Yu ◽  
Yan-ping Zhou ◽  
Chun-nian Ren
2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Tuozhong Yao ◽  
Wenfeng Wang ◽  
Yuhong Gu

Multiview active learning (MAL) is a technique that can achieve a larger reduction in the size of the version space than traditional active learning and has great potential for large-scale data analysis. In this paper, we present a new deep multiview active learning (DMAL) framework, which is the first to combine multiview active learning and deep learning to reduce annotation effort. Our approach advances existing active learning methods in two respects. First, we incorporate two different deep convolutional neural networks into active learning, using complementary multiview information to improve feature learning. Second, through the properly designed framework, the feature representation and the classifier can be updated simultaneously with progressively annotated informative samples. Experiments on two challenging image datasets demonstrate that the proposed DMAL algorithm achieves more promising results than several state-of-the-art active learning algorithms.
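The core multiview idea above, selecting for annotation the unlabeled samples on which the two view-specific networks disagree, can be sketched as follows. This is a minimal illustration of multiview sample selection in general, not the authors' DMAL implementation; the function name and the tie-breaking rule are assumptions.

```python
import numpy as np

def select_informative(probs_view1, probs_view2, k):
    """Rank unlabeled samples by multiview disagreement and uncertainty.

    probs_view1, probs_view2: (n_samples, n_classes) softmax outputs from
    two view-specific networks. Samples where the views disagree
    (contention points) are the most informative to annotate, which is
    the standard multiview active learning selection criterion.
    """
    pred1 = probs_view1.argmax(axis=1)
    pred2 = probs_view2.argmax(axis=1)
    disagree = pred1 != pred2
    # Break ties by the lower of the two max confidences: most uncertain first.
    conf = np.minimum(probs_view1.max(axis=1), probs_view2.max(axis=1))
    # Primary key: disagreement (disagreeing samples first);
    # secondary key: confidence ascending.
    order = np.lexsort((conf, ~disagree))
    return order[:k]
```

In a DMAL-style loop, the top-k samples returned here would be sent to an annotator, and both networks and the classifier would then be updated on the enlarged labeled set.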


Author(s):  
Bing Xu

In the process of e-commerce transactions, a large amount of data is generated, and its effective classification is one of the current research hotspots. An improved feature selection method is proposed based on the characteristics of the Bayesian classification algorithm. Because training and testing modern large-scale data classifiers on a single computer takes a long time, a data classification algorithm based on Naive Bayes was designed and implemented on the Hadoop distributed platform. The experimental results showed that the improved algorithm could effectively improve classification accuracy, and that the parallel Bayesian data classification algorithm was highly efficient and suitable for processing and analyzing massive data.
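The classifier at the heart of the abstract above is standard multinomial Naive Bayes. A minimal single-machine sketch is given below (illustrative, not the paper's Hadoop implementation); the key property that makes it Hadoop-friendly is that training reduces to summing per-class feature counts, so mappers can emit local counts and a reducer aggregates them.

```python
import numpy as np

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing.

    Training needs only per-class feature counts, which is why the
    algorithm parallelizes cleanly in a MapReduce setting: each mapper
    emits local counts and a reducer sums them.
    """

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.log_lik = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = np.log(len(rows) / len(X))
            counts = np.sum(rows, axis=0) + 1.0  # Laplace smoothing
            self.log_lik[c] = np.log(counts / counts.sum())
        return self

    def predict(self, x):
        x = np.asarray(x)
        scores = {c: self.log_prior[c] + (x * self.log_lik[c]).sum()
                  for c in self.classes}
        return max(scores, key=scores.get)
```

For example, trained on small word-count vectors labeled spam/ham, the classifier scores a new vector under each class's log-likelihoods plus the class log-prior and returns the argmax.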


2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
Yang Liu ◽  
Xiang Li ◽  
Xianbang Chen ◽  
Xi Wang ◽  
Huaqiang Li

Currently, data classification is one of the most important ways to analyze data. However, with the development of data collection, transmission, and storage technologies, the scale of data has increased sharply. Additionally, because datasets often contain multiple classes with imbalanced distributions, the class imbalance issue has become increasingly prominent. Traditional machine learning algorithms lack the ability to handle these issues, so classification efficiency and precision may be significantly impacted. Therefore, this paper presents an improved artificial neural network enabling high-performance classification of imbalanced, large-volume data. First, the Borderline-SMOTE (synthetic minority oversampling technique) algorithm is employed to balance the training dataset, which aims to improve the training of the back-propagation neural network (BPNN); then, zero-mean normalization, batch normalization, and the rectified linear unit (ReLU) are employed to optimize the input layer and hidden layers of the BPNN. Finally, an ensemble learning-based parallelization of the improved BPNN is implemented using the Hadoop framework. Positive conclusions can be drawn from the experimental results. Benefiting from Borderline-SMOTE, the imbalanced training dataset can be balanced, which improves training performance and classification accuracy. The improvements to the input and hidden layers also enhance training convergence. The parallelization and ensemble learning techniques enable the BPNN to perform high-performance large-scale data classification. The experimental results show the effectiveness of the presented classification algorithm.
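The Borderline-SMOTE step described above can be sketched as follows. This is a minimal illustration of the general technique (identify minority samples near the class boundary, then interpolate new minority samples toward their minority-class neighbours), not the paper's implementation; the function signature and parameters are assumptions.

```python
import numpy as np

def borderline_smote(X, y, minority, k=5, n_new=10, rng=None):
    """Minimal Borderline-SMOTE sketch: oversample only minority points
    that sit near the class boundary.

    A minority sample is 'borderline' (the DANGER set) when at least half
    but not all of its k nearest neighbours belong to other classes.
    New samples are interpolated between a borderline point and one of
    its minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, float)
    y = np.asarray(y)
    X_min = X[y == minority]

    def knn(p, pool, k):
        d = np.linalg.norm(pool - p, axis=1)
        return np.argsort(d)[1:k + 1]  # skip the point itself

    danger = []
    for x in X_min:
        idx = knn(x, X, k)
        n_maj = np.sum(y[idx] != minority)
        if k / 2 <= n_maj < k:  # borderline, but not pure noise
            danger.append(x)
    if not danger:
        return X, y

    synth = []
    for _ in range(n_new):
        x = danger[rng.integers(len(danger))]
        nb = X_min[knn(x, X_min, min(k, len(X_min) - 1))]
        neighbour = nb[rng.integers(len(nb))]
        synth.append(x + rng.random() * (neighbour - x))
    return np.vstack([X, synth]), np.concatenate([y, [minority] * n_new])
```

The balanced output would then feed the BPNN training step; in practice, a library implementation such as imbalanced-learn's `BorderlineSMOTE` handles the edge cases this sketch omits.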


2006 ◽  
Vol 10 (5) ◽  
pp. 604-616 ◽  
Author(s):  
G. Folino ◽  
C. Pizzuti ◽  
G. Spezzano

2018 ◽  
Vol 23 (11) ◽  
pp. 3793-3801 ◽  
Author(s):  
Tinglong Tang ◽  
Shengyong Chen ◽  
Meng Zhao ◽  
Wei Huang ◽  
Jake Luo
