An Active Learning Based LDA Algorithm for Large-Scale Data Classification

2016 ◽  
Vol 9 (11) ◽  
pp. 29-36
Author(s):  
Xu Yu ◽  
Yan-ping Zhou ◽  
Chun-nian Ren
2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Tuozhong Yao ◽  
Wenfeng Wang ◽  
Yuhong Gu

Multiview active learning (MAL) is a technique that can achieve a larger reduction in the size of the version space than traditional active learning and has great potential for large-scale data analysis. In this paper, we present a new deep multiview active learning (DMAL) framework, which is the first to combine multiview active learning and deep learning to reduce annotation effort. Our approach advances existing active learning methods in two respects. First, we incorporate two different deep convolutional neural networks into active learning, using complementary multiview information to improve feature learning. Second, through the properly designed framework, the feature representation and the classifier can be updated simultaneously with progressively annotated informative samples. Experiments on two challenging image datasets demonstrate that the proposed DMAL algorithm achieves more promising results than several state-of-the-art active learning algorithms.
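The core multiview idea above, selecting for annotation the unlabeled samples on which the two view-specific networks disagree, can be sketched as follows. This is a minimal illustration of multiview sample selection in general, not the authors' DMAL implementation; the function name and the tie-breaking rule are assumptions.

```python
import numpy as np

def select_informative(probs_view1, probs_view2, k):
    """Rank unlabeled samples by multiview disagreement and uncertainty.

    probs_view1, probs_view2: (n_samples, n_classes) softmax outputs from
    two view-specific networks. Samples where the views disagree
    (contention points) are the most informative to annotate, which is
    the standard multiview active learning selection criterion.
    """
    pred1 = probs_view1.argmax(axis=1)
    pred2 = probs_view2.argmax(axis=1)
    disagree = pred1 != pred2
    # Break ties by the lower of the two max confidences: most uncertain first.
    conf = np.minimum(probs_view1.max(axis=1), probs_view2.max(axis=1))
    # Primary key: disagreement (disagreeing samples first);
    # secondary key: confidence ascending.
    order = np.lexsort((conf, ~disagree))
    return order[:k]
```

In a DMAL-style loop, the top-k samples returned here would be sent to an annotator, and both networks and the classifier would then be updated on the enlarged labeled set.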


Author(s):  
Bing Xu

In the process of e-commerce transactions, a large amount of data is generated, and its effective classification is one of the current research hotspots. An improved feature selection method is proposed based on the characteristics of the Bayesian classification algorithm. Because training and testing modern large-scale data classifiers on a single computer takes a long time, a data classification algorithm based on Naive Bayes was designed and implemented on the Hadoop distributed platform. The experimental results showed that the improved algorithm could effectively improve classification accuracy, and that the parallel Bayesian data classification algorithm was highly efficient and suitable for processing and analyzing massive data.
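The classifier at the heart of the abstract above is standard multinomial Naive Bayes. A minimal single-machine sketch is given below (illustrative, not the paper's Hadoop implementation); the key property that makes it Hadoop-friendly is that training reduces to summing per-class feature counts, so mappers can emit local counts and a reducer aggregates them.

```python
import numpy as np

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing.

    Training needs only per-class feature counts, which is why the
    algorithm parallelizes cleanly in a MapReduce setting: each mapper
    emits local counts and a reducer sums them.
    """

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.log_lik = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = np.log(len(rows) / len(X))
            counts = np.sum(rows, axis=0) + 1.0  # Laplace smoothing
            self.log_lik[c] = np.log(counts / counts.sum())
        return self

    def predict(self, x):
        x = np.asarray(x)
        scores = {c: self.log_prior[c] + (x * self.log_lik[c]).sum()
                  for c in self.classes}
        return max(scores, key=scores.get)
```

For example, trained on small word-count vectors labeled spam/ham, the classifier scores a new vector under each class's log-likelihoods plus the class log-prior and returns the argmax.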


2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
Yang Liu ◽  
Xiang Li ◽  
Xianbang Chen ◽  
Xi Wang ◽  
Huaqiang Li

Currently, data classification is one of the most important ways to analyze data. However, with the development of data collection, transmission, and storage technologies, the scale of data has increased sharply. Additionally, because datasets often contain multiple classes with imbalanced distributions, the class imbalance issue has become increasingly prominent. Traditional machine learning algorithms lack the ability to handle these issues, so classification efficiency and precision may be significantly impacted. Therefore, this paper presents an improved artificial neural network enabling high-performance classification of imbalanced, large-volume data. First, the Borderline-SMOTE (synthetic minority oversampling technique) algorithm is employed to balance the training dataset, which aims to improve the training of the back-propagation neural network (BPNN); then, zero-mean normalization, batch normalization, and the rectified linear unit (ReLU) are employed to optimize the input layer and hidden layers of the BPNN. Finally, an ensemble learning-based parallelization of the improved BPNN is implemented using the Hadoop framework. Positive conclusions can be drawn from the experimental results. Benefiting from Borderline-SMOTE, the imbalanced training dataset can be balanced, which improves training performance and classification accuracy. The improvements to the input and hidden layers also enhance training convergence. The parallelization and ensemble learning techniques enable the BPNN to perform high-performance large-scale data classification. The experimental results show the effectiveness of the presented classification algorithm.
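The Borderline-SMOTE step described above can be sketched as follows. This is a minimal illustration of the general technique (identify minority samples near the class boundary, then interpolate new minority samples toward their minority-class neighbours), not the paper's implementation; the function signature and parameters are assumptions.

```python
import numpy as np

def borderline_smote(X, y, minority, k=5, n_new=10, rng=None):
    """Minimal Borderline-SMOTE sketch: oversample only minority points
    that sit near the class boundary.

    A minority sample is 'borderline' (the DANGER set) when at least half
    but not all of its k nearest neighbours belong to other classes.
    New samples are interpolated between a borderline point and one of
    its minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, float)
    y = np.asarray(y)
    X_min = X[y == minority]

    def knn(p, pool, k):
        d = np.linalg.norm(pool - p, axis=1)
        return np.argsort(d)[1:k + 1]  # skip the point itself

    danger = []
    for x in X_min:
        idx = knn(x, X, k)
        n_maj = np.sum(y[idx] != minority)
        if k / 2 <= n_maj < k:  # borderline, but not pure noise
            danger.append(x)
    if not danger:
        return X, y

    synth = []
    for _ in range(n_new):
        x = danger[rng.integers(len(danger))]
        nb = X_min[knn(x, X_min, min(k, len(X_min) - 1))]
        neighbour = nb[rng.integers(len(nb))]
        synth.append(x + rng.random() * (neighbour - x))
    return np.vstack([X, synth]), np.concatenate([y, [minority] * n_new])
```

The balanced output would then feed the BPNN training step; in practice, a library implementation such as imbalanced-learn's `BorderlineSMOTE` handles the edge cases this sketch omits.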


2006 ◽  
Vol 10 (5) ◽  
pp. 604-616 ◽  
Author(s):  
G. Folino ◽  
C. Pizzuti ◽  
G. Spezzano

2018 ◽  
Vol 23 (11) ◽  
pp. 3793-3801 ◽  
Author(s):  
Tinglong Tang ◽  
Shengyong Chen ◽  
Meng Zhao ◽  
Wei Huang ◽  
Jake Luo
