Barrage Text Classification with Improved Active Learning and CNN
Traditional convolutional neural networks (CNNs) use a pooling layer to reduce the dimensionality of text features, but pooling discards semantic information. To address this problem, this paper proposes a convolutional neural network model based on the singular value decomposition algorithm (SVD-CNN). First, an improved density-based center point clustering active learning sampling algorithm (DBC-AL) is used to obtain a high-quality training set at a low labelling cost. Second, the model replaces the pooling layer with singular value decomposition for feature extraction and dimensionality reduction, fuses the dimension-reduced matrix, and completes the barrage text classification task. Finally, the partial sampling gradient descent algorithm (PSGD) is applied to optimize the model parameters, which accelerates convergence while keeping training stable. To verify the effectiveness of the improved algorithm, the proposed model was compared with common text classification models on several barrage datasets. The experimental results show that the improved algorithm preserves the semantic features of the text more effectively, keeps the training process stable, and speeds up convergence. Furthermore, the model’s classification performance on different barrage texts is superior to that of traditional algorithms.
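To make the contrast between pooling and SVD-based reduction concrete, the following is a minimal sketch in Python with NumPy. It is not the paper's implementation: the abstract does not specify how SVD-CNN builds or fuses its reduced matrix, so the function names `svd_reduce` and `max_pool_over_time`, the `(n_positions, n_filters)` feature-map layout, and the choice of keeping `k` singular directions are all assumptions made for illustration.

```python
import numpy as np


def max_pool_over_time(feature_map: np.ndarray) -> np.ndarray:
    """Conventional max-over-time pooling: keep one value per filter.

    feature_map has shape (n_positions, n_filters); the result has shape
    (n_filters,), so all positional structure is discarded.
    """
    return feature_map.max(axis=0)


def svd_reduce(feature_map: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical SVD-based alternative to pooling (illustrative only).

    Returns a (k, n_filters) matrix holding the k leading right singular
    vectors scaled by their singular values, i.e. diag(s[:k]) @ Vt[:k].
    """
    _, s, vt = np.linalg.svd(feature_map, full_matrices=False)
    k = min(k, s.shape[0])  # cannot keep more directions than exist
    return s[:k, None] * vt[:k]


# Toy feature map: 20 token positions, 8 convolution filters (random data).
rng = np.random.default_rng(0)
features = rng.standard_normal((20, 8))

pooled = max_pool_over_time(features)   # shape (8,)
reduced = svd_reduce(features, k=3)     # shape (3, 8)
print(pooled.shape, reduced.shape)
```

Under these assumptions, both operations map a variable-length feature map to a fixed-size representation, but the truncated SVD retains the `k` directions that capture the most variance in the feature map, whereas max pooling retains only a single extreme value per filter.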