Toxic Comment Classification Based on Bidirectional Gated Recurrent Unit and Convolutional Neural Network
For English toxic comment classification, this paper presents BG-GCNN, a model that combines a bidirectional gated recurrent unit (Bi-GRU) with a convolutional neural network (CNN) optimized by global average pooling. The model treats each type of toxic comment as a separate binary classification task. First, the Bi-GRU extracts the time-series features of the comment; the global-pooling-optimized CNN then reduces their dimensionality. Finally, a sigmoid function outputs the classification result. Comparative experiments show that BG-GCNN classifies better than Text-CNN, LSTM, Bi-GRU, and other models. On the toxic comment dataset from the Kaggle competition platform, the model achieves a Macro-F1 of 0.62. The F1 values for the toxic, obscene, and insult labels are 0.81, 0.84, and 0.74, respectively, which are the highest values in the comparative experiment.
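To make the evaluation setup concrete, the following minimal sketch shows how per-label binary decisions and a Macro-F1 score can be computed from sigmoid outputs. It is an illustration under stated assumptions, not the paper's implementation: the 0.5 decision threshold, the helper names (`predict`, `f1_score`, `macro_f1`), and the toy logits and ground-truth labels are all hypothetical, and only the three label names come from the text above.

```python
import math


def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(logits, threshold=0.5):
    """Turn logits into 0/1 decisions; the 0.5 threshold is an assumption."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def f1_score(y_true, y_pred):
    """Binary F1 for one label, computed from TP, FP, and FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: define F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true_by_label, y_pred_by_label):
    """Unweighted mean of the per-label F1 scores."""
    scores = {
        label: f1_score(y_true_by_label[label], y_pred_by_label[label])
        for label in y_true_by_label
    }
    return sum(scores.values()) / len(scores), scores


# Toy data (made up for illustration): four comments, three labels.
logits_by_label = {
    "toxic": [2.0, -1.0, 0.5, -3.0],
    "obscene": [1.5, -2.0, -0.5, -1.0],
    "insult": [-0.2, -1.5, 1.0, -2.5],
}
y_true_by_label = {
    "toxic": [1, 0, 1, 0],
    "obscene": [1, 0, 1, 0],
    "insult": [1, 0, 1, 0],
}

# Each label is handled as an independent binary classification.
y_pred_by_label = {
    label: predict(logits) for label, logits in logits_by_label.items()
}
macro, per_label = macro_f1(y_true_by_label, y_pred_by_label)
print(per_label, macro)
```

Because Macro-F1 is an unweighted mean over labels, a label on which the classifier scores poorly lowers the overall value as much as a label on which it scores well raises it.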