A Character-Word Graph Attention Networks for Chinese Text Classification

Author(s):  
Shigang Yang ◽  
Yongguo Liu
Author(s):  
Weipeng Jing ◽  
Xianyang Song ◽  
Donglin Di ◽  
Houbing Song

In geographic information processing, research on geographic text classification is limited, and applications of this task to Chinese text are rarer still. In our work, we aim to implement a method for extracting texts that contain geographical entities from a large volume of online text. The geographic information in these texts is of great practical significance to transportation, urban and rural planning, disaster relief, and other fields. To achieve this, we use a graph convolutional neural network with an attention mechanism. The graph attention network (GAT) is an improvement on the graph convolutional network (GCN). Compared with GCN, the advantage of GAT is that it uses an attention mechanism to weight the sum of the features of adjacent vertices. In addition, we construct a Chinese dataset with geographical categories from multiple existing Chinese text classification datasets. The geoGAT model we use reaches a Macro-F score of 95% on the new Chinese dataset.
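The attention-weighted sum over adjacent vertices mentioned in the abstract can be illustrated with a minimal single-head layer. This is a sketch of the commonly cited GAT formulation (score each neighbor with a LeakyReLU over a learned vector applied to the concatenated transformed features, normalize with softmax, then sum), not the geoGAT implementation itself. The function name `gat_layer`, the toy graph, and all weights are assumptions for illustration, and the final nonlinearity is omitted for brevity.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU; the slope of 0.2 is the value commonly used for GAT."""
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention layer (illustrative sketch).

    features:  list of N input vectors, each of length F
    neighbors: dict mapping node index -> list of neighbor indices
               (include the node itself to add a self-loop)
    W:         weight matrix as a list of F_out rows, each of length F
    a:         attention vector of length 2 * F_out
    Returns (outputs, attention), where attention[i] holds the
    normalized weights over neighbors[i].
    """
    # Shared linear transform: h_i = W x_i
    h = [[sum(w * x for w, x in zip(row, feat)) for row in W]
         for feat in features]

    outputs, attention = [], []
    for i, h_i in enumerate(h):
        nbrs = neighbors[i]
        # Unnormalized score e_ij = LeakyReLU(a . [h_i || h_j])
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, h_i + h[j])))
                  for j in nbrs]
        # Softmax over the neighborhood (shifted by the max for stability)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of the neighbors' transformed features
        outputs.append([sum(al * h[j][k] for al, j in zip(alphas, nbrs))
                        for k in range(len(h_i))])
        attention.append(alphas)
    return outputs, attention


# Toy graph: 3 vertices with self-loops (all values are hypothetical)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]   # identity transform, for readability
a = [0.5, -0.5, 1.0, 0.25]     # arbitrary attention parameters

outputs, attention = gat_layer(features, neighbors, W, a)
```

If the attention vector `a` is all zeros, every neighbor receives the same weight and each output is the plain average of the neighborhood. In that setting attention no longer distinguishes between neighbors.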


Author(s):  
Hanqing Tao ◽  
Shiwei Tong ◽  
Hongke Zhao ◽  
Tong Xu ◽  
Binbin Jin ◽  
...  

In recent years, Chinese text classification has attracted increasing research attention. However, most existing techniques, which specifically target English materials, may lose effectiveness on this task because of the large differences between Chinese and English. As a special kind of hieroglyphics, Chinese characters and radicals are semantically useful but remain unexplored in the task of text classification. To that end, in this paper, we first analyze the motivation for using features at multiple granularities to represent a Chinese text by inspecting the characteristics of radicals, characters, and words. To better represent Chinese text and thereby perform Chinese text classification, we propose a novel Radical-aware Attention-based Four-Granularity (RAFG) model that takes full advantage of Chinese characters, words, character-level radicals, and word-level radicals simultaneously. Specifically, RAFG applies a serialized BLSTM structure, which is context-aware and able to capture long-range information, to model the character-sharing property of Chinese and the sequential characteristics of texts. Further, we design an attention mechanism that enhances the effect of radicals and thus models the radical-sharing property when integrating granularities. Finally, we conduct extensive experiments, whose results not only show the superiority of our model but also validate the effectiveness of radicals in the task of Chinese text classification.
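The abstract does not specify the form of the attention used when integrating granularities, so the following is only a generic illustration of soft attention over several representations, not the RAFG mechanism. It assumes each granularity (characters, words, character-level radicals, word-level radicals) has already been encoded into a fixed-size vector, and that a query vector is available to score them. The function name `attention_fuse`, the query, and all numbers are hypothetical.

```python
import math


def attention_fuse(vectors, query):
    """Fuse equally sized vectors with softmax dot-product attention.

    vectors: list of representations, one per granularity
    query:   vector of the same length used to score each representation
    Returns (fused_vector, weights).
    """
    # Dot-product relevance score of each representation against the query
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    # Softmax normalization (shifted by the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the representations
    dim = len(vectors[0])
    fused = [sum(w * vec[k] for w, vec in zip(weights, vectors))
             for k in range(dim)]
    return fused, weights


# Hypothetical encodings of one text at four granularities
granularities = [
    [0.2, 0.1, 0.4],   # characters
    [0.3, 0.0, 0.5],   # words
    [0.9, 0.8, 0.1],   # character-level radicals
    [0.7, 0.6, 0.2],   # word-level radicals
]
query = [1.0, 1.0, 0.0]  # hypothetical query vector

fused, weights = attention_fuse(granularities, query)
```

With this toy query, the two radical vectors have the largest dot products and therefore receive the largest weights. This shows how an attention step can emphasize some granularities over others when they are combined.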

