Large-scale text classification with deeper and wider convolution neural network

2020 ◽  
Vol 15 (1/2) ◽  
pp. 120
Author(s):  
Min Huang ◽  
Wei Huang
IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 171548-171558 ◽  
Author(s):  
Jiaying Wang ◽  
Yaxin Li ◽  
Jing Shan ◽  
Jinling Bao ◽  
Chuanyu Zong ◽  
...  

Symmetry ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 186 ◽  
Author(s):  
Huiming Zhu ◽  
Chunhui He ◽  
Yang Fang ◽  
Bin Ge ◽  
Meng Xing ◽  
...  

With the rapid growth of patent applications, classifying accepted patent application documents automatically, accurately, and quickly has become an urgent problem. Most previous studies on automatic patent classification are based on feature engineering and traditional machine learning methods such as SVM, and some even rely on the knowledge of domain experts; hence they suffer from low accuracy and generalize poorly. In this paper, we propose a patent automatic classification method based on a symmetric hierarchical convolutional neural network (CNN), named PAC-HCNN. We use the title and abstract of each patent as input, and apply word embedding techniques to segment and vectorize the text. We then design a symmetric hierarchical CNN framework that classifies the patents from the word embeddings; it is much more efficient than traditional RNN models for text while still retaining both the history and future information of the input sequence. We also add gated linear units (GLUs) and residual connections to make a deep CNN feasible. Additionally, we equip the model with a self-attention mechanism to address the long-term dependency problem. Experiments are performed on large-scale datasets for Chinese short-text patent classification. The experimental results demonstrate the effectiveness of the proposed model, which performs significantly and consistently better than other state-of-the-art models on both fine-grained and coarse-grained classification.
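
The gated linear units and residual connection mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the PAC-HCNN implementation: the function names (`conv1d_same`, `glu_residual_block`), the tensor shapes, and the zero-padded, centered convolution are assumptions made for illustration. A centered window is used here because it lets each position see both earlier and later tokens, which is one plausible reading of "history and future information".

```python
import numpy as np


def conv1d_same(x, w, b):
    """Zero-padded 1D convolution that preserves sequence length.

    x: (seq_len, d_in) word embeddings
    w: (k, d_in, d_out) kernel with odd width k (assumed)
    b: (d_out,) bias
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # pad along the time axis only
    out = np.empty((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        window = xp[t:t + k]  # (k, d_in), centered on position t
        out[t] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def glu_residual_block(x, w, b):
    """One convolution layer with a gated linear unit and a residual connection.

    The convolution produces 2*d channels: the first half is the candidate
    output, the second half is the gate. The gated result is added back to x.
    """
    d = x.shape[1]
    h = conv1d_same(x, w, b)          # (seq_len, 2*d)
    a, g = h[:, :d], h[:, d:]
    return x + a * sigmoid(g)         # GLU followed by the skip connection


# Hypothetical usage: 6 tokens, 4-dimensional embeddings, kernel width 3.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
w = rng.normal(size=(3, 4, 8)) * 0.1
b = np.zeros(8)
y = glu_residual_block(x, w, b)       # same shape as x, so blocks can be stacked
```

Because the block returns a tensor of the same shape as its input, several such blocks can be stacked, and the skip connection gives gradients a direct path through the stack.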

