A gating context-aware text classification model with BERT and graph convolutional networks

2021 ◽  
pp. 1-13
Author(s):  
Weiqi Gao ◽  
Hao Huang

Graph convolutional networks (GCNs), which can effectively process graph-structured data, have been successfully applied to text classification tasks. Existing studies on GCN-based text classification models are largely concerned with using word co-occurrence and Term Frequency-Inverse Document Frequency (TF–IDF) information for graph construction, which to some extent ignores the context information of the texts. To solve this problem, we propose a gating context-aware text classification model with Bidirectional Encoder Representations from Transformers (BERT) and a graph convolutional network, named Gating Context GCN (GC-GCN). More specifically, we integrate the graph embedding with the BERT embedding by using a GCN with a gating mechanism, which enables the acquisition of context encoding. We carry out text classification experiments to show the effectiveness of the proposed model. Experimental results show that our model obtains improvements of 0.19%, 0.57%, 1.05% and 1.17% over the Text-GCN baseline on the 20NG, R8, R52, and Ohsumed benchmark datasets, respectively. Furthermore, to overcome the problem that word co-occurrence and TF–IDF are not suitable for graph construction for short texts, Euclidean distance is combined with word co-occurrence and TF–IDF information. We obtain an improvement of 1.38% on the MR dataset compared to the Text-GCN baseline.
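The abstract does not spell out the form of the gating mechanism. As a minimal sketch, assuming a per-dimension sigmoid gate computed from the concatenated graph and BERT embeddings (the function name `gated_fusion` and the parameters `gate_weights` and `gate_bias` are hypothetical, not taken from the paper), the fusion could look like this:

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(graph_vec, bert_vec, gate_weights, gate_bias):
    """Blend two same-length embeddings with a per-dimension gate.

    Assumption: gate_weights[i] is a weight row over the concatenation
    [graph_vec; bert_vec], and gate_bias[i] is the matching bias term.
    """
    concat = graph_vec + bert_vec  # list concatenation
    fused = []
    for i, (g, b) in enumerate(zip(graph_vec, bert_vec)):
        z = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        gate = sigmoid(z)
        # A gate near 1 keeps the graph signal; near 0 keeps the BERT signal.
        fused.append(gate * g + (1.0 - gate) * b)
    return fused


# With all-zero gate parameters every gate is 0.5, so the result is the mean.
zero_weights = [[0.0] * 4, [0.0] * 4]
fused = gated_fusion([1.0, 0.0], [0.0, 2.0], zero_weights, [0.0, 0.0])
```

In a trained model the gate parameters would be learned jointly with the classifier; the zero-parameter call only illustrates the blending arithmetic.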

Author(s):  
Noha Ali ◽  
Ahmed H. AbuEl-Atta ◽  
Hala H. Zayed

Deep learning (DL) algorithms have achieved state-of-the-art performance in computer vision, speech recognition, and natural language processing (NLP). In this paper, we enhance the convolutional neural network (CNN) algorithm to classify cancer articles according to cancer hallmarks. The model implements a recent word embedding technique in the embedding layer. This technique uses the concept of distributed phrase representation and multi-word phrase embedding. The proposed model enhances the performance of the existing model used for biomedical text classification. The proposed model outperforms the previous model, achieving an F-score of 83.87% using PMC vectors (PMCVec) embeddings, which are produced by an unsupervised technique trained on PubMed abstracts. We also ran another experiment on the same dataset using the recurrent neural network (RNN) algorithm with two different word embeddings, Google News and PMCVec, which achieved F-scores of 74.9% and 76.26%, respectively.


2020 ◽  
Vol 34 (01) ◽  
pp. 27-34 ◽  
Author(s):  
Lei Chen ◽  
Le Wu ◽  
Richang Hong ◽  
Kun Zhang ◽  
Meng Wang

Graph Convolutional Networks (GCNs) are state-of-the-art graph-based representation learning models that iteratively stack multiple layers of convolution aggregation operations and non-linear activation operations. Recently, in Collaborative Filtering (CF) based Recommender Systems (RS), some researchers have modeled higher-layer collaborative signals with GCNs by treating the user-item interaction behavior as a bipartite graph. These GCN-based recommender models show superior performance compared to traditional works. However, these models are difficult to train with non-linear activations on large user-item graphs. Besides, most GCN-based models cannot model deeper layers due to the over-smoothing effect of the graph convolution operation. In this paper, we revisit GCN-based CF models from two aspects. First, we empirically show that removing non-linearities enhances recommendation performance, which is consistent with the theory of simple graph convolutional networks. Second, we propose a residual network structure that is specifically designed for CF with user-item interaction modeling, which alleviates the over-smoothing problem in the graph convolution aggregation operation with sparse user-item interaction data. The proposed model is linear and easy to train, scales to large datasets, and yields better efficiency and effectiveness on two real datasets. We publish the source code at https://github.com/newlei/LR-GCCF.
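The two ideas in this abstract, dropping the non-linearity and keeping a residual path to earlier layers, can be illustrated with a small sketch. It is not the authors' implementation (that lives in the repository linked above): it assumes a symmetrically normalized adjacency with self-loops, omits the learned per-layer weight matrix, and models the residual structure as a concatenation of every layer's embedding. All function names are hypothetical.

```python
import math


def normalized_adjacency(edges, num_nodes):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0  # self-loop
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    return [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def linear_propagate(norm_adj, emb):
    """One propagation layer with no activation function."""
    n, dim = len(emb), len(emb[0])
    return [
        [sum(norm_adj[i][k] * emb[k][d] for k in range(n)) for d in range(dim)]
        for i in range(n)
    ]


def residual_embeddings(norm_adj, emb, num_layers):
    """Concatenate the embeddings of all layers so early layers are preserved."""
    layers = [emb]
    for _ in range(num_layers):
        layers.append(linear_propagate(norm_adj, layers[-1]))
    return [sum((layer[i] for layer in layers), []) for i in range(len(emb))]


# Toy bipartite graph: node 0 is a user, node 1 is an item they interacted with.
norm_adj = normalized_adjacency([(0, 1)], num_nodes=2)
final = residual_embeddings(norm_adj, [[1.0], [3.0]], num_layers=1)
```

Because each node keeps its layer-0 embedding alongside the smoothed one, the user and item stay distinguishable even though the propagated values coincide.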


2021 ◽  
Vol 11 (21) ◽  
pp. 9910
Author(s):  
Yo-Han Park ◽  
Gyong-Ho Lee ◽  
Yong-Seok Choi ◽  
Kong-Joo Lee

Sentence compression is a natural language-processing task that produces a short paraphrase of an input sentence by deleting words from the input sentence while ensuring grammatical correctness and preserving meaningful core information. This study introduces a graph convolutional network (GCN) into a sentence compression task to encode syntactic information, such as dependency trees. As we upgrade the GCN to activate a directed edge, the compression model with the GCN layers can distinguish between parent and child nodes in a dependency tree when aggregating adjacent nodes. Furthermore, by increasing the number of GCN layers, the model can gradually collect high-order information of a dependency tree when propagating node information through the layers. We implement a sentence compression model for Korean and English, respectively. This model consists of three components: pre-trained BERT model, GCN layers, and a scoring layer. The scoring layer can determine whether a word should remain in a compressed sentence by relying on the word vector containing contextual and syntactic information encoded by BERT and GCN layers. To train and evaluate the proposed model, we used the Google sentence compression dataset for English and a Korean sentence compression corpus containing about 140,000 sentence pairs for Korean. The experimental results demonstrate that the proposed model achieves state-of-the-art performance for English. To the best of our knowledge, this sentence compression model based on the deep learning model trained with a large-scale corpus is the first attempt for Korean.
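To make the role of directed edges concrete, here is a minimal sketch of one aggregation step over a dependency tree. It assumes scalar weights per edge direction and a ReLU activation; a real GCN layer would use learned weight matrices, and the names `directed_gcn_layer`, `w_from_parent` and `w_from_child` are hypothetical.

```python
def directed_gcn_layer(features, heads, w_self, w_from_parent, w_from_child):
    """Aggregate neighbours with a different weight per edge direction.

    features: one feature vector per word.
    heads: heads[i] is the index of word i's parent, or None for the root.
    """
    n, dim = len(features), len(features[0])
    out = [[w_self * x for x in features[i]] for i in range(n)]
    for child, parent in enumerate(heads):
        if parent is None:  # the root has no incoming parent edge
            continue
        for d in range(dim):
            out[child][d] += w_from_parent * features[parent][d]
            out[parent][d] += w_from_child * features[child][d]
    return [[max(0.0, x) for x in row] for row in out]  # ReLU


# Word 0 is the root; words 1 and 2 are its children.
hidden = directed_gcn_layer(
    features=[[1.0], [2.0], [4.0]],
    heads=[None, 0, 0],
    w_self=1.0,
    w_from_parent=0.5,
    w_from_child=0.25,
)
```

Applying the layer twice lets word 1 receive information from its sibling word 2 through the shared parent, which is the sense in which stacking layers collects higher-order tree information.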


2020 ◽  
Vol 10 (12) ◽  
pp. 4081
Author(s):  
Zhe Wang ◽  
Chun-Hua Wu ◽  
Qing-Biao Li ◽  
Bo Yan ◽  
Kang-Feng Zheng

Personality recognition is a classic and important problem in social engineering. Due to the small number and particularity of personality recognition databases, only limited research has explored convolutional neural networks for this task. In this paper, we explore the use of graph convolutional network techniques for inferring a user’s personality traits from their Facebook status updates or essay information. Since the basic five personality traits (such as openness) and their aspects (such as status information) are related to a wide range of text features, this work takes the Big Five personality model as the core of the study. We construct a single user personality graph for the corpus based on user-document relations, document-word relations, and word co-occurrence and then learn the personality graph convolutional networks (personality GCN) for the user. The parameters or the inputs of our personality GCN are initialized with a one-hot representation for users, words and documents; then, under the supervision of users and documents with known class labels, it jointly learns the embeddings for users, words, and documents. We used feature information sharing to incorporate the correlation between the five personality traits into personality recognition to perfect the personality GCN. Our experimental results on two public and authoritative benchmark datasets show that the general personality GCN without any external word embeddings or knowledge is superior to the state-of-the-art methods for personality recognition. The personality GCN method is efficient on small datasets, and the average F1-score and accuracy of personality recognition are improved by up to approximately 3.6% and 2.4–2.57%, respectively.
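The three relation types named in the abstract (user-document, document-word, word co-occurrence) can be sketched as a single edge set. This is an illustrative assumption rather than the authors' construction: edges are unweighted, co-occurrence uses a fixed sliding window, and the node labelling scheme and the function name `build_personality_graph` are hypothetical.

```python
def build_personality_graph(user_docs, window_size=2):
    """Build an undirected edge set over user, document and word nodes.

    user_docs: dict mapping a user id to a list of tokenized documents.
    Nodes are (kind, identifier) tuples so the three node types cannot clash.
    """
    edges = set()
    doc_id = 0
    for user, docs in user_docs.items():
        for tokens in docs:
            doc_node = ("doc", doc_id)
            doc_id += 1
            edges.add((("user", user), doc_node))  # user-document relation
            for tok in tokens:
                edges.add((doc_node, ("word", tok)))  # document-word relation
            for start in range(len(tokens)):
                window = tokens[start:start + window_size]
                for a in window:
                    for b in window:
                        if a < b:  # store each co-occurring pair once
                            edges.add((("word", a), ("word", b)))
    return edges


graph_edges = build_personality_graph({"u1": [["happy", "day"]]})
```

Nodes of this graph would then be given one-hot input features, as the abstract describes, before the GCN learns embeddings for users, documents and words jointly.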


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Fangyuan Lei ◽  
Xun Liu ◽  
Qingyun Dai ◽  
Bingo Wing-Kuen Ling ◽  
Huimin Zhao ◽  
...  

With the higher-order neighborhood information of a graph network, the accuracy of graph representation learning classification can be significantly improved. However, the current higher-order graph convolutional networks have a large number of parameters and high computational complexity. Therefore, we propose a hybrid lower-order and higher-order graph convolutional network (HLHG) learning model, which uses a weight sharing mechanism to reduce the number of network parameters. To reduce the computational complexity, we propose a novel information fusion pooling layer to combine the high-order and low-order neighborhood matrix information. We theoretically compare the computational complexity and the number of parameters of the proposed model with those of the other state-of-the-art models. Experimentally, we verify the proposed model on large-scale text network datasets using supervised learning and on citation network datasets using semisupervised learning. The experimental results show that the proposed model achieves higher classification accuracy with a small set of trainable weight parameters.
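A rough sketch of combining lower-order and higher-order neighbourhoods with shared weights follows. The abstract does not define the fusion pooling layer, so element-wise max over the one-hop and two-hop propagations is an assumption here, as are the function names; adjacency normalization is also omitted for brevity.

```python
def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def fused_propagation(adj, features, weight):
    """Propagate over 1-hop and 2-hop neighbourhoods, then fuse them.

    The same weight matrix is used for both orders (weight sharing), and the
    two results are combined with an element-wise max (assumed pooling).
    """
    projected = matmul(features, weight)
    first_order = matmul(adj, projected)      # A X W
    second_order = matmul(adj, first_order)   # A^2 X W
    return [
        [max(x, y) for x, y in zip(r1, r2)]
        for r1, r2 in zip(first_order, second_order)
    ]


# Path graph 0 - 1 - 2 where only node 0 carries a signal.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
fused_out = fused_propagation(adj, [[1.0], [0.0], [0.0]], [[1.0]])
```

Node 2 is two hops from node 0, so it receives the signal only through the second-order term; the single shared weight matrix is what keeps the parameter count from growing with the neighbourhood order.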


2021 ◽  
Author(s):  
Gabriel Andres Orellana ◽  
Javier Caceres-Delpiano ◽  
Roberto Ibañez ◽  
Leonardo Álvarez

The increasing integration between protein engineering and machine learning has led to many interesting results. A problem still to be solved is evaluating the likelihood that a sequence will fold into a target structure. This problem can also be viewed as sequence prediction from a known structure. In the current work, we propose improvements to the recent architecture of Geometric Vector Perceptrons in order to optimize the sampling of sequences from a known backbone structure. The proposed model differs from the original in that there is: (i) no updating in the vectorial embedding, only in the scalar one, and (ii) only one layer of decoding. The first aspect improves the accuracy of the model and reduces memory use; the second allows the model to be trained on several tasks without incurring data leakage. We treat the trained classifier as an Energy-Based Model and sample sequences by sampling amino acids in a non-autoregressive manner in the empty positions of the sequence using energy-guided criteria, followed by Monte Carlo optimization. We improve the median identity of samples from 40.2% to 44.7%. An additional question worth investigating is whether sampled and original sequences fold into similar structures independent of their identity. We chose proteins in our test set whose sampled sequences show low identity (under 30%) but for which our model predicted favorable energies. We used the trRosetta server and observed that the predicted structures for sampled sequences highly resemble the predicted structures for original sequences, with an average TM score of 0.848.
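The abstract mentions Monte Carlo optimization guided by an energy function without giving details. A minimal sketch, assuming single-position mutations accepted with the Metropolis criterion, is shown below; `energy_fn` stands in for the trained model's energy, and here it is replaced by a toy mismatch count purely for illustration.

```python
import math
import random


def metropolis_accept(current_energy, proposed_energy, temperature, rng):
    """Always accept improvements; accept worse moves with Boltzmann probability."""
    if proposed_energy <= current_energy:
        return True
    delta = proposed_energy - current_energy
    return rng.random() < math.exp(-delta / temperature)


def monte_carlo_optimize(sequence, energy_fn, alphabet, steps, temperature, seed=0):
    """Mutate one position at a time and return the lowest-energy sequence seen."""
    rng = random.Random(seed)
    current = list(sequence)
    current_energy = energy_fn(current)
    best, best_energy = list(current), current_energy
    for _ in range(steps):
        proposal = list(current)
        proposal[rng.randrange(len(proposal))] = rng.choice(alphabet)
        proposed_energy = energy_fn(proposal)
        if metropolis_accept(current_energy, proposed_energy, temperature, rng):
            current, current_energy = proposal, proposed_energy
            if current_energy < best_energy:
                best, best_energy = list(current), current_energy
    return "".join(best), best_energy


# Toy energy (assumption): number of mismatches against a fixed target string.
TARGET = "ACDE"


def toy_energy(seq):
    return float(sum(1 for a, b in zip(seq, TARGET) if a != b))


start = "AAAA"
best_seq, best_e = monte_carlo_optimize(start, toy_energy, "ACDE", steps=200, temperature=0.5)
```

The routine tracks the best sequence seen rather than the final state, so the returned energy can never be worse than that of the starting sequence.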


Author(s):  
Teng Jiang ◽  
Liang Gong ◽  
Yupu Yang

The attention-based encoder–decoder framework has greatly improved image caption generation. The attention mechanism plays a transitional role by transforming static image features into sequential captions. To generate reasonable captions, it is important to detect the spatial characteristics of images. In this paper, we propose a spatial relational attention approach that considers spatial positions and attributes. Image features are first weighted by the attention mechanism. Then they are concatenated with contextual features to form a spatial–visual tensor. A fully convolutional network extracts features from this tensor to produce visual concepts for the decoder network. The fully convolutional layers maintain the spatial topology of images. Experiments conducted on three benchmark datasets, namely Flickr8k, Flickr30k and MSCOCO, demonstrate the effectiveness of our proposed approach. Captions generated by the spatial relational attention method precisely capture the spatial relations of objects.


Author(s):  
Ihsan Ullah ◽  
Mario Manzo ◽  
Mitul Shah ◽  
Michael G. Madden

A graph can represent a complex organization of data in which dependencies exist between multiple entities or activities. Such complex structures create challenges for machine learning algorithms, particularly when combined with the high dimensionality of data in current applications. Graph convolutional networks were introduced to adopt concepts from deep convolutional networks (i.e. the convolutional operations/layers) that have shown good results. In this context, we propose two major enhancements to two of the existing graph convolutional network frameworks: (1) topological information enrichment through clustering coefficients; and (2) structural redesign of the network through the addition of dense layers. Furthermore, we propose minor enhancements using convex combinations of activation functions and hyper-parameter optimization. We present extensive results on four state-of-the-art benchmark datasets. We show that our approach achieves competitive results for three of the datasets and state-of-the-art results for the fourth dataset while having lower computational costs compared to competing methods.
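The clustering coefficient used for topological enrichment has a standard definition: for a node with k neighbours, it is the fraction of the k(k-1)/2 possible neighbour pairs that are themselves connected. The sketch below computes it for an undirected graph; how the paper feeds the value into the network is not stated in the abstract, so appending it as an extra node feature is only one plausible use.

```python
def local_clustering_coefficients(adjacency):
    """Compute the local clustering coefficient of every node.

    adjacency: dict mapping each node to the set of its neighbours
    (undirected graph, no self-loops, comparable node identifiers).
    """
    coeffs = {}
    for node, nbrs in adjacency.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0  # fewer than two neighbours: no pairs to connect
            continue
        # Count each connected neighbour pair once via the u < v ordering.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adjacency[u])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
coeffs = local_clustering_coefficients(toy_graph)
```

Nodes 0 and 1 sit in a closed triangle and score 1.0, node 2 has only one of its three neighbour pairs connected, and the pendant node scores 0.0.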


2020 ◽  
Vol 34 (05) ◽  
pp. 8409-8416
Author(s):  
Xien Liu ◽  
Xinxin You ◽  
Xiao Zhang ◽  
Ji Wu ◽  
Ping Lv

Compared to sequential learning models, graph-based neural networks exhibit some excellent properties, such as the ability to capture global information. In this paper, we investigate graph-based neural networks for the text classification problem. A new framework, TensorGCN (tensor graph convolutional networks), is presented for this task. A text graph tensor is first constructed to describe semantic, syntactic, and sequential contextual information. Then, two kinds of propagation learning are performed on the text graph tensor. The first is intra-graph propagation, used for aggregating information from neighboring nodes in a single graph. The second is inter-graph propagation, used for harmonizing heterogeneous information between graphs. Extensive experiments are conducted on benchmark datasets, and the results illustrate the effectiveness of our proposed framework. Our proposed TensorGCN presents an effective way to harmonize and integrate heterogeneous information from different kinds of graphs.

