Relation classification via BERT with piecewise convolution and focal loss

PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257092
Author(s):  
Jianyi Liu ◽  
Xi Duan ◽  
Ru Zhang ◽  
Youqiang Sun ◽  
Lei Guan ◽  
...  

The architectures of recent relation extraction models have evolved from shallow neural networks, such as convolutional and recurrent neural networks, to pre-trained language models such as BERT. However, these methods neither fully exploit the semantic information within the sequence nor address the long-distance dependence problem, even though the internal semantic information may contain useful knowledge for relation classification. To address these problems, this paper proposes a BERT-based relation classification method. Compared with existing BERT-based architectures, the proposed model better captures the internal semantic information between an entity pair and better handles long-distance semantic dependence. A fine-tuned pre-trained BERT model is used to extract the semantic representation of the sequence, and piecewise convolution is then applied to capture the semantic information that influences the extraction results. Compared with existing methods, the proposed method achieves better accuracy on the relation extraction task because of the internal semantic information it extracts from the sequence. Generalization remains a concern, however, because the number of instances differs widely across relation categories. This paper adopts the focal loss function to address this imbalance by assigning heavier weights to rare or hard-to-classify categories. On the SemEval-2010 Task 8 dataset, the F1 score of the proposed method reaches 89.95%, surpassing the existing methods.
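
A minimal PyTorch sketch of the focal loss term the paper adopts for imbalanced relation categories; the gamma value and optional per-class weights here are assumptions, not the paper's reported settings:

```python
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Focal loss: down-weights easy examples so training focuses on
    rare or hard-to-classify relation categories."""
    def __init__(self, gamma=2.0, weight=None):
        super().__init__()
        self.gamma = gamma      # assumed focusing parameter
        self.weight = weight    # optional per-class weights for rare relations

    def forward(self, logits, targets):
        log_probs = F.log_softmax(logits, dim=-1)
        ce = F.nll_loss(log_probs, targets, weight=self.weight, reduction="none")
        p_t = log_probs.exp().gather(1, targets.unsqueeze(1)).squeeze(1)
        return ((1.0 - p_t) ** self.gamma * ce).mean()
```

With gamma = 0 this reduces to ordinary cross-entropy; larger gamma shifts the loss mass toward misclassified examples.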

Algorithms ◽  
2019 ◽  
Vol 12 (4) ◽  
pp. 85 ◽  
Author(s):  
Ioannis E. Livieris

During the last few decades, machine learning has been a significant tool for extracting useful knowledge from economic data to assist decision-making. In this work, we evaluate the performance of weight-constrained recurrent neural networks in forecasting economic classification problems. These networks are efficiently trained with a recently proposed training algorithm, which has two major advantages: first, it exploits the numerical efficiency and very low memory requirements of limited-memory BFGS matrices; second, it utilizes a gradient-projection strategy for handling the bounds on the weights. The reported numerical experiments present the classification accuracy of the proposed model, providing empirical evidence that applying bounds to the weights of a recurrent neural network yields more stable and reliable learning.
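
The paper's trainer is a limited-memory BFGS variant; the sketch below illustrates only the gradient-projection idea it relies on, with assumed box bounds and an ordinary optimizer standing in for L-BFGS:

```python
import torch

def project_weights(model, lower=-1.0, upper=1.0):
    """Project every weight back into its box bounds after an update.
    The bound values here are assumptions for illustration."""
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(lower, upper)

# Usage sketch: re-impose the weight constraints after each step.
# optimizer.step()
# project_weights(rnn, lower=-2.0, upper=2.0)
```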


Author(s):  
Muhammad Asif Ali ◽  
Yifang Sun ◽  
Xiaoling Zhou ◽  
Wei Wang ◽  
Xiang Zhao

Distinguishing antonyms from synonyms is a key challenge for many NLP applications focused on lexical-semantic relation extraction. Existing solutions relying on large-scale corpora yield low performance because of the huge contextual overlap between antonym and synonym pairs. We propose a novel approach based entirely on pre-trained embeddings. We hypothesize that pre-trained embeddings encode a blend of lexical-semantic information and that the task-specific information can be distilled from them using Distiller, a model proposed in this paper. A classifier is then trained on features constructed from the distilled sub-spaces, along with some word-level features, to distinguish antonyms from synonyms. Experimental results show that the proposed model outperforms existing work on antonym-synonym distinction in both speed and performance.
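
Distiller's internals are not detailed in the abstract; the sketch below is one plausible reading, assuming a learned linear projection into a task-specific sub-space followed by a pair classifier (all names and dimensions are hypothetical):

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """Hypothetical sketch: project pre-trained embeddings of a word pair
    into a distilled sub-space, then classify antonym vs. synonym."""
    def __init__(self, emb_dim=300, sub_dim=64):
        super().__init__()
        self.proj = nn.Linear(emb_dim, sub_dim)   # distilled sub-space
        self.clf = nn.Sequential(
            nn.Linear(2 * sub_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, w1, w2):                    # (batch, emb_dim) each
        z = torch.cat([self.proj(w1), self.proj(w2)], dim=-1)
        return self.clf(z)                        # antonym/synonym logits
```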


2019 ◽  
Vol 9 (7) ◽  
pp. 1330 ◽  
Author(s):  
Yalong Jiang ◽  
Zheru Chi

Although state-of-the-art performance has been achieved in pixel-specific tasks such as saliency prediction and depth estimation, convolutional neural networks (CNNs) still perform unsatisfactorily in human parsing, where the semantic information of detailed regions must be perceived despite variations in viewpoint, pose, and occlusion. In this paper, we propose to improve the robustness of human parsing modules by introducing a depth-estimation module. A novel scheme is proposed for integrating the depth-estimation module with a human-parsing module, and the robustness of the overall model is improved with the automatically obtained depth labels. Computational efficiency is another major concern: our proposed human parsing module with 24 layers achieves performance similar to that of the baseline CNN model with over 100 layers, and the overall model has fewer parameters than the baseline. Furthermore, we propose to reduce the computational burden by replacing a conventional CNN layer with a stack of simplified sub-layers, further reducing the number of trainable parameters. Experimental results show that integrating the two modules improves human parsing without additional human labeling. The proposed model outperforms the benchmark solutions, and its capacity is better matched to the complexity of the task.
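
The abstract does not specify the simplified sub-layers; a common parameter-reducing substitution consistent with the description is a depthwise-plus-pointwise stack, sketched here under that assumption:

```python
import torch.nn as nn

def simplified_block(in_ch, out_ch):
    """Replace one k x k conv (k*k*in_ch*out_ch parameters) with a
    depthwise conv (k*k*in_ch) plus a pointwise conv (in_ch*out_ch)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```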


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Tiantian Chen ◽  
Nianbin Wang ◽  
Hongbin Wang ◽  
Haomin Zhan

Distant supervision (DS) has been widely used for relation extraction (RE) because it automatically generates large-scale labeled data. However, DS suffers from a wrong-labeling problem that degrades RE performance, and existing methods lack useful semantic features for some positive training instances. To address these problems, we propose a novel model with sentence selection and interaction representation for distantly supervised RE. First, we propose a pattern-based sentence selector built on relation trigger words to filter out noisy sentences and alleviate the wrong-labeling problem. After clean instances are obtained, we construct an interaction representation using a word-level attention mechanism keyed on the entity pairs, which dynamically increases the weights of words related to the entity pair and thus provides more useful semantic information for relation prediction. The proposed model outperforms the strongest baseline by 2.61 F1 points on a widely used dataset, demonstrating that it performs significantly better than state-of-the-art RE systems.
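
A sketch of the entity-pair-keyed word-level attention described above; averaging the two entity vectors into a single query is an assumed design choice, not a confirmed detail of the paper:

```python
import torch

def entity_pair_attention(H, e1, e2):
    """H: (T, d) word representations; e1, e2: (d,) entity vectors.
    Words similar to the entity pair receive larger weights."""
    query = (e1 + e2) / 2                   # assumed fusion of the pair
    alpha = torch.softmax(H @ query, dim=0) # (T,) attention weights
    return (alpha.unsqueeze(1) * H).sum(0)  # weighted sentence vector
```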


2020 ◽  
Vol 34 (05) ◽  
pp. 9620-9627 ◽  
Author(s):  
Zhenyu Zhang ◽  
Xiaobo Shu ◽  
Bowen Yu ◽  
Tingwen Liu ◽  
Jiapeng Zhao ◽  
...  

Extracting relations from plain text is an important task with wide application. Most existing methods formulate it as a supervised problem and use one-hot hard labels as the sole training target, neglecting the rich semantic information shared among relations. In this paper, we explore supervision with soft labels in relation extraction, which makes it possible to integrate prior knowledge. Specifically, a bipartite graph is first devised to discover type constraints between entities and relations based on the entire corpus. We then combine these type constraints with neural networks to obtain a knowledgeable model. This model is regarded as a teacher that generates well-informed soft labels and guides the optimization of a student network via knowledge distillation. In addition, a multi-aspect attention mechanism is introduced to help the student mine latent information from text. In this way, the enhanced student inherits the dark knowledge (e.g., type constraints and relevance among relations) from the teacher and serves testing scenarios directly, without any extra constraints. We conduct extensive experiments on the TACRED and SemEval datasets, and the results justify the effectiveness of our approach.
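
A minimal sketch of the teacher-to-student distillation objective: hard-label cross-entropy mixed with a KL term toward the teacher's temperature-softened relation distribution. The temperature T and mixing weight alpha are assumed hyperparameters:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Blend supervision from one-hot labels with the teacher's soft labels."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```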


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Pengpeng Zhou ◽  
Yao Luo ◽  
Nianwen Ning ◽  
Zhen Cao ◽  
Bingjing Jia ◽  
...  

In the era of today's rapidly developing Internet, people often feel overwhelmed by vast official news streams and unofficial self-media tweets. To help people find the news topics they care about, there is a growing need for systems that can extract important events from this flood of data and logically assemble the evolution of events into a story. Most existing methods treat event detection and evolution as two independent subtasks in an integrated pipeline setting. However, the interdependence between the two subtasks is often ignored, which leads to biased propagation between them. Moreover, limitations in the semantic representation of news documents further constrain the performance of event detection and evolution. To tackle these problems, in this paper we propose a Joint Event Detection and Evolution (JEDE) model that detects events and discovers event evolution relationships from news streams. Specifically, JEDE is built on a Siamese network, first introducing a bidirectional GRU attention network to learn a vector-based semantic representation for news documents that is shared across the two subtask networks. Two continuous similarity metrics are then learned with stacked neural networks to judge whether two news documents relate to the same event or two events relate to the same story. Furthermore, because datasets with ground-truth labels are scarce, we constructed a new dataset, named EDENS, which contains valid labels for events and stories. Experimental results on this newly created dataset demonstrate that, thanks to the shared representation and joint training, the proposed model consistently achieves significant improvements over the baseline methods.
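
A sketch of the shared Siamese encoder idea: one bidirectional GRU encodes both documents, and a stacked feed-forward head scores their similarity. All sizes and the mean-pooling step are assumptions:

```python
import torch
import torch.nn as nn

class SiameseBiGRU(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(4 * hid_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, 1))

    def encode(self, x):                    # x: (batch, seq_len) token ids
        h, _ = self.gru(self.emb(x))
        return h.mean(dim=1)                # pooled document vector

    def forward(self, doc_a, doc_b):
        v = torch.cat([self.encode(doc_a), self.encode(doc_b)], dim=-1)
        return torch.sigmoid(self.head(v))  # same-event probability
```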


2020 ◽  
Vol 10 (21) ◽  
pp. 7547
Author(s):  
Youngjin Jang ◽  
Harksoo Kim

Recently, the performance of machine reading comprehension (MRC) systems has been significantly enhanced. However, MRC systems require high-performance text retrieval models because text passages containing answer phrases must be prepared in advance. To improve the performance of the text retrieval models underlying MRC systems, we propose a re-ranking model, based on artificial neural networks, composed of a query encoder, a passage encoder, a phrase modeling layer, an attention layer, and a similarity network. The proposed model learns the degree of association between queries and text passages through dot products between the phrases that constitute them. In experiments on the MS-MARCO dataset, the proposed model achieved mean reciprocal ranks (MRRs) higher by 0.8 to 13.2 percentage points than most of the previous models, except those based on BERT (a pre-trained language model). Although the proposed model yielded lower MRRs than the BERT-based models, it was approximately 8 times lighter and 3.7 times faster.
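
A sketch of the phrase-level dot-product interaction the model learns from; the max-then-mean aggregation is an assumption, not necessarily the paper's exact similarity network:

```python
import torch

def phrase_interaction_score(Q, P):
    """Q: (m, d) query phrase vectors; P: (n, d) passage phrase vectors.
    Each query phrase finds its best-matching passage phrase."""
    sim = Q @ P.T                           # (m, n) dot products
    return sim.max(dim=1).values.mean()     # one association score
```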


2021 ◽  
pp. 1-10
Author(s):  
Shuai Zhao ◽  
Fucheng You ◽  
Wen Chang ◽  
Tianyu Zhang ◽  
Man Hu

The BERT pre-trained language model has achieved good results on various natural language processing subtasks, but its performance in generating Chinese summaries is not ideal. The most intuitive reason is that BERT operates at the character level, whereas Chinese meaning is mostly carried by multi-character phrases, so directly fine-tuning BERT cannot achieve the expected effect. This paper proposes a novel summary generation model that augments BERT with a pooling layer. In our model, we perform an average pooling operation on the token embeddings to improve the model's ability to capture phrase-level semantic information. We use the LCSTS and NLPCC2017 datasets to verify the proposed method. Experimental results show that introducing the average pooling module effectively improves the quality of the generated summaries, and comparative analysis shows that different datasets require different pooling kernel sizes to achieve the best results. In addition, the proposed method generalizes well: it can be applied not only to summary generation but also to other natural language processing tasks.
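
A sketch of the pooling augmentation: a small average-pooling window slides over BERT's character-level token embeddings so adjacent characters blend into phrase-level features. The kernel size of 3 is an assumed setting; per the paper, it should be tuned per dataset:

```python
import torch.nn as nn

class PhrasePooling(nn.Module):
    def __init__(self, kernel=3):
        super().__init__()
        # stride 1 with symmetric padding preserves the sequence length
        self.pool = nn.AvgPool1d(kernel, stride=1, padding=kernel // 2)

    def forward(self, token_emb):           # (batch, seq_len, hidden)
        x = token_emb.transpose(1, 2)       # AvgPool1d expects (B, C, T)
        return self.pool(x).transpose(1, 2)
```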


2020 ◽  
Vol 34 (05) ◽  
pp. 9314-9321
Author(s):  
Ya Xiao ◽  
Chengxiang Tan ◽  
Zhijie Fan ◽  
Qian Xu ◽  
Wenye Zhu

Joint extraction of entities and relations is the task of extracting entity mentions and the semantic relations between them from unstructured text with a single model. Existing entity and relation extraction datasets usually rely on distant supervision, which cannot verify the correspondence between a relation label and the sentence, and thus suffers from a noisy-labeling problem. We propose a hybrid deep neural network model that jointly extracts entities and relations and is also capable of filtering noisy data. The hybrid model contains a transformer-based encoding layer, an LSTM entity detection module, and a reinforcement-learning-based relation classification module. The output of the transformer encoder and the entity embedding generated by the entity detection module are combined as the input state of the reinforcement learning module to improve relation classification and noisy-data filtering. We conduct experiments on a public dataset produced by distant supervision to verify the effectiveness of our proposed model. Experimental results show that our model outperforms the compared methods on entity and relation extraction and can also filter noisy sentences.
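
The abstract leaves the reinforcement-learning details open; the sketch below shows a generic REINFORCE-style update in which keep/filter decisions are actions and the reward (an assumption here) comes from downstream classification quality:

```python
import torch

def reinforce_step(log_probs, rewards, optimizer):
    """log_probs: list of log-probabilities of the chosen actions;
    rewards: matching list of scalar rewards (reward design is assumed)."""
    loss = -(torch.stack(log_probs) * torch.tensor(rewards)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```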


2020 ◽  
Vol 10 (3) ◽  
pp. 957 ◽  
Author(s):  
Luwei Xiao ◽  
Xiaohui Hu ◽  
Yinong Chen ◽  
Yun Xue ◽  
Donghong Gu ◽  
...  

Targeted sentiment classification aims to predict the sentiment polarity toward a specific target. Currently, most methods (e.g., recurrent neural networks and convolutional neural networks combined with an attention mechanism) cannot fully capture the semantic information of the context, and they lack a mechanism to account for relevant syntactic constraints and long-range word dependencies. As a result, syntactically irrelevant context words may mistakenly be recognized as clues for predicting the target sentiment. To tackle these problems, this paper treats semantic information, syntactic information, and their interaction as crucial to targeted sentiment analysis and proposes an attentional-encoding-based graph convolutional network (AEGCN) model. Our proposed model is mainly composed of multi-head attention and an improved graph convolutional network built over the dependency tree of a sentence. Pre-trained BERT is applied to this task, and new state-of-the-art performance is achieved. Experiments on five datasets show the effectiveness of the proposed model compared with a series of the latest models.
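
A sketch of one graph-convolution layer over a sentence's dependency tree, in the style of GCN-over-dependency models like the one described; the degree normalization and self-loop assumption are conventional choices, not confirmed details of AEGCN:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """H: (T, d) word representations; A: (T, T) adjacency matrix built
    from dependency arcs (assumed to include self-loops)."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, H, A):
        deg = A.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear((A @ H) / deg))
```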

