Document-level Relation Extraction as Semantic Segmentation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/551 ◽

2021 ◽

Author(s):

Ningyu Zhang ◽

Xiang Chen ◽

Xin Xie ◽

Shumin Deng ◽

Chuanqi Tan ◽

...

Keyword(s):

Computer Vision ◽

State Of The Art ◽

Relation Extraction ◽

Semantic Segmentation ◽

Experimental Results ◽

Context Information ◽

Global Information ◽

Benchmark Datasets ◽

Segmentation Task ◽

Document Level

Document-level relation extraction aims to extract relations among multiple entity pairs from a document. Previously proposed graph-based or transformer-based models utilize the entities independently, regardless of global information among relational triples. This paper approaches the problem by predicting an entity-level relation matrix to capture local and global information, in parallel with the semantic segmentation task in computer vision. Herein, we propose a Document U-shaped Network for document-level relation extraction. Specifically, we leverage an encoder module to capture the context information of entities and a U-shaped segmentation module over the image-style feature map to capture global interdependency among triples. Experimental results show that our approach obtains state-of-the-art performance on three benchmark datasets: DocRED, CDR, and GDA.
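
As a rough illustration of the "entity-level relation matrix" idea (not the authors' implementation), the sketch below arranges entity pairs on an N x N grid so that each cell plays the role of a pixel. The dot-product score, the threshold, and the function names `pair_feature_map` and `threshold_relations` are assumptions made for this example; the abstract does not specify how cell features are computed.

```python
from typing import List, Tuple


def pair_feature_map(entities: List[List[float]]) -> List[List[float]]:
    """Build an N x N grid whose (i, j) cell scores the pair (entity i, entity j).

    The dot product is a stand-in score chosen for illustration only.
    """
    n = len(entities)
    return [
        [sum(a * b for a, b in zip(entities[i], entities[j])) for j in range(n)]
        for i in range(n)
    ]


def threshold_relations(score_map: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return (subject, object) index pairs whose score exceeds the threshold.

    Diagonal cells (an entity paired with itself) are skipped.
    """
    return [
        (i, j)
        for i, row in enumerate(score_map)
        for j, score in enumerate(row)
        if i != j and score > threshold
    ]


# Three toy entity embeddings (hypothetical values).
entities = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
fmap = pair_feature_map(entities)
pairs = threshold_relations(fmap, 0.4)
```

Because every candidate pair sits in one grid, a convolutional module applied over `fmap` would see neighbouring pairs together, which is the sense in which the task resembles semantic segmentation.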

SceneEncoder: Scene-Aware Semantic Segmentation of Point Clouds with A Learnable Scene Descriptor

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/84 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jiachen Xu ◽

Jingyu Gong ◽

Jie Zhou ◽

Xin Tan ◽

Yuan Xie ◽

...

Keyword(s):

Essential Role ◽

State Of The Art ◽

Semantic Segmentation ◽

Point Clouds ◽

Local Features ◽

Local Region ◽

Global Information ◽

Benchmark Datasets ◽

Distinguishing Features ◽

Point Level

Besides local features, global information plays an essential role in semantic segmentation, yet recent works usually fail to explicitly extract meaningful global information and make full use of it. In this paper, we propose a SceneEncoder module that imposes scene-aware guidance to enhance the effect of global information. The module predicts a scene descriptor, which learns to represent the categories of objects existing in the scene and directly guides the point-level semantic segmentation by filtering out categories that do not belong to this scene. Additionally, to alleviate segmentation noise in local regions, we design a region similarity loss that propagates distinguishing features to neighboring points with the same label, enhancing the distinguishing ability of point-wise features. We integrate our methods into several prevailing networks and conduct extensive experiments on the benchmark datasets ScanNet and ShapeNet. Results show that our methods greatly improve the performance of baselines and achieve state-of-the-art performance.
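
The filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SceneEncoder implementation: the scene descriptor is taken to be a per-category presence weight in [0, 1], and the elementwise multiply-then-renormalise rule and the name `filter_by_scene` are choices made for this example.

```python
from typing import List


def filter_by_scene(point_probs: List[List[float]], scene_descriptor: List[float]) -> List[List[float]]:
    """Down-weight categories that the scene descriptor marks as absent.

    Each point's class probabilities are multiplied by the descriptor and
    renormalised; if every category is suppressed, the original
    probabilities are kept unchanged.
    """
    filtered = []
    for probs in point_probs:
        weighted = [p * s for p, s in zip(probs, scene_descriptor)]
        total = sum(weighted)
        filtered.append([w / total for w in weighted] if total > 0 else list(probs))
    return filtered


# One point, three categories; the (hypothetical) descriptor says category 1 is absent.
point_probs = [[0.5, 0.3, 0.2]]
scene_descriptor = [1.0, 0.0, 1.0]
filtered = filter_by_scene(point_probs, scene_descriptor)
```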

Attention-Based Multi-Context Guiding for Few-Shot Semantic Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018441 ◽

2019 ◽

Vol 33 ◽

pp. 8441-8448 ◽

Cited By ~ 14

Author(s):

Tao Hu ◽

Pengwan Yang ◽

Chiliang Zhang ◽

Gang Yu ◽

Yadong Mu ◽

...

Keyword(s):

Deep Learning ◽

Feature Fusion ◽

State Of The Art ◽

Semantic Segmentation ◽

Research Topic ◽

Context Information ◽

Multi Scale ◽

Support Set ◽

Segmentation Task ◽

Context Features

Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning methods require tremendous amounts of data. The scarcity of annotated data becomes even more challenging in semantic segmentation, since pixel-level annotation in segmentation tasks is more labor-intensive to acquire. To tackle this issue, we propose an Attention-based Multi-Context Guiding (A-MCG) network, which consists of three branches: the support branch, the query branch, and the feature fusion branch. A key differentiator of A-MCG is the integration of multi-scale context features between the support and query branches, enforcing better guidance from the support set. In addition, we adopt spatial attention along the fusion branch to highlight context information from several scales, enhancing self-supervision in one-shot learning. To address the fusion problem in multi-shot learning, Conv-LSTM is adopted to collaboratively integrate the sequential support features and improve the final accuracy. Our architecture obtains state-of-the-art results on unseen classes in a variant of the PASCAL VOC12 dataset and performs favorably against previous work, with gains of 1.1% and 1.4% in mIoU in the 1-shot and 5-shot settings, respectively.

Named Entity Recognition and Relation Extraction

ACM Computing Surveys ◽

10.1145/3445965 ◽

2021 ◽

Vol 54 (1) ◽

pp. 1-39

Author(s):

Zara Nasar ◽

Syed Waqar Jaffry ◽

Muhammad Kamran Malik

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Named Entity Recognition ◽

Relation Extraction ◽

The State ◽

Entity Recognition ◽

Joint Models ◽

Named Entity ◽

Textual Data ◽

Benchmark Datasets

With the advent of Web 2.0, there exist many online platforms that result in massive textual-data production. With ever-increasing textual data at hand, it is of immense importance to extract information nuggets from this data. One approach towards effective harnessing of this unstructured textual data could be its transformation into structured text. Hence, this study aims to present an overview of approaches that can be applied to extract key insights from textual data in a structured way. To this end, this review focuses mainly on Named Entity Recognition and Relation Extraction. The former deals with the identification of named entities, and the latter deals with the problem of extracting relations between sets of entities. This study covers early approaches as well as the developments made to date using machine learning models. Survey findings indicate that deep-learning-based hybrid and joint models currently define the state of the art. It is also observed that annotated benchmark datasets for various textual-data generators, such as Twitter and other social forums, are not available. This scarcity of datasets has resulted in relatively little progress in these domains. Additionally, the majority of the state-of-the-art techniques are offline and computationally expensive. Last, with increasing focus on deep-learning frameworks, there is a need to understand and explain the underlying processes in deep architectures.

Random Forest with Adaptive Local Template for Pedestrian Detection

Mathematical Problems in Engineering ◽

10.1155/2015/767423 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Tao Xiang ◽

Tao Li ◽

Mao Ye ◽

Zijian Liu

Keyword(s):

Computer Vision ◽

Random Forest ◽

Classification Accuracy ◽

Template Matching ◽

Detection Method ◽

State Of The Art ◽

Pedestrian Detection ◽

Sliding Window ◽

Experimental Results ◽

Training Samples

Pedestrian detection with large intraclass variations is still a challenging task in computer vision. In this paper, we propose a novel pedestrian detection method based on Random Forest. First, we generate a few local templates with different sizes and different locations in positive exemplars. Then, a Random Forest is built whose splitting functions are optimized by maximizing the class purity obtained when matching the local templates to the training samples. To improve the classification accuracy, we adopt a boosting-like algorithm to update the weights of the training samples in a layer-wise fashion. During detection, the trained Random Forest votes on the category of each input sliding window. Our contributions are the splitting functions based on local template matching with adaptive size and location, and the iterative weight-updating method. We evaluate the proposed method on two well-known challenging datasets: TUD pedestrians and INRIA pedestrians. The experimental results demonstrate that our method achieves state-of-the-art or competitive performance.

Adaptive Context Encoding Module for Semantic Segmentation

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.10.ipas-027 ◽

2020 ◽

Vol 2020 (10) ◽

pp. 27-1-27-7

Author(s):

Congcong Wang ◽

Faouzi Alaya Cheikh ◽

Azeddine Beghdadi ◽

Ole Jakob Elle

Keyword(s):

Neural Networks ◽

State Of The Art ◽

Experimental Studies ◽

Semantic Segmentation ◽

Multiple Scale ◽

Context Information ◽

Convolution Operation ◽

Sampling Locations ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

Object sizes in images are diverse; therefore, capturing multi-scale context information is essential for semantic segmentation. Existing context aggregation methods such as the pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP) employ different pooling sizes or atrous rates so that multi-scale information is captured. However, the pooling sizes and atrous rates are chosen empirically. Rethinking ASPP leads to our observation that learnable sampling locations of the convolution operation can endow the network with a learnable field-of-view, and thus the ability to capture object context information adaptively. Following this observation, we propose an adaptive context encoding (ACE) module based on the deformable convolution operation, in which the sampling locations of the convolution operation are learnable. Our ACE module can be embedded into other Convolutional Neural Networks (CNNs) easily for context aggregation. The effectiveness of the proposed module is demonstrated on the Pascal-Context and ADE20K datasets. Although our proposed ACE consists of only three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection over Union (mIoU) on both datasets. All the experimental studies confirm that our proposed module is effective compared to the state-of-the-art methods.
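
For readers unfamiliar with the metric reported above, mean Intersection over Union averages the per-class ratio of correctly labelled pixels to the union of predicted and ground-truth pixels for that class. The sketch below is a generic illustration over flat label lists; the function name `mean_iou` and the convention of skipping classes that appear in neither prediction nor ground truth are assumptions of this example, and benchmark evaluation scripts may differ.

```python
from typing import List


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Average the per-class IoU over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c by both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c by either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```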

A Span-based Joint Model for Opinion Target Extraction and Target Sentiment Classification

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/762 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yan Zhou ◽

Longtao Huang ◽

Tao Guo ◽

Jizhong Han ◽

Songlin Hu

Keyword(s):

Sentiment Analysis ◽

State Of The Art ◽

Joint Model ◽

The State ◽

Attention Mechanism ◽

Sentiment Classification ◽

Global Information ◽

Target Extraction ◽

Benchmark Datasets ◽

Tagging Methods

Target-Based Sentiment Analysis aims at extracting opinion targets and classifying the sentiment polarities expressed on each target. Recently, token-based sequence tagging methods, which aim to predict a tag for each token, have been successfully applied to jointly solve the two tasks. Since they do not treat a target containing several words as a whole, it might be difficult to make use of the global information to identify that opinion target, leading to incorrect extraction. Independently predicting the sentiment for each token may also lead to sentiment inconsistency for different words in an opinion target. In this paper, inspired by span-based methods in NLP, we propose a simple and effective joint model to conduct extraction and classification at span level rather than token level. Our model first enumerates spans with one or more tokens and learns their representation based on the tokens inside. Then, a span-aware attention mechanism is designed to compute the sentiment information towards each span. Extensive experiments on three benchmark datasets show that our model consistently outperforms the state-of-the-art methods.
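
The span-level step described above, listing every candidate span of one or more tokens, can be sketched as below. This is only an illustration: the maximum span length, the inclusive `(start, end)` convention, and the name `enumerate_spans` are assumptions of this example, and the abstract does not say how the model bounds or represents spans.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_len: int) -> List[Tuple[int, int]]:
    """List all (start, end) spans, end inclusive, of at most max_len tokens."""
    n = len(tokens)
    return [
        (start, end)
        for start in range(n)
        for end in range(start, min(start + max_len, n))
    ]


tokens = ["great", "battery", "life"]
spans = enumerate_spans(tokens, max_len=2)
# Recover the surface text of each candidate span.
span_texts = [" ".join(tokens[s : e + 1]) for s, e in spans]
```

Treating "battery life" as a single candidate is what lets a span-level classifier assign one sentiment to the whole target rather than one per token.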

Self-Ensembling Attention Networks: Addressing Domain Shift for Semantic Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015581 ◽

2019 ◽

Vol 33 ◽

pp. 5581-5588 ◽

Cited By ~ 3

Author(s):

Yonghao Xu ◽

Bo Du ◽

Lefei Zhang ◽

Qian Zhang ◽

Guoli Wang ◽

...

Keyword(s):

Domain Adaptation ◽

State Of The Art ◽

Semantic Segmentation ◽

Great Success ◽

Learning Models ◽

Target Domain ◽

Attention Networks ◽

Source Domain ◽

Benchmark Datasets ◽

Different Levels

Recent years have witnessed the great success of deep learning models in semantic segmentation. Nevertheless, these models may not generalize well to unseen image domains due to the phenomenon of domain shift. Since pixel-level annotations are laborious to collect, developing algorithms that can adapt labeled data from the source domain to the target domain is of great significance. To this end, we propose self-ensembling attention networks to reduce the domain gap between different datasets. To the best of our knowledge, the proposed method is the first attempt to introduce a self-ensembling model to domain adaptation for semantic segmentation, which provides a different view on how to learn domain-invariant features. In addition, since different regions in the image usually correspond to different levels of domain gap, we introduce the attention mechanism into the proposed framework to generate attention-aware features, which are further utilized to guide the calculation of the consistency loss in the target domain. Experiments on two benchmark datasets demonstrate that the proposed framework yields competitive performance compared with state-of-the-art methods.
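
Self-ensembling is commonly realised with a teacher model whose weights are an exponential moving average of the student's weights, trained with a consistency loss between the two models' predictions. The sketch below assumes that mean-teacher-style setup and an attention-weighted squared error; both, along with the names `ema_update` and `attention_weighted_consistency`, are assumptions for illustration, since the abstract does not give the exact formulation.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], alpha: float = 0.99) -> List[float]:
    """Move each teacher weight toward the student weight by a factor (1 - alpha)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def attention_weighted_consistency(
    teacher_probs: List[float], student_probs: List[float], attention: List[float]
) -> float:
    """Squared error between predictions, weighted per position by attention."""
    total = sum(
        a * (t - s) ** 2 for t, s, a in zip(teacher_probs, student_probs, attention)
    )
    norm = sum(attention)
    return total / norm if norm > 0 else 0.0


# Hypothetical values: one weight vector and two prediction positions.
new_teacher = ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.5)
loss = attention_weighted_consistency([1.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```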

Bilateral Multi-Perspective Matching for Natural Language Sentences

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/579 ◽

2017 ◽

Cited By ~ 92

Author(s):

Zhiguo Wang ◽

Wael Hamza ◽

Radu Florian

Keyword(s):

Natural Language ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

The Other ◽

Multiple Perspectives ◽

Time Step ◽

Benchmark Datasets ◽

Sentence Matching ◽

Fully Connected

Natural language sentence matching is a fundamental technology for a variety of tasks. Previous approaches either match sentences from a single direction or only apply single granular (word-by-word or sentence-by-sentence) matching. In this work, we propose a bilateral multi-perspective matching (BiMPM) model. Given two sentences P and Q, our model first encodes them with a BiLSTM encoder. Next, we match the two encoded sentences in two directions: P against Q and Q against P. In each matching direction, each time step of one sentence is matched against all time steps of the other sentence from multiple perspectives. Then, another BiLSTM layer is utilized to aggregate the matching results into a fixed-length matching vector. Finally, based on the matching vector, a decision is made through a fully connected layer. We evaluate our model on three tasks: paraphrase identification, natural language inference, and answer sentence selection. Experimental results on standard benchmark datasets show that our model achieves state-of-the-art performance on all tasks.
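
One way to read "matched from multiple perspectives" is a cosine similarity computed several times, each time after rescaling both vectors by a different weight vector. The sketch below illustrates that reading; the abstract itself does not define the matching function, so the weighting scheme, the fixed (untrained) weights, and the name `multi_perspective_match` are assumptions of this example.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def multi_perspective_match(
    v1: List[float], v2: List[float], weights: List[List[float]]
) -> List[float]:
    """Return one similarity per perspective; each row of weights rescales both inputs."""
    return [
        cosine([w * a for w, a in zip(wk, v1)], [w * b for w, b in zip(wk, v2)])
        for wk in weights
    ]


# Two hypothetical perspectives over 2-dimensional hidden states.
perspectives = [[1.0, 1.0], [1.0, 0.0]]
match = multi_perspective_match([1.0, 0.0], [0.0, 1.0], perspectives)
```

In a bilateral setup, the same function would be applied once with time steps of P matched against Q and once with the roles swapped.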

A Co-Embedding Model with Variational Auto-Encoder for Knowledge Graphs

Applied Sciences ◽

10.3390/app12020715 ◽

2022 ◽

Vol 12 (2) ◽

pp. 715

Author(s):

Luodi Xie ◽

Huimin Huang ◽

Qing Du

Keyword(s):

State Of The Art ◽

Relation Extraction ◽

Semantic Space ◽

Knowledge Graph ◽

High Quality ◽

Gaussian Distributions ◽

Benchmark Datasets ◽

Semantic Spaces ◽

Knowledge Graphs ◽

Low Dimensional

Knowledge graph (KG) embedding has been widely studied to obtain low-dimensional representations for entities and relations. It serves as the basis for downstream tasks, such as KG completion and relation extraction. Traditional KG embedding techniques usually represent entities/relations as vectors or tensors, mapping them in different semantic spaces and ignoring the uncertainties. The affinities between entities and relations are ambiguous when they are not embedded in the same latent spaces. In this paper, we incorporate a co-embedding model for KG embedding, which learns low-dimensional representations of both entities and relations in the same semantic space. To address the issue of neglecting uncertainty for KG components, we propose a variational auto-encoder that represents KG components as Gaussian distributions. In addition, compared with previous methods, our method has the advantages of high quality and interpretability. Our experimental results on several benchmark datasets demonstrate our model’s superiority over the state-of-the-art baselines.
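
Representing a KG component as a Gaussian distribution typically means storing a mean and a variance per dimension, sampling with the reparameterisation trick, and regularising with a KL term toward a standard normal prior. The sketch below shows those two generic variational auto-encoder pieces for a diagonal Gaussian; it is not the paper's model, and the function names and the choice of prior are assumptions of this example.

```python
import math
import random
from typing import List


def sample_gaussian(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    """Reparameterised sample: mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


# A hypothetical 2-dimensional entity embedding with unit variance.
rng = random.Random(0)
entity_mu, entity_log_var = [1.0, 0.0], [0.0, 0.0]
sample = sample_gaussian(entity_mu, entity_log_var, rng)
kl = kl_to_standard_normal(entity_mu, entity_log_var)
```

The variance gives each entity or relation an explicit uncertainty, which point-vector embeddings cannot express.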

PlaneNet: an efficient local feature extraction network

PeerJ Computer Science ◽

10.7717/peerj-cs.783 ◽

2021 ◽

Vol 7 ◽

pp. e783

Author(s):

Bin Lin ◽

Houcheng Su ◽

Danyang Li ◽

Ao Feng ◽

Hongxiang Li ◽

...

Keyword(s):

Mobile Devices ◽

State Of The Art ◽

Computing Time ◽

Semantic Segmentation ◽

Vital Role ◽

Light Weight ◽

Practical Application ◽

Classification Tasks ◽

Segmentation Task ◽

Local Feature Extraction

Due to limitations in memory and computing resources, deploying convolutional neural networks on embedded and mobile devices is challenging. However, the redundant use of the 1 × 1 convolution in traditional light-weight networks, such as MobileNetV1, has increased the computing time. By utilizing the 1 × 1 convolution, which plays a vital role in extracting local features, more effectively, a new lightweight network named PlaneNet is introduced. PlaneNet can improve accuracy and reduce the number of parameters and multiply-accumulate operations (Madds). Our model is evaluated on classification and semantic segmentation tasks. In the classification tasks, the CIFAR-10, Caltech-101, and ImageNet2012 datasets are used. In the semantic segmentation task, PlaneNet is tested on the VOC2012 dataset. The experimental results demonstrate that PlaneNet (74.48%) obtains higher accuracy than MobileNetV3-Large (73.99%) and GhostNet (72.87%) and achieves state-of-the-art performance with fewer network parameters in both tasks. In addition, compared with existing models, it has reached the practical application level on mobile devices. The code of PlaneNet is available on GitHub: https://github.com/LinB203/planenet.
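
To make the parameter and multiply-accumulate counts mentioned above concrete, the sketch below applies the standard cost formula for a single convolution layer. It assumes stride 1, an output feature map of the same spatial size as the input, and no bias term; the layer sizes and the name `conv_cost` are hypothetical and are not taken from PlaneNet.

```python
from typing import Tuple


def conv_cost(height: int, width: int, c_in: int, c_out: int, kernel: int) -> Tuple[int, int]:
    """Return (parameters, multiply-accumulates) for one k x k convolution.

    Assumes stride 1, output size height x width, and no bias.
    """
    params = kernel * kernel * c_in * c_out
    madds = height * width * params
    return params, madds


# A 1 x 1 and a 3 x 3 convolution on a hypothetical 14 x 14 x 256 feature map.
pointwise = conv_cost(14, 14, 256, 256, kernel=1)
spatial = conv_cost(14, 14, 256, 256, kernel=3)
```

Under these assumptions a 1 × 1 layer costs one ninth of a 3 × 3 layer with the same channel widths, yet in networks built mostly from 1 × 1 layers those layers can still account for most of the total cost.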
