Multi-Level Head-Wise Match and Aggregation in Transformer for Textual Sequence Matching

Shuohang Wang; Yunshi Lan; Yi Tay; Jing Jiang; Jingjing Liu

doi:10.1609/aaai.v34i05.6458

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6458 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9209-9216

Author(s):

Shuohang Wang ◽

Yunshi Lan ◽

Yi Tay ◽

Jing Jiang ◽

Jingjing Liu

Keyword(s):

Language Processing ◽

State Of The Art ◽

Vector Representation ◽

Sequence Pair ◽

Sequence Matching ◽

New Approach ◽

Multiple Tasks ◽

Pair Matching ◽

Multi Level ◽

Multiple Levels

Transformer has been successfully applied to many natural language processing tasks. However, for textual sequence matching, simple matching between the representation of a pair of sequences might bring in unnecessary noise. In this paper, we propose a new approach to sequence pair matching with Transformer, by learning head-wise matching representations on multiple levels. Experiments show that our proposed approach can achieve new state-of-the-art performance on multiple tasks that rely only on pre-computed sequence-vector-representation, such as SNLI, MNLI-match, MNLI-mismatch, QQP, and SQuAD-binary.

Enriching Word Vectors with Subword Information

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00051 ◽

2017 ◽

Vol 5 ◽

pp. 135-146 ◽

Cited By ~ 1156

Author(s):

Piotr Bojanowski ◽

Edouard Grave ◽

Armand Joulin ◽

Tomas Mikolov

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Training Data ◽

Vector Representation ◽

New Approach ◽

Word Similarity ◽

Art Performance ◽

N Gram

Continuous word representations trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and it allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.
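
The idea of representing a word as the sum of its character n-gram vectors can be sketched as follows. This is a minimal illustration, not the authors' implementation: the boundary markers `<` and `>`, the n-gram range of 3 to 6, the hashing scheme, and the names `char_ngrams`, `bucket`, `word_vector`, `DIM`, and `BUCKETS` are all assumptions made for the example, and the vectors are random rather than trained.

```python
import hashlib
import random

DIM = 4          # illustrative embedding size (assumption)
BUCKETS = 1000   # illustrative number of n-gram slots (assumption)

# Untrained, randomly initialised n-gram vectors; a real model would learn these.
_rng = random.Random(0)
TABLE = [[_rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(BUCKETS)]


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def bucket(gram):
    """Map an n-gram to a row of TABLE with a stable hash."""
    digest = hashlib.md5(gram.encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS


def word_vector(word):
    """Sum the vectors of all character n-grams of the word."""
    vec = [0.0] * DIM
    for gram in char_ngrams(word):
        row = TABLE[bucket(gram)]
        vec = [v + r for v, r in zip(vec, row)]
    return vec


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("unseenword"))   # works for words never seen in training
```

Because the vector is assembled from n-grams rather than looked up per word, any string receives a representation, which is how the approach handles words absent from the training data.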

M2Det: A Single-Shot Object Detector Based on Multi-Level Feature Pyramid Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019259 ◽

2019 ◽

Vol 33 ◽

pp. 9259-9266 ◽

Cited By ~ 78

Author(s):

Qijie Zhao ◽

Tao Sheng ◽

Yongtao Wang ◽

Zhi Tang ◽

Ying Chen ◽

...

Keyword(s):

Feature Fusion ◽

State Of The Art ◽

Single Shot ◽

Multi Scale ◽

One Stage ◽

Single Scale ◽

Feature Pyramid ◽

Multi Level ◽

Multiple Levels ◽

Inference Strategy

Feature pyramids are widely exploited by both state-of-the-art one-stage object detectors (e.g., DSSD, RetinaNet, RefineDet) and two-stage object detectors (e.g., Mask RCNN, DetNet) to alleviate the problem arising from scale variation across object instances. Although these object detectors with feature pyramids achieve encouraging results, they are limited because they simply construct the feature pyramid according to the inherent multi-scale, pyramidal architecture of backbones that were originally designed for the object classification task. In this work, we present the Multi-Level Feature Pyramid Network (MLFPN) to construct more effective feature pyramids for detecting objects of different scales. First, we fuse multi-level features (i.e., multiple layers) extracted by the backbone as the base feature. Second, we feed the base feature into a block of alternating joint Thinned U-shape Modules and Feature Fusion Modules and exploit the decoder layers of each U-shape module as the features for detecting objects. Finally, we gather up the decoder layers with equivalent scales (sizes) to construct a feature pyramid for object detection, in which every feature map consists of the layers (features) from multiple levels. To evaluate the effectiveness of the proposed MLFPN, we design and train a powerful end-to-end one-stage object detector we call M2Det by integrating it into the architecture of SSD, and achieve better detection performance than state-of-the-art one-stage detectors. Specifically, on the MSCOCO benchmark, M2Det achieves an AP of 41.0 at a speed of 11.8 FPS with a single-scale inference strategy and an AP of 44.2 with a multi-scale inference strategy, which are the new state-of-the-art results among one-stage detectors. The code will be made available at https://github.com/qijiezhao/M2Det.

A Polarity Capturing Sphere for Word to Vector Representation

Applied Sciences ◽

10.3390/app10124386 ◽

2020 ◽

Vol 10 (12) ◽

pp. 4386 ◽

Cited By ~ 1

Author(s):

Sandra Rizkallah ◽

Amir F. Atiya ◽

Samir Shaheen

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

State Of The Art ◽

Unrelated Word ◽

Research Field ◽

Word Embedding ◽

Vector Representation ◽

Active Research ◽

Embedding Methods ◽

Better Than

Embedding words from a dictionary as vectors in a space has become an active research field, due to its many uses in several natural language processing applications. Distances between the vectors should reflect the relatedness between the corresponding words. The problem with existing word embedding methods is that they often fail to distinguish between synonymous, antonymous, and unrelated word pairs. Meanwhile, polarity detection is crucial for applications such as sentiment analysis. In this work we propose an embedding approach that is designed to capture the polarity issue. The approach is based on embedding the word vectors into a sphere, whereby the dot product between any vectors represents the similarity. Vectors corresponding to synonymous words would be close to each other on the sphere, while a word and its antonym would lie at opposite poles of the sphere. The approach used to design the vectors is a simple relaxation algorithm. The proposed word embedding is successful in distinguishing between synonyms, antonyms, and unrelated word pairs. It achieves results that are better than those of some of the state-of-the-art techniques and competes well with the others.
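
The geometry described above can be illustrated with a few hand-placed unit vectors. This sketch does not implement the relaxation algorithm from the paper; the example words, their coordinates, and the helper names `normalize` and `similarity` are assumptions chosen only to show how a dot product on the sphere separates synonyms from antonyms. Treating a near-zero dot product as "unrelated" is likewise an interpretation, not a statement from the abstract.

```python
import math


def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity(u, v):
    """Dot product of two unit vectors, a value in [-1, 1]."""
    return sum(a * b for a, b in zip(u, v))


# Hand-placed illustrative vectors (not learned).
good = normalize([1.0, 0.1, 0.0])
great = normalize([1.0, 0.0, 0.1])   # near "good": synonym
bad = [-x for x in good]             # opposite pole: antonym
table = normalize([0.0, 0.0, 1.0])   # roughly orthogonal: unrelated

print(similarity(good, great))  # close to +1
print(similarity(good, bad))    # close to -1
print(similarity(good, table))  # close to 0
```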

Reinforcement Learning Experience Reuse with Policy Residual Representation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/618 ◽

2019 ◽

Author(s):

WenJi Zhou ◽

Yang Yu ◽

Yingfeng Chen ◽

Kai Guan ◽

Tangjie Lv ◽

...

Keyword(s):

Reinforcement Learning ◽

Video Game ◽

State Of The Art ◽

Learning Experience ◽

Critical Issues ◽

Experience Reuse ◽

Multi Level ◽

Multiple Levels ◽

Multiple Granularities ◽

Different Levels

Experience reuse is key to sample-efficient reinforcement learning. One of the critical issues is how the experience is represented and stored. Previously, experience could be stored in the form of features, individual models, or the average model, each lying at a different granularity. However, new tasks may require experience across multiple granularities. In this paper, we propose the policy residual representation (PRR) network, which can extract and store multiple levels of experience. The PRR network is trained on a set of tasks with a multi-level architecture, where a module in each level corresponds to a subset of the tasks. Therefore, the PRR network represents the experience in a spectrum-like way. When training on a new task, PRR can provide different levels of experience for accelerating the learning. We experiment with the PRR network on a set of grid world navigation tasks, locomotion tasks, and fighting tasks in a video game. The results show that the PRR network leads to better reuse of experience and thus outperforms some state-of-the-art approaches.

Making Tools and Making Sense: Complex, Intentional Behaviour in Human Evolution

Cambridge Archaeological Journal ◽

10.1017/s0959774309000055 ◽

2009 ◽

Vol 19 (1) ◽

pp. 85-96 ◽

Cited By ~ 37

Author(s):

Dietrich Stout ◽

Thierry Chaminade

Keyword(s):

Language Processing ◽

Neural Circuits ◽

Functional Brain Imaging ◽

Stone Tool ◽

Imaging Studies ◽

Functional Brain ◽

Making Sense ◽

Action Sequences ◽

Multi Level ◽

Multiple Levels

Stone tool-making is an ancient and prototypically human skill characterized by multiple levels of intentional organization. In a formal sense, it displays surprising similarities to the multi-level organization of human language. Recent functional brain imaging studies of stone tool-making similarly demonstrate overlap with neural circuits involved in language processing. These observations are consistent with the hypothesis that language and tool-making share key requirements for the construction of hierarchically structured action sequences and evolved together in a mutually reinforcing way.

Learning Multi-Level Dependencies for Robust Word Recognition

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6463 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9250-9257

Author(s):

Zhiwei Wang ◽

Hui Liu ◽

Jiliang Tang ◽

Songfan Yang ◽

Gale Yan Huang ◽

...

Keyword(s):

Neural Network ◽

Word Recognition ◽

Language Processing ◽

State Of The Art ◽

Learning Models ◽

Large Margin ◽

Sequential Dependencies ◽

Word Level ◽

Multi Level ◽

Machine Learning Models

Robust language processing systems are becoming increasingly important given the recent awareness of dangerous situations where brittle machine learning models can be easily broken by the presence of noise. In this paper, we introduce a robust word recognition framework that captures multi-level sequential dependencies in noised sentences. The proposed framework employs a sequence-to-sequence model over characters of each word, whose output is given to a word-level bi-directional recurrent neural network. We conduct extensive experiments to verify the effectiveness of the framework. The results show that the proposed framework outperforms state-of-the-art methods by a large margin, and they also suggest that character-level dependencies can play an important role in word recognition. The code of the proposed framework and the major experiments are publicly available.

Gated Fully Fusion for Semantic Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6805 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11418-11425 ◽

Cited By ~ 2

Author(s):

Xiangtai Li ◽

Houlong Zhao ◽

Lei Han ◽

Yunhai Tong ◽

Shaohua Tan ◽

...

Keyword(s):

Neural Networks ◽

State Of The Art ◽

Semantic Segmentation ◽

Semantic Gap ◽

Comprehensive Understanding ◽

Deep Convolutional Neural Networks ◽

Multi Level ◽

Multiple Levels ◽

High Level ◽

Fully Connected

Semantic segmentation generates a comprehensive understanding of scenes by densely predicting the category of each pixel. High-level features from Deep Convolutional Neural Networks already demonstrate their effectiveness in semantic segmentation tasks; however, the coarse resolution of high-level features often leads to inferior results for small or thin objects where detailed information is important. It is natural to consider importing low-level features to compensate for the detailed information lost in high-level features. Unfortunately, simply combining multi-level features suffers from the semantic gap among them. In this paper, we propose a new architecture, named Gated Fully Fusion (GFF), to selectively fuse features from multiple levels using gates in a fully connected way. Specifically, features at each level are enhanced by higher-level features with stronger semantics and lower-level features with more details, and gates are used to control the propagation of useful information, which significantly reduces the noise during fusion. We achieve state-of-the-art results on four challenging scene parsing datasets: Cityscapes, Pascal Context, COCO-stuff, and ADE20K.
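
A gate-controlled fusion across levels might look like the sketch below. The abstract does not state the fusion formula, so the rule used here is an assumption chosen for illustration: each level keeps its own feature, scaled by (1 + g_l), and receives the gated features of all other levels, scaled by (1 - g_l). Features are reduced to single scalars per level to keep the example short, and the names `sigmoid` and `gated_fusion` are hypothetical.

```python
import math


def sigmoid(x):
    """Squash a gate logit into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(features, gate_logits):
    """Fuse one scalar feature per level using per-level gates.

    Assumed rule (illustrative only):
        fused_l = (1 + g_l) * x_l + (1 - g_l) * sum_{i != l} g_i * x_i
    """
    gates = [sigmoid(z) for z in gate_logits]
    fused = []
    for l, (x_l, g_l) in enumerate(zip(features, gates)):
        # Information offered by every other level, weighted by that level's gate.
        others = sum(
            g_i * x_i
            for i, (x_i, g_i) in enumerate(zip(features, gates))
            if i != l
        )
        fused.append((1.0 + g_l) * x_l + (1.0 - g_l) * others)
    return fused


# With all gate logits at 0, every gate equals 0.5.
print(gated_fusion([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))  # [2.75, 4.0, 5.25]
```

Under this assumed rule, a level whose gate is close to 1 both broadcasts its feature to the others and accepts little from them, while a level with a gate close to 0 mostly receives.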

Evaluating The Effect Of A Ten Week Multi-Level Mentor Training Course On Cell Leaders And Cell Members Being Trained To Make Disciples By Mentoring At Multiple Levels

10.2986/tren.090-0240 ◽

2000 ◽

Author(s):

Matthew J. Howell

Keyword(s):

Mentor Training ◽

Training Course ◽

Multi Level ◽

Multiple Levels ◽

Cell Leaders

The Consequences of Accounting Failure for Innovation: A Multi-Level Analysis

Accounting Horizons ◽

10.2308/horizons-16-194 ◽

2020 ◽

Vol 34 (2) ◽

pp. 109-124

Author(s):

Megan F. Hess ◽

Andrew M. Hess

Keyword(s):

Firm Level ◽

Psychological Response ◽

Top Executives ◽

Innovation Activities ◽

Threat Rigidity ◽

Multi Level ◽

Multiple Levels ◽

Financial Misconduct ◽

Public Disclosures ◽

Level Analysis

SYNOPSIS: In this study, we investigate the relation between accounting failure and innovation at multiple levels in an organization by developing and testing a model for how top executives and functional managers might change their risk preferences and their innovation investments in response to public disclosures of financial misconduct. At the firm level, we find that accounting failures reduce subsequent investments in R&D, as predicted by a threat rigidity (“play it safe”) psychological response among top executives. At the project level, accounting failures have the opposite effect, resulting in an increase in the number of exploratory projects, as predicted by a failure trap (“swing for the fences”) psychological response among functional managers. Unpacking this relation at multiple levels of analysis helps us to understand the complex ways in which financial misconduct shapes a firm's innovation activities and appreciate the far-reaching consequences of accounting failure.

Report on the 4th Joint Workshop on Bibliometric-Enhanced Information Retrieval and Natural Language Processing for Digital Libraries at SIGIR 2019

ACM SIGIR Forum ◽

10.1145/3458553.3458554 ◽

2019 ◽

Vol 53 (2) ◽

pp. 3-10

Author(s):

Muthu Kumar Chandrasekaran ◽

Philipp Mayr

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Research And Development ◽

Language Processing ◽

Digital Libraries ◽

State Of The Art ◽

Shared Task ◽

Processing Information ◽

Joint Workshop

The 4th joint BIRNDL workshop was held at the 42nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019) in Paris, France. BIRNDL 2019 intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometrics, and recommendation techniques that can advance the state of the art in scholarly document understanding, analysis, and retrieval at scale. The workshop incorporated different paper sessions and the 5th edition of the CL-SciSumm Shared Task.
