Exemplar Guided Neural Dialogue Generation

Author(s):  
Hengyi Cai ◽  
Hongshen Chen ◽  
Yonghao Song ◽  
Xiaofang Zhao ◽  
Dawei Yin

Humans benefit from previous experiences when taking actions. Similarly, related examples from the training data also provide exemplary information for neural dialogue models when responding to a given input message. However, effectively fusing such exemplary information into dialogue generation is non-trivial: useful exemplars must be not only literally similar to, but also topically related to, the given context. Noisy exemplars impair the neural dialogue model's understanding of the conversation topics and can even corrupt response generation. To address these issues, we propose an exemplar guided neural dialogue generation model in which exemplar responses are retrieved in terms of both text similarity and topic proximity through a two-stage exemplar retrieval model. In the first stage, a small subset of conversations is retrieved from the training set given a dialogue context. These candidate exemplars are then finely ranked by topical proximity to choose the best-matched exemplar response. To further induce the neural dialogue generation model to consult the exemplar response and the conversation topics more faithfully, we introduce a multi-source sampling mechanism that provides the dialogue model with both local exemplary semantics and global topical guidance during decoding. Empirical evaluations on a large-scale conversation dataset show that the proposed approach significantly outperforms the state of the art on both quantitative metrics and human evaluations.
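
As a rough illustration of the two-stage retrieval described above, the sketch below retrieves a coarse candidate set by textual similarity and then reranks it by topic proximity. The paper's exact retrieval and topic models are not given here; TF-IDF, the precomputed topic vectors, and all names are stand-ins of my own.

```python
# Illustrative two-stage exemplar retrieval: coarse text-similarity retrieval,
# then a fine rerank by topical proximity. Components are assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_exemplar(context, train_contexts, train_responses,
                      topic_vecs, context_topic_vec, k=20):
    # Stage 1: coarse retrieval of literally similar conversations.
    vec = TfidfVectorizer().fit(train_contexts + [context])
    sims = cosine_similarity(vec.transform([context]),
                             vec.transform(train_contexts))[0]
    candidates = np.argsort(sims)[-k:]            # top-k literal matches
    # Stage 2: fine rerank of the candidates by topic proximity.
    topic_sims = cosine_similarity([context_topic_vec],
                                   topic_vecs[candidates])[0]
    best = candidates[int(np.argmax(topic_sims))]
    return train_responses[best]
```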

2019 ◽  
Vol 53 (2) ◽  
pp. 104-105
Author(s):  
Hamed Zamani

Recent developments in machine learning models, and in particular deep neural networks, have yielded significant improvements on several computer vision, natural language processing, and speech recognition tasks. Progress with information retrieval (IR) tasks has been slower, however, due to the lack of large-scale training data as well as neural network models specifically designed for effective information retrieval [9]. In this dissertation, we address these two issues by introducing task-specific neural network architectures for a set of IR tasks and proposing novel unsupervised or weakly supervised solutions for training the models. The proposed learning solutions do not require labeled training data. Instead, in our weak supervision approach, neural models are trained on a large set of noisy and biased training data obtained from external resources, existing models, or heuristics. We first introduce relevance-based embedding models [3] that learn distributed representations for words and queries. We show that the learned representations can be effectively employed for a set of IR tasks, including query expansion, pseudo-relevance feedback, and query classification [1, 2]. We further propose a standalone learning-to-rank model based on deep neural networks [5, 8]. Our model learns a sparse representation for queries and documents. This enables us to perform efficient retrieval by constructing an inverted index in the learned semantic space. Our model outperforms state-of-the-art retrieval models while performing as efficiently as term matching retrieval models. We additionally propose a neural network framework for predicting the performance of a retrieval model for a given query [7]. Inspired by existing query performance prediction models, our framework integrates several information sources, such as the retrieval score distribution and the term distribution in the top retrieved documents. This leads to state-of-the-art results for the performance prediction task on various standard collections. We finally bridge the gap between retrieval and recommendation models, the two key components in most information systems. Search and recommendation often share the same goal: helping people get the information they need at the right time. Therefore, joint modeling and optimization of search engines and recommender systems could potentially benefit both systems [4]. In more detail, we introduce a retrieval model that is trained using user-item interaction data (e.g., recommendation data), with no need for query-document relevance information during training [6]. Our solutions and findings in this dissertation smooth the path towards learning efficient and effective models for various information retrieval and related tasks, especially when large-scale training data is not available.
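
The weak-supervision idea, training a neural ranker on noisy labels produced by an existing model rather than on human judgments, can be sketched as follows. This is a minimal illustration under my own assumptions, not the dissertation's architecture; the "positive" document is simply the one the weak labeler (e.g., BM25) scores higher.

```python
# Minimal weak-supervision sketch: a tiny neural ranker trained with a
# pairwise hinge loss on pairs ordered by a noisy, unsupervised labeler.
import torch
import torch.nn as nn

class TinyRanker(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(),
                                   nn.Linear(64, 1))
    def forward(self, qd):            # qd: query-document feature vectors
        return self.score(qd).squeeze(-1)

def weak_supervision_step(model, opt, qd_pos, qd_neg):
    # qd_pos is "positive" only per the weak labeler's score, not per
    # human relevance judgments: that is the whole point of the approach.
    loss = torch.clamp(1.0 - model(qd_pos) + model(qd_neg), min=0).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```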


2020 ◽  
Vol 34 (05) ◽  
pp. 7554-7561
Author(s):  
Pengxiang Cheng ◽  
Katrin Erk

Recent progress in NLP has witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on the Transformer (Vaswani et al. 2017), and such models have achieved state-of-the-art results on a range of end tasks, approaching human performance. This clearly demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning, where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting the new state of the art, while containing only a tiny fraction of the parameters of GPT-2. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions for applying similar techniques to other problems.
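
A hedged sketch of what "coreference as auxiliary supervision for self-attention" might look like, in the spirit of Strubell et al.'s supervised attention: one head's attention distribution is pushed toward gold antecedent positions with a cross-entropy term added to the main loss. The paper's exact formulation may differ; shapes and names are illustrative.

```python
# Auxiliary loss supervising one self-attention head with coreference links.
import torch
import torch.nn.functional as F

def coref_attention_loss(attn_logits, antecedent_idx, mask):
    """
    attn_logits:    (batch, seq, seq) raw scores of the supervised head
    antecedent_idx: (batch, seq) gold antecedent position per token (-1 if none)
    mask:           (batch, seq) 1 where a gold antecedent exists
    """
    b, s, _ = attn_logits.shape
    logits = attn_logits.reshape(b * s, s)
    target = antecedent_idx.clamp(min=0).reshape(b * s)   # dummy 0 for -1 slots
    per_tok = F.cross_entropy(logits, target, reduction="none")
    per_tok = per_tok * mask.reshape(b * s).float()       # drop unmasked slots
    return per_tok.sum() / mask.float().sum().clamp(min=1)
```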


2021 ◽  
Vol 9 ◽  
pp. 929-944
Author(s):  
Omar Khattab ◽  
Christopher Potts ◽  
Matei Zaharia

Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.
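
ColBERT's late-interaction scoring, which produces the fine-grained question-passage interactions mentioned above, can be written in a few lines: each query token is matched to its best passage token and the per-token maxima are summed (the "MaxSim" operator). Embedding shapes here are illustrative.

```python
# ColBERT-style late interaction ("MaxSim") over token embeddings.
import torch

def colbert_score(q_emb, d_emb):
    """
    q_emb: (num_q_tokens, dim)  L2-normalized query token embeddings
    d_emb: (num_d_tokens, dim)  L2-normalized passage token embeddings
    """
    sim = q_emb @ d_emb.T                # (num_q_tokens, num_d_tokens)
    return sim.max(dim=1).values.sum()   # best doc token per query token
```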


Author(s):  
Nan Cao ◽  
Xin Yan ◽  
Yang Shi ◽  
Chaoran Chen

Sketch drawings have played an important role in assisting humans in communication and creative design since ancient times. This has motivated the development of artificial intelligence (AI) techniques for automatically generating sketches based on user input. Sketch-RNN, a sequence-to-sequence variational autoencoder (VAE) model, was developed for this purpose and is regarded as a state-of-the-art technique. However, it suffers from limitations, including the generation of low-quality results and its inability to support multi-class generation. To address these issues, we introduce AI-Sketcher, a deep generative model for generating high-quality multi-class sketches. Our model improves drawing quality by employing a CNN-based autoencoder to capture the positional information of each stroke at the pixel level. It also introduces an influence layer to more precisely guide the generation of each stroke by directly referring to the training data. To support multi-class sketch generation, we provide a conditional vector that helps differentiate sketches of various classes. The proposed technique was evaluated on two large-scale sketch datasets, and the results demonstrate its power in generating high-quality sketches.
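
A minimal sketch of the conditional-vector idea for multi-class generation: a one-hot class label is concatenated to the latent code so the decoder can differentiate classes. This illustrates the general conditional-VAE mechanism, not AI-Sketcher's actual code; layer sizes and names are assumptions.

```python
# Conditioning a VAE decoder on a class label via a one-hot vector.
import torch
import torch.nn as nn

class ConditionalDecoderInput(nn.Module):
    def __init__(self, latent_dim, num_classes, hidden_dim):
        super().__init__()
        self.num_classes = num_classes
        self.proj = nn.Linear(latent_dim + num_classes, hidden_dim)
    def forward(self, z, class_id):
        # z: (batch, latent_dim); class_id: (batch,) integer labels
        onehot = nn.functional.one_hot(class_id, self.num_classes).float()
        return torch.tanh(self.proj(torch.cat([z, onehot], dim=-1)))
```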


2020 ◽  
Vol 2020 ◽  
pp. 1-7 ◽  
Author(s):  
Aboubakar Nasser Samatin Njikam ◽  
Huan Zhao

This paper introduces an extremely lightweight (around two hundred thousand parameters) and computationally efficient CNN architecture, named CharTeC-Net (Character-based Text Classification Network), for character-based text classification problems. This new architecture is composed of four building blocks for feature extraction. Each of these building blocks, except the last one, uses 1 × 1 pointwise convolutional layers to add more nonlinearity to the network and to increase the dimensions within each building block. In addition, shortcut connections are used in each building block to facilitate the flow of gradients through the network and, more importantly, to ensure that the original signal present in the training data is shared across each building block. Experiments on eight standard large-scale text classification and sentiment analysis datasets demonstrate that CharTeC-Net outperforms baseline methods and yields accuracy competitive with state-of-the-art methods, although it has only between 181,427 and 225,323 parameters and weighs less than 1 megabyte.
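
A building block matching the description above might look like the following PyTorch sketch: a 1 × 1 pointwise convolution adds nonlinearity and grows the channel dimension, and a shortcut connection carries the original signal across the block. The exact layer sizes and ordering in CharTeC-Net may differ; this is illustrative only.

```python
# Sketch of a lightweight conv block with 1x1 pointwise conv + shortcut.
import torch
import torch.nn as nn

class CharBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)  # 1x1 conv
        self.conv = nn.Conv1d(out_ch, out_ch, kernel_size=3, padding=1)
        self.shortcut = (nn.Conv1d(in_ch, out_ch, kernel_size=1)
                         if in_ch != out_ch else nn.Identity())
    def forward(self, x):                  # x: (batch, in_ch, seq_len)
        h = torch.relu(self.pointwise(x))  # add nonlinearity, grow channels
        h = torch.relu(self.conv(h))
        return h + self.shortcut(x)        # shortcut shares original signal
```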


2021 ◽  
Vol 6 (12) ◽  
pp. 13931-13953
Author(s):  
Yunfeng Shi ◽  
Shu Lv ◽  
Kaibo Shi ◽  
...  

Support vector machine (SVM) is one of the most powerful machine learning technologies and has attracted wide attention because of its remarkable performance. However, when dealing with the classification of large-scale datasets, the high complexity of the SVM model leads to low efficiency and becomes impractical. Exploiting the sparsity of SVM in the sample space, this paper presents a new parallel data geometry analysis (PDGA) algorithm to reduce the training set of SVM, which helps to improve the efficiency of SVM training. PDGA introduces the Mahalanobis distance to measure the distance from each sample to its centroid, and on this basis proposes a method that can identify non-support vectors and outliers at the same time to help remove redundant data. To reduce the training set further, a cosine angle distance analysis method is proposed to determine whether samples are redundant, ensuring that valuable data are not removed. Unlike previous data geometry analysis methods, the PDGA algorithm is implemented in parallel, which greatly reduces the computational cost. Experimental results on an artificial dataset and six real datasets show that the algorithm can adapt to different sample distributions: it significantly reduces training time and memory requirements without sacrificing classification accuracy, and its performance is clearly better than that of five competitive algorithms.
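
The two geometric tests can be illustrated as below: a Mahalanobis distance from each sample to its centroid flags likely non-support vectors (too close to the center) and outliers (too far), and a cosine-angle check guards against discarding valuable boundary samples. The thresholds, the exact retention rule, and the parallel execution are omitted; this NumPy sketch is my reading of the mechanism, not the authors' algorithm.

```python
# Geometry-based training-set reduction, sketched with NumPy.
import numpy as np

def mahalanobis_to_centroid(X):
    mu = X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    diff = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

def cosine_to_centroid(X):
    mu = X.mean(axis=0)
    return (X @ mu) / (np.linalg.norm(X, axis=1) * np.linalg.norm(mu) + 1e-12)

def reduce_training_set(X, d_low, d_high, cos_min):
    d = mahalanobis_to_centroid(X)
    keep = (d >= d_low) & (d <= d_high)   # drop inner points and outliers
    keep |= cosine_to_centroid(X) < cos_min  # retain angular "edge" samples
    return X[keep]
```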


2020 ◽  
Vol 34 (05) ◽  
pp. 9693-9700
Author(s):  
Yinhe Zheng ◽  
Rongsheng Zhang ◽  
Minlie Huang ◽  
Xiaoxi Mao

Endowing dialogue systems with personas is essential to delivering more human-like conversations. However, this problem is still far from well explored due to the difficulties of both embodying personalities in natural language and the persona sparsity observed in most dialogue corpora. This paper proposes a pre-training-based personalized dialogue model that can generate coherent responses using persona-sparse dialogue data. In this method, a pre-trained language model is used to initialize an encoder and decoder, and personal attribute embeddings are devised to model richer dialogue contexts by encoding speakers' personas together with dialogue histories. Further, to incorporate the target persona in the decoding process and to balance its contribution, an attention routing structure is devised in the decoder to merge features extracted from the target persona and the dialogue context using dynamically predicted weights. Our model can utilize persona-sparse dialogues in a unified manner during training, and can also control the amount of persona-related features exhibited during inference. Both automatic and manual evaluations demonstrate that the proposed model outperforms state-of-the-art methods, generating more coherent and persona-consistent responses with persona-sparse data.
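
The attention-routing merge can be sketched as a gated combination of persona-attended and context-attended features, where the gate is predicted dynamically during training and can be overridden at inference to control the amount of persona signal. This is my assumption of the mechanism, not the authors' code.

```python
# Gated merge of persona and context features with a learned, dynamic weight.
import torch
import torch.nn as nn

class AttentionRouter(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 1)
    def forward(self, persona_feat, context_feat, fixed_weight=None):
        # fixed_weight, if given, overrides the predicted gate at inference
        # to dial the persona contribution up or down.
        if fixed_weight is None:
            w = torch.sigmoid(self.gate(
                torch.cat([persona_feat, context_feat], dim=-1)))
        else:
            w = fixed_weight
        return w * persona_feat + (1 - w) * context_feat
```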


Author(s):  
Xiaoxiao Sun ◽  
Liyi Chen ◽  
Jufeng Yang

Fine-grained classification focuses on recognizing the subordinate categories of one field, which requires a large number of labeled images, yet labeling these images is expensive. Utilizing web data has been an attractive option for meeting the training-data demands of convolutional neural networks (CNNs), especially when well-labeled data is not enough. However, directly training on such easily obtained images often leads to unsatisfactory performance due to factors such as noisy labels. This has conventionally been addressed by reducing the noise level of the web data. In this paper, we take a fundamentally different view and propose an adversarial discriminative loss to advocate representation coherence between standard and web data. This is further encapsulated in a simple, scalable, and end-to-end trainable multi-task learning framework. We experiment on three public datasets using large-scale web data to evaluate the effectiveness and generalizability of the proposed approach. Extensive experiments demonstrate that our approach performs favorably against state-of-the-art methods.
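
A minimal sketch of the adversarial discriminative loss as described: a domain discriminator learns to separate features of standard and web images, while the shared feature extractor is trained to fool it, pushing the two representations toward coherence. Dimensions and names are illustrative assumptions.

```python
# Adversarial representation-coherence losses between two data sources.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def domain_losses(feat_std, feat_web):
    # Discriminator loss: label standard features 1, web features 0.
    d_loss = (bce(disc(feat_std), torch.ones(len(feat_std), 1)) +
              bce(disc(feat_web), torch.zeros(len(feat_web), 1)))
    # Adversarial loss (backpropagated into the feature extractor only):
    # make web features indistinguishable from standard ones.
    g_loss = bce(disc(feat_web), torch.ones(len(feat_web), 1))
    return d_loss, g_loss
```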


Author(s):  
Shuo Chen ◽  
Chen Gong ◽  
Jian Yang ◽  
Xiang Li ◽  
Yang Wei ◽  
...  

In the past decades, intensive efforts have been put into designing various loss functions and metric forms for the metric learning problem. These improvements have shown promising results when the test data is similar to the training data. However, the trained models often fail to produce reliable distances on ambiguous test pairs due to the different samplings of the training set and test set. To address this problem, Adversarial Metric Learning (AML) is proposed in this paper, which automatically generates adversarial pairs to remedy the sampling bias and facilitate robust metric learning. Specifically, AML consists of two adversarial stages, i.e., confusion and distinguishment. In the confusion stage, ambiguous but critical adversarial data pairs are adaptively generated to mislead the learned metric. In the distinguishment stage, a metric is exhaustively learned to distinguish both the adversarial pairs and the original training pairs. Thanks to the challenges posed by the confusion stage in this competing process, the AML model is able to grasp plentiful difficult knowledge that is not contained in the original training pairs, so the discriminability of AML can be significantly improved. The entire model is formulated as an optimization framework whose global convergence is theoretically proved. Experimental results on toy data and practical datasets clearly demonstrate the superiority of AML over representative state-of-the-art metric learning models.
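
The two stages can be caricatured as follows: the confusion stage runs gradient ascent on the current metric to turn a genuinely similar pair into a hard adversarial pair, and the distinguishment stage updates the metric on both original and adversarial pairs. This sketch is my paraphrase of the mechanism, not the paper's exact objective or its provably convergent optimization.

```python
# Two-stage adversarial metric learning, roughly sketched.
import torch

def confusion_stage(x1, x2, metric_dist, steps=5, lr=0.1):
    # Perturb a similar pair (x1, x2) to maximize its distance under the
    # current metric, producing an ambiguous adversarial pair.
    a1 = x1.clone().detach().requires_grad_(True)
    a2 = x2.clone().detach().requires_grad_(True)
    for _ in range(steps):
        d = metric_dist(a1, a2).sum()
        g1, g2 = torch.autograd.grad(d, [a1, a2])
        with torch.no_grad():
            a1 += lr * g1          # ascend: mislead the learned metric
            a2 += lr * g2
    return a1.detach(), a2.detach()

def distinguishment_loss(metric_dist, sim_pairs, dis_pairs, margin=1.0):
    # Learn on original plus adversarial pairs: pull similar pairs together,
    # push dissimilar pairs beyond a margin.
    d_sim = torch.stack([metric_dist(a, b) for a, b in sim_pairs]).mean()
    d_dis = torch.stack([metric_dist(a, b) for a, b in dis_pairs]).mean()
    return d_sim + torch.clamp(margin - d_dis, min=0)
```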


Author(s):  
Jipeng Zhang ◽  
Roy Ka-Wei Lee ◽  
Ee-Peng Lim ◽  
Wei Qin ◽  
Lei Wang ◽  
...  

Math word problem (MWP) solving is challenging due to a limitation of the training data: only one "standard" solution is available per problem. MWP models often simply fit this solution rather than truly understanding or solving the problem, so their generalization to diverse word scenarios is limited. To address this problem, this paper proposes a novel approach, TSN-MD, that leverages a teacher network to integrate the knowledge of equivalent solution expressions and then regularize the learning behavior of the student network. In addition, we introduce a multiple-decoder student network that generates multiple candidate solution expressions, from which the final answer is chosen by voting. In experiments, we conduct extensive comparisons and ablative studies on two large-scale MWP benchmarks and show that TSN-MD surpasses state-of-the-art works by a large margin. More intriguingly, the visualization results demonstrate that TSN-MD not only produces correct final answers but also generates diverse equivalent expressions of the solution.
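
The voting step over the multiple decoders' outputs is straightforward to sketch: evaluate each candidate expression and return the most frequent answer. Using Python's eval for expression evaluation here is purely illustrative.

```python
# Majority voting over candidate solution expressions from multiple decoders.
from collections import Counter

def vote_answer(candidate_exprs):
    answers = []
    for expr in candidate_exprs:
        try:
            # Restrict builtins; candidate expressions are plain arithmetic.
            answers.append(eval(expr, {"__builtins__": {}}))
        except Exception:
            continue                     # skip malformed expressions
    return Counter(answers).most_common(1)[0][0] if answers else None

# vote_answer(["2*(3+4)", "2*3+2*4", "2*3+4"]) -> 14 (two decoders agree)
```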

