scholarly journals Learning to balance the coherence and diversity of response generation in generation-based chatbots

2020 ◽  
Vol 17 (4) ◽  
pp. 172988142095300
Author(s):  
Shuliang Wang ◽  
Dapeng Li ◽  
Jing Geng ◽  
Longxing Yang ◽  
Hongyong Leng

Generating response with both coherence and diversity is a challenging task in generation-based chatbots. It is more difficult to improve the coherence and diversity of dialog generation at the same time in the response generation model. In this article, we propose an improved method that improves the coherence and diversity of dialog generation by changing the model to use gamma sampling and adding attention mechanism to the knowledge-guided conditional variational autoencoder. The experimental results demonstrate that our proposed method can significantly improve the coherence and diversity of knowledge-guided conditional variational autoencoder for response generation in generation-based chatbots at the same time.

2021 ◽  
Vol 11 (5) ◽  
pp. 2174
Author(s):  
Xiaoguang Li ◽  
Feifan Yang ◽  
Jianglu Huang ◽  
Li Zhuo

Images captured in a real scene usually suffer from complex non-uniform degradation, which includes both global and local blurs. It is difficult to handle the complex blur variances by a unified processing model. We propose a global-local blur disentangling network, which can effectively extract global and local blur features via two branches. A phased training scheme is designed to disentangle the global and local blur features, that is the branches are trained with task-specific datasets, respectively. A branch attention mechanism is introduced to dynamically fuse global and local features. Complex blurry images are used to train the attention module and the reconstruction module. The visualized feature maps of different branches indicated that our dual-branch network can decouple the global and local blur features efficiently. Experimental results show that the proposed dual-branch blur disentangling network can improve both the subjective and objective deblurring effects for real captured images.


Author(s):  
Qianrong Zhou ◽  
Xiaojie Wang ◽  
Xuan Dong

Attention-based models have shown to be effective in learning representations for sentence classification. They are typically equipped with multi-hop attention mechanism. However, existing multi-hop models still suffer from the problem of paying much attention to the most frequently noticed words, which might not be important to classify the current sentence. And there is a lack of explicitly effective way that helps the attention to be shifted out of a wrong part in the sentence. In this paper, we alleviate this problem by proposing a differentiated attentive learning model. It is composed of two branches of attention subnets and an example discriminator. An explicit signal with the loss information of the first attention subnet is passed on to the second one to drive them to learn different attentive preference. The example discriminator then selects the suitable attention subnet for sentence classification. Experimental results on real and synthetic datasets demonstrate the effectiveness of our model.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yongyi Li ◽  
Shiqi Wang ◽  
Shuang Dong ◽  
Xueling Lv ◽  
Changzhi Lv ◽  
...  

At present, person reidentification based on attention mechanism has attracted many scholars’ interests. Although attention module can improve the representation ability and reidentification accuracy of Re-ID model to a certain extent, it depends on the coupling of attention module and original network. In this paper, a person reidentification model that combines multiple attentions and multiscale residuals is proposed. The model introduces combined attention fusion module and multiscale residual fusion module in the backbone network ResNet 50 to enhance the feature flow between residual blocks and better fuse multiscale features. Furthermore, a global branch and a local branch are designed and applied to enhance the channel aggregation and position perception ability of the network by utilizing the dual ensemble attention module, as along as the fine-grained feature expression is obtained by using multiproportion block and reorganization. Thus, the global and local features are enhanced. The experimental results on Market-1501 dataset and DukeMTMC-reID dataset show that the indexes of the presented model, especially Rank-1 accuracy, reach 96.20% and 89.59%, respectively, which can be considered as a progress in Re-ID.


Author(s):  
Bowen Xing ◽  
Lejian Liao ◽  
Dandan Song ◽  
Jingang Wang ◽  
Fuzheng Zhang ◽  
...  

Aspect-based sentiment analysis (ABSA) aims to predict fine-grained sentiments of comments with respect to given aspect terms or categories. In previous ABSA methods, the importance of aspect has been realized and verified. Most existing LSTM-based models take aspect into account via the attention mechanism, where the attention weights are calculated after the context is modeled in the form of contextual vectors. However, aspect-related information may be already discarded and aspect-irrelevant information may be retained in classic LSTM cells in the context modeling process, which can be improved to generate more effective context representations. This paper proposes a novel variant of LSTM, termed as aspect-aware LSTM (AA-LSTM), which incorporates aspect information into LSTM cells in the context modeling stage before the attention mechanism. Therefore, our AA-LSTM can dynamically produce aspect-aware contextual representations. We experiment with several representative LSTM-based models by replacing the classic LSTM cells with the AA-LSTM cells. Experimental results on SemEval-2014 Datasets demonstrate the effectiveness of AA-LSTM.


Author(s):  
Hao Zhou ◽  
Tom Young ◽  
Minlie Huang ◽  
Haizhou Zhao ◽  
Jingfang Xu ◽  
...  

Commonsense knowledge is vital to many natural language processing tasks. In this paper, we present a novel open-domain conversation generation model to demonstrate how large-scale commonsense knowledge can facilitate language understanding and generation. Given a user post, the model retrieves relevant knowledge graphs from a knowledge base and then encodes the graphs with a static graph attention mechanism, which augments the semantic information of the post and thus supports better understanding of the post. Then, during word generation, the model attentively reads the retrieved knowledge graphs and the knowledge triples within each graph to facilitate better generation through a dynamic graph attention mechanism. This is the first attempt that uses large-scale commonsense knowledge in conversation generation. Furthermore, unlike existing models that use knowledge triples (entities) separately and independently, our model treats each knowledge graph as a whole, which encodes more structured, connected semantic information in the graphs. Experiments show that the proposed model can generate more appropriate and informative responses than state-of-the-art baselines. 


2020 ◽  
Vol 34 (04) ◽  
pp. 3495-3502 ◽  
Author(s):  
Junxiang Chen ◽  
Kayhan Batmanghelich

Recently, researches related to unsupervised disentanglement learning with deep generative models have gained substantial popularity. However, without introducing supervision, there is no guarantee that the factors of interest can be successfully recovered (Locatello et al. 2018). Motivated by a real-world problem, we propose a setting where the user introduces weak supervision by providing similarities between instances based on a factor to be disentangled. The similarity is provided as either a binary (yes/no) or real-valued label describing whether a pair of instances are similar or not. We propose a new method for weakly supervised disentanglement of latent variables within the framework of Variational Autoencoder. Experimental results demonstrate that utilizing weak supervision improves the performance of the disentanglement method substantially.


Author(s):  
Weijia Zhang

Multi-instance learning is a type of weakly supervised learning. It deals with tasks where the data is a set of bags and each bag is a set of instances. Only the bag labels are observed whereas the labels for the instances are unknown. An important advantage of multi-instance learning is that by representing objects as a bag of instances, it is able to preserve the inherent dependencies among parts of the objects. Unfortunately, most existing algorithms assume all instances to be identically and independently distributed, which violates real-world scenarios since the instances within a bag are rarely independent. In this work, we propose the Multi-Instance Variational Autoencoder (MIVAE) algorithm which explicitly models the dependencies among the instances for predicting both bag labels and instance labels. Experimental results on several multi-instance benchmarks and end-to-end medical imaging datasets demonstrate that MIVAE performs better than state-of-the-art algorithms for both instance label and bag label prediction tasks.


2019 ◽  
Vol 11 (12) ◽  
pp. 247
Author(s):  
Xin Zhou ◽  
Peixin Dong ◽  
Jianping Xing ◽  
Peijia Sun

Accurate prediction of bus arrival times is a challenging problem in the public transportation field. Previous studies have shown that to improve prediction accuracy, more heterogeneous measurements provide better results. So what other factors should be added into the prediction model? Traditional prediction methods mainly use the arrival time and the distance between stations, but do not make full use of dynamic factors such as passenger number, dwell time, bus driving efficiency, etc. We propose a novel approach that takes full advantage of dynamic factors. Our approach is based on a Recurrent Neural Network (RNN). The experimental results indicate that a variety of prediction algorithms (such as Support Vector Machine, Kalman filter, Multilayer Perceptron, and RNN) have significantly improved performance after using dynamic factors. Further, we introduce RNN with an attention mechanism to adaptively select the most relevant input factors. Experiments demonstrate that the prediction accuracy of RNN with an attention mechanism is better than RNN with no attention mechanism when there are heterogeneous input factors. The experimental results show the superior performances of our approach on the data set provided by Jinan Public Transportation Corporation.


Sign in / Sign up

Export Citation Format

Share Document