dialogue system
Recently Published Documents

TOTAL DOCUMENTS: 725 (five years: 188)
H-INDEX: 19 (five years: 5)
2022 ◽  
Vol 40 (1) ◽  
pp. 1-44
Author(s):  
Longxuan Ma ◽  
Mingda Li ◽  
Wei-Nan Zhang ◽  
Jiapeng Li ◽  
Ting Liu

Incorporating external knowledge into dialogue generation has been proven to benefit the performance of an open-domain Dialogue System (DS), for example by generating informative or stylized responses or controlling conversation topics. In this article, we study the open-domain DS that uses unstructured text as its external knowledge source (Unstructured Text Enhanced Dialogue System, UTEDS). The existence of unstructured text entails distinctions between UTEDS and traditional data-driven DS, and we aim to analyze these differences. We first define the concepts related to UTEDS, then summarize the recently released datasets and models. We categorize UTEDS into Retrieval and Generative models and introduce them from the perspective of model components. The retrieval models consist of Fusion, Matching, and Ranking modules, while the generative models comprise Dialogue and Knowledge Encoding, Knowledge Selection (KS), and Response Generation modules. We further summarize the evaluation methods utilized in UTEDS and analyze the current models' performance. Finally, we discuss the future development trends of UTEDS, hoping to inspire new research in this field.
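The Fusion-Matching-Ranking decomposition of retrieval models described above can be pictured with a toy pipeline. This is a minimal sketch under simplifying assumptions (bag-of-words vectors, cosine similarity); the names `bow`, `cosine`, and `retrieve_response` are illustrative and do not come from any surveyed system.

```python
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector as a token-count dictionary."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_response(context, knowledge, candidates):
    # Fusion: combine the dialogue context with the unstructured text snippet.
    fused = bow(context + " " + knowledge)
    # Matching: score each candidate response against the fused query.
    scored = [(cosine(fused, bow(c)), c) for c in candidates]
    # Ranking: return candidates ordered by matching score, best first.
    return [c for _, c in sorted(scored, key=lambda p: -p[0])]
```

Real UTEDS retrieval models replace the bag-of-words vectors with learned neural representations, but the three-stage flow is the same.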


2022 ◽  
Vol 2022 ◽  
pp. 1-11
Author(s):  
Yu Wang

In this paper, we use machine learning algorithms to conduct in-depth research and analysis on the construction of human-computer interaction systems and propose a simple and effective method for extracting salient features based on contextual information. The method retains both the dynamic and static information of gestures, which results in a richer and more robust feature representation. Secondly, this paper proposes a dynamic-programming algorithm based on feature matching, which uses the consistency and accuracy of feature matching to measure the similarity of two frames and then applies dynamic programming to find the optimal matching distance between two gesture sequences. The algorithm ensures the continuity and accuracy of the gesture description and makes full use of the spatiotemporal location information of the features. We first analyze the features and limitations of common motion-target detection methods in motion gesture detection and of common machine learning tracking methods in gesture tracking; we then improve the kernel correlation filter method by designing a confidence model and introducing a scale filter; finally, we conduct comparison experiments on a self-built gesture dataset to verify the effectiveness of the improved method. During training and validation of the model on the corpus, the complementary feature extraction methods are ablated, and the results are compared with three baseline methods. Gaussian mixture models (GMMs) have been widely used in classification tasks, but they are not suitable when the temporal structure of the data must be modeled. By using a kernel function, a support vector machine can transform the original input set into a high-dimensional feature space.
After experiments, the speech emotion recognition method proposed in this paper outperforms the baseline methods, proving the effectiveness of complementary feature extraction and the superiority of the deep learning model. Speech is used as the input of the system, emotion recognition is performed on the input speech, and the recognized emotion is successfully applied to the human-computer dialogue system in combination with an online speech recognition method, demonstrating the application value of speech emotion recognition in human-computer dialogue systems.
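The dynamic-programming matching of gesture sequences described above resembles classic dynamic time warping. The sketch below is a generic DTW implementation, not the paper's algorithm: `frame_dist` stands in for the feature-matching similarity between two frames, which the paper derives from salient contextual features.

```python
def dtw_distance(seq_a, seq_b, frame_dist):
    """Dynamic-programming alignment cost (DTW-style) between two
    gesture sequences, given a per-frame distance function."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: diagonal match, insertion, deletion.
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    return cost[n][m]
```

Because insertions and deletions are allowed, the alignment tolerates gestures performed at different speeds, which is exactly why a plain frame-by-frame comparison is insufficient.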


Informatics ◽  
2021 ◽  
Vol 18 (4) ◽  
pp. 40-52
Author(s):  
S. A. Hetsevich ◽  
Dz. A. Dzenisyk ◽  
Yu. S. Hetsevich ◽  
L. I. Kaigorodova ◽  
K. A. Nikalaenka

Objectives. The main goal of the work is research on natural language user interfaces and the development of a prototype of such an interface. The prototype is a bilingual Russian- and Belarusian-language question-and-answer dialogue system. The research on natural language interfaces was conducted in terms of the use of natural language for interaction between a user and a computer system. The main problems here are the ambiguity of natural language and the difficulty of designing natural language interfaces that meet user expectations.

Methods. The main principles of modelling natural language user interfaces are considered. As an intelligent system, such an interface consists of a database, a knowledge machine, and a user interface. Speech recognition and speech synthesis components make natural language interfaces more convenient from the point of view of usability.

Results. A description of the prototype of a natural language interface for a question-and-answer intelligent system is presented. The model of the prototype includes Belarusian and Russian speech-to-text and text-to-speech subsystems and the generation of responses in both natural language and formal text. An additional component is natural Belarusian and Russian voice input. Some of the data required for human voice recognition are stored as knowledge in the knowledge base or created on the basis of existing knowledge. Another important component is Belarusian and Russian voice output, which is chiefly required to make the natural language interface more user-friendly.

Conclusion. The article presents research on natural language user interfaces, the result of which is the development and description of a prototype natural language interface for an intelligent question-and-answer system.


Author(s):  
Tulika Saha ◽  
Dhawal Gupta ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Building virtual agents capable of carrying out complex user queries involving multiple intents of a domain is quite a challenge, because it demands that the agent manage several subtasks simultaneously. This article presents a universal Deep Reinforcement Learning framework that can synthesize dialogue managers capable of working in a task-oriented dialogue system encompassing various intents pertaining to a domain. The conversation between agent and user is broken down into hierarchies to segregate subtasks pertinent to different intents. The concept of Hierarchical Reinforcement Learning, particularly options, is used to learn policies in different hierarchies that operate at distinct time steps to fulfill the user query successfully. The dialogue manager comprises a top-level intent meta-policy to select among subtasks, or options, and a low-level controller policy to pick primitive actions that communicate with the user to complete the subtask assigned by the top-level policy in varying intents of a domain. The proposed dialogue management module has been trained such that it can be reused for any language for which it has been developed, with little to no supervision. The developed system has been demonstrated for the "Air Travel" and "Restaurant" domains in English and Hindi. Empirical results demonstrate the robustness and efficacy of the learned dialogue policy, which outperforms several baselines and a state-of-the-art system.
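The two-level meta-policy/controller structure described above can be sketched as a skeleton class. The Q-tables, epsilon-greedy rule, and example option and action names below are illustrative assumptions, not the authors' trained policies.

```python
import random

class HierarchicalDialogueManager:
    """Minimal options-style dialogue manager sketch: a top-level
    meta-policy selects a subtask (option) per intent, and a low-level
    controller emits primitive dialogue acts within that option."""

    def __init__(self, options, primitive_actions, epsilon=0.1):
        self.options = options              # hypothetical, e.g. ["book_flight", "find_restaurant"]
        self.primitives = primitive_actions # hypothetical, e.g. ["request_slot", "confirm"]
        self.meta_q = {}                    # (state, option) -> value
        self.ctrl_q = {}                    # (option, state, action) -> value
        self.epsilon = epsilon

    def select_option(self, state):
        """Top-level intent meta-policy: pick which subtask to pursue."""
        if random.random() < self.epsilon:
            return random.choice(self.options)
        return max(self.options,
                   key=lambda o: self.meta_q.get((state, o), 0.0))

    def select_action(self, option, state):
        """Low-level controller: pick a primitive act for the active option."""
        if random.random() < self.epsilon:
            return random.choice(self.primitives)
        return max(self.primitives,
                   key=lambda a: self.ctrl_q.get((option, state, a), 0.0))
```

A training loop would update `meta_q` at option boundaries and `ctrl_q` at every turn, which is what lets the two policies operate at distinct time steps.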


Author(s):  
Lu Xiang ◽  
Junnan Zhu ◽  
Yang Zhao ◽  
Yu Zhou ◽  
Chengqing Zong

Cross-lingual dialogue systems are increasingly important in e-commerce and customer service due to the rapid progress of globalization. In real-world system deployment, machine translation (MT) services are often used before and after the dialogue system to bridge different languages. However, noise and errors introduced in the MT process reduce the dialogue system's robustness, leaving its performance far from satisfactory. In this article, we propose a novel MT-oriented noise-enhanced framework that exploits multi-granularity MT noises and injects such noises into the dialogue system to improve its robustness. Specifically, we first design a method to automatically construct multi-granularity MT-oriented noises and multi-granularity adversarial examples, which contain abundant noise knowledge oriented to MT. Then, we propose two strategies to incorporate the noise knowledge: (i) utterance-level adversarial learning and (ii) a knowledge-level guided method. The former adopts adversarial learning to learn a perturbation-invariant encoder, guiding the dialogue system to learn noise-independent hidden representations. The latter explicitly incorporates the multi-granularity noises, which contain the noise tokens and their possible correct forms, into the training and inference process, thus improving the dialogue system's robustness. Experimental results on three dialogue models, two dialogue datasets, and two language pairs show that the proposed framework significantly improves the performance of the cross-lingual dialogue system.
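The core idea of injecting MT-style noise into training utterances can be illustrated with a toy token-substitution function. The noise table (correct form mapped to plausible MT-error variants) and the replacement rate are assumptions for illustration; the paper mines its multi-granularity noises automatically rather than from a hand-written table.

```python
import random

def inject_mt_noise(utterance, noise_table, rate=0.3, seed=0):
    """Replace tokens with MT-error variants drawn from a noise table
    (correct form -> list of noisy forms), at a given rate."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    out = []
    for tok in utterance.split():
        variants = noise_table.get(tok)
        if variants and rng.random() < rate:
            out.append(rng.choice(variants))  # inject a noisy form
        else:
            out.append(tok)                   # keep the clean token
    return " ".join(out)
```

Training the dialogue encoder on both the clean utterance and its noised variants is what pushes it toward the perturbation-invariant representations the paper targets.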


2021 ◽  
Author(s):  
Philippe Blache ◽  
Matthis Houlès

This paper presents a dialogue system for training doctors to break bad news. The originality of this work lies in its knowledge representation. All information known before the dialogue (the universe of discourse, the context, the scenario of the dialogue), as well as the knowledge transferred from the doctor to the patient during the conversation, is represented in a shared knowledge structure called the common ground, which constitutes the core of the system. The Natural Language Understanding and Natural Language Generation modules of the system take advantage of this structure, and we present several original techniques that make it possible to implement them efficiently.
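One way to picture the common ground at the core of such a system is as a shared store that tracks which scenario facts have already been conveyed to the patient. The class below is a toy illustration under that assumption, not the paper's actual knowledge structure.

```python
class CommonGround:
    """Toy common-ground store: scenario facts known before the
    dialogue, plus the subset grounded (transferred) during it."""

    def __init__(self, scenario_facts):
        self.facts = dict(scenario_facts)  # universe of discourse / scenario
        self.grounded = set()              # facts the patient now shares

    def ground(self, key):
        """Mark a scenario fact as transferred to the patient."""
        if key in self.facts:
            self.grounded.add(key)

    def is_shared(self, key):
        return key in self.grounded

    def remaining(self):
        """Facts the doctor still has to convey."""
        return [k for k in self.facts if k not in self.grounded]
```

In this picture, NLU updates the grounded set from the doctor's utterances, and NLG consults `remaining()` to drive the patient's side of the conversation.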


2021 ◽  
Vol 39 (4) ◽  
pp. 1-24
Author(s):  
Wei Wei ◽  
Jiayi Liu ◽  
Xianling Mao ◽  
Guibing Guo ◽  
Feida Zhu ◽  
...  

The consistency of a response to a given post at the semantic level and emotional level is essential for a dialogue system to deliver humanlike interactions. However, this challenge is not well addressed in the literature, since most of the approaches neglect the emotional information conveyed by a post while generating responses. This article addresses this problem and proposes a unified end-to-end neural architecture, which is capable of simultaneously encoding the semantics and the emotions in a post and leveraging target information to generate more intelligent responses with appropriately expressed emotions. Extensive experiments on real-world data demonstrate that the proposed method outperforms the state-of-the-art methods in terms of both content coherence and emotion appropriateness.
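A minimal way to expose both channels to a response decoder is to concatenate the post's semantic representation with an encoding of its emotion. The sketch below uses a one-hot emotion vector; the label set and function name are assumptions for illustration, far simpler than the paper's unified end-to-end architecture.

```python
def encode_post(semantic_vec, emotion,
                emotions=("happy", "sad", "angry", "neutral")):
    """Concatenate a post's semantic vector with a one-hot emotion
    vector so a decoder can condition on both channels at once."""
    one_hot = [1.0 if e == emotion else 0.0 for e in emotions]
    return list(semantic_vec) + one_hot
```

A learned emotion embedding would replace the one-hot vector in practice, but the principle of feeding the decoder a joint semantic-emotional representation is the same.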


2021 ◽  
Vol 39 (4) ◽  
pp. 1-28
Author(s):  
Ruijian Xu ◽  
Chongyang Tao ◽  
Jiazhan Feng ◽  
Wei Wu ◽  
Rui Yan ◽  
...  

Building an intelligent dialogue system with the ability to select a proper response according to a multi-turn context is challenging in three aspects: (1) the meaning of a context-response pair is built upon language units of multiple granularities (e.g., words, phrases, and sub-sentences); (2) local (e.g., a small window around a word) and long-range (e.g., words across the context and the response) dependencies may exist in dialogue data; and (3) the relationship between the context and the response candidate lies in multiple relevant semantic clues, or in relatively implicit semantic clues in some real cases. However, existing approaches usually encode the dialogue with a mono-type representation, and the interaction between the context and the response candidate is executed in a rather shallow manner, which may lead to an inadequate understanding of dialogue content and hinder the recognition of the semantic relevance between context and response. To tackle these challenges, we propose a representation[K]-interaction[L]-matching framework that explores multiple types of deep interactive representations to build context-response matching models for response selection. In particular, we construct different types of representations for utterance-response pairs and deepen them via alternate encoding and interaction. By this means, the model can handle the relations of neighboring elements, phrasal patterns, and long-range dependencies during representation and make a more accurate prediction through multiple layers of interaction between the context-response pair. Experimental results on three public benchmarks indicate that the proposed model significantly outperforms previous conventional context-response matching models and achieves slightly better results than the BERT model for multi-turn response selection in retrieval-based dialogue systems.
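The interaction step at the heart of context-response matching can be pictured as a token-level similarity matrix pooled into a score. The sketch below is a single-granularity, single-layer toy with hand-made embeddings, far shallower than the alternate encoding-and-interaction stack the paper proposes.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def matching_score(context_tokens, response_tokens, embed):
    """Score a response candidate against a context via a token-level
    interaction matrix, max-pooled per context token and averaged."""
    # Interaction: similarity between every context/response token pair.
    matrix = [[cosine(embed[c], embed[r]) for r in response_tokens]
              for c in context_tokens]
    # Matching: max-pool over response tokens, mean over context tokens.
    return sum(max(row) for row in matrix) / len(matrix)
```

Deep matching models learn the embeddings and stack many such interaction layers, but each layer still reduces to computing and aggregating a pairwise similarity structure like this one.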

