Using Complexity-Identical Human- and Machine-Directed Utterances to Investigate Addressee Detection for Spoken Dialogue Systems

Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2740 ◽  
Author(s):  
Oleg Akhtiamov ◽  
Ingo Siegert ◽  
Alexey Karpov ◽  
Wolfgang Minker

Human-machine addressee detection (H-M AD) is a modern paralinguistics and dialogue challenge that arises in multiparty conversations between several people and a spoken dialogue system (SDS) since the users may also talk to each other and even to themselves while interacting with the system. The SDS is supposed to determine whether it is being addressed or not. All existing studies on acoustic H-M AD were conducted on corpora designed in such a way that a human addressee and a machine played different dialogue roles. This peculiarity influences speakers’ behaviour and increases vocal differences between human- and machine-directed utterances. In the present study, we consider the Restaurant Booking Corpus (RBC) that consists of complexity-identical human- and machine-directed phone calls and allows us to eliminate most of the factors influencing speakers’ behaviour implicitly. The only remaining factor is the speakers’ explicit awareness of their interlocutor (technical system or human being). Although complexity-identical H-M AD is essentially more challenging than the classical one, we managed to achieve significant improvements using data augmentation (unweighted average recall (UAR) = 0.628) over native listeners (UAR = 0.596) and a baseline classifier presented by the RBC developers (UAR = 0.539).
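The abstract reports its results as unweighted average recall (UAR). As a minimal sketch of how that metric is computed, the following assumes the standard definition (the mean of the per-class recalls); the labels and predictions are hypothetical and are not taken from the RBC.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of its size."""
    total = defaultdict(int)    # number of reference items per class
    correct = defaultdict(int)  # number of correctly predicted items per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: "M" = machine-directed, "H" = human-directed.
y_true = ["M", "M", "M", "M", "H", "H"]
y_pred = ["M", "M", "M", "H", "H", "M"]
uar = unweighted_average_recall(y_true, y_pred)

# A classifier that always answers "M" recalls one class fully and the other not at all.
uar_majority = unweighted_average_recall(y_true, ["M"] * len(y_true))
```

Under this definition, a two-class system that always predicts the majority class scores 0.5 however imbalanced the data are, which is why UAR is often preferred over plain accuracy for tasks such as addressee detection.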

2006 ◽  
Vol 32 (3) ◽  
pp. 417-438 ◽  
Author(s):  
Diane Litman ◽  
Julia Hirschberg ◽  
Marc Swerts

This article focuses on the analysis and prediction of corrections, defined as turns where a user tries to correct a prior error made by a spoken dialogue system. We describe our procedure for labeling various correction types and statistical analyses of their features in a corpus collected from a train information spoken dialogue system. We then present results of machine-learning experiments designed to identify user corrections of speech recognition errors. We investigate the predictive power of features automatically computable from the prosody of the turn, the speech recognition process, experimental conditions, and the dialogue history. Our best-performing features reduce classification error from baselines of 25.70–28.99% to 15.72%.


Author(s):  
Oyelami Olufemi Moses

Aims: This article reports the application areas of spoken dialogue systems in the developing world to determine whether such systems could be used to bridge the digital divide prevalent in these regions. The work also aims to identify the developing nations in which such systems are currently in use. Study Design: A survey of twenty articles on the subject was carried out and their application domains were identified. The forms of evaluation carried out on the systems were also identified in order to determine how positive their outcomes were for bridging the digital divide. Comments made on the different evaluations were also considered in determining the suitability of spoken dialogue systems for bridging the digital divide. Place and Duration of Study: Department of Computer Science and Information Technology, Bowen University, Iwo, Nigeria, between February 2013 and October 2019. Methodology: The domains of the works, the forms of evaluation carried out on the systems, the comments made by participants after testing the systems, and the developing countries where the works were carried out were identified. A position was then taken based on the results obtained. Results: Nine of the works are in the healthcare domain, three in agriculture, one in banking, one in aviation, one in secretarial work, one in the accuracy of recognition, one in education, and three span multiple domains. The comments and results from the evaluations all point towards the suitability of spoken dialogue systems for bridging the digital divide. Spoken dialogue systems are currently being used in only six developing nations.
Conclusion: Based on the results obtained, it is clear that spoken dialogue systems can be used to bridge the digital divide in the developing world and that other application areas not yet covered could be explored for the benefit of the citizens of these regions, especially the digitally disadvantaged ones.


1999 ◽  
Vol 5 (1) ◽  
pp. 45-93 ◽  
Author(s):  
GERTJAN VAN NOORD ◽  
GOSSE BOUMA ◽  
ROB KOELING ◽  
MARK-JAN NEDERHOF

We argue that grammatical analysis is a viable alternative to concept spotting for processing spoken input in a practical spoken dialogue system. We discuss the structure of the grammar and a model for robust parsing that combines linguistic and statistical sources of information. We discuss test results suggesting that grammatical processing allows fast and accurate processing of spoken input.


2002 ◽  
Vol 16 ◽  
pp. 293-319 ◽  
Author(s):  
M. A. Walker ◽  
I. Langkilde-Geary ◽  
H. Wright Hastie ◽  
J. Wright ◽  
A. Gorin

Spoken dialogue systems promise efficient and natural access to a large variety of information sources and services from any phone. However, current spoken dialogue systems are deficient in their strategies for preventing, identifying and repairing problems that arise in the conversation. This paper reports results on automatically training a Problematic Dialogue Predictor to predict problematic human-computer dialogues using a corpus of 4692 dialogues collected with the 'How May I Help You' (SM) spoken dialogue system. The Problematic Dialogue Predictor can be immediately applied to the system's decision of whether to transfer the call to a human customer care agent, or be used as a cue to the system's dialogue manager to modify its behavior to repair problems, and perhaps even to prevent them. We show that a Problematic Dialogue Predictor using automatically obtainable features from the first two exchanges in the dialogue can predict problematic dialogues 13.2% more accurately than the baseline.


Author(s):  
Pepi Stavropoulou ◽  
Dimitris Spiliotopoulos ◽  
Georgios Kouroupetroglou

Sophisticated, commercially deployed spoken dialogue systems capable of engaging in more natural human-machine conversation have increased in number over the past years. Besides employing advanced interpretation and dialogue management technologies, the success of such systems greatly depends on effective design and development methodology. There is, in fact, a widely acknowledged, fundamentally reciprocal relationship between the technologies used and the design choices made. In this line of thought, this chapter takes a more practical approach to spoken dialogue system development, comparing design methods and implementation tools highly suited for industry-oriented spoken dialogue systems, and commenting on their interdependencies, in order to facilitate the developer’s choice of the optimal tools and methodologies. The latter are presented and assessed in the light of AVA, a real-life Automated Voice Agent that performs call routing and customer service tasks, employing advanced stochastic techniques for interpretation and allowing for free-form user input and a less rigid dialogue structure.


2018 ◽  
Vol 2018 ◽  
pp. 1-10
Author(s):  
Regina Jucks ◽  
Gesa A. Linnemann ◽  
Benjamin Brummernhenrich

Communicating with spoken dialogue systems (SDS) such as Apple’s Siri® and Google’s Now is becoming more and more common. We report a study that manipulates an SDS’s word use with regard to politeness. In an experiment, 58 young adults evaluated the spoken messages of our self-developed SDS as it replied to typical questions posed by university freshmen. The answers were formulated either politely or rudely. Dependent measures included both holistic measures of how students perceived the SDS and detailed evaluations of each individual answer. Results show that participants not only evaluated the content of rude answers as being less appropriate and less pleasant than the polite answers, but also evaluated the rude system as less accurate. Lack of politeness also impacted aspects of the perceived trustworthiness of the SDS. We conclude that users of SDS expect such systems to be polite, and we then discuss some practical implications for designing SDS.


2016 ◽  
Vol 2016 ◽  
pp. 1-11
Author(s):  
David Griol ◽  
Zoraida Callejas

Spoken dialogue systems have been proposed to enable a more natural and intuitive interaction with the environment and human-computer interfaces. In this contribution, we present a framework based on neural networks that models the user’s intention during the dialogue and uses this prediction to dynamically adapt the dialogue model of the system, taking into consideration the user’s needs and preferences. We have evaluated our proposal by developing a user-adapted spoken dialogue system that provides tourist information and services, and we provide a detailed discussion of the positive influence of our proposal on the success of the interaction, the information and services provided, and the quality perceived by the users.


2000 ◽  
Vol 12 ◽  
pp. 387-416 ◽  
Author(s):  
M. A. Walker

This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), which supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders.
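To make the Q-learning component concrete, here is a minimal tabular sketch of the standard update rule. It is not the ELVIS implementation: the two-step toy dialogue, the strategy names "A" and "B", and the fixed reward values are hypothetical stand-ins for the states, strategy choices, and PARADISE-derived performance function the abstract mentions.

```python
import random


def q_learning(n_states, actions, step, episodes=2000,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(state, action)` returns (next_state, reward); next_state is None
    when the dialogue ends.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0  # every toy dialogue starts in state 0
        while s is not None:
            if rng.random() < epsilon:
                a = rng.choice(actions)                      # explore
            else:
                a = max(actions, key=lambda x: q[(s, x)])    # exploit
            s_next, r = step(s, a)
            best_next = 0.0 if s_next is None else max(q[(s_next, x)] for x in actions)
            # Move Q(s, a) towards the observed reward plus the discounted best future value.
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q


# Hypothetical toy dialogue: two consecutive strategy choices, then the call ends.
# The rewards are made-up numbers standing in for a learned performance function.
REWARD = {(0, "A"): 0.2, (0, "B"): 0.0, (1, "A"): 1.0, (1, "B"): 0.5}


def toy_step(state, action):
    next_state = 1 if state == 0 else None
    return next_state, REWARD[(state, action)]


ACTIONS = ["A", "B"]
q_table = q_learning(n_states=2, actions=ACTIONS, step=toy_step)
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(2)}
```

In this toy setting the learned greedy policy picks strategy "A" in both states, since it has the higher immediate and downstream reward. In a real system the reward would come from user interactions, not from a fixed table.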


2014 ◽  
Author(s):  
Ioannis Klasinas ◽  
Elias Iosif ◽  
Katerina Louka ◽  
Alexandros Potamianos
