Embodied Conversation

Author(s):  
Andrej Zgank ◽  
Izidor Mlakar ◽  
Uros Berglez ◽  
Danilo Zimsek ◽  
Matej Borko ◽  
...  

The chapter presents an overview of human-computer interfaces, which are a crucial element of ambient intelligence solutions. The focus is on embodied conversational agents, which are needed to communicate with users in the most natural way. Different input and output modalities, together with supporting methods for processing the captured information (e.g., automatic speech recognition, gesture recognition, natural language processing, dialog processing, text-to-speech synthesis), play a crucial role in providing a high quality of experience to the user. As an example, the use of an embodied conversational agent in the e-Health domain is proposed.
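The processing chain described above (input modalities feeding natural language understanding and a dialog manager, whose reply would be rendered through speech synthesis and embodied behaviour) can be sketched minimally as follows. All function names, inputs, and responses here are hypothetical placeholders for illustration, not the chapter's actual system:

```python
def speech_to_text(audio):
    # Placeholder for automatic speech recognition.
    return audio.get("transcript", "")

def recognize_gesture(video):
    # Placeholder for gesture recognition from the camera stream.
    return video.get("gesture", "none")

def dialog_manager(utterance, gesture):
    # Fuse the two input modalities and pick a response (rule-based stub).
    if "appointment" in utterance:
        return "Your appointment is confirmed."
    if gesture == "wave":
        return "Hello! How can I help you?"
    return "Could you repeat that, please?"

def respond(user_audio, user_video):
    text = speech_to_text(user_audio)
    gesture = recognize_gesture(user_video)
    reply = dialog_manager(text, gesture)
    # A real agent would route `reply` through text-to-speech synthesis
    # and synchronized embodied output (gaze, facial expression, gestures).
    return reply
```

In a deployed e-Health agent, each stub would be replaced by a trained component, but the fusion point in `dialog_manager` is where multimodal input comes together.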

2002 ◽  
Vol 1 (1) ◽  
pp. 35-61 ◽  
Author(s):  
Marcelo Dascal

Ever since Descartes singled out the ability to use natural language appropriately in any given circumstance as the proof that humans — unlike animals and machines — have minds, an idea that Turing transformed into his well-known test to determine whether machines have intelligence, the close connection between language and cognition has been widely acknowledged, although it was accounted for in quite different ways. Recent advances in natural language processing, as well as attempts to create “embodied conversational agents” which couple language processing with that of its natural bodily correlates (gestures, facial expression and gaze direction), in the hope of developing human-computer interfaces based on natural — rather than formal — language, have again brought to the fore the question of how far we can hope machines to be able to master the cognitive abilities required for language use. In this paper, I approach this issue from a different angle, inquiring whether language can be viewed as a “cognitive technology”, employed by humans as a tool for the performance of certain cognitive tasks. I propose a definition of “cognitive technology” that encompasses both external (or “prosthetic”) and internal cognitive devices. A number of parameters in terms of which a typology of cognitive technologies of both kinds can be sketched is also set forth. It is then argued that inquiring about language’s role in cognition allows us to re-frame the traditional debate about the relationship between language and thought, by examining how specific aspects of language actually influence cognition — as an environment, a resource, or a tool. 
This perspective helps bring together the contributions of the philosophical “linguistic turn” in epistemology and the incipient “epistemology of cognitive technology”. It also permits a more precise and fruitful discussion of whether, to what extent, and which of the language-based cognitive technologies we naturally use can be emulated by the kinds of technologies available at present or in the foreseeable future.


Author(s):  
Vladimir Ortega-González ◽  
Samir Garbaya ◽  
Frédéric Merienne

In this paper we briefly describe an approach to understanding the psychoacoustic and perceptual effects of what we have identified as the high-level spatial properties of 3D audio. The necessity of this study is first presented within the context of interactive applications such as virtual reality and human-computer interfaces. As a result of the bibliographic research in the field, we identified the main potential functions of 3D audio spatial stimulation in interactive applications beyond traditional sound spatialization. In the same sense, a classification of the high-level aspects involved in spatial audio stimulation is proposed and explained. Next, the case study, the experimental methodology, and the framework are described. Finally, we present the expected results as well as their usefulness within the context of a larger project.


Author(s):  
Jagadish S Kallimani ◽  
V. K Ananthashayana ◽  
Debjani Goswami

Text-to-speech synthesis is a complex combination of language processing, signal processing, and computer science. Ubiquitous computing (ubicomp) is a post-desktop model of human-computer interaction in which information processing has been thoroughly integrated into everyday objects and activities. Speech synthesis is the generation of synthesized speech from text. This chapter deals with the development of a Text-to-Speech (TTS) synthesis system for an Indian regional language, with Bengali as the chosen language. It highlights various methods that may be used for speech synthesis and also provides an overview of the problems and difficulties in Bengali text-to-speech conversion. Variations in the prosody of the speech (parameters such as volume, pitch, intonation, and amplitude) yield the emotional aspects (anger, happiness, neutral), which are applied to our developed TTS system.
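The idea of varying prosody parameters to convey emotion can be illustrated with a small sketch. The scaling factors below are invented for illustration only (the chapter does not publish its parameter values); the point is that each emotion maps to relative adjustments of volume, pitch, and speaking rate over the neutral baseline:

```python
# Hypothetical emotion-to-prosody profiles, relative to neutral speech.
PROSODY_PROFILES = {
    "neutral": {"volume": 1.0, "pitch": 1.0, "rate": 1.0},
    "happy":   {"volume": 1.1, "pitch": 1.2, "rate": 1.1},
    "angry":   {"volume": 1.3, "pitch": 1.1, "rate": 1.2},
}

def apply_emotion(base_pitch_hz, emotion):
    """Scale a neutral pitch value for the requested emotion.

    Unknown emotions fall back to the neutral profile.
    """
    profile = PROSODY_PROFILES.get(emotion, PROSODY_PROFILES["neutral"])
    return base_pitch_hz * profile["pitch"]
```

A full TTS system would apply analogous scaling to the volume and rate parameters across the whole utterance contour, not just to a single pitch value.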


Author(s):  
G. Chroust

Information systems are designed for the people, by the people. The design of software systems with the help of software systems is another aspect of human-computer interfaces. New methods and their (non-)acceptance play an important role. Motivational factors of systems developers considerably influence the type and quality of the systems they develop (Arbaoui, Lonchamp & Montangero, 1999; Kumar & Bjoern-Andersen, 1990). To some extent, the quality of systems is a result of their developers’ willingness to accept new and (supposedly) better technology (Jones, 1995). A typical example is component-based development methodology (Bachmann et al., 2000; Cheesman & Daniels, 2001). Despite considerable publication effort and public lip service, component-based software development (CBD) appears to be getting a slower start than anticipated and hoped for. One key reason stems from the psychological and motivational attitudes of software developers (Campell, 2001; Lynex & Layzell, 1997). We therefore analyze the attitudes that potentially hamper the adoption of the component-based software development approach. Maslow’s Hierarchy of Needs (Boeree, 1998; Maslow, 1943) is used for structuring the motives.


2010 ◽  
Vol 2010 ◽  
pp. 1-5 ◽  
Author(s):  
A. B. Usakli ◽  
S. Gurkan ◽  
F. Aloise ◽  
G. Vecchiato ◽  
F. Babiloni

The aim of this study is to present an electrooculogram (EOG) based system that can be used efficiently as a human-computer interface. Establishing an efficient alternative channel for communication without overt speech and hand movements is important to increase the quality of life for patients suffering from Amyotrophic Lateral Sclerosis or other illnesses that prevent correct limb and facial muscular responses. We conducted several experiments to compare the P300-based BCI speller and the new EOG-based system. A five-letter word can be written on average in 25 seconds with the new system, versus 105 seconds with the EEG-based device. Giving a message such as “clean-up” could be performed in 3 seconds with the new system. The new system is more efficient than the P300-based BCI system in terms of accuracy, speed, applicability, and cost efficiency. Using EOG signals, it is possible to improve the communication abilities of those patients who can move their eyes.
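The quoted timings imply a substantial throughput difference, which a quick back-of-the-envelope calculation makes concrete (using only the figures stated above, i.e. a five-letter word in 25 s versus 105 s):

```python
def chars_per_minute(n_chars, seconds):
    # Simple spelling throughput: characters written per minute.
    return n_chars * 60.0 / seconds

eog_rate = chars_per_minute(5, 25)    # EOG-based system: 12 characters/minute
p300_rate = chars_per_minute(5, 105)  # P300 speller: about 2.9 characters/minute
speedup = 105 / 25                    # the EOG system is 4.2x faster on this task
```

This comparison ignores error correction and inter-trial pauses, so it should be read as an upper bound on sustained spelling rate for both systems.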


2015 ◽  
Vol 11 (4) ◽  
Author(s):  
Grzegorz M. Wójcik ◽  
Piotr Wierzgała ◽  
Anna Gajos

Electroencephalography (EEG) has become more popular, and as a result, the market is growing with new EEG products. The new EEG solutions offer higher mobility, easier application, and lower price. One such device that recently became popular is the Emotiv EEG. It has already been tested in various applications concerning brain-computer interfaces, neuromarketing, language processing, and detection of the P300 component, with the general result that it is capable of recording satisfactory research data. However, no one has tested and described its usefulness in long-term research. This article presents experience from using the Emotiv EEG in two research projects that involved 39 subjects over 22 sessions. The Emotiv EEG has significant technical issues concerning the quality of its screw threads. Two complete and successful solutions to this problem are described.


Author(s):  
Roger K. Moore

The past twenty-five years have witnessed a steady improvement in the capabilities of spoken language technology, first in the research laboratory and more recently in the commercial marketplace. Progress has reached a point where automatic speech recognition software for dictating documents onto a computer is available as an inexpensive consumer product in most computer stores, text-to-speech synthesis can be heard in public places giving automated voice announcements, and interactive voice response is becoming a familiar option for people paying bills or booking cinema tickets over the telephone. This article looks at the main computational approaches employed in contemporary spoken language processing. It discusses acoustic modelling, language modelling, pronunciation modelling, and noise modelling. The article also considers future prospects in the context of the obvious shortcomings of current technology, and briefly addresses the potential for achieving a unified approach to human and machine spoken language processing.
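One of the components named above, language modelling, can be illustrated with a minimal sketch: a bigram model with add-one (Laplace) smoothing over a toy corpus. The corpus and the smoothing choice are illustrative assumptions, not the article's method; contemporary recognizers use far larger n-gram or neural models built on the same principle of estimating P(word | history):

```python
from collections import Counter

def train_bigram(sentences):
    """Train an add-one smoothed bigram model; returns a probability function."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words          # sentence-start marker
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = len(unigrams)
    def prob(prev, word):
        # P(word | prev) with add-one smoothing to avoid zero probabilities.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return prob

prob = train_bigram([["recognize", "speech"], ["recognize", "text"]])
# "speech" follows "recognize" in 1 of 2 observed cases; smoothing pulls the
# estimate toward the uniform distribution, giving 2/6 = 1/3 here.
```

A recognizer combines such language-model scores with acoustic-model scores to rank candidate word sequences, which is why improvements in either model directly improve transcription accuracy.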

