voice user Latest Research Papers

Many Internet of Things devices have voice user interfaces. One of the most popular voice user interfaces is Amazon’s Alexa, which supports more than 50,000 third-party applications (“skills”). We study how Alexa’s integration of these skills may confuse users. Our survey of 237 participants found that users do not understand that skills are often operated by third parties, that they often confuse third-party skills with native Alexa functions, and that they are unaware of the functions that the native Alexa system supports. Surprisingly, users who interact with Alexa more frequently are more likely to conclude that a third-party skill is a native Alexa function. The potential for misunderstanding creates new security and privacy risks: attackers can develop third-party skills that operate without users’ knowledge or masquerade as native Alexa functions. To mitigate this threat, we make design recommendations to help users better distinguish native functionality and third-party skills, including audio and visual indicators of native and third-party contexts, as well as a consistent design standard to help users learn what functions are and are not possible on Alexa.

Download Full-text

Speed Dating with Voice User Interfaces: Understanding How Families Interact and Perceive Voice User Interfaces in a Group Setting

Frontiers in Robotics and AI ◽

10.3389/frobt.2021.730992 ◽

2022 ◽

Vol 8 ◽

Author(s):

Anastasia K. Ostrowski ◽

Jenny Fu ◽

Vasiliki Zygouras ◽

Hae Won Park ◽

Cynthia Breazeal

Keyword(s):

User Interfaces ◽

Family Members ◽

Social Robot ◽

Emotional Engagement ◽

Family Interactions ◽

Group Setting ◽

Group Settings ◽

The Social ◽

Voice User ◽

Elicitation Studies

As voice-user interfaces (VUIs), such as smart speakers like Amazon Alexa or social robots like Jibo, enter multi-user environments like our homes, it is critical to understand how group members perceive and interact with these devices. VUIs engage socially with users, leveraging multi-modal cues including speech, graphics, expressive sounds, and movement. The combination of these cues can affect how users perceive and interact with these devices. Through a set of three elicitation studies, we explore family interactions (N = 34 families, 92 participants, ages 4–69) with three commercially available VUIs with varying levels of social embodiment. The motivation for these three studies began when researchers noticed that families interacted differently with three agents when familiarizing themselves with the agents and, therefore, we sought to further investigate this trend in three subsequent studies designed as a conceptional replication study. Each study included three activities to examine participants’ interactions with and perceptions of the three VUIS in each study, including an agent exploration activity, perceived personality activity, and user experience ranking activity. Consistent for each study, participants interacted significantly more with an agent with a higher degree of social embodiment, i.e., a social robot such as Jibo, and perceived the agent as more trustworthy, having higher emotional engagement, and having higher companionship. There were some nuances in interaction and perception with different brands and types of smart speakers, i.e., Google Home versus Amazon Echo, or Amazon Show versus Amazon Echo Spot between the studies. In the last study, a behavioral analysis was conducted to investigate interactions between family members and with the VUIs, revealing that participants interacted more with the social robot and interacted more with their family members around the interactions with the social robot. This paper explores these findings and elaborates upon how these findings can direct future VUI development for group settings, especially in familial settings.

Download Full-text

Voice Versus Keyboard and Mouse for Text Creation on Arabic User Interfaces

The International Arab Journal of Information Technology ◽

10.34028/iajit/19/1/15 ◽

2022 ◽

Author(s):

Khalid Majrashi

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

User Interfaces ◽

Voice Recognition ◽

User Interaction ◽

Learning Curves ◽

User Performance ◽

Voice User

Voice User Interfaces (VUIs) are increasingly popular owing to improvements in automatic speech recognition. However, the understanding of user interaction with VUIs, particularly Arabic VUIs, remains limited. Hence, this research compared user performance, learnability, and satisfaction when using voice and keyboard-and-mouse input modalities for text creation on Arabic user interfaces. A Voice-enabled Email Interface (VEI) and a Traditional Email Interface (TEI) were developed. Forty participants attempted pre-prepared and self-generated message creation tasks using voice on the VEI, and the keyboard-and-mouse modal on the TEI. The results showed that participants were faster (by 1.76 to 2.67 minutes) in pre-prepared message creation using voice than using the keyboard and mouse. Participants were also faster (by 1.72 to 2.49 minutes) in self-generated message creation using voice than using the keyboard and mouse. Although the learning curves were more efficient with the VEI, more participants were satisfied with the TEI. With the VEI, participants reported problems, such as misrecognitions and misspellings, but were satisfied about the visibility of possible executable commands and about the overall accuracy of voice recognition.

Download Full-text

Voice User Interface: Literature review, challenges and future directions

SYSTEM THEORY, CONTROL AND COMPUTING JOURNAL ◽

10.52846/stccj.2021.1.2.26 ◽

2021 ◽

Vol 1 (2) ◽

pp. 65-89

Author(s):

Francis Rakotomalala ◽

Hasindraibe Niriarijaona Randriatsarafara ◽

Aimé Richard Hajalalaina ◽

Ndaohialy Manda Vy Ravonimanantsoa

Keyword(s):

User Interfaces ◽

Human Interaction ◽

Component Level ◽

Natural User Interfaces ◽

Voice User ◽

Assistant Systems ◽

Level 3 ◽

New Type ◽

The Subject ◽

Future Work

Natural user interfaces are increasingly popular these days. One of the most common of these user interfaces today are voice-activated interfaces, in particular intelligent voice assistants such as Google Assistant, Alexa, Cortana and Siri. However, the results show that although there are many services available, there is still a lot to be done to improve the usability of these systems. Speech recognition, contextual understanding and human interaction are the issues that are not yet solved in this field. In this context, this research paper focuses on the state of the art and knowledge of work on intelligent voice interfaces, challenges and issues related to this field, in particular on interaction quality, usability, security and usability. As such, the study also examines voice assistant architecture components following the expansion of the use of technologies such as wearable computing in order to improve the user experience. Moreover, the presentation of new emerging technologies in this field will be the subject of a section in this work. The main contributions of this paper are therefore: (1) overview of existing research, (2) analysis and exploration of the field of intelligent voice assistant systems, with details at the component level, (3) identification of areas that require further research and development, with the aim of increasing its use, (4) various proposals for research directions and orientations for future work, and finally, (5) study of the feasibility of designing a new type of voice assistant and general presentation of the latter, whose realisation will be the subject of a thesis.

Download Full-text

Examining Narrative Sonification: Using First-Person Retrospection Methods to Translate Radio Production to Interaction Design

ACM Transactions on Computer-Human Interaction ◽

10.1145/3461762 ◽

2021 ◽

Vol 28 (6) ◽

pp. 1-34

Author(s):

Jordan Wirfs-Brock ◽

Alli Fam ◽

Laura Devendorf ◽

Brian Keegan

Keyword(s):

User Interfaces ◽

Interaction Design ◽

Sound Studies ◽

First Person ◽

Auditory Display ◽

Design Data ◽

Retrospective Design ◽

Voice User ◽

Rapid Pace

We present a first-person, retrospective exploration of two radio sonification pieces that employ narrative scaffolding to teach audiences how to listen to data. To decelerate and articulate design processes that occurred at the rapid pace of radio production, the sound designer and producer wrote retrospective design accounts. We then revisited the radio pieces through principles drawn from guidance design, data storytelling, visualization literacy, and sound studies. Finally, we speculated how these principles might be applied through interactive, voice-based technologies. First-person methods enabled us to access the implicit knowledge embedded in radio production and translate it to technologies of interest to the human–computer-interaction community, such as voice user interfaces that rely on auditory display. Traditionally, sonification practitioners have focused more on generating sounds than on teaching people how to listen; our process, however, treated sound and narrative as a holistic, sonic-narrative experience. Our first-person retrospection illuminated the role of narrative in designing to support people as they learn to listen to data.

Download Full-text

Using Different Error Handling Strategies to Facilitate Older Users’ Interaction With Chatbots in Learning Information and Communication Technologies

Frontiers in Psychology ◽

10.3389/fpsyg.2021.785815 ◽

2021 ◽

Vol 12 ◽

Author(s):

Weijane Lin ◽

Hong-Chun Chen ◽

Hsiu-Ping Yueh

Keyword(s):

Information And Communication Technologies ◽

User Interfaces ◽

Communication Technologies ◽

Literacy Education ◽

Error Handling ◽

Older Users ◽

Voice User ◽

Information And Communication ◽

The Voice ◽

User Experiment

To support older users’ accessibility and learning of the prevalent information and communication technologies (ICTs), libraries, as informal learning institutes, are committed to information literacy education activities with friendly interfaces. Chatbots using Voice User Interfaces (VUIs) with natural and intuitive interactions have received growing research and practical attention; however, older users report regular frustrations and problems in using them. To serve as a basis for the subsequent design and development of an automated dialog mechanism in senior-friendly chatbots, a between-subject user experiment was conducted with 30 older adults divided into three groups. The preliminary findings on their interactions with the voice chatbots designed with different error handling strategies were reported. Participants’ behavioral patterns, performances, and the tactics they employed in interacting with the three types of chatbots were analyzed. The results of the study showed that the use of multiple error handling strategies is beneficial for older users to achieve effectiveness and satisfaction in human-robot interactions, and facilitate their attitude toward information technology. This study contributes empirical evidence in the genuine and pragmatic field of gerontechnology and expands upon voice chatbots research by exploring conversation errors in human-robot interactions that could be of further application in designing educational and living gerontechnology.

Download Full-text