Designing Visual Interfaces to Support Voice Input

Author(s):  
Rita Santos ◽  
Joana Beja ◽  
Mário Rodrigues ◽  
Ciro Martins
2008 ◽  
Vol 66 (5) ◽  
pp. 318-332 ◽  
Author(s):  
Jaka Sodnik ◽  
Christina Dicke ◽  
Sašo Tomažič ◽  
Mark Billinghurst

Author(s):  
Siegfried Handschuh ◽  
Tom Heath ◽  
VinhTuan Thai ◽  
Ian Dickinson ◽  
Lora Aroyo ◽  
...  

1985 ◽  
Vol 29 (4) ◽  
pp. 367-371
Author(s):  
Christopher G. Koch

Expert system applications for special environments impose special requirements on the user-system interface. A study was conducted to determine requirements and define a design concept for the interface of an expert system being developed to support corrective maintenance and troubleshooting of gas turbine electronic equipment and controls. The resulting design specifies a portable unit containing a color flat-panel video/graphics display, a special-function membrane keypad, a miniature printer, and a headset with voice input/output. Communication with the expert system is structured by multiple-window information presentation and voice-activated control functions.


1999 ◽  
Vol 14 (2) ◽  
pp. 175-179
Author(s):  
TIZIANA CATARCI ◽  
GIUSEPPE SANTUCCI ◽  
LAURA TARANTINO

Author(s):  
Jiahao Chen ◽  
Ryota Nishimura ◽  
Norihide Kitaoka

Many end-to-end, large-vocabulary, continuous speech recognition systems are now able to achieve better speech recognition performance than conventional systems. However, most of these approaches are based on bidirectional networks and sequence-to-sequence modeling, so automatic speech recognition (ASR) systems using such techniques must wait for an entire segment of voice input to be entered before they can begin processing the data. This results in a lengthy time lag, which can be a serious drawback in some applications. An obvious solution to this problem is to develop a speech recognition algorithm capable of processing streaming data. Therefore, in this paper we explore the possibility of a streaming, online ASR system for Japanese using a model based on unidirectional LSTMs trained using connectionist temporal classification (CTC) criteria, with local attention. Such an approach has not been well investigated for use with Japanese, as most Japanese-language ASR systems employ bidirectional networks. The best result for our proposed system during experimental evaluation was a character error rate of 9.87%.
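
The abstract above reports its result as a character error rate (CER). As a rough illustration of the metric only, and not the authors' evaluation code, CER is commonly computed as the character-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The function names and the sample strings below are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: two of five reference characters are substituted.
print(character_error_rate("こんにちは", "こんばんは"))
```

Because the measure is per character rather than per word, it suits languages such as Japanese, where text is not delimited by spaces.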


2017 ◽  
Vol 7 (1.3) ◽  
pp. 121
Author(s):  
Sreeja B P ◽  
Amrutha K G ◽  
Jeni Benedicta J ◽  
Kalaiselvi V ◽  
Ranjani R

Geometric modeling software typically relies on the conventional interactive mode. This paper describes a voice-assisted geometric modeling mechanism that uses speech recognition technology to improve modeling performance. After receiving a voice command, the system uses the speech recognition engine to identify it; the identified command is then parsed and processed to generate a geometric design based on the dimensions given in the user's voice input. The resulting system is capable of generating geometric designs for the user via speech recognition. This work also focuses on collecting feedback from users and customizing the model based on that feedback.


2018 ◽  
Vol 38 (2) ◽  
pp. 207-224 ◽  
Author(s):  
Melanie Revilla ◽  
Mick P. Couper ◽  
Oriol J. Bosch ◽  
Marc Asensio

We implemented an experiment within a smartphone web survey to explore the feasibility of using voice input (VI) options. Based on the device used, participants were randomly assigned to a treatment or control group. Respondents in the iPhone operating system (iOS) treatment group were asked to use the dictation button, with which the voice was translated automatically into text by the device. Respondents with Android devices were asked to use a VI button, which recorded the voice and transmitted the audio file. Both control groups were asked to answer open-ended questions using standard text entry. We found that the use of VI still presents a number of challenges for respondents. Voice recording (Android) led to substantially higher nonresponse, whereas dictation (iOS) led to slightly higher nonresponse, relative to text input. However, completion time was significantly reduced using VI. Among those who provided an answer, when dictation was used, we found fewer valid answers and less information provided, whereas for voice recording, longer and more elaborate answers were obtained. Voice recording (Android) led to significantly lower survey evaluations, but dictation (iOS) did not.


1982 ◽  
Vol 26 (3) ◽  
pp. 218-222
Author(s):  
Ken Funk ◽  
Ed McDowell

Lately, quite a lot of effort has been put into the development of voice input and output systems for human-machine communication. In this paper, we point out that while voice I/O is ideal for many applications, there are others for which it is ill-suited and there is a danger that fascination with the technology may well result in its misuse.

