An Ergonomic Framework for Researching and Designing Speech Recognition Technologies in Health Care with an Emphasis on Safety

Author(s):  
Tim Arnold ◽  
Helen J. A. Fuller

Automatic speech recognition (ASR) systems and speech interfaces are becoming increasingly prevalent, including in health care, where their use in supporting work is growing and expanding. Computer-based speech processing has been studied and developed extensively over decades, and speech processing tools have been fine-tuned through the work of speech and language researchers. Researchers have described, and continue to describe, speech processing errors in medicine. This paper proposes an ergonomic framework for speech recognition that expands and further describes this view of speech processing in supporting clinical work. To this end, we hope to build on previous work and emphasize the need for increased human factors involvement in this area, while also facilitating the discussion of speech recognition in contexts that have been explored in the human factors domain. Human factors expertise can contribute by proactively describing and designing these critical, interconnected socio-technical systems with error tolerance in mind.

2017 ◽  
Vol 60 (9) ◽  
pp. 2394-2405 ◽  
Author(s):  
Lionel Fontan ◽  
Isabelle Ferrané ◽  
Jérôme Farinas ◽  
Julien Pinquier ◽  
Julien Tardieu ◽  
...  

Purpose: The purpose of this article is to assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an automatic speech recognition (ASR) system. The long-term goal of this research is to develop a system that will assist audiologists/hearing-aid dispensers in the fine-tuning of hearing aids. Method: Sixty young participants with normal hearing listened to speech materials mimicking the perceptual consequences of ARHL at different levels of severity. Two intelligibility tests (repetition of words and sentences) and 1 comprehension test (responding to oral commands by moving virtual objects) were administered. Several language models were developed and used by the ASR system in order to fit human performances. Results: Strong significant positive correlations were observed between human and ASR scores, with coefficients up to .99. However, the spectral smearing used to simulate losses in frequency selectivity caused larger declines in ASR performance than in human performance. Conclusion: Both intelligibility and comprehension scores for listeners with simulated ARHL are highly correlated with the performances of an ASR-based system. In the future, it needs to be determined if the ASR system is similarly successful in predicting speech processing in noise and by older people with ARHL.
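
As an illustration of how agreement between human and ASR scores can be quantified, the following is a minimal sketch of a Pearson correlation coefficient. The abstract does not state which correlation statistic was used, so Pearson's r is an assumption here, and the function name `pearson_r` and the score lists are hypothetical illustrations, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical percent-correct scores, one pair per simulated severity level.
# These numbers are invented for illustration only.
human_scores = [98.0, 91.0, 74.0, 52.0, 30.0]
asr_scores = [95.0, 85.0, 60.0, 35.0, 12.0]

r = pearson_r(human_scores, asr_scores)
print(f"r = {r:.3f}")
```

A high coefficient of this kind shows only that the two sets of scores rise and fall together; it does not show that the absolute declines are the same size, which is consistent with the abstract's note that spectral smearing degraded ASR performance more than human performance.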


2009 ◽  
pp. 128-148
Author(s):  
Eric Petajan

Automatic Speech Recognition (ASR) is the most natural input modality from humans to machines. When the hands are busy or a full keyboard is not available, speech input is especially in demand. Since the most compelling application scenarios for ASR include noisy environments (mobile phones, public kiosks, cars), visual speech processing must be incorporated to provide robust performance. This chapter motivates and describes the MPEG-4 Face and Body Animation (FBA) standard for representing visual speech data as part of a whole virtual human specification. The super low bit-rate FBA codec included with the standard enables thin clients to access processing and communication services over any network including enhanced visual communication, animated entertainment, man-machine dialog, and audio/visual speech recognition.


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 493-493
Author(s):  
Nancy Hodgson ◽  
Ani Nencova ◽  
Laura Gitlin ◽  
Emily Summerhayes

Careful fidelity monitoring is critical to implementing evidence-based interventions in dementia care settings to ensure that the intervention is delivered consistently and as intended. Most approaches to fidelity monitoring rely on human coding of the content covered during a session or of stylistic aspects of the intervention, including rapport, empathy, and enthusiasm, and are unrealistic to implement on a large scale in real-world settings. Technological advances in automatic speech recognition and in language and speech processing offer potential solutions to overcome these barriers. We compare three commercial automatic speech recognition tools on spoken content drawn from dementia care interactions to determine the accuracy of recognition and the guarantees for privacy offered by each provider. Data were obtained from recorded sessions of the Dementia Behavior Study intervention trial (NCT01892579). We find that despite their impressive performance in general applications, automatic speech recognition systems work less well for older adults and people of color. We outline a plan for automating fidelity in interaction style and content, which would be integrated into an online program for training dementia care providers.
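
Recognition accuracy for ASR tools is commonly summarized as word error rate (WER). The abstract does not say which metric was used, so the following is only a minimal sketch of one plausible measure. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair, invented for illustration only.
reference = "please take your medication now"
hypothesis = "please take your medication"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.2f}")
```

Computing such a rate separately for each speaker group is one way a gap in performance between groups could be made visible.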


2017 ◽  
Author(s):  
Thomas Schatz ◽  
Francis Bach ◽  
Emmanuel Dupoux

We test the potential of standard Automatic Speech Recognition (ASR) systems trained on large corpora of continuous speech as quantitative models of human speech processing. In human adults, speech perception is attuned to efficiently process native speech sounds, at the expense of difficulties in processing non-native sounds. We use ABX-discriminability measures to test whether ASR models can account for the patterns of confusion between speech sounds observed in humans. We show that ASR models reproduce some well-documented effects in non-native phonetic perception. Beyond the immediate results, our methodology opens up the possibility of a more systematic investigation of phonetic category perception in humans.
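
The idea behind an ABX-discriminability measure can be sketched as follows: given items A and X from one sound category and B from another, the representation discriminates the categories when X lies closer to A than to B. This is a simplified illustration, not the authors' procedure. It assumes each speech sound is already reduced to a fixed-length feature vector and uses Euclidean distance. The function name `abx_score` and the toy vectors are hypothetical.

```python
from itertools import permutations
from math import dist


def abx_score(cat_a, cat_b):
    """Fraction of (a, b, x) triples in which x is closer to a than to b.

    a and x are distinct items from cat_a, and b is an item from cat_b.
    Ties count as half. A score of 1.0 means perfect discrimination and
    0.5 is chance level.
    """
    total = 0.0
    count = 0
    for a, x in permutations(cat_a, 2):
        for b in cat_b:
            d_ax = dist(a, x)
            d_bx = dist(b, x)
            if d_ax < d_bx:
                total += 1.0
            elif d_ax == d_bx:
                total += 0.5
            count += 1
    if count == 0:
        raise ValueError("need at least 2 items in cat_a and 1 in cat_b")
    return total / count


# Toy 2-D feature vectors for two sound categories, invented for illustration.
category_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
category_b = [(1.0, 1.0), (1.1, 0.9)]
score = abx_score(category_a, category_b)
print(f"ABX score = {score:.2f}")
```

The sketch scores one direction only (X drawn from the first category). Averaging `abx_score(cat_a, cat_b)` and `abx_score(cat_b, cat_a)` would give a symmetric variant.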


Author(s):  
Kulwinder Singh ◽  
Vishal Goyal ◽  
Parshant Rana

Reading is an essential skill for literacy development in children, but it is a challenge for children with dyslexia because of phonological-core deficits. Poor reading skills affect vocabulary development and exposure to relevant background knowledge. Dyslexia affects the ability to interpret what one sees and hears, or the ability to link information from different parts of the brain. Dyslexic children face many challenges in their educational life due to reading difficulty. Support for dyslexic children includes computer-based applications and multi-sensory methods such as text-to-speech and character animation techniques. Some applications provide immediate reading intervention. Automatic speech recognition (ASR) is a new platform that offers immediate intervention to help dyslexic children improve their reading ability. The findings contribute to developing a suitable approach to correcting the reading mistakes of dyslexic children. Speech recognition technology provides the most interactive environment between human and machine.

