Automated Speech Recognition
Recently Published Documents

TOTAL DOCUMENTS: 56 (five years: 12)
H-INDEX: 7 (five years: 2)

2021, Vol 4
Author(s): Zion Mengesha, Courtney Heldreth, Michal Lahav, Juliana Sublewski, Elyse Tuennerman

Automated speech recognition (ASR) converts spoken language into text and is used across a variety of applications to assist us in everyday life, from powering virtual assistants and natural language conversations to enabling dictation services. While recent work suggests that there are racial disparities in the performance of ASR systems for speakers of African American Vernacular English, little is known about the psychological and experiential effects of these failures. This paper provides a detailed examination of the behavioral and psychological consequences of ASR voice errors and the difficulty African American users have in getting their intents recognized. The results demonstrate that ASR failures have a detrimental impact on African American users. Specifically, African Americans feel othered when using technology powered by ASR—errors surface thoughts about identity, namely about race and geographic location—leaving them feeling that the technology was not made for them. As a result, African Americans accommodate their speech to have better success with the technology. We incorporate the insights and lessons learned from sociolinguistics in our suggestions for linguistically responsive ways to build more inclusive voice systems that consider African American users’ needs, attitudes, and speech patterns. Our findings suggest that the use of a diary study can enable researchers to best understand the experiences and needs of communities who are often misunderstood by ASR. We argue this methodological framework could enable researchers who are concerned with fairness in AI to better capture the needs of all speakers who are traditionally misheard by voice-activated, artificially intelligent (voice-AI) digital systems.


2021, Vol 45 (1)
Author(s): Jeannine Beeken

In this paper we address how Natural Language Processing (NLP) approaches and language technology can contribute to data services in different ways: from providing social science users with new approaches and tools to explore oral and textual data, to enhancing the search, findability, and retrieval of data sources. By using linguistic approaches we are able to process data, for example with Automated Speech Recognition (ASR) and named entity recognition (NER), extract key concepts and terms, and improve search strategies. We provide examples of how computational linguistics contributes to and facilitates the mining and analysis of oral or textual material, for example (transcribed) interviews or oral histories, and show how free open source (OS) tools can easily be used to gain a quick overview of the key features of a text, which can be further exploited as useful metadata.
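The kind of "quick overview of the key features of a text" described above can be sketched with nothing more than the Python standard library. This is a minimal illustration, not the authors' actual toolchain: it tokenises a transcript, drops a small hypothetical stopword list, and counts the most frequent remaining terms, which could then serve as candidate metadata keywords.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real pipelines use larger curated lists.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "with", "that", "this", "we", "be", "as", "by"}

def top_terms(text, n=5):
    """Return the n most frequent content words in a text.

    Lowercases, tokenises on alphabetic runs, filters stopwords and
    very short tokens, then counts term frequency.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens
                     if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(n)

transcript = "speech recognition converts speech to text; speech data helps"
print(top_terms(transcript, 2))
```

Frequency counts are of course only a first pass; TF-IDF weighting or an off-the-shelf NER model would give a sharper picture, but even this level of summary is often enough to enrich search and discovery metadata.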


2021, Vol 7, pp. 205520762110021
Author(s): Catherine Diaz-Asper, Chelsea Chandler, R Scott Turner, Brigid Reynolds, Brita Elvevåg

Objective: There is a critical need to develop rapid, inexpensive and easily accessible screening tools for mild cognitive impairment (MCI) and Alzheimer’s disease (AD). We report on the efficacy of collecting speech via the telephone to subsequently develop sensitive metrics that may be used as potential biomarkers by leveraging natural language processing methods.

Methods: Ninety-one older individuals who were cognitively unimpaired or diagnosed with MCI or AD participated from home in an audio-recorded telephone interview, which included a standard cognitive screening tool and the collection of speech samples. In this paper we address six questions of interest: (1) Will elderly people agree to participate in a recorded telephone interview? (2) Will they complete it? (3) Will they judge it an acceptable approach? (4) Will the speech that is collected over the telephone be of good quality? (5) Will the speech be intelligible to human raters? (6) Will transcriptions produced by automated speech recognition accurately reflect the speech produced?

Results: Participants readily agreed to participate in the telephone interview, completed it in its entirety, and rated the approach as acceptable. Good quality speech was produced for further analyses to be applied, and almost all recorded words were intelligible for human transcription. Not surprisingly, human transcription outperformed off-the-shelf automated speech recognition software, but further investigation into automated speech recognition shows promise for its usability in future work.

Conclusion: Our findings demonstrate that collecting speech samples from elderly individuals via the telephone is well tolerated, practical, and inexpensive, and produces good quality data for uses such as natural language processing.


2020, Vol 117 (14), pp. 7684-7689
Author(s): Allison Koenecke, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, ...

Automated speech recognition (ASR) systems, which use sophisticated machine-learning algorithms to convert spoken language to text, have become increasingly widespread, powering popular virtual assistants, facilitating automated closed captioning, and enabling digital dictation platforms for health care. Over the last several years, the quality of these systems has dramatically improved, due both to advances in deep learning and to the collection of large-scale datasets used to train the systems. There is concern, however, that these tools do not work equally well for all subgroups of the population. Here, we examine the ability of five state-of-the-art ASR systems—developed by Amazon, Apple, Google, IBM, and Microsoft—to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. In total, this corpus spans five US cities and consists of 19.8 h of audio matched on the age and gender of the speaker. We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers. We trace these disparities to the underlying acoustic models used by the ASR systems, as the race gap was equally large on a subset of identical phrases spoken by black and white individuals in our corpus. We conclude by proposing strategies—such as using more diverse training datasets that include African American Vernacular English—to reduce these performance differences and ensure speech recognition technology is inclusive.
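The word error rate (WER) cited above is the standard ASR accuracy metric: the minimum number of word substitutions, insertions, and deletions needed to turn the system's hypothesis into the reference transcript, divided by the reference length. A minimal sketch of the computation (not the study's own evaluation code), using word-level Levenshtein distance:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i            # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j            # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of a six-word reference gives WER = 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

On this scale, the reported averages mean roughly one word in three mistranscribed for black speakers (0.35) versus about one in five for white speakers (0.19).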


Author(s): Lauren Werner, Gaojian Huang, Brandon J. Pitts

The number of older adults is growing significantly worldwide. At the same time, technological developments are rapidly evolving, and older populations are expected to interact more frequently with such sophisticated systems. Automated speech recognition (ASR) is one example of a technology that is increasingly present in daily life. However, age-related physical changes may alter speech production and limit the effectiveness of ASR systems for older individuals. The goal of this paper was to summarize the current knowledge on ASR systems and older adults. The PRISMA method was employed and 17 studies were compared on the basis of word error rate (WER). Overall, WER was found to be influenced by age, gender, and the number of speech samples used to train ASR systems. This work has implications for the development of future human-machine technologies that will be used by a wide range of age groups.

