Is human-human spoken interaction manageable? The emergence of the concept: ‘Conversation Intelligence’

2018 ◽  
Vol 6 (1) ◽  
pp. 1-14 ◽  
Author(s):  
Vered Silber-Varod

After years of steady improvement in Automatic Speech Recognition technology, and mediated by audio mining technology and conversational user interfaces, Conversation Intelligence is emerging as a concept significant to the understanding of human-human communication in its most natural and primitive channel: our voice. This paper introduces the concept of Conversation Intelligence (CI), which is becoming crucial to the study of human-human speech interaction and communication management, and which forms part of the field of speech analytics. CI is demonstrated on two established discourse terms: power relations and convergence. Finally, the paper highlights the importance of visualization for large-scale speech analytics.
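The abstract does not define how power relations would be quantified; in speech analytics, a speaker's share of total talk time is one common, crude proxy. The minimal Python sketch below (a hypothetical `talk_time_shares` helper, assuming diarized segments given as (speaker, start, end) tuples in seconds) illustrates the kind of quantity such a CI pipeline might compute:

```python
from collections import defaultdict

def talk_time_shares(segments):
    """Aggregate per-speaker talk time from diarized segments.

    Assumes `segments` is an iterable of (speaker, start, end) tuples
    with times in seconds; returns each speaker's share of total talk time.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    grand_total = sum(totals.values()) or 1.0  # avoid division by zero
    return {spk: t / grand_total for spk, t in totals.items()}

# Example: speaker A holds the floor four times as long as B.
segments = [("A", 0.0, 10.0), ("B", 10.0, 15.0), ("A", 15.0, 25.0)]
print(talk_time_shares(segments))  # {'A': 0.8, 'B': 0.2}
```

A convergence measure would similarly compare acoustic or lexical features across speakers over the course of the conversation.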

2020 ◽  
Author(s):  
Ryo Masumura ◽  
Naoki Makishima ◽  
Mana Ihori ◽  
Akihiko Takashima ◽  
Tomohiro Tanaka ◽  
...  

2019 ◽  
Vol 15 (7) ◽  
pp. P162-P163
Author(s):  
Francesca K. Cormack ◽  
Nick Taptiklis ◽  
Jennifer H. Barnett ◽  
Merina Su

2019 ◽  
Vol 15 ◽  
pp. P897-P897
Author(s):  
Francesca K. Cormack ◽  
Nick Taptiklis ◽  
Jennifer H. Barnett ◽  
Merina Su

Author(s):  
Khalid Majrashi

Voice User Interfaces (VUIs) are increasingly popular owing to improvements in automatic speech recognition. However, the understanding of user interaction with VUIs, particularly Arabic VUIs, remains limited. Hence, this research compared user performance, learnability, and satisfaction when using voice versus keyboard-and-mouse input for text creation on Arabic user interfaces. A Voice-enabled Email Interface (VEI) and a Traditional Email Interface (TEI) were developed. Forty participants attempted pre-prepared and self-generated message-creation tasks using voice on the VEI and the keyboard and mouse on the TEI. The results showed that participants were faster (by 1.76 to 2.67 minutes) in pre-prepared message creation using voice than using the keyboard and mouse, and also faster (by 1.72 to 2.49 minutes) in self-generated message creation. Although the learning curves were more efficient with the VEI, more participants were satisfied with the TEI. With the VEI, participants reported problems such as misrecognitions and misspellings, but were satisfied with the visibility of possible executable commands and with the overall accuracy of voice recognition.
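For readers unfamiliar with how such modality comparisons are computed, the sketch below shows the arithmetic behind a mean-difference figure. The timing values are illustrative placeholders only, not data from the study:

```python
from statistics import mean

# Illustrative per-participant task completion times in minutes
# (placeholder values; the study's raw data are not reproduced here).
voice_times = [3.1, 2.8, 3.5, 2.9]
keyboard_times = [5.2, 4.9, 5.8, 5.4]

# Mean difference between modalities, the kind of figure behind
# "faster by 1.76 to 2.67 minutes".
diff = mean(keyboard_times) - mean(voice_times)
print(f"voice faster by {diff:.2f} minutes on average")  # 2.25
```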


2020 ◽  
Vol 10 (19) ◽  
pp. 6936 ◽  
Author(s):  
Jeong-Uk Bang ◽  
Seung Yun ◽  
Seung-Hi Kim ◽  
Mu-Yeol Choi ◽  
Min-Kyu Lee ◽  
...  

This paper introduces KsponSpeech, a large-scale spontaneous speech corpus of Korean. The corpus contains 969 h of general open-domain dialog utterances, spoken by about 2000 native Korean speakers in a clean environment. All data were constructed by recording two people freely conversing on a variety of topics and manually transcribing the utterances. The transcription is dual, consisting of orthography and pronunciation, with disfluency tags for spontaneous speech phenomena such as filler words, repeated words, and word fragments. This paper also presents the baseline performance of an end-to-end speech recognition model trained with KsponSpeech. In addition, we investigated the performance of standard end-to-end architectures and the number of sub-word units suitable for Korean, as well as issues that should be considered in Korean spontaneous speech recognition. KsponSpeech is publicly available on an open data hub site of the Korean government.
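The exact transcription markup is defined in the corpus documentation; the sketch below assumes a simplified convention in which a dual token is written as (orthography)/(pronunciation) and disfluency tags are single letters followed by a slash, and extracts a plain orthographic string for ASR training:

```python
import re

# Assumed, simplified markup: dual tokens as "(orthography)/(pronunciation)"
# and disfluency tags as single letters followed by "/"; the real
# KsponSpeech conventions are specified in the corpus documentation.
DUAL = re.compile(r"\(([^)]*)\)/\(([^)]*)\)")
TAG = re.compile(r"\b[a-z]/\s*")

def to_orthographic(line: str) -> str:
    """Keep the orthographic side of dual tokens and drop disfluency tags."""
    line = DUAL.sub(r"\1", line)   # "(0.1%)/(영 점 일 프로)" -> "0.1%"
    line = TAG.sub("", line)       # strip single-letter tags like "b/"
    return " ".join(line.split())

print(to_orthographic("(0.1%)/(영 점 일 프로) 정도 올랐다 b/"))
# -> "0.1% 정도 올랐다"
```

Keeping the pronunciation side instead (group 2 of the same pattern) would yield the phonetic transcription.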


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 493-493
Author(s):  
Nancy Hodgson ◽  
Ani Nencova ◽  
Laura Gitlin ◽  
Emily Summerhayes

Careful fidelity monitoring is critical when implementing evidence-based interventions in dementia care settings, to ensure that the intervention is delivered consistently and as intended. Most approaches to fidelity monitoring rely on human coding of the content covered during a session or of stylistic aspects of the intervention, including rapport, empathy, and enthusiasm, and are unrealistic to implement on a large scale in real-world settings. Technological advances in automatic speech recognition and language and speech processing offer potential solutions to overcome these barriers. We compare three commercial automatic speech recognition tools on spoken content drawn from dementia care interactions, to determine the accuracy of recognition and the privacy guarantees offered by each provider. Data were obtained from recorded sessions of the Dementia Behavior Study intervention trial (NCT01892579). We find that, despite their impressive performance in general applications, automatic speech recognition systems work less well for older adults and people of color. We outline a plan for automating fidelity monitoring of interaction style and content, to be integrated into an online program for training dementia care providers.
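The abstract does not name its accuracy metric; word error rate (WER) is the standard measure for comparing ASR systems, computed as the word-level Levenshtein distance between a reference transcript and the ASR hypothesis, normalized by the reference length. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / max(len(ref), 1)

print(wer("take the medication after lunch", "take medication after launch"))
# 2 errors (one deletion, one substitution) over 5 reference words -> 0.4
```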


2019 ◽  
Vol 34 (4) ◽  
pp. 335-348
Author(s):  
Do Quoc Truong ◽  
Pham Ngoc Phuong ◽  
Tran Hoang Tung ◽  
Luong Chi Mai

Automatic Speech Recognition (ASR) systems convert human speech into the corresponding transcription automatically. They have a wide range of applications, such as controlling robots, call center analytics, and voice chatbots. Recent studies on ASR for English have achieved performance that surpasses human ability, with systems trained on large amounts of data and performing well in many environments. With regard to Vietnamese, there have been many studies on improving the performance of existing ASR systems; however, many of them were conducted on small-scale data that do not reflect realistic scenarios. Although the corpora used to train these systems were carefully designed to maintain phonetic balance, efforts to collect them at a large scale are still limited, and existing works have evaluated only a single accent of Vietnam. In this paper, we first describe our efforts in collecting a large data set that covers all three major accents of Vietnam, from the Northern, Central, and Southern regions. We then detail our ASR system development procedure, using the collected data set and evaluating different model architectures to find the best structure for Vietnamese. In the VLSP 2018 challenge, our system achieved the best performance, with 6.5% WER; on our internal test set of more than 10 hours of speech collected in real environments, the system also performed well, at 11% WER.
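When a test set spans several accents, it is worth reporting both per-accent and pooled WER; the pooled figure must weight each accent by its reference word count rather than averaging the per-accent rates. A minimal sketch with illustrative counts (not figures from the paper):

```python
# Hypothetical per-accent (error_count, reference_word_count) pairs;
# illustrative numbers only, not results from the paper.
per_accent = {
    "northern": (600, 10000),
    "central":  (900, 9000),
    "southern": (800, 11000),
}

for accent, (errs, words) in per_accent.items():
    print(f"{accent}: {errs / words:.1%} WER")

# Pooled WER weights each accent by its word count.
total_errs = sum(e for e, _ in per_accent.values())
total_words = sum(w for _, w in per_accent.values())
print(f"overall: {total_errs / total_words:.1%} WER")  # 7.7%
```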

