Is human-human spoken interaction manageable? The emergence of the concept: ‘Conversation Intelligence’

2018 ◽  
Vol 6 (1) ◽  
pp. 1-14 ◽  
Author(s):  
Vered Silber-Varod

After years of steady improvement in Automatic Speech Recognition technology, and mediated by audio mining technology and conversational user interfaces, Conversation Intelligence is emerging as a concept significant to the understanding of human-human communication in its most natural and primitive channel: our voice. This paper introduces the concept of Conversation Intelligence (CI), which is becoming crucial to the study of human-human speech interaction and communication management, and which forms part of the field of speech analytics. CI is demonstrated on two established discourse terms: power relations and convergence. Finally, the paper highlights the importance of visualization for large-scale speech analytics.
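The abstract does not define how power relations would be quantified; in speech analytics, a speaker's share of total talk time is one common, crude proxy. The minimal Python sketch below (a hypothetical `talk_time_shares` helper, assuming diarized segments given as (speaker, start, end) tuples in seconds) illustrates the kind of quantity such a CI pipeline might compute:

```python
from collections import defaultdict

def talk_time_shares(segments):
    """Aggregate per-speaker talk time from diarized segments.

    Assumes `segments` is an iterable of (speaker, start, end) tuples
    with times in seconds; returns each speaker's share of total talk time.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    grand_total = sum(totals.values()) or 1.0  # avoid division by zero
    return {spk: t / grand_total for spk, t in totals.items()}

# Example: speaker A holds the floor four times as long as B.
segments = [("A", 0.0, 10.0), ("B", 10.0, 15.0), ("A", 15.0, 25.0)]
print(talk_time_shares(segments))  # {'A': 0.8, 'B': 0.2}
```

A convergence measure would similarly compare acoustic or lexical features across speakers over the course of the conversation.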

2020 ◽  
Author(s):  
Ryo Masumura ◽  
Naoki Makishima ◽  
Mana Ihori ◽  
Akihiko Takashima ◽  
Tomohiro Tanaka ◽  
...  

2019 ◽  
Vol 15 (7) ◽  
pp. P162-P163
Author(s):  
Francesca K. Cormack ◽  
Nick Taptiklis ◽  
Jennifer H. Barnett ◽  
Merina Su

2019 ◽  
Vol 15 ◽  
pp. P897-P897
Author(s):  
Francesca K. Cormack ◽  
Nick Taptiklis ◽  
Jennifer H. Barnett ◽  
Merina Su

Author(s):  
Khalid Majrashi

Voice User Interfaces (VUIs) are increasingly popular owing to improvements in automatic speech recognition. However, the understanding of user interaction with VUIs, particularly Arabic VUIs, remains limited. Hence, this research compared user performance, learnability, and satisfaction when using voice versus keyboard-and-mouse input for text creation on Arabic user interfaces. A Voice-enabled Email Interface (VEI) and a Traditional Email Interface (TEI) were developed. Forty participants attempted pre-prepared and self-generated message-creation tasks using voice on the VEI and the keyboard and mouse on the TEI. The results showed that participants were faster (by 1.76 to 2.67 minutes) in pre-prepared message creation using voice than using the keyboard and mouse, and also faster (by 1.72 to 2.49 minutes) in self-generated message creation. Although the learning curves were more efficient with the VEI, more participants were satisfied with the TEI. With the VEI, participants reported problems such as misrecognitions and misspellings, but were satisfied with the visibility of possible executable commands and with the overall accuracy of voice recognition.
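For readers unfamiliar with how such modality comparisons are computed, the sketch below shows the arithmetic behind a mean-difference figure. The timing values are illustrative placeholders only, not data from the study:

```python
from statistics import mean

# Illustrative per-participant task completion times in minutes
# (placeholder values; the study's raw data are not reproduced here).
voice_times = [3.1, 2.8, 3.5, 2.9]
keyboard_times = [5.2, 4.9, 5.8, 5.4]

# Mean difference between modalities, the kind of figure behind
# "faster by 1.76 to 2.67 minutes".
diff = mean(keyboard_times) - mean(voice_times)
print(f"voice faster by {diff:.2f} minutes on average")  # 2.25
```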


2020 ◽  
Vol 10 (19) ◽  
pp. 6936 ◽  
Author(s):  
Jeong-Uk Bang ◽  
Seung Yun ◽  
Seung-Hi Kim ◽  
Mu-Yeol Choi ◽  
Min-Kyu Lee ◽  
...  

This paper introduces KsponSpeech, a large-scale spontaneous speech corpus of Korean. The corpus contains 969 h of general open-domain dialog utterances, spoken by about 2000 native Korean speakers in a clean environment. All data were constructed by recording two people freely conversing on a variety of topics and manually transcribing the utterances. The transcription is dual, consisting of orthography and pronunciation, with disfluency tags for spontaneous speech phenomena such as filler words, repeated words, and word fragments. This paper also presents the baseline performance of an end-to-end speech recognition model trained with KsponSpeech. In addition, we investigated the performance of standard end-to-end architectures and the number of sub-word units suitable for Korean, as well as issues that should be considered in Korean spontaneous speech recognition. KsponSpeech is publicly available on an open data hub site of the Korean government.
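The exact transcription markup is defined in the corpus documentation; the sketch below assumes a simplified convention in which a dual token is written as (orthography)/(pronunciation) and disfluency tags are single letters followed by a slash, and extracts a plain orthographic string for ASR training:

```python
import re

# Assumed, simplified markup: dual tokens as "(orthography)/(pronunciation)"
# and disfluency tags as single letters followed by "/"; the real
# KsponSpeech conventions are specified in the corpus documentation.
DUAL = re.compile(r"\(([^)]*)\)/\(([^)]*)\)")
TAG = re.compile(r"\b[a-z]/\s*")

def to_orthographic(line: str) -> str:
    """Keep the orthographic side of dual tokens and drop disfluency tags."""
    line = DUAL.sub(r"\1", line)   # "(0.1%)/(영 점 일 프로)" -> "0.1%"
    line = TAG.sub("", line)       # strip single-letter tags like "b/"
    return " ".join(line.split())

print(to_orthographic("(0.1%)/(영 점 일 프로) 정도 올랐다 b/"))
# -> "0.1% 정도 올랐다"
```

Keeping the pronunciation side instead (group 2 of the same pattern) would yield the phonetic transcription.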


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 493-493
Author(s):  
Nancy Hodgson ◽  
Ani Nencova ◽  
Laura Gitlin ◽  
Emily Summerhayes

Careful fidelity monitoring is critical when implementing evidence-based interventions in dementia care settings, to ensure that the intervention is delivered consistently and as intended. Most approaches to fidelity monitoring rely on human coding of the content covered during a session or of stylistic aspects of the intervention, including rapport, empathy, and enthusiasm, and are unrealistic to implement on a large scale in real-world settings. Technological advances in automatic speech recognition and language and speech processing offer potential solutions to overcome these barriers. We compare three commercial automatic speech recognition tools on spoken content drawn from dementia care interactions, to determine the accuracy of recognition and the privacy guarantees offered by each provider. Data were obtained from recorded sessions of the Dementia Behavior Study intervention trial (NCT01892579). We find that, despite their impressive performance in general applications, automatic speech recognition systems work less well for older adults and people of color. We outline a plan for automating fidelity monitoring of interaction style and content, to be integrated into an online program for training dementia care providers.
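The abstract does not name its accuracy metric; word error rate (WER) is the standard measure for comparing ASR systems, computed as the word-level Levenshtein distance between a reference transcript and the ASR hypothesis, normalized by the reference length. A minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / max(len(ref), 1)

print(wer("take the medication after lunch", "take medication after launch"))
# 2 errors (one deletion, one substitution) over 5 reference words -> 0.4
```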


2019 ◽  
Vol 34 (4) ◽  
pp. 335-348
Author(s):  
Do Quoc Truong ◽  
Pham Ngoc Phuong ◽  
Tran Hoang Tung ◽  
Luong Chi Mai

Automatic Speech Recognition (ASR) systems convert human speech into the corresponding transcription automatically. They have a wide range of applications, such as controlling robots, call center analytics, and voice chatbots. Recent studies on ASR for English have achieved performance that surpasses human ability, with systems trained on large amounts of data and performing well in many environments. With regard to Vietnamese, there have been many studies on improving the performance of existing ASR systems; however, many of them were conducted on small-scale data that do not reflect realistic scenarios. Although the corpora used to train these systems were carefully designed to maintain phonetic balance, efforts to collect them at a large scale are still limited, and existing works have evaluated only a single accent of Vietnam. In this paper, we first describe our efforts in collecting a large data set that covers all three major accents of Vietnam, from the Northern, Central, and Southern regions. We then detail our ASR system development procedure, using the collected data set and evaluating different model architectures to find the best structure for Vietnamese. In the VLSP 2018 challenge, our system achieved the best performance, with 6.5% WER; on our internal test set of more than 10 hours of speech collected in real environments, the system also performed well, at 11% WER.
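When a test set spans several accents, it is worth reporting both per-accent and pooled WER; the pooled figure must weight each accent by its reference word count rather than averaging the per-accent rates. A minimal sketch with illustrative counts (not figures from the paper):

```python
# Hypothetical per-accent (error_count, reference_word_count) pairs;
# illustrative numbers only, not results from the paper.
per_accent = {
    "northern": (600, 10000),
    "central":  (900, 9000),
    "southern": (800, 11000),
}

for accent, (errs, words) in per_accent.items():
    print(f"{accent}: {errs / words:.1%} WER")

# Pooled WER weights each accent by its word count.
total_errs = sum(e for e, _ in per_accent.values())
total_words = sum(w for _, w in per_accent.values())
print(f"overall: {total_errs / total_words:.1%} WER")  # 7.7%
```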

