Automatic speech recognition in the booth

Target ◽  
2020
Author(s):  
Bart Defrancq ◽  
Claudio Fantinuoli

Automatic Speech Recognition (ASR) has been proposed as a means to enhance state-of-the-art computer-assisted interpreting (CAI) tools and to allow machine-learning techniques to enter the workflow of professional interpreters. In this article, we test the usefulness of real-time transcription with number highlighting of a source speech for simultaneous interpreting using InterpretBank ASR. The system’s precision is high (96%) and its latency low enough to fit interpreters’ ear–voice span (EVS). We evaluate the potential benefits among first-time users of this technology by applying an error matrix and by investigating the users’ subjective perceptions through a questionnaire. The results show that the ASR provision improves overall performance for almost all number types. Interaction with the ASR support is varied and participants consult it for just over half of the stimuli. The study also provides some evidence of the psychological benefits of ASR availability and of overreliance on ASR support.
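
The paper does not publish InterpretBank's highlighting mechanism, but the core idea of flagging numbers in a live transcript can be sketched in a few lines of Python. Everything below (the marker format, the pattern, the sample chunk) is illustrative, not the tool's actual implementation:

    import re

    # Digits optionally followed by a magnitude or percentage word; a real
    # system would also normalise spelled-out numbers ("twenty-five").
    NUMBER = re.compile(r"\d[\d.,]*(?:\s*(?:million|billion|percent|%))?", re.IGNORECASE)

    def highlight_numbers(chunk: str) -> str:
        """Wrap every number in a transcript chunk in visual markers."""
        return NUMBER.sub(lambda m: f">>{m.group(0)}<<", chunk)

    # In production the chunks would arrive from the real-time ASR stream.
    print(highlight_numbers("revenue grew by 4.7 percent to 12 million euros"))
    # -> revenue grew by >>4.7 percent<< to >>12 million<< euros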

Tradterm ◽  
2018 ◽  
Vol 32 ◽  
pp. 9-31
Author(s):  
Luis Eduardo Schild Ortiz ◽  
Patrizia Cavallo

In recent years, several studies have indicated that interpreters resist adopting new technologies. Yet such technologies have enabled the development of several tools to help these professionals. In this paper, using bibliographical and documentary research, we briefly analyse the tools cited by several authors to identify which ones remain up to date and available on the market. We then present concepts related to automation and examine the use of automatic speech recognition (ASR), analysing its potential benefits and the current maturity of the approach, especially with regard to Computer-Assisted Interpreting (CAI) tools. The goal of this paper is to offer the community of interpreters and researchers a view of the state of the art in technology for interpreting, as well as some future perspectives for the field.


2020 ◽  
pp. 263-281
Author(s):  
Mahmoud Gaber ◽  
Gloria Corpas Pastor ◽  
Ahmed Omer

Although interpreting has not yet benefited from technology as much as its sister field, translation, interest in developing tailor-made solutions for interpreters has risen sharply in recent years. In particular, Automatic Speech Recognition (ASR) is being used as a central component of Computer-Assisted Interpreting (CAI) tools, either bundled or standalone. This study pursues three main aims: (i) to establish the most suitable ASR application for building ad hoc corpora by comparing several ASR tools and assessing their performance; (ii) to use ASR in order to extract terminology from the transcriptions obtained from video-recorded speeches, in this case talks on climate change and adaptation; and (iii) to promote the adoption of ASR as a new documentation tool among interpreters. To the best of our knowledge, this is one of the first studies to explore the possibility of Speech-to-Text (S2T) technology for meeting the preparatory needs of interpreters as regards terminology and background/domain knowledge.
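
The workflow the authors describe (transcribe recorded speeches, then mine the transcripts for terminology) can be sketched with off-the-shelf components. The sketch below uses the open-source SpeechRecognition package and a naive frequency-based term ranker; the file name and stopword list are placeholders, and the paper's own tool comparison and extraction method are not reproduced here:

    import collections
    import re

    import speech_recognition as sr  # pip install SpeechRecognition

    STOPWORDS = {"the", "and", "that", "this", "with", "from", "have", "will"}

    def transcribe(wav_path: str) -> str:
        """Transcribe one recorded speech with Google's free web API."""
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)
        return recognizer.recognize_google(audio, language="en-US")

    def term_candidates(text: str, top_n: int = 20):
        """Rank single-word term candidates by raw frequency."""
        tokens = re.findall(r"[a-z]+", text.lower())
        counts = collections.Counter(
            t for t in tokens if t not in STOPWORDS and len(t) > 3
        )
        return counts.most_common(top_n)

    transcript = transcribe("climate_talk.wav")  # placeholder file name
    for term, freq in term_candidates(transcript):
        print(f"{freq:4d}  {term}")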


2021 ◽  
Vol 11 (15) ◽  
pp. 6695
Author(s):  
Cristian Tejedor-García ◽  
Valentín Cardeñoso-Payo ◽  
David Escudero-Mancebo

General-purpose automatic speech recognition (ASR) systems have improved in quality and are being used for pronunciation assessment. However, the assessment of isolated short utterances, such as words in minimal pairs for segmental approaches, remains an important challenge, even more so for non-native speakers. In this work, we compare the performance of our own tailored ASR system (kASR) with that of Google ASR (gASR) for the assessment of Spanish minimal pair words produced by 33 native Japanese speakers in a computer-assisted pronunciation training (CAPT) scenario. Participants in a pre/post-test training experiment spanning four weeks were split into three groups: experimental, in-classroom, and placebo. The experimental group used the CAPT tool described in the paper, which we specially designed for autonomous pronunciation training. A statistically significant improvement for the experimental and in-classroom groups was revealed, and moderate correlation values between gASR and kASR results were obtained, in addition to strong correlations between the post-test scores of both ASR systems and the CAPT application scores found at the final stages of application use. These results suggest that both ASR alternatives are valid for assessing minimal pairs in CAPT tools, in the current configuration. A discussion of possible ways to improve our system and of possibilities for future research is included.
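
The agreement between the two recognisers is reported as correlations between their scores. A minimal sketch of that comparison, with entirely hypothetical per-learner scores standing in for the paper's data:

    from scipy import stats

    # Hypothetical per-learner assessment scores from the two recognisers.
    gasr_scores = [0.72, 0.81, 0.65, 0.90, 0.58, 0.77, 0.69, 0.84]
    kasr_scores = [0.70, 0.78, 0.61, 0.88, 0.62, 0.73, 0.71, 0.80]

    r, p = stats.pearsonr(gasr_scores, kasr_scores)
    rho, _ = stats.spearmanr(gasr_scores, kasr_scores)
    print(f"Pearson r = {r:.2f} (p = {p:.3f}), Spearman rho = {rho:.2f}")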


2020 ◽  
Vol 5 (2) ◽  
pp. 193-197
Author(s):  
Esti Junining ◽  
Sony Alif ◽  
Nuria Setiarini

This study is intended to help English as a Foreign Language (EFL) learners in Indonesia reduce their anxiety when speaking in front of other people by fostering an atmosphere that encourages independent speaking practice. Such an atmosphere can be created with Automatic Speech Recognition (ASR), which lets every student practice speaking individually in front of a computer or other device without feeling anxious or pressured. The study followed a research and development design aimed at producing a tool that creates this kind of practice environment. The instrument used was a questionnaire analysing the students’ English learning needs. The resulting product was built on ASR technology using the C# programming language. The study found that the ASR-based product enables students to practice speaking individually without feeling anxious or pressured.
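
The study's product was written in C#; purely as a language-neutral illustration of the interaction pattern it describes (prompt, recognise, give feedback), here is a hedged Python sketch using the open-source SpeechRecognition package. The word list and exact-match scoring are placeholders, not the study's design:

    import speech_recognition as sr  # pip install SpeechRecognition

    PROMPTS = ["library", "vegetable", "comfortable"]  # placeholder practice words

    recognizer = sr.Recognizer()
    for target in PROMPTS:
        print(f"Say the word: {target}")
        with sr.Microphone() as source:
            audio = recognizer.listen(source)
        try:
            heard = recognizer.recognize_google(audio, language="en-US").lower()
        except sr.UnknownValueError:  # speech was unintelligible
            heard = ""
        print("Correct!" if heard == target else f"Heard '{heard}', try again.")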


2017 ◽  
Vol 8 (2) ◽  
pp. 48
Author(s):  
Lina Fathi Sidig Sidgi ◽  
Ahmad Jelani Shaari

Technology such as computer-assisted language learning (CALL) is used in teaching and learning in the foreign language classrooms where it is most needed. One promising emerging technology that supports language learning is automatic speech recognition (ASR). Integrating such technology, especially in the instruction of pronunciation in the classroom, is important in helping students to achieve correct pronunciation. In Iraq, English is a foreign language, and it is not surprising that learners commit many pronunciation mistakes. One factor contributing to these mistakes is the difference between the Arabic and English phonetic systems; the sound transformation from the mother tongue (Arabic) to the target language (English) is thus one barrier for Arab learners. The purpose of this study is to investigate the effectiveness of the ASR-based EyeSpeak software in improving the pronunciation of Iraqi learners of English. An experimental research project with a pretest-posttest design was conducted over a one-month period in the Department of English at Al-Turath University College in Baghdad, Iraq. The ten participants were randomly selected first-year college students enrolled in a pronunciation class that used traditional teaching methods alongside the EyeSpeak software. The findings show that using EyeSpeak leads to a significant improvement in the students’ English pronunciation, as evident from the test scores achieved after using the software.
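
The significance claim rests on a pretest-posttest comparison. A minimal sketch of such an analysis, with invented scores for the ten participants (the study's actual data and test statistic are not reproduced here):

    from scipy import stats

    # Invented pronunciation scores for ten participants, before and after.
    pretest  = [55, 60, 48, 62, 58, 51, 57, 49, 64, 53]
    posttest = [68, 72, 61, 70, 66, 63, 69, 60, 75, 65]

    t, p = stats.ttest_rel(posttest, pretest)
    print(f"paired t = {t:.2f}, p = {p:.4f}")  # p < 0.05 -> significant gain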


Babel ◽  
2020 ◽  
Vol 66 (4-5) ◽  
pp. 733-749
Author(s):  
Silhee Jin

This paper proposes a model for delivering live interlingual subtitling (LIS) as a formal translation and interpreting (T&I) service in Korea, replacing the existing model, which combines stenographic transcription with simultaneous interpreting. The model proposes that these two processes, currently delivered by two different professional groups, be merged through respeaking technology. As a means of supplying the relevant talent, the paper proposes that formal interpreting and translation schools include respeaking with automatic speech recognition (ASR) technology as part of their training.
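
In a respeaking workflow, the respeaker's dictation is transcribed by ASR and then segmented into subtitle lines for display. The paper does not specify an implementation; purely as an illustration of that final segmentation step, here is a greedy line-packing sketch in Python (the 42-character limit is a common broadcast convention, not a figure from the paper):

    MAX_CHARS = 42  # common broadcast limit per subtitle line (assumption)

    def segment_subtitles(words, max_chars=MAX_CHARS):
        """Greedily pack recognised words into subtitle lines."""
        lines, current = [], ""
        for word in words:
            candidate = f"{current} {word}".strip()
            if len(candidate) > max_chars and current:
                lines.append(current)
                current = word
            else:
                current = candidate
        if current:
            lines.append(current)
        return lines

    asr_output = "the minister said the new policy will take effect early next year".split()
    for line in segment_subtitles(asr_output):
        print(line)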


Author(s):  
Sandeep Badrinath ◽  
Hamsa Balakrishnan

A significant fraction of communications between air traffic controllers and pilots takes place through speech over radio channels. Automatic transcription of air traffic control (ATC) communications has the potential to improve system safety, operational performance, and conformance monitoring, and to enhance air traffic controller training. We present an automatic speech recognition model tailored to the ATC domain that can transcribe ATC voice to text. The transcribed text is used to extract operational information such as call-signs and runway numbers. The models are based on recent improvements in machine learning techniques for speech recognition and natural language processing. We evaluate the performance of the model on diverse datasets.
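
The paper extracts operational information with learned NLP models; a much simpler rule-based stand-in conveys what that extraction step produces. The airline names, patterns, and sample utterance below are illustrative only:

    import re

    # Toy patterns: real call-signs follow ICAO telephony designators.
    CALLSIGN = re.compile(r"\b(american|delta|united|speedbird)\s+(\d{1,4})\b", re.I)
    RUNWAY = re.compile(r"\brunway\s+(\d{1,2})\s*(left|right|center)?\b", re.I)

    def extract_ops_info(transcript: str) -> dict:
        """Pull a call-sign and runway number from one transcribed ATC utterance."""
        info = {}
        if m := CALLSIGN.search(transcript):
            info["callsign"] = f"{m.group(1).title()} {m.group(2)}"
        if m := RUNWAY.search(transcript):
            info["runway"] = " ".join(g for g in m.groups() if g)
        return info

    print(extract_ops_info("united 451 cleared to land runway 27 left"))
    # -> {'callsign': 'United 451', 'runway': '27 left'}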

