speech translation
Recently Published Documents


TOTAL DOCUMENTS

482
(FIVE YEARS 138)

H-INDEX

15
(FIVE YEARS 4)

2021 ◽  
Vol E104.D (12) ◽  
pp. 2195-2208
Author(s):  
Sashi NOVITASARI ◽  
Sakriani SAKTI ◽  
Satoshi NAKAMURA

STEM Journal ◽  
2021 ◽  
Vol 22 (4) ◽  
pp. 59-74
Author(s):  
Christopher Irvin

For native English-speaking teachers, overcoming communication issues caused by not sharing a first language with their pupils is a challenge, especially with low-level students. The increased use of video lectures due to COVID-19 has made this even more difficult. This study investigated whether Artificial Intelligence-powered interlingual Simultaneous Speech Translation subtitled video lectures could be a practical way to overcome this challenge. To that end, 14 participants from a first-semester prerequisite General English course took part in the study. Semi-structured interviews were combined with surveys and descriptive statistics, and the data were analyzed qualitatively through thematic, descriptive, and inductive procedures that relied on simultaneous analysis and category construction. Key findings were as follows: First, respondents found the subtitled videos highly satisfactory and fairly accurate. Second, respondents reported greater content understanding as the main advantage and reduced emphasis on improving listening ability as the primary disadvantage. Third, the main suggestions for the future were to use English rather than Korean subtitles, or to subtitle only certain sections of the video in Korean. Specific responses from the student interviews and implications for future work are discussed.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Wenbo Zhu ◽  
Hao Jin ◽  
WeiChang Yeh ◽  
Jianwen Chen ◽  
Lufeng Luo ◽  
...  

Speech translation (ST) is a bimodal conversion task from source speech to target text. Deep learning-based ST systems generally require sufficient training data to obtain competitive results, even with a state-of-the-art model, yet the available training data usually fails to meet this completeness condition because of small-sample problems. Most low-resource ST approaches improve data coverage within a single modality, but this optimization operates along a single dimension and has limited effectiveness. Multimodality, in contrast, leverages different dimensions of the data for multi-perspective modeling: the modalities mutually fill each other's gaps, enriching the data representation and improving the utilization of training samples. Leveraging the enormous amount of multimodal out-of-domain information to improve low-resource tasks is therefore a new challenge. This paper describes how to use multimodal out-of-domain information to improve low-resource models. First, we propose a low-resource ST framework that reconstructs large-scale unlabeled audio via self-supervised learning. We also introduce a machine translation (MT) pretraining model to complement text embeddings and fine-tune decoding. In addition, we analyze similarity at the decoder side: we reduce invalid multimodal pseudolabels by performing random depth pruning in the similarity layers to minimize error propagation, and we apply an additional CTC loss in the non-similarity layers to optimize the ensemble loss. Finally, we study the weighting ratio of the fusion technique in the multimodal decoder. Experimental results show that the proposed method is promising for low-resource ST, with improvements of up to +3.6 BLEU points over baseline low-resource ST models.
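The weighting ratio of the fusion technique studied in this abstract can be illustrated with a minimal sketch: two decoders (one speech-side, one MT-pretrained text-side) each produce a next-token distribution over the same target vocabulary, and the distributions are linearly interpolated. The function name, the fixed weight `alpha`, and the toy vocabulary are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fuse_decoder_probs(p_speech, p_text, alpha=0.7):
    """Weighted fusion of two decoders' next-token distributions.

    alpha weights the speech-translation decoder; (1 - alpha) weights
    the MT-pretrained text decoder. Both inputs are probability
    vectors over the same target vocabulary.
    """
    p_speech = np.asarray(p_speech, dtype=float)
    p_text = np.asarray(p_text, dtype=float)
    fused = alpha * p_speech + (1.0 - alpha) * p_text
    return fused / fused.sum()  # renormalize for numerical safety

# Toy 4-token vocabulary: the two decoders disagree on the best token.
p_st = [0.6, 0.2, 0.1, 0.1]   # speech decoder prefers token 0
p_mt = [0.1, 0.7, 0.1, 0.1]   # MT decoder prefers token 1
fused = fuse_decoder_probs(p_st, p_mt, alpha=0.5)
```

With `alpha=0.5` the fused distribution is `[0.35, 0.45, 0.10, 0.10]`, so the MT decoder's stronger preference wins; sweeping `alpha` on a validation set is the natural way to pick the ratio.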


Author(s):  
Ryo Fukuda ◽  
Sashi Novitasari ◽  
Yui Oka ◽  
Yasumasa Kano ◽  
Yuki Yano ◽  
...  

Author(s):  
Kak Soky ◽  
Masato Mimura ◽  
Tatsuya Kawahara ◽  
Sheng Li ◽  
Chenchen Ding ◽  
...  
Keyword(s):  

2021 ◽  
Author(s):  
Gareth Morlais

When you're making plans to get people using your language as much and as often as possible, there's a list of things related to Wikipedia which can really help. I'll share our experience with the Welsh language. Supporting the Welsh-language Wikipedia community forms Work Package 15 of 27 in the Welsh Government's Welsh Language Technology Action Plan https://gov.wales/sites/default/files/publications/2018-12/welsh-language-technology-and-digital-media-action-plan.pdf. We like supporting Welsh-language Wikipedia editing workshops, video workshops and other channels that encourage people to create and publish Welsh-language video, audio, graphic and text content, because we're on a mission to help double the daily use of Welsh by 2050. I'll share developments we're funding in speech, translation and conversational AI. The partners we fund publish what they develop under an open licence, so we can share what we've funded with many companies; we think Microsoft might have used some of it to make their new synthetic Welsh voices. We're excited by the potential Wikidata offers, and we'll look at using it to populate Welsh-language maps this year. We've already used Wikipedia search data to prioritise the training of a Welsh virtual assistant. Welsh may not spend as much as Icelandic and Estonian do on language technologies, but we'd like to share what we're learning, as a smaller language, about the important areas to focus on and how Wikipedia can help.


2021 ◽  
Vol 4 (9) ◽  
pp. 166-178
Author(s):  
Adekunle Lawal

Language differences between parents and teachers, if not carefully managed, can cause miscommunication or communication gaps that hinder both the school's and the students' progress. This paper explores ways of translating real-time conversations between teachers and parents who speak different languages. Fourteen K-12 teachers in the United States were surveyed, and nine were interviewed, to determine how English-speaking teachers can communicate effectively with non-English-speaking parents. The findings suggest using Microsoft Translator's speech-translation conversation feature to break the language barrier, bridge communication gaps, and promote effective bi-/multilingual parent-teacher conferences.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Zhuo Chen

The signal corresponding to English speech contains substantial redundant information and environmental interference, which introduces distortion during English speech translation signal recognition. For this reason, many studies focus on encoding and processing English speech to achieve high-precision recognition. The traditional wavelet denoising algorithm performs well on English speech translation signals, owing to the excellent local time-frequency characteristics of the wavelet transform, but threshold selection remains difficult and recognition accuracy is easily degraded. This paper therefore improves the traditional wavelet denoising algorithm: it abandons the single-threshold judgment of the original algorithm and instead combines soft and hard thresholds. This alleviates the distortion problem in English speech translation signal recognition, improves the signal-to-noise ratio, and further reduces the root mean square error of the signal, yielding a good noise reduction effect and higher recognition accuracy. In experiments, the algorithm was compared with the traditional algorithm in MATLAB simulation. The simulation results are consistent with the theoretical analysis, and the proposed algorithm shows clear advantages in the recognition accuracy of English speech translation signals, demonstrating its superiority and practical value.


Author(s):  
Dikshita Patel ◽  
Minakshi Kudalkar ◽  
Shashank Gupta ◽  
Renuka Pawar
Keyword(s):  
