Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer

Author(s):  
Yuan Shangguan ◽  
Kate Knister ◽  
Yanzhang He ◽  
Ian McGraw ◽  
Françoise Beaufays
2019 ◽  
Vol 7 ◽  
pp. 313-325 ◽  
Author(s):  
Matthias Sperber ◽  
Graham Neubig ◽  
Jan Niehues ◽  
Alex Waibel

Speech translation has traditionally been approached through cascaded models consisting of a speech recognizer trained on a corpus of transcribed speech, and a machine translation system trained on parallel texts. Several recent works have shown the feasibility of collapsing the cascade into a single, direct model that can be trained in an end-to-end fashion on a corpus of translated speech. However, experiments are inconclusive on whether the cascade or the direct model is stronger, and have only been conducted under the unrealistic assumption that both are trained on equal amounts of data, ignoring other available speech recognition and machine translation corpora. In this paper, we demonstrate that direct speech translation models require more data to perform well than cascaded models, and although they allow including auxiliary data through multi-task training, they are poor at exploiting such data, putting them at a severe disadvantage. As a remedy, we propose the use of end-to-end trainable models with two attention mechanisms, the first establishing source speech to source text alignments, the second modeling source to target text alignment. We show that such models naturally decompose into multi-task–trainable recognition and translation tasks and propose an attention-passing technique that alleviates error propagation issues in a previous formulation of a model with two attention stages. Our proposed model outperforms all examined baselines and is able to exploit auxiliary training data much more effectively than direct attentional models.


2021 ◽  
Author(s):  
Enrico Varano ◽  
Konstantinos Vougioukas ◽  
Pingchuan Ma ◽  
Stavros Petridis ◽  
Maja Pantic ◽  
...  

Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker's face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person's face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN), and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although the natural facial motions yield a yet higher audiovisual benefit. We further find that an audiovisual speech recognizer benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments.


2022 ◽  
Vol 15 ◽  
Author(s):  
Enrico Varano ◽  
Konstantinos Vougioukas ◽  
Pingchuan Ma ◽  
Stavros Petridis ◽  
Maja Pantic ◽  
...  

Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker’s face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person’s face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN), and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although the natural facial motions yield a yet higher audiovisual benefit. We further find that an audiovisual speech recognizer (AVSR) benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments.


VASA ◽  
2016 ◽  
Vol 45 (3) ◽  
pp. 223-228 ◽  
Author(s):  
Jan Paweł Skóra ◽  
Jacek Kurcz ◽  
Krzysztof Korta ◽  
Przemysław Szyber ◽  
Tadeusz Andrzej Dorobisz ◽  
...  

Abstract. Background: We present the methods and results of the surgical management of extracranial carotid artery aneurysms (ECCA). Postoperative complications including early and late neurological events were analysed. Correlation between reconstruction techniques and morphology of ECCA was assessed in this retrospective study. Patients and methods: In total, 32 reconstructions of ECCA were performed in 31 symptomatic patients with a mean age of 59.2 (range 33 - 84) years. The causes of ECCA were divided among atherosclerosis (n = 25; 78.1 %), previous carotid endarterectomy with Dacron patch (n = 4; 12.5 %), iatrogenic injury (n = 2; 6.3 %) and infection (n = 1; 3.1 %). In 23 cases, intervention consisted of carotid bypass. Aneurysmectomy with end-to-end suture was performed in 4 cases. Aneurysmal resection with patching was done in 2 cases and aneurysmorrhaphy without patching in another 2 cases. In 1 case, ligature of the internal carotid artery (ICA) was required. Results: Technical success defined as the preservation of ICA patency was achieved in 31 cases (96.9 %). There was one perioperative death due to major stroke (3.1 %). Two cases of minor stroke occurred in the 30-day observation period (6.3 %). Three patients had a transient hypoglossal nerve palsy that subsided spontaneously (9.4 %). At a mean long-term follow-up of 68 months, there were no major or minor ipsilateral strokes or surgery-related deaths reported. In all 30 surviving patients (96.9 %), long-term clinical outcomes were free from ipsilateral neurological symptoms. Conclusions: Open surgery is a relatively safe method in the therapy of ECCA. Surgical repair of ECCAs can be associated with an acceptable major stroke rate and moderate minor stroke rate. Complication-free long-term outcomes can be achieved in as many as 96.9 % of patients. Aneurysmectomy with end-to-end anastomosis or bypass surgery can be implemented during open repair of ECCA.


Author(s):  
Ahmed Mousa ◽  
Ossama M. Zakaria ◽  
Mai A. Elkalla ◽  
Lotfy A. Abdelsattar ◽  
Hamad Al-Game'a

Abstract: This study aimed to evaluate different management modalities for peripheral vascular trauma in children, with the aid of the Mangled Extremity Severity Score (MESS). A single-center retrospective analysis took place between 2010 and 2017 at University Hospitals, which have emergency and critical care centers. Different types of vascular repair were adopted by skilled vascular experts and highly trained pediatric surgeons. Patients were divided into three age groups. Group I included children between 5 and 10 years; group II included those between 11 and 15 years; and group III included those between 16 and 21 years. We recruited 183 children with peripheral vascular injuries. They were 87% males and 13% females, with the mean age of 14.72 ± 04. Arteriorrhaphy was performed in 32%; end-to-end anastomosis and natural vein graft were adopted in 40.5 and 49%, respectively. On the other hand, 10.5% underwent bypass surgery. Age groups I and II were highly susceptible to penetrating trauma (p = 0.001), while the oldest patients (i.e., group III) were more susceptible to blunt injury (p = 0.001). The MESS had a significant correlation with both age groups I and II (p = 0.001). Vein patch angioplasty and end-to-end primary repair should be adopted as the main treatment options for the repair of extremity vascular injuries in children. Moreover, other treatment modalities, such as repair with autologous vein graft/bypass surgery, may be adopted whenever possible. They are cost-effective, reliable, and simple techniques with fewer postoperative complications, especially in poor or limited-resource settings.


2014 ◽  
Vol 1 (1) ◽  
pp. 9-34
Author(s):  
Bobby Suryajaya

SKK Migas plans to apply end-to-end security based on Web Services Security (WS-Security) for Sistem Operasi Terpadu (SOT). However, there are no prototype or simulation results that can support the plan that has already been communicated to many parties. This paper proposes an experiment that performs PRODML data transfer using WS-Security by altering the WSDL to include encryption and digital signature. The experiment utilizes SoapUI, and successfully loaded PRODML WSDL that had been altered with WSP-Policy based on X.509 to transfer a SOAP message.


Controlling ◽  
2019 ◽  
Vol 31 (6) ◽  
pp. 63-65
Author(s):  
Carsten Speckmann ◽  
Péter Horváth

MindSphere is the cloud-based, open IoT operating system from Siemens. It connects products, plants, systems, and machines, making it possible to use the wealth of data from the Internet of Things (IoT) with extensive analytics. As a secure, scalable end-to-end solution for industry, MindSphere provides connectivity for plants and thus delivers actionable business insights that can be used to increase productivity and efficiency across the entire company. MindSphere is available worldwide.

