Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer

Speech translation has traditionally been approached through cascaded models consisting of a speech recognizer trained on a corpus of transcribed speech, and a machine translation system trained on parallel texts. Several recent works have shown the feasibility of collapsing the cascade into a single, direct model that can be trained in an end-to-end fashion on a corpus of translated speech. However, experiments are inconclusive on whether the cascade or the direct model is stronger, and have only been conducted under the unrealistic assumption that both are trained on equal amounts of data, ignoring other available speech recognition and machine translation corpora. In this paper, we demonstrate that direct speech translation models require more data to perform well than cascaded models, and although they allow including auxiliary data through multi-task training, they are poor at exploiting such data, putting them at a severe disadvantage. As a remedy, we propose the use of end-to-end trainable models with two attention mechanisms, the first establishing source speech to source text alignments, the second modeling source to target text alignment. We show that such models naturally decompose into multi-task–trainable recognition and translation tasks and propose an attention-passing technique that alleviates error propagation issues in a previous formulation of a model with two attention stages. Our proposed model outperforms all examined baselines and is able to exploit auxiliary training data much more effectively than direct attentional models.

Speech-driven Facial Animations Improve Speech-in-Noise Comprehension of Humans

10.1101/2021.12.18.471222 ◽ 2021

Author(s): Enrico Varano ◽ Konstantinos Vougioukas ◽ Pingchuan Ma ◽ Stavros Petridis ◽ Maja Pantic ◽ ...

Keyword(s): Hearing Impairment ◽ Audiovisual Speech ◽ Speech Comprehension ◽ Generative Adversarial Network ◽ Still Image ◽ Speech In Noise ◽ Adversarial Network ◽ Listening Environments ◽ End To End ◽ Speech Recognizer

Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker's face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person's face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN), and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although the natural facial motions yield a yet higher audiovisual benefit. We further find that an audiovisual speech recognizer benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments.

Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans

Frontiers in Neuroscience ◽ 10.3389/fnins.2021.781196 ◽ 2022 ◽ Vol 15

Author(s): Enrico Varano ◽ Konstantinos Vougioukas ◽ Pingchuan Ma ◽ Stavros Petridis ◽ Maja Pantic ◽ ...

Keyword(s): Hearing Impairment ◽ Audiovisual Speech ◽ Speech Comprehension ◽ Generative Adversarial Network ◽ Still Image ◽ Speech In Noise ◽ Adversarial Network ◽ Listening Environments ◽ End To End ◽ Speech Recognizer

Understanding speech becomes a demanding task when the environment is noisy. Comprehension of speech in noise can be substantially improved by looking at the speaker’s face, and this audiovisual benefit is even more pronounced in people with hearing impairment. Recent advances in AI have made it possible to synthesize photorealistic talking faces from a speech recording and a still image of a person’s face in an end-to-end manner. However, it has remained unknown whether such facial animations improve speech-in-noise comprehension. Here we consider facial animations produced by a recently introduced generative adversarial network (GAN), and show that humans cannot distinguish between the synthesized and the natural videos. Importantly, we then show that the end-to-end synthesized videos significantly aid humans in understanding speech in noise, although the natural facial motions yield a yet higher audiovisual benefit. We further find that an audiovisual speech recognizer (AVSR) benefits from the synthesized facial animations as well. Our results suggest that synthesizing facial motions from speech can be used to aid speech comprehension in difficult listening environments.

Surgical management of extracranial carotid artery aneurysms

VASA ◽ 10.1024/0301-1526/a000528 ◽ 2016 ◽ Vol 45 (3) ◽ pp. 223-228 ◽ Cited By ~ 2

Author(s): Jan Paweł Skóra ◽ Jacek Kurcz ◽ Krzysztof Korta ◽ Przemysław Szyber ◽ Tadeusz Andrzej Dorobisz ◽ ...

Keyword(s): Carotid Artery ◽ Surgical Management ◽ Minor Stroke ◽ Stroke Rate ◽ Extracranial Carotid Artery ◽ Long Term Follow Up ◽ End To End ◽ Dacron Patch ◽ Major Stroke

Abstract. Background: We present the methods and results of the surgical management of extracranial carotid artery aneurysms (ECCA). Postoperative complications including early and late neurological events were analysed. Correlation between reconstruction techniques and morphology of ECCA was assessed in this retrospective study. Patients and methods: In total, 32 reconstructions of ECCA were performed in 31 symptomatic patients with a mean age of 59.2 (range 33 - 84) years. The causes of ECCA were divided among atherosclerosis (n = 25; 78.1 %), previous carotid endarterectomy with Dacron patch (n = 4; 12.5 %), iatrogenic injury (n = 2; 6.3 %) and infection (n = 1; 3.1 %). In 23 cases, intervention consisted of carotid bypass. Aneurysmectomy with end-to-end suture was performed in 4 cases. Aneurysmal resection with patching was done in 2 cases and aneurysmorrhaphy without patching in another 2 cases. In 1 case, ligature of the internal carotid artery (ICA) was required. Results: Technical success defined as the preservation of ICA patency was achieved in 31 cases (96.9 %). There was one perioperative death due to major stroke (3.1 %). Two cases of minor stroke occurred in the 30-day observation period (6.3 %). Three patients had a transient hypoglossal nerve palsy that subsided spontaneously (9.4 %). At a mean long-term follow-up of 68 months, there were no major or minor ipsilateral strokes or surgery-related deaths reported. In all 30 surviving patients (96.9 %), long-term clinical outcomes were free from ipsilateral neurological symptoms. Conclusions: Open surgery is a relatively safe method in the therapy of ECCA. Surgical repair of ECCAs can be associated with an acceptable major stroke rate and moderate minor stroke rate. Complication-free long-term outcomes can be achieved in as many as 96.9 % of patients. Aneurysmectomy with end-to-end anastomosis or bypass surgery can be implemented during open repair of ECCA.

Reliability of the Mangled Extremity Severity Score in the Management of Peripheral Vascular Injuries in Children: A Retrospective Review

International Journal of Angiology ◽ 10.1055/s-0040-1720970 ◽ 2020

Author(s): Ahmed Mousa ◽ Ossama M. Zakaria ◽ Mai A. Elkalla ◽ Lotfy A. Abdelsattar ◽ Hamad Al-Game'a

Keyword(s): Severity Score ◽ Vein Graft ◽ Bypass Surgery ◽ Age Groups ◽ Vascular Injuries ◽ Peripheral Vascular ◽ Group Iii ◽ Mangled Extremity ◽ Group I ◽ End To End

Abstract. This study aimed to evaluate different management modalities for peripheral vascular trauma in children with the aid of the Mangled Extremity Severity Score (MESS). A single-center retrospective analysis was conducted between 2010 and 2017 at university hospitals with emergency and critical care centers. Different types of vascular repair were performed by experienced vascular specialists and highly trained pediatric surgeons. Patients were divided into three age groups: group I included children between 5 and 10 years, group II those between 11 and 15 years, and group III those between 16 and 21 years. We recruited 183 children with peripheral vascular injuries; 87% were male and 13% female, with a mean age of 14.72 ± 04. Arteriorrhaphy was performed in 32%; end-to-end anastomosis and natural vein graft were adopted in 40.5 and 49%, respectively. On the other hand, 10.5% underwent bypass surgery. Age groups I and II were highly susceptible to penetrating trauma (p = 0.001), while patients at the upper age extreme (i.e., group III) were more susceptible to blunt injury (p = 0.001). The MESS correlated significantly with both age groups I and II (p = 0.001). Vein patch angioplasty and end-to-end primary repair should be adopted as the main treatment options for the repair of extremity vascular injuries in children. Moreover, other treatment modalities, such as repair with autologous vein graft/bypass surgery, may be adopted whenever possible. They are cost-effective, reliable, and simple techniques with fewer postoperative complications, especially in poor or limited-resource settings.

WS-Security on PRODML

ACMIT Proceedings ◽ 10.33555/acmit.v1i1.16 ◽ 2014 ◽ Vol 1 (1) ◽ pp. 9-34

Author(s): Bobby Suryajaya

Keyword(s): Web Services ◽ Digital Signature ◽ Data Transfer ◽ Web Services Security ◽ Soap Message ◽ Simulation Results ◽ End To End

SKK Migas plans to apply end-to-end security based on Web Services Security (WS-Security) for Sistem Operasi Terpadu (SOT). However, no prototype or simulation results exist to support the plan, which has already been communicated to many parties. This paper proposes an experiment that performs PRODML data transfer using WS-Security by altering the WSDL to include encryption and digital signature. The experiment used SoapUI and successfully loaded a PRODML WSDL that had been altered with WSP-Policy based on X.509 to transfer a SOAP message.

Mit MindSphere die digitale Transformation vorantreiben

Controlling ◽ 10.15358/0935-0381-2019-6-63 ◽ 2019 ◽ Vol 31 (6) ◽ pp. 63-65

Author(s): Carsten Speckmann ◽ Péter Horváth

Keyword(s): Digital Transformation ◽ Internet Of Things ◽ End To End

MindSphere is Siemens' cloud-based, open IoT operating system. It connects products, plants, systems, and machines, making it possible to exploit the wealth of data from the Internet of Things (IoT) with extensive analytics. As a secure, scalable end-to-end solution for industry, MindSphere provides connectivity for plants and thereby delivers actionable business insights that can be used to increase productivity and efficiency across the entire company. MindSphere is available worldwide.

QMCP: QoS Aware Multi-Channel Path Discovery for End to End Data Transmission Over Cognitive Radio Ad Hoc Networks

International Review on Computers and Software (IRECOS) ◽ 10.15866/irecos.v11i12.10978 ◽ 2016 ◽ Vol 11 (12) ◽ pp. 1054

Author(s): Nagul Shribala ◽ P. Srihari ◽ B. C. Jinaga

Keyword(s): Cognitive Radio ◽ Ad Hoc Networks ◽ Data Transmission ◽ Ad Hoc ◽ Path Discovery ◽ End To End ◽ Hoc Networks

Analyzing the Quality and Stability of a Streaming End-to-End On-Device Speech Recognizer

An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech

Towards an end-to-end speech recognizer for Portuguese using deep neural networks

Attention-Passing Models for Robust and Data-Efficient End-to-End Speech Translation

Speech-driven Facial Animations Improve Speech-in-Noise Comprehension of Humans

Speech-Driven Facial Animations Improve Speech-in-Noise Comprehension of Humans

Surgical management of extracranial carotid artery aneurysms

Reliability of the Mangled Extremity Severity Score in the Management of Peripheral Vascular Injuries in Children: A Retrospective Review

WS-Security on PRODML

Mit MindSphere die digitale Transformation vorantreiben

QMCP: QoS Aware Multi-Channel Path Discovery for End to End Data Transmission Over Cognitive Radio Ad Hoc Networks
