Speech Assistance for Persons With Speech Impediments Using Artificial Neural Networks

Author(s):  
Ramy Mounir ◽  
Redwan Alqasemi ◽  
Rajiv Dubey

This work focuses on the research related to enabling individuals with speech impairment to use speech-to-text software to recognize and dictate their speech. Automatic Speech Recognition (ASR) tends to be a challenging problem for researchers because of the wide range of speech variability. Some of the variabilities include different accents, pronunciations, speeds, volumes, etc. It is very difficult to train an end-to-end speech recognition model on data with speech impediment due to the lack of large enough datasets, and the difficulty of generalizing a speech disorder pattern on all users with speech impediments. This work highlights the different techniques used in deep learning to achieve ASR and how it can be modified to recognize and dictate speech from individuals with speech impediments.

Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1350
Author(s):  
Andreas Krug ◽  
Maral Ebrahimzadeh ◽  
Jost Alemann ◽  
Jens Johannsmeier ◽  
Sebastian Stober

Deep Learning-based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, several introspection methods have been proposed. However, established introspection techniques are mostly designed for computer vision tasks and rely on the data being visually interpretable, which limits their usefulness for understanding speech recognition models. To overcome this limitation, we developed a novel neuroscience-inspired technique for visualizing and understanding ANNs, called Saliency-Adjusted Neuron Activation Profiles (SNAPs). SNAPs are a flexible framework to analyze and visualize Deep Neural Networks that does not depend on visually interpretable data. In this work, we demonstrate how to utilize SNAPs for understanding fully-convolutional ASR models. This includes visualizing acoustic concepts learned by the model and the comparative analysis of their representations in the model layers.


2021 ◽  
Vol 13 (0) ◽  
pp. 1-5
Author(s):  
Mantas Tamulionis

Methods based on artificial neural networks (ANN) are widely used in various audio signal processing tasks. This provides opportunities to optimize processes and save resources required for calculations. One of the main objects we need to get to numerically capture the acoustics of a room is the room impulse response (RIR). Increasingly, research authors choose not to record these impulses in a real room but to generate them using ANN, as this gives them the freedom to prepare unlimited-sized training datasets. Neural networks are also used to augment the generated impulses to make them similar to the ones actually recorded. The widest use of ANN so far is observed in the evaluation of the generated results, for example, in automatic speech recognition (ASR) tasks. This review also describes datasets of recorded RIR impulses commonly found in various studies that are used as training data for neural networks.


2021 ◽  
Vol 5 (CHI PLAY) ◽  
pp. 1-20
Author(s):  
David Halbhuber ◽  
Niels Henze ◽  
Valentin Schwind

Cloud gaming services and remote play offer a wide range of advantages but can inherent a considerable delay between input and action also known as latency. Previous work indicates that deep learning algorithms such as artificial neural networks (ANN) are able to compensate for latency. As high latency in video games significantly reduces player performance and game experience, this work investigates if latency can be compensated using ANNs within a live first-person action game. We developed a 3D video game and coupled it with the prediction of an ANN. We trained our network on data of 24 participants who played the game in a first study. We evaluated our system in a second user study with 96 participants. To simulate latency in cloud game streaming services, we added 180 ms latency to the game by buffering user inputs. In the study we predicted latency values of 60 ms, 120 ms and 180 ms. Our results show that players achieve significantly higher scores, substantially more hits per shot and associate the game significantly stronger with a positive affect when supported by our ANN. This work illustrates that high latency systems, such as game streaming services, benefit from utilizing a predictive system.


2021 ◽  
pp. 1-44
Author(s):  
Tyler L. Hayes ◽  
Giri P. Krishnan ◽  
Maxim Bazhenov ◽  
Hava T. Siegelmann ◽  
Terrence J. Sejnowski ◽  
...  

Abstract Replay is the reactivation of one or more neural patterns that are similar to the activation patterns experienced during past waking experiences. Replay was first observed in biological neural networks during sleep, and it is now thought to play a critical role in memory formation, retrieval, and consolidation. Replay-like mechanisms have been incorporated in deep artificial neural networks that learn over time to avoid catastrophic forgetting of previous knowledge. Replay algorithms have been successfully used in a wide range of deep learning methods within supervised, unsupervised, and reinforcement learning paradigms. In this letter, we provide the first comprehensive comparison between replay in the mammalian brain and replay in artificial neural networks. We identify multiple aspects of biological replay that are missing in deep learning systems and hypothesize how they could be used to improve artificial neural networks.


Author(s):  
Rishabh Nevatia

Abstract: Lip reading is the visual task of interpreting phrases from lip movements. While speech is one of the most common ways of communicating among individuals, understanding what a person wants to convey while having access only to their lip movements is till date a task that has not seen its paradigm. Various stages are involved in the process of automated lip reading, ranging from extraction of features to applying neural networks. This paper covers various deep learning approaches that are used for lip reading Keywords: Automatic Speech Recognition, Lip Reading, Neural Networks, Feature Extraction, Deep Learning


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Malte Seemann ◽  
Lennart Bargsten ◽  
Alexander Schlaefer

AbstractDeep learning methods produce promising results when applied to a wide range of medical imaging tasks, including segmentation of artery lumen in computed tomography angiography (CTA) data. However, to perform sufficiently, neural networks have to be trained on large amounts of high quality annotated data. In the realm of medical imaging, annotations are not only quite scarce but also often not entirely reliable. To tackle both challenges, we developed a two-step approach for generating realistic synthetic CTA data for the purpose of data augmentation. In the first step moderately realistic images are generated in a purely numerical fashion. In the second step these images are improved by applying neural domain adaptation. We evaluated the impact of synthetic data on lumen segmentation via convolutional neural networks (CNNs) by comparing resulting performances. Improvements of up to 5% in terms of Dice coefficient and 20% for Hausdorff distance represent a proof of concept that the proposed augmentation procedure can be used to enhance deep learning-based segmentation for artery lumen in CTA images.


Sign in / Sign up

Export Citation Format

Share Document