Speech Assistance for Persons With Speech Impediments Using Artificial Neural Networks

Volume 3: Biomedical and Biotechnology Engineering ◽

10.1115/imece2017-71027 ◽

2017 ◽

Author(s):

Ramy Mounir ◽

Redwan Alqasemi ◽

Rajiv Dubey

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Challenging Problem ◽

Speech Impairment ◽

Recognition Model ◽

Wide Range ◽

Speech Variability

This work focuses on enabling individuals with speech impairments to use speech-to-text software to recognize and dictate their speech. Automatic Speech Recognition (ASR) is a challenging problem for researchers because of the wide range of speech variability, including differences in accent, pronunciation, speed, and volume. It is very difficult to train an end-to-end speech recognition model on impaired speech because sufficiently large datasets are lacking and because a speech disorder pattern is hard to generalize across all users with speech impediments. This work highlights the different techniques used in deep learning to achieve ASR and how they can be modified to recognize and dictate speech from individuals with speech impediments.


Analyzing and Visualizing Deep Neural Networks for Speech Recognition with Saliency-Adjusted Neuron Activation Profiles

Electronics ◽

10.3390/electronics10111350 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1350

Author(s):

Andreas Krug ◽

Maral Ebrahimzadeh ◽

Jost Alemann ◽

Jens Johannsmeier ◽

Sebastian Stober

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Artificial Neural Networks ◽

Deep Learning ◽

Comparative Analysis ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Deep Neural Networks ◽

Neuron Activation ◽

Flexible Framework

Deep Learning-based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, several introspection methods have been proposed. However, established introspection techniques are mostly designed for computer vision tasks and rely on the data being visually interpretable, which limits their usefulness for understanding speech recognition models. To overcome this limitation, we developed a novel neuroscience-inspired technique for visualizing and understanding ANNs, called Saliency-Adjusted Neuron Activation Profiles (SNAPs). SNAPs are a flexible framework to analyze and visualize Deep Neural Networks that does not depend on visually interpretable data. In this work, we demonstrate how to utilize SNAPs for understanding fully-convolutional ASR models. This includes visualizing acoustic concepts learned by the model and the comparative analysis of their representations in the model layers.


AN OVERVIEW OF METHODS FOR GENERATING, AUGMENTING AND EVALUATING ROOM IMPULSE RESPONSE USING ARTIFICIAL NEURAL NETWORKS

Mokslas - Lietuvos ateitis ◽

10.3846/mla.2021.15152 ◽

2021 ◽

Vol 13 (0) ◽

pp. 1-5

Author(s):

Mantas Tamulionis

Keyword(s):

Neural Networks ◽

Signal Processing ◽

Artificial Neural Networks ◽

Speech Recognition ◽

Impulse Response ◽

Automatic Speech Recognition ◽

Audio Signal ◽

Training Data ◽

Audio Signal Processing ◽

Artificial Neural

Methods based on artificial neural networks (ANN) are widely used in various audio signal processing tasks, where they offer opportunities to optimize processes and save computational resources. One of the main quantities needed to numerically capture the acoustics of a room is the room impulse response (RIR). Increasingly, researchers choose not to record these responses in a real room but to generate them using ANNs, as this gives them the freedom to prepare training datasets of unlimited size. Neural networks are also used to augment the generated responses to make them similar to recorded ones. The widest use of ANNs so far is observed in the evaluation of the generated results, for example, in automatic speech recognition (ASR) tasks. This review also describes datasets of recorded RIRs commonly found in various studies that are used as training data for neural networks.


An efficient noise-robust automatic speech recognition system using artificial neural networks

2016 International Conference on Communication and Signal Processing (ICCSP) ◽

10.1109/iccsp.2016.7754495 ◽

2016 ◽

Cited By ~ 2

Author(s):

Santosh Gupta ◽

Kishor M. Bhurchandi ◽

Avinash G. Keskar

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Recognition System ◽

Speech Recognition System ◽

Automatic Speech Recognition System ◽

Artificial Neural ◽

Noise Robust


Automatic speech recognition using hidden Markov models and artificial neural networks

IEEE International Conference on Neural Networks ◽

10.1109/icnn.1993.298825 ◽

2002 ◽

Author(s):

N.M. Botros ◽

M. Siddiqi ◽

M.Z. Deiri

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Speech Recognition ◽

Hidden Markov Models ◽

Automatic Speech Recognition ◽

Markov Models ◽

Hidden Markov ◽

Artificial Neural


Fast speaker adaptation of artificial neural networks for automatic speech recognition

2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100) ◽

10.1109/icassp.2000.862102 ◽

2002 ◽

Cited By ~ 12

Author(s):

S. Dupont ◽

L. Cheboub

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Artificial Neural


Increasing Player Performance and Game Experience in High Latency Systems

Proceedings of the ACM on Human-Computer Interaction ◽

10.1145/3474710 ◽

2021 ◽

Vol 5 (CHI PLAY) ◽

pp. 1-20

Author(s):

David Halbhuber ◽

Niels Henze ◽

Valentin Schwind

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Deep Learning ◽

Video Games ◽

Positive Affect ◽

Video Game ◽

User Study ◽

Wide Range ◽

Streaming Services ◽

Player Performance

Cloud gaming services and remote play offer a wide range of advantages but can inherently introduce a considerable delay between input and action, also known as latency. Previous work indicates that deep learning algorithms such as artificial neural networks (ANNs) are able to compensate for latency. As high latency in video games significantly reduces player performance and game experience, this work investigates whether latency can be compensated for using ANNs within a live first-person action game. We developed a 3D video game and coupled it with the prediction of an ANN. We trained our network on data from 24 participants who played the game in a first study. We evaluated our system in a second user study with 96 participants. To simulate latency in cloud game streaming services, we added 180 ms of latency to the game by buffering user inputs. In the study, we predicted latency values of 60 ms, 120 ms, and 180 ms. Our results show that players achieve significantly higher scores and substantially more hits per shot, and associate the game significantly more strongly with positive affect, when supported by our ANN. This work illustrates that high-latency systems, such as game streaming services, benefit from utilizing a predictive system.


Replay in Deep Learning: Current Approaches and Missing Biological Elements

Neural Computation ◽

10.1162/neco_a_01433 ◽

2021 ◽

pp. 1-44

Author(s):

Tyler L. Hayes ◽

Giri P. Krishnan ◽

Maxim Bazhenov ◽

Hava T. Siegelmann ◽

Terrence J. Sejnowski ◽

...

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Deep Learning ◽

Critical Role ◽

Mammalian Brain ◽

Previous Knowledge ◽

Activation Patterns ◽

Wide Range ◽

Comprehensive Comparison ◽

Artificial Neural

Replay is the reactivation of one or more neural patterns that are similar to the activation patterns experienced during past waking experiences. Replay was first observed in biological neural networks during sleep, and it is now thought to play a critical role in memory formation, retrieval, and consolidation. Replay-like mechanisms have been incorporated in deep artificial neural networks that learn over time to avoid catastrophic forgetting of previous knowledge. Replay algorithms have been successfully used in a wide range of deep learning methods within supervised, unsupervised, and reinforcement learning paradigms. In this letter, we provide the first comprehensive comparison between replay in the mammalian brain and replay in artificial neural networks. We identify multiple aspects of biological replay that are missing in deep learning systems and hypothesize how they could be used to improve artificial neural networks.


Real-time frequency-based noise-robust Automatic Speech Recognition using Multi-Nets Artificial Neural Networks: A multi-views multi-learners approach

Neurocomputing ◽

10.1016/j.neucom.2013.09.040 ◽

2014 ◽

Vol 129 ◽

pp. 199-207 ◽

Cited By ~ 25

Author(s):

Seyed Reza Shahamiri ◽

Siti Salwah Binti Salim

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Speech Recognition ◽

Real Time ◽

Automatic Speech Recognition ◽

Time Frequency ◽

Artificial Neural ◽

Noise Robust


Lip Reading: Delving into Deep Learning

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38216 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1555-1561

Author(s):

Rishabh Nevatia

Keyword(s):

Neural Networks ◽

Feature Extraction ◽

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Visual Task ◽

Learning Approaches ◽

Lip Reading

Lip reading is the visual task of interpreting phrases from lip movements. While speech is one of the most common ways of communicating among individuals, understanding what a person wants to convey while having access only to their lip movements remains, to date, an unsolved task. Various stages are involved in the process of automated lip reading, ranging from extraction of features to applying neural networks. This paper covers various deep learning approaches that are used for lip reading.


Data augmentation for computed tomography angiography via synthetic image generation and neural domain adaptation

Current Directions in Biomedical Engineering ◽

10.1515/cdbme-2020-0015 ◽

2020 ◽

Vol 6 (1) ◽

Author(s):

Malte Seemann ◽

Lennart Bargsten ◽

Alexander Schlaefer

Keyword(s):

Computed Tomography ◽

Neural Networks ◽

Deep Learning ◽

Medical Imaging ◽

Computed Tomography Angiography ◽

Data Augmentation ◽

Domain Adaptation ◽

Synthetic Image ◽

Wide Range ◽

The Impact

Deep learning methods produce promising results when applied to a wide range of medical imaging tasks, including segmentation of artery lumen in computed tomography angiography (CTA) data. However, to perform sufficiently well, neural networks have to be trained on large amounts of high-quality annotated data. In the realm of medical imaging, annotations are not only quite scarce but also often not entirely reliable. To tackle both challenges, we developed a two-step approach for generating realistic synthetic CTA data for the purpose of data augmentation. In the first step, moderately realistic images are generated in a purely numerical fashion. In the second step, these images are improved by applying neural domain adaptation. We evaluated the impact of synthetic data on lumen segmentation via convolutional neural networks (CNNs) by comparing the resulting performances. Improvements of up to 5% in terms of Dice coefficient and 20% for Hausdorff distance represent a proof of concept that the proposed augmentation procedure can be used to enhance deep learning-based segmentation of artery lumen in CTA images.
