Lip Reading: Delving into Deep Learning

Rishabh Nevatia

doi:10.22214/ijraset.2021.38216

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38216 ◽

2021 ◽

Vol 9 (9) ◽

pp. 1555-1561

Author(s):

Rishabh Nevatia

Keyword(s):

Neural Networks ◽

Feature Extraction ◽

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Visual Task ◽

Learning Approaches ◽

Lip Reading

Abstract: Lip reading is the visual task of interpreting phrases from lip movements. While speech is one of the most common ways of communicating among individuals, understanding what a person wants to convey from their lip movements alone is, to date, a task without an established paradigm. Automated lip reading involves several stages, ranging from feature extraction to the application of neural networks. This paper covers various deep learning approaches that are used for lip reading.

Keywords: Automatic Speech Recognition, Lip Reading, Neural Networks, Feature Extraction, Deep Learning

Speech Assistance for Persons With Speech Impediments Using Artificial Neural Networks

Volume 3: Biomedical and Biotechnology Engineering ◽

10.1115/imece2017-71027 ◽

2017 ◽

Author(s):

Ramy Mounir ◽

Redwan Alqasemi ◽

Rajiv Dubey

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Challenging Problem ◽

Speech Impairment ◽

Recognition Model ◽

Wide Range ◽

Speech Variability

This work focuses on research into enabling individuals with speech impairments to use speech-to-text software to recognize and dictate their speech. Automatic Speech Recognition (ASR) tends to be a challenging problem for researchers because of the wide range of speech variability, including different accents, pronunciations, speeds, and volumes. It is very difficult to train an end-to-end speech recognition model on data from speakers with speech impediments, both because large enough datasets are lacking and because a speech disorder pattern is hard to generalize across all users with speech impediments. This work highlights the different deep learning techniques used to achieve ASR and how they can be modified to recognize and dictate speech from individuals with speech impediments.

Location-Based End-to-End Speech Recognition with Multiple Language Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019975 ◽

2019 ◽

Vol 33 ◽

pp. 9975-9976

Author(s):

Zhijie Lin ◽

Kaiyang Lin ◽

Shiling Chen ◽

Linlin Li ◽

Zhou Zhao

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Error Correction ◽

Automatic Speech Recognition ◽

Language Model ◽

Language Models ◽

Learning Approaches ◽

Semantic Error ◽

End To End

End-to-end deep learning approaches for Automatic Speech Recognition (ASR) have become a new trend. In these approaches, which are becoming active in many areas, a language model can be considered an important and effective method for semantic error correction. Many existing systems use one language model. In this paper, however, multiple language models (LMs) are applied in decoding. One LM is used for selecting appropriate answers, and the others, which consider both context and grammar, are used for the further decision. Experiments on a general location-based dataset show the effectiveness of our method.
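The two-stage use of language models described in this abstract can be illustrated with a minimal sketch. It is only one possible reading of the abstract, not the authors' implementation: the function name `two_stage_decode`, the `keep` parameter, the three toy scoring functions, and the candidate sentences are all hypothetical stand-ins for real language models and real ASR hypotheses.

```python
def two_stage_decode(candidates, select_lm, decision_lms, keep=2):
    """Shortlist with one LM, then pick the candidate the remaining LMs prefer."""
    # Stage 1: a single LM selects the most plausible candidates.
    shortlist = sorted(candidates, key=select_lm, reverse=True)[:keep]
    # Stage 2: the other LMs (e.g. context and grammar) make the final decision.
    return max(shortlist, key=lambda text: sum(lm(text) for lm in decision_lms))


# Toy stand-ins for real language models (hypothetical scores, not from the paper).
PLACES = {"main street", "central station"}


def place_lm(text):
    # Rewards hypotheses that contain a known place name.
    return sum(1.0 for place in PLACES if place in text)


def grammar_lm(text):
    # Rewards hypotheses that start with a well-formed command prefix.
    return 1.0 if text.startswith("navigate to ") else 0.0


def context_lm(text):
    # Rewards hypotheses that match an assumed dialogue context about stations.
    return 1.0 if "station" in text else 0.0


candidates = [
    "navigate to central station",
    "navigate two central station",
    "navigate to mane street",
]
best = two_stage_decode(candidates, place_lm, [grammar_lm, context_lm])
```

In a real system each scoring function would return a log-probability from a trained model, and the scores would typically be weighted rather than summed directly.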

Analyzing and Visualizing Deep Neural Networks for Speech Recognition with Saliency-Adjusted Neuron Activation Profiles

Electronics ◽

10.3390/electronics10111350 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1350

Author(s):

Andreas Krug ◽

Maral Ebrahimzadeh ◽

Jost Alemann ◽

Jens Johannsmeier ◽

Sebastian Stober

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Artificial Neural Networks ◽

Deep Learning ◽

Comparative Analysis ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Deep Neural Networks ◽

Neuron Activation ◽

Flexible Framework

Deep Learning-based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, several introspection methods have been proposed. However, established introspection techniques are mostly designed for computer vision tasks and rely on the data being visually interpretable, which limits their usefulness for understanding speech recognition models. To overcome this limitation, we developed a novel neuroscience-inspired technique for visualizing and understanding ANNs, called Saliency-Adjusted Neuron Activation Profiles (SNAPs). SNAPs are a flexible framework to analyze and visualize Deep Neural Networks that does not depend on visually interpretable data. In this work, we demonstrate how to utilize SNAPs for understanding fully-convolutional ASR models. This includes visualizing acoustic concepts learned by the model and the comparative analysis of their representations in the model layers.

Deep convolutional neural networks for cardiovascular vulnerable plaque detection

MATEC Web of Conferences ◽

10.1051/matecconf/201927702024 ◽

2019 ◽

Vol 277 ◽

pp. 02024 ◽

Cited By ~ 1

Author(s):

Lincan Li ◽

Tong Jia ◽

Tianqi Meng ◽

Yizhe Liu

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Vulnerable Plaque ◽

Recall Rate ◽

Superior Performance ◽

Learning Approaches ◽

Deep Convolutional Neural Networks ◽

Vulnerable Plaques ◽

Plaque Detection

In this paper, an accurate two-stage deep learning method is proposed to detect vulnerable plaques in ultrasonic cardiovascular images. Firstly, a Fully Convolutional Neural Network (FCN) named U-Net is used to segment the original Intravascular Optical Coherence Tomography (IVOCT) cardiovascular images. We experiment with different threshold values to find the best threshold for removing noise and background in the original images. Secondly, a modified Faster R-CNN is adopted to perform precise detection. The modified Faster R-CNN utilizes six-scale anchors (12², 16², 32², 64², 128², 256²) instead of the conventional one-scale or three-scale approaches. We first present three problems in cardiovascular vulnerable plaque diagnosis, then demonstrate how our method solves these problems. The proposed method applies deep convolutional neural networks to the whole diagnostic procedure. Test results show that the Recall rate, Precision rate, IoU (Intersection-over-Union) rate and Total score are 0.94, 0.885, 0.913 and 0.913 respectively, higher than those of the first-place team of the CCCV2017 Cardiovascular OCT Vulnerable Plaque Detection Challenge. The AP of the designed Faster R-CNN is 83.4%, higher than that of conventional approaches which use one-scale or three-scale anchors. These results demonstrate the superior performance of our proposed method and the power of deep learning approaches in diagnosing cardiovascular vulnerable plaques.
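As a rough illustration of the six-scale anchors and the IoU metric mentioned in this abstract, the following sketch generates anchor boxes and scores box overlap. It assumes the six values are squared side lengths in pixels (12² through 256²). The aspect ratios, the function names `make_anchors` and `iou`, and the `(x1, y1, x2, y2)` box layout are assumptions for illustration; the abstract does not specify them.

```python
from math import sqrt

# Anchor scales taken from the abstract, read as areas of 12^2 ... 256^2 pixels.
SCALES = (12, 16, 32, 64, 128, 256)
# Aspect ratios (height / width) are an assumption; the abstract does not give them.
RATIOS = (0.5, 1.0, 2.0)


def make_anchors(cx, cy, scales=SCALES, ratios=RATIOS):
    """Return (x1, y1, x2, y2) boxes centred on (cx, cy), one per scale/ratio pair."""
    anchors = []
    for s in scales:
        area = float(s * s)
        for r in ratios:
            w = sqrt(area / r)  # width chosen so that w * h equals the target area
            h = w * r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
```

With six scales and three assumed ratios, `make_anchors(0, 0)` yields 18 boxes per location, compared with 9 for a three-scale configuration using the same ratios.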

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413453 ◽

2021 ◽

Author(s):

Chao-Han Huck Yang ◽

Jun Qi ◽

Samuel Yen-Chi Chen ◽

Pin-Yu Chen ◽

Sabato Marco Siniscalchi ◽

...

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Speech Recognition ◽

Convolutional Neural Network ◽

Automatic Speech Recognition

Dynamic Sparsity Neural Networks for Automatic Speech Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414505 ◽

2021 ◽

Author(s):

Zhaofeng Wu ◽

Ding Zhao ◽

Qiao Liang ◽

Jiahui Yu ◽

Anmol Gulati ◽

...

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition

Evaluation of Automatic Speech Recognition Systems

10.5753/sbbd.2021.17889 ◽

2021 ◽

Author(s):

Matheus Xavier Sampaio ◽

Regis Pires Magalhães ◽

Ticiana Linhares Coelho da Silva ◽

Lívia Almada Cruz ◽

Davi Romero de Vasconcelos ◽

...

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Smart Homes ◽

The Other ◽

Learning Models ◽

Recognition Systems ◽

Microsoft Azure

Automatic Speech Recognition (ASR) is an essential task for many applications like automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Due to the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work aims to evaluate the performance of commercial solutions for ASR that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, and Google Cloud Speech-to-Text. The results demonstrate that the evaluated solutions slightly differ. However, Microsoft Azure Speech outperformed the other analyzed APIs.
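The abstract does not state which metric was used to compare the services. A common choice for this kind of evaluation is word error rate (WER), so the sketch below assumes WER purely for illustration. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
wer = word_error_rate("turn on the light", "turn off the light")
```

Lower values are better, and the value can exceed 1.0 when the transcript contains many inserted words. Comparing services would mean averaging this score over the same set of reference transcripts for each API.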

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

ETRI Journal ◽

10.4218/etrij.13.0112.0074 ◽

2013 ◽

Vol 35 (1) ◽

pp. 100-108 ◽

Cited By ~ 13

Author(s):

Yasser Shekofteh ◽

Farshad Almasganj

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Phase Space ◽

Automatic Speech Recognition ◽

Reconstructed Phase Space ◽

Recognition Systems

Advanced Recurrent Neural Networks for Automatic Speech Recognition

New Era for Robust Speech Recognition ◽

10.1007/978-3-319-64680-0_11 ◽

2017 ◽

pp. 261-279

Author(s):

Yu Zhang ◽

Dong Yu ◽

Guoguo Chen

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Recurrent Neural Networks

A Method for Image Forgery Detection Based on Error Level Analysis (ELA) Technique

Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200557 ◽

2020 ◽

Author(s):

Emanuele Morra ◽

Roberto Revetria ◽

Danilo Pecorino ◽

Gabriele Galli ◽

Andrea Mungo ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Digital Imaging ◽

Imaging Techniques ◽

Automated System ◽

Learning Approaches ◽

Forgery Detection ◽

Error Level ◽

Detection Algorithms ◽

Level Analysis

In recent years, there has been a large increase in digital imaging techniques, and their applications have become more and more pivotal in many critical scenarios. Hand in hand with this technological boost, however, image forgeries have also increased, along with their level of precision. In this view, the use of digital tools that aim to verify the integrity of a given image is essential. Indeed, insurance is a field that extensively uses images for filing claim requests, and robust forgery detection is essential. This paper proposes an approach which aims to introduce a fully automated system for identifying potential splicing frauds in images of car plates by overcoming traditional problems using artificial neural networks (ANN). For instance, classic fraud-detection algorithms are impossible to fully automate, whereas modern deep learning approaches require vast training datasets that are not available most of the time. The method developed in this paper uses Error Level Analysis (ELA) performed on car license plates as an input for a trained model which is able to classify license plates as either original or forged.
