Automatic Lip Reading Using Convolution Neural Network and Bidirectional Long Short-term Memory

Author(s):  
Yuanyao Lu ◽  
Jie Yan

Traditional automatic lip-reading systems generally consist of two stages, feature extraction and recognition, yet the handcrafted features are empirical and cannot sufficiently learn the relevance of lip-movement sequences. Recently, deep learning approaches have attracted increasing attention, especially given the significant improvements of the convolutional neural network (CNN) applied to image classification and of long short-term memory (LSTM) used in speech recognition, video processing and text analysis. In this paper, we propose a hybrid neural network architecture, which integrates a CNN and a bidirectional LSTM (BiLSTM), for lip reading. First, we extract key frames from each isolated video clip and use five key points to locate the mouth region. Then, features are extracted from the raw mouth images using an eight-layer CNN. The extracted features have stronger robustness and fault tolerance. Finally, we use a BiLSTM to capture the correlation of sequential information among frame features in two directions, and the softmax function to predict the final recognition result. The proposed method is capable of extracting local features through convolution operations and finding hidden correlations in temporal information from lip image sequences. The evaluation results of lip-reading recognition experiments demonstrate that our proposed method outperforms conventional approaches such as the active contour model (ACM) and the hidden Markov model (HMM).
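The recognition stage described above, running an LSTM over the per-frame CNN features in both directions and concatenating the two hidden states per frame, can be sketched in plain Python. This is a minimal illustration of the mechanism, not the authors' implementation; the dimensions and random weights are assumptions:

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h, c, W, b):
    """One LSTM step. W and b map each gate ('i', 'f', 'o', 'g') to a
    weight matrix over the concatenated [h; x] vector and a bias vector."""
    hx = h + x  # list concatenation = vector concatenation here
    def lin(gate):
        return [sum(wi * v for wi, v in zip(row, hx)) + bg
                for row, bg in zip(W[gate], b[gate])]
    i = [sigmoid(v) for v in lin('i')]    # input gate
    f = [sigmoid(v) for v in lin('f')]    # forget gate
    o = [sigmoid(v) for v in lin('o')]    # output gate
    g = [math.tanh(v) for v in lin('g')]  # candidate cell update
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

def run_lstm(seq, W, b, hidden):
    h, c = [0.0] * hidden, [0.0] * hidden
    states = []
    for x in seq:
        h, c = lstm_step(x, h, c, W, b)
        states.append(h)
    return states

def bilstm(seq, Wf, bf, Wb, bb, hidden):
    """Concatenate forward and time-reversed backward hidden states."""
    fwd = run_lstm(seq, Wf, bf, hidden)
    bwd = list(reversed(run_lstm(list(reversed(seq)), Wb, bb, hidden)))
    return [hf + hb for hf, hb in zip(fwd, bwd)]

random.seed(0)
I, H, T = 4, 3, 5  # per-frame feature dim, hidden dim, number of key frames

def rand_params():
    W = {g: [[random.uniform(-0.5, 0.5) for _ in range(H + I)]
             for _ in range(H)] for g in 'ifog'}
    b = {g: [0.0] * H for g in 'ifog'}
    return W, b

Wf, bf = rand_params()
Wb, bb = rand_params()
frames = [[random.uniform(-1.0, 1.0) for _ in range(I)] for _ in range(T)]
feats = bilstm(frames, Wf, bf, Wb, bb, H)
print(len(feats), len(feats[0]))  # one 2*H-dim vector per frame
```

Each output vector sees the whole clip: the first half summarizes frames up to the current one, the second half summarizes the frames after it, which is what lets the model exploit context in both directions before the softmax layer.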

Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5611 ◽  
Author(s):  
Mihail Burduja ◽  
Radu Tudor Ionescu ◽  
Nicolae Verga

In this paper, we present our system for the RSNA Intracranial Hemorrhage Detection challenge, which is based on the RSNA 2019 Brain CT Hemorrhage dataset. The proposed system is built on a lightweight deep neural network architecture composed of a convolutional neural network (CNN) that takes individual CT slices as input, and a Long Short-Term Memory (LSTM) network that takes as input multiple feature embeddings provided by the CNN. For efficient processing, we consider various feature selection methods to produce a subset of useful CNN features for the LSTM. Furthermore, we downscale the CT slices by a factor of 2, which enables us to train the model faster. Although our model is designed to balance speed and accuracy, we report a weighted mean log loss of 0.04989 on the final test set, which places us in the top 30 (top 2%) of 1345 participants. Although our computing infrastructure did not allow it, processing CT slices at their original scale is likely to improve performance. To enable others to reproduce our results, we provide our code as open source. After the challenge, we conducted a subjective intracranial hemorrhage detection assessment with radiologists, indicating that the performance of our deep model is on par with that of doctors specialized in reading CT scans. Another contribution of our work is the integration of Grad-CAM visualizations into our system, providing useful explanations for its predictions. We therefore consider our system a viable option when a fast diagnosis or a second opinion on intracranial hemorrhage detection is needed.
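One simple way to produce a subset of useful CNN features for the LSTM, as the abstract describes, is to keep only the embedding dimensions with the highest variance across slices. The sketch below illustrates that idea; it is an assumed stand-in, not the selection method the authors actually used:

```python
from statistics import pvariance

def top_k_variance_features(embeddings, k):
    """Return indices of the k embedding dimensions with the highest
    variance across slices (a crude proxy for informativeness)."""
    dims = len(embeddings[0])
    var = [pvariance([e[d] for e in embeddings]) for d in range(dims)]
    ranked = sorted(range(dims), key=lambda d: var[d], reverse=True)
    return sorted(ranked[:k])  # keep the original dimension order

def select(embeddings, keep):
    """Project every per-slice embedding onto the retained dimensions."""
    return [[e[d] for d in keep] for e in embeddings]

# Toy example: three slice embeddings of three dimensions each;
# dimension 1 is constant across slices and carries no information.
emb = [[1.0, 0.0, 5.0], [2.0, 0.0, 1.0], [3.0, 0.0, 9.0]]
keep = top_k_variance_features(emb, 2)
print(keep)              # the constant dimension is dropped
print(select(emb, keep))
```

Shrinking the per-slice vectors this way reduces the LSTM's input width, which is what makes the sequence model cheap enough to train quickly on full scans.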


2021 ◽  
Author(s):  
Vipul Sharma ◽  
Mitul Kumar Ahirwal

In this paper, a new cascaded one-dimensional convolutional neural network (1DCNN) and bidirectional long short-term memory (BLSTM) model has been developed for binary and ternary classification of mental workload (MWL). MWL assessment is important for increasing safety and efficiency in Brain-Computer Interface (BCI) systems and in professions where multi-tasking is required. Keeping in mind the necessity of MWL assessment, a two-fold study is presented: first, binary classification is done to classify MWL into Low and High classes; second, ternary classification is applied to classify MWL into Low, Moderate, and High classes. The cascaded 1DCNN-BLSTM deep learning architecture has been developed and tested on the Simultaneous Task EEG Workload (STEW) dataset. Unlike recent research on MWL, no handcrafted feature extraction or engineering is done; rather, end-to-end deep learning is applied to the 14-channel EEG signals for classification. Accuracies exceeding previous state-of-the-art studies have been obtained: with 7-fold cross-validation, 96.77% for binary and 95.36% for ternary classification.
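The front end of such a cascade, a 1-D convolution sliding along each EEG channel followed by a nonlinearity and pooling, can be sketched as follows. This is a minimal single-channel illustration with an assumed fixed kernel; in the paper's model the kernels are learned and there are 14 channels:

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1-D convolution (cross-correlation, as CNN layers compute it)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def relu(xs):
    return [max(0.0, x) for x in xs]

def max_pool1d(xs, size=2):
    """Non-overlapping max pooling to halve the temporal resolution."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

# One EEG channel (toy samples) passed through a difference kernel,
# which responds to sharp changes in the signal.
channel = [0.2, 0.9, 0.4, -0.3, 0.8, 0.1]
feature_map = max_pool1d(relu(conv1d(channel, [1.0, -1.0])))
print(feature_map)
```

In the cascade, feature maps like this one, stacked over all channels, would form the sequence that the BLSTM then reads in both directions.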


2020 ◽  
Vol 196 ◽  
pp. 02007
Author(s):  
Vladimir Mochalov ◽  
Anastasia Mochalova

In this paper, the previously obtained results on recognition of ionograms using deep learning are extended to predicting the parameters of the ionosphere. After the ionospheric parameters have been identified on the ionogram in real time using deep learning, we can predict the parameters some time ahead on the basis of the newly obtained data. Examples of predicting the ionospheric parameters using a long short-term memory (LSTM) artificial recurrent neural network architecture are given. The place of the ionospheric-parameter prediction block in the system for analyzing ionospheric data using deep learning methods is shown.
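Feeding an LSTM for this kind of forecasting typically means recasting the measured parameter series as supervised (window, next-value) pairs. A minimal sketch of that framing follows; the lookback and horizon values, and the toy readings, are assumptions, not the authors' settings:

```python
def make_windows(series, lookback, horizon=1):
    """Turn a scalar time series into (input window, target) pairs:
    each window of `lookback` values predicts the value `horizon`
    steps after the window ends."""
    pairs = []
    for i in range(len(series) - lookback - horizon + 1):
        window = series[i:i + lookback]
        target = series[i + lookback + horizon - 1]
        pairs.append((window, target))
    return pairs

# Hypothetical hourly critical-frequency readings (toy numbers, MHz)
fof2 = [5.1, 5.3, 5.6, 5.4, 5.0, 4.8]
for window, target in make_windows(fof2, lookback=3):
    print(window, '->', target)
```

Each (window, target) pair becomes one training example; at prediction time, the most recent window of identified parameters is fed in to produce the value some steps ahead.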


2021 ◽  
Vol 248 ◽  
pp. 01017
Author(s):  
Eugene Yu. Shchetinin ◽  
Leonid Sevastianov

Computer paralinguistic analysis is widely used in security systems, biometric research, call centers and banks. Paralinguistic models estimate different physical properties of the voice, such as pitch, intensity, formants and harmonics, to classify emotions. The main goal is to find features that are robust to outliers while retaining the variety of human voice properties. Moreover, the model must be able to estimate features on a time scale suitable for effective analysis of voice variability. In this paper a paralinguistic model based on a Bidirectional Long Short-Term Memory (BLSTM) neural network is described, which was trained for vocal-based emotion recognition. The main advantage of this network architecture is that each module of the network consists of several interconnected layers, providing the ability to recognize flexible long-term dependencies in data, which is important in the context of vocal analysis. We explain the architecture of the bidirectional neural network model and its main advantages over regular neural networks, and compare the experimental results of the BLSTM network with those of other models.


2017 ◽  
Vol 24 (1) ◽  
pp. 77-90 ◽  
Author(s):  
REKIA KADARI ◽  
YU ZHANG ◽  
WEINAN ZHANG ◽  
TING LIU

Neural network-based approaches have recently produced good performance on natural language tasks such as supertagging. In the supertagging task, a supertag (lexical category) is assigned to each word in an input sequence. Combinatory Categorial Grammar supertagging is a more challenging problem than other sequence-tagging problems, such as part-of-speech (POS) tagging and named entity recognition, due to the large number of lexical categories. Simple recurrent neural networks (RNNs) have been shown to significantly outperform the previous state-of-the-art feed-forward neural networks. On the other hand, it is well known that recurrent networks fail to learn long-range dependencies. In this paper, we introduce a new neural network architecture based on backward and Bidirectional Long Short-Term Memory (BLSTM) networks that has the ability to memorize information over long dependencies and to benefit from both past and future information. State-of-the-art methods focus on previous information, whereas a BLSTM has access to information in both the previous and future directions. Our main findings are that bidirectional networks outperform unidirectional ones, and that Long Short-Term Memory (LSTM) networks are more precise and successful than both unidirectional and bidirectional standard RNNs. Experimental results reveal the effectiveness of our proposed method on both in-domain and out-of-domain datasets. Experiments show improvements of about 1.2 per cent over standard RNNs.
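The per-word classification step that supertagging shares with other sequence-tagging tasks, scoring each word's feature vector against every category and taking the softmax argmax, can be sketched as below. This illustrates the output layer only, with assumed weights and a toy two-tag set far smaller than a real CCG category inventory:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def tag_sequence(word_feats, W, tags):
    """Assign each word the tag whose weight row scores its features highest."""
    out = []
    for feats in word_feats:
        scores = [sum(w * f for w, f in zip(row, feats)) for row in W]
        probs = softmax(scores)
        out.append(tags[max(range(len(tags)), key=probs.__getitem__)])
    return out

# Toy setup: two categories and two-dimensional word features; each
# weight row simply picks out the matching feature dimension.
tags = ['NP', 'S\\NP']
W = [[1.0, 0.0], [0.0, 1.0]]
feats = [[2.0, 0.5], [0.1, 1.5]]
print(tag_sequence(feats, W, tags))
```

In the paper's architecture the feature vectors would be the BLSTM's concatenated hidden states, so each word's scores reflect context from both directions rather than the word alone.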

