Lip Reading
Recently Published Documents


TOTAL DOCUMENTS

463
(FIVE YEARS 135)

H-INDEX

25
(FIVE YEARS 6)

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yuanyao Lu ◽  
Qi Xiao ◽  
Haiyang Jiang

In recent years, deep learning has been applied to English lip-reading. Chinese lip-reading, however, started later, lacks relevant datasets, and its recognition accuracy is not yet ideal. Therefore, this paper proposes a new hybrid neural network model to build a Chinese lip-reading system that integrates attention mechanisms into both the CNN and the RNN. Specifically, we add a convolutional block attention module (CBAM) to the ResNet50 network, which enhances its ability to capture the small differences among the mouth patterns of similarly pronounced Chinese words and improves feature extraction during convolution. We also add a temporal attention mechanism to the GRU network, which helps extract features across consecutive lip-motion images. Because the moments before and after a frame affect the current moment in lip reading, we assign more weight to the key frames, making the features more representative. We validate our model through experiments on a self-built dataset. The experiments show that the Chinese lip-reading model with CBAM accurately recognizes the Chinese numbers 0–9 and some frequently used Chinese words. Compared with other lip-reading systems, ours achieves better performance and higher recognition accuracy.
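The abstract includes no code, but the CBAM block it builds on has a standard two-stage form: channel attention followed by spatial attention. Below is a minimal PyTorch sketch of such a block; the module names, the reduction ratio of 16, and the 7x7 spatial kernel follow the original CBAM paper, not this one, and are assumptions here.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention, applied to a feature map of shape (B, C, H, W)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims via avg/max pooling,
        # pass both through a shared two-layer MLP, then gate channels.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over channel-wise avg/max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # (B, C) from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # (B, C) from max pooling
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        avg_map = x.mean(dim=1, keepdim=True)  # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)  # (B, 1, H, W)
        attn = torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x * attn
```

In the paper's setting, a block like this would presumably sit after the ResNet50 residual stages; the temporal attention added to the GRU plays the analogous role across frames, reweighting each time step's hidden state so key frames contribute more.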


2021 ◽  
Vol 17 (2) ◽  
pp. 65-84
Author(s):  
Mohammad Halili ◽  
Mahbub Arham Arrozy

This study discusses the use of Total Communication (TC, henceforth), the combinations of communication modes it involves, and the reasons for using TC with hearing-impaired (HI) students in the English class at SLB PGRI Kamal. Teaching English to these students is believed to require more strategic approaches than teaching students without hearing issues. The data come from four HI students at SLB PGRI Kamal and their English teacher. Data were collected through observation, note-taking, and recording; during observation and note-taking, the researchers used a phone recorder and phone camera to protect the data from loss and to support further analysis. The results show that seven communication modes were used: lip-reading, sign language, images, writing, the Indonesian Alphabetic Symbol System (IAS), finger spelling, and speech. These modes are combined depending on the needs of the users (both the English teacher and the HI students), and the researchers identified six such combinations of modes in TC. The researchers also found five reasons for using TC with HI students in the English classroom at SLB PGRI Kamal: it is flexible and effective for communication, it gives HI students chances to learn to speak, write, and read in English, and it allows them to produce and identify English sounds.


Author(s):  
Yuki Takashima ◽  
Ryoichi Takashima ◽  
Ryota Tsunoda ◽  
Ryo Aihara ◽  
Tetsuya Takiguchi ◽  
...  

We present an unsupervised domain adaptation (UDA) method for a lip-reading model, i.e., an image-based speech recognition model. Most conventional UDA methods cannot be applied when the adaptation data contain unknown classes, such as out-of-vocabulary words. In this paper, we propose a cross-modal knowledge distillation (KD)-based domain adaptation method in which the intermediate-layer output of an audio-based speech recognition model serves as a teacher for the unlabeled adaptation data. Because the audio signal contains more information for recognizing speech than lip images do, the knowledge of the audio-based model can act as a powerful teacher when the unlabeled adaptation data consist of audio-visual parallel data. In addition, because the proposed intermediate-layer-based KD expresses the teacher as a sub-class (sub-word)-level representation, the method allows us to use data from unknown classes for adaptation. Through experiments on an image-based word recognition task, we demonstrate that the proposed approach not only improves UDA performance but can also exploit unknown-class adaptation data.
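The core of the method is an intermediate-layer distillation loss computed on unlabeled, time-aligned audio-visual parallel data. The following is only a minimal sketch of that idea; the feature shapes, the linear projection, and MSE as the distance are our assumptions, and the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

def kd_adaptation_loss(student_feat: torch.Tensor,
                       teacher_feat: torch.Tensor,
                       proj: nn.Module) -> torch.Tensor:
    """Intermediate-layer KD loss for unsupervised adaptation: the
    lip-reading student's hidden features are regressed onto the frozen
    audio teacher's hidden features, so no class labels are needed."""
    target = teacher_feat.detach()  # frozen teacher: no gradients flow back
    return nn.functional.mse_loss(proj(student_feat), target)

# Hypothetical shapes: (batch, frames, dim) features from both streams,
# frame-aligned because the adaptation data is audio-visual parallel.
student = torch.randn(8, 50, 256)   # visual encoder intermediate output
teacher = torch.randn(8, 50, 512)   # audio encoder intermediate output
proj = nn.Linear(256, 512)          # maps student dim to teacher dim
loss = kd_adaptation_loss(student, teacher, proj)
```

Because the loss needs only aligned teacher and student features, no labels are required, which is what lets unknown-class (out-of-vocabulary) data contribute to the adaptation.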


Author(s):  
Siyuan Jiang ◽  
Hengxin Ruan ◽  
Zhuo Wang ◽  
Hongrui Zhang ◽  
Hanting Zhao ◽  
...  
Keyword(s):  

2021 ◽  
Author(s):  
Yorghos Voutos ◽  
Georgios Drakopoulos ◽  
Georgios Chrysovitsiotis ◽  
Zoi Zachou ◽  
Dimitris Kikidis ◽  
...  
Keyword(s):  

Author(s):  
Xueyi Zhang ◽  
Changchong Sheng ◽  
Li Liu

2021 ◽  
Author(s):  
Zhijie Lin ◽  
Zhou Zhao ◽  
Haoyuan Li ◽  
Jinglin Liu ◽  
Meng Zhang ◽  
...  
Keyword(s):  

2021 ◽  
Vol 10 (5) ◽  
pp. 2557-2565
Author(s):  
Nada Hussain Ali ◽  
Matheel Emad Abdulmunem ◽  
Akbas Ezaldeen Ali

Human communication takes several forms; one of the best known and most widely used is speech. Both visual and acoustic sensory perception are involved, so speech is considered a multi-sensory process. Micro contents are small pieces of information that can be used to boost the learning process. Deep learning is an approach that dives into deep texture layers to learn fine-grained details. The convolutional neural network (CNN) is a deep learning technique that can be employed as a complementary model with micro learning, holding micro contents to achieve a specific process. In this paper, a lip-reading model is presented together with a proposed video dataset. The model receives micro contents (the English alphabet) as video input and recognizes them; the CNN performs two tasks, feature extraction and recognition. The implementation results show efficient recognition accuracy on video datasets containing a variety of lip readers aged 11 to 63 years, with the proposed model reaching a recognition rate of 98%.
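The abstract gives no architecture details, so the following is only a minimal sketch of a frame-level CNN performing the two roles it describes, feature extraction and recognition. The input size, grayscale input, and channel widths are our assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class LipAlphabetCNN(nn.Module):
    """Small CNN classifying a single mouth-region frame into one of the
    26 English letters: convolutional feature extraction followed by a
    linear classification head."""
    def __init__(self, num_classes: int = 26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling to (B, 128, 1, 1)
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, H, W) grayscale mouth crops
        return self.classifier(self.features(x).flatten(1))

# Hypothetical 64x64 grayscale mouth crops; video input would be handled
# frame by frame (or with a temporal model on top of these features).
logits = LipAlphabetCNN()(torch.randn(4, 1, 64, 64))
```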

