Lip Reading
Recently Published Documents


TOTAL DOCUMENTS

463
(FIVE YEARS 135)

H-INDEX

25
(FIVE YEARS 6)

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yuanyao Lu ◽  
Qi Xiao ◽  
Haiyang Jiang

In recent years, deep learning has been applied to English lip-reading. Chinese lip-reading, however, started later, lacks relevant datasets, and its recognition accuracy is not yet ideal. Therefore, this paper proposes a new hybrid neural network model to build a Chinese lip-reading system that integrates attention mechanisms into both the CNN and the RNN. Specifically, we add a convolutional block attention module (CBAM) to the ResNet50 network, which enhances its ability to capture the small differences among the mouth patterns of similarly pronounced Chinese words and improves feature extraction during convolution. We also add a temporal attention mechanism to the GRU network, which helps extract features across consecutive lip-motion images. Because the moments before and after a frame affect the current moment in lip reading, we assign more weight to the key frames, making the features more representative. We validate our model through experiments on a self-built dataset. The experiments show that the Chinese lip-reading model with CBAM accurately recognizes the Chinese numbers 0–9 and some frequently used Chinese words. Compared with other lip-reading systems, ours achieves better performance and higher recognition accuracy.
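The abstract includes no code, but the CBAM block it builds on has a standard two-stage form: channel attention followed by spatial attention. Below is a minimal PyTorch sketch of such a block; the module names, the reduction ratio of 16, and the 7x7 spatial kernel follow the original CBAM paper, not this one, and are assumptions here.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by
    spatial attention, applied to a feature map of shape (B, C, H, W)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims via avg/max pooling,
        # pass both through a shared two-layer MLP, then gate channels.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over channel-wise avg/max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # (B, C) from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # (B, C) from max pooling
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        avg_map = x.mean(dim=1, keepdim=True)  # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)  # (B, 1, H, W)
        attn = torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x * attn
```

In the paper's setting, a block like this would presumably sit after the ResNet50 residual stages; the temporal attention added to the GRU plays the analogous role across frames, reweighting each time step's hidden state so key frames contribute more.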


2021 ◽  
Vol 17 (2) ◽  
pp. 65-84
Author(s):  
Mohammad Halili ◽  
Mahbub Arham Arrozy

This study discusses the use of Total Communication (TC, henceforth), the combinations of communication modes it involves, and the reasons for using TC with hearing-impaired (HI) students in the English class at SLB PGRI Kamal. Teaching English to these students is believed to require more strategic approaches than teaching students without hearing issues. The data come from four HI students at SLB PGRI Kamal and their English teacher. Data were collected through observation, note-taking, and recording; during observation and note-taking, the researchers used a phone recorder and phone camera to protect the data from loss and to support further analysis. The results show that seven communication modes were used: lip-reading, sign language, images, writing, the Indonesian Alphabetic Symbol System (IAS), finger spelling, and speech. These modes are combined depending on the needs of the users (both the English teacher and the HI students), and the researchers identified six such combinations of modes in TC. The researchers also found five reasons for using TC with HI students in the English classroom at SLB PGRI Kamal: it is flexible and effective for communication, it gives HI students chances to learn to speak, write, and read in English, and it allows them to produce and identify English sounds.


Author(s):  
Yuki Takashima ◽  
Ryoichi Takashima ◽  
Ryota Tsunoda ◽  
Ryo Aihara ◽  
Tetsuya Takiguchi ◽  
...  

We present an unsupervised domain adaptation (UDA) method for a lip-reading model, i.e., an image-based speech recognition model. Most conventional UDA methods cannot be applied when the adaptation data contain unknown classes, such as out-of-vocabulary words. In this paper, we propose a cross-modal knowledge distillation (KD)-based domain adaptation method in which the intermediate-layer output of an audio-based speech recognition model serves as a teacher for the unlabeled adaptation data. Because the audio signal contains more information for recognizing speech than lip images do, the knowledge of the audio-based model can act as a powerful teacher when the unlabeled adaptation data consist of audio-visual parallel data. In addition, because the proposed intermediate-layer-based KD expresses the teacher as a sub-class (sub-word)-level representation, the method allows us to use data from unknown classes for adaptation. Through experiments on an image-based word recognition task, we demonstrate that the proposed approach not only improves UDA performance but can also exploit unknown-class adaptation data.
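The core of the method is an intermediate-layer distillation loss computed on unlabeled, time-aligned audio-visual parallel data. The following is only a minimal sketch of that idea; the feature shapes, the linear projection, and MSE as the distance are our assumptions, and the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

def kd_adaptation_loss(student_feat: torch.Tensor,
                       teacher_feat: torch.Tensor,
                       proj: nn.Module) -> torch.Tensor:
    """Intermediate-layer KD loss for unsupervised adaptation: the
    lip-reading student's hidden features are regressed onto the frozen
    audio teacher's hidden features, so no class labels are needed."""
    target = teacher_feat.detach()  # frozen teacher: no gradients flow back
    return nn.functional.mse_loss(proj(student_feat), target)

# Hypothetical shapes: (batch, frames, dim) features from both streams,
# frame-aligned because the adaptation data is audio-visual parallel.
student = torch.randn(8, 50, 256)   # visual encoder intermediate output
teacher = torch.randn(8, 50, 512)   # audio encoder intermediate output
proj = nn.Linear(256, 512)          # maps student dim to teacher dim
loss = kd_adaptation_loss(student, teacher, proj)
```

Because the loss needs only aligned teacher and student features, no labels are required, which is what lets unknown-class (out-of-vocabulary) data contribute to the adaptation.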


Author(s):  
Siyuan Jiang ◽  
Hengxin Ruan ◽  
Zhuo Wang ◽  
Hongrui Zhang ◽  
Hanting Zhao ◽  
...  
Keyword(s):  

2021 ◽  
Author(s):  
Yorghos Voutos ◽  
Georgios Drakopoulos ◽  
Georgios Chrysovitsiotis ◽  
Zoi Zachou ◽  
Dimitris Kikidis ◽  
...  
Keyword(s):  

Author(s):  
Xueyi Zhang ◽  
Changchong Sheng ◽  
Li Liu

2021 ◽  
Author(s):  
Zhijie Lin ◽  
Zhou Zhao ◽  
Haoyuan Li ◽  
Jinglin Liu ◽  
Meng Zhang ◽  
...  
Keyword(s):  

2021 ◽  
Vol 10 (5) ◽  
pp. 2557-2565
Author(s):  
Nada Hussain Ali ◽  
Matheel Emad Abdulmunem ◽  
Akbas Ezaldeen Ali

Human communication takes several forms; one of the best known and most widely used is speech. Both visual and acoustic sensory perception are involved, so speech is considered a multi-sensory process. Micro contents are small pieces of information that can be used to boost the learning process. Deep learning is an approach that dives into deep texture layers to learn fine-grained details. The convolutional neural network (CNN) is a deep learning technique that can be employed as a complementary model with micro learning, holding micro contents to achieve a specific process. In this paper, a lip-reading model is presented together with a proposed video dataset. The model receives micro contents (the English alphabet) as video input and recognizes them; the CNN performs two tasks, feature extraction and recognition. The implementation results show efficient recognition accuracy on video datasets containing a variety of lip readers aged 11 to 63 years, with the proposed model reaching a recognition rate of 98%.
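The abstract gives no architecture details, so the following is only a minimal sketch of a frame-level CNN performing the two roles it describes, feature extraction and recognition. The input size, grayscale input, and channel widths are our assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class LipAlphabetCNN(nn.Module):
    """Small CNN classifying a single mouth-region frame into one of the
    26 English letters: convolutional feature extraction followed by a
    linear classification head."""
    def __init__(self, num_classes: int = 26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling to (B, 128, 1, 1)
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, H, W) grayscale mouth crops
        return self.classifier(self.features(x).flatten(1))

# Hypothetical 64x64 grayscale mouth crops; video input would be handled
# frame by frame (or with a temporal model on top of these features).
logits = LipAlphabetCNN()(torch.randn(4, 1, 64, 64))
```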

