Deep Lip Reading: A Deep Learning Based Lip-Reading Software for the Hearing Impaired

Author(s):  
Mohammed Abid Abrar ◽  
A. N. M. Nafiul Islam ◽  
Mohammad Muntasir Hassan ◽  
Mohammad Tariqul Islam ◽  
Celia Shahnaz ◽  
...  
2018 ◽  
Vol 78 ◽  
pp. 53-72 ◽  
Author(s):  
Adriana Fernandez-Lopez ◽  
Federico M. Sukno
2020 ◽  
Vol 34 (04) ◽  
pp. 6917-6924 ◽  
Author(s):  
Ya Zhao ◽  
Rui Xu ◽  
Xinchao Wang ◽  
Peng Hou ◽  
Haihong Tang ◽  
...  

Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading unfortunately remains inferior to that of its counterpart, speech recognition, due to the ambiguous nature of lip actuations, which makes it challenging to extract discriminant features from lip-movement videos. In this paper, we propose a new method, termed Lip by Speech (LIBS), whose goal is to strengthen lip reading by learning from speech recognizers. The rationale behind our approach is that the features extracted from speech recognizers may provide complementary and discriminant clues, which are difficult to obtain from the subtle movements of the lips alone, and consequently facilitate the training of lip readers. This is achieved, specifically, by distilling multi-granularity knowledge from speech recognizers to lip readers. To conduct this cross-modal knowledge distillation, we use an effective alignment scheme to handle the inconsistent lengths of the audios and videos, as well as an innovative filtering strategy to refine the speech recognizer's predictions. The proposed method achieves new state-of-the-art performance on the CMLR and LRS2 datasets, outperforming the baseline by margins of 7.66% and 2.75% in character error rate, respectively.
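The core idea, cross-modal knowledge distillation with a length-alignment step, can be illustrated with a short sketch. The snippet below is a minimal illustration in PyTorch, not the authors' implementation: the linear interpolation over time and the plain MSE loss are simplifying assumptions standing in for LIBS's alignment and filtering scheme, and all names are hypothetical.

```python
# Minimal sketch of cross-modal knowledge distillation for lip reading.
# Assumes PyTorch; module and tensor names are hypothetical, and the
# time alignment (linear interpolation) is a simplification of the
# paper's alignment and filtering scheme.
import torch
import torch.nn.functional as F

def distillation_loss(video_feats, audio_feats):
    """video_feats: (B, Tv, D) from the lip-reading student.
    audio_feats: (B, Ta, D) from a frozen speech-recognizer teacher."""
    # Resample teacher features along time so both sequences have Tv steps.
    aligned = F.interpolate(
        audio_feats.transpose(1, 2),            # (B, D, Ta)
        size=video_feats.size(1), mode="linear", align_corners=False
    ).transpose(1, 2)                            # (B, Tv, D)
    # Encourage the student's features to match the (detached) teacher's.
    return F.mse_loss(video_feats, aligned.detach())

# total_loss = decoder_loss + lambda_kd * distillation_loss(video_f, audio_f)
```

In practice the distillation term is simply added to the lip reader's usual decoding loss with a weighting factor, so the student still learns from text labels while being guided by the teacher's features.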


QJM ◽  
2020 ◽  
Vol 113 (Supplement_1) ◽  
Author(s):  
A M Saad ◽  
M A Hegazi ◽  
M S Khodeir

Abstract Background Lip-reading is considered an important skill which varies considerably among normal-hearing and hearing-impaired (HI) children. It helps HI children to perceive speech, acquire spoken language and acquire phonological awareness. Speech perception is considered to be a multisensory process that involves attention to auditory signals as well as visual articulatory movements. Integration of auditory and visual signals occurs naturally and automatically in normal individuals across all ages. Much research has suggested that normal-hearing children use audition as the primary sensory modality for speech perception, whereas HI children use lip-reading cues as the primary sensory modality for speech perception. Aim of the Work The aim of this study is to compare lip-reading ability between normal-hearing and HI children. Participants and methods This is a comparative descriptive case-control study. It was applied to 60 hearing-impaired children (cases) and 60 normal-hearing children (controls) of the same age and gender. The age range was 3-8 years. The Egyptian Arabic Lip-reading Test (EALRT) was applied to all children. Results There was a statistically significant difference between the total mean scores of the EALRT for normal-hearing and HI children. Conclusion The results of the study showed that normal-hearing children are better lip-readers than HI children of the matched age range.


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6256
Author(s):  
Boon Giin Lee ◽  
Teak-Wei Chong ◽  
Wan-Young Chung

Sign language was designed to allow hearing-impaired people to interact with others. Nonetheless, knowledge of sign language is uncommon in society, which leads to a communication barrier with the hearing-impaired community. Many studies of sign language recognition utilizing computer vision (CV) have been conducted worldwide to reduce such barriers. However, this approach is restricted by the visual angle and is highly affected by environmental factors. In addition, CV usually involves the use of machine learning, which requires the collaboration of a team of experts and high-cost hardware; this increases the application cost in real-world situations. Thus, this study aims to design and implement a smart wearable American Sign Language (ASL) interpretation system using deep learning, which applies sensor fusion to six inertial measurement units (IMUs). The IMUs are attached to all fingertips and the back of the hand to recognize sign language gestures; thus, the proposed method is not restricted by the field of view. The study reveals that this model achieves an average recognition rate of 99.81% for dynamic ASL gestures. Moreover, the proposed ASL recognition system can be further integrated with ICT and IoT technology to provide a feasible solution to assist hearing-impaired people in communicating with others and improve their quality of life.
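A recurrent network over the concatenated IMU channels is one straightforward way to realize such a gesture classifier. The sketch below is a minimal illustration in PyTorch, not the authors' architecture: the channel layout (six IMUs with a 3-axis accelerometer and gyroscope each), hidden sizes, and class count are assumptions.

```python
# Illustrative sketch of a wearable-IMU sign-gesture classifier.
# Assumes PyTorch; layer sizes, channel layout, and class count are
# assumptions, not the published system's exact architecture.
import torch
import torch.nn as nn

class IMUSignClassifier(nn.Module):
    def __init__(self, num_classes, channels=36, hidden=128):
        # 6 IMUs x (3-axis accel + 3-axis gyro) = 36 channels per frame.
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):             # x: (batch, time, 36)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # classify from the last time step

model = IMUSignClassifier(num_classes=27)
logits = model(torch.randn(8, 100, 36))   # 8 sequences, 100 frames each
```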


Author(s):  
D. Ivanko ◽  
D. Ryumin ◽  
A. Karpov

Abstract. The inability to use speech interfaces greatly limits deaf and hearing-impaired people in human-machine interaction. To solve this problem and to increase the accuracy and reliability of an automatic Russian sign language recognition system, it is proposed to use lip-reading in addition to hand gesture recognition. Deaf and hearing-impaired people use sign language as the main way of communication in everyday life. Sign language is a structured form of hand gestures and lip movements involving visual motions and signs, which is used as a communication system. Since sign language includes not only hand gestures but also lip movements that mimic vocalized pronunciation, it is of interest to investigate how accurately such visual speech can be recognized by a lip-reading system, especially considering that the visual speech of hearing-impaired people is often characterized by hyper-articulation, which should potentially facilitate its recognition. For this purpose, the thesaurus of Russian sign language (TheRusLan), collected at SPIIRAS in 2018–19, was used. The database consists of color optical FullHD video recordings of 13 native Russian sign language signers (11 females and 2 males) from the "Pavlovsk boarding school for the hearing impaired". Each signer demonstrated 164 phrases five times. This work covers the initial stages of this research, including data collection, data labeling, region-of-interest detection, and methods for informative feature extraction. The results of this study can later be used to create assistive technologies for deaf or hearing-impaired people.
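For the region-of-interest stage, a common baseline is to detect the face in each frame and crop the mouth area from the lower part of the face box. The snippet below is a rough sketch assuming OpenCV's bundled Haar cascade face detector; the crop ratios are heuristic assumptions and not necessarily the detector used in the TheRusLan pipeline.

```python
# Rough sketch of mouth region-of-interest extraction from a video frame.
# Assumes OpenCV's bundled Haar cascade; the crop ratios are heuristics.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mouth_roi(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    # Heuristic: take the lower third of the face box as the lip region.
    return frame[y + 2 * h // 3 : y + h, x : x + w]
```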


Author(s):  
Kartik Datar ◽  
Meet N. Gandhi ◽  
Priyanshu Aggarwal ◽  
Mayank Sohani

In the world of development and advancement, deep learning has made a significant impact on tasks in a way that seemed impossible a few years ago. Deep learning has been able to solve problems that are too complex for classical machine learning algorithms. The task of lip reading, converting lip movements to text, has been tackled by various methods; one of the most successful is LipNet, which provides end-to-end conversion from lip movements to text. End-to-end conversion of lip movements to words is possible because of the availability of large datasets and the development of deep learning methods such as Convolutional Neural Networks and Recurrent Neural Networks. The use of deep learning in lip reading is a recent development and addresses emerging real-world challenges such as virtual reality systems, assisted driving systems, sign language recognition, movement recognition, and improving hearing aids (e.g., via Google Lens). Various other approaches, along with different datasets, are explained in the paper.
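The CNN-plus-RNN pipeline the abstract refers to can be sketched compactly. The following is an illustrative PyTorch model, not LipNet itself: a small spatiotemporal convolution front-end feeds a bidirectional GRU whose per-frame logits would be trained with a CTC loss; filter counts and vocabulary size are assumptions.

```python
# Sketch of a LipNet-style end-to-end lip-reading model (spatiotemporal
# convolutions followed by a recurrent decoder trained with CTC).
# Assumes PyTorch; filter counts and vocabulary size are assumptions.
import torch
import torch.nn as nn

class LipReader(nn.Module):
    def __init__(self, vocab_size=28):   # e.g. 26 letters + space + CTC blank
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),      # pool only spatially, keep time steps
        )
        self.gru = nn.GRU(input_size=32, hidden_size=256,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(512, vocab_size)

    def forward(self, x):                 # x: (batch, 3, T, H, W)
        f = self.conv(x)                  # (batch, 32, T, H', W')
        f = f.mean(dim=(3, 4))            # average over space -> (batch, 32, T)
        f = f.transpose(1, 2)             # (batch, T, 32)
        out, _ = self.gru(f)
        return self.fc(out)               # per-frame logits for a CTC loss

logits = LipReader()(torch.randn(2, 3, 75, 64, 128))  # two 75-frame clips
```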


2021 ◽  
Vol 17 (2) ◽  
pp. 65-84
Author(s):  
Mohammad Halili ◽  
Mahbub Arham Arrozy

This study aims to discuss the use of Total Communication (TC, henceforth), the combinations of communication modes, and the reasons for using TC with hearing-impaired (HI) students in an English class at SLB PGRI Kamal. Teaching English to these students is believed to require more strategic approaches, especially compared to students who do not have any hearing issues. The data are from four HI students at SLB PGRI Kamal and their English teacher. Observation, note-taking, and recording were used as the data-collection instruments; during observation and note-taking, the researchers used a phone recorder and phone camera to keep the data from being lost and to support further analysis. The results show that seven communication modes were used: lip-reading, sign language, images, writing, the Indonesian Alphabetic Symbol System (IAS), finger spelling, and speech. These modes are combined depending on the needs of the users (both the English teacher and the HI students); accordingly, the researchers found six combinations of modes in TC. Moreover, the researchers found five reasons for using TC with HI students in the English classroom at SLB PGRI Kamal, among them its flexibility and effectiveness in communication. Furthermore, TC gives HI students the chance to learn to speak, write, and read in English, and allows them to produce and identify English sounds.

