Real Time Lip Reading System Implementation in Embedded Environment

Abstract. Visual information plays a key role in automatic speech recognition (ASR) when audio is corrupted by background noise or is unavailable altogether. Speech recognition using visual information is called lip-reading. The idea of visual speech recognition comes from human experience: we can recognize spoken words by observing a speaker's face with limited or no access to the sound of the voice. Based on our experimental evaluations and an analysis of the research field, we propose a novel task-oriented approach to practical lip-reading system implementation. Its main purpose is to serve as a roadmap for researchers who need to build a reliable visual speech recognition system for their task. As a rough approximation, the lip-reading task can be divided into two parts, depending on the complexity of the problem: first, recognition of isolated words, numbers or short phrases (e.g. telephone numbers with a strict grammar, or keywords); second, recognition of continuous speech (phrases or sentences). All these stages are described in detail in this paper. Based on the proposed approach, we implemented from scratch automatic visual speech recognition systems of three different architectures: GMM-CHMM, DNN-HMM and purely end-to-end. The methodology, tools, step-by-step development and all necessary parameters are also given in this paper. It is worth noting that such systems were created for Russian speech recognition for the first time.

Personal computer based real time lip reading system

WCC 2000 - ICSP 2000. 2000 5th International Conference on Signal Processing Proceedings. 16th World Computer Congress 2000 ◽

10.1109/icosp.2000.891794 ◽

2002 ◽

Cited By ~ 1

Author(s):

K. Sugahara ◽

M. Kishino ◽

R. Konishi

Keyword(s):

Real Time ◽

Personal Computer ◽

Lip Reading ◽

Computer Based ◽

Reading System

Real Time Realization of Lip Reading System on the Personal Computer

Transactions of the Society of Instrument and Control Engineers ◽

10.9746/sicetr1965.36.1145 ◽

2000 ◽

Vol 36 (12) ◽

pp. 1145-1151 ◽

Cited By ~ 1

Author(s):

Kazunori SUGAHARA ◽

Toshimi SHINCHI ◽

Makoto KISHINO ◽

Ryosuke KONISHI

Keyword(s):

Real Time ◽

Personal Computer ◽

Lip Reading ◽

Reading System

Real-time lip reading system for fixed phrase and its combination

The First Asian Conference on Pattern Recognition ◽

10.1109/acpr.2011.6166595 ◽

2011 ◽

Cited By ~ 1

Author(s):

Takeshi Saitoh

Keyword(s):

Real Time ◽

Lip Reading ◽

Reading System

Real-time word lip reading system based on trajectory feature

IEEJ Transactions on Electrical and Electronic Engineering ◽

10.1002/tee.20658 ◽

2011 ◽

Vol 6 (3) ◽

pp. 289-291 ◽

Cited By ~ 4

Author(s):

Takeshi Saitoh ◽

Ryosuke Konishi

Keyword(s):

Real Time ◽

Lip Reading ◽

Reading System

Colour and Geometric based Model for Lip Localisation: Application for Lip-reading System

14th International Conference on Image Analysis and Processing (ICIAP 2007) ◽

10.1109/iciap.2007.4362750 ◽

2007 ◽

Cited By ~ 12

Author(s):

Salah Werda ◽

Walid Mahdi ◽

Abdelmajid Ben Hamadou

Keyword(s):

Lip Reading ◽

Reading System

Attendance System Implementation Using Real Time Face Recognition

10.1109/icrito51393.2021.9596399 ◽

2021 ◽

Author(s):

Priyanka Tyagi ◽

Mayank Kaushik ◽

Harshit Kumar Singh ◽

Nikhil Jaiswal

Keyword(s):

Face Recognition ◽

Real Time ◽

System Implementation

LIP-READING VIA DEEP NEURAL NETWORKS USING HYBRID VISUAL FEATURES

Image Analysis & Stereology ◽

10.5566/ias.1859 ◽

2018 ◽

Vol 37 (2) ◽

pp. 159 ◽

Cited By ~ 2

Author(s):

Fatemeh Vakhshiteh ◽

Farshad Almasganj ◽

Ahmad Nickabadi

Keyword(s):

Speech Intelligibility ◽

Deep Neural Networks ◽

Visual Speech ◽

Visual Features ◽

Noisy Environments ◽

Phone Recognition ◽

Facial Information ◽

Visual Speech Recognition ◽

Lip Reading ◽

Reading System

Lip-reading is typically understood as visually interpreting a speaker's lip movements during speech. Experiments over many years have shown that speech intelligibility increases when visual facial information is available. This effect becomes more apparent in noisy environments. Automating this process raises several challenges, such as the coarticulation phenomenon, the type of visual units, the diversity of features and their inter-speaker dependency. While efforts have been made to overcome these challenges, a flawless lip-reading system is still under investigation. This paper searches for a lip-reading model with an efficiently developed incorporation and arrangement of processing blocks to extract highly discriminative visual features. Here, the application of a properly structured Deep Belief Network (DBN)-based recognizer is highlighted. Multi-speaker (MS) and speaker-independent (SI) tasks are performed on the CUAVE database, and phone recognition rates (PRRs) of 77.65% and 73.40% are achieved, respectively. The best word recognition rates (WRRs) achieved in the MS and SI tasks are 80.25% and 76.91%, respectively. The resulting accuracies demonstrate that the proposed method outperforms the conventional Hidden Markov Model (HMM) and competes well with state-of-the-art visual speech recognition works.
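
The abstract reports phone and word recognition rates without stating how they are computed. The following is a minimal sketch of one common convention, assumed here rather than taken from the paper: the rate is 100 × (1 − edit distance / reference length), where the edit distance counts substitutions, deletions and insertions. The function names and the sample digit sequences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def recognition_rate(ref, hyp):
    """Recognition rate in percent (assumed definition, see the note above)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * (1 - edit_distance(ref, hyp) / len(ref))


# Hypothetical digit-sequence example with one substituted word.
reference = ["one", "two", "three", "four"]
hypothesis = ["one", "too", "three", "four"]
rate = recognition_rate(reference, hypothesis)
print(f"{rate:.2f}%")
```

The same function applies to phone sequences if the tokens are phone labels instead of words. Other definitions (for example, ignoring insertions) give different numbers, so the figures above should only be compared under the paper's own scoring.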

Tracking of Aircrafts Using Software Defined Radio (SDR) With An Antenna

International Journal of Scientific Research in Science and Technology ◽

10.32628/ijsrst2183148 ◽

2021 ◽

pp. 660-665

Author(s):

H. Venkatesh Kumar ◽

Surabhi. G ◽

Neha V ◽

Sandesh. Y. M ◽

Sagar Kumar. H. S

Keyword(s):

Real Time ◽

Software Defined Radio ◽

Air Transportation ◽

Transportation System ◽

Air Traffic ◽

Next Generation ◽

System Implementation ◽

Traffic Surveillance ◽

Radar Systems

Automatic Dependent Surveillance-Broadcast (ADS-B) is one of the popular technologies employed in air traffic surveillance. ADS-B operates in the 1090 MHz band and is used alongside existing radar-based technologies to locate aircraft. In the Next Generation Air Transportation System (NGATS), conflicts can be detected and resolved through the coexistence of radar systems and ADS-B. Here we track aircraft using a Software Defined Radio (SDR), which drastically reduces the complexity and cost of implementing an ADS-B system. With an appropriate antenna, the SDR can receive and display information from multiple aircraft, such as altitude, latitude, longitude, speed, and direction, in real time. The use of SDR maximizes data coverage with accuracy and delivers it in a timely manner.
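
The abstract does not describe the decoding software, so the following is only a minimal sketch of how the top-level fields of a received ADS-B frame could be read once an SDR demodulator (not shown) has produced it as a hex string. It assumes the standard 112-bit 1090 MHz extended squitter layout: 5 bits of downlink format, 3 bits of capability, a 24-bit ICAO aircraft address, a 56-bit message whose first 5 bits are the type code, and 24 bits of parity. The function name and the sample frame are illustrative and not taken from the paper.

```python
def parse_adsb_header(hex_frame: str) -> dict:
    """Extract the top-level fields of a 112-bit extended squitter frame."""
    if len(hex_frame) != 28:
        raise ValueError("expected 28 hex characters (112 bits)")
    # Convert to a fixed-width bit string so fields can be sliced by position.
    bits = bin(int(hex_frame, 16))[2:].zfill(112)
    return {
        "downlink_format": int(bits[0:5], 2),    # 17 for ADS-B from a transponder
        "capability": int(bits[5:8], 2),
        "icao_address": hex_frame[2:8].upper(),  # bits 8..31, 24-bit aircraft ID
        "type_code": int(bits[32:37], 2),        # selects the message content type
    }


# Illustrative sample frame (assumption: not from the paper).
frame = "8D4840D6202CC371C32CE0576098"
fields = parse_adsb_header(frame)
print(fields)
```

The type code determines how the rest of the 56-bit message is interpreted, for example as identification, position, or velocity data. Decoding those payloads and checking the parity bits are further steps that this sketch leaves out.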
