Bridging the Communication Gap: With Real Time Sign Language Translation

2021 ◽

Vol 5 (2) ◽

pp. 1-30

Author(s):

HyeonJung Park ◽

Youngki Lee ◽

JeongGil Ko

Keyword(s):

Real Time ◽

Sign Language ◽

Data Augmentation ◽

Language Translation ◽

Mobile Platforms ◽

Depth Cameras ◽

Language Data ◽

In The Wild ◽

Environmental Robustness ◽

Cloud Servers

In this work we present SUGO, a depth video-based system for translating sign language to text using a smartphone's front camera. While exploiting depth-only videos offers benefits, such as being less privacy-invasive than RGB videos, it introduces new challenges, including low video resolution and the sensors' sensitivity to user motion. We overcome these challenges by diversifying our sign language video dataset via data augmentation, so that the system is robust to various usage scenarios, and by designing a set of schemes that emphasize human gestures in the input images for effective sign detection. The inference engine of SUGO is based on a 3-dimensional convolutional neural network (3DCNN) that classifies a sequence of video frames as a pre-trained word. Furthermore, the overall operations are designed to be lightweight so that sign language translation takes place in real time using only the resources available on a smartphone, with no help from cloud servers or external sensing components. To train and test SUGO, we collect sign language data from 20 individuals for 50 Korean Sign Language words, yielding a dataset of ~5,000 sign gestures, and collect additional in-the-wild data to evaluate the performance of SUGO in real-world usage scenarios with different lighting conditions and daily activities. Our extensive evaluations show that SUGO can classify sign words with an accuracy of up to 91% and suggest that the system is suitable (in terms of resource usage, latency, and environmental robustness) to enable a fully mobile solution for sign language translation.
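
The abstract mentions schemes that emphasize gestures in depth frames and a classifier that consumes a sequence of frames, but it does not say how either is done. The following is only a minimal sketch of one plausible preprocessing step, not SUGO's actual pipeline. It assumes that depth frames are nested lists of distances in metres, that a reading of 0 means "no data", and that the signer is closer to the camera than the background. The names `emphasize_foreground`, `sample_clip`, and `max_depth_m`, and the 1.0 m default, are hypothetical.

```python
from typing import List

# Assumption: one depth frame is a list of rows, each a list of distances in metres.
Frame = List[List[float]]


def emphasize_foreground(frame: Frame, max_depth_m: float = 1.0) -> Frame:
    """Keep pixels closer than max_depth_m and zero out everything else.

    A value of 0.0 is treated as "no reading" (an assumed sensor convention),
    so background pixels and invalid pixels end up looking the same.
    """
    return [
        [d if 0.0 < d <= max_depth_m else 0.0 for d in row]
        for row in frame
    ]


def sample_clip(frames: List[Frame], clip_len: int) -> List[Frame]:
    """Pick clip_len frames at evenly spaced positions.

    A fixed clip length is what a 3D CNN input typically requires; frames are
    repeated when the recording is shorter than clip_len.
    """
    if not frames or clip_len < 1:
        raise ValueError("need at least one frame and clip_len >= 1")
    if clip_len == 1:
        return [frames[0]]
    last = len(frames) - 1
    # Integer arithmetic keeps the index choice deterministic.
    indices = [(i * last) // (clip_len - 1) for i in range(clip_len)]
    return [frames[i] for i in indices]


if __name__ == "__main__":
    recording = [[[0.4, 2.0], [0.0, 0.9]] for _ in range(10)]
    clip = [emphasize_foreground(f) for f in sample_clip(recording, 4)]
    print(len(clip), clip[0])
```

Thresholding on distance is a common way to separate a nearby subject from the background in depth data; whether SUGO uses it is not stated in the abstract.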

Achieving Real-Time Sign Language Translation Using a Smartphone's True Depth Images

2020 International Conference on COMmunication Systems & NETworkS (COMSNETS) ◽

10.1109/comsnets48256.2020.9027420 ◽

2020 ◽

Author(s):

HyeonJung Park ◽

Jong-Seok Lee ◽

JeongGil Ko

Keyword(s):

Real Time ◽

Sign Language ◽

Language Translation ◽

Depth Images

A real-time portable sign language translation system

2015 IEEE 58th International Midwest Symposium on Circuits and Systems (MWSCAS) ◽

10.1109/mwscas.2015.7282137 ◽

2015 ◽

Cited By ~ 14

Author(s):

Lih-Jen Kau ◽

Wan-Lin Su ◽

Pei-Ju Yu ◽

Sin-Jhan Wei

Keyword(s):

Real Time ◽

Sign Language ◽

Language Translation ◽

Translation System

Indian Sign Language Recognition through Hybrid ConvNet-LSTM Networks

EMITTER International Journal of Engineering Technology ◽

10.24003/emitter.v9i1.613 ◽

2021 ◽

Vol 9 (1) ◽

pp. 182-203

Author(s):

Muthu Mariappan H ◽

Dr Gomathi V

Keyword(s):

Neural Network ◽

Computer Vision ◽

Real Time ◽

Sign Language ◽

Gesture Recognition ◽

Language Translation ◽

Video Gaming ◽

Language Recognition ◽

Sign Language Recognition ◽

Indian Sign Language

Dynamic hand gesture recognition is a challenging task in Human-Computer Interaction (HCI) and Computer Vision. Potential application areas of gesture recognition include sign language translation, video gaming, video surveillance, robotics, and gesture-controlled home appliances. In the proposed research, gesture recognition is applied to recognize sign language words from real-time videos. Classifying actions from video sequences requires both spatial and temporal features. The proposed system extracts the former with a Convolutional Neural Network (CNN), which is the core of several computer vision solutions, and the latter with a Recurrent Neural Network (RNN), which is more efficient at handling sequences of movements. Thus, a real-time Indian Sign Language (ISL) recognition system is developed using the hybrid CNN-RNN architecture. The system is trained on the proposed CasTalk-ISL dataset. The ultimate purpose of the presented research is to deploy a real-time sign language translator that removes the communication barriers between hearing-impaired and hearing people. The developed system achieves 95.99% top-1 accuracy and 99.46% top-3 accuracy on the test dataset. The obtained results outperform existing approaches that use various deep models on different datasets.
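
The abstract reports top-1 and top-3 accuracy. As a minimal sketch of how such figures are usually computed (not the authors' evaluation code), top-k accuracy counts a sample as correct when its true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores below are illustrative assumptions.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of samples whose true label is among the k best-scoring classes.

    scores[i][c] is the classifier's score for class c on sample i;
    labels[i] is the index of the true class for sample i.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Toy example: 3 samples, 3 classes (values are made up).
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(top_k_accuracy(scores, labels, k=1))
    print(top_k_accuracy(scores, labels, k=3))
```

Top-3 accuracy can never be lower than top-1 accuracy on the same predictions, which is consistent with the reported 99.46% and 95.99%.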
