An Attention-Enhanced Multi-Scale and Dual Sign Language Recognition Network Based on a Graph Convolution Network

Lu Meng; Ronghui Li

doi:10.3390/s21041120

An Attention-Enhanced Multi-Scale and Dual Sign Language Recognition Network Based on a Graph Convolution Network

Sensors ◽

10.3390/s21041120 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1120

Author(s):

Lu Meng ◽

Ronghui Li

Keyword(s):

Sign Language ◽

Attention Mechanism ◽

Language Recognition ◽

Sign Language Recognition ◽

Accuracy Rate ◽

Long Distance ◽

Convolutional Network ◽

Attention Network ◽

Multi Scale ◽

Human Finger

Sign language is the most important way of communication for hearing-impaired people. Research on sign language recognition can help normal people understand sign language. We reviewed the classic methods of sign language recognition, and the recognition accuracy is not high enough because of redundant information, human finger occlusion, motion blurring, the diversified signing styles of different people, and so on. To overcome these shortcomings, we propose a multi-scale and dual sign language recognition Network (SLR-Net) based on a graph convolutional network (GCN). The original input data was RGB videos. We first extracted the skeleton data from them and then used the skeleton data for sign language recognition. SLR-Net is mainly composed of three sub-modules: multi-scale attention network (MSA), multi-scale spatiotemporal attention network (MSSTA) and attention enhanced temporal convolution network (ATCN). MSA allows the GCN to learn the dependencies between long-distance vertices; MSSTA can directly learn the spatiotemporal features; ATCN allows the GCN network to better learn the long temporal dependencies. The three different attention mechanisms, multi-scale attention mechanism, spatiotemporal attention mechanism, and temporal attention mechanism, are proposed to further improve the robustness and accuracy. Besides, a keyframe extraction algorithm is proposed, which can greatly improve efficiency by sacrificing a little accuracy. Experimental results showed that our method can reach 98.08% accuracy rate in the CSL-500 dataset with a 500-word vocabulary. Even on the challenging dataset DEVISIGN-L with a 2000-word vocabulary, it also reached a 64.57% accuracy rate, outperforming other state-of-the-art sign language recognition methods.

Get full-text (via PubEx)

CRB-Net: A Sign Language Recognition Deep Learning Strategy Based on Multi-modal Fusion with Attention Mechanism *

10.1109/smc52423.2021.9659090 ◽

2021 ◽

Author(s):

Feng Xiao ◽

Cong Shen ◽

Tiantian Yuan ◽

Shengyong Chen

Keyword(s):

Deep Learning ◽

Sign Language ◽

Learning Strategy ◽

Attention Mechanism ◽

Language Recognition ◽

Sign Language Recognition

Get full-text (via PubEx)

Isolated Sign Language Recognition with Multi-scale Features using LSTM

2019 27th Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu.2019.8806467 ◽

2019 ◽

Author(s):

Ozge Mercanoglu Sincan ◽

Anil Osman Tur ◽

Hacer Yalim Keles

Keyword(s):

Sign Language ◽

Language Recognition ◽

Sign Language Recognition ◽

Multi Scale

Get full-text (via PubEx)

Dilated Convolutional Network with Iterative Optimization for Continuous Sign Language Recognition

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/123 ◽

2018 ◽

Cited By ~ 11

Author(s):

Junfu Pu ◽

Wengang Zhou ◽

Houqiang Li

Keyword(s):

Sign Language ◽

Real World ◽

Sequence Learning ◽

Input Sequence ◽

Learning Model ◽

Optimization Strategy ◽

Language Recognition ◽

Sign Language Recognition ◽

Convolutional Network ◽

Iterative Optimization

This paper presents a novel deep neural architecture with iterative optimization strategy for real-world continuous sign language recognition. Generally, a continuous sign language recognition system consists of visual input encoder for feature extraction and a sequence learning model to learn the correspondence between the input sequence and the output sentence-level labels. We use a 3D residual convolutional network (3D-ResNet) to extract visual features. After that, a stacked dilated convolutional network with Connectionist Temporal Classification (CTC) is applied for learning the mapping between the sequential features and the text sentence. The deep network is hard to train since the CTC loss has limited contribution to early CNN parameters. To alleviate this problem, we design an iterative optimization strategy to train our architecture. We generate pseudo-labels for video clips from sequence learning model with CTC, and fine-tune the 3D-ResNet with the supervision of pseudo-labels for a better feature representation. We alternately optimize feature extractor and sequence learning model with iterative steps. Experimental results on RWTH-PHOENIX-Weather, a large real-world continuous sign language recognition benchmark, demonstrate the advantages and effectiveness of our proposed method.

Get full-text (via PubEx)

Isolated Sign Language Recognition with Multi-Scale Spatial-Temporal Graph Convolutional Networks

10.1109/cvprw53098.2021.00385 ◽

2021 ◽

Author(s):

Manuel Vazquez-Enriquez ◽

Jose L. Alba-Castro ◽

Laura Docio-Fernandez ◽

Eduardo Rodriguez-Banga

Keyword(s):

Sign Language ◽

Language Recognition ◽

Sign Language Recognition ◽

Convolutional Networks ◽

Multi Scale ◽

Temporal Graph

Get full-text (via PubEx)

American Sign Language Recognition Based on MobileNetV2

Advances in Science Technology and Engineering Systems Journal ◽

10.25046/aj050657 ◽

2020 ◽

Vol 5 (6) ◽

pp. 481-488

Author(s):

Kin Yun Lum ◽

Yeh Huann Goh ◽

Yi Bin Lee

Keyword(s):

American Sign Language ◽

Sign Language ◽

American Sign ◽

Language Recognition ◽

Sign Language Recognition

Get full-text (via PubEx)

INDIAN SIGN LANGUAGE RECOGNITION SYSTEM USING OPENPOSE

i-manager s Journal on Computer Science ◽

10.26634/jcom.7.2.15993 ◽

2019 ◽

Vol 7 (2) ◽

pp. 43

Author(s):

MALHOTRA POOJA ◽

K. MANIAR CHIRAG ◽

V. SANKPAL NIKHIL ◽

R. THAKKAR HARDIK ◽

◽

...

Keyword(s):

Sign Language ◽

Recognition System ◽

Language Recognition ◽

Sign Language Recognition ◽

Indian Sign Language

Get full-text (via PubEx)

STATIC DEVNAGARI SIGN LANGUAGE RECOGNITION

i-manager’s Journal on Pattern Recognition ◽

10.26634/jpr.3.3.12406 ◽

2016 ◽

Vol 3 (3) ◽

pp. 13

Author(s):

VERMA VERSHA ◽

PATIL SANDEEP B. ◽

◽

Keyword(s):

Sign Language ◽

Language Recognition ◽

Sign Language Recognition

Get full-text (via PubEx)

Sign Language Recognition Techniques - A Survey

International Journal of Psychosocial Rehabilitation ◽

10.37200/ijpr/v24i5/pr201978 ◽

2020 ◽

Vol 24 (5) ◽

pp. 2747-2760

Author(s):

Mahidar P.R.

Keyword(s):

Sign Language ◽

Language Recognition ◽

Sign Language Recognition

Get full-text (via PubEx)

Technological Aids for Deaf and Mute in Modern World

Recent Patents on Engineering ◽

10.2174/1872212114999201116214802 ◽

2020 ◽

Vol 14 ◽

Author(s):

Vasu Mehra ◽

Dhiraj Pandey ◽

Aayush Rastogi ◽

Aditya Singh ◽

Harsh Preet Singh

Keyword(s):

Sign Language ◽

Gesture Recognition ◽

Recognition System ◽

Modern World ◽

Language Recognition ◽

Sign Language Recognition ◽

Background Elimination ◽

Sign Recognition ◽

Technological Advances ◽

Better Than

Background:: People suffering from hearing and speaking disabilities have a few ways of communicating with other people. One of these is to communicate through the use of sign language. Objective:: Developing a system for sign language recognition becomes essential for deaf as well as a mute person. The recognition system acts as a translator between a disabled and an able person. This eliminates the hindrances in exchange of ideas. Most of the existing systems are very poorly designed with limited support for the needs of their day to day facilities. Methods:: The proposed system embedded with gesture recognition capability has been introduced here which extracts signs from a video sequence and displays them on screen. On the other hand, a speech to text as well as text to speech system is also introduced to further facilitate the grieved people. To get the best out of human computer relationship, the proposed solution consists of various cutting-edge technologies and Machine Learning based sign recognition models which have been trained by using Tensor Flow and Keras library. Result:: The proposed architecture works better than several gesture recognition techniques like background elimination and conversion to HSV because of sharply defined image provided to the model for classification. The results of testing indicate reliable recognition systems with high accuracy that includes most of the essential and necessary features for any deaf and dumb person in his/her day to day tasks. Conclusion:: It’s the need of current technological advances to develop reliable solutions which can be deployed to assist deaf and dumb people to adjust to normal life. Instead of focusing on a standalone technology, a plethora of them have been introduced in this proposed work. Proposed Sign Recognition System is based on feature extraction and classification. The trained model helps in identification of different gestures.

Get full-text (via PubEx)

Indian Sign Language Recognition on PYNQ Board

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200909110140 ◽

2020 ◽

Vol 13 ◽

Author(s):

Sukhendra Singh ◽

G. N. Rathna ◽

Vivek Singhal

Keyword(s):

Sign Language ◽

Hand Gesture ◽

Language Recognition ◽

Sign Language Recognition ◽

Hand Gestures ◽

Depth Images ◽

Kinect Camera ◽

Impaired People ◽

Web Camera ◽

Indian Sign Language

Introduction: Sign language is the only way to communicate for speech-impaired people. But this sign language is not known to normal people so this is the cause of barrier in communicating. This is the problem faced by speech impaired people. In this paper, we have presented our solution which captured hand gestures with Kinect camera and classified the hand gesture into its correct symbol. Method: We used Kinect camera not the ordinary web camera because the ordinary camera does not capture its 3d orientation or depth of an image from camera however Kinect camera can capture 3d image and this will make classification more accurate. Result: Kinect camera will produce a different image for hand gestures for ‘2’ and ‘V’ and similarly for ‘1’ and ‘I’ however, normal web camera will not be able to distinguish between these two. We used hand gesture for Indian sign language and our dataset had 46339, RGB images and 46339 depth images. 80% of the total images were used for training and the remaining 20% for testing. In total 36 hand gestures were considered to capture alphabets and alphabets from A-Z and 10 for numeric, 26 for digits from 0-9 were considered to capture alphabets and Keywords. Conclusion: Along with real-time implementation, we have also shown the comparison of the performance of the various machine learning models in which we have found out the accuracy of CNN on depth- images has given the most accurate performance than other models. All these resulted were obtained on PYNQ Z2 board.

Get full-text (via PubEx)