Basic investigation of sign language motion classification by feature extraction using pre-trained network models

Author(s):  
Kaito Kawaguchi ◽  
Hiromitsu Nishimura ◽  
Zhizhong Wang ◽  
Hiroshi Tanaka ◽  
Eiji Ohta
Author(s):  
Mohammad H. Ismail ◽  
Shefa A. Dawwd ◽  
Fakhradeen H. Ali

An Arabic sign language recognition system using two concatenated deep convolutional neural network models, DenseNet121 & VGG16, is presented. The pre-trained models are fed with images, and the system then automatically recognizes the Arabic sign language. To evaluate the performance of the two concatenated models on Arabic sign language recognition, red-green-blue (RGB) images of various static signs were collected into a dataset. The dataset comprises 220,000 images for 44 categories: 32 letters, 11 numbers (0 to 10), and 1 for none. For each static sign, 5000 images were collected from different volunteers. The pre-trained models were modified and then trained on the prepared Arabic sign language data. In addition, two of the pre-trained models were adopted as parallel deep feature extractors, whose outputs are concatenated and passed to the classification stage. The results compare the performance of single models and multi-models; in most cases, the multi-models outperform the single models in feature extraction and classification. Judged by the total number of incorrectly recognized sign images across the training, validation, and testing datasets, the best convolutional neural network (CNN) model for Arabic sign language feature extraction and classification is DenseNet121 among the single models and DenseNet121 & VGG16 among the multi-models.
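As a rough illustration of the concatenated two-model design described above, the following PyTorch sketch runs DenseNet121 and VGG16 in parallel as deep feature extractors and concatenates their pooled features before a classification head for the 44 categories. The pooling, head size, and weight choices are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class ConcatSignClassifier(nn.Module):
    """Hypothetical sketch: DenseNet121 and VGG16 run in parallel on the
    same RGB input; their pooled features are concatenated and classified."""
    def __init__(self, num_classes=44):
        super().__init__()
        self.densenet_features = models.densenet121(
            weights="IMAGENET1K_V1").features   # -> (N, 1024, 7, 7)
        self.vgg_features = models.vgg16(
            weights="IMAGENET1K_V1").features   # -> (N, 512, 7, 7)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(1024 + 512, num_classes)

    def forward(self, x):                       # x: (N, 3, 224, 224)
        f1 = self.pool(torch.relu(self.densenet_features(x))).flatten(1)
        f2 = self.pool(self.vgg_features(x)).flatten(1)
        return self.classifier(torch.cat([f1, f2], dim=1))
```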


2020 ◽  
Vol 20 (5) ◽  
pp. 60-67
Author(s):  
Dilara Gumusbas ◽  
Tulay Yildirim

Offline signature is one of the most frequently used biometric traits in daily life, and yet skilled forgeries pose a great challenge for offline signature verification. To differentiate forgeries, a variety of research has been conducted on hand-crafted feature extraction methods. However, these methods have recently been set aside in favor of automatic feature extraction methods such as Convolutional Neural Networks (CNN). Although these CNN-based algorithms often achieve satisfying results, they require either many training samples or pre-trained network weights. Recently, Capsule Network has been proposed to model with less data by exploiting convolutional layers for automatic feature extraction. Moreover, feature representations are obtained as vectors instead of the scalar activation values of CNN, in order to keep orientation information. Since signature samples per user are limited and feature orientations in signature samples are highly informative, this paper first aims to evaluate the capability of Capsule Network for signature identification tasks on three benchmark databases. Capsule Network achieves 97 and 96%, 94 and 89%, and 95 and 91% accuracy on the CEDAR, GPDS-100 and MCYT databases for 64×64 and 32×32 resolutions (both lower than the usual input size), respectively. The second aim of the paper is to assess how the capability of Capsule Network generalizes to the verification task. Capsule Network achieves average accuracies of 91, 86, and 89% on the CEDAR, GPDS-100 and MCYT databases for the 64×64 resolution, respectively. Through this evaluation, the capability of Capsule Network is shown for offline signature verification and identification tasks.
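The property highlighted above, vector-valued features whose direction carries orientation information, comes from the capsule "squash" nonlinearity of the standard Capsule Network formulation (Sabour et al., 2017). A minimal PyTorch sketch of that general mechanism, not the authors' exact model, is:

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    # Shrinks each capsule vector's length into [0, 1) while preserving
    # its direction: the length encodes presence probability, the
    # direction encodes pose/orientation information.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

# Example: a batch of 32 samples, each with 10 capsules of dimension 16.
capsules = squash(torch.randn(32, 10, 16))
```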


Symmetry ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 1193
Author(s):  
Shaochen Jiang ◽  
Liejun Wang ◽  
Shuli Cheng ◽  
Anyu Du ◽  
Yongming Li

Existing learning-based unsupervised hashing methods usually use a pre-trained network to extract features and then construct a similarity matrix from the extracted feature vectors, which guides the generation of hash codes through gradient descent. Existing research shows that gradient-descent-based algorithms cause the hash codes of paired images to be updated toward each other's position during training. In unsupervised training, this causes large fluctuations in the hash codes and limits their learning efficiency. In this paper, we propose a method named Deep Unsupervised Hashing with Gradient Attention (UHGA) to solve this problem. UHGA mainly comprises the following steps: (1) use pre-trained network models to extract image features; (2) compute the cosine distance between the features of each pair of images, and construct a similarity matrix from these distances to guide the generation of hash codes; (3) add a gradient attention mechanism during hash code training to focus on the gradients. Experiments on two existing public datasets show that our proposed method obtains more discriminative hash codes.
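A minimal PyTorch sketch of steps (1) and (2), assuming features already extracted by a pre-trained backbone; the tanh relaxation and the exact loss form are illustrative assumptions, and the gradient attention mechanism itself is omitted:

```python
import torch
import torch.nn.functional as F

def cosine_similarity_matrix(features):
    # Pairwise cosine similarities between extracted image features;
    # this N x N matrix is the supervisory signal for the hash codes.
    f = F.normalize(features, dim=1)
    return f @ f.t()

def hashing_loss(hash_logits, sim_matrix):
    # Hypothetical objective: inner products of the relaxed (tanh)
    # binary codes should match the feature similarities; both sides
    # live in [-1, 1] after rescaling by the code length.
    b = torch.tanh(hash_logits)
    code_sim = (b @ b.t()) / b.shape[1]
    return F.mse_loss(code_sim, sim_matrix)
```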


2019 ◽  
Vol 9 (13) ◽  
pp. 2683 ◽  
Author(s):  
Sang-Ki Ko ◽  
Chang Jo Kim ◽  
Hyedong Jung ◽  
Choongsang Cho

We propose a sign language translation system based on human keypoint estimation. It is well known that many problems in the field of computer vision require a massive dataset to train deep neural network models. The situation is even worse for the sign language translation problem, as high-quality training data is far more difficult to collect. In this paper, we introduce the KETI (Korea Electronics Technology Institute) sign language dataset, which consists of 14,672 videos of high resolution and quality. Considering that each country has a different and unique sign language, the KETI sign language dataset can be the starting point for further research on Korean sign language translation. Using this dataset, we develop a neural network model for translating sign videos into natural language sentences by utilizing the human keypoints extracted from the face, hands, and body. The resulting human keypoint vector is normalized by the mean and standard deviation of the keypoints and used as input to our translation model, which is based on the sequence-to-sequence architecture. We show that our approach is robust even when the size of the training data is not sufficient: our translation model achieved 93.28% translation accuracy on the validation set and 55.28% on the test set for 105 sentences that can be used in emergency situations. We also compared several variants of our neural sign translation model, based on different attention mechanisms, in terms of classical metrics for measuring translation performance.
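The normalization step described above is simple to state precisely. A NumPy sketch, assuming the per-frame keypoints are already concatenated into one flat vector (the exact handling of x/y channels is an assumption):

```python
import numpy as np

def normalize_keypoints(keypoints):
    """Center the concatenated face/hand/body keypoint vector by its
    mean and scale by its standard deviation, as the abstract describes,
    before feeding it to the sequence-to-sequence translation model."""
    kp = np.asarray(keypoints, dtype=np.float32)
    return (kp - kp.mean()) / (kp.std() + 1e-8)
```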


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1085
Author(s):  
Kaifeng Zhang ◽  
Dan Li ◽  
Jiayun Huang ◽  
Yifei Chen

The detection of pig behavior helps identify abnormal conditions such as diseases and dangerous movements in a timely and effective manner, which plays an important role in ensuring the health and well-being of pigs. Monitoring pig behavior manually is time consuming, subjective, and impractical, so there is an urgent need for methods that identify pig behavior automatically. In recent years, deep learning has gradually been applied to the study of pig behavior recognition. Existing studies judge pig behavior based only on the posture of the pig in a still image frame, without considering the motion information of the behavior, which optical flow can capture well. This study therefore took image frames and optical flow from videos as two-stream inputs to fully extract the temporal and spatial characteristics of the behavior. Two-stream convolutional network models based on deep learning were proposed, including the inflated 3D ConvNet (I3D) and temporal segment networks (TSN), whose feature extraction network is a Residual Network (ResNet) or an Inception architecture (e.g., Inception with Batch Normalization (BN-Inception), InceptionV3, InceptionV4, or InceptionResNetV2), to achieve pig behavior recognition. A standard pig video behavior dataset was created, comprising 1000 videos of five different pig behaviors (feeding, lying, walking, scratching and mounting) recorded under natural conditions. The dataset was used to train and test the proposed models, and a series of comparative experiments was conducted. The experimental results showed that the TSN model with ResNet101 as its feature extraction network recognized pig feeding, lying, walking, scratching, and mounting behaviors with the highest average accuracy of 98.99%, at an average recognition time of 0.3163 s per video. The TSN model (ResNet101) is thus superior to the other models for the task of pig behavior recognition.
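For the two-stream input described above, one common way to prepare the temporal stream is dense optical flow between consecutive frames. A hedged OpenCV sketch, using Farneback's method as one plausible choice (the paper's exact flow algorithm is not specified here):

```python
import cv2

def frame_and_flow_pairs(video_path):
    """Yield (RGB frame, dense optical flow) pairs from a video: the
    frame feeds the spatial stream, the flow feeds the temporal stream."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        cap.release()
        return
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Farneback dense flow: (H, W, 2) displacement field per pixel.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        yield frame, flow
        prev_gray = gray
    cap.release()
```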


2019 ◽  
Vol 14 (2) ◽  
pp. 158-164 ◽  
Author(s):  
G. Emayavaramban ◽  
A. Amudha ◽  
T. Rajendran ◽  
M. Sivaramkumar ◽  
K. Balachandar ◽  
...  

Background: Identifying user suitability plays a vital role in modalities like neuromuscular system research, rehabilitation engineering and movement biomechanics. This paper analyses user suitability based on neural networks (NN), subjects, age groups and gender for a surface electromyogram (sEMG) pattern recognition system that controls a myoelectric hand. Six parametric feature extraction algorithms are used to extract features from the sEMG signals: AR (Autoregressive) Burg, AR Yule-Walker, AR Covariance, AR Modified Covariance, Levinson-Durbin Recursion and Linear Prediction Coefficients. The sEMG signals are modeled using a Cascade Forward Backpropagation Neural Network (CFBNN) and a Pattern Recognition Neural Network.

Methods: sEMG signals generated from the forearm muscles of the participants are collected through an sEMG acquisition system. Based on these signals, the type of movement attempted by the user is identified in the sEMG recognition module using signal processing, feature extraction and machine learning techniques. The information about the identified movement is passed to a microcontroller, where a control command directs the prosthetic hand to emulate the identified movement.

Results: Of the six feature extraction algorithms and two neural network models used in the study, the maximum classification accuracy of 95.13% was obtained using AR Burg with the Pattern Recognition Neural Network. This suggests that the Pattern Recognition Neural Network is best suited for this study, as it is specifically designed for pattern-matching problems and has a simple architecture with low computational complexity. AR Burg is found to be the best feature extraction technique in this study due to its high resolution for short data records and its ability to always produce a stable model. In all the neural network models, the maximum classification accuracy is obtained for subject 10, attributed to his better muscle fitness and his consistent involvement in the training sessions. Subjects in the age group of 26-30 years are best suited for the study due to their better muscle contractions, and better muscle fatigue resistance contributed to the better performance of female subjects compared to male subjects. The single-trial analysis shows that the hand-close movement achieved the best recognition rate across all neural network models.

Conclusion: In this paper, a study was conducted to identify user suitability for designing a hand prosthesis. Data were collected from ten subjects for twelve tasks related to finger movements. User suitability was assessed using two neural networks with six parametric features. From the results, it was concluded that fit women aged 26-30 years who do regular physical exercise are best suited for developing the HMI (human-machine interface) for a prosthetic hand, and that a Pattern Recognition Neural Network with AR Burg features using extension movements is the better way to design the HMI. However, wireless signal acquisition is worth considering in future work.
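For reference, the best-performing feature extractor above, AR Burg, can be sketched in a few lines of NumPy. This is a generic textbook implementation of Burg's method, not the authors' code; the model order would be a tuning parameter, and the returned coefficients form the sEMG feature vector:

```python
import numpy as np

def arburg(x, order):
    """Estimate AR coefficients of signal x by Burg's method, minimizing
    the combined forward and backward prediction error."""
    x = np.asarray(x, dtype=float)
    ef = x.copy()            # forward prediction errors
    eb = x.copy()            # backward prediction errors
    a = np.array([1.0])      # AR polynomial [1, a1, ..., ap]
    for _ in range(order):
        num = -2.0 * np.dot(eb[:-1], ef[1:])
        den = np.dot(ef[1:], ef[1:]) + np.dot(eb[:-1], eb[:-1])
        k = num / den        # reflection coefficient
        ef, eb = ef[1:] + k * eb[:-1], eb[:-1] + k * ef[1:]
        # Levinson-style update of the AR polynomial.
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    return a[1:]             # feature vector: a1..ap

# Example: order-6 AR features from a one-second sEMG window at 1 kHz.
features = arburg(np.random.randn(1000), order=6)
```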

