Hand gesture recognition via image processing techniques and deep CNN

2020 ◽  
Vol 39 (3) ◽  
pp. 4405-4418
Author(s):  
Yao-Liang Chung ◽  
Hung-Yuan Chung ◽  
Wei-Feng Tsai

In the present study, we sought to enable instant tracking of the hand region as a region of interest (ROI) within the image range of a webcam, while also identifying specific hand gestures to facilitate the control of home appliances in smart homes or the issuing of commands in human-computer interaction applications. To accomplish this objective, we first applied skin color detection and noise processing to remove unnecessary background information from the captured image, before applying background subtraction to detect the ROI. Then, to prevent background objects or noise from influencing the ROI, we utilized the kernelized correlation filters (KCF) algorithm to track the detected ROI. Next, the ROI image was resized to 100×120 and input into a deep convolutional neural network (CNN) to identify various hand gestures. Two deep CNN architectures, modified from the AlexNet and VGGNet CNNs, respectively, were developed by substantially reducing the number of network parameters and appropriately adjusting the internal network configuration. The tracking and recognition process described above was then repeated continuously to achieve real-time operation, with the system executing until the hand is removed from the camera range. The results indicated excellent performance by both of the proposed deep CNN architectures. In particular, the modified VGGNet achieved the better performance, with a recognition rate of 99.90% on the training data set and 95.61% on the test data set, indicating the good feasibility of the system for practical applications.
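The skin-color detection step at the front of this pipeline can be sketched in a few lines. A minimal numpy version, assuming a YCrCb color representation and illustrative Cr/Cb threshold ranges (the paper does not state its exact values):

```python
import numpy as np

# Illustrative chrominance thresholds; the paper's exact ranges are not given.
CR_RANGE = (133, 173)
CB_RANGE = (77, 127)

def skin_mask(ycrcb):
    """Return a boolean skin mask for an H x W x 3 image in YCrCb color space."""
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    return ((cr >= CR_RANGE[0]) & (cr <= CR_RANGE[1]) &
            (cb >= CB_RANGE[0]) & (cb <= CB_RANGE[1]))

# Tiny synthetic example: one skin-like pixel, one background pixel.
img = np.array([[[120, 150, 100], [120, 200, 50]]], dtype=np.uint8)
mask = skin_mask(img)
```

In a full system the mask would feed noise removal (e.g. morphological opening) and background subtraction before the ROI is resized to 100×120 for the CNN.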

2019 ◽  
Vol 8 (4) ◽  
pp. 12842-12845

Automating the analysis of facial expressions of individuals is one of the challenging tasks in opinion mining. In this work, we propose a technique for identifying the face of an individual and, if present, the emotions, from a live camera. Expression detection is a sub-area of computer vision capable of locating a person in a digital image and identifying the facial expression, a key factor in nonverbal communication. Complexity arises mainly in two cases: 1) more than one emotion coexists on a face, and 2) the same emotion is not expressed in exactly the same way by different individuals. Our aim was to automate the process by identifying the expressions of people in a live video. The system uses the OpenCV library, whose face recognizer module detects the face and trains the model. It was able to identify seven different expressions with 75-85% accuracy: happiness, sadness, disgust, fear, anger, surprise, and neutral. An image frame is captured from the video, the face is located in it, and it is then tested against the training data to predict the emotion and update the result. This process continues as long as video input exists. In addition, the training data set should be constructed so that prediction is independent of age, gender, skin color, and orientation of the human face in the video, as well as the illumination around the subject of reference.


2015 ◽  
Vol 781 ◽  
pp. 531-534
Author(s):  
Weera Kompreyarat ◽  
Thanasin Bunnam

In this paper, we propose a method for Thai Buddha amulet identification using simple local correlation features. This technique can deal with a variety of amulet materials and colors within the same generation with low computational complexity. Moreover, it can be applied in a semi-controlled environment, meaning the image need only have a plain background color that differs from that of the amulet. K-nearest neighbors is used as the classification technique. The experiment was performed automatically using amulet images from the internet, ensuring that each image in the same class differed in light intensity, contrast, and color. There were 240 images in 80 classes for the training data set and 751 images for the test data set. The results show that the proposed method achieves a high recognition rate of about 89.35%.
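The K-nearest-neighbors classification step can be sketched in plain numpy; the toy feature vectors below merely stand in for the paper's local correlation features:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Classify feature vector x by majority vote among its k nearest
    training samples (Euclidean distance)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Toy 2-D features standing in for local correlation features of amulet images.
train_X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
pred = knn_predict(train_X, train_y, np.array([0.05, 0.05]), k=3)
```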


2018 ◽  
Vol 2018 ◽  
pp. 1-12
Author(s):  
Yi Ning Xie ◽  
Lian Yu ◽  
Guo Hui Guan ◽  
Yong Jun He

DNA ploidy analysis of cells is an automation technique applied in pathological diagnosis. It is important for this technique to classify various nuclei images accurately. However, the lack of overlapping nuclei images in training data (imbalanced training data) results in low recognition rates for overlapping nuclei images. To solve this problem, a new method which synthesizes overlapping nuclei images from single-nuclei images is proposed. Firstly, sample selection is employed to make the synthesized samples representative. Secondly, random functions are used to control the rotation angles of the nuclei and the distance between their centroids, increasing sample diversity. Then, the Lambert-Beer law is applied to reassign the pixels of the overlapping parts, thus making the synthesized samples quite close to real ones. Finally, all synthesized samples are added to the training sets for classifier training. The experimental results show that images synthesized by this method can solve the data set imbalance problem and improve the recognition rate of DNA ploidy analysis systems.
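The Lambert-Beer pixel reassignment admits a compact sketch: the absorbances of two overlapping nuclei add, so their transmittances multiply. A minimal numpy version, assuming grayscale images against a bright background level:

```python
import numpy as np

def synthesize_overlap(img1, img2, background=255.0):
    """Combine two single-nucleus grayscale images in their overlap region
    using the Lambert-Beer law: absorbances add, so transmittances multiply."""
    t1 = img1 / background  # transmittance of nucleus 1
    t2 = img2 / background  # transmittance of nucleus 2
    return background * t1 * t2

a = np.array([[200.0, 255.0]])   # nucleus pixel, then clear background
b = np.array([[180.0, 180.0]])
out = synthesize_overlap(a, b)
```

Where either image is pure background (255), the other image passes through unchanged; where both nuclei are present, the result is darker than either alone, as in a real stained overlap.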


2012 ◽  
Vol 605-607 ◽  
pp. 2179-2182 ◽  
Author(s):  
Lan Lan Wu ◽  
Jie Wu ◽  
You Xian Wen ◽  
Li Rong Xiong ◽  
Yu Zheng

The study was conducted to identify three types of non-touching grain kernels using a colour machine vision system. Images of individual cereal grain kernels were acquired using a camera. Shape features were extracted from binary and edge images of the cereal grain kernels, obtained by image processing, for classification. A total of 13 shape feature parameters, including region area, perimeter, length, width, maximum radius, minimum radius, etc., were extracted from each kernel and used as input to a Bayesian classifier. Experimental results showed that the Bayesian classifier gave the best classification accuracy of 99.67% for indica rice, followed by 98.67% for japonica rice and 78.33% for glutinous rice on the training set. The classification system developed with the Bayesian classifier achieved an overall recognition rate of 92.22% on the training data set and, furthermore, a classification accuracy of 90% on the testing data set.
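The paper's "Bayesian classifier" is not specified in detail; a Gaussian naive Bayes over the shape-feature vectors is one common instance, sketched here in numpy with toy two-dimensional features:

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes over shape-feature vectors."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.prior = np.array([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # Log-likelihood of each sample under each class-conditional Gaussian,
        # plus the log prior; pick the maximizing class.
        ll = -0.5 * (((X[:, None, :] - self.mu) ** 2) / self.var
                     + np.log(2 * np.pi * self.var)).sum(axis=2)
        return self.classes[np.argmax(ll + np.log(self.prior), axis=1)]

# Toy 2-D "shape features" (e.g. area, perimeter) for two grain types.
X = np.array([[1.0, 2.0], [1.1, 2.1], [5.0, 6.0], [5.1, 6.2]])
y = np.array([0, 0, 1, 1])
clf = GaussianNB().fit(X, y)
pred = clf.predict(np.array([[1.05, 2.05], [5.05, 6.1]]))
```

In the paper each kernel would contribute a 13-dimensional vector (area, perimeter, length, width, radii, etc.) rather than these toy pairs.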


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Mengmeng Huang ◽  
Fang Liu ◽  
Xianfa Meng

Synthetic Aperture Radar (SAR), one of the important methods for obtaining target characteristics in the field of remote sensing, has been applied to many fields, including intelligence search, topographic surveying, mapping, and geological survey. In the SAR field, automatic target recognition (SAR ATR) is a significant issue with high application value. The development of deep learning has enabled its application to SAR ATR. Some researchers have pointed out that existing convolutional neural networks (CNNs) pay more attention to texture information, which is often not as discriminative as shape information. Therefore, this study designs an enhanced-shape CNN, which enhances the target shape at the input. Further, it uses an improved attention module so that the network can highlight the target shape in SAR images. Aiming at the problem of the small scale of existing SAR data sets, a small-sample experiment is conducted. The enhanced-shape CNN achieved a recognition rate of 99.29% when trained on the full training set, and 89.93% on one-eighth of the training data set.
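How the target shape might be "enhanced at the input" is one design question; a crude stand-in is to append an edge-magnitude channel to the SAR image. A minimal numpy sketch using 3×3 Sobel kernels (the paper's actual enhancement method may differ):

```python
import numpy as np

def sobel_edges(img):
    """Gradient magnitude of a 2-D image via 3x3 Sobel kernels, used here
    as a crude stand-in for a shape-enhancement preprocessing step."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

img = np.zeros((5, 5))
img[:, 3:] = 1.0                         # vertical step edge
edges = sobel_edges(img)
# Stack the original image with its (normalized) edge map as a 2-channel input.
enhanced_input = np.stack([img, edges / max(edges.max(), 1e-9)])
```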


Author(s):  
Wening Mustikarini ◽  
Risanuri Hidayat ◽  
Agus Bejo

Abstract — Automatic Speech Recognition (ASR) is a technology that uses machines to process and recognize human speech. One way to increase the recognition rate is to use a model of the language to be recognized. In this paper, a speech recognition application is introduced to recognize the words "atas" (up), "bawah" (down), "kanan" (right), and "kiri" (left). This research used 400 samples of speech data: 75 samples from each word for training data and 25 samples from each word for test data. The speech recognition system was designed using Mel Frequency Cepstral Coefficients (MFCC), with 13 coefficients as features, and a Support Vector Machine (SVM) as the classifier. The system was tested with linear and RBF kernels, various cost values, and three sample sizes (n = 25, 50, 75). The best average accuracy was obtained from the SVM with a linear kernel, a cost value of 100, and a data set consisting of 75 samples per class. During the training phase, the system showed an F1-score (the trade-off between precision and recall) of 80% for the word "atas", 86% for "bawah", 81% for "kanan", and 100% for "kiri". Using 25 new samples per class in the testing phase, the F1-score was 76% for the "atas" class, 54% for "bawah", 44% for "kanan", and 100% for "kiri".
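The F1-score the authors report per class is the harmonic mean of precision and recall; a minimal sketch, with hypothetical confusion counts rather than the paper's data:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, the per-class
    trade-off value reported in the abstract."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts for one word class (not the paper's data):
# 19 correct, 6 false positives, 6 misses -> precision = recall = 0.76.
score = f1_score(tp=19, fp=6, fn=6)
```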


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Zhi-hua Chen ◽  
Jung-Tae Kim ◽  
Jianning Liang ◽  
Jing Zhang ◽  
Yu-Bo Yuan

Hand gesture recognition is very significant for human-computer interaction. In this work, we present a novel real-time method for hand gesture recognition. In our framework, the hand region is extracted from the background with the background subtraction method. Then, the palm and fingers are segmented so as to detect and recognize the fingers. Finally, a rule classifier is applied to predict the labels of hand gestures. The experiments on a data set of 1300 images show that our method performs well and is highly efficient. Moreover, our method shows better performance than a state-of-the-art method on another data set of hand gestures.
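The final rule classifier maps the detected finger configuration to a gesture label; a minimal sketch with an illustrative label table (the paper's actual rules consider more than the finger count):

```python
# Hypothetical rule table: these particular labels are illustrative only,
# not the paper's gesture set.
GESTURES = {0: "fist", 1: "point", 2: "victory", 5: "open palm"}

def classify_gesture(finger_count):
    """Predict a gesture label from the number of detected fingers."""
    return GESTURES.get(finger_count, "unknown")

label = classify_gesture(2)
```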


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Xiaoying Shen ◽  
Chao Yuan

With the development of the live broadcast industry, security issues in the live broadcast process have become increasingly apparent. At present, the supervision of live broadcast platforms relies essentially on human oversight, mainly through user reporting and platform supervision measures. However, a large number of live broadcast rooms operate simultaneously, and human supervision alone can no longer meet the monitoring needs of live broadcasts. Given this situation, this study proposes a violation information recognition method for live-broadcasting platforms based on machine learning. By analyzing the similarities and differences between normal and violation live broadcasts, combined with the characteristics of violation image data, the method detects human skin color and sensitive parts. A prominent feature of violation images is that they contain a large area of naked skin, with the ratio of naked skin area to overall image area exceeding a threshold. Skin color recognition provides initial target positioning, and its accuracy directly determines the recognition accuracy of the entire system, making it the most important part of violation information recognition. Although many effective skin color recognition techniques exist, their accuracy and stability still need improvement owing to external factors such as light intensity, light source color, and physical equipment. When the skin-colored area in the live screen is detected to exceed the threshold, the frame is preliminarily judged to be a suspected violation video. To improve recognition accuracy, sensitive parts of the suspected video are then detected; naked female breasts are a very obvious feature of violation images. This study uses a chest feature extraction method to detect the chest in the image. When the recognition result is a violation image, the live broadcast is determined to involve violation content. The machine learning algorithm is simple to implement, its parameters are easy to adjust, and classifier training is fast, making it suitable for live violation information recognition scenarios. Experimental results on the adopted data set show that the method can effectively detect videos with violation content, with a recognition rate as high as 85.98%, which is suitable for a real-life environment and has good practical significance.
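The threshold test on the skin-colored area can be sketched directly; the 0.4 threshold below is illustrative, as the paper does not state its exact value:

```python
import numpy as np

def is_suspect_frame(skin_mask, threshold=0.4):
    """Flag a frame as a suspected violation when the fraction of
    skin-colored pixels exceeds a threshold (0.4 is illustrative;
    the paper's exact threshold is not stated)."""
    ratio = skin_mask.mean()
    return ratio > threshold, ratio

mask = np.zeros((10, 10), dtype=bool)
mask[:5, :] = True                      # half the frame is skin-colored
flag, ratio = is_suspect_frame(mask)
```

Frames that pass this check would then go to the chest-feature detector for the final violation decision.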


2020 ◽  
Vol 30 (05) ◽  
pp. 2050020 ◽  
Author(s):  
Qingguo Wei ◽  
Shan Zhu ◽  
Yijun Wang ◽  
Xiaorong Gao ◽  
Hai Guo ◽  
...  

Canonical correlation analysis (CCA) is an effective spatial filtering algorithm widely used in steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs). In existing CCA methods, training data are used for constructing templates of stimulus targets, and the spatial filters are created between the template signals and a single-trial testing signal. The fact that the spatial filters rely on testing data, however, results in lower classification performance of CCA compared to other state-of-the-art algorithms such as task-related component analysis (TRCA). In this study, we proposed a novel CCA method in which spatial filters are estimated using training data only. This is achieved by using observed EEG training data and their SSVEP components as the two inputs of CCA, with the objective function optimized by averaging multiple training trials. In this case, we proved in theory that the two spatial filters estimated by the CCA are equivalent, and that the CCA and TRCA are also equivalent under certain hypotheses. A benchmark SSVEP data set from 35 subjects was used to compare the performance of the two algorithms across different data lengths, numbers of channels, and numbers of training trials. In addition, the CCA was also compared with power spectral density analysis (PSDA). The experimental results suggest that the CCA is equivalent to TRCA if the signal-to-noise ratio of the training data is high enough; otherwise, the CCA outperforms TRCA in terms of classification accuracy. The CCA is much faster than PSDA in target detection time. The robustness of the training data-driven CCA to noise gives it greater potential in practical applications.
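Standard CCA between two multichannel signals can be sketched with a QR-plus-SVD decomposition in numpy; this is the textbook formulation, not the authors' training-data-only variant:

```python
import numpy as np

def cca(X, Y):
    """Canonical correlation analysis between two data matrices
    (samples x channels). Returns the canonical correlations and the
    two sets of spatial filters (weight matrices)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    qx, rx = np.linalg.qr(X)
    qy, ry = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the canonical correlations.
    u, s, vt = np.linalg.svd(qx.T @ qy)
    wx = np.linalg.solve(rx, u)       # filters for X
    wy = np.linalg.solve(ry, vt.T)    # filters for Y
    return s, wx, wy

# Two 2-channel signals sharing one latent component t plus noise.
rng = np.random.default_rng(0)
t = rng.standard_normal(200)
X = np.column_stack([t + 0.1 * rng.standard_normal(200),
                     rng.standard_normal(200)])
Y = np.column_stack([t + 0.1 * rng.standard_normal(200),
                     rng.standard_normal(200)])
corrs, wx, wy = cca(X, Y)
```

In an SSVEP decoder, `X` would be a single-trial or averaged EEG segment and `Y` a template (or, in the authors' variant, the SSVEP components of the training data); the largest canonical correlation scores each stimulus frequency.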


2021 ◽  
Vol 14 (2) ◽  
pp. 120-128
Author(s):  
Mohammed Ehsan Safi ◽  
Eyad I. Abbas

In personal image recognition algorithms, two effective factors govern the system's evaluation: the recognition rate and the size of the database. Unfortunately, the recognition rate is proportional to the size of the training set, which increases processing time and creates memory limitation problems. This paper's main goal was to present a robust algorithm with minimal data sets and a high recognition rate. Images of ten persons were chosen as a database: nine images per individual as the full version of the training data set, and one image per person outside the training set as a test pattern before the database reduction procedure. The proposed algorithm integrates Principal Component Analysis (PCA) as a feature extraction technique with the minimum mean of clusters and Euclidean distance to achieve personal recognition. After indexing the training set for each person, the clustering of the differences is determined. The person is recognized by the minimum mean index, and this process is repeated with each reduction. The experimental results show that the recognition rate remains 100% despite reducing the training sets by 44%, while the recognition rate decreases to 70% when the reduction reaches 89%. The clear conclusion is that the results of the proposed system support the idea of reducing training sets while obtaining a high recognition rate, based on application requirements.
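The PCA-plus-minimum-mean pipeline can be sketched in numpy; the toy vectors below stand in for face images, and the two-person setup is illustrative:

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean and top principal axes of X (samples x features)."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:n_components]

def nearest_mean_predict(x, class_means):
    """Assign x to the class whose mean projection is closest (Euclidean)."""
    dists = [np.linalg.norm(x - m) for m in class_means]
    return int(np.argmin(dists))

# Toy data: two "persons", a few feature vectors each (stand-ins for images).
rng = np.random.default_rng(1)
A = rng.normal(0.0, 0.1, size=(5, 4))
B = rng.normal(3.0, 0.1, size=(5, 4))
X = np.vstack([A, B])

mu, axes = pca_fit(X, n_components=2)
proj = (X - mu) @ axes.T
means = [proj[:5].mean(axis=0), proj[5:].mean(axis=0)]  # per-person cluster means
test_point = (A[0] - mu) @ axes.T
pred = nearest_mean_predict(test_point, means)
```

The paper's reduction procedure would correspond to shrinking each person's block of training vectors and re-deriving the cluster means at every step.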

