In-situ identification and recognition of multi-hand gestures using optimized deep residual network
Real-time perception of hand gestures in challenging environments is a demanding machine vision task. Hand recognition becomes harder under varying illumination conditions and backgrounds. Robust recognition and classification are vital steps in supporting effective human-machine interaction (HMI), virtual reality, and related applications. In this paper, real-time hand action recognition is performed using an optimized deep residual network model. It incorporates a RetinaNet model for hand detection and a Depthwise Separable Convolutional (DSC) layer for precise hand gesture recognition. The proposed model overcomes the class imbalance problem encountered by conventional single-stage hand action recognition algorithms. The integrated DSC layer reduces the number of parameters and increases recognition speed. The model uses a ResNet-101 CNN architecture as the feature extractor. The model is trained and evaluated on the MITI-HD dataset and compared on the benchmark datasets (NUSHP-II, Senz-3D). The network achieved higher precision and recall at an IoU threshold of 0.5. The RetinaNet-DSC model with a ResNet-101 backbone obtained higher precision (99.21% for AP0.5, 96.80% for AP0.75) on the MITI-HD dataset. Higher performance metrics are obtained for γ = 2 and α = 0.25. SGD with momentum outperformed the other optimizers (Adam, RMSprop) on the datasets considered in the study. The prediction time of the optimized deep residual network is 82 ms.
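The abstract does not name the loss, but γ and α are the usual symbols for the focusing and class-balancing parameters of the focal loss that RetinaNet uses to counter class imbalance. Assuming that is what is meant here, the following minimal sketch shows how γ = 2 and α = 0.25 down-weight easy examples relative to plain cross-entropy. The function names `focal_loss` and `cross_entropy` are illustrative and are not taken from the paper.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction.

    p is the predicted probability of the positive class, y is 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    p_t = min(max(p_t, eps), 1.0)           # guard against log(0)
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma (focusing) and alpha (class balance) default to the values
    quoted in the abstract; treating them as focal-loss parameters is
    an assumption of this sketch.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy positive (p = 0.95) is scaled down far more than a hard one (p = 0.30).
    for p in (0.95, 0.30):
        ratio = focal_loss(p, 1) / cross_entropy(p, 1)
        print(f"p={p:.2f}  focal/CE ratio={ratio:.6f}")
```

For a positive example the ratio of focal loss to cross-entropy is α(1 − p)^γ, so a confidently correct prediction contributes very little to the total loss, while a poorly classified one keeps most of its weight. This is the mechanism by which a single-stage detector avoids being dominated by the many easy background samples.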