Interpretation of optical flow through complex neural network

Author(s):  
Minami Miyauchi ◽  
Masatoshi Seki ◽  
Akira Watanabe ◽  
Arata Miyauchi
2020 ◽  
Vol 17 (4) ◽  
pp. 497-506
Author(s):  
Sunil Patel ◽  
Ramji Makwana

Automatic classification of dynamic hand gestures is challenging due to the large diversity within gesture classes, low resolution, and the fact that gestures are performed with the fingers. These challenges have drawn many researchers to the area. Recently, deep neural networks have been used for implicit feature extraction, with a softmax layer for classification. In this paper, we propose a method based on a two-dimensional convolutional neural network that performs detection and classification of hand gestures simultaneously from multimodal Red, Green, Blue, Depth (RGBD) and optical flow data, and passes these features to a Long Short-Term Memory (LSTM) recurrent network for frame-to-frame probability generation, with a Connectionist Temporal Classification (CTC) network for loss calculation. We compute optical flow from the Red, Green, Blue (RGB) data to capture the motion information present in the video. The CTC model efficiently evaluates all possible alignments of a hand gesture via dynamic programming and checks frame-to-frame consistency of the gesture's visual similarity in the unsegmented input stream. The CTC network finds the most probable sequence of frames for a gesture class; the frame with the highest probability value is selected from the CTC network by max decoding. The entire network is trained end-to-end with the CTC loss for gesture recognition. We evaluate on the challenging Vision for Intelligent Vehicles and Applications (VIVA) dataset for dynamic hand gesture recognition, captured with RGB and depth data. On this VIVA dataset, our proposed hand gesture recognition technique outperforms competing state-of-the-art algorithms, achieving an accuracy of 86%.
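
A minimal PyTorch sketch of the pipeline the abstract describes (per-frame 2D CNN over stacked multimodal channels, LSTM for frame-to-frame dynamics, CTC loss over the unsegmented stream). All names and sizes (GestureCTCNet, NUM_CLASSES, channel layout, layer widths) are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 19              # illustrative class count for VIVA
BLANK = NUM_CLASSES           # CTC blank index

class GestureCTCNet(nn.Module):
    def __init__(self, in_channels=6, hidden=256):
        # in_channels = RGB (3) + depth (1) + optical flow u, v (2)
        super().__init__()
        # 2D CNN applied per frame to the stacked multimodal input
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)  # temporal model
        self.fc = nn.Linear(hidden, NUM_CLASSES + 1)       # +1 for CTC blank

    def forward(self, frames):
        # frames: (batch, time, channels, H, W)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.view(b * t, c, h, w)).view(b, t, -1)
        out, _ = self.lstm(feats)               # frame-to-frame dynamics
        return self.fc(out).log_softmax(-1)     # per-frame class log-probs

# CTC loss over the unsegmented stream; "max decoding" is a per-frame argmax.
model = GestureCTCNet()
ctc = nn.CTCLoss(blank=BLANK)
clips = torch.randn(2, 40, 6, 64, 64)           # two 40-frame clips
log_probs = model(clips).transpose(0, 1)        # CTC expects (T, batch, C)
targets = torch.tensor([3, 7])                  # one gesture label per clip
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 40),
           target_lengths=torch.ones(2, dtype=torch.long))
```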


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 222
Author(s):  
Baigan Zhao ◽  
Yingping Huang ◽  
Hongjian Wei ◽  
Xing Hu

Visual odometry (VO) refers to the incremental estimation of the motion state of an agent (e.g., a vehicle or robot) from image information, and is a key component of modern localization and navigation systems. Addressing the monocular VO problem, this paper presents a novel end-to-end network for estimating camera ego-motion. The network learns the latent subspace of optical flow (OF) and models sequential dynamics so that the motion estimate is constrained by the relations between sequential images. We compute the OF field of consecutive images and extract the latent OF representation in a self-encoding manner. A recurrent neural network then examines the OF changes, i.e., conducts sequential learning. The extracted sequential OF subspace is used to regress the 6-dimensional pose vector. We derive three models with different network structures and different training schemes: LS-CNN-VO, LS-AE-VO, and LS-RCNN-VO. In particular, we train the encoder separately in an unsupervised manner; this avoids non-convergence when training the whole network and allows a more generalized and effective feature representation. Substantial experiments have been conducted on the KITTI and Malaga datasets, and the results demonstrate that our LS-RCNN-VO outperforms existing learning-based VO approaches.
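
A minimal sketch of the described structure: an autoencoder-style encoder learns a latent subspace of the optical-flow field, an LSTM models how that representation evolves across frames, and a linear head regresses the 6-DoF pose. Class names, latent size, and layer widths are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FlowEncoder(nn.Module):
    """Self-encoding of a 2-channel (u, v) optical-flow field (encoder half)."""
    def __init__(self, latent=128):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent),
        )

    def forward(self, flow):                    # flow: (batch, 2, H, W)
        return self.enc(flow)

class LatentSeqVO(nn.Module):
    """RNN over the latent OF subspace, regressing a 6-D pose vector."""
    def __init__(self, latent=128, hidden=256):
        super().__init__()
        self.encoder = FlowEncoder(latent)      # pre-trained unsupervised
        self.rnn = nn.LSTM(latent, hidden, batch_first=True)
        self.pose = nn.Linear(hidden, 6)        # 3 translation + 3 rotation

    def forward(self, flows):                   # flows: (batch, time, 2, H, W)
        b, t = flows.shape[:2]
        z = self.encoder(flows.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(z)                    # sequential learning over OF
        return self.pose(out)                   # per-step relative pose

poses = LatentSeqVO()(torch.randn(1, 5, 2, 96, 320))   # -> (1, 5, 6)
```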


2012 ◽  
Vol 116 (9) ◽  
pp. 953-966 ◽  
Author(s):  
Hatem A. Rashwan ◽  
Domenec Puig ◽  
Miguel Angel Garcia

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Xiaoping Guo

Traditional text-annotation-based video retrieval labels videos manually with text, which is inefficient, highly subjective, and generally cannot accurately describe the meaning of a video. Traditional content-based video retrieval uses convolutional neural networks to extract the underlying feature information of images to build indexes, and achieves similarity retrieval of video feature vectors according to certain similarity-measure algorithms. In this paper, by studying the characteristics of sports videos, we propose a histogram difference method based on transfer learning and a four-step method based on block matching for abrupt-cut detection and fade detection of video shots, respectively. Through adaptive thresholding, regions with large frame-difference changes are marked as candidate shot regions, and the shot boundaries are then determined by the cut-detection algorithm. Combined with the characteristics of sports video, we also propose a key frame extraction method based on clustering and optical flow analysis and compare it experimentally with the traditional clustering method; the algorithm effectively removes redundant frames, and the extracted key frames are more representative. Extensive experiments show that the keyword fuzzy-search algorithm proposed in this paper, based on an improved deep neural network and ontology-based semantic expansion, achieves desirable retrieval performance, and that this method is feasible for extracting a video's underlying features, annotating them, and searching by keyword. A notable strength of the algorithm is that it can quickly and effectively retrieve the desired video from a large pool of Internet video resources, reducing the false-detection and miss rates while improving fidelity, which basically meets everyday needs.
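
A hedged OpenCV sketch of the histogram-difference shot-boundary step described above: frame-to-frame histogram differences are compared against an adaptive threshold to mark candidate cuts. The threshold rule (mean + k·std) and the parameter values are illustrative assumptions, not the paper's exact method.

```python
import cv2
import numpy as np

def hist_diffs(video_path, bins=64):
    """Frame-to-frame differences of normalized grayscale histograms."""
    cap = cv2.VideoCapture(video_path)
    prev, diffs = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [bins], [0, 256]).ravel()
        hist /= hist.sum() + 1e-8              # normalize per frame
        if prev is not None:
            diffs.append(np.abs(hist - prev).sum())
        prev = hist
    cap.release()
    return np.array(diffs)

def detect_cuts(diffs, k=3.0):
    """Mark frames whose difference exceeds an adaptive threshold."""
    thr = diffs.mean() + k * diffs.std()       # adapts to this video's stats
    return np.flatnonzero(diffs > thr) + 1     # candidate shot boundaries

# Example: cuts = detect_cuts(hist_diffs("match.mp4"))
```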


1994 ◽  
Author(s):  
Gabriella Convertino ◽  
Maddalena Brattoli ◽  
Arcangelo Distante

Author(s):  
Haiqun Qin ◽  
Ziyang Zhen ◽  
Kun Ma

Purpose – The purpose of this paper is to meet the large demand for new-generation intelligent monitoring systems that detect targets within a dynamic background.
Design/methodology/approach – A dynamic target detection method based on the fusion of optical flow and a neural network is proposed.
Findings – Simulation results verify the accuracy of moving-object detection based on the fusion of optical flow and a neural network. The method eliminates the influence of camera movement on target detection and is able to extract a complete moving target.
Practical implications – The method provides a powerful safeguard for target detection and target tracking applications.
Originality/value – The proposed method fuses optical flow and a neural network to detect moving objects, and it can be used in new-generation intelligent monitoring systems.
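
An illustrative sketch of the basic fusion idea: dense optical flow is computed between consecutive frames and fed to a small neural head that scores each pixel as moving target or background. The network shown is untrained and the feature choice is an assumption; the paper's method additionally compensates for camera ego-motion, which this sketch does not implement.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def flow_features(prev_gray, gray):
    """Dense Farneback optical flow -> per-pixel (magnitude, angle) features."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return np.stack([mag, ang], axis=0).astype(np.float32)

# Tiny fully convolutional head mapping flow features to a per-pixel
# foreground (moving-target) probability map; weights here are random.
head = nn.Sequential(
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1), nn.Sigmoid(),
)

def detect(prev_gray, gray):
    feats = torch.from_numpy(flow_features(prev_gray, gray))[None]
    with torch.no_grad():
        prob = head(feats)[0, 0].numpy()       # (H, W) motion probability
    return (prob > 0.5).astype(np.uint8)       # binary target mask
```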

