Spatio-temporal Image Tracking Based on Optical Flow and Clustering: An Endoneurosonographic Application

Author(s):  
Andrés F. Serna-Morales ◽  
Flavio Prieto ◽  
Eduardo Bayro-Corrochano


2020 ◽  
Vol 34 (07) ◽  
pp. 10713-10720
Author(s):  
Mingyu Ding ◽  
Zhe Wang ◽  
Bolei Zhou ◽  
Jianping Shi ◽  
Zhiwu Lu ◽  
...  

A major challenge for video semantic segmentation is the lack of labeled data. In most benchmark datasets, only one frame per video clip is annotated, so most supervised methods cannot exploit information from the remaining frames. To exploit the spatio-temporal information in videos, many previous works use pre-computed optical flow, which encodes temporal consistency to improve video segmentation. However, video segmentation and optical flow estimation are still treated as two separate tasks. In this paper, we propose a novel framework for joint video semantic segmentation and optical flow estimation. Semantic segmentation supplies semantic cues for handling occlusion, yielding more robust optical flow estimation, while the non-occluded optical flow provides accurate pixel-level temporal correspondences that guarantee the temporal consistency of the segmentation. Moreover, our framework can exploit both labeled and unlabeled frames in a video through joint training, while requiring no additional computation at inference. Extensive experiments show that the proposed model lets video semantic segmentation and optical flow estimation benefit from each other and outperforms existing methods on both tasks under the same settings.
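The key coupling described in this abstract (non-occluded flow providing pixel-level correspondences that enforce segmentation consistency) can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical illustration, not the paper's actual loss: it backward-warps the segmentation logits of frame t+1 to frame t using a given flow field and penalizes disagreement only on non-occluded pixels (the occlusion mask, flow, and logits are all assumed inputs).

```python
import torch
import torch.nn.functional as F

def warp_with_flow(feat, flow):
    """Backward-warp a feature map (B, C, H, W) with optical flow (B, 2, H, W).

    flow[:, 0] holds horizontal displacements and flow[:, 1] vertical ones,
    in pixels; grid_sample expects sampling coordinates normalized to [-1, 1].
    """
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]          # displaced x coordinates
    grid_y = ys.unsqueeze(0) + flow[:, 1]          # displaced y coordinates
    grid_x = 2.0 * grid_x / (w - 1) - 1.0          # normalize to [-1, 1]
    grid_y = 2.0 * grid_y / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)   # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)

def temporal_consistency_loss(logits_t, logits_tp1, flow_t_to_tp1, non_occluded):
    """Penalize segmentation disagreement between frame t and warped frame t+1.

    non_occluded: (B, 1, H, W) binary mask; occluded pixels have no valid
    temporal correspondence, so they are excluded from the loss.
    """
    warped = warp_with_flow(logits_tp1, flow_t_to_tp1)
    diff = (F.softmax(logits_t, dim=1)
            - F.softmax(warped, dim=1)).abs().mean(dim=1, keepdim=True)
    return (diff * non_occluded).sum() / non_occluded.sum().clamp(min=1.0)
```

In a joint-training setup of the kind the abstract describes, a term like this could be applied to unlabeled frames, since it needs only the flow and an occlusion mask rather than ground-truth labels.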


2009 ◽  
pp. 388-415 ◽  
Author(s):  
Wai Chee Yau ◽  
Dinesh Kant Kumar ◽  
Hans Weghorn

The performance of a visual speech recognition technique is greatly influenced by the choice of visual speech features. Speech information in the visual domain can be broadly categorized into static (mouth appearance) and motion (mouth movement) features. This chapter reviews a number of computer-based lip-reading approaches that use motion features. Motion-based visual speech recognition techniques fall into two broad classes of algorithms: optical flow and image subtraction. Image subtraction techniques have been demonstrated to outperform optical-flow-based methods in lip-reading. The problem with image-subtraction methods based on the difference of frames (DOF) is that such features capture the changes in the images over time but do not indicate the direction of the mouth movement. This chapter presents new motion features that overcome this limitation of conventional image-subtraction techniques for visual speech recognition. The proposed approach extracts features by applying motion segmentation to image sequences. Video data are represented in a 2-D space using grayscale images known as motion history images (MHIs). MHIs are spatio-temporal templates that implicitly encode the temporal component of mouth movement. Zernike moments are computed from the MHIs as image descriptors and classified using support vector machines (SVMs). Experimental results demonstrate that the proposed technique yields high accuracy on a phoneme classification task. The results suggest that dynamic information is important for visual speech recognition.
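To make the MHI construction concrete, here is a minimal NumPy sketch of the standard motion-history-image update; it is assumed to match the chapter's general recipe rather than its exact parameters. Pixels where the inter-frame difference exceeds a threshold are stamped with the maximum duration tau, and all other pixels decay by one step, so brighter regions encode more recent motion.

```python
import numpy as np

def update_mhi(mhi, prev_frame, cur_frame, tau=30, diff_threshold=25):
    """One update step of a motion history image.

    mhi:        float array (H, W), current motion history in [0, tau].
    prev_frame: grayscale uint8 array (H, W) at time t-1.
    cur_frame:  grayscale uint8 array (H, W) at time t.
    tau:        how many frames a motion trace persists (illustrative value).
    """
    # Binary motion mask from the absolute difference of frames (DOF).
    diff = np.abs(cur_frame.astype(np.int16) - prev_frame.astype(np.int16))
    moving = diff > diff_threshold

    # Moving pixels are stamped with tau; static pixels decay toward zero,
    # so the MHI encodes both where and how recently motion occurred.
    return np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))

def build_mhi(frames, tau=30, diff_threshold=25):
    """Fold a grayscale frame sequence into a single MHI template."""
    mhi = np.zeros_like(frames[0], dtype=np.float64)
    for prev, cur in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev, cur, tau, diff_threshold)
    return mhi / tau  # normalize to [0, 1] for use as a grayscale descriptor
```

From the normalized MHI, Zernike moments could then be computed as rotation-invariant descriptors (for instance with mahotas.features.zernike_moments) and passed to an SVM, mirroring the pipeline the chapter describes.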


Author(s):  
Plinio Moreno ◽  
Dario Figueira ◽  
Alexandre Bernardino ◽  
José Santos-Victor

The goal of this work is to distinguish between humans and robots in a mixed human-robot environment. We analyze the spatio-temporal patterns of optical-flow-based features across several frames. We consider the Histogram of Optical Flow (HOF) and Motion Boundary Histogram (MBH) features, which have shown good results on people detection. The spatio-temporal patterns are composed of groups of feature components that have similar values in previous frames. These feature groups are fed into the FuzzyBoost algorithm, which at each round selects the spatio-temporal pattern (i.e. feature set) with the lowest classification error. The search for patterns is guided by grouping feature dimensions using three algorithms that avoid a brute-force search: (a) similarity of weights from dimensionality reduction matrices, (b) Boost Feature Subset Selection (BFSS) and (c) Sequential Floating Feature Selection (SFSS). The similarity weights are computed by Multiple Metric Learning for large Margin Nearest Neighbor (MMLMNN), a linear dimensionality reduction algorithm that provides a type of Mahalanobis metric [Weinberger and Saul, J. Mach. Learn. Res. 10 (2009) 207–244]. The experiments show that FuzzyBoost generalizes well, better than GentleBoost and Support Vector Machines (SVMs) with linear or Radial Basis Function (RBF) kernels. The classifier was implemented and tested in a real-time, multi-camera dynamic setting.
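As a concrete illustration of the HOF descriptor mentioned above (a minimal, hypothetical version, not the authors' exact pipeline), the sketch below computes dense optical flow between two grayscale frames with OpenCV's Farneback method and bins flow orientations, weighted by magnitude, into a normalized histogram.

```python
import cv2
import numpy as np

def histogram_of_optical_flow(prev_gray, cur_gray, n_bins=9):
    """Compute a magnitude-weighted Histogram of Optical Flow (HOF).

    prev_gray, cur_gray: uint8 grayscale frames of identical size.
    Returns an L1-normalized histogram of flow orientations over n_bins.
    """
    # Dense flow: one (dx, dy) displacement per pixel.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, cur_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
    )
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.sqrt(dx ** 2 + dy ** 2)
    angle = np.arctan2(dy, dx)  # orientation in (-pi, pi]

    # Bin orientations, weighting each pixel's vote by its flow magnitude
    # so that strong motion dominates the descriptor.
    hist, _ = np.histogram(
        angle, bins=n_bins, range=(-np.pi, np.pi), weights=magnitude
    )
    total = hist.sum()
    return hist / total if total > 0 else hist
```

In the setting of this work, such per-frame descriptors would be collected over several frames and grouped into the spatio-temporal patterns that FuzzyBoost selects among.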


Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5142 ◽  
Author(s):  
Dong Liang ◽  
Jiaxing Pan ◽  
Han Sun ◽  
Huiyu Zhou

Foreground detection is an important theme in video surveillance. Conventional background modeling approaches build sophisticated temporal statistical models to detect foreground from low-level features, while modern semantic/instance segmentation approaches generate high-level foreground annotations but ignore the temporal relevance among consecutive frames. In this paper, we propose a Spatio-Temporal Attention Model (STAM) for cross-scene foreground detection. To fill the semantic gap between low- and high-level features, appearance and optical flow features are synthesized by attention modules during the feature learning procedure. Experimental results on the CDnet 2014 benchmark show that STAM outperforms many state-of-the-art methods on seven evaluation metrics; the attention modules and optical flow raise the F-measure by 9% and 6%, respectively. Without any tuning, the model also generalizes across scenes on the Wallflower and PETS datasets. The processing speed is 10.8 fps at a frame size of 256 × 256.
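The idea of synthesizing appearance and optical flow features through attention can be sketched as follows. This is a generic, hypothetical channel-attention fusion in PyTorch, intended only to illustrate attention-weighted feature synthesis, not STAM's actual architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse appearance and optical-flow feature maps with channel attention.

    Per-channel gates are predicted from globally pooled statistics of the
    concatenated streams; the output is a gated sum of the two streams.
    """

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # (B, 2C, 1, 1) context
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),                                  # weights in (0, 1)
        )

    def forward(self, appearance, flow_feat):
        # appearance, flow_feat: (B, C, H, W) feature maps from two streams.
        joint = torch.cat([appearance, flow_feat], dim=1)
        weights = self.gate(joint)
        w_app, w_flow = torch.chunk(weights, 2, dim=1)
        return w_app * appearance + w_flow * flow_feat
```

A foreground head (e.g. a 1×1 convolution producing a single-channel mask) would then operate on the fused features to yield the per-pixel foreground prediction.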


1990 ◽  
Author(s):  
Thomas R. Tsao ◽  
Victor C. Chen ◽  
John M. Libert ◽  
Haw-Jye S. Shyu
