Semi-supervised sequence modeling for improved behavioral segmentation

2021 ◽  
Author(s):  
Matthew R Whiteway ◽  
Evan S Schaffer ◽  
Anqi Wu ◽  
E Kelly Buchanan ◽  
Omer F Onder ◽  
...  

A popular approach to quantifying animal behavior from video data is through discrete behavioral segmentation, wherein video frames are labeled as containing one or more behavior classes such as walking or grooming. Sequence models learn to map behavioral features extracted from video frames to discrete behaviors, and both supervised and unsupervised methods are common. However, each approach has its drawbacks: supervised models require a time-consuming annotation step where humans must hand label the desired behaviors; unsupervised models may fail to accurately segment particular behaviors of interest. We introduce a semi-supervised approach that addresses these challenges by constructing a sequence model loss function with (1) a standard supervised loss that classifies a sparse set of hand labels; (2) a weakly supervised loss that classifies a set of easy-to-compute heuristic labels; and (3) a self-supervised loss that predicts the evolution of the behavioral features. With this approach, we show that a large number of unlabeled frames can improve supervised segmentation in the regime of sparse hand labels and also show that a small number of hand labeled frames can increase the precision of unsupervised segmentation.
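The three-term loss described above can be sketched as follows; the loss weights, the mean-squared-error form of the self-supervised term, and all function and variable names are illustrative assumptions rather than the authors' exact formulation:

```python
import numpy as np

def cross_entropy(probs, labels, mask):
    # Mean negative log-likelihood over the frames where `mask` is True.
    idx = np.where(mask)[0]
    if idx.size == 0:
        return 0.0
    return float(-np.mean(np.log(probs[idx, labels[idx]] + 1e-12)))

def composite_loss(probs, hand_labels, hand_mask,
                   heuristic_labels, heuristic_mask,
                   features, pred_next_features,
                   w_sup=1.0, w_weak=0.5, w_self=0.1):
    """Semi-supervised sequence-model loss with three terms:
    (1) supervised cross-entropy on a sparse set of hand labels,
    (2) weakly supervised cross-entropy on heuristic labels,
    (3) self-supervised prediction of next-step behavioral features."""
    l_sup = cross_entropy(probs, hand_labels, hand_mask)
    l_weak = cross_entropy(probs, heuristic_labels, heuristic_mask)
    # Predictions made at time t are compared to the true features at t+1.
    l_self = float(np.mean((pred_next_features[:-1] - features[1:]) ** 2))
    return w_sup * l_sup + w_weak * l_weak + w_self * l_self
```

The masks let the supervised and weakly supervised terms apply only to the (typically small) subsets of frames that carry hand or heuristic labels, while the self-supervised term uses every frame.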

Entropy ◽  
2019 ◽  
Vol 21 (4) ◽  
pp. 329 ◽  
Author(s):  
Yunqi Tang ◽  
Zhuorong Li ◽  
Huawei Tian ◽  
Jianwei Ding ◽  
Bingxian Lin

Accurately detecting gait events from video data is a challenging problem. Most current detection methods for gait events are based on wearable sensors, which require a high degree of user cooperation and are restricted by power consumption. This study presents a novel algorithm for accurately detecting toe-off events using a single 2D vision camera, without requiring the cooperation of participants. First, a set of novel features, namely consecutive silhouettes difference maps (CSD-maps), is proposed to represent the gait pattern. A CSD-map encodes several consecutive pedestrian silhouettes, extracted from video frames, into a single map. Different numbers of consecutive pedestrian silhouettes result in different types of CSD-maps, which provide significant features for toe-off event detection. A convolutional neural network is then employed to reduce the feature dimensionality and classify toe-off events. Experiments on a public database demonstrate that the proposed method achieves good detection accuracy.
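As an illustration of the CSD-map idea, here is a minimal sketch assuming binary 0/1 silhouette masks; the function name and the accumulation-by-sum are assumptions for illustration, not the paper's exact definition:

```python
import numpy as np

def csd_map(silhouettes):
    """Build a consecutive silhouettes difference map (CSD-map):
    accumulate the absolute differences between each pair of
    consecutive binary silhouette frames into a single 2D map.
    `silhouettes`: array-like of shape (N, H, W) with values in {0, 1}."""
    sil = np.asarray(silhouettes, dtype=np.int32)
    diffs = np.abs(np.diff(sil, axis=0))   # (N-1, H, W) frame-to-frame changes
    return diffs.sum(axis=0)               # (H, W) accumulated motion map
```

Varying the number N of consecutive silhouettes fed in yields the different CSD-map types the abstract mentions; each map can then be passed to the CNN classifier.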


Author(s):  
Zaid Al-Huda ◽  
Donghai Zhai ◽  
Yan Yang ◽  
Riyadh Nazar Ali Algburi

Deep convolutional neural networks (DCNNs) trained on pixel-level annotated images have achieved improvements in semantic segmentation, but the high cost of labeling training data greatly limits their applicability. Weakly supervised segmentation approaches can significantly reduce human labeling effort. In this paper, we introduce a new framework to generate high-quality initial pixel-level annotations. Using a hierarchical image segmentation algorithm to predict the boundary map, we select the optimal scale of high-quality hierarchies. In the initialization step, scribble annotations and a saliency map are combined to construct a graphical model over the optimal-scale segmentation; by solving the minimal cut problem, information is spread from the scribbles to unmarked regions. During training, the segmentation network is trained on these initial pixel-level annotations. To iteratively optimize the segmentation, we use the graphical model to refine the segmentation masks and retrain the network to obtain more precise pixel-level annotations. Experimental results on the Pascal VOC 2012 dataset demonstrate that the proposed framework outperforms most weakly supervised semantic segmentation methods and achieves state-of-the-art performance of [Formula: see text] mIoU.
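The scribble-to-region label spreading via a minimal cut can be illustrated on a tiny region graph; the capacity construction below (saliency-based unary ties to the terminals, a constant smoothness weight between neighbors) and the use of SciPy's max-flow solver are simplifying assumptions, not the paper's exact graphical model:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

def mincut_labels(n, fg_seeds, bg_seeds, saliency, edges, lam=2, big=1000):
    """Spread scribble labels to unmarked regions via an s-t minimal cut.
    Nodes 0..n-1 are image regions; node n is the source (foreground
    terminal) and node n+1 the sink (background terminal). Seed regions
    get a large capacity tie to their terminal, unmarked regions get
    saliency-based unary capacities, and neighboring regions share a
    smoothness capacity `lam` (all capacities must be integers)."""
    S, T = n, n + 1
    cap = np.zeros((n + 2, n + 2), dtype=np.int32)
    for i in range(n):
        cap[S, i] = big if i in fg_seeds else int(10 * saliency[i])
        cap[i, T] = big if i in bg_seeds else int(10 * (1 - saliency[i]))
    for i, j in edges:                      # undirected smoothness edges
        cap[i, j] = cap[j, i] = lam
    res = maximum_flow(csr_matrix(cap), S, T)
    residual = cap - res.flow.toarray()     # residual capacities
    # BFS from the source over positive-residual edges: the reachable
    # region nodes form the foreground side of the minimal cut.
    seen, stack = {S}, [S]
    while stack:
        u = stack.pop()
        for v in np.nonzero(residual[u] > 0)[0]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return [1 if i in seen else 0 for i in range(n)]
```

On a chain of four regions with a foreground scribble on one end, a background scribble on the other, and saliency decreasing along the chain, the cut falls where the saliency evidence flips, assigning the unmarked middle regions accordingly.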


Action recognition (AR) plays a fundamental role in computer vision and video analysis. We are witnessing an astronomical increase of video data on the web, and recognizing actions in video is difficult due to differing camera viewpoints. AR in a video sequence depends on both appearance within frames and optical flow across frames: the spatial and temporal components of video-frame features play an integral role in better classification of actions. In the proposed system, RGB frames and optical-flow frames are used for AR with the pre-trained Convolutional Neural Network (CNN) model AlexNet, extracting features from its fc7 layer. A support vector machine (SVM) classifier is then used for classification. For evaluation, the HMDB51 dataset is used, which is divided into 51 human action categories. Using the SVM classifier on the extracted features, the system achieves 95.6% accuracy, the best result compared to other state-of-the-art techniques.
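A minimal sketch of the two-stream fusion and SVM classification stage follows, with random vectors standing in for the 4096-dimensional fc7 AlexNet features; the reduced feature dimension, fusion by concatenation, and the linear kernel are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for fc7 features of RGB frames and optical-flow frames;
# in the described pipeline these would come from a pre-trained AlexNet.
n_clips, dim = 60, 64                      # reduced sizes for the sketch
labels = rng.integers(0, 3, n_clips)       # 3 toy action classes
rgb_feats = rng.normal(size=(n_clips, dim)) + labels[:, None]
flow_feats = rng.normal(size=(n_clips, dim)) + labels[:, None]

# Fuse the spatial (RGB) and temporal (flow) streams by concatenation,
# then classify the fused descriptor with a linear SVM.
fused = np.hstack([rgb_feats, flow_feats])  # (n_clips, 2*dim)
clf = SVC(kernel="linear").fit(fused[:40], labels[:40])
accuracy = clf.score(fused[40:], labels[40:])
```

The same fit/score pattern applies unchanged when the toy features are replaced by real fc7 activations; only the feature dimensionality (4096 per stream) differs.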


Author(s):  
C. Platias ◽  
M. Vakalopoulou ◽  
K. Karantzalos

In this paper we propose a deformable registration framework for high-resolution satellite video data that can automatically and accurately co-register satellite video frames and/or register them to a reference map/image. The proposed approach performs non-rigid registration by formulating a Markov Random Field (MRF) model, and efficient linear programming is employed to reach the lowest potential of the cost function. The developed approach has been applied and validated on satellite video sequences from Skybox Imaging and compared with a rigid, descriptor-based registration method. Regarding computational performance, both the MRF-based and the descriptor-based methods were quite efficient, with the former converging in minutes and the latter in seconds. Regarding registration accuracy, the proposed MRF-based method significantly outperformed the descriptor-based one in all performed experiments.
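As a much-simplified illustration of discrete MRF-based registration, consider a chain of control points each choosing among discrete displacement labels; a chain-structured MRF can be solved exactly by dynamic programming, which stands in here for the linear-programming optimization used in the paper (the cost structure and function name are assumptions):

```python
import numpy as np

def chain_mrf_register(unary, lam=1.0, disps=None):
    """Exact MAP displacement labels for a chain-structured MRF via
    dynamic programming: node i picks a discrete displacement label
    minimizing its unary matching cost plus a smoothness penalty
    lam * |d_i - d_{i-1}| toward its predecessor.
    `unary`: (n_nodes, n_labels) matching costs."""
    n, L = unary.shape
    if disps is None:
        disps = np.arange(L)
    cost = unary[0].astype(float).copy()
    back = np.zeros((n, L), dtype=int)
    for i in range(1, n):
        pair = lam * np.abs(disps[:, None] - disps[None, :])  # (prev, cur)
        total = cost[:, None] + pair
        back[i] = np.argmin(total, axis=0)  # best predecessor per label
        cost = unary[i] + np.min(total, axis=0)
    # Backtrack from the cheapest final label to recover the labeling.
    path = [int(np.argmin(cost))]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    return path[::-1]
```

With a small smoothness weight the nodes follow their individual matching costs; raising `lam` trades matching fidelity for a smoother displacement field, which is the essential tension in deformable registration.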


Author(s):  
Aliasghar Mortazi ◽  
Naji Khosravan ◽  
Drew A. Torigian ◽  
Sila Kurugol ◽  
Ulas Bagci

Author(s):  
Jiping Zheng ◽  
Ganfeng Lu

With the explosive growth of video data, video summarization, which converts long videos into key-frame sequences, has become an important task in information retrieval and machine learning. Determinantal point processes (DPPs), which are elegant probabilistic models, have been successfully applied to video summarization. However, existing DPP-based video summarization methods either output a summary of a specified size inefficiently or neglect the inherently sequential nature of videos. In this paper, we propose a new model in the DPP lineage, named k-SDPP, in the vein of sequential determinantal point processes but with a fixed, user-specified size k. Our k-SDPP partitions the sampled frames of a video into segments, each containing a constant number of frames. Moreover, an efficient branch and bound (BB) method that accounts for the sequential nature of the frames is provided to optimally select, from the divided segments, the k frames constituting the summary. Experimental results show that the proposed BB method outperforms not only k-DPP and sequential DPP (seqDPP) but also methods based on partitioning and Markovian assumptions.
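To illustrate determinant-based frame selection, a greedy MAP sketch for a k-DPP is shown below; greedy selection is a simple stand-in for the paper's exact branch and bound method, and the kernel construction from frame features is an assumption:

```python
import numpy as np

def greedy_kdpp_map(L, k):
    """Greedy MAP inference for a k-DPP: repeatedly add the frame that
    most increases the determinant of the kernel submatrix L[S, S].
    Larger determinants reward diverse (mutually dissimilar) frames."""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best, best_det = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            S = selected + [i]
            d = np.linalg.det(L[np.ix_(S, S)])
            if d > best_det:
                best, best_det = i, d
        selected.append(best)
    return sorted(selected)
```

With a similarity kernel built from frame features (e.g. `L = F @ F.T`), two near-duplicate frames make their 2x2 subdeterminant nearly zero, so the selection skips one of them in favor of a visually distinct frame.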

