Semi-supervised sequence modeling for improved behavioral segmentation

2021 ◽  
Author(s):  
Matthew R Whiteway ◽  
Evan S Schaffer ◽  
Anqi Wu ◽  
E Kelly Buchanan ◽  
Omer F Onder ◽  
...  

A popular approach to quantifying animal behavior from video data is through discrete behavioral segmentation, wherein video frames are labeled as containing one or more behavior classes such as walking or grooming. Sequence models learn to map behavioral features extracted from video frames to discrete behaviors, and both supervised and unsupervised methods are common. However, each approach has its drawbacks: supervised models require a time-consuming annotation step where humans must hand label the desired behaviors; unsupervised models may fail to accurately segment particular behaviors of interest. We introduce a semi-supervised approach that addresses these challenges by constructing a sequence model loss function with (1) a standard supervised loss that classifies a sparse set of hand labels; (2) a weakly supervised loss that classifies a set of easy-to-compute heuristic labels; and (3) a self-supervised loss that predicts the evolution of the behavioral features. With this approach, we show that a large number of unlabeled frames can improve supervised segmentation in the regime of sparse hand labels and also show that a small number of hand labeled frames can increase the precision of unsupervised segmentation.
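The three-term loss described above can be sketched as follows; the loss weights, the mean-squared-error form of the self-supervised term, and all function and variable names are illustrative assumptions rather than the authors' exact formulation:

```python
import numpy as np

def cross_entropy(probs, labels, mask):
    # Mean negative log-likelihood over the frames where `mask` is True.
    idx = np.where(mask)[0]
    if idx.size == 0:
        return 0.0
    return float(-np.mean(np.log(probs[idx, labels[idx]] + 1e-12)))

def composite_loss(probs, hand_labels, hand_mask,
                   heuristic_labels, heuristic_mask,
                   features, pred_next_features,
                   w_sup=1.0, w_weak=0.5, w_self=0.1):
    """Semi-supervised sequence-model loss with three terms:
    (1) supervised cross-entropy on a sparse set of hand labels,
    (2) weakly supervised cross-entropy on heuristic labels,
    (3) self-supervised prediction of next-step behavioral features."""
    l_sup = cross_entropy(probs, hand_labels, hand_mask)
    l_weak = cross_entropy(probs, heuristic_labels, heuristic_mask)
    # Predictions made at time t are compared to the true features at t+1.
    l_self = float(np.mean((pred_next_features[:-1] - features[1:]) ** 2))
    return w_sup * l_sup + w_weak * l_weak + w_self * l_self
```

The masks let the supervised and weakly supervised terms apply only to the (typically small) subsets of frames that carry hand or heuristic labels, while the self-supervised term uses every frame.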

Entropy ◽  
2019 ◽  
Vol 21 (4) ◽  
pp. 329 ◽  
Author(s):  
Yunqi Tang ◽  
Zhuorong Li ◽  
Huawei Tian ◽  
Jianwei Ding ◽  
Bingxian Lin

Accurately detecting gait events from video data is a challenging problem. Most current detection methods for gait events are based on wearable sensors, which require a high degree of user cooperation and are restricted by power consumption. This study presents a novel algorithm for accurately detecting toe-off events using a single 2D vision camera, without requiring the cooperation of participants. First, a set of novel features, namely consecutive silhouettes difference maps (CSD-maps), is proposed to represent the gait pattern. A CSD-map encodes several consecutive pedestrian silhouettes, extracted from video frames, into a single map. Different numbers of consecutive pedestrian silhouettes result in different types of CSD-maps, which provide significant features for toe-off event detection. A convolutional neural network is then employed to reduce the feature dimensionality and classify toe-off events. Experiments on a public database demonstrate that the proposed method achieves good detection accuracy.
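As an illustration of the CSD-map idea, here is a minimal sketch assuming binary 0/1 silhouette masks; the function name and the accumulation-by-sum are assumptions for illustration, not the paper's exact definition:

```python
import numpy as np

def csd_map(silhouettes):
    """Build a consecutive silhouettes difference map (CSD-map):
    accumulate the absolute differences between each pair of
    consecutive binary silhouette frames into a single 2D map.
    `silhouettes`: array-like of shape (N, H, W) with values in {0, 1}."""
    sil = np.asarray(silhouettes, dtype=np.int32)
    diffs = np.abs(np.diff(sil, axis=0))   # (N-1, H, W) frame-to-frame changes
    return diffs.sum(axis=0)               # (H, W) accumulated motion map
```

Varying the number N of consecutive silhouettes fed in yields the different CSD-map types the abstract mentions; each map can then be passed to the CNN classifier.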


Author(s):  
Zaid Al-Huda ◽  
Donghai Zhai ◽  
Yan Yang ◽  
Riyadh Nazar Ali Algburi

Deep convolutional neural networks (DCNNs) trained on pixel-level annotated images have achieved improvements in semantic segmentation, but the high cost of labeling training data greatly limits their applicability. Weakly supervised segmentation approaches can significantly reduce human labeling effort. In this paper, we introduce a new framework to generate high-quality initial pixel-level annotations. Using a hierarchical image segmentation algorithm to predict the boundary map, we select the optimal scale of high-quality hierarchies. In the initialization step, scribble annotations and a saliency map are combined to construct a graphical model over the optimal-scale segmentation; by solving the minimal cut problem, information is spread from the scribbles to unmarked regions. During training, the segmentation network is trained on these initial pixel-level annotations. To iteratively optimize the segmentation, we use the graphical model to refine the segmentation masks and retrain the network to obtain more precise pixel-level annotations. Experimental results on the Pascal VOC 2012 dataset demonstrate that the proposed framework outperforms most weakly supervised semantic segmentation methods and achieves state-of-the-art performance of [Formula: see text] mIoU.
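The scribble-to-region label spreading via a minimal cut can be illustrated on a tiny region graph; the capacity construction below (saliency-based unary ties to the terminals, a constant smoothness weight between neighbors) and the use of SciPy's max-flow solver are simplifying assumptions, not the paper's exact graphical model:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_flow

def mincut_labels(n, fg_seeds, bg_seeds, saliency, edges, lam=2, big=1000):
    """Spread scribble labels to unmarked regions via an s-t minimal cut.
    Nodes 0..n-1 are image regions; node n is the source (foreground
    terminal) and node n+1 the sink (background terminal). Seed regions
    get a large capacity tie to their terminal, unmarked regions get
    saliency-based unary capacities, and neighboring regions share a
    smoothness capacity `lam` (all capacities must be integers)."""
    S, T = n, n + 1
    cap = np.zeros((n + 2, n + 2), dtype=np.int32)
    for i in range(n):
        cap[S, i] = big if i in fg_seeds else int(10 * saliency[i])
        cap[i, T] = big if i in bg_seeds else int(10 * (1 - saliency[i]))
    for i, j in edges:                      # undirected smoothness edges
        cap[i, j] = cap[j, i] = lam
    res = maximum_flow(csr_matrix(cap), S, T)
    residual = cap - res.flow.toarray()     # residual capacities
    # BFS from the source over positive-residual edges: the reachable
    # region nodes form the foreground side of the minimal cut.
    seen, stack = {S}, [S]
    while stack:
        u = stack.pop()
        for v in np.nonzero(residual[u] > 0)[0]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return [1 if i in seen else 0 for i in range(n)]
```

On a chain of four regions with a foreground scribble on one end, a background scribble on the other, and saliency decreasing along the chain, the cut falls where the saliency evidence flips, assigning the unmarked middle regions accordingly.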


Action recognition (AR) plays a fundamental role in computer vision and video analysis. We are witnessing an astronomical increase of video data on the web, and recognizing actions in video is difficult due to differing camera viewpoints. AR in a video sequence depends on both appearance within frames and optical flow across frames: the spatial and temporal components of video-frame features play an integral role in better classification of actions. In the proposed system, RGB frames and optical-flow frames are used for AR with the pre-trained Convolutional Neural Network (CNN) model AlexNet, extracting features from its fc7 layer. A support vector machine (SVM) classifier is then used for classification. For evaluation, the HMDB51 dataset is used, which is divided into 51 human action categories. Using the SVM classifier on the extracted features, the system achieves 95.6% accuracy, the best result compared to other state-of-the-art techniques.
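A minimal sketch of the two-stream fusion and SVM classification stage follows, with random vectors standing in for the 4096-dimensional fc7 AlexNet features; the reduced feature dimension, fusion by concatenation, and the linear kernel are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for fc7 features of RGB frames and optical-flow frames;
# in the described pipeline these would come from a pre-trained AlexNet.
n_clips, dim = 60, 64                      # reduced sizes for the sketch
labels = rng.integers(0, 3, n_clips)       # 3 toy action classes
rgb_feats = rng.normal(size=(n_clips, dim)) + labels[:, None]
flow_feats = rng.normal(size=(n_clips, dim)) + labels[:, None]

# Fuse the spatial (RGB) and temporal (flow) streams by concatenation,
# then classify the fused descriptor with a linear SVM.
fused = np.hstack([rgb_feats, flow_feats])  # (n_clips, 2*dim)
clf = SVC(kernel="linear").fit(fused[:40], labels[:40])
accuracy = clf.score(fused[40:], labels[40:])
```

The same fit/score pattern applies unchanged when the toy features are replaced by real fc7 activations; only the feature dimensionality (4096 per stream) differs.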


Author(s):  
C. Platias ◽  
M. Vakalopoulou ◽  
K. Karantzalos

In this paper we propose a deformable registration framework for high-resolution satellite video data that can automatically and accurately co-register satellite video frames and/or register them to a reference map/image. The proposed approach performs non-rigid registration by formulating a Markov Random Field (MRF) model, and efficient linear programming is employed to reach the lowest potential of the cost function. The developed approach has been applied and validated on satellite video sequences from Skybox Imaging and compared with a rigid, descriptor-based registration method. Regarding computational performance, both the MRF-based and the descriptor-based methods were quite efficient, with the former converging in minutes and the latter in seconds. Regarding registration accuracy, the proposed MRF-based method significantly outperformed the descriptor-based one in all performed experiments.
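As a much-simplified illustration of discrete MRF-based registration, consider a chain of control points each choosing among discrete displacement labels; a chain-structured MRF can be solved exactly by dynamic programming, which stands in here for the linear-programming optimization used in the paper (the cost structure and function name are assumptions):

```python
import numpy as np

def chain_mrf_register(unary, lam=1.0, disps=None):
    """Exact MAP displacement labels for a chain-structured MRF via
    dynamic programming: node i picks a discrete displacement label
    minimizing its unary matching cost plus a smoothness penalty
    lam * |d_i - d_{i-1}| toward its predecessor.
    `unary`: (n_nodes, n_labels) matching costs."""
    n, L = unary.shape
    if disps is None:
        disps = np.arange(L)
    cost = unary[0].astype(float).copy()
    back = np.zeros((n, L), dtype=int)
    for i in range(1, n):
        pair = lam * np.abs(disps[:, None] - disps[None, :])  # (prev, cur)
        total = cost[:, None] + pair
        back[i] = np.argmin(total, axis=0)  # best predecessor per label
        cost = unary[i] + np.min(total, axis=0)
    # Backtrack from the cheapest final label to recover the labeling.
    path = [int(np.argmin(cost))]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    return path[::-1]
```

With a small smoothness weight the nodes follow their individual matching costs; raising `lam` trades matching fidelity for a smoother displacement field, which is the essential tension in deformable registration.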


Author(s):  
Aliasghar Mortazi ◽  
Naji Khosravan ◽  
Drew A. Torigian ◽  
Sila Kurugol ◽  
Ulas Bagci

Author(s):  
Jiping Zheng ◽  
Ganfeng Lu

With the explosive growth of video data, video summarization, which converts long videos into key-frame sequences, has become an important task in information retrieval and machine learning. Determinantal point processes (DPPs), which are elegant probabilistic models, have been successfully applied to video summarization. However, existing DPP-based video summarization methods either output a summary of a specified size inefficiently or neglect the inherently sequential nature of videos. In this paper, we propose a new model in the DPP lineage, named k-SDPP, in the vein of sequential determinantal point processes but with a fixed, user-specified size k. Our k-SDPP partitions the sampled frames of a video into segments, each containing a constant number of frames. Moreover, an efficient branch and bound (BB) method that accounts for the sequential nature of the frames is provided to optimally select, from the divided segments, the k frames constituting the summary. Experimental results show that the proposed BB method outperforms not only k-DPP and sequential DPP (seqDPP) but also methods based on partitioning and Markovian assumptions.
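To illustrate determinant-based frame selection, a greedy MAP sketch for a k-DPP is shown below; greedy selection is a simple stand-in for the paper's exact branch and bound method, and the kernel construction from frame features is an assumption:

```python
import numpy as np

def greedy_kdpp_map(L, k):
    """Greedy MAP inference for a k-DPP: repeatedly add the frame that
    most increases the determinant of the kernel submatrix L[S, S].
    Larger determinants reward diverse (mutually dissimilar) frames."""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best, best_det = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            S = selected + [i]
            d = np.linalg.det(L[np.ix_(S, S)])
            if d > best_det:
                best, best_det = i, d
        selected.append(best)
    return sorted(selected)
```

With a similarity kernel built from frame features (e.g. `L = F @ F.T`), two near-duplicate frames make their 2x2 subdeterminant nearly zero, so the selection skips one of them in favor of a visually distinct frame.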

