Superpixel-Based Temporally Aligned Representation for Video-Based Person Re-Identification

Most existing person re-identification methods focus on matching still person images across non-overlapping camera views. Despite their excellent performance in some circumstances, these methods still suffer from occlusion and the changes of pose, viewpoint or lighting. Video-based re-id is a natural way to overcome these problems, by exploiting space–time information from videos. One of the most challenging problems in video-based person re-identification is temporal alignment, in addition to spatial alignment. To address the problem, we propose an effective superpixel-based temporally aligned representation for video-based person re-identification, which represents a video sequence only using one walking cycle. Particularly, we first build a candidate set of walking cycles by extracting motion information at superpixel level, which is more robust than that at the pixel level. Then, from the candidate set, we propose an effective criterion to select the walking cycle most matching the intrinsic periodicity property of walking persons. Finally, we propose a temporally aligned pooling scheme to describe the video data in the selected walking cycle. In addition, to characterize the individual still images in the cycle, we propose a superpixel-based representation to improve spatial alignment. Extensive experimental results on three public datasets demonstrate the effectiveness of the proposed method compared with the state-of-the-art approaches.

Download Full-text

Video-Based Person Re-Identification With Unregulated Sequences

International Journal of Digital Crime and Forensics ◽

10.4018/ijdcf.2020040104 ◽

2020 ◽

Vol 12 (2) ◽

pp. 59-76

Author(s):

Wenjun Huang ◽

Chao Liang ◽

Chunxia Xiao ◽

Zhen Han

Keyword(s):

Video Sequence ◽

Feature Representation ◽

Original Video ◽

Sequence Detection ◽

Video Images ◽

Discriminative Feature ◽

Refinement Method ◽

Public Datasets ◽

The Impact ◽

Time Information

Video-based person re-identification (re-id) has recently attracted widespread attentions because extra space-time information and more appearance cues in videos can be used to improve the performance of image-based person re-id. Most existing approaches equally treat person video images, ignoring their individual discrepancy. However, in real scenarios, captured images are usually contaminated by various noises, especially occlusions, resulting in a series of unregulated sequences. Through investigating the impact of unregulated sequences to feature representation of video-based person re-id, the authors find a remarkable promotion by eliminating noisy sub sequences. Based on this interesting finding, an adaptive unregulated sub sequence detection and refinement method is proposed to purify original video sequence and obtain a more effective and discriminative feature representation for video-based person re-id. Experimental results on two public datasets demonstrate that the proposed method outperforms the state-of-the-art work.

Download Full-text

Classification of Action Based Video using Heterogeneous Feature Extraction and SVM

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2089.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 1887-1892

Keyword(s):

Optical Flow ◽

Video Sequence ◽

Human Action ◽

Video Data ◽

Support Vector ◽

Svm Classifier ◽

Video Frames ◽

Integral Role ◽

Heterogeneous Feature

Action recognition (AR) plays a fundamental role in computer vision and video analysis. We are witnessing an astronomical increase of video data on the web and it is difficult to recognize the action in video due to different view point of camera. For AR in video sequence, it depends upon appearance in frame and optical flow in frames of video. In video spatial and temporal components of video frames features play integral role for better classification of action in videos. In the proposed system, RGB frames and optical flow frames are used for AR with the help of Convolutional Neural Network (CNN) pre-trained model Alex-Net extract features from fc7 layer. Support vector machine (SVM) classifier is used for the classification of AR in videos. For classification purpose, HMDB51 dataset have been used which includes 51 Classes of human action. The dataset is divided into 51 action categories. Using SVM classifier, extracted features are used for classification and achieved best result 95.6% accuracy as compared to other techniques of the state-of- art.v

Download Full-text

Data Model and Software Architecture for Business Process Model Generator

JOIV International Journal on Informatics Visualization ◽

10.30630/joiv.2.4-2.176 ◽

2018 ◽

Vol 2 (4-2) ◽

pp. 349

Author(s):

Ivaylo Kamenarov ◽

Katalina Grigorova

Keyword(s):

Software Architecture ◽

Business Process ◽

Data Model ◽

Process Model ◽

Process Models ◽

Process Chain ◽

Business Process Models ◽

Model Generator ◽

The Individual ◽

Natural Way

This paper describes the internal data model for a business process generator. Business process models are stored in an Event-driven process chain notation that provides a natural way to link the individual elements of a process. There is a software architecture that makes it easy to communicate with users as well as external systems.

Download Full-text

Within and Between Shot Information Utilisation in Video Key Frame Extraction

Journal of Information & Knowledge Management ◽

10.1142/s0219649211002961 ◽

2011 ◽

Vol 10 (03) ◽

pp. 247-259 ◽

Cited By ~ 4

Author(s):

Dianting Liu ◽

Mei-Ling Shyu ◽

Chao Chen ◽

Shu-Ching Chen

Keyword(s):

Extraction Method ◽

Video Sequence ◽

Video Retrieval ◽

Video Browsing ◽

High Quality ◽

Key Frame Extraction ◽

Key Frame ◽

Video Shot ◽

Key Frames ◽

Candidate Set

In consequence of the popularity of family video recorders and the surge of Web 2.0, increasing amounts of videos have made the management and integration of the information in videos an urgent and important issue in video retrieval. Key frames, as a high-quality summary of videos, play an important role in the areas of video browsing, searching, categorisation, and indexing. An effective set of key frames should include major objects and events of the video sequence, and should contain minimum content redundancies. In this paper, an innovative key frame extraction method is proposed to select representative key frames for a video. By analysing the differences between frames and utilising the clustering technique, a set of key frame candidates (KFCs) is first selected at the shot level, and then the information within a video shot and between video shots is used to filter the candidate set to generate the final set of key frames. Experimental results on the TRECVID 2007 video dataset have demonstrated the effectiveness of our proposed key frame extraction method in terms of the percentage of the extracted key frames and the retrieval precision.

Download Full-text

To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019159 ◽

2019 ◽

Vol 33 ◽

pp. 9159-9166 ◽

Cited By ~ 16

Author(s):

Yitian Yuan ◽

Tao Mei ◽

Wenwu Zhu

Keyword(s):

Video Sequence ◽

Temporal Structure ◽

Research Community ◽

Location Prediction ◽

Video Content ◽

Global Context ◽

Effectiveness And Efficiency ◽

Public Datasets ◽

Temporal Localization ◽

Informative Text

We have witnessed the tremendous growth of videos over the Internet, where most of these videos are typically paired with abundant sentence descriptions, such as video titles, captions and comments. Therefore, it has been increasingly crucial to associate specific video segments with the corresponding informative text descriptions, for a deeper understanding of video content. This motivates us to explore an overlooked problem in the research community — temporal sentence localization in video, which aims to automatically determine the start and end points of a given sentence within a paired video. For solving this problem, we face three critical challenges: (1) preserving the intrinsic temporal structure and global context of video to locate accurate positions over the entire video sequence; (2) fully exploring the sentence semantics to give clear guidance for localization; (3) ensuring the efficiency of the localization method to adapt to long videos. To address these issues, we propose a novel Attention Based Location Regression (ABLR) approach to localize sentence descriptions in videos in an efficient end-to-end manner. Specifically, to preserve the context information, ABLR first encodes both video and sentence via Bi-directional LSTM networks. Then, a multi-modal co-attention mechanism is presented to generate both video and sentence attentions. The former reflects the global video structure, while the latter highlights the sentence details for temporal localization. Finally, a novel attention based location prediction network is designed to regress the temporal coordinates of sentence from the previous attentions. We evaluate the proposed ABLR approach on two public datasets ActivityNet Captions and TACoS. Experimental results show that ABLR significantly outperforms the existing approaches in both effectiveness and efficiency.

Download Full-text

Multimodal Registration Procedure for the Initial Spatial Alignment of a Retinal Video Sequence to a Retinal Composite Image

IEEE Transactions on Biomedical Engineering ◽

10.1109/tbme.2010.2048710 ◽

2010 ◽

Vol 57 (8) ◽

pp. 1991-2000 ◽

Cited By ~ 3

Author(s):

A Martina Broehan ◽

Christoph Tappeiner ◽

Simon P Rothenbuehler ◽

Tobias Rudolph ◽

Christoph A Amstutz ◽

...

Keyword(s):

Video Sequence ◽

Composite Image ◽

Multimodal Registration ◽

Registration Procedure ◽

Spatial Alignment

Download Full-text

AUTOMATIC PEDESTRIAN SEGMENTATION COMBINING SHAPE, PUZZLE AND APPEARANCE

International Journal of Artificial Intelligence Tools ◽

10.1142/s021821301360004x ◽

2013 ◽

Vol 22 (05) ◽

pp. 1360004 ◽

Cited By ~ 1

Author(s):

YANLI LI ◽

ZHONG ZHOU ◽

WEI WU

Keyword(s):

Integration Scheme ◽

Partial Occlusion ◽

Still Images ◽

Multiple Cues ◽

Qualitative And Quantitative ◽

Novel Approach ◽

Public Datasets ◽

Human Silhouette ◽

Quantitative Evaluations

In this paper, we address the problem of automatically segmenting non-rigid pedestrians in still images. Since this task is well known difficult for any type of model or cue alone, a novel approach utilizing shape, puzzle and appearance cues is presented. The major contribution of this approach lies in the combination of multiple cues to refine pedestrian segmentation successively, which has two characterizations: (1) a shape guided puzzle integration scheme, which extracts pedestrians via assembling puzzles with constraint of a shape template; (2) a pedestrian refinement scheme, which is fulfilled by optimizing an automatically generated trimap that encodes both human silhouette and skeleton. Qualitative and quantitative evaluations on several public datasets verify the approach's effectiveness to various articulated bodies, human appearance and partial occlusion, and that this approach is able to segment pedestrians more accurately than methods based only on appearance or shape cue.

Download Full-text

ROBUST COLOR OBJECT TRACKING WITH APPLICATION TO PEOPLE MONITORING

International Journal of Image and Graphics ◽

10.1142/s0219467807002647 ◽

2007 ◽

Vol 07 (02) ◽

pp. 227-254

Author(s):

JI TAO ◽

YAP-PENG TAN ◽

WENMIAO LU

Keyword(s):

Monitoring System ◽

Video Sequence ◽

Video Data ◽

Color Features ◽

Color Histograms ◽

Monitoring Camera ◽

Real Video ◽

Closed Environment ◽

Tracking And Recognition

We present an automated and complete camera-based monitoring system that makes use of low-level color features to perform detection, tracking and recognition of multiple people in video sequence. Specifically, the system employs a novel coverage check-up method to segment detected foreground regions into isolated people and then localize each of them. During tracking, the appearances of people are modeled by their color histograms so that the system can keep aware of their identities and recognize them after occlusions by maximizing the joint likelihood. To make the recognition more robust against shadows or changes of background illumination, the system also incorporates a shadow removal scheme to suppress shadow effects and hence improve the quality of color histogram. The proposed system has been used to identify people who re-enter the field of view of a monitoring camera in a closed-environment. Experimental results of real video data demonstrate the efficacy of the proposed people monitoring system.

Download Full-text

Seabed video and still images from the northern Weddell Sea and the western flanks of the Powell Basin

Earth System Science Data ◽

10.5194/essd-13-609-2021 ◽

2021 ◽

Vol 13 (2) ◽

pp. 609-615

Author(s):

Autun Purser ◽

Simon Dreutter ◽

Huw Griffiths ◽

Laura Hehemann ◽

Kerstin Jerosch ◽

...

Keyword(s):

Digital Camera ◽

Weddell Sea ◽

Video Data ◽

High Definition ◽

Short Baseline ◽

Still Images ◽

Still Image ◽

Topographic Complexity ◽

Camera Systems ◽

Quality Video

Abstract. Research vessels equipped with fibre optic and copper-cored coaxial cables support the live onboard inspection of high-bandwidth marine data in real time. This allows for towed still-image and video sleds to be equipped with latest-generation higher-resolution digital camera systems and additional sensors. During RV Polarstern expedition PS118 in February–April 2019, the recently developed Ocean Floor Observation and Bathymetry System (OFOBS) of the Alfred Wegener Institute was used to collect still-image and video data from the seafloor at a total of 11 predominantly ice-covered locations in the northern Weddell Sea and Powell Basin. Still images of 26-megapixel resolution and HD (high-definition) quality video data were recorded throughout each deployment. In addition to downward-facing video and still-image cameras, the OFOBS also mounted side-scanning and forward-facing acoustic systems, which facilitated safe deployment in areas of high topographic complexity, such as above the steep flanks of the Powell Basin and the rapidly shallowing, iceberg-scoured Nachtigaller Shoal. To localise collected data, the OFOBS system was equipped with a Posidonia transponder for ultra-short baseline triangulation of OFOBS positions. All images are available from: https://doi.org/10.1594/PANGAEA.911904 (Purser et al., 2020).

Download Full-text

Intelligent Querying in Camera Networks for Efficient Target Tracking

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/918 ◽

2019 ◽

Author(s):

Anil Sharma

Keyword(s):

Target Tracking ◽

Decision Process ◽

Visual Analytics ◽

High Accuracy ◽

Video Data ◽

Camera Network ◽

Markov Decision ◽

Target Templates ◽

The Individual ◽

Identification Techniques

Visual analytics applications often rely on target tracking across a network of cameras for inference and prediction. A network of cameras generates immense amount of video data and processing it for tracking a target is highly computationally expensive. Related works typically use data association and visual re-identification techniques to match target templates across multiple cameras. In this thesis, I propose to formulate this scheduling problem as a Markov Decision Process (MDP) and present a reinforcement learning based solution to schedule cameras by selecting one where the target is most likely to appear next. The proposed approach can be learned directly from data and doesn't require any information of the camera network topology. NLPR MCT and DukeMTMC datasets are used to show that the proposed policy significantly reduces the number of frames to be processed for tracking and identifies the camera schedule with high accuracy as compared to the related approaches. Finally, I will be formulating an end-to-end pipeline for target tracking that will learn a policy to find the camera schedule and to track the target in the individual camera frames of the schedule.

Download Full-text