Machine Learning for Human Motion Analysis
Latest Publications

Total documents: 13 (five years: 0)
H-index: 1 (five years: 0)
Published by IGI Global
ISBN: 9781605669007, 9781605669014

Author(s): Dong Seon Cheng, Marco Cristani, Vittorio Murino

Image super-resolution is one of the most appealing applications of image processing: it retrieves a high-resolution image by fusing several registered low-resolution images depicting an object of interest. However, applying super-resolution to video data is challenging: a video sequence generally contains scattered information about several objects of interest in cluttered scenes, and, especially with hand-held cameras, the overall quality may be poor due to low resolution or unsteadiness. The objective of this chapter is to demonstrate why standard image super-resolution fails on video data, what problems arise, and how we can overcome them.

In our first contribution, we propose a novel Bayesian framework for super-resolution of persistent objects of interest in video sequences, a process we call Distillation. In the traditional formulation of the image super-resolution problem, the observed target is (1) always the same, (2) acquired by a camera making small movements, and (3) present in a number of low-resolution images sufficient to recover high-frequency information. These assumptions are usually unsatisfied in real-world video acquisitions and often beyond the control of the video operator. With Distillation, we aim to extend and generalize the image super-resolution task, embedding it in a structured framework that accurately distills all the informative bits of an object of interest. In practice, the Distillation process: (i) identifies, in a semi-supervised way, a set of objects of interest, clustering the related video frames and registering them with respect to global rigid transformations; and (ii) produces a high-resolution image for each object, weighting each pixel according to the information retrieved about that object.

As a second contribution, we extend the Distillation process to deal with objects of interest whose appearance transformations are not (only) rigid. This process, built on top of Distillation, is hierarchical: clustering is applied recursively, beginning with the analysis of whole frames and selectively focusing on smaller sub-regions whose isolated motion can reasonably be assumed rigid. The ultimate product of the overall process is a strip of images that describes the dynamics of the video at high resolution, switching between alternative local descriptions in response to visual changes. Our approach is first tested on synthetic data, yielding encouraging results compared with known super-resolution techniques and good robustness against noise. Second, real data from different videos are considered, aiming to recover the fine details of the objects in motion.
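To make the fusion step concrete, below is a minimal sketch of the classical multi-frame principle the chapter builds on: registered low-resolution frames are accumulated on a high-resolution grid with per-pixel confidence weights, loosely mirroring how Distillation weights each pixel by how informative it is about the object of interest. The function names, the nearest-neighbor upsampling, and the weighting scheme are illustrative assumptions, not the chapter's actual algorithm.

```python
import numpy as np

def fuse_registered_frames(frames, weights, scale=2):
    """Fuse registered low-resolution frames onto a high-resolution grid.

    frames  -- list of HxW float arrays, already registered (aligned)
    weights -- per-frame HxW confidence maps (how informative each pixel
               is about the object of interest)
    scale   -- super-resolution magnification factor
    """
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    norm = np.zeros_like(acc)
    for frame, weight in zip(frames, weights):
        # Upsample by pixel replication; a real system would place each
        # pixel using its sub-pixel registration offset instead.
        up_f = np.kron(frame, np.ones((scale, scale)))
        up_w = np.kron(weight, np.ones((scale, scale)))
        acc += up_w * up_f
        norm += up_w
    # Weighted average; the epsilon guards pixels no frame observed.
    return acc / np.maximum(norm, 1e-8)
```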


Author(s): Qingdi Wei, Xiaoqin Zhang, Weiming Hu

Action recognition is one of the most active research fields in computer vision. This chapter first reviews the action recognition methods in the literature from two aspects: action representation and recognition strategy. Then, a novel method for classifying human actions from image sequences is investigated. In this method, each human action is represented by a sequence of shape-context features of the human silhouette during the action, and a dominant-set-based approach is employed to classify the action into the predefined classes. The dominant-set-based approach to classification is compared with K-means, mean shift, and fuzzy C-means approaches.
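For readers unfamiliar with dominant sets, the sketch below shows the standard replicator-dynamics iteration used to extract one dominant set from a pairwise affinity matrix; the chapter's classifier can be thought of as building such affinities from shape-context distances between silhouette sequences. The support threshold and iteration limits are illustrative assumptions.

```python
import numpy as np

def dominant_set(A, tol=1e-6, max_iter=1000):
    """Extract one dominant set from affinity matrix A (symmetric,
    non-negative, zero diagonal) via discrete replicator dynamics."""
    n = A.shape[0]
    x = np.ones(n) / n                      # start from the barycenter
    for _ in range(max_iter):
        x_new = x * (A @ x)                 # replicator update
        x_new /= x_new.sum()
        if np.linalg.norm(x_new - x, 1) < tol:
            break
        x = x_new
    # Heuristic: members of the dominant set are the support of x.
    return np.flatnonzero(x > 1.0 / n**2)
```

One hedged way to use this for classification: insert the query sequence into the affinity graph of labeled training sequences and assign it the class that dominates the set it joins.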


Author(s): Scott Blunsden, Robert Fisher

This chapter presents a way to classify interactions between people. Examples of the interactions we investigate are: people meeting one another, walking together, and fighting. A new feature set is proposed along with a corresponding classification method. Results are presented which show the new method performing significantly better than the previous state-of-the-art method proposed by Oliver et al. (2000).


Author(s): Wanqing Li, Zhengyou Zhang, Zicheng Liu, Philip Ogunbona

This chapter first presents a brief review of recent developments in human action recognition. In particular, the principle and shortcomings of the conventional Hidden Markov Model (HMM) and its variants are discussed. We then introduce an expandable graphical model that represents the dynamics of human actions using a weighted directed graph, referred to as an action graph. Unlike the conventional HMM, the action graph is shared by all actions to be recognized, with each action encoded in one or multiple paths; it can thus be trained effectively and efficiently from a small number of samples. Furthermore, the action graph can be expanded to incorporate new actions without retraining the existing ones or compromising their recognition. To verify the performance of the proposed expandable graphical model, a system that learns and recognizes human actions from sequences of silhouettes is developed, and promising results are obtained.
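As an illustration of how a shared action graph might be decoded, here is a hedged sketch: each action is a set of paths through the shared graph, and an observed sequence is scored against each action by dynamic programming over its paths. The alignment scheme, smoothing constant, and data layout are assumptions, not the authors' exact decoding rule.

```python
import numpy as np

def action_score(obs_lik, trans, paths):
    """Score one action against frame-wise observation likelihoods.

    obs_lik -- T x S array: likelihood of each graph state per frame
    trans   -- S x S weighted adjacency of the shared action graph
    paths   -- list of state-index sequences encoding this action
    """
    best = -np.inf
    for path in paths:
        T = obs_lik.shape[0]
        # dp[t, k]: best log-score of explaining frames 0..t while
        # sitting at the k-th state of this path.
        dp = np.full((T, len(path)), -np.inf)
        dp[0, 0] = np.log(obs_lik[0, path[0]] + 1e-12)
        for t in range(1, T):
            for k in range(len(path)):
                stay = dp[t - 1, k] + np.log(trans[path[k], path[k]] + 1e-12)
                move = (dp[t - 1, k - 1] +
                        np.log(trans[path[k - 1], path[k]] + 1e-12)) if k else -np.inf
                dp[t, k] = max(stay, move) + np.log(obs_lik[t, path[k]] + 1e-12)
        best = max(best, dp[-1, -1])
    return best
```

The most likely action is then the argmax of this score over all actions; note that adding a new action only adds paths (and possibly states), leaving existing scores untouched, which is the expandability property described above.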


Author(s): Konrad Schindler, Luc van Gool

Visual categorisation of human motion in video clips has been an active field of research in recent years. However, most published methods either analyse an entire video and assign it a single category label, or use a relatively large look-ahead to classify each frame. Contrary to these strategies, the human visual system proves that simple categories can be recognised almost instantaneously. Here we present a system for categorisation from very short sequences ("snippets") of 1–10 frames, and systematically evaluate it on several data sets. It turns out that even local shape and optic flow for a single frame are enough to achieve approximately 80–90% correct classification, and snippets of 5–7 frames (0.2–0.3 seconds of video) yield results on par with those state-of-the-art methods obtain on entire video sequences.
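A minimal way to reproduce the snippet idea with off-the-shelf tools: classify every frame independently from its shape/flow descriptor and take a majority vote over the 1–10 frames. The feature extraction and the classifier here are placeholders; the chapter's own pipeline differs.

```python
import numpy as np

def classify_snippet(frame_features, frame_classifier):
    """Label a short snippet by voting over per-frame predictions.

    frame_features   -- list of per-frame feature vectors (e.g., local
                        shape and optic-flow descriptors)
    frame_classifier -- any fitted classifier with a .predict method
    """
    votes = frame_classifier.predict(np.vstack(frame_features))
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]  # majority vote over the snippet
```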


Author(s): Vassilis Syrris

This work describes a simple, computationally efficient, appearance-based approach both for human pose recovery and for real-time recognition of basic human actions. We apply a technique that computes the differences between two or more successive frames and use a threshold filter to detect the regions of the video frames where some type of human motion is observed. From each frame difference, the algorithm extracts an incomplete and rough human body shape and generates a skeleton model which represents it in an abstract way. Eventually, the recognition process is formulated as a time-series problem and handled by a robust and accurate prediction method, Support Vector Regression. The proposed technique could be employed in applications such as surveillance and security systems.
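The frame-differencing front end is simple enough to sketch directly; the threshold value below is an illustrative assumption:

```python
import numpy as np

def motion_mask(prev_frame, frame, threshold=25):
    """Binary mask of regions where motion is observed, obtained by
    thresholding the absolute difference of successive grayscale frames."""
    # Cast to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)
```

Downstream, the per-frame skeleton parameters form a time series that a regressor such as scikit-learn's sklearn.svm.SVR could fit, mirroring the Support Vector Regression step described above.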


Author(s): YingLi Tian, Rogerio Feris, Lisa Brown, Daniel Vaquero, Yun Zhai, ...

Visual processing of people, including detection, tracking, recognition, and behavior interpretation, is a key component of intelligent video surveillance systems. Computer vision algorithms with the capability of "looking at people" at multiple scales can be applied in different surveillance scenarios, such as far-field people detection for wide-area perimeter protection, mid-field people detection for retail/banking applications or parking lot monitoring, and near-field people/face detection for facility security and access. In this chapter, we address the people detection problem at different scales, as well as human tracking and motion analysis, for real video surveillance applications including people search, retail loss prevention, people counting, and display effectiveness.
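As a concrete, if generic, stand-in for the mid-field case, OpenCV ships a HOG-based pedestrian detector that already operates over multiple scales; the chapter's own far-, mid-, and near-field detectors are more specialized, so treat this only as an illustration of multi-scale detection.

```python
import cv2

def detect_people(image):
    """Multi-scale pedestrian detection with OpenCV's default
    HOG + linear-SVM people detector."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    # Scan a sliding window over an image pyramid (scale step 1.05).
    rects, weights = hog.detectMultiScale(image, winStride=(8, 8), scale=1.05)
    return rects  # (x, y, w, h) boxes at whatever scale people appear
```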


Author(s): Lei Zhang, Jixu Chen, Zhi Zeng, Qiang Ji

Upper body tracking is the problem of tracking the pose of the human body from video sequences. It is difficult due to problems such as the high dimensionality of the state space, self-occlusion, and appearance changes. In this chapter, we propose a generic framework that can be used for both 2D and 3D upper body tracking and can be easily parameterized without depending heavily on supervised training. We first construct a Bayesian Network (BN) to represent the human upper body structure and then incorporate into the BN various generic physical and anatomical constraints on the parts of the upper body. Unlike existing upper body models, we aim at handling physically feasible body motions rather than only some typical motions. We also explicitly model body part occlusion, which allows the model to automatically detect the occurrence of self-occlusion and to minimize the effect of occlusion-induced measurement errors on tracking accuracy. Using the proposed model, upper body tracking can be performed through probabilistic inference over time. A series of experiments were performed on both monocular and stereo video sequences to demonstrate the effectiveness and capability of the model in improving upper body tracking accuracy and robustness.
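A toy illustration of the kind of generic anatomical constraint such a model can encode: pose hypotheses violating hard joint-angle ranges receive zero probability before any image evidence is weighed. The joints, ranges, and representation below are hypothetical, not the authors' BN.

```python
def feasible(pose, limits):
    """Reject physically infeasible upper-body hypotheses.

    pose   -- dict of joint angles in degrees, e.g. {'l_elbow': 30.0}
    limits -- dict of (low, high) anatomical ranges per joint
    """
    return all(limits[j][0] <= a <= limits[j][1] for j, a in pose.items())

# Illustrative anatomical ranges (hypothetical values, in degrees):
LIMITS = {'l_elbow': (0.0, 150.0), 'r_elbow': (0.0, 150.0),
          'l_shoulder': (-60.0, 180.0), 'r_shoulder': (-60.0, 180.0)}
```

In a tracking loop, only hypotheses passing such checks would be weighted by their image likelihood and propagated to the next frame.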


Author(s): Therdsak Tangkuampien, David Suter

A marker-less motion capture system, based on machine learning, is proposed and tested. Pose information is inferred from images captured from multiple (as few as two) synchronized cameras. The central concept is what we call Kernel Subspace Mapping (KSM). The images-to-pose learning could be done with large numbers of images of a large variety of people (with the ground-truth poses accurately known). Of course, obtaining the ground-truth poses could be problematic, so here we choose to use synthetic data (for learning and, at least in part, for testing). The system needs to generalize well to novel inputs: unseen poses (not in the training database) and unseen actors. For the learning we use a generic and relatively low-fidelity computer graphics model, and for testing we sometimes use a more accurate model (made to resemble the first author). What makes machine learning viable for human motion capture is that a high percentage of human motion is coordinated. Indeed, it is now relatively well known that there is large redundancy in the set of possible images of a human (these images form some sort of relatively smooth low-dimensional manifold in the huge-dimensional space of all possible images) and in the set of pose angles (again, a low-dimensional and smooth sub-manifold of the moderately high-dimensional space of all possible joint angles). KSM is based on the Kernel PCA (KPCA) algorithm, which is costly. We show that the Greedy Kernel PCA (GKPCA) algorithm can be used to speed up KSM, with relatively minor modifications. At the core, then, are two KPCAs (or two GKPCAs): one for learning the pose manifold and one for learning the image manifold. We then use a modification of Locally Linear Embedding (LLE) to bridge between the pose and image manifolds.
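A compressed sketch of the KSM idea using scikit-learn's KernelPCA. The chapter uses KPCA/GKPCA plus a modified LLE; here, random arrays stand in for real image features and joint angles, and a least-squares linear map stands in for the LLE bridge, so this is only a structural analogy.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Hypothetical training data: image descriptors and ground-truth joint angles.
X_img = np.random.rand(500, 1024)   # stand-in for image features
Y_pose = np.random.rand(500, 54)    # stand-in for joint-angle vectors

kpca_img = KernelPCA(n_components=20, kernel='rbf')
kpca_pose = KernelPCA(n_components=20, kernel='rbf',
                      fit_inverse_transform=True)  # enables pre-images

Z_img = kpca_img.fit_transform(X_img)     # image-manifold coordinates
Z_pose = kpca_pose.fit_transform(Y_pose)  # pose-manifold coordinates

# Bridge the two low-dimensional spaces with a least-squares linear map.
W, *_ = np.linalg.lstsq(Z_img, Z_pose, rcond=None)

def infer_pose(x_img):
    """Map a new image descriptor to an estimated pose vector."""
    z = kpca_img.transform(x_img.reshape(1, -1)) @ W
    return kpca_pose.inverse_transform(z)  # approximate pre-image in pose space
```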


Author(s): Ronald Poppe

We present a discriminative approach to human action recognition. At the heart of our approach is the use of common spatial patterns (CSP), a spatial filter technique that transforms temporal feature data by using differences in variance between two classes. Such a transformation focuses on differences between classes, rather than on modeling each class individually. As a result, to distinguish between two classes, we can use simple distance metrics in the low-dimensional transformed space. The most likely class is found by pairwise evaluation of all discriminant functions, which can be done in real time. Our image representations are silhouette boundary gradients, spatially binned into cells. We achieve scores of approximately 96% on the Weizmann human action dataset, and show that reasonable results can be obtained when training on only a single subject. We further compare our results with a recent exemplar-based approach. Future work is aimed at combining our approach with automatic human detection.
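For reference, CSP filters for two classes can be obtained from a generalized eigendecomposition of the class covariance matrices. The sketch below follows the standard CSP formulation; the input shapes, trace normalization, and number of filter pairs are assumptions rather than the paper's exact setup.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(X1, X2, n_pairs=3):
    """Common spatial patterns for two classes of temporal feature data.

    X1, X2 -- arrays of shape (trials, channels, time), one per class
    Returns 2*n_pairs spatial filters (rows) that maximize the variance
    ratio between the classes -- the transform applied before using
    simple distance metrics in the low-dimensional space.
    """
    def mean_cov(X):
        # Trace-normalized covariance, averaged over trials.
        covs = [x @ x.T / np.trace(x @ x.T) for x in X]
        return np.mean(covs, axis=0)

    C1, C2 = mean_cov(X1), mean_cov(X2)
    # Generalized eigenproblem C1 w = lambda (C1 + C2) w.
    vals, vecs = eigh(C1, C1 + C2)
    order = np.argsort(vals)
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]  # extreme eigenvalues
    return vecs[:, picks].T
```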

