Human Segmentation and Tracking Survey on Masks for MADS Dataset

Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8397
Author(s):  
Van-Hung Le ◽  
Rafal Scherer

Human segmentation and tracking typically build on the output of person detection in video, so segmentation and tracking results depend heavily on the quality of human detection. With the advent of Convolutional Neural Networks (CNNs), excellent results have been achieved in this field. Segmenting and tracking people in video has significant applications in monitoring and in estimating human pose in 2D images and 3D space. In this paper, we survey studies, methods, datasets, and results for human segmentation and tracking in video. We also touch upon person detection, since it affects the results of human segmentation and tracking. The survey is detailed down to source-code paths. The MADS (Martial Arts, Dancing and Sports) dataset comprises fast and complex activities and was published for the task of human pose estimation. However, before the pose can be determined, the person must be detected and segmented in the video. We therefore also publish a mask dataset for evaluating person segmentation and tracking in video: our MASK MADS dataset contains 28,000 mask images. Finally, we evaluate many recently published CNN methods for segmenting and tracking people on the MADS dataset.

Author(s):  
SANG-HO CHO ◽  
TAEWAN KIM ◽  
DAIJIN KIM

This paper proposes a pose-robust human detection and identification method for sequences of stereo images using multiply-oriented 2D elliptical filters (MO2DEFs), which can detect and identify humans regardless of scale and pose. Four 2D elliptical filters with specific orientations are applied to a 2D spatial-depth histogram, and threshold values are used to detect humans. The human pose is then determined by finding the filter whose convolution result is maximal. Candidates are verified by either detecting the face or matching head-shoulder shapes. Human identification applies the detection method to a sequence of input stereo images and labels each person as a registered or new human using the Bhattacharyya distance between color histograms. Experimental results show that (1) the accuracy of pose angle estimation is about 88%, (2) human detection using the proposed method outperforms the existing Object Oriented Scale Adaptive Filter (OOSAF) by 15–20%, especially for posed humans, and (3) the human identification method achieves nearly perfect accuracy.
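The identification step above compares color histograms with the Bhattacharyya distance. A minimal sketch of that metric, assuming simple 1D histograms (the function name, clamping constant, and toy histograms are illustrative, not from the paper):

```python
import numpy as np

def bhattacharyya_distance(h1, h2):
    """Bhattacharyya distance between two histograms.

    Both histograms are normalized to sum to 1; the distance is
    -log of the Bhattacharyya coefficient (their overlap).
    """
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    bc = np.sum(np.sqrt(h1 * h2))   # Bhattacharyya coefficient; 1 = identical
    return -np.log(max(bc, 1e-12))  # clamp to avoid log(0) for disjoint bins

# identical histograms -> distance near 0; disjoint ones -> large distance
same = bhattacharyya_distance([2, 3, 5], [2, 3, 5])
diff = bhattacharyya_distance([1, 0, 0], [0, 0, 1])
```

In the paper's pipeline, a small distance to a stored histogram marks a tracked person as a registered identity; otherwise the person is enrolled as new.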


Author(s):  
Swati Nigam ◽  
Rajiv Singh ◽  
A. K. Misra

Computer vision techniques are capable of detecting human behavior from video sequences, and several state-of-the-art techniques have been proposed for human behavior detection and analysis. However, a collective framework is always required for intelligent human behavior analysis. Therefore, in this chapter, the authors provide a comprehensive understanding of human behavior detection approaches. The chapter is organized around human detection, human tracking, and human activity recognition, as these are the basic steps of the human behavior detection process. The authors provide a detailed discussion of the human behavior detection framework and of the feature-descriptor-based approach. Furthermore, they provide qualitative and quantitative analysis for the detection framework and demonstrate results for human detection, human tracking, and human activity recognition.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 52830-52840
Author(s):  
Huynh The Vu ◽  
Richardt H. Wilkinson ◽  
Margaret Lech ◽  
Eva Cheng

2019 ◽  
Vol 277 ◽  
pp. 02034
Author(s):  
Sophie Aubry ◽  
Sohaib Laraba ◽  
Joëlle Tilmanne ◽  
Thierry Dutoit

In this paper, a methodology to recognize actions from RGB videos is proposed that takes advantage of recent breakthroughs in deep learning. Following the development of Convolutional Neural Networks (CNNs), research was conducted on transforming skeletal motion data into 2D images. The proposed solution requires only RGB videos instead of RGB-D videos, and builds on multiple works studying the conversion of RGB-D data into 2D images. From a video stream (RGB images), a two-dimensional skeleton of 18 joints is extracted for each detected body with a DNN-based human pose estimator called OpenPose. The skeleton data are encoded into the Red, Green and Blue channels of images, and different ways of encoding the motion data were studied. State-of-the-art deep neural networks designed for image classification are then successfully applied to recognize actions: based on a study of related works, the image classification models SqueezeNet, AlexNet, DenseNet, ResNet, Inception and VGG were retrained to perform action recognition. All tests use the NTU RGB+D database. The highest accuracy is obtained with ResNet: 83.317% cross-subject and 88.780% cross-view, which outperforms most state-of-the-art results.
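The core encoding idea can be sketched as follows, assuming normalized OpenPose output. The layout (time steps as rows, joints as columns, joint coordinates as color channels) and the function name are illustrative, not the authors' exact scheme:

```python
import numpy as np

def skeleton_to_image(frames):
    """Encode a skeleton sequence as an RGB image.

    frames: (T, 18, 3) array of (x, y, confidence) per OpenPose joint,
    with all values normalized to [0, 1]. Each output row is one time
    step, each column one joint, and the three values per joint fill
    the R, G and B channels -- an illustrative layout, not the paper's.
    """
    frames = np.clip(np.asarray(frames, dtype=np.float32), 0.0, 1.0)
    return (frames * 255).astype(np.uint8)  # standard 8-bit RGB image

# toy clip: 10 frames of 18 joints each
img = skeleton_to_image(np.random.rand(10, 18, 3))
```

The resulting image can then be fed to a retrained classifier (ResNet, VGG, etc.) exactly like an ordinary photograph, which is what lets off-the-shelf image models perform action recognition.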


Author(s):  
F. Flitti ◽  
M. Bennamoun ◽  
D. Q. Huynh ◽  
R. A. Owens

Author(s):  
Annie Benjamin ◽  
Pradhumn Kanase ◽  
Maitreyee Likhite ◽  
Noella Noronha ◽  
Anita Jadhav
Keyword(s):  
3D Space ◽  

Over the past decade, human detection in security and surveillance systems has become a dynamic research area in computer vision, driven by wide-ranging applications such as smart surveillance, multiple-human interfaces, human pose characterization, person counting, and person identification. Video surveillance systems mainly deal with the recognition and classification of moving objects with respect to actions such as walking, talking, and hand shaking. The processing stages for detecting and validating small human groups are: frame generation; segmentation using hierarchical clustering; feature extraction with the Multi-Scale Completed Local Binary Pattern (MS-CLBP) and Pyramidal Histogram of Oriented Gradients (PHOG) descriptors, employed for accurate classification; a Recurrent Neural Network (RNN) classifier that separates the features into individual humans and groups in a crowd; and the Gray-Level Run-Length Method (GLRLM), incorporated to extract the statistical features used for group validation.
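The GLRLM step counts runs of equal gray levels along a scan direction. A simplified sketch for horizontal runs, assuming a small integer-valued image (the function name and toy image are illustrative):

```python
import numpy as np

def glrlm_horizontal(img, levels):
    """Gray-Level Run-Length Matrix over horizontal runs.

    glrlm[g, r-1] counts maximal runs of gray level g with length r.
    A simplified sketch of the GLRLM idea, not a full implementation
    (real GLRLM features also use other directions and derived stats).
    """
    max_run = img.shape[1]
    glrlm = np.zeros((levels, max_run), dtype=int)
    for row in img:
        run_val, run_len = row[0], 1
        for v in row[1:]:
            if v == run_val:
                run_len += 1
            else:
                glrlm[run_val, run_len - 1] += 1  # close the finished run
                run_val, run_len = v, 1
        glrlm[run_val, run_len - 1] += 1          # close the trailing run
    return glrlm

img = np.array([[0, 0, 1],
                [2, 2, 2]])
m = glrlm_horizontal(img, levels=3)
# row 0 holds a run of two 0s and a run of one 1; row 1 a run of three 2s
```

Statistics computed from this matrix (e.g. short-run and long-run emphasis) are the kind of features the paper feeds into the group-validation step.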

