A Novel Approach for Human Silhouette Extraction from Video Data

Author(s):  
Amlan Raychaudhuri ◽  
Satyabrata Maity ◽  
Amlan Chakrabarti ◽  
Debotosh Bhattacharjee
Author(s):  
Daniel Danso Essel ◽  
Ben-Bright Benuwa ◽  
Benjamin Ghansah

Sparse Representation (SR) and Dictionary Learning (DL) based classifiers have shown promising results in classification tasks, with impressive recognition rates on image data. In Video Semantic Analysis (VSA), however, the local structure of video data contains significant discriminative information required for classification. To the best of our knowledge, this has not been fully explored by recent DL-based approaches. Further, existing approaches do not ensure that video features belonging to the same video category yield similar codes. Based on the foregoing, a novel learning algorithm, Sparsity based Locality-Sensitive Discriminative Dictionary Learning (SLSDDL) for VSA, is proposed in this paper. In the proposed algorithm, a category-based discriminant loss function defined on the sparse coefficients is introduced into the structure of the Locality-Sensitive Dictionary Learning (LSDL) algorithm. Finally, the sparse coefficients of a test video feature sample are solved with the optimization method of SLSDDL, and the video semantic classification result is obtained by minimizing the error between the original and reconstructed samples. The experimental results show that the proposed SLSDDL significantly improves the performance of video semantic detection compared with state-of-the-art approaches. The proposed approach also shows robustness to diverse video environments, demonstrating its generality.
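The final step, classifying by the smallest reconstruction error, follows the general sparse-representation classification rule. The sketch below illustrates only that rule, not SLSDDL itself: it assumes a dictionary `D` whose atoms carry class labels and computes the sparse code with plain ISTA (an L1-regularized least-squares solver). It has no locality or discriminant loss terms, and the names `ista` and `classify_by_residual` are hypothetical.

```python
import numpy as np


def ista(D, y, lam=0.05, n_iter=500):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1 with ISTA."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient (largest singular value squared)
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft thresholding
    return x


def classify_by_residual(D, atom_labels, y, lam=0.05):
    """Assign y to the class whose atoms alone reconstruct it with the smallest error."""
    x = ista(D, y, lam)
    residuals = {}
    for c in np.unique(atom_labels):
        x_c = np.where(atom_labels == c, x, 0.0)  # keep only the class-c coefficients
        residuals[int(c)] = float(np.linalg.norm(y - D @ x_c))
    best = min(residuals, key=residuals.get)
    return best, residuals


# Usage: a toy dictionary with 5 atoms per class; the test sample is built from class-1 atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 10))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms
labels = np.array([0] * 5 + [1] * 5)
y = D[:, 6] + 0.5 * D[:, 8]
label, res = classify_by_residual(D, labels, y)
print(label, res)
```

In a learned-dictionary method, `D` and the codes would come from training rather than random data. The decision rule itself stays the same: reconstruct with each class's coefficients and keep the smallest residual.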


Author(s):  
Sheila M. Pinto-Cáceres ◽  
Jurandy Almeida ◽  
Vânia P. A. Neris ◽  
M. Cecília C. Baranauskas ◽  
Neucimar J. Leite ◽  
...  

The fast evolution of technology has led to a growing demand for video data, increasing the amount of research into efficient systems to manage those materials. Making efficient use of video information requires that data be accessed in a user-friendly way. Ideally, one would like to perform video search using an intuitive tool. Most existing browsers for the interactive search of video sequences, however, employ overly rigid layouts to arrange the results, restricting users to exploring them through list- or grid-based layouts. This paper presents a novel approach for interactive search that displays the result set in a flexible manner. The proposed method is based on a simple and fast algorithm to build video stories and on an effective visual structure to arrange the storyboards, called Clustering Set. It is able to group together videos with similar content and to organize the result set in a well-defined tree. Results from a rigorous empirical comparison with a subjective evaluation show that such a strategy makes navigation more coherent and engaging for users.


2020 ◽  
Vol 10 (18) ◽  
pp. 6391
Author(s):  
Dien Van Nguyen ◽  
Jaehyuk Choi

Intelligent video analytics systems have come to play an essential role in many fields, including public safety, transportation safety, and many other industrial areas, serving as automated tools for extracting data from, and analyzing, huge datasets such as multiple live video streams transmitted from a large number of cameras. A key characteristic of such systems is that real-time analytics is critical for providing timely, actionable alerts on various tasks, activities, and conditions. Due to the computation-intensive and bandwidth-intensive nature of these operations, however, video analytics servers may not meet these requirements when serving a large number of cameras simultaneously. To handle these challenges, we present an edge computing-based system that minimizes the transfer of video data from the surveillance camera feeds to a cloud video analytics server. Based on a novel approach that uses information from the encoded bitstream, the edge achieves low processing complexity for object tracking in surveillance videos and filters non-motion frames out of the data forwarded to the cloud server. To demonstrate the effectiveness of our approach, we implemented a video surveillance prototype consisting of edge devices with low computational capacity and a GPU-enabled server. The evaluation results show that our method efficiently captures the characteristics of each frame and is compatible with the edge-to-cloud platform in terms of accuracy and delay sensitivity. The average processing time of this method is approximately 39 ms/frame for high-definition video, which outperforms most state-of-the-art methods. In the implemented scenario, the method reduces the cloud server's GPU load by 49%, its CPU load by 49%, and network traffic by 55%, while maintaining the accuracy of video analytics event detection.
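The abstract does not say which bitstream fields drive the filtering. A common choice is the per-block motion vectors already stored in the encoded stream. The minimal sketch below assumes those vectors have been parsed elsewhere into a `(n_blocks, 2)` array per frame. The class `FrameMotion`, the functions `has_motion` and `frames_to_forward`, and both thresholds are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FrameMotion:
    index: int
    mvs: np.ndarray  # assumed shape (n_blocks, 2): per-block motion vectors (dx, dy) in pixels


def has_motion(frame, mag_thresh=1.0, min_fraction=0.02):
    """Hypothetical rule: a frame 'moves' if enough blocks exceed a motion-vector magnitude."""
    if frame.mvs.size == 0:
        return False
    mags = np.hypot(frame.mvs[:, 0], frame.mvs[:, 1])
    return bool(np.mean(mags > mag_thresh) >= min_fraction)


def frames_to_forward(frames, **kw):
    """Keep only frame indices worth sending from the edge to the cloud server."""
    return [f.index for f in frames if has_motion(f, **kw)]


# Usage: a static frame, a frame with 10 moving blocks, and a frame with only sub-pixel jitter.
static = FrameMotion(0, np.zeros((100, 2)))
moving_mvs = np.zeros((100, 2))
moving_mvs[:10] = [4.0, 3.0]
moving = FrameMotion(1, moving_mvs)
jitter = FrameMotion(2, np.full((100, 2), 0.3))
print(frames_to_forward([static, moving, jitter]))
```

Since the motion vectors come from the encoder for free, this kind of test avoids full decoding and pixel-level processing on the edge device. That trade-off is what keeps per-frame cost low.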


Semantic Web ◽  
2020 ◽  
pp. 1-25
Author(s):  
Ashish Singh Patel ◽  
Giovanni Merlino ◽  
Dario Bruneo ◽  
Antonio Puliafito ◽  
O.P. Vyas ◽  
...  

Storage and analysis of video surveillance data is a significant challenge, requiring video interpretation and event detection in the relevant context. To perform this task, low-level features, including shape, texture, and color information, are extracted and represented in symbolic form. In this work, a methodology is proposed that extracts salient features and properties using machine learning techniques and represents this information as Linked Data using a domain ontology explicitly tailored to the detection of certain activities. An ontology is also developed to include concepts and properties that may be applicable in the domain of surveillance and its applications. The proposed approach is validated with an actual implementation and evaluated by recognizing suspicious activity in an open parking space. Suspicious activity detection is formalized through inference rules and SPARQL queries. Overall, Semantic Web technology proves to be a remarkable toolchain for interpreting videos, opening novel possibilities for video scene representation and the detection of complex events without human involvement. The proposed approach can thus represent frame-level information of a video in a structured form and perform event detection while reducing storage and enhancing semantically aided retrieval of video data.


2013 ◽  
Vol 22 (05) ◽  
pp. 1360004 ◽  
Author(s):  
YANLI LI ◽  
ZHONG ZHOU ◽  
WEI WU

In this paper, we address the problem of automatically segmenting non-rigid pedestrians in still images. Since this task is known to be difficult for any single type of model or cue alone, a novel approach utilizing shape, puzzle, and appearance cues is presented. The major contribution of this approach lies in combining multiple cues to refine pedestrian segmentation successively, in two stages: (1) a shape-guided puzzle integration scheme, which extracts pedestrians by assembling puzzles under the constraint of a shape template; (2) a pedestrian refinement scheme, which optimizes an automatically generated trimap that encodes both the human silhouette and skeleton. Qualitative and quantitative evaluations on several public datasets verify the approach's effectiveness across various articulated body poses, human appearances, and partial occlusion, and show that it segments pedestrians more accurately than methods based only on appearance or shape cues.
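A trimap labels each pixel as definite foreground, definite background, or an uncertain band to be refined. The minimal sketch below builds a generic trimap from a coarse binary mask using morphological erosion and dilation with a square window. It is a simplified stand-in: the paper's trimap also encodes the skeleton, which this sketch does not model. The function names, the band width, and the 255/128/0 encoding are assumptions.

```python
import numpy as np

FG, UNKNOWN, BG = 255, 128, 0  # assumed trimap encoding


def _dilate(mask, r):
    """Binary dilation with a (2r+1)x(2r+1) square window; pixels outside the image count as background."""
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def _erode(mask, r):
    """Binary erosion with the same window; a pixel survives only if its whole window is foreground."""
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.ones_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def make_trimap(mask, band=3):
    """Eroded mask -> FG, dilated-minus-eroded ring -> UNKNOWN, everything else -> BG."""
    mask = mask.astype(bool)
    trimap = np.full(mask.shape, BG, dtype=np.uint8)
    trimap[_dilate(mask, band)] = UNKNOWN
    trimap[_erode(mask, band)] = FG
    return trimap


# Usage: a 10x10 square "silhouette" in a 20x20 image with a 2-pixel uncertain band.
mask = np.zeros((20, 20), dtype=bool)
mask[5:15, 5:15] = True
trimap = make_trimap(mask, band=2)
print((trimap == FG).sum(), (trimap == UNKNOWN).sum())
```

A refinement step (for example, a matting or graph-cut optimizer) would then only need to decide the `UNKNOWN` pixels.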


2018 ◽  
Vol 1 (4) ◽  
pp. 42 ◽  
Author(s):  
Sharnil Pandya ◽  
Hemant Ghayvat ◽  
Ketan Kotecha ◽  
Mohammed Awais ◽  
Saeed Akbarzadeh ◽  
...  

The proposed research methodology aims to design a generally implementable framework that provides a house owner or member with immediate notification of an ongoing theft (unauthorized access to their premises). For this purpose, a rigorous analysis of existing systems was undertaken to identify research gaps. The problems found with existing systems were that they can only identify the intruder after the theft, or cannot distinguish between human and non-human objects. Wireless Sensor Networks (WSNs), combined with the Internet of Things (IoT) and the Cognitive Internet of Things, are expanding smart home concepts, solutions, and applications. The present research proposes a novel smart home anti-theft system that can detect an intruder even if they have partially or fully hidden their face using clothing, leather, fiber, or plastic materials. The proposed system can also detect an intruder in the dark using a CCTV camera without night vision capability. The fundamental idea was to design a cost-effective and efficient system that lets an individual detect any kind of theft in real time and sends instant notification of the theft to the house owner. The system is also designed to handle large volumes of video data in real time. The investigation results validate the success of the proposed system. The system accuracy improved from 85%, 64.13%, 56.70%, and 44.01% to 97.01%, 84.13%, 78.19%, and 66.5%, respectively, in scenarios where the detected intruder had not hidden his/her face, had partially hidden it, had fully hidden it, and was detected in the dark.


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1393
Author(s):  
Luis Brandon Garcia-Ortiz ◽  
Jose Portillo-Portillo ◽  
Aldo Hernandez-Suarez ◽  
Jesus Olivares-Mercado ◽  
Gabriel Sanchez-Perez ◽  
...  

This paper proposes the use of the FASSD-Net model for semantic segmentation of human silhouettes; these silhouettes can later be used in various applications that require specific characteristics of human interaction observed in video sequences, for understanding human activities or for human identification. These applications are classified as high-level semantic understanding tasks. Semantic segmentation is presented as one solution for human silhouette extraction, and convolutional neural networks (CNNs) are argued to have a clear advantage over traditional computer vision methods because of their ability to learn feature representations appropriate for the segmentation task. In this work, the FASSD-Net model is used as a novel proposal that promises real-time segmentation of high-resolution images at more than 20 FPS. To evaluate the proposed scheme, we use the Cityscapes database, which consists of various scenarios that represent human interaction with the environment (these scenarios contain people who are difficult to segment, which favors the evaluation of our proposal). To adapt the FASSD-Net model to human silhouette semantic segmentation, the indices of the 19 classes traditionally used for Cityscapes were modified, leaving only two labels: one for the class of interest, labeled as person, and one for the background. The Cityscapes database includes the category “human”, composed of the “rider” and “person” classes, in which the rider class contains incomplete human silhouettes due to self-occlusion caused by the activity or means of transport. For this reason, we train the model using only the person class rather than the whole human category. The implementation of the FASSD-Net model with only two classes shows promising results, both qualitatively and quantitatively, for the segmentation of human silhouettes.
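The label modification described above can be sketched as a remapping of Cityscapes ground-truth maps. The minimal version below assumes the annotations are already in the standard 19-class trainId encoding, where `person` is trainId 11 and `rider` is 12. Following the abstract, `rider` goes to background. Keeping the 255 ignore label as-is is an assumption, and the function name is hypothetical.

```python
import numpy as np

PERSON_TRAIN_ID = 11  # 'person' in the standard 19-class Cityscapes trainId scheme
IGNORE = 255          # assumed to stay excluded from the loss


def to_person_vs_background(train_ids):
    """Collapse a Cityscapes trainId map into {0: background, 1: person}, keeping 255 as ignore."""
    out = np.zeros_like(train_ids, dtype=np.uint8)  # every other class, including rider (12), -> 0
    out[train_ids == PERSON_TRAIN_ID] = 1
    out[train_ids == IGNORE] = IGNORE
    return out


# Usage: person (11), rider (12), road (0), ignore (255), car (13).
labels = np.array([[11, 12, 0], [255, 11, 13]], dtype=np.uint8)
print(to_person_vs_background(labels))
```

Applying the same function to every training and validation label map gives a two-class dataset. The network's output layer then shrinks from 19 channels to 2.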

