Video Scene Information Detection Based on Entity Recognition

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hui Qian ◽  
Mengxuan Dai ◽  
Yong Ma ◽  
Jiale Zhao ◽  
Qinghua Liu ◽  
...  

Video situational information detection is widely used in fields such as video query, character anomaly detection, and surveillance analysis. However, most existing research focuses on the subject or the video background and pays little attention to recognizing situational information. Moreover, because there is no strong relation between the pixel information and the scene information of video data, it is difficult for computers to derive high-level scene information from the low-level pixel information. Video scene information detection detects and analyzes multiple features in a video and labels the scenes it contains. It aims to automatically extract scene information from all kinds of original video data and to recognize that information through "comprehensive consideration of pixel information and spatiotemporal continuity." To solve the problem of transforming pixel information into scene information, this paper proposes a video scene information detection method based on entity recognition. The model builds on entity recognition and integrates the spatiotemporal relationship between the video subject and object, realizing scene information recognition by establishing a mapping relation. The effectiveness and accuracy of the model are verified by simulation experiments using a TV series as experimental data, in which the model reaches an accuracy above 85%.
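
As a loose illustration of the mapping idea described above (entity recognition plus spatiotemporal continuity mapped onto scene labels), the sketch below aggregates per-frame entity detections over a temporal window and looks the persistent entity set up in a scene table. The entity labels, window size, and rule table are hypothetical assumptions, not the paper's actual mapping.

```python
from collections import Counter

# Hypothetical scene rules: a persistent set of co-occurring entities implies a scene label.
SCENE_RULES = {
    frozenset({"desk", "computer", "person"}): "office",
    frozenset({"bed", "lamp", "person"}): "bedroom",
}

def detect_scene(frame_entities, window=30):
    """frame_entities: list of sets of entity labels, one set per frame."""
    scenes = []
    for start in range(0, len(frame_entities), window):
        window_frames = frame_entities[start:start + window]
        counts = Counter(e for frame in window_frames for e in frame)
        # Keep entities that persist across most of the window (spatiotemporal continuity).
        persistent = frozenset(e for e, c in counts.items() if c >= 0.5 * len(window_frames))
        label = next((s for key, s in SCENE_RULES.items() if key <= persistent), "unknown")
        scenes.append((start, label))
    return scenes
```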

2021 ◽  
Vol 11 (9) ◽  
pp. 3730
Author(s):  
Aniqa Dilawari ◽  
Muhammad Usman Ghani Khan ◽  
Yasser D. Al-Otaibi ◽  
Zahoor-ur Rehman ◽  
Atta-ur Rahman ◽  
...  

After the September 11 attacks, security and surveillance measures changed across the globe. Surveillance cameras are now installed almost everywhere to monitor video footage. Though quite handy, these cameras produce video data of massive size and volume. The major challenge faced by security agencies is analyzing the surveillance video data collected and generated daily. Problems related to these videos are twofold: (1) understanding the contents of video streams, and (2) converting the video contents to condensed formats, such as textual interpretations and summaries, to save storage space. In this paper, we propose a video description framework for a surveillance dataset. The framework is based on multitask learning of high-level features (HLFs) using a convolutional neural network (CNN) and natural language generation (NLG) through bidirectional recurrent networks. For each specific task, a parallel pipeline is derived from the base visual geometry group (VGG)-16 model. Tasks include scene recognition, action recognition, object recognition, and human-face-specific feature recognition. Experimental results on the TRECViD, UET Video Surveillance (UETVS), and AGRIINTRUSION datasets show that the model outperforms state-of-the-art methods, achieving METEOR (Metric for Evaluation of Translation with Explicit ORdering) scores of 33.9%, 34.3%, and 31.2%, respectively. Our results show that the framework has distinct advantages over traditional rule-based models for the recognition and generation of natural language descriptions.
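
The sketch below shows one possible way to arrange parallel task heads on a shared VGG-16 trunk, roughly in the spirit of the multitask HLF extraction described above. The head widths and class counts are illustrative assumptions, not the authors' configuration, and the NLG stage is omitted.

```python
import torch.nn as nn
from torchvision.models import vgg16

class MultiTaskHLF(nn.Module):
    """Shared VGG-16 trunk with one parallel classification head per high-level feature task."""
    def __init__(self, n_scenes=10, n_actions=20, n_objects=80, n_face_attrs=5):
        super().__init__()
        self.backbone = vgg16(weights=None).features        # shared convolutional trunk
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        head_sizes = {"scene": n_scenes, "action": n_actions,
                      "object": n_objects, "face": n_face_attrs}
        self.heads = nn.ModuleDict({
            name: nn.Sequential(nn.Flatten(), nn.Linear(512 * 7 * 7, 512),
                                nn.ReLU(), nn.Linear(512, n_cls))
            for name, n_cls in head_sizes.items()
        })

    def forward(self, frames):                               # frames: (B, 3, 224, 224)
        feats = self.pool(self.backbone(frames))
        return {name: head(feats) for name, head in self.heads.items()}
```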


2021 ◽  
Vol 13 (3) ◽  
pp. 72
Author(s):  
Shengbo Chen ◽  
Hongchang Zhang ◽  
Zhou Lei

Person re-identification (ReID) plays a significant role in video surveillance analysis. In the real world, because of illumination, occlusion, and deformation, pedestrian feature extraction is the key to person ReID. Considering the shortcomings of existing methods in pedestrian feature extraction, a method based on an attention mechanism and context information fusion is proposed. A lightweight attention module with a small number of parameters is introduced into the ResNet50 backbone network, which enhances the salient characteristics of persons and suppresses irrelevant information. To address the loss of person context information caused by excessive network depth, a context information fusion module is designed that samples the shallow pedestrian feature maps and cascades them with the high-level feature maps. To improve robustness, the model is trained by combining the margin sample mining loss with the cross-entropy loss. Experiments are carried out on the Market1501 and DukeMTMC-reID datasets; our method achieves rank-1 accuracy of 95.9% on Market1501 and 90.1% on DukeMTMC-reID, outperforming current mainstream methods when using only global features.
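
A minimal sketch of the combined objective mentioned above: cross-entropy on the identity logits plus a hard-mined margin term on the embeddings. The mining below is a batch-hard style approximation standing in for margin sample mining, and the margin value is an assumption.

```python
import torch
import torch.nn.functional as F

def combined_reid_loss(logits, embeddings, labels, margin=0.3):
    """Cross-entropy ID loss plus a hard-mined margin loss over the batch."""
    ce = F.cross_entropy(logits, labels)
    dist = torch.cdist(embeddings, embeddings)                 # pairwise distance matrix
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = (dist * same.float()).max()                  # farthest same-ID pair in batch
    hardest_neg = dist.masked_fill(same, float("inf")).min()   # closest cross-ID pair in batch
    mining = F.relu(hardest_pos - hardest_neg + margin)
    return ce + mining
```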


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4045
Author(s):  
Alessandro Sassu ◽  
Jose Francisco Saenz-Cogollo ◽  
Maurizio Agelli

Edge computing is the best approach for meeting the exponential demand and the real-time requirements of many video analytics applications. Since most recent advances in extracting information from images and video rely on computation-heavy deep learning algorithms, there is a growing need for solutions that allow the deployment and use of new models on scalable and flexible edge architectures. In this work, we present Deep-Framework, a novel open-source framework for developing edge-oriented real-time video analytics applications based on deep learning. Deep-Framework has a scalable multi-stream architecture based on Docker and abstracts away from the user the complexity of cluster configuration, orchestration of services, and GPU resource allocation. It provides Python interfaces for integrating deep learning models developed with the most popular frameworks, as well as high-level APIs based on standard HTTP and WebRTC interfaces for consuming the extracted video data on clients running in browsers or on any other web-based platform.
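
As a purely hypothetical consumer-side sketch of the HTTP-based consumption pattern described above, the snippet below polls an assumed results endpoint for per-frame analysis output. The URL, endpoint path, and JSON layout are illustrative assumptions and do not reflect Deep-Framework's actual API.

```python
import json
import urllib.request

def poll_results(base_url="http://localhost:8080/results", stream_id="cam0"):
    """Fetch and print the latest analysis results for one video stream (hypothetical endpoint)."""
    with urllib.request.urlopen(f"{base_url}/{stream_id}") as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    # Each entry is assumed to carry a timestamp plus the detections produced by the deployed model.
    for item in payload.get("frames", []):
        print(item.get("timestamp"), item.get("detections"))

if __name__ == "__main__":
    poll_results()
```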


2021 ◽  
Vol 11 (12) ◽  
pp. 1555
Author(s):  
Gianpaolo Alvari ◽  
Luca Coviello ◽  
Cesare Furlanello

The high level of heterogeneity in Autism Spectrum Disorder (ASD) and the lack of systematic measurements complicate predicting outcomes of early intervention and the identification of better-tailored treatment programs. Computational phenotyping may assist therapists in monitoring child behavior through quantitative measures and personalizing the intervention based on individual characteristics; still, real-world behavioral analysis is an ongoing challenge. For this purpose, we designed EYE-C, a system based on OpenPose and Gaze360 for fine-grained analysis of eye-contact episodes in unconstrained therapist-child interactions via a single video camera. The model was validated on video data varying in resolution and setting, achieving promising performance. We further tested EYE-C on a clinical sample of 62 preschoolers with ASD for spectrum stratification based on eye-contact features and age. By unsupervised clustering, three distinct sub-groups were identified, differentiated by eye-contact dynamics and a specific clinical phenotype. Overall, this study highlights the potential of Artificial Intelligence in categorizing atypical behavior and providing translational solutions that might assist clinical practice.
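
A small sketch of the stratification step described above: cluster children by eye-contact features and age. The feature columns are illustrative, the data are synthetic, and k-means is a stand-in since the abstract does not state which unsupervised method was used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Assumed columns: eye-contact frequency (episodes/min), mean episode duration (s), age (months)
features = rng.normal(loc=[2.0, 1.5, 48.0], scale=[1.0, 0.5, 10.0], size=(62, 3))

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))   # size of each candidate sub-group
```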


Author(s):  
Min Chen

The fast proliferation of video data archives has increased the need for automatic video content analysis and semantic video retrieval. Since temporal information is critical in conveying video content, this chapter proposes an effective temporal-based event detection framework to support high-level video indexing and retrieval. Its core is a temporal association mining process that systematically captures characteristic temporal patterns to help identify and define interesting events. The framework effectively tackles the challenges caused by loose video structure and class imbalance. One of its unique characteristics is that it offers strong generality and extensibility, with the capability to explore representative event patterns with little human intervention. The temporal information and event detection results can then be fed into our proposed distributed video retrieval system to support high-level semantic querying, selective video browsing, and event-based video retrieval.
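
A minimal sketch of the temporal-pattern idea behind such mining: count which shot-level labels co-occur inside a sliding temporal window and keep frequent pairs as candidate event patterns. The window size, support threshold, and labels are illustrative assumptions, not the chapter's actual algorithm.

```python
from collections import Counter
from itertools import combinations

def frequent_temporal_pairs(shot_labels, window=5, min_support=0.1):
    """shot_labels: chronologically ordered shot-level labels; returns frequent co-occurring pairs."""
    counts = Counter()
    n_windows = max(len(shot_labels) - window + 1, 1)
    for i in range(n_windows):
        seen = set(shot_labels[i:i + window])
        counts.update(combinations(sorted(seen), 2))
    return {pair: c / n_windows for pair, c in counts.items() if c / n_windows >= min_support}

print(frequent_temporal_pairs(["crowd", "goal", "replay", "crowd", "goal", "corner", "replay"]))
```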


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ritaban Dutta ◽  
Cherry Chen ◽  
David Renshaw ◽  
Daniel Liang

Extraordinary shape recovery capabilities of shape memory alloys (SMAs) have made them a crucial building block for the development of next-generation soft robotic systems and associated cognitive robotic controllers. In this study we sought to determine whether combining video data analysis techniques with machine learning could produce a computer-vision-based system that accurately predicts the force generated by the movement of an SMA body capable of multi-point actuation. We found that rapidly capturing video of the bending movements of an SMA body under external electrical excitation, and feeding that characterisation through a computer vision approach into a machine learning model, can accurately predict the amount of actuation force generated by the body. This is fundamental to achieving superior control of the actuation of SMA bodies. We demonstrate that a supervised machine learning framework trained with Restricted Boltzmann Machine (RBM)-inspired features, extracted from 45,000 digital thermal infrared video frames captured during excitation of various SMA shapes, can estimate and predict force and stress with 93% global accuracy, very few false negatives, and a high level of predictive generalisation.
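
The sketch below illustrates the general shape of such a pipeline: learn RBM features from flattened thermal frames and regress force from them. The frame size, RBM width, Ridge regressor, and synthetic data are assumptions for illustration, not the authors' exact configuration.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
frames = rng.random((500, 32 * 32))                            # flattened, normalised stand-in thermal frames
force = frames.mean(axis=1) * 10 + rng.normal(0, 0.1, 500)     # synthetic force targets

model = Pipeline([
    ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=10, random_state=0)),
    ("reg", Ridge(alpha=1.0)),                                 # regress force from RBM hidden activations
])
model.fit(frames, force)
print("R^2 on training data:", model.score(frames, force))
```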


Semantic Web ◽  
2020 ◽  
pp. 1-25
Author(s):  
Ashish Singh Patel ◽  
Giovanni Merlino ◽  
Dario Bruneo ◽  
Antonio Puliafito ◽  
O.P. Vyas ◽  
...  

Storage and analysis of video surveillance data is a significant challenge, requiring video interpretation and event detection in the relevant context. To perform this task, low-level features including shape, texture, and color information are extracted and represented in symbolic form. In this work, a methodology is proposed that extracts the salient features and properties using machine learning techniques and represents this information as Linked Data using a domain ontology explicitly tailored to the detection of certain activities. An ontology is also developed to include concepts and properties applicable to the surveillance domain and its applications. The proposed approach is validated with an actual implementation and evaluated by recognizing suspicious activity in an open parking space. Suspicious activity detection is formalized through inference rules and SPARQL queries. Semantic Web technology thus proves to be a remarkable toolchain for interpreting videos, opening novel possibilities for video scene representation and the detection of complex events without any human involvement. The proposed approach can represent frame-level video information in a structured form and perform event detection while reducing storage and enhancing semantically aided retrieval of video data.
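
A hedged sketch of the query-based detection idea: frame-level observations stored as triples and a SPARQL query that flags a suspicious pattern. The ontology terms (ex:Observation, ex:nearVehicle, the frame count rule) are illustrative, not the ontology proposed in the paper.

```python
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/surveillance#")
g = Graph()
g.bind("ex", EX)

# A person observed near the same vehicle across several frames.
for frame in range(100, 160, 10):
    obs = EX[f"obs{frame}"]
    g.add((obs, RDF.type, EX.Observation))
    g.add((obs, EX.person, EX.p1))
    g.add((obs, EX.nearVehicle, EX.car7))
    g.add((obs, EX.atFrame, Literal(frame)))

# Flag person/vehicle pairs observed together in at least 5 frames (hypothetical loitering rule).
q = """
SELECT ?person ?vehicle (COUNT(?obs) AS ?n) WHERE {
    ?obs a ex:Observation ;
         ex:person ?person ;
         ex:nearVehicle ?vehicle .
} GROUP BY ?person ?vehicle HAVING (COUNT(?obs) >= 5)
"""
for row in g.query(q, initNs={"ex": EX}):
    print(f"possible loitering: {row.person} near {row.vehicle} in {row.n} observations")
```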


Author(s):  
Maria Torres Vega ◽  
Vittorio Sguazzo ◽  
Decebal Constantin Mocanu ◽  
Antonio Liotta

Purpose: The Video Quality Metric (VQM) is one of the most widely used objective methods to assess video quality, because of its high correlation with the human visual system (HVS). VQM is, however, not viable in real-time deployments such as mobile streaming, not only due to its high computational demands but also because, as a Full Reference (FR) metric, it requires both the original video and its impaired counterpart. In contrast, No Reference (NR) objective algorithms operate directly on the impaired video and are considerably faster but lose out in accuracy. The purpose of this paper is to study how differently NR metrics perform in the presence of network impairments. Design/methodology/approach: The authors assess eight NR metrics, alongside a lightweight FR metric, using VQM as the benchmark in a self-developed network-impaired video data set. The paper covers a range of methods, a diverse set of video types and encoding conditions, and a variety of network impairment test cases. Findings: The authors show the extent to which packet loss affects different video types, correlating the accuracy of NR metrics to the FR benchmark. The paper helps identify the conditions under which simple metrics may be used effectively and indicates an avenue for controlling the quality of streaming systems. Originality/value: Most studies in the literature have focused on assessing streams that are either unaffected by the network (e.g. looking at the effects of video compression algorithms) or affected by synthetic network impairments (i.e. via simulated network conditions). The authors show that when streams are affected by real network conditions, assessing Quality of Experience becomes even harder, as existing metrics perform poorly.
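
A small sketch of the kind of evaluation described above: correlate the scores an NR metric assigns to impaired sequences with the VQM (FR) benchmark scores. The arrays below are synthetic stand-ins for real per-sequence scores.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
vqm_scores = rng.uniform(0, 1, 50)                       # FR benchmark score per sequence
nr_scores = vqm_scores + rng.normal(0, 0.15, 50)         # a hypothetical NR metric's output

plcc, _ = pearsonr(nr_scores, vqm_scores)                # linear correlation
srocc, _ = spearmanr(nr_scores, vqm_scores)              # rank correlation
print(f"PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")
```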


Author(s):  
Mathias Bärtl

To date, it is difficult to find high-level statistics on YouTube that paint a fair picture of the platform in its entirety. This study attempts to provide an overall characterization of YouTube, based on a random sample of channel and video data, by showing how video provision and consumption evolved over the course of the past 10 years. It demonstrates stark contrasts between video genres in terms of channels, uploads, and views, and that, on average, a vast majority of 85% of all views goes to a small minority of 3% of all channels. The analytical results give evidence that older channels have a significantly higher probability of garnering a large viewership, but also show that there has always been a small chance for young channels to become successful quickly, depending on whether they choose their genre wisely.
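
For illustration, the snippet below computes the kind of concentration statistic reported above: the share of total views captured by the top 3% of channels. The view counts are synthetic draws from a heavy-tailed distribution, not the study's sample.

```python
import numpy as np

rng = np.random.default_rng(0)
views = rng.pareto(a=1.1, size=100_000) * 1_000          # synthetic heavy-tailed per-channel views

views_sorted = np.sort(views)[::-1]
top_n = int(0.03 * len(views_sorted))                    # top 3% of channels by views
share = views_sorted[:top_n].sum() / views_sorted.sum()
print(f"Top 3% of channels capture {share:.1%} of all views")
```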

