Natural Language Description of Videos for Smart Surveillance

2021 ◽  
Vol 11 (9) ◽  
pp. 3730
Author(s):  
Aniqa Dilawari ◽  
Muhammad Usman Ghani Khan ◽  
Yasser D. Al-Otaibi ◽  
Zahoor-ur Rehman ◽  
Atta-ur Rahman ◽  
...  

Since the September 11 attacks, security and surveillance measures have changed across the globe. Surveillance cameras are now installed almost everywhere for monitoring. Though useful, these cameras produce a massive volume of video. The major challenge faced by security agencies is analyzing the surveillance video collected and generated daily. The problems related to these videos are twofold: (1) understanding the contents of video streams, and (2) converting the video contents to condensed formats, such as textual interpretations and summaries, to save storage space. In this paper, we propose a video description framework for a surveillance dataset. The framework is based on multitask learning of high-level features (HLFs) using a convolutional neural network (CNN) and natural language generation (NLG) through bidirectional recurrent networks. For each specific task, a parallel pipeline is derived from the base Visual Geometry Group (VGG)-16 model. The tasks include scene recognition, action recognition, object recognition, and recognition of human face-specific features. Experimental results on the TRECViD, UET Video Surveillance (UETVS) and AGRIINTRUSION datasets show that the model outperforms state-of-the-art methods, with METEOR (Metric for Evaluation of Translation with Explicit ORdering) scores of 33.9%, 34.3%, and 31.2%, respectively. Our results show that the framework has distinct advantages over traditional rule-based models for the recognition and generation of natural language descriptions.

2021 ◽  
Author(s):  
Mallappa G. Mendagudli ◽  
K.G. Kharade ◽  
T. Nadana Ravishankar ◽  
K. Vengatesan

Effective methods for video indexing will become more valuable as the volume of digital video data continues to grow. Multimedia research has not seen this level of new activity in years. Content analysis aims to create high-level descriptions and annotations by treating language and facts as data. Data mining is a technique that seeks out previously unknown facts and patterns in large datasets. A video can include several different kinds of data, such as images, audio, text, and additional metadata. Because video is applied in many disciplines, such as security, education, medicine, research, sports, and entertainment, it is used differently in each. Applied to video, data mining aims to discover and articulate interesting patterns hidden in large amounts of footage. While data mining is relatively mature, video mining is still in its infancy. A considerable amount of research is still needed to turn mined video into usable content.


2020 ◽  
pp. 1-12
Author(s):  
Hu Jingchao ◽  
Haiying Zhang

The difficulty in recognizing the state of students in class lies in how to make feature judgments based on students' facial expressions and movement. At present, some intelligent models recognize student state inaccurately. To improve recognition, this study builds a two-level state detection framework based on deep learning and an HMM feature recognition algorithm, and extends it into a multi-level detection model through a reasonable state classification method. In addition, this study selects continuous HMM or deep learning to reflect the dynamic generation characteristics of fatigue, and designs random human fatigue recognition experiments to collect and preprocess EEG data, facial video data, and subjective evaluation data of classroom students. The study then discretizes the feature indicators and builds a student state recognition model. Finally, the performance of the proposed algorithm is analyzed through experiments. The results show that the proposed algorithm has certain advantages over the traditional algorithm in recognizing classroom student state features.
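The abstract does not say how the HMM is decoded, so the following is only an illustrative sketch: Viterbi decoding on a small discrete HMM, which recovers the most likely sequence of hidden student states from discretized observations. The two states, the "open"/"closed" eye observations, and every probability are hypothetical values chosen for the example, not figures from the study.

```python
from typing import Dict, List, Sequence


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely hidden-state sequence for a discrete HMM."""
    # best[t][s]: probability of the most likely path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back: List[Dict[str, str]] = []
    for obs in observations[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            # Best predecessor of s given the scores of the previous time step
            prev = max(states, key=lambda r: best[-1][r] * trans_p[r][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the stored back-pointers from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-state model; all numbers are assumptions for illustration.
states = ["alert", "fatigued"]
start_p = {"alert": 0.8, "fatigued": 0.2}
trans_p = {
    "alert": {"alert": 0.7, "fatigued": 0.3},
    "fatigued": {"alert": 0.2, "fatigued": 0.8},
}
emit_p = {
    "alert": {"open": 0.9, "closed": 0.1},
    "fatigued": {"open": 0.3, "closed": 0.7},
}
decoded = viterbi(["open", "open", "closed", "closed"], states, start_p, trans_p, emit_p)
```

With these assumed parameters, the decoded path switches from "alert" to "fatigued" at the third observation, when the eyes are first seen closed.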


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4045
Author(s):  
Alessandro Sassu ◽  
Jose Francisco Saenz-Cogollo ◽  
Maurizio Agelli

Edge computing is the best approach for meeting the exponential demand and the real-time requirements of many video analytics applications. Since most of the recent advances in extracting information from images and video rely on computation-heavy deep learning algorithms, there is a growing need for solutions that allow the deployment and use of new models on scalable and flexible edge architectures. In this work, we present Deep-Framework, a novel open source framework for developing edge-oriented real-time video analytics applications based on deep learning. Deep-Framework has a scalable multi-stream architecture based on Docker and abstracts away from the user the complexity of cluster configuration, orchestration of services, and GPU resource allocation. It provides Python interfaces for integrating deep learning models developed with the most popular frameworks and also provides high-level APIs based on standard HTTP and WebRTC interfaces for consuming the extracted video data on clients running on browsers or any other web-based platform.


2020 ◽  
Vol 34 (07) ◽  
pp. 12862-12869
Author(s):  
Shiwen Zhang ◽  
Sheng Guo ◽  
Limin Wang ◽  
Weilin Huang ◽  
Matthew Scott

In this work, we propose Knowledge Integration Networks (referred to as KINet) for video action recognition. KINet is capable of aggregating meaningful context features that are of great importance for identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition and two auxiliary branches for human parsing and scene recognition, which allow the model to encode knowledge of the human and the scene for action recognition. We explore two pre-trained models as teacher networks to distill the knowledge of human and scene for training the auxiliary tasks of KINet. Furthermore, we propose a two-level knowledge encoding mechanism which contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features, and an Action Knowledge Graph (AKG) for effectively fusing high-level context information. This results in an end-to-end trainable framework where the three tasks can be trained collaboratively, allowing the model to compute strong context knowledge efficiently. The proposed KINet achieves state-of-the-art performance on the large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate the strong transfer capability of KINet by transferring the Kinetics-trained model to UCF-101, where it obtains 97.8% top-1 accuracy.
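Top-1 accuracy, the metric quoted above, counts a prediction as correct only when the highest-scoring class matches the ground-truth label. A minimal sketch of the metric in plain Python follows; the function name `top1_accuracy` and the scores and labels are made up for illustration and are not taken from the paper.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Hypothetical per-class scores for four clips over three action classes.
example_scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
example_labels = [1, 0, 2, 2]
example_accuracy = top1_accuracy(example_scores, example_labels)
```

In this toy data the first three clips are classified correctly and the fourth is not, so three of four predictions count toward the accuracy.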


2021 ◽  
Vol 11 (12) ◽  
pp. 1555
Author(s):  
Gianpaolo Alvari ◽  
Luca Coviello ◽  
Cesare Furlanello

The high level of heterogeneity in Autism Spectrum Disorder (ASD) and the lack of systematic measurements complicate predicting outcomes of early intervention and the identification of better-tailored treatment programs. Computational phenotyping may assist therapists in monitoring child behavior through quantitative measures and personalizing the intervention based on individual characteristics; still, real-world behavioral analysis is an ongoing challenge. For this purpose, we designed EYE-C, a system based on OpenPose and Gaze360 for fine-grained analysis of eye-contact episodes in unconstrained therapist-child interactions via a single video camera. The model was validated on video data varying in resolution and setting, achieving promising performance. We further tested EYE-C on a clinical sample of 62 preschoolers with ASD for spectrum stratification based on eye-contact features and age. By unsupervised clustering, three distinct sub-groups were identified, differentiated by eye-contact dynamics and a specific clinical phenotype. Overall, this study highlights the potential of Artificial Intelligence in categorizing atypical behavior and providing translational solutions that might assist clinical practice.


Author(s):  
Anton Dries ◽  
Angelika Kimmig ◽  
Jesse Davis ◽  
Vaishak Belle ◽  
Luc de Raedt

The ability to solve probability word problems, such as those found in introductory discrete mathematics textbooks, is an important cognitive and intellectual skill. In this paper, we develop a two-step, end-to-end, fully automated approach that provides answers to probability exercises formulated in natural language. In the first step, a question formulated in natural language is analysed and transformed into a high-level model specified in a declarative language. In the second step, a solution to the high-level model is computed using a probabilistic programming system. On a dataset of 2160 probability problems, our solver is able to correctly answer 97.5% of the questions given a correct model. On the end-to-end evaluation, we are able to answer 12.5% of the questions (or 31.1% if we exclude examples not supported by design).
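The second step, computing an answer from a high-level model, can be illustrated on a toy problem. The sketch below is not the authors' probabilistic programming system: it assumes a hand-written model (two fair six-sided dice) and solves it by brute-force enumeration with exact fractions. The names `probability` and `two_dice` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Iterable, Tuple

Outcome = Tuple[int, ...]


def probability(outcomes: Iterable[Outcome], event: Callable[[Outcome], bool]) -> Fraction:
    """Exact probability of `event` over a set of equally likely outcomes."""
    all_outcomes = list(outcomes)
    favourable = sum(1 for outcome in all_outcomes if event(outcome))
    return Fraction(favourable, len(all_outcomes))


# Hand-written "model" for the question:
# "Two fair dice are rolled. What is the probability that the sum is 7?"
two_dice = list(product(range(1, 7), repeat=2))
p_sum_is_seven = probability(two_dice, lambda roll: sum(roll) == 7)
```

Enumeration only scales to small, finite sample spaces, which is one reason a dedicated probabilistic programming system is the more practical choice for a full solver.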


Author(s):  
Abraham Sanders ◽  
Rachael White ◽  
Lauren Severson ◽  
Rufeng Ma ◽  
Richard McQueen ◽  
...  

In this exploratory study, we scrutinize a database of over 1 million tweets collected across the first five months of 2020 to draw conclusions about public attitudes towards the preventative measure of mask usage during the COVID-19 pandemic. In recent months, a body of literature has emerged to suggest the robustness of trends in online activity as proxies for the epidemiological and sociological impact of COVID-19. We employ natural language processing, clustering and sentiment analysis techniques to organize tweets relating to mask-wearing into high-level themes, then relay narratives for individual clusters through automatic text summarization. We find that topic clustering and visualization based on mask-related Twitter data offer revealing insights into societal perceptions of COVID-19 and techniques for its prevention. We observe that the volume and polarity of mask-related tweets have greatly increased. Importantly, the analysis pipeline presented can be leveraged by the health community for the assessment of public response to health interventions in the ongoing global health crisis.

