Video Question Answering for Surveillance

2020 ◽  
Author(s):  
Iqbal Chowdhury ◽  
Kien Nguyen Thanh ◽  
Clinton Fookes ◽  
Sridha Sridharan

There are many tasks in surveillance monitoring, such as object detection, person identification, and activity and action recognition. Integrating a variety of surveillance tasks through a multimodal interactive system would benefit real-life deployment and support human operators. We first introduce the Surveillance Video Question Answering (SVideoQA) dataset, the first of its kind. The SVideoQA dataset addresses multi-camera surveillance monitoring through the multimodal context of Video Question Answering (VideoQA). This paper proposes a deep learning model that approaches the VideoQA task on the SVideoQA dataset by capturing memory-driven relationships between the appearance and motion aspects of the video features. At each level of relational reasoning, the attended parts of the motion and appearance context are identified and forwarded through frame-level and clip-level relational reasoning modules. The respective memories are also updated and forwarded to the memory-relation module, which finally predicts the answer word. The proposed memory-driven multilevel relational reasoning is made compatible with the surveillance monitoring task by incorporating a multi-camera relation module, which captures and reasons over the relationships among video feeds across multiple cameras. Experimental results show that the proposed memory-driven multilevel relational reasoning performs significantly better on the open-ended VideoQA task than other state-of-the-art systems. The proposed method achieves accuracies of 57% and 57.6% on the single-camera and multi-camera tasks of the SVideoQA dataset, respectively.
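The abstract does not specify the internals of the relation modules, so the following is only a minimal sketch of the general idea: question-guided attention over clip features from each camera, followed by a pairwise relation aggregated across cameras. The function names, the dot-product attention, and the element-wise product used as the relation function are all illustrative assumptions, not the authors' design; in a learned model the relation function would typically be a small neural network.

```python
import math
from itertools import combinations
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(question: Vector, features: List[Vector]) -> Vector:
    """Question-guided attention: softmax-weighted average of the features.

    Assumption: dot-product scoring stands in for a learned attention layer.
    """
    weights = softmax([dot(question, f) for f in features])
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def pairwise_relation(items: List[Vector]) -> Vector:
    """Sum a toy relation g(a, b) = a * b (element-wise) over unordered pairs.

    Assumption: g is a placeholder for a learned relation function.
    """
    dim = len(items[0])
    out = [0.0] * dim
    for a, b in combinations(items, 2):
        for i in range(dim):
            out[i] += a[i] * b[i]
    return out


# Hypothetical clip features for three cameras (two clips each, 2-D features).
question = [1.0, 0.0]
cameras = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
per_camera = [attend(question, clips) for clips in cameras]
multi_camera_summary = pairwise_relation(per_camera)
```

Here `per_camera` holds one attended vector per feed, and `multi_camera_summary` aggregates relations over every pair of feeds, which is one simple way to reason across cameras rather than within a single one.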


2019 ◽  
Author(s):  
Hongyin Luo ◽  
Mitra Mohtarami ◽  
James Glass ◽  
Karthik Krishnamurthy ◽  
Brigitte Richardson

2021 ◽  
Vol 14 (8) ◽  
pp. 1289-1297
Author(s):  
Ziquan Fang ◽  
Lu Pan ◽  
Lu Chen ◽  
Yuntao Du ◽  
Yunjun Gao

Traffic prediction has drawn increasing attention for its ubiquitous real-life applications in traffic management, urban computing, public safety, and so on. Recently, the availability of massive trajectory data and the success of deep learning have motivated a plethora of deep traffic prediction studies. However, existing neural-network-based approaches tend to ignore the correlations between multiple types of moving objects located in the same spatio-temporal traffic area, which is suboptimal for traffic prediction analytics. In this paper, we propose a multi-source deep traffic prediction framework over spatio-temporal trajectory data, termed MDTP. The framework includes two phases: spatio-temporal feature modeling and multi-source bridging. In the feature modeling phase, we present an enhanced graph convolutional network (GCN) model combined with a long short-term memory (LSTM) network to capture the spatial dependencies and temporal dynamics of traffic. In the multi-source bridging phase, we propose two methods, Sum and Concat, to connect the learned features from different trajectory data sources. Extensive experiments on two real-life datasets show that MDTP i) has superior efficiency compared with classical time-series methods, machine learning methods, and state-of-the-art neural-network-based approaches; ii) offers a significant performance improvement over the single-source traffic prediction approach; and iii) performs traffic predictions in seconds even on tens of millions of trajectory data points. We develop MDTP+, a user-friendly interactive system to demonstrate traffic prediction analysis.
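The two bridging methods named above can be illustrated with a small sketch. It assumes that Sum means an element-wise sum of per-source feature vectors and that Concat means joining them end to end; the abstract does not define either operation, and the function names and the taxi and bike sources are hypothetical.

```python
from typing import List

Vector = List[float]


def bridge_sum(features: List[Vector]) -> Vector:
    """Element-wise sum; every source must produce the same feature size."""
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("Sum bridging needs equal-length feature vectors")
    return [sum(values) for values in zip(*features)]


def bridge_concat(features: List[Vector]) -> Vector:
    """Concatenation; the output size is the sum of the input sizes."""
    return [x for f in features for x in f]


# Hypothetical learned features for one region from two trajectory sources.
taxi = [0.5, 1.0, 2.0]
bike = [1.5, 0.0, 1.0]

fused_sum = bridge_sum([taxi, bike])        # same length as each input
fused_concat = bridge_concat([taxi, bike])  # length 6
```

Under these assumptions, Sum keeps the fused representation the same size as each source but requires matching dimensions, while Concat preserves each source's features separately at the cost of a wider input to whatever layer follows.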


2020 ◽  
pp. 111-136
Author(s):  
Aaron V. Cicourel

The concept of micro social structure is viewed as a level of predication requiring explicit reference to specific knowledge processes and memory systems initiated and sustained by conscious and unconscious contacts with self and others, including verbal and nonverbal observation of daily life settings. Communal life is enabled by micro-level, affective, cognitive, analogical, and relational reasoning; different types of communicative events; and taken-for-granted normative and tacit knowledge. “Macro social structure” refers to large or enlarged complex forms of organization activities: sociocultural, political-economic, sociohistorical, aggregated micro, behavioral, communicative actions essential for eliciting demographic, sample-survey, and archival historical data that ignores tacit, micro-level phenomena—that is, real-time, real-life, conscious episodic and unconscious procedural memory, colloquial language use, gestural events, documented elicitation procedures, and mundane forms of communal daily life. This chapter examines observed and recorded, moment-to-moment, negotiated elements of behavioral outpatient clinical medicine as it emerges in situated, ethnographic settings. One goal of this chapter is to clarify the micro of the concept of cognitive overload, a cognitive/behavioral obstacle inherent in all communicative, socially organized ecological settings. Participant observation data leverages the temporal and situational comparisons of the method required for the study and explanation of micro social structure. Thus micro social structure is essential for understanding the normative, socially organized, institutionalized macro, complex activities called medical clinics, and hospital settings embedded in abstract meso-structures, such as macro-economic systems.

