Learning for action-based scene understanding

2022 ◽  
pp. 373-403
Author(s):  
Cornelia Fermüller ◽  
Michael Maynord

2012 ◽  
Author(s):  
Laurent Itti ◽  
Nader Noori ◽  
Lior Elazary

2021 ◽  
Vol 10 (7) ◽  
pp. 488
Author(s):  
Peng Li ◽  
Dezheng Zhang ◽  
Aziguli Wulamu ◽  
Xin Liu ◽  
Peng Chen

A deep understanding of our visual world requires more than the isolated perception of a series of objects; the relationships between them also carry rich semantic information. This is especially true for satellite remote sensing images, whose large spatial span means objects vary widely in size and form complex spatial compositions. Recognizing semantic relations therefore strengthens the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN) based on an attention mechanism to fuse and refine multi-scale semantic context, which is crucial to strengthening the cognitive ability of our model. Besides, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module to remove meaningless connections among entities and improve the efficiency of scene graph generation. Meanwhile, to further promote research on scene understanding in the remote sensing field, this paper also proposes a remote sensing scene graph dataset (RSSGD). We carry out extensive experiments, and the results show that our model significantly outperforms previous methods on scene graph generation. In addition, RSSGD effectively bridges the huge semantic gap between low-level perception and high-level cognition of remote sensing images.
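The core propagation step of a GCN such as the one in this abstract can be illustrated with a minimal, dependency-free sketch. The adjacency matrix, features, and identity weights below are hypothetical toy values; the attention mechanism and dilated-convolution components of the actual MSFN are omitted:

```python
import math

def matmul(A, B):
    # Naive dense matrix multiply for small illustrative matrices.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gcn_layer(A, H, W):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 · H · W)."""
    n = len(A)
    # Add self-loops, then symmetrically normalize by node degree.
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, H), W)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU

# Toy scene graph: 3 entities, edges 0-1 and 1-2 (hypothetical values).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 2-d entity features
W = [[1.0, 0.0], [0.0, 1.0]]               # identity weights, for clarity
H1 = gcn_layer(A, H, W)                    # fused neighborhood features
```

After one step, each entity's features mix with its neighbors', which is the basic mechanism a relation-recognition head builds on.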


2021 ◽  
Vol 10 (1) ◽  
pp. 32
Author(s):  
Abhishek V. Potnis ◽  
Surya S. Durbha ◽  
Rajat C. Shinde

Earth Observation data possess tremendous potential for understanding the dynamics of our planet. We propose the Semantics-driven Remote Sensing Scene Understanding (Sem-RSSU) framework for rendering comprehensive, grounded spatio-contextual scene descriptions for enhanced situational awareness. To minimize the semantic gap in remote sensing scene understanding, the framework transforms scenes, using semantic-web technologies, into Remote Sensing Scene Knowledge Graphs (RSS-KGs). The knowledge-graph representation of scenes is formalized through the development of a Remote Sensing Scene Ontology (RSSO), a core ontology for an inclusive remote sensing scene data product. The RSS-KGs are enriched both spatially and contextually, using a deductive reasoner, by mining for implicit spatio-contextual relationships between land-cover classes in the scenes. At its core, Sem-RSSU comprises novel ontology-driven spatio-contextual triple aggregation and realization algorithms that transform the KGs into grounded natural language scene descriptions. Given the significance of scene understanding for informed decision-making from remote sensing scenes during a flood, we selected flooding as a test scenario to demonstrate the utility of the framework. To that end, a contextual domain knowledge ontology, the Flood Scene Ontology (FSO), has been developed. Extensive experimental evaluations show promising results, further validating the efficacy of the framework.
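The deductive enrichment step, deriving implicit spatio-contextual relations from explicit ones, can be sketched as a single forward-chaining rule over triples. The entity names and the `floodAffected` predicate below are illustrative assumptions, not vocabulary from the actual RSSO or FSO:

```python
# Tiny triple store with explicit, hypothetical scene facts.
triples = {
    ("Road_1", "adjacentTo", "FloodWater_2"),
    ("Building_3", "adjacentTo", "Road_1"),
}

def infer_flood_affected(triples):
    """Forward-chain one rule: adjacentTo(x, y) ∧ y is flood water ⇒ floodAffected(x)."""
    inferred = set()
    for s, p, o in triples:
        if p == "adjacentTo" and o.startswith("FloodWater"):
            inferred.add((s, "floodAffected", "true"))
    return inferred

inferred = infer_flood_affected(triples)
```

A real deductive reasoner would iterate rules like this to a fixed point over an OWL ontology; the sketch shows only a single rule application.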


2021 ◽  
Vol 11 (9) ◽  
pp. 3952
Author(s):  
Shimin Tang ◽  
Zhiqiang Chen

With the ubiquitous use of mobile imaging devices, the collection of perishable disaster-scene data has become unprecedentedly easy. However, existing computing methods cannot yet understand these images, which exhibit significant complexity and uncertainty. In this paper, the authors investigate the problem of disaster-scene understanding through a deep-learning approach. Two image attributes are considered: hazard type and damage level. Three deep-learning models are trained and their performance assessed. The best model for hazard-type prediction achieves an overall accuracy (OA) of 90.1%, and the best damage-level classification model achieves an explainable OA of 62.6%; both models adopt the Faster R-CNN architecture with a ResNet50 network as the feature extractor. It is concluded that hazard types are more identifiable than damage levels in disaster-scene images. Further insights are revealed: damage-level recognition suffers more from inter- and intra-class variation, and the treatment of damage leveling as hazard-agnostic further contributes to the underlying uncertainty.
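The overall accuracy (OA) metric reported above is the trace of the confusion matrix divided by the total count. A minimal sketch with a hypothetical 3-class confusion matrix (the counts are invented, not the paper's data):

```python
def overall_accuracy(confusion):
    """OA = correct / total over a square confusion matrix (rows: true, cols: predicted)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Hypothetical 3-class hazard-type confusion matrix.
cm = [[45, 3, 2],
      [4, 40, 6],
      [1, 5, 44]]
oa = overall_accuracy(cm)  # (45 + 40 + 44) / 150 = 0.86
```

Per-class recall from the same matrix would expose the inter-class confusion the authors attribute to damage-level recognition.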


Author(s):  
Zi-Xiang Xia ◽  
Wei-Cheng Lai ◽  
Li-Wu Tsao ◽  
Lien-Feng Hsu ◽  
Chih-Chia Hu Yu ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Georges Hattab ◽  
Adamantini Hatzipanayioti ◽  
Anna Klimova ◽  
Micha Pfeiffer ◽  
Peter Klausing ◽  
...  

Recent technological advances have made Virtual Reality (VR) attractive in both research and real-world applications such as training, rehabilitation, and gaming. Although these fields have benefited from VR technology, it remains unclear whether VR contributes to better spatial understanding and training in the context of surgical planning. In this study, we evaluated the use of VR by comparing the recall of spatial information in two learning conditions: a head-mounted display (HMD) and a desktop screen (DT). Specifically, we explored (a) a scene understanding task and then (b) a direction estimation task using two 3D models (a liver and a pyramid). In the scene understanding task, participants navigated the rendered 3D models by means of rotation, zoom, and transparency in order to identify the spatial relationships among their internal objects. In the subsequent direction estimation task, participants had to point at a previously identified target object, i.e., an internal sphere, on a materialized 3D-printed version of the model using a tracked pointing tool. Results showed that the learning condition (HMD or DT) did not influence participants’ memory and confidence ratings of the models. In contrast, the model type, that is, whether the model to be recalled was a liver or a pyramid, significantly affected participants’ memory of the internal structure of the model. Furthermore, localizing the internal position of the target sphere was also unaffected by participants’ previous experience of the model via HMD or DT. Overall, the results provide novel insights on the use of VR in a surgical planning scenario and have important implications for medical learning, shedding light on the mental models we form to recall spatial structures.
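The direction estimation task reduces to measuring the angle between the direction a participant points and the true direction to the target sphere. A minimal sketch of that angular-error computation (the vectors are illustrative, not study data):

```python
import math

def angular_error_deg(u, v):
    """Angle in degrees between a pointed direction u and the true direction v."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.degrees(math.acos(cos))

err = angular_error_deg((1, 0, 0), (0, 1, 0))  # orthogonal directions → 90.0
```

Averaging this error per condition (HMD vs. DT) is one straightforward way to compare localization performance of the kind the study reports.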

