semantic scene
Recently Published Documents

TOTAL DOCUMENTS: 154 (five years: 67)
H-INDEX: 17 (five years: 4)

Author(s):  
Daniel Schoepflin ◽  
Karthik Iyer ◽  
Martin Gomse ◽  
Thorsten Schüppstuhl

Abstract Obtaining annotated data for proper training of AI image classifiers remains a challenge for successful deployment in industrial settings. As a promising alternative to handcrafted annotations, synthetic training data generation has grown in popularity. However, in most cases the pipelines used to generate this data are not universal and have to be redesigned for each new application domain, which requires a detailed formulation of the domain through a semantic scene grammar. We present such a grammar, based on domain knowledge, for the production-supplying transport of components in intralogistics settings. From a use-case analysis of production-supplying logistics we derive a scene grammar that can be used to formulate similar problem statements in this domain for the purpose of data generation. We demonstrate the use of this grammar to feed a scene generation pipeline and obtain training data for an AI-based image classifier.
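To make the idea of a scene grammar concrete, the sketch below encodes a handful of production rules as a Python dictionary and samples symbolic scene descriptions from them. The symbols, rules, and asset labels are invented for this illustration and are not the grammar from the paper; a real pipeline would hand each sampled description to a renderer that produces an annotated synthetic image.

```python
import random

# Illustrative semantic scene grammar for intralogistics data generation.
# All symbols and asset labels here are assumptions for demonstration.
GRAMMAR = {
    # non-terminal -> list of alternative right-hand sides
    "Scene":         [["TransportUnit", "Background"]],
    "TransportUnit": [["Carrier", "Load"]],
    "Carrier":       [["pallet"], ["small_load_carrier"], ["trolley"]],
    "Load":          [["Component"], ["Component", "Load"]],  # 1..n parts
    "Component":     [["gearbox"], ["housing"], ["shaft"]],
    "Background":    [["warehouse"], ["assembly_line"]],
}

def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of terminal assets."""
    if symbol not in GRAMMAR:           # terminal: a concrete asset label
        return [symbol]
    rhs = rng.choice(GRAMMAR[symbol])   # sample one production rule
    out = []
    for s in rhs:
        out.extend(expand(s, rng))
    return out

rng = random.Random(42)
for _ in range(3):
    # each sample is a symbolic scene description that a rendering
    # pipeline could turn into an annotated synthetic training image
    print(expand("Scene", rng))
```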


2021 ◽  
Vol 12 (9) ◽  
pp. 459-469
Author(s):  
D. D. Rukhovich

In this paper, we propose a novel method for joint 3D object detection and room layout estimation. The proposed method surpasses all existing methods of 3D object detection from monocular images on the indoor SUN RGB-D dataset, and shows competitive results on the ScanNet dataset in multi-view mode. Both datasets were collected in residential, administrative, educational, and industrial spaces, and together they cover a wide range of indoor use cases. Moreover, we are the first to formulate and solve the problem of multi-class 3D object detection from multi-view inputs in indoor scenes. The proposed method can be integrated into the control systems of mobile robots, and the results of this study can be applied to navigation, path planning, grasping and manipulating scene objects, and semantic scene mapping.
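A minimal sketch of what a joint detection-plus-layout head could look like is given below. The module structure, tensor shapes, and box parameterisation are assumptions for illustration only; they do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

class JointHead(nn.Module):
    """Hypothetical head predicting per-voxel 3D boxes and a room layout."""
    def __init__(self, feat_dim: int = 256, num_classes: int = 10):
        super().__init__()
        # per-voxel detection branch: class scores + 7-DoF box
        # (x, y, z, w, h, l, yaw) -- an assumed parameterisation
        self.det = nn.Conv3d(feat_dim, num_classes + 7, kernel_size=1)
        # global layout branch: one cuboid (center, size, yaw)
        self.layout = nn.Sequential(
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(feat_dim, 7)
        )

    def forward(self, voxel_feats: torch.Tensor):
        return self.det(voxel_feats), self.layout(voxel_feats)

feats = torch.randn(1, 256, 40, 40, 16)  # fused voxel features (assumed)
boxes, layout = JointHead()(feats)
print(boxes.shape, layout.shape)  # (1, 17, 40, 40, 16) and (1, 7)
```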


2021 ◽  
Author(s):  
Muraleekrishna Gopinathan ◽  
Giang Truong ◽  
Jumana Abu-Khalaf

2021 ◽  
Author(s):  
Hansung Kim ◽  
Luca Remaggi ◽  
Aloisio Dourado ◽  
Teofilo de Campos ◽  
Philip J. B. Jackson ◽  
...  

Abstract As personalised immersive display systems have been intensely explored in virtual reality (VR), plausible 3D audio corresponding to the visual content is required to provide more realistic experiences to users. It is well known that spatial audio synchronised with visual information improves the sense of immersion, but limited research progress has been achieved in immersive audio-visual content production and reproduction. In this paper, we propose an end-to-end pipeline that simultaneously reconstructs the 3D geometry and the acoustic properties of an environment from a pair of omnidirectional panoramic images. A semantic scene reconstruction and completion method using a deep convolutional neural network is proposed to estimate the complete semantic scene geometry, in order to adapt spatial audio reproduction to the scene. Experiments provide objective and subjective evaluations of the proposed pipeline for plausible audio-visual VR reproduction of real scenes.
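As a rough illustration of how completed semantic geometry can drive audio reproduction, the sketch below uses Sabine's classical equation to turn semantically labelled surfaces into a reverberation time. The material table, absorption coefficients, and room dimensions are placeholder values under stated assumptions, not the paper's model or data.

```python
# Sabine's equation: RT60 = 0.161 * V / A, where A = sum(alpha_i * S_i).
# Absorption coefficients below are typical textbook figures used only
# as placeholders for semantic classes a scene-completion net might emit.
ABSORPTION = {
    "plaster_wall": 0.05,
    "carpet_floor": 0.30,
    "glass_window": 0.15,
    "fabric_sofa":  0.45,
}

def sabine_rt60(volume_m3: float, surfaces: dict) -> float:
    """Reverberation time from room volume and labelled surface areas."""
    absorption_area = sum(
        ABSORPTION[mat] * area for mat, area in surfaces.items()
    )
    return 0.161 * volume_m3 / absorption_area

# a hypothetical 5 m x 4 m x 2.5 m room recovered from the panoramas
surfaces = {
    "plaster_wall": 45.0,   # walls + ceiling, m^2
    "carpet_floor": 20.0,
    "glass_window": 4.0,
    "fabric_sofa":  3.0,
}
print(f"estimated RT60: {sabine_rt60(50.0, surfaces):.2f} s")  # ~0.79 s
```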


2021 ◽  
Vol 12 (7) ◽  
pp. 373-384
Author(s):  
D. D. Rukhovich

In this article, we introduce the task of multi-view RGB-based 3D object detection as an end-to-end optimization problem. In the multi-view formulation of 3D object detection, several images of a static scene are used to detect the objects in it. To address this problem, we propose a novel 3D object detection method named ImVoxelNet, based on a fully convolutional neural network. Unlike existing 3D object detection methods, ImVoxelNet works directly with 3D representations and does not mediate 3D object detection through 2D object detection. The proposed method accepts multi-view inputs, and the number of monocular images in each input can vary during training and inference; in fact, it may be unique for each input. Moreover, we propose to treat a single RGB image as a special case of a multi-view input, so the proposed method also accepts monocular inputs without modification. Through extensive evaluation, we demonstrate that the proposed method successfully handles a variety of outdoor scenes. Specifically, it achieves state-of-the-art results in car detection on the KITTI (monocular) and nuScenes (multi-view) benchmarks among all methods that accept RGB images. The proposed method operates in real time, which makes it possible to integrate it into the navigation systems of autonomous devices. The results of this study can be used to address tasks of navigation, path planning, and semantic scene mapping.
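The core multi-view idea, lifting 2D features from a variable number of views into a shared voxel grid so that a single image is simply the N = 1 case, can be sketched as below. The shapes and the precomputed voxel-to-pixel projection are assumptions for illustration, not ImVoxelNet's actual implementation.

```python
import torch

def fuse_views(view_feats: torch.Tensor, vox2pix: torch.Tensor) -> torch.Tensor:
    """
    view_feats: (N, C, H, W)  2D features from N views (N may vary)
    vox2pix:    (N, V, 2)     per view, pixel coords of V voxel centres
                              (assumed precomputed from camera poses)
    returns:    (C, V)        per-voxel features averaged over views
    """
    n, c, h, w = view_feats.shape
    v = vox2pix.shape[1]
    fused = view_feats.new_zeros(c, v)
    for i in range(n):                       # N is small, a loop is fine
        x = vox2pix[i, :, 0].clamp(0, w - 1)
        y = vox2pix[i, :, 1].clamp(0, h - 1)
        fused += view_feats[i, :, y, x]      # gather a (C, V) sample
    return fused / n                         # monocular input: N == 1

feats = torch.randn(3, 64, 60, 80)           # three views
vox2pix = torch.randint(0, 60, (3, 1000, 2)) # fake projections
print(fuse_views(feats, vox2pix).shape)      # torch.Size([64, 1000])
```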


2021 ◽  
Author(s):  
Hao Zou ◽  
Xuemeng Yang ◽  
Tianxin Huang ◽  
Chujuan Zhang ◽  
Yong Liu ◽  
...  

2021 ◽  
Author(s):  
Lixiang Chen ◽  
Radoslaw Martin Cichy ◽  
Daniel Kaiser

Abstract During natural vision, objects rarely appear in isolation but often within a semantically related scene context. Previous studies have reported that semantic consistency between objects and scenes facilitates object perception, and that scene-object consistency is reflected in changes in the N300 and N400 components of EEG recordings. Here, we investigate whether these N300/N400 differences are indicative of changes in the cortical representation of objects. In two experiments, we recorded EEG signals while participants viewed semantically consistent or inconsistent objects within a scene; in Experiment 1 these objects were task-irrelevant, while in Experiment 2 they were directly relevant for behavior. In both experiments, we found reliable and comparable N300/N400 differences between consistent and inconsistent scene-object combinations. To probe the quality of object representations, we performed multivariate classification analyses in which we decoded the category of the objects contained in the scene. In Experiment 1, in which the objects were not task-relevant, object category could be decoded from around 100 ms after object presentation, but no difference in decoding performance was found between consistent and inconsistent objects. By contrast, when the objects were task-relevant in Experiment 2, we found enhanced decoding of semantically consistent objects compared to semantically inconsistent ones. These results show that the N300/N400 differences related to scene-object consistency do not index changes in cortical object representations, but rather reflect a generic marker of semantic violations. Further, our findings suggest that facilitatory effects between objects and scenes are task-dependent rather than automatic.
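Time-resolved multivariate decoding of this kind typically trains a classifier on the EEG channel pattern at every time point. The sketch below shows the generic recipe with scikit-learn on random data; the shapes, classifier, and cross-validation scheme are illustrative assumptions, not the paper's exact analysis pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels, n_times = 200, 64, 120  # assumed epoch dimensions
X = rng.standard_normal((n_trials, n_channels, n_times))
y = rng.integers(0, 2, n_trials)              # two object categories

accuracy = np.empty(n_times)
for t in range(n_times):
    # decode object category from the spatial pattern at time point t
    clf = LogisticRegression(max_iter=1000)
    accuracy[t] = cross_val_score(clf, X[:, :, t], y, cv=5).mean()

# with real data, accuracy should rise above chance (~0.5) from
# roughly 100 ms after object onset, as the abstract reports;
# on this random data it stays at chance
print(accuracy.mean())
```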


Author(s):  
Jie Li ◽  
Laiyan Ding ◽  
Rui Huang

3D semantic scene completion and 2D semantic segmentation are two tightly correlated tasks, both essential for indoor scene understanding: they predict the same semantic classes using positively correlated high-level features. Current methods use 2D features extracted from early-fused RGB-D images for 2D segmentation to improve 3D scene completion. We argue that this sequential scheme does not ensure that the two tasks fully benefit from each other, and present an Iterative Mutual Enhancement Network (IMENet) that solves them jointly by interactively refining the two tasks at the late prediction stage. Specifically, two refinement modules are developed under a unified framework. The first is a 2D Deformable Context Pyramid (DCP) module, which receives the projection of the current 3D predictions to refine the 2D predictions. In turn, a 3D Deformable Depth Attention (DDA) module leverages the reprojected results of the 2D predictions to update the coarse 3D predictions. This iterative fusion operates on the stable high-level features of both tasks at a late stage. Extensive experiments on the NYU and NYUCAD datasets verify the effectiveness of the proposed iterative late-fusion scheme, and our approach outperforms the state of the art on both 3D semantic scene completion and 2D semantic segmentation.
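A toy sketch of the iterative late-fusion pattern is given below: a 2D branch and a 3D branch repeatedly refine each other's predictions over a few rounds. The modules and the random "projection" tensors stand in for the paper's DCP/DDA modules and geometric projection operators; everything here is a placeholder under stated assumptions.

```python
import torch
import torch.nn as nn

class Refine2D(nn.Module):
    """Stand-in for the DCP module: fuse projected 3D evidence into 2D."""
    def __init__(self, c: int):
        super().__init__()
        self.conv = nn.Conv2d(2 * c, c, 3, padding=1)

    def forward(self, pred2d, proj3d):
        return pred2d + self.conv(torch.cat([pred2d, proj3d], dim=1))

class Refine3D(nn.Module):
    """Stand-in for the DDA module: fuse reprojected 2D evidence into 3D."""
    def __init__(self, c: int):
        super().__init__()
        self.conv = nn.Conv3d(2 * c, c, 3, padding=1)

    def forward(self, pred3d, back2d):
        return pred3d + self.conv(torch.cat([pred3d, back2d], dim=1))

c = 12                                  # number of semantic classes (assumed)
r2d, r3d = Refine2D(c), Refine3D(c)
pred2d = torch.randn(1, c, 60, 80)      # coarse 2D segmentation logits
pred3d = torch.randn(1, c, 30, 30, 12)  # coarse 3D completion logits

for _ in range(3):                      # a few mutual-refinement rounds
    proj3d = torch.randn_like(pred2d)   # placeholder: project 3D -> 2D
    pred2d = r2d(pred2d, proj3d)
    back2d = torch.randn_like(pred3d)   # placeholder: reproject 2D -> 3D
    pred3d = r3d(pred3d, back2d)

print(pred2d.shape, pred3d.shape)
```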

