Scene Representation
Recently Published Documents

TOTAL DOCUMENTS: 132 (FIVE YEARS: 32)
H-INDEX: 16 (FIVE YEARS: 4)

2021 ◽  
Vol 13 (17) ◽  
pp. 9681
Author(s):  
Matteo Miani ◽  
Matteo Dunnhofer ◽  
Christian Micheloni ◽  
Andrea Marini ◽  
Nicola Baldo

Improving pedestrian safety at urban intersections requires intelligent systems that should not only understand the actual vehicle–pedestrian (V2P) interaction state but also proactively anticipate the future severity pattern of the event. This paper presents a Gated Recurrent Unit (GRU)-based system that aims to predict, up to 3 s ahead in time, the severity level of V2P encounters from the current scene representation drawn from on-board radar data. A car-driving simulator experiment was designed to collect sequential mobility features from a cohort of 65 licensed university students who faced different V2P conflicts along a planned urban route. To accurately describe the pedestrian safety condition during the encounter process, a combination of surrogate safety indicators, namely TAdv (Time Advantage) and T2 (Nearness of the Encroachment), is considered for modeling. Owing to the nature of these indicators, multiple recurrent neural networks are trained to separately predict T2 continuous values and TAdv categories, and their predictions are then combined to label serious conflict interactions. For comparison, an additional GRU neural network is developed to directly predict the severity level of inner-city encounters. This latter model reaches the best performance on the test set, scoring a recall of 0.899. Based on selected threshold values, the presented models can be used to label pedestrian near-accident events and to enhance existing intelligent driving systems.
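As a hedged illustration of the direct-prediction variant, the following PyTorch sketch shows a GRU that maps a sequence of mobility features to severity-class logits; the feature count, hidden size, sequence length, and class count are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

class SeverityGRU(nn.Module):
    """Sketch of a GRU that maps a sequence of radar-derived mobility
    features to a conflict-severity class (dimensions are illustrative)."""

    def __init__(self, n_features=8, hidden_size=64, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, time_steps, n_features), i.e. the mobility features
        # observed before the prediction horizon of a V2P encounter
        _, h_n = self.gru(x)               # h_n: (1, batch, hidden_size)
        return self.head(h_n.squeeze(0))   # logits per severity class

# Example: 30 time steps of 8 features for a batch of 4 encounters
model = SeverityGRU()
logits = model(torch.randn(4, 30, 8))
print(logits.shape)  # torch.Size([4, 2])
```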


2021 ◽  
Vol 13 (16) ◽  
pp. 3113
Author(s):  
Ming Li ◽  
Lin Lei ◽  
Yuqi Tang ◽  
Yuli Sun ◽  
Gangyao Kuang

Remote sensing image scene classification (RSISC) has broad application prospects, but related challenges still exist and urgently need to be addressed. One of the most important is how to learn a strongly discriminative scene representation. Recently, convolutional neural networks (CNNs) have shown great potential in RSISC due to their powerful feature learning ability; however, their performance may be restricted by the complexity of remote sensing images, such as spatial layout, varying scales, complex backgrounds, and category diversity. In this paper, we propose an attention-guided multilayer feature aggregation network (AGMFA-Net) that improves scene classification performance by effectively aggregating features from different layers. Specifically, to reduce the discrepancies between layers, we employ channel–spatial attention on multiple high-level convolutional feature maps to more accurately capture the semantic regions that correspond to the content of the given scene. We then use the learned semantic regions as guidance to aggregate the valuable information from multilayer convolutional features, so as to obtain stronger scene features for classification. Experimental results on three remote sensing scene datasets indicate that our approach achieves competitive classification performance compared to the baselines and other state-of-the-art methods.
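A channel–spatial attention module of this kind can be sketched as follows (a CBAM-style layout in PyTorch; the exact design of the AGMFA-Net module is an assumption here):

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style channel-spatial attention over a convolutional feature map.
    Illustrative sketch; the AGMFA-Net module may differ in detail."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight channels
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: squeeze channels, highlight semantic regions
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_mlp(x)            # channel re-weighting
        avg = x.mean(dim=1, keepdim=True)      # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)     # (B, 1, H, W)
        return x * self.spatial_conv(torch.cat([avg, mx], dim=1))

# The resulting attention map can guide multilayer feature aggregation
attn = ChannelSpatialAttention(512)
feat = attn(torch.randn(2, 512, 14, 14))
```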


Symmetry ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1446
Author(s):  
Zhouyan He ◽  
Yang Song ◽  
Caiming Zhong ◽  
Li Li

The multi-exposure fusion (MEF) technique offers a new means of natural scene representation, and the related quality assessment issues urgently need to be considered to validate the effectiveness of such techniques. In this paper, a curvature and entropy statistics-based blind MEF image quality assessment (CE-BMIQA) method is proposed to perceive quality degradation objectively. The transformation from multiple images with different exposure levels to the final MEF image leads to a loss of structure and detail information, so curvature statistics features and entropy statistics features are utilized to portray this distortion. The former are extracted from the histogram statistics of a surface-type map computed from the mean curvature and Gaussian curvature of the MEF image, with contrast energy weighting attached to account for the contrast variation of the MEF image. The latter comprise spatial entropy and spectral entropy. All features, extracted under a multi-scale scheme, are aggregated by training a quality regression model via random forest. Since the MEF image and its feature representation are spatially symmetric in physics, the final predicted quality is symmetric to, and representative of, the image distortion. Experimental results on a public MEF image database demonstrate that the proposed CE-BMIQA method outperforms state-of-the-art blind image quality assessment methods.
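The two entropy statistics features lend themselves to a short sketch; the block size, bit depth, and normalization below are illustrative assumptions, not the CE-BMIQA settings:

```python
import numpy as np

def spatial_entropy(block, bins=256):
    """Shannon entropy of the intensity histogram of an image block."""
    hist, _ = np.histogram(block, bins=bins, range=(0, 255))
    p = hist[hist > 0] / hist.sum()
    return -np.sum(p * np.log2(p))

def spectral_entropy(block):
    """Shannon entropy of the normalized DFT power spectrum of a block."""
    power = np.abs(np.fft.fft2(block)) ** 2
    p = power.ravel() / power.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Example on a hypothetical 8-bit grayscale MEF image patch
patch = np.random.randint(0, 256, (64, 64)).astype(np.float64)
print(spatial_entropy(patch), spectral_entropy(patch))
```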


2021 ◽  
Author(s):  
Jieting Luo ◽  
Yan Wo ◽  
Bicheng Wu ◽  
Guoqiang Han

Author(s):  
E. K. Stathopoulou ◽  
S. Rigon ◽  
R. Battisti ◽  
F. Remondino

Abstract. Mesh models generated by multi-view stereo (MVS) algorithms often fail to adequately represent the sharp, natural edge details of the scene. The harsh depth discontinuities of edge regions pose a challenging task for dense reconstruction, while vertex displacement during mesh refinement frequently leads to smoothed edges that do not coincide with the fine details of the scene. Meanwhile, 3D edges have long been used for scene representation, particularly of man-made built environments, which are dominated by regular planar and linear structures. Indeed, 3D edge detection and matching are commonly exploited either to constrain camera pose estimation, to generate an abstract representation of the most salient parts of the scene, or to support mesh reconstruction. In this work, we jointly use 3D edge extraction and MVS mesh generation to promote the preservation of edge detail in the final result. Salient 3D edges of the scene are reconstructed with state-of-the-art algorithms and integrated into the dense point cloud to support the mesh triangulation step. Experiments on benchmark dataset sequences are evaluated with metric and appearance-based measures in order to assess our hypothesis.
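A minimal sketch of this data flow, using Open3D and hypothetical file names (the authors' MVS pipeline and triangulation method are not reproduced here), might look as follows:

```python
import open3d as o3d

# Hypothetical inputs: a dense MVS point cloud and reconstructed 3D edge points
dense = o3d.io.read_point_cloud("dense_mvs.ply")   # placeholder file names
edges = o3d.io.read_point_cloud("edges_3d.ply")

# Integrate the salient edge points into the dense cloud so that the
# surface reconstruction is supported along depth discontinuities
merged = dense + edges
merged.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

# Mesh the edge-augmented cloud; Poisson reconstruction stands in here
# for the paper's triangulation step, purely to illustrate the idea
mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    merged, depth=9)
o3d.io.write_triangle_mesh("edge_aware_mesh.ply", mesh)
```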


2021 ◽  
Author(s):  
Kathirvel N ◽  
Thanabal M.S

Abstract. In this paper, a simple yet effective recognition system for indoor scenes is presented. The proposed system has two phases: the creation of mandatory and desirable object lists, and an indoor scene recognition system. In the first phase, a list of probable objects for any generic scene, classified as mandatory or desirable, is created from real-time indoor environments combined with human knowledge of standard datasets. The second phase comprises four stages. In the first stage, the system identifies and recognizes the objects in a given key frame using a simplified version of the YOLOv3 CNN architecture. In the second stage, the identified objects are divided into mandatory and desirable sets with a simple dictionary look-up. In the third stage, the objects are associated with a probable scene; this technique is called scene–object identification. Simple algorithms are proposed to implement these three stages. In the final stage, a novel Binary Scene Representation (BSR) is proposed for each probable scene, and the final scene recognition is obtained from a scene-number computed by converting the binary BSR into the decimal number system. The proposed indoor scene recognition system has been evaluated on standard datasets in terms of standard measures and compared with existing schemes. The results are encouraging.
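The BSR idea can be illustrated with a small sketch; the scene vocabularies below are hypothetical, not the paper's object lists:

```python
# Illustrative sketch of the Binary Scene Representation (BSR) idea:
# each probable scene is encoded as a presence bit-vector over its
# object vocabulary, then converted to a decimal scene-number.

SCENE_OBJECTS = {
    "kitchen": ["stove", "sink", "fridge", "table"],   # mandatory + desirable
    "bedroom": ["bed", "wardrobe", "lamp", "table"],
}

def bsr(scene, detected):
    """Bit-vector of object presence for one probable scene."""
    return [1 if obj in detected else 0 for obj in SCENE_OBJECTS[scene]]

def scene_number(bits):
    """Convert the binary BSR into the decimal scene-number."""
    return int("".join(map(str, bits)), 2)

detected = {"bed", "lamp", "table"}       # e.g. output of the detector stage
for scene in SCENE_OBJECTS:
    bits = bsr(scene, detected)
    print(scene, bits, scene_number(bits))
# The scene whose BSR best matches the detections is recognized
```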


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1243
Author(s):  
Saba Arshad ◽  
Gon-Woo Kim

Loop closure detection is of vital importance in simultaneous localization and mapping (SLAM), as it helps to reduce the cumulative error in the robot's estimated pose and to generate a consistent global map. Many variations of this problem have been considered in the past, and existing methods differ in how query and reference views are acquired, in the choice of scene representation, and in the associated matching strategy. The contributions of this survey are many-fold. It provides a thorough study of the existing literature on loop closure detection algorithms for visual and Lidar SLAM, discussing their insights along with their limitations. It presents a taxonomy of state-of-the-art deep learning-based loop detection algorithms with detailed comparison metrics, and it identifies the major challenges of conventional approaches. Building on those challenges, deep learning-based methods are reviewed, focusing on those that tackle the identified challenges and provide long-term autonomy under varying conditions such as changing weather, light, seasons, viewpoint, and occlusion due to the presence of mobile objects. Finally, open challenges and future directions are discussed.
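As a minimal illustration of the descriptor-matching strategy common to many of the surveyed methods (not any specific algorithm from the survey), the sketch below matches per-frame global descriptors, which could be bag-of-words vectors or CNN embeddings:

```python
import numpy as np

def detect_loop(descriptors, query_idx, sim_threshold=0.9, exclude_recent=50):
    """Cosine-similarity loop closure detection over per-frame global
    descriptors; threshold and exclusion window are illustrative."""
    query = descriptors[query_idx]
    best_idx, best_sim = None, -1.0
    # Skip temporally adjacent frames, which are trivially similar
    for i in range(query_idx - exclude_recent):
        ref = descriptors[i]
        sim = np.dot(query, ref) / (np.linalg.norm(query) * np.linalg.norm(ref))
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return (best_idx, best_sim) if best_sim >= sim_threshold else (None, best_sim)

# Example with random embeddings for 200 frames of dimension 128
descs = np.random.rand(200, 128)
print(detect_loop(descs, query_idx=199))
```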

