High Level 3D Structure Extraction from a Single Image Using a CNN-Based Approach

Sensors ◽  
2019 ◽  
Vol 19 (3) ◽  
pp. 563 ◽  
Author(s):  
J. Osuna-Coutiño ◽  
Jose Martinez-Carranza

High-Level Structure (HLS) extraction in a set of images consists of recognizing 3D elements that carry useful information for the user or application. There are several approaches to HLS extraction. However, most of these approaches are based on processing two or more images captured from different camera views, or on processing 3D data in the form of point clouds extracted from the camera images. In contrast, and motivated by the extensive work on depth estimation from a single image, where parallax constraints are not required, we propose a novel methodology for HLS extraction from a single image, with promising results. Our method has four steps. First, we use a CNN to predict depth from the single input image. Second, we propose a region-wise analysis to refine the depth estimates. Third, we introduce a graph analysis to segment the depth map into semantic orientations, aiming at identifying potential HLS. Finally, the depth sections are provided to a new CNN architecture that predicts HLS in the shape of cubes and rectangular parallelepipeds.
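
The four-step pipeline can be outlined compactly. Below is a minimal, hypothetical Python sketch of the flow, assuming Keras-style `depth_cnn` and `hls_cnn` models; SLIC superpixels stand in for the paper's region-wise analysis, and the gradient-based orientation grouping only approximates the graph analysis the authors describe.

```python
# Hypothetical sketch of the four-step HLS pipeline; the models and the
# orientation grouping are illustrative stand-ins, not the authors' code.
import numpy as np
from skimage.segmentation import slic

def extract_hls(image, depth_cnn, hls_cnn, n_regions=200):
    # Step 1: CNN depth prediction for the single RGB image.
    depth = depth_cnn.predict(image[None])[0]        # (H, W) depth map

    # Step 2: region-wise refinement -- snap each superpixel to its
    # median depth to suppress per-pixel noise.
    regions = slic(image, n_segments=n_regions)
    refined = depth.copy()
    for label in np.unique(regions):
        mask = regions == label
        refined[mask] = np.median(depth[mask])

    # Step 3: graph-style orientation grouping, approximated here by
    # quantizing the depth-gradient direction into dominant orientations.
    gy, gx = np.gradient(refined)
    orientation = np.round(np.arctan2(gy, gx) / (np.pi / 4)).astype(int)
    sections = [refined * (orientation == k) for k in np.unique(orientation)]

    # Step 4: a second CNN labels each depth section as a cube or
    # rectangular parallelepiped, or rejects it.
    return [hls_cnn.predict(section[None]) for section in sections]
```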

Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5765 ◽  
Author(s):  
Seiya Ito ◽  
Naoshi Kaneko ◽  
Kazuhiko Sumi

This paper proposes a novel 3D representation, namely a latent 3D volume, for joint depth estimation and semantic segmentation. Most previous studies encode an input scene (typically given as a 2D image) into a set of feature vectors arranged over a 2D plane. However, considering that the real world is three-dimensional, this 2D arrangement discards one dimension and may limit the capacity of the feature representation. In contrast, we examine the idea of arranging the feature vectors in 3D space rather than in a 2D plane. We refer to this 3D volumetric arrangement as a latent 3D volume. We show that the latent 3D volume is beneficial for depth estimation and semantic segmentation because both tasks require an understanding of the 3D structure of the scene. Our network first constructs an initial 3D volume from image features and then generates the latent 3D volume by passing the initial volume through several 3D convolutional layers. We perform depth regression and semantic segmentation by projecting the latent 3D volume onto a 2D plane. Evaluation results show that our method outperforms previous approaches on the NYU Depth v2 dataset.
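
To make the volumetric arrangement concrete, here is a hypothetical PyTorch sketch of the idea; layer sizes, the number of depth bins, and the flattening-based projection are illustrative choices, not the authors' architecture.

```python
# Hypothetical sketch: lift 2D image features into a voxel grid, refine
# with 3D convolutions (the latent 3D volume), then project back to 2D
# heads for depth regression and semantic segmentation.
import torch
import torch.nn as nn

class Latent3DVolumeNet(nn.Module):
    def __init__(self, feat_ch=32, depth_bins=16, n_classes=40):
        super().__init__()
        self.feat_ch, self.depth_bins = feat_ch, depth_bins
        self.encoder = nn.Sequential(          # 2D image features
            nn.Conv2d(3, feat_ch * depth_bins, 3, padding=1), nn.ReLU())
        self.volume_net = nn.Sequential(       # refines the 3D volume
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.depth_head = nn.Conv2d(feat_ch * depth_bins, 1, 1)
        self.seg_head = nn.Conv2d(feat_ch * depth_bins, n_classes, 1)

    def forward(self, img):                    # img: (B, 3, H, W)
        b, _, h, w = img.shape
        feats = self.encoder(img)
        # Arrange 2D features as an initial 3D volume (B, C, D, H, W).
        vol = feats.view(b, self.feat_ch, self.depth_bins, h, w)
        vol = self.volume_net(vol)             # latent 3D volume
        # Project back onto the 2D plane by flattening the depth axis.
        flat = vol.reshape(b, self.feat_ch * self.depth_bins, h, w)
        return self.depth_head(flat), self.seg_head(flat)

# Usage: depth, seg = Latent3DVolumeNet()(torch.rand(1, 3, 64, 64))
```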


Author(s):  
S. Guinard ◽  
L. Landrieu

We consider the problem of semantic classification of 3D LiDAR point clouds obtained from urban scenes when the training set is limited. We propose a non-parametric segmentation model for urban scenes composed of anthropic objects of simple shapes, partitioning the scene into geometrically homogeneous segments whose size is determined by the local complexity. This segmentation can be integrated into a conditional random field (CRF) classifier in order to capture the high-level structure of the scene. For each cluster, this allows us to aggregate the noisy predictions of a weakly supervised classifier to produce a higher-confidence data term. We demonstrate the improvement provided by our method on two publicly available large-scale datasets.
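
The aggregation step admits a short sketch. The hypothetical snippet below pools the per-point class probabilities inside each segment into a higher-confidence CRF unary (data) term; mean pooling and the epsilon guard are illustrative choices, not necessarily the authors' formulation.

```python
# Hypothetical sketch: average the noisy per-point predictions of a
# weakly supervised classifier within each geometrically homogeneous
# segment, yielding one CRF data term per segment.
import numpy as np

def segment_data_term(point_probs, segment_ids):
    """point_probs: (N, K) per-point class probabilities;
    segment_ids: (N,) segment index per point.
    Returns an (S, K) array of negative log-probability data terms."""
    n_segments = segment_ids.max() + 1
    data_term = np.zeros((n_segments, point_probs.shape[1]))
    for s in range(n_segments):
        probs = point_probs[segment_ids == s]
        # Averaging suppresses independent per-point noise; the log
        # turns the pooled distribution into a CRF unary potential.
        data_term[s] = -np.log(probs.mean(axis=0) + 1e-9)
    return data_term
```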


2017 ◽  
Vol 14 (3) ◽  
pp. 685-699 ◽  
Author(s):  
Roberto de Lima ◽  
Jose Martinez-Carranza ◽  
Alicia Morales-Reyes ◽  
Walterio Mayol-Cuevas

2018 ◽  
Vol 175 ◽  
pp. 03055 ◽  
Author(s):  
Yaoxin Li ◽  
Keyuan Qian ◽  
Tao Huang ◽  
Jingkun Zhou

Depth estimation has achieved considerable success with the development of depth sensor devices and deep learning methods. However, depth estimation from a monocular RGB image alone is ambiguous and prone to error. In this paper, we present a novel approach to produce a dense depth map from a single image coupled with coarse point-cloud samples. Our approach learns to fit the distribution of depth maps from source data using conditional adversarial networks, converting sparse point clouds into dense maps. Our experiments show that conditional adversarial networks can add full-image information to the predicted depth maps, and demonstrate the effectiveness of our approach for depth prediction on the NYU-Depth-v2 indoor dataset.
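
A hypothetical PyTorch sketch of the conditioning idea follows: the generator receives the RGB image stacked with a sparse depth channel (zeros where no point-cloud sample exists), and a pix2pix-style discriminator judges (condition, depth) pairs. The toy layers below stand in for the full generator and discriminator.

```python
# Hypothetical sketch of conditioning a GAN on image + sparse depth;
# the tiny networks are placeholders for real U-Net/PatchGAN models.
import torch
import torch.nn as nn

def make_condition(rgb, sparse_depth):
    """rgb: (B, 3, H, W); sparse_depth: (B, 1, H, W), with depth values
    at sampled pixels and 0 elsewhere. Returns the 4-channel input."""
    return torch.cat([rgb, sparse_depth], dim=1)

generator = nn.Sequential(                 # toy stand-in for a U-Net
    nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 3, padding=1))

discriminator = nn.Sequential(             # judges (condition, depth)
    nn.Conv2d(5, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1))  # PatchGAN-style logits

rgb = torch.rand(2, 3, 64, 64)
sparse = torch.zeros(2, 1, 64, 64)         # coarse point-cloud samples
cond = make_condition(rgb, sparse)
fake_depth = generator(cond)
logits = discriminator(torch.cat([cond, fake_depth], dim=1))
```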


Sensors ◽  
2021 ◽ 
Vol 21 (1) ◽ 
pp. 15 ◽ 
Author(s):  
Filippo Aleotti ◽  
Giulio Zaccaroni ◽  
Luca Bartolomei ◽  
Matteo Poggi ◽  
Fabio Tosi ◽  
...  

Depth perception is paramount for tackling real-world problems, ranging from autonomous driving to consumer applications. For the latter, depth estimation from a single image would represent the most versatile solution, since a standard camera is available on almost any handheld device. Nonetheless, two main issues limit the practical deployment of monocular depth estimation methods on such devices: (i) low reliability when deployed in the wild and (ii) the resources needed to achieve real-time performance, often not compatible with low-power embedded systems. Therefore, in this paper, we investigate both issues in depth, showing how they can be addressed by adopting appropriate network design and training strategies. Moreover, we outline how to map the resulting networks onto handheld devices to achieve real-time performance. Our thorough evaluation highlights the ability of such fast networks to generalize well to new environments, a crucial feature for tackling the extremely varied contexts faced in real applications. Indeed, to further support this evidence, we report experimental results concerning real-time, depth-aware augmented reality and image blurring with smartphones in the wild.


Mathematics ◽  
2021 ◽  
Vol 9 (9) ◽  
pp. 1022 ◽ 
Author(s):  
Gianluca D’Addese ◽  
Martina Casari ◽  
Roberto Serra ◽  
Marco Villani

In many complex systems one observes the formation of medium-level structures, whose detection could allow a high-level description of the dynamical organization of the system itself, and thus a better understanding of it. In past work we developed a powerful method to achieve this goal; however, it incurs a heavy computational cost in several real-world cases. In this work we introduce a modified version of our approach that reduces the computational burden. The design of the new algorithm allowed us to build an original suite of methods able to work simultaneously at the micro level (that of the binary relationships between single variables) and at the meso level (the identification of dynamically relevant groups). We apply this suite to a particularly relevant case, in which we look for the dynamic organization of a gene regulatory network when it is subject to knock-outs. The approach combines information theory, graph analysis, and an iterated sieving algorithm in order to describe rather complex situations. Its application allowed us to derive some general observations on the dynamical organization of gene regulatory networks, and to observe interesting characteristics in an experimental case.
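
A sketch of the information-theoretic core, under the assumption that the measure resembles the relevance-index family this group has used previously (the paper's exact index, normalization, and sieving criterion may differ): for a candidate group of binary variables, internal integration is compared with the group's mutual information with the rest of the system.

```python
# Hypothetical sketch; entropies are estimated empirically from observed
# binary system states (rows of a (T, n) array of trajectories).
import numpy as np

def entropy(states):
    """Empirical Shannon entropy of the rows of a (T, k) binary array."""
    _, counts = np.unique(states, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def integration(states):
    # Sum of single-variable entropies minus the joint entropy:
    # high when the variables in the group are strongly coordinated.
    singles = sum(entropy(states[:, [i]]) for i in range(states.shape[1]))
    return singles - entropy(states)

def mutual_information(a, b):
    return entropy(a) + entropy(b) - entropy(np.hstack([a, b]))

def relevance_ratio(states, group):
    """states: (T, n) binary trajectories; group: column indices of the
    candidate set. High values flag dynamically relevant groups."""
    rest = [i for i in range(states.shape[1]) if i not in group]
    s, r = states[:, group], states[:, rest]
    return integration(s) / (mutual_information(s, r) + 1e-12)
```

An iterated sieving pass would then keep the highest-scoring groups and repeat the search on the remaining variables.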


Author(s):  
Mengxi Guo ◽  
Mingtao Chen ◽  
Cong Ma ◽  
Yuan Li ◽  
Xianfeng Li ◽  
...  

Author(s):  
Michael Radermacher ◽  
Teresa Ruiz

Biological samples are radiation-sensitive and require imaging under low-dose conditions to minimize damage. As a result, images contain a high level of noise and exhibit signal-to-noise ratios that are typically significantly smaller than 1. Averaging techniques, either implicit or explicit, are used to overcome the limitations imposed by the high level of noise. Averaging of 2D images showing the same molecule in the same orientation results in highly significant projections. A high-resolution 3D structure can be obtained by combining the information from many single-particle images. Similarly, averaging of multiple copies of macromolecular assembly subvolumes extracted from tomographic reconstructions can lead to a virtually noise-free high-resolution structure. Cross-correlation methods are often used in the alignment and classification steps of averaging processes for both 2D images and 3D volumes. However, the high noise level can bias alignment and certain classification results. While other approaches may be implicitly affected, sensitivity to noise is most apparent in multireference alignments, 3D reference-based projection alignments and projection-based volume alignments. Here, the influence of the image signal-to-noise ratio on the value of the cross-correlation coefficient is analyzed and a method for compensating for this effect is provided.
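
The bias is easy to reproduce numerically. The toy Monte Carlo below is an illustration, not the paper's derivation: for zero-mean signals, the correlation between a noisy image and a noise-free reference is attenuated by a factor of sqrt(SNR/(SNR+1)), so at SNR well below 1 even a perfect match scores low; dividing the measured coefficient by that factor undoes the attenuation in this idealized setting.

```python
# Toy illustration of SNR-dependent attenuation of the cross-correlation
# coefficient, with a simple compensation by the predicted factor.
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(100_000)        # stand-in "image"

for snr in (0.1, 0.5, 1.0, 10.0):
    noise = rng.standard_normal(signal.size) / np.sqrt(snr)
    noisy = signal + noise                   # var(signal)/var(noise) = SNR
    measured = np.corrcoef(signal, noisy)[0, 1]
    factor = np.sqrt(snr / (snr + 1.0))      # predicted attenuation
    print(f"SNR={snr:4}: measured={measured:.3f} "
          f"predicted={factor:.3f} compensated={measured / factor:.3f}")
```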


Author(s):  
F. Politz ◽  
M. Sester

In recent years, algorithms for dense image matching (DIM) to obtain point clouds from aerial images have improved significantly. Consequently, DIM point clouds are now a good alternative to the established Airborne Laser Scanning (ALS) point clouds for remote sensing applications. In order to derive high-level products such as digital terrain models or city models, each point within a point cloud must be assigned a class label. Usually, ALS and DIM point clouds are labelled with different classifiers due to their differing characteristics. In this work, we explore both point cloud types in a fully convolutional encoder-decoder network, which learns to classify ALS as well as DIM point clouds. As input, we project the point clouds onto a 2D image raster plane and calculate the minimal, average and maximal height value for each raster cell. The network then differentiates between the classes ground, non-ground, building and no data. We test our network in six training setups: using only one point cloud type, using both point clouds, and several transfer-learning approaches. We quantitatively and qualitatively compare all results and discuss the advantages and disadvantages of each setup. The best network achieves an overall accuracy of 96 % on an ALS and 83 % on a DIM test set.
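
The input encoding described above is simple enough to sketch directly. A minimal NumPy version (cell size and axis conventions are assumptions) might look like this:

```python
# Minimal sketch of the raster encoding: per-cell minimal, average and
# maximal height of the projected point cloud; empty cells stay NaN,
# matching the network's "no data" class.
import numpy as np

def rasterize(points, cell=1.0):
    """points: (N, 3) array of x, y, z. Returns an (H, W, 3) raster."""
    xy = ((points[:, :2] - points[:, :2].min(axis=0)) / cell).astype(int)
    rows, cols = xy[:, 1], xy[:, 0]
    h, w = rows.max() + 1, cols.max() + 1
    zmin = np.full((h, w), np.inf)
    zmax = np.full((h, w), -np.inf)
    zsum = np.zeros((h, w))
    cnt = np.zeros((h, w))
    np.minimum.at(zmin, (rows, cols), points[:, 2])
    np.maximum.at(zmax, (rows, cols), points[:, 2])
    np.add.at(zsum, (rows, cols), points[:, 2])
    np.add.at(cnt, (rows, cols), 1)
    zmean = np.divide(zsum, cnt, out=np.full((h, w), np.nan), where=cnt > 0)
    zmin[cnt == 0] = np.nan                  # mark "no data" cells
    zmax[cnt == 0] = np.nan
    return np.stack([zmin, zmean, zmax], axis=-1)
```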


Author(s):  
R. Näsi ◽  
N. Viljanen ◽  
R. Oliveira ◽  
J. Kaivosoja ◽  
O. Niemeläinen ◽  
...  

Lightweight 2D-format hyperspectral imagers operable from unmanned aerial vehicles (UAVs) have become common in various remote sensing tasks in recent years. Using these technologies, the area of interest is covered by multiple overlapping hypercubes, in other words multiview hyperspectral photogrammetric imagery, and each object point appears in many, even tens of, individual hypercubes. The common practice is to calculate hyperspectral orthomosaics utilizing only the most nadir areas of the images. However, the redundancy of the data offers potential for much more versatile and thorough feature extraction. We investigated various options for extracting spectral features in a grass sward quantity evaluation task. In addition to various sets of spectral features, we used photogrammetry-based ultra-high-density point clouds to extract features describing the canopy 3D structure. A machine learning technique based on the Random Forest algorithm was used to estimate fresh biomass. Results showed high accuracies for all investigated feature sets. Estimation using multiview data provided approximately 10 % better results than using only the most nadir orthophotos. Utilizing the photogrammetric 3D features improved estimation accuracy by approximately 40 % compared to approaches where only spectral features were applied. The best estimation RMSE of 239 kg/ha (6.0 %) was obtained with the multiview anisotropy-corrected dataset and the 3D features.
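
As an illustration of the estimation setup, the sketch below regresses biomass from concatenated spectral and 3D canopy features with scikit-learn's Random Forest; the dummy data, feature counts, and hyperparameters are assumptions, not the study's configuration.

```python
# Hypothetical sketch: Random Forest regression of fresh biomass from
# combined spectral and photogrammetric 3D canopy features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_plots = 120
spectral = rng.random((n_plots, 36))   # e.g. multiview band statistics
canopy3d = rng.random((n_plots, 6))    # e.g. height percentiles, volume
X = np.hstack([spectral, canopy3d])
y = rng.random(n_plots) * 4000         # fresh biomass in kg/ha (dummy)

model = RandomForestRegressor(n_estimators=500, random_state=0)
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_root_mean_squared_error")
print(f"CV RMSE: {-scores.mean():.0f} kg/ha")
```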

