Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors

Author(s):  
Tong He ◽  
Stefano Soatto

We present a method to infer the 3D pose and shape of vehicles from a single image. To tackle this ill-posed problem, we optimize two-scale projection consistency between the generated 3D hypotheses and their 2D pseudo-measurements. Specifically, we use a morphable wireframe model to generate a fine-scaled representation of vehicle shape and pose. To reduce its sensitivity to 2D landmarks, we jointly model the 3D bounding box as a coarse representation, which improves robustness. We also integrate three task priors, including unsupervised monocular depth, a ground plane constraint, and vehicle shape priors, with forward projection errors into an overall energy function.
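A schematic form of such an overall energy is sketched below; the term names and weighting coefficients are illustrative placeholders, not the paper's notation:

```latex
% Two-scale projection consistency (wireframe + 3D box) plus the three task
% priors, combined into one energy; the \lambda_i are illustrative weights.
E(\theta) = \lambda_{1}\,E_{\mathrm{wireframe}}(\theta)
          + \lambda_{2}\,E_{\mathrm{box}}(\theta)
          + \lambda_{3}\,E_{\mathrm{depth}}(\theta)
          + \lambda_{4}\,E_{\mathrm{ground}}(\theta)
          + \lambda_{5}\,E_{\mathrm{shape}}(\theta)
```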

Sensors ◽  
2021 ◽
Vol 21 (1) ◽  
pp. 54
Author(s):  
Peng Liu ◽  
Zonghua Zhang ◽  
Zhaozong Meng ◽  
Nan Gao

Depth estimation is a crucial component in many 3D vision applications. Monocular depth estimation is gaining increasing interest due to its flexibility and extremely low system requirements, but its inherently ill-posed and ambiguous nature still leads to unsatisfactory results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and a wavelet-based loss function to recover the depth information of a scene. Two improvements are made over previous methods. First, we combine feature distillation and joint attention mechanisms to boost the discriminative power of feature modulation. The network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates them with a joint attention operation. Second, we adopt a wavelet-based loss function for network training, which makes the loss more effective by capturing more structural detail. Experimental results on challenging indoor and outdoor benchmark datasets verify that the proposed method outperforms current state-of-the-art methods.
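As a rough illustration of what a wavelet-based depth loss can look like, the sketch below compares multi-level wavelet coefficients of predicted and ground-truth depth maps; the wavelet family, decomposition level, and equal band weighting are assumptions, not the paper's configuration:

```python
# Sketch of a wavelet-based depth loss using PyWavelets; the Haar wavelet,
# 3-level decomposition, and uniform band weights are illustrative choices.
import numpy as np
import pywt

def wavelet_loss(pred_depth: np.ndarray, gt_depth: np.ndarray,
                 wavelet: str = "haar", level: int = 3) -> float:
    """L1 distance between multi-level wavelet coefficients of two depth maps."""
    pred_coeffs = pywt.wavedec2(pred_depth, wavelet, level=level)
    gt_coeffs = pywt.wavedec2(gt_depth, wavelet, level=level)
    loss = np.abs(pred_coeffs[0] - gt_coeffs[0]).mean()   # approximation band
    for pred_detail, gt_detail in zip(pred_coeffs[1:], gt_coeffs[1:]):
        for p, g in zip(pred_detail, gt_detail):          # (LH, HL, HH) bands
            loss += np.abs(p - g).mean()
    return float(loss)
```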


Author(s):  
L. Madhuanand ◽  
F. Nex ◽  
M. Y. Yang

Abstract. Depth is an essential component for various scene understanding tasks and for reconstructing the 3D geometry of a scene. Estimating depth from stereo images requires multiple views of the same scene, which is often not possible when exploring new environments with a UAV. To overcome this, monocular depth estimation has become a topic of interest, aided by recent advances in computer vision and deep learning. Research in this area has largely focused on indoor scenes or outdoor scenes captured at ground level. Single image depth estimation from aerial images has been limited by additional complexities arising from increased camera distance and wider area coverage with many occlusions. A new aerial image dataset is prepared specifically for this purpose, combining Unmanned Aerial Vehicle (UAV) images covering different regions, features and points of view. The single image depth estimation is based on image reconstruction techniques that use stereo images to learn to estimate depth from single images. Among the various available models for ground-level single image depth estimation, two models, 1) a Convolutional Neural Network (CNN) and 2) a Generative Adversarial Network (GAN), are used to learn depth from UAV aerial images. These models generate pixel-wise disparity images which can be converted into depth information. The generated disparity maps are evaluated for internal quality using various error metrics. The results show that the CNN model generates smoother images with a higher disparity range, while the GAN model generates sharper images with a lower disparity range. The produced disparity images are converted to depth information and compared with point clouds obtained using Pix4D. The CNN model is found to perform better than the GAN, producing depth similar to that of Pix4D. This comparison helps in streamlining the efforts to produce depth from a single aerial image.
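The disparity-to-depth conversion mentioned above follows the standard rectified-stereo relation; a minimal sketch, assuming known (placeholder) focal length and baseline:

```python
# Convert a pixel-wise disparity map to metric depth via depth = f * B / d;
# the focal length and baseline values supplied by the caller are assumptions.
import numpy as np

def disparity_to_depth(disparity: np.ndarray, focal_px: float,
                       baseline_m: float, eps: float = 1e-6) -> np.ndarray:
    """Depth in meters for a rectified stereo setup; eps guards division by zero."""
    return focal_px * baseline_m / np.maximum(disparity, eps)
```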


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2272 ◽  
Author(s):  
Faisal Khan ◽  
Saqib Salahuddin ◽  
Hossein Javidnia

Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied ill-posed problem in computer vision which has been investigated intensively over the past decade using Deep Learning (DL) approaches. Recent approaches to monocular depth estimation mostly rely on Convolutional Neural Networks (CNN). Estimating depth from two-dimensional images plays an important role in various applications including scene reconstruction, 3D object detection, robotics and autonomous driving. This survey provides a comprehensive overview of the topic, including the problem formulation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep learning-based approaches for monocular depth estimation are reviewed, evaluated and discussed. We conclude with a perspective on open challenges in monocular depth estimation that require further investigation.
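Evaluations in this literature typically report a common set of error and accuracy metrics; a minimal sketch of the standard ones (absolute relative error, RMSE, and δ-threshold accuracy), not the survey's own code:

```python
# Standard monocular-depth evaluation metrics widely reported in the surveyed
# literature; a sketch computed on valid ground-truth pixels only.
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    mask = gt > 0                       # ignore pixels with no ground truth
    pred, gt = pred[mask], gt[mask]
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "abs_rel": float(np.mean(np.abs(pred - gt) / gt)),
        "rmse": float(np.sqrt(np.mean((pred - gt) ** 2))),
        "delta1": float(np.mean(ratio < 1.25)),
        "delta2": float(np.mean(ratio < 1.25 ** 2)),
        "delta3": float(np.mean(ratio < 1.25 ** 3)),
    }
```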


2020 ◽  
Vol 34 (07) ◽  
pp. 11661-11668 ◽  
Author(s):  
Yunfei Liu ◽  
Feng Lu

Many real-world vision tasks, such as reflection removal from a transparent surface and intrinsic image decomposition, can be modeled as single image layer separation. However, this problem is highly ill-posed, and training CNN models for it requires accurately aligned triplet data that are hard to collect. To address this problem, this paper proposes an unsupervised method that requires no ground-truth data triplets in training. At the core of the method are two assumptions about data distributions in the latent spaces of the different layers, from which a novel unsupervised layer separation pipeline is derived. The method is then built on the GAN framework with self-supervision and cycle-consistency constraints. Experimental results demonstrate that it outperforms existing unsupervised methods on both synthetic and real-world tasks. The method also shows its ability to solve a more challenging multi-layer separation task.
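One way to picture the self-supervision signal is a separate-then-remix cycle: a separator splits the mixed image into two layers, and a mixer must reproduce the input from them. The sketch below is a hedged illustration; the separator and mixer modules and the L1 reconstruction penalty are stand-ins, not the paper's architecture:

```python
# Hedged sketch of a cycle-consistency constraint for single image layer
# separation; separator/mixer are illustrative stand-in modules.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(mixed: torch.Tensor,
                           separator: torch.nn.Module,
                           mixer: torch.nn.Module) -> torch.Tensor:
    layer_a, layer_b = separator(mixed)   # e.g. background + reflection layers
    remixed = mixer(layer_a, layer_b)     # recompose the predicted layers
    return F.l1_loss(remixed, mixed)      # the cycle must reproduce the input
```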


2017 ◽  
Vol 2017 ◽  
pp. 1-10
Author(s):  
Mingye Ju ◽  
Zhenfei Gu ◽  
Dengyin Zhang ◽  
Haoxing Qin

Single image haze removal has been a challenging task due to its severely ill-posed nature. In this paper, we propose a novel single image dehazing algorithm that improves the detail and color of such degraded images. More concretely, we redefine a more reliable atmospheric scattering model (ASM) based on our previous work and the atmospheric point spread function (APSF). Further, by taking the spatial distribution of haze density into consideration, we design a scene-wise APSF kernel prediction mechanism to eliminate the multiple-scattering effect. With the redefined ASM and the designed APSF, combined with existing prior knowledge, the complex dehazing problem can be converted into a one-dimensional search problem, which allows us to directly obtain the scene transmission and thereby recover visually realistic results via the proposed ASM. Experimental results verify that our algorithm outperforms several state-of-the-art dehazing techniques in terms of robustness, effectiveness, and efficiency.
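For reference, the classical single-scattering ASM that such work builds on, and the radiance recovery it implies, are written below; the lower bound t_0 is a common stabilizer, and the paper's redefined model additionally folds in the APSF:

```latex
% Classical atmospheric scattering model and the implied recovery; t_0 is a
% common lower bound on transmission to avoid amplifying noise.
I(x) = J(x)\,t(x) + A\bigl(1 - t(x)\bigr), \qquad
J(x) = \frac{I(x) - A}{\max\{t(x),\, t_0\}} + A
```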


2018 ◽  
Vol 7 (02) ◽  
pp. 23578-23584
Author(s):  
Miss. Anjana Navale ◽  
Prof. Namdev Sawant ◽  
Prof. Umaji Bagal

Single image haze removal has been a challenging problem due to its ill-posed nature. In this paper, we use a simple but powerful color attenuation prior for haze removal from a single input hazy image. By creating a linear model of the scene depth of the hazy image under this novel prior and learning the model's parameters with a supervised learning method, the depth information can be recovered well. With the depth map of the hazy image, we can easily estimate the transmission and restore the scene radiance via the atmospheric scattering model, and thus effectively remove the haze from a single image. Experimental results show that the proposed approach outperforms state-of-the-art haze removal algorithms in terms of both efficiency and dehazing effect.
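A minimal sketch of this pipeline is given below, assuming the linear depth model operates on brightness and saturation; the theta coefficients, scattering coefficient, and atmospheric-light heuristic are placeholders for the values learned or estimated in the paper:

```python
# Sketch of dehazing with the color attenuation prior; theta, beta, and the
# atmospheric-light heuristic are illustrative placeholders, not learned values.
import cv2
import numpy as np

def dehaze_cap(image_bgr: np.ndarray, theta=(0.12, 0.96, -0.78),
               beta: float = 1.0, t_min: float = 0.1) -> np.ndarray:
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32) / 255.0
    s, v = hsv[..., 1], hsv[..., 2]
    depth = theta[0] + theta[1] * v + theta[2] * s        # linear depth model
    t = np.clip(np.exp(-beta * depth), t_min, 1.0)        # transmission map
    # atmospheric light: mean color of the 0.1% most distant pixels
    idx = depth.ravel().argsort()[-max(1, depth.size // 1000):]
    A = image_bgr.reshape(-1, 3)[idx].mean(axis=0)
    J = (image_bgr.astype(np.float32) - A) / t[..., None] + A  # scene radiance
    return np.clip(J, 0, 255).astype(np.uint8)
```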


2021 ◽  
Vol 38 (5) ◽  
pp. 1485-1493
Author(s):  
Yasasvy Tadepalli ◽  
Meenakshi Kollati ◽  
Swaraja Kuraparthi ◽  
Padmavathi Kora

Monocular depth estimation is a hot research topic in autonomous driving. In the proposed work, deep convolutional neural networks (DCNN) comprising an encoder and a decoder, with transfer learning, are exploited for monocular depth map estimation from two-dimensional images. CNN features extracted in the initial stages are later upsampled using a sequence of bilinear upsampling and convolution layers to reconstruct the depth map. The encoder forms the feature extraction part, and the decoder forms the image reconstruction part. EfficientNet-B0, a recent architecture, is used as the encoder with pretrained weights; it achieves higher efficiency with fewer model parameters than state-of-the-art pretrained networks. EfficientNet-B0 is compared with two other pretrained networks, DenseNet-121 and ResNet-50. Each of these three models is used in the encoding stage for feature extraction, followed by bilinear upsampling in the decoder. Monocular depth estimation is an ill-posed problem and is thus treated as a regression problem, so the metrics used in the proposed work are the F1-score, Jaccard score and Mean Absolute Error (MAE) between the original and the reconstructed image. The results convey that EfficientNet-B0 outperforms the DenseNet-121 and ResNet-50 models in validation loss, F1-score and Jaccard score.
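A minimal sketch of this encoder-decoder layout, assuming a torchvision EfficientNet-B0 backbone; the decoder channel widths and the number of upsampling stages are illustrative choices, not the paper's exact design:

```python
# Sketch of an encoder-decoder depth network: pretrained EfficientNet-B0
# encoder, bilinear-upsampling decoder; decoder widths are placeholders.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

class DepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
        self.encoder = backbone.features          # 1280-channel feature maps
        layers, chans = [], [1280, 512, 256, 64, 32]
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Upsample(scale_factor=2, mode="bilinear",
                                   align_corners=False),
                       nn.Conv2d(c_in, c_out, 3, padding=1),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(32, 1, 3, padding=1))  # single-channel depth
        self.decoder = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))
```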


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7922
Author(s):  
Xin Jiang ◽  
Chunlei Zhao ◽  
Ming Zhu ◽  
Zhicheng Hao ◽  
Wen Gao

Single image dehazing is a highly challenging ill-posed problem. Existing methods, both prior-based and learning-based, rely heavily on the conceptually simplified atmospheric scattering model, estimating the so-called medium transmission map and atmospheric light. However, the formation of haze in the real world is much more complicated, and inaccurate estimations further degrade dehazing performance, causing color distortion, artifacts and insufficient haze removal. Moreover, most dehazing networks treat spatial-wise and channel-wise features equally, but haze is in practice unevenly distributed across an image, so regions with different haze concentrations require different attention. To solve these problems, we propose an end-to-end trainable, densely connected residual spatial and channel attention network based on the conditional generative adversarial framework to directly restore a haze-free image from an input hazy image, without explicit estimation of any atmospheric scattering parameters. Specifically, a novel residual attention module is proposed by combining spatial attention and channel attention mechanisms, which adaptively recalibrate spatial-wise and channel-wise feature weights by considering interdependencies among spatial and channel information. Such a mechanism allows the network to concentrate on more useful pixels and channels. Meanwhile, the dense network can maximize the information flow along features from different levels to encourage feature reuse and strengthen feature propagation. In addition, the network is trained with a multi-loss function, in which newly refined contrastive and registration losses restore sharper structures and ensure better visual quality. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on both public synthetic datasets and real-world images, with more visually pleasing dehazed results.
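A hedged sketch of such a residual spatial and channel attention block is shown below; the squeeze-and-excitation-style channel branch, the 7x7 spatial branch, and the layer sizes are illustrative, not the paper's exact module:

```python
# Sketch of a residual block combining channel attention (SE-style) with
# spatial attention; reduction ratio and kernel sizes are placeholders.
import torch
import torch.nn as nn

class ResidualSpatialChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.channel_att = nn.Sequential(                 # per-channel weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(                 # per-pixel weights
            nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.body(x)
        f = f * self.channel_att(f)    # recalibrate channel-wise features
        f = f * self.spatial_att(f)    # recalibrate spatial locations
        return x + f                   # residual connection
```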


2020 ◽  
Vol 2020 (14) ◽  
pp. 377-1-377-7
Author(s):  
Bruno Artacho ◽  
Nilesh Pandey ◽  
Andreas Savakis

Monocular depth estimation is an important task in scene understanding, with applications to pose estimation, segmentation and autonomous navigation. Deep learning methods relying on multi-level features are currently used to extract the local information needed to infer depth from a single RGB image. We present an efficient architecture that utilizes features from multiple levels with fewer connections than previous networks. Our model achieves comparable scores for monocular depth estimation with better efficiency in memory requirements and computational burden.
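One way such a design can look is sketched below: each encoder level contributes through a single lightweight projection before fusion, rather than through dense skip connections. The module name, channel counts, and fusion-by-summation are assumptions for illustration:

```python
# Sketch of fusing multi-level encoder features with one 1x1 connection per
# level; channel counts (coarse to fine) and the summation fusion are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelFusion(nn.Module):
    def __init__(self, level_channels=(512, 256, 128, 64), out_channels=64):
        super().__init__()
        # one 1x1 projection per level keeps the connection count low
        self.projections = nn.ModuleList(
            nn.Conv2d(c, out_channels, 1) for c in level_channels)
        self.head = nn.Conv2d(out_channels, 1, 3, padding=1)  # depth head

    def forward(self, features):          # list of feature maps, coarse to fine
        target = features[-1].shape[-2:]  # fuse at the finest resolution
        fused = sum(F.interpolate(proj(f), size=target, mode="bilinear",
                                  align_corners=False)
                    for proj, f in zip(self.projections, features))
        return self.head(fused)
```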

