Intuitive Estimation of Speed using Motion and Monocular Depth Information

Depth estimation is a crucial component in many 3D vision applications. Monocular depth estimation is attracting increasing interest because of its flexibility and extremely low system requirements, but the problem is inherently ill-posed and ambiguous, which still leads to unsatisfactory estimation results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and a wavelet-based loss function to recover the depth information of a scene. It makes two improvements over previous methods. First, we combine feature distillation and joint attention mechanisms to boost feature modulation discrimination. The network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates features using a joint attention operation. Second, we adopt a wavelet-based loss function for network training, which improves the effectiveness of the loss by capturing more structural details. Experimental results on challenging indoor and outdoor benchmark datasets verify the proposed method's superiority over current state-of-the-art methods.

PDANet: Self-Supervised Monocular Depth Estimation Using Perceptual and Data Augmentation Consistency

Applied Sciences ◽ 10.3390/app11125383 ◽ 2021 ◽ Vol 11 (12) ◽ pp. 5383

Author(s): Huachen Gao ◽ Xiaoyu Liu ◽ Meixia Qu ◽ Shijie Huang

Keyword(s): Data Augmentation ◽ State Of The Art ◽ Depth Estimation ◽ Input Image ◽ Depth Information ◽ Disparity Map ◽ Estimation Model ◽ Absolute Relative Error ◽ Texture Region ◽ Monocular Depth

In recent studies, self-supervised learning methods have been explored for monocular depth estimation. These methods use the image reconstruction loss, rather than ground-truth depth, as the supervisory signal. However, existing methods usually assume that corresponding points in different views have the same color, which yields unreliable unsupervised signals and ultimately degrades the reconstruction loss during training. Moreover, in low-texture regions these methods cannot correctly predict the disparity of pixels because too few features are extracted. To solve these issues, we propose a network, PDANet, that integrates perceptual consistency and data augmentation consistency, which are more reliable unsupervised signals, into a regular unsupervised depth estimation model. Specifically, we apply a reliable data augmentation mechanism and minimize the loss between the disparity maps generated from the original image and from the augmented image, which makes the prediction more robust to color fluctuation. At the same time, we aggregate the features of different layers extracted by a pre-trained VGG16 network to explore the higher-level perceptual differences between the input image and the generated one. Ablation studies demonstrate the effectiveness of each component, and PDANet shows high-quality depth estimation results on the KITTI benchmark, improving on the state-of-the-art method by reducing the absolute relative error of depth estimation from 0.114 to 0.084.
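
The absolute relative error quoted above is a standard depth metric: the mean of |predicted − true| / true over pixels with valid ground truth. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `abs_rel_error`, the convention that a non-positive value marks missing ground truth, and the sample values are assumptions made for illustration.

```python
def abs_rel_error(pred, gt):
    """Mean of |pred - gt| / gt over pixels whose ground truth is valid (> 0)."""
    # Assumption: a ground-truth value <= 0 means "no measurement" and is skipped.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


# Hypothetical depths in metres, flattened to one list per image.
pred_depth = [10.0, 22.0, 4.5, 7.0]
gt_depth = [10.0, 20.0, 5.0, 0.0]  # last pixel has no ground truth

error = abs_rel_error(pred_depth, gt_depth)
print(f"abs rel = {error:.4f}")
```

Lower values are better, so a drop from 0.114 to 0.084 means the average per-pixel depth error shrinks from about 11% to about 8% of the true depth.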

DEEP LEARNING FOR MONOCULAR DEPTH ESTIMATION FROM UAV IMAGES

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽ 10.5194/isprs-annals-v-2-2020-451-2020 ◽ 2020 ◽ Vol V-2-2020 ◽ pp. 451-458

Author(s): L. Madhuanand ◽ F. Nex ◽ M. Y. Yang

Keyword(s): Deep Learning ◽ Ground Level ◽ Depth Estimation ◽ Aerial Images ◽ Aerial Image ◽ Depth Information ◽ Single Image ◽ Monocular Depth ◽ Uav Images ◽ Image Depth

Abstract. Depth is an essential component for various scene understanding tasks and for reconstructing the 3D geometry of a scene. Estimating depth from stereo images requires multiple views of the same scene to be captured, which is often not possible when exploring new environments with a UAV. To overcome this, monocular depth estimation has become a topic of interest with recent advances in computer vision and deep learning. This research has largely focused on indoor scenes or outdoor scenes captured at ground level. Single-image depth estimation from aerial images has been limited by additional complexities arising from increased camera distance and wider area coverage with many occlusions. A new aerial image dataset is prepared specifically for this purpose, combining Unmanned Aerial Vehicle (UAV) images covering different regions, features and points of view. The single-image depth estimation is based on image reconstruction techniques, which use stereo images to learn to estimate depth from single images. Among the various available models for ground-level single-image depth estimation, two models, 1) a Convolutional Neural Network (CNN) and 2) a Generative Adversarial Network (GAN), are used to learn depth from aerial UAV images. These models generate pixel-wise disparity images, which can be converted into depth information. The disparity maps generated by these models are evaluated for their internal quality using various error metrics. The results show higher disparity ranges with smoother images for the CNN model, and sharper images with a smaller disparity range for the GAN model. The disparity images are converted to depth information and compared with point clouds obtained using Pix4D. The CNN model is found to perform better than the GAN and to produce depth similar to that of Pix4D. This comparison helps in streamlining efforts to produce depth from a single aerial image.
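
The abstract above states that pixel-wise disparity can be converted into depth. A minimal sketch of that conversion follows, assuming a rectified stereo pair and the standard relation depth = focal length × baseline / disparity. The function name `disparity_to_depth`, the camera parameters, and the choice to map near-zero disparity to infinity are illustrative assumptions, not details taken from the paper.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m, min_disparity=1e-6):
    """Convert a 2D disparity map (pixels) to depth (metres).

    Assumes rectified stereo geometry: depth = focal_px * baseline_m / disparity.
    Pixels with (near-)zero disparity are mapped to infinity.
    """
    return [
        [focal_px * baseline_m / d if d > min_disparity else float("inf")
         for d in row]
        for row in disparity_px
    ]


# Hypothetical 2x2 disparity map and camera parameters.
disparity = [[50.0, 25.0],
             [0.0, 100.0]]
depth = disparity_to_depth(disparity, focal_px=1000.0, baseline_m=0.5)
print(depth)
```

Because depth is inversely proportional to disparity, a model that outputs a narrower disparity range also compresses the recovered depth range, which is relevant when comparing the CNN and GAN outputs against a reference point cloud.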

Development of Infants' Sensitivity to Surface Contour Information for Spatial Layout

Perception ◽ 10.1068/p2789 ◽ 2001 ◽ Vol 30 (2) ◽ pp. 167-176 ◽ Cited By ~ 19

Author(s): Maya G Sen ◽ Albert Yonas ◽ David C Knill

Keyword(s): Age Groups ◽ Spatial Layout ◽ The Other ◽ Surface Shape ◽ Depth Information ◽ Surface Contour ◽ Depth Cue ◽ Contour Information ◽ Monocular Depth ◽ Control Study

The development of sensitivity to a recently discovered static-monocular depth cue to surface shape, surface contours, was investigated. Twenty infants in each of three age groups (5, 5½, and 7 months) viewed a display that creates an illusion, for adult viewers, that what is in fact a frontoparallel cylinder is slanted away in depth, so that one end appears closer than the other. Preferential reaching was recorded in both monocular and binocular conditions. More reaching to the apparently closer end in the monocular than in the binocular condition is evidence of sensitivity. Infants aged 7 months responded to surface contour information, but infants aged 5 and 5½ months did not. In a control study, twenty 5-month-old infants reached consistently for the closer ends of cylinders that were actually rotated in depth. As findings with other static-monocular depth information suggest, infants' sensitivity to surface contour information appears to develop at approximately 6 months.

Unsupervised Monocular Depth Estimation for Colonoscope System Using Feedback Network

Sensors ◽ 10.3390/s21082691 ◽ 2021 ◽ Vol 21 (8) ◽ pp. 2691

Author(s): Seung-Jun Hwang ◽ Sung-Jun Park ◽ Gyu-Min Kim ◽ Joong-Hwan Baek

Keyword(s): State Of The Art ◽ Qualitative Evaluation ◽ Depth Estimation ◽ Depth Information ◽ Polyp Detection ◽ Feedback Network ◽ Polyp Detection Rate ◽ Previous Frame ◽ Monocular Depth ◽ Spatiotemporal Consistency

A colonoscopy is a medical examination used to check for disease or abnormalities in the large intestine. If necessary, polyps or adenomas are removed through the scope during the colonoscopy, which can prevent colorectal cancer. However, the polyp detection rate differs depending on the condition and skill level of the endoscopist. Some endoscopists even have a 90% chance of missing an adenoma. Artificial intelligence and robot technologies for colonoscopy are being studied to compensate for these problems. In this study, we propose a self-supervised monocular depth estimation method using spatiotemporal consistency in the colon environment. Our contributions are a loss function for reconstruction errors between adjacent predicted depths and a depth feedback network that uses the predicted depth of the previous frame to predict the depth of the next frame. We performed quantitative and qualitative evaluations of our approach, and the proposed FBNet (depth FeedBack Network) outperformed state-of-the-art results for unsupervised depth estimation on the UCL datasets.

Infants' responsiveness to static-monocular depth information: A recovery from habituation approach

Infant Behavior and Development ◽ 10.1016/0163-6383(91)90008-g ◽ 1991 ◽ Vol 14 (2) ◽ pp. 241-251 ◽ Cited By ~ 18

Author(s): Martha E. Arterberry ◽ Ann Sorknes Bensen ◽ Albert Yonas

Keyword(s): Depth Information ◽ Monocular Depth

Unsupervised Monocular Depth Estimation Based on Residual Neural Network of Coarse–Refined Feature Extractions for Drone

Electronics ◽ 10.3390/electronics8101179 ◽ 2019 ◽ Vol 8 (10) ◽ pp. 1179 ◽ Cited By ~ 1

Author(s): Tao Huang ◽ Shuanfeng Zhao ◽ Longlong Geng ◽ Qian Xu

Keyword(s): Neural Network ◽ Image Reconstruction ◽ Depth Map ◽ Ground Truth ◽ Depth Estimation ◽ Input Image ◽ Superior Performance ◽ Estimation Methods ◽ Depth Information ◽ Monocular Depth

To take full advantage of the information in images captured by drones, and given that most existing supervised monocular depth estimation methods require vast quantities of corresponding ground-truth depth data for training, we propose an unsupervised monocular depth estimation model for drones based on a residual neural network with coarse–refined feature extraction. By introducing a virtual camera through a deep residual convolutional neural network with coarse–refined feature extraction, inspired by the principle of binocular depth estimation, unsupervised monocular depth estimation becomes an image reconstruction problem. To improve the performance of our model, the following innovations are proposed. First, pyramid processing of the input image is proposed to build the topological relationship between the resolution and the depth of the input image, which improves sensitivity to depth information from a single image and reduces the impact of input image resolution on depth estimation. Second, the residual neural network with coarse–refined feature extraction for corresponding image reconstruction is designed to improve the accuracy of feature extraction and to resolve the trade-off between computation time and the number of network layers. In addition, to predict highly detailed output depth maps, long skip connections are designed between corresponding layers of the coarse feature extraction network and the deconvolution network for refined feature extraction. Third, the loss of corresponding image reconstruction based on the structural similarity index (SSIM), the loss of approximate disparity smoothness, and the loss of the depth map are combined into a novel training loss to better train our model. The experimental results show that our model outperforms state-of-the-art monocular depth estimation methods on the KITTI dataset, composed of corresponding left and right views, and on the Make3D dataset, composed of images and corresponding ground-truth depth maps, and that it basically meets the requirements for depth information of images captured by drones when trained on KITTI.
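
The combined training loss described above can be illustrated with a simplified sketch. It is not the authors' formulation: it computes a single global SSIM over a flattened image rather than the usual windowed version, uses a plain mean absolute horizontal gradient as the smoothness term, and omits the depth-map loss because the abstract does not define it. The function names and the weights `w_ssim` and `w_smooth` are hypothetical.

```python
def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM over two flattened images with intensities in [0, 1].

    Simplification: one global window instead of local sliding windows.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def smoothness(disp_row):
    """Mean absolute difference between horizontally adjacent disparities."""
    diffs = [abs(b - a) for a, b in zip(disp_row, disp_row[1:])]
    return sum(diffs) / len(diffs)


def training_loss(target, reconstructed, disp_row, w_ssim=0.85, w_smooth=0.1):
    """Weighted sum of an SSIM reconstruction term and a smoothness term."""
    ssim_term = (1.0 - global_ssim(target, reconstructed)) / 2.0
    return w_ssim * ssim_term + w_smooth * smoothness(disp_row)


# Hypothetical data: a target view, a poor reconstruction, and a disparity row.
target = [0.1, 0.5, 0.9]
bad_reconstruction = [0.9, 0.5, 0.1]
loss_perfect = training_loss(target, target, [1.0, 1.0, 1.0])
loss_bad = training_loss(target, bad_reconstruction, [1.0, 2.0, 4.0])
print(loss_perfect, loss_bad)
```

A perfect reconstruction with a constant disparity row gives a loss of (numerically) zero, and the loss grows as the reconstruction diverges from the target or the disparity becomes less smooth.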

Monocular Depth Estimation using Transfer learning-An Overview

E3S Web of Conferences ◽ 10.1051/e3sconf/202130901069 ◽ 2021 ◽ Vol 309 ◽ pp. 01069

Author(s): K. Swaraja ◽ V. Akshitha ◽ K. Pranav ◽ B. Vyshnavi ◽ V. Sai Akhil ◽ ...

Keyword(s): Deep Learning ◽ Transfer Learning ◽ Deep Neural Networks ◽ Depth Estimation ◽ Depth Information ◽ Learning Approaches ◽ Learning Network ◽ Depth Maps ◽ Ill Posed ◽ Monocular Depth

Depth estimation is a computer vision technique that is critical for autonomous systems to sense their surroundings and predict their own state. Traditional estimation approaches, such as structure from motion and stereo vision matching, rely on feature correspondences across several views to provide depth information, and the depth maps they predict are sparse. Obtaining depth information from a single image through monocular depth estimation is an ill-posed problem, which a substantial body of recently proposed deep learning approaches addresses. Monocular depth estimation with deep learning has attracted a lot of interest in recent years, thanks to the rapid development of deep neural networks, and numerous strategies have been developed to solve this problem. In this study, we aim to give a comprehensive assessment of the methodologies commonly used for monocular depth estimation, focusing on recent advances in deep learning-based approaches. We begin by reviewing the various depth estimation techniques and datasets for monocular depth estimation. We then give an overview of multiple deep learning methods that use transfer learning network designs, including several combinations of encoders and decoders. In addition, multiple deep learning-based monocular depth estimation approaches and models are classified. Finally, the use of transfer learning approaches for monocular depth estimation is illustrated.

A Novel Method for Estimating Monocular Depth Using Cycle GAN and Segmentation

Sensors ◽ 10.3390/s20092567 ◽ 2020 ◽ Vol 20 (9) ◽ pp. 2567

Author(s): Dong-hoon Kwak ◽ Seung-ho Lee

Keyword(s): Visual Information ◽ Spatial Information ◽ Three Dimensional ◽ Depth Estimation ◽ Autonomous Driving ◽ Depth Information ◽ Generative Adversarial Network ◽ Adversarial Network ◽ Novel Method ◽ Monocular Depth

Modern image processing techniques use three-dimensional (3D) images, which contain spatial information such as depth and scale in addition to visual information. These images are indispensable in virtual reality (VR), augmented reality (AR), and autonomous driving applications. We propose a novel method to estimate monocular depth that combines a cycle generative adversarial network (GAN) with segmentation. It uses three processes: segmentation and depth estimation, adversarial loss calculations, and cycle consistency loss calculations. The cycle consistency loss calculation process evaluates the similarity of two images when they are restored to their original forms after being estimated separately from two adversarial losses. To evaluate the objective reliability of the proposed method, we compared it with other monocular depth estimation (MDE) methods using the NYU Depth Dataset V2. Our results show that the proposed method achieves better benchmark values than the other methods, demonstrating that it is more effective at estimating depth.
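
The cycle consistency idea mentioned above can be sketched generically: map an input through one generator and back through the other, then measure how far the restored signal is from the original. The sketch below uses a mean L1 distance and toy stand-in generators. All names (`l1_distance`, `cycle_consistency_loss`, `image_to_depth`, `depth_to_image`) are hypothetical, and the paper's actual networks and loss weighting are not reproduced here.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized flattened arrays."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(image, depth, image_to_depth, depth_to_image):
    """Sum of L1 errors after a round trip through both generators."""
    restored_image = depth_to_image(image_to_depth(image))
    restored_depth = image_to_depth(depth_to_image(depth))
    return l1_distance(image, restored_image) + l1_distance(depth, restored_depth)


# Toy stand-ins for the two generators (real ones would be neural networks).
def image_to_depth(values):
    return [2.0 * v for v in values]


def exact_depth_to_image(values):
    return [v / 2.0 for v in values]       # exact inverse of image_to_depth


def biased_depth_to_image(values):
    return [v / 2.0 + 0.1 for v in values]  # imperfect inverse


image = [0.2, 0.4, 0.6]
depth = [1.0, 2.0, 3.0]
loss_exact = cycle_consistency_loss(image, depth, image_to_depth, exact_depth_to_image)
loss_biased = cycle_consistency_loss(image, depth, image_to_depth, biased_depth_to_image)
print(loss_exact, loss_biased)
```

When the two generators invert each other the loss vanishes; any mismatch between them shows up as a positive penalty, which is what pushes the image-to-depth and depth-to-image mappings to stay consistent during training.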

Effects of Pictorial Depth Information on Memory for Scene Expanse

PsycEXTRA Dataset ◽ 10.1037/e501882009-434 ◽ 2000

Author(s): Carmela V. Gottesman

Keyword(s): Depth Information ◽ Pictorial Depth
