A Novel Method for Estimating Monocular Depth Using Cycle GAN and Segmentation

Sensors, 2020, Vol. 20 (9), pp. 2567
Author(s): Dong-hoon Kwak, Seung-ho Lee

Modern image processing techniques use three-dimensional (3D) images, which contain spatial information such as depth and scale in addition to visual information. These images are indispensable in virtual reality, augmented reality (AR), and autonomous driving applications. We propose a novel method for estimating monocular depth using a cycle generative adversarial network (GAN) combined with segmentation. The method consists of three processes: segmentation and depth estimation, adversarial loss calculation, and cycle consistency loss calculation. The cycle consistency loss evaluates the similarity of two images when they are restored to their original forms after being estimated separately from the two adversarial losses. To evaluate the objective reliability of the proposed method, we compared it with other monocular depth estimation (MDE) methods on the NYU Depth Dataset V2. Our results show that the proposed method outperforms the other methods on the benchmark metrics, demonstrating that it estimates depth more effectively.
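For illustration, the cycle consistency term described above can be sketched in PyTorch as follows. This is a minimal sketch, not the authors' implementation; the generator names g_rgb2depth and g_depth2rgb and the use of an L1 distance are assumptions.

import torch
import torch.nn as nn

def cycle_consistency_loss(real_rgb, real_depth, g_rgb2depth, g_depth2rgb):
    """Cycle consistency: translating an image to the other domain and back
    should reproduce the original. Generator names are hypothetical."""
    l1 = nn.L1Loss()
    # RGB -> estimated depth -> reconstructed RGB
    rgb_cycle = g_depth2rgb(g_rgb2depth(real_rgb))
    # depth -> estimated RGB -> reconstructed depth
    depth_cycle = g_rgb2depth(g_depth2rgb(real_depth))
    return l1(rgb_cycle, real_rgb) + l1(depth_cycle, real_depth)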

2020, Vol. 6 (1)
Author(s): Lalith Sharan, Lukas Burger, Georgii Kostiuchik, Ivo Wolf, Matthias Karck, ...

Abstract. In endoscopy, depth estimation is a task that can help quantify visual information for better scene understanding. A plethora of depth estimation algorithms have been proposed in the computer vision community. The endoscopic domain, however, differs from the typical depth estimation scenario in the setup and nature of the scene. Furthermore, obtaining ground truth depth information is infeasible, owing to the unsuitable detection range of off-the-shelf depth sensors and the difficulty of setting up a depth sensor in a surgical environment. In this paper, an existing self-supervised approach, Monodepth [1], from the field of autonomous driving is applied to a novel dataset of stereo-endoscopic images from reconstructive mitral valve surgery. While it is already known that endoscopic scenes are more challenging than outdoor driving scenes, this paper performs experiments to quantify the comparison and describes the domain gap and challenges involved in transferring these methods.
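The appearance matching loss at the core of Monodepth mixes an SSIM term with an L1 term between an input view and the view reconstructed via the predicted disparity. Below is a minimal PyTorch sketch of that photometric loss following Godard et al.; the 3x3 pooling window and alpha = 0.85 follow the original paper, but this is an illustrative re-implementation, not the code used in this study.

import torch
import torch.nn.functional as F

def appearance_matching_loss(pred, target, alpha=0.85):
    """Monodepth-style photometric loss: a weighted mix of (1 - SSIM)/2 and
    L1 between a view and its reconstruction, with SSIM computed over a 3x3
    average-pooling window as in Godard et al."""
    mu_p = F.avg_pool2d(pred, 3, 1, 1)
    mu_t = F.avg_pool2d(target, 3, 1, 1)
    sigma_p = F.avg_pool2d(pred * pred, 3, 1, 1) - mu_p * mu_p
    sigma_t = F.avg_pool2d(target * target, 3, 1, 1) - mu_t * mu_t
    sigma_pt = F.avg_pool2d(pred * target, 3, 1, 1) - mu_p * mu_t
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_p * mu_t + c1) * (2 * sigma_pt + c2)) / (
        (mu_p * mu_p + mu_t * mu_t + c1) * (sigma_p + sigma_t + c2))
    ssim_term = torch.clamp((1 - ssim) / 2, 0, 1)
    return (alpha * ssim_term + (1 - alpha) * torch.abs(pred - target)).mean()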


Author(s): Chih-Shuan Huang, Wan-Nung Tsung, Wei-Jong Yang, Chin-Hsing Chen

Sensors, 2020, Vol. 21 (1), pp. 54
Author(s): Peng Liu, Zonghua Zhang, Zhaozong Meng, Nan Gao

Depth estimation is a crucial component of many 3D vision applications. Monocular depth estimation is gaining increasing interest due to its flexibility and extremely low system requirements, but its inherently ill-posed and ambiguous nature still leads to unsatisfactory results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and a wavelet-based loss function to recover the depth information of a scene. Compared with previous methods, two improvements were achieved. First, we combined feature distillation with a joint attention mechanism to make feature modulation more discriminative: the network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates them using a joint attention operation. Second, we adopted a wavelet-based loss function for network training, which makes the loss more effective by capturing more structural detail. Experimental results on challenging indoor and outdoor benchmark datasets verify the proposed method's superiority over current state-of-the-art methods.
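The paper's exact wavelet formulation is not reproduced here, but the idea of a wavelet-domain loss can be sketched as follows: decompose predicted and reference depth into Haar sub-bands and penalize their difference, so that high-frequency structure contributes to the loss. The single-level Haar transform and the L1 distance are illustrative assumptions.

import torch
import torch.nn.functional as F

def haar_subbands(x):
    """One-level 2D Haar transform of a (N, 1, H, W) depth map via fixed
    2x2 filters with stride 2; returns the LL, LH, HL, HH sub-bands."""
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    kernels = torch.stack([ll, lh, hl, hh]).unsqueeze(1).to(x.device, x.dtype)
    return F.conv2d(x, kernels, stride=2)  # shape (N, 4, H/2, W/2)

def wavelet_loss(pred_depth, gt_depth):
    """Hypothetical wavelet-domain loss: L1 distance between the Haar
    sub-bands of predicted and reference depth, so high-frequency
    (structural) errors are penalized alongside low-frequency ones."""
    return torch.abs(haar_subbands(pred_depth) - haar_subbands(gt_depth)).mean()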


2021, Vol. 11 (12), pp. 5383
Author(s): Huachen Gao, Xiaoyu Liu, Meixia Qu, Shijie Huang

In recent studies, self-supervised learning methods have been explored for monocular depth estimation. They minimize the image reconstruction loss instead of using depth information as a supervised signal. However, existing methods usually assume that corresponding points in different views have the same color, which leads to unreliable unsupervised signals and ultimately degrades the reconstruction loss during training. Meanwhile, in low-texture regions, the disparity values of pixels cannot be predicted correctly because few features can be extracted there. To solve these issues, we propose a network, PDANet, that integrates perceptual consistency and data augmentation consistency, which are more reliable unsupervised signals, into a regular unsupervised depth estimation model. Specifically, we apply a data augmentation mechanism that minimizes the loss between the disparity maps generated from the original image and from the augmented image, which enhances the robustness of the prediction against color fluctuations. At the same time, we aggregate the features of different layers extracted by a pre-trained VGG16 network to explore higher-level perceptual differences between the input image and the generated one. Ablation studies demonstrate the effectiveness of each component, and PDANet achieves high-quality depth estimation results on the KITTI benchmark, improving on the state-of-the-art method from 0.114 to 0.084 in absolute relative error.
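The two consistency signals can be sketched in PyTorch as below. The VGG16 layer indices, the L1 distances, and the helper names are illustrative assumptions; the paper's exact loss terms and weighting may differ.

import torch
import torch.nn as nn
from torchvision import models

class PerceptualLoss(nn.Module):
    """Perceptual consistency sketch: compares multi-layer VGG16 features of
    two images. The layer indices (relu1_2, relu2_2, relu3_3) and the L1
    distance are illustrative assumptions, not the paper's exact choice."""
    def __init__(self, layer_ids=(3, 8, 15)):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg, self.layer_ids = vgg, set(layer_ids)

    def forward(self, x, y):
        loss = 0.0
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                loss = loss + torch.abs(x - y).mean()
            if i >= max(self.layer_ids):
                break  # no need to run deeper layers
        return loss

def augmentation_consistency(model, image, augment):
    """Data augmentation consistency sketch: the disparity predicted from an
    appearance-augmented image (e.g., color jitter) should match the disparity
    predicted from the original; names here are hypothetical."""
    with torch.no_grad():
        disp_ref = model(image)           # reference disparity, gradient-free
    disp_aug = model(augment(image))      # prediction under augmentation
    return torch.abs(disp_aug - disp_ref).mean()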


2021, pp. 102164
Author(s): Artur Banach, Franklin King, Fumitaro Masaki, Hisashi Tsukada, Nobuhiko Hata

2020, Vol. 6 (2), pp. eaay6036
Author(s): R. C. Feord, M. E. Sumner, S. Pusdekar, L. Kalra, P. T. Gonzalez-Bellido, ...

The camera-type eyes of vertebrates and cephalopods exhibit remarkable convergence, but it is currently unknown whether the mechanisms for visual information processing in these brains, which exhibit wildly disparate architectures, are also shared. To investigate stereopsis in a cephalopod species, we affixed "anaglyph" glasses to cuttlefish and used a three-dimensional perception paradigm. We show that (i) cuttlefish have also evolved stereopsis (i.e., the ability to extract depth information from the disparity between the left and right visual fields); (ii) when stereopsis information is intact, the time and distance covered before striking at a target are shorter; and (iii) stereopsis in cuttlefish works differently from that of vertebrates, as cuttlefish can extract stereopsis cues from anticorrelated stimuli. These findings demonstrate that although depth computation has evolved convergently, cuttlefish stereopsis is likely afforded by a different algorithm than in humans, not just a different implementation.


Author(s): L. Madhuanand, F. Nex, M. Y. Yang

Abstract. Depth is an essential component for various scene understanding tasks and for reconstructing the 3D geometry of a scene. Estimating depth from stereo images requires multiple views of the same scene, which is often not possible when exploring new environments with a UAV. To overcome this, monocular depth estimation has become a topic of interest alongside recent advances in computer vision and deep learning. Research in this area has largely focused on indoor scenes or outdoor scenes captured at ground level. Single-image depth estimation from aerial images has been limited due to additional complexities arising from increased camera distance, wider area coverage, and numerous occlusions. A new aerial image dataset is prepared specifically for this purpose, combining Unmanned Aerial Vehicle (UAV) images covering different regions, features, and points of view. The single-image depth estimation is based on image reconstruction techniques that use stereo images to learn to estimate depth from single images. Among the various available models for ground-level single-image depth estimation, two models, (1) a Convolutional Neural Network (CNN) and (2) a Generative Adversarial Network (GAN), are used to learn depth from aerial UAV images. These models generate pixel-wise disparity images that can be converted into depth information. The generated disparity maps are evaluated for internal quality using various error metrics. The results show higher disparity ranges with smoother images generated by the CNN model, and sharper images with a smaller disparity range generated by the GAN model. The produced disparity images are converted to depth information and compared with point clouds obtained using Pix4D. The CNN model is found to perform better than the GAN and produces depth similar to that of Pix4D. This comparison helps streamline efforts to produce depth from a single aerial image.
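The disparity-to-depth conversion mentioned above follows the standard stereo relation depth = f * B / d. A minimal sketch, with placeholder calibration values (real values come from the camera calibration):

import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Standard stereo relation depth = f * B / d, used to turn a predicted
    disparity map into metric depth; eps guards against division by zero."""
    return (focal_length_px * baseline_m) / np.maximum(disparity, eps)

# Usage with assumed (hypothetical) calibration values:
# depth_m = disparity_to_depth(disp_map, focal_length_px=720.0, baseline_m=0.3)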

