DEEP LEARNING FOR 3D RECONSTRUCTION OF THE MARTIAN SURFACE USING MONOCULAR IMAGES: A FIRST GLANCE

Author(s):  
Z. Chen ◽  
B. Wu ◽  
W. C. Liu

Abstract. This paper presents our efforts on CNN-based 3D reconstruction of the Martian surface using monocular images. The Viking colorized global mosaic and the Mars Express HRSC blended DEM are used as training data. An encoder-decoder network is employed in the framework. The encoder section, which includes convolution layers and reduction layers, extracts features from the images. The decoder section consists of deconvolution layers and integrates the features to convert the images into the desired DEMs. In addition, skip connections between the encoder and decoder sections are applied, which supply more low-level features to the decoder to improve its performance. Monocular Context Camera (CTX) images are used to test and verify the performance of the proposed CNN-based approach. Experimental results show promising performance of the proposed approach. Features in the images are well utilized, and topographic details in the images are successfully recovered in the DEMs. In most cases, the geometric accuracies of the generated DEMs are comparable to those of DEMs generated by traditional photogrammetry using stereo images. These preliminary results show that the proposed CNN-based approach has great potential for 3D reconstruction of the Martian surface.
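As a rough illustration of the encoder-decoder structure described above, the following PyTorch sketch shows one way a skip connection can feed low-level encoder features to the decoder. All layer and channel choices here are our own assumptions for illustration, not the authors' configuration.

import torch
import torch.nn as nn

class EncoderDecoderDEM(nn.Module):
    # Minimal encoder-decoder mapping a single-channel image to a
    # same-sized DEM; channel counts are illustrative assumptions.
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU())              # convolution layer
        self.down = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())  # reduction layer
        self.up = nn.Sequential(nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU())      # deconvolution layer
        self.head = nn.Conv2d(64, 1, 3, padding=1)   # fuses skip features, predicts elevation

    def forward(self, x):
        f1 = self.enc(x)                    # low-level features
        f2 = self.down(f1)                  # reduced-resolution features
        u = self.up(f2)                     # upsampled decoder features
        fused = torch.cat([u, f1], dim=1)   # skip connection from encoder to decoder
        return self.head(fused)             # per-pixel elevation (the DEM)

dem = EncoderDecoderDEM()(torch.randn(1, 1, 128, 128))   # output shape: (1, 1, 128, 128)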

2021 ◽  
Vol 13 (5) ◽  
pp. 839
Author(s):  
Zeyu Chen ◽  
Bo Wu ◽  
Wai Chung Liu

Three-dimensional (3D) surface models, e.g., digital elevation models (DEMs), are important for planetary exploration missions and scientific research. Current DEMs of the Martian surface are mainly generated by laser altimetry or photogrammetry, which have respective limitations: laser altimetry cannot produce high-resolution DEMs, and photogrammetry requires stereo images, but high-resolution stereo images of Mars are rare. An alternative is the convolutional neural network (CNN) technique, which implicitly learns features by relating corresponding inputs and outputs. In recent years, CNNs have exhibited promising performance in the 3D reconstruction of close-range scenes. In this paper, we present a CNN-based algorithm that is capable of generating DEMs from single images at the same resolution as the input images. An existing low-resolution DEM is used to provide global information. Synthetic and real data, including Context Camera (CTX) images and DEMs from stereo High Resolution Imaging Science Experiment (HiRISE) images, are used as training data. The performance of the proposed method is evaluated using single CTX images of representative landforms on Mars, and the generated DEMs are compared with those obtained from stereo HiRISE images. The experimental results show promising performance of the proposed method. The topographic details are well reconstructed, and the geometric accuracies achieve root-mean-square error (RMSE) values ranging from 2.1 m to 12.2 m (approximately 0.5 to 2 pixels in image space). These results indicate that the proposed CNN-based method has great potential for 3D surface reconstruction in planetary applications.
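One plausible way to use the existing low-resolution DEM for global information (our reading of the setup, not necessarily the authors' exact design) is to upsample it to the image resolution and stack it with the single image as a second input channel; the reported accuracies can then be computed as RMSE against a reference DEM:

import torch
import torch.nn.functional as F

image = torch.randn(1, 1, 512, 512)        # a single CTX image (synthetic stand-in)
coarse_dem = torch.randn(1, 1, 32, 32)     # existing low-resolution DEM tile

# Upsample the coarse DEM and stack it with the image, so the network sees
# both local texture and global topography.
coarse_up = F.interpolate(coarse_dem, size=image.shape[-2:], mode="bilinear", align_corners=False)
net_input = torch.cat([image, coarse_up], dim=1)   # shape (1, 2, 512, 512)

def rmse(pred, ref):
    # Root-mean-square error against a reference DEM (e.g., from stereo HiRISE)
    return torch.sqrt(torch.mean((pred - ref) ** 2))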


2007 ◽  
Vol 16 (1) ◽  
pp. 1-15 ◽  
Author(s):  
Cagatay Basdogan

A planetary rover acquires a large collection of images while exploring its surrounding environment. For example, 2D stereo images of the Martian surface captured by the lander and the Sojourner rover during the Mars Pathfinder mission in 1997 were transmitted to Earth for scientific analysis and navigation planning. Owing to the limited memory and computational power of the Sojourner rover, most of the images were captured by the lander and then transmitted directly to Earth for processing. If these images were merged at the rover site to reconstruct a 3D representation of the rover's environment using its on-board resources, more information could potentially be transmitted to Earth in a compact manner. However, constructing a 3D model from multiple views is a highly challenging task, even for the new-generation rovers (Spirit and Opportunity) operating on the Martian surface at the time this article was written. Moreover, low transmission rates and limited communication intervals between Earth and Mars make the transmission of any data more difficult. We propose a robust and computationally efficient method for progressive transmission of multi-resolution 3D models of Martian rocks and soil reconstructed from a series of stereo images. For visualization of these models on Earth, we have developed a new multimodal visualization setup that integrates vision and touch. Our scheme for 3D reconstruction of Martian rocks from 2D images for visualization on Earth involves four main steps: (a) acquisition of scans: depth maps are generated from stereo images; (b) integration of scans: the scans are correctly positioned and oriented with respect to each other and fused into a 3D volumetric representation of the rocks using an octree; (c) transmission: the volumetric data are encoded and progressively transmitted to Earth; (d) visualization: a surface model is reconstructed from the transmitted data on Earth and displayed to a user through a new autostereoscopic visualization table and a haptic device providing touch feedback. To test the practical utility of our approach, we first captured a sequence of stereo images of a rock surface from various viewpoints in the JPL MarsYard using a mobile cart and then performed a series of 3D reconstruction experiments. In this paper, we discuss the steps of our reconstruction process, our multimodal visualization system, and the tradeoffs that must be made to transmit multi-resolution 3D models to Earth efficiently under the constraints of limited computational resources, low transmission rates, and limited communication intervals between Earth and Mars.
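The octree-based volumetric representation lends itself naturally to progressive transmission: emitting occupied nodes level by level sends a coarse shape first and refines it as more data arrive. A minimal sketch of that idea (our simplification, not the authors' implementation):

class OctreeNode:
    # Minimal octree over a cubic volume; points fused from depth maps
    # mark nodes as occupied down to a maximum depth.
    def __init__(self, center, half, depth):
        self.center, self.half, self.depth = center, half, depth
        self.children = None
        self.occupied = False

    def insert(self, p, max_depth):
        self.occupied = True
        if self.depth == max_depth:
            return
        if self.children is None:
            self.children = {}
        idx = tuple(int(p[i] > self.center[i]) for i in range(3))
        if idx not in self.children:
            child_center = [self.center[i] + (0.5 if idx[i] else -0.5) * self.half
                            for i in range(3)]
            self.children[idx] = OctreeNode(child_center, self.half / 2, self.depth + 1)
        self.children[idx].insert(p, max_depth)

def levels(root):
    # Breadth-first, coarse-to-fine traversal: each yielded level refines
    # the previous one, which is what makes the encoding progressive.
    level = [root]
    while level:
        yield [n.center for n in level if n.occupied]
        level = [c for n in level if n.children for c in n.children.values()]

root = OctreeNode([0.0, 0.0, 0.0], 1.0, 0)
for p in [[0.3, -0.2, 0.1], [-0.4, 0.4, -0.1], [0.35, -0.25, 0.15]]:   # fused depth-map points
    root.insert(p, max_depth=4)
for depth, centers in enumerate(levels(root)):
    print(depth, len(centers))   # occupied nodes per level, coarse to fine

Each yielded level can be encoded and sent independently, so Earth-side visualization can begin as soon as the first levels arrive.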


2021 ◽  
Vol 11 (22) ◽  
pp. 10966
Author(s):  
Hsiang-Chieh Chen ◽  
Zheng-Ting Li

This article introduces an automated data-labeling approach for generating crack ground truths (GTs) within concrete images. The main algorithm includes generating first-round GTs, pre-training a deep learning-based model, and generating second-round GTs. On the basis of the second-round GTs of the training data, a learning-based crack detection model can be trained in a self-supervised manner. The pre-trained deep learning-based model is effective for crack detection after it is re-trained using the second-round GTs. The main contribution of this study is an automated GT generation process for training a pixel-level crack detection model. Experimental results show that the second-round GTs are similar to manually marked labels. Accordingly, the cost of implementing learning-based methods is reduced significantly because manual data labeling is no longer required.
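The two-round labeling loop can be summarized as below; every function here is an illustrative placeholder (e.g., the rule-based threshold stands in for whatever first-round procedure the paper actually uses):

import numpy as np

def first_round_gt(img):
    # Round 1: a simple rule-based guess -- unusually dark pixels marked as crack
    # (illustrative thresholding; the paper's actual rules may differ).
    return (img < img.mean() - img.std()).astype(np.uint8)

def second_round_gt(prob_map, thresh=0.5):
    # Round 2: binarize the pre-trained model's probability output; these
    # refined masks replace the round-1 labels for re-training.
    return (prob_map > thresh).astype(np.uint8)

images = [np.random.rand(64, 64) for _ in range(4)]
gts1 = [first_round_gt(im) for im in images]
# model = train_pixel_model(images, gts1)              # hypothetical pre-training step
prob_maps = [np.random.rand(64, 64) for _ in images]   # stand-in for model.predict(im)
gts2 = [second_round_gt(p) for p in prob_maps]
# model = train_pixel_model(images, gts2)              # hypothetical re-training step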


2019 ◽  
Vol 9 (22) ◽  
pp. 4749
Author(s):  
Lingyun Jiang ◽  
Kai Qiao ◽  
Linyuan Wang ◽  
Chi Zhang ◽  
Jian Chen ◽  
...  

Decoding human brain activity, especially reconstructing human visual stimuli via functional magnetic resonance imaging (fMRI), has gained increasing attention in recent years. However, the high dimensionality and small quantity of fMRI data impose restrictions on satisfactory reconstruction, especially for deep learning-based reconstruction methods, which require huge amounts of labelled samples. Unlike such methods, humans can recognize a new image because the human visual system naturally extracts features from any object and compares them. Inspired by this visual mechanism, we introduced the mechanism of comparison into a deep learning method to realize better visual reconstruction, making full use of each sample and the relationships within sample pairs by learning to compare. On this basis, we propose a Siamese reconstruction network (SRN). Using the SRN, we achieved satisfactory results on two fMRI recording datasets: 72.5% accuracy on the digit dataset and 44.6% accuracy on the character dataset. Essentially, this approach increases the training data from n samples to approximately 2n sample pairs, taking full advantage of the limited quantity of training samples. The SRN learns to draw sample pairs of the same class together and disperse sample pairs of different classes in feature space.
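The idea of learning to compare can be sketched as follows: pairs are built from individual samples, and a contrastive-style objective pulls same-class feature pairs together while pushing different-class pairs apart. The pairing scheme and loss below are common defaults, not necessarily the SRN's exact choices (exhaustive combination as shown yields n(n-1)/2 pairs; the paper's scheme reportedly yields about 2n):

import itertools
import torch
import torch.nn.functional as F

def make_pairs(samples, labels):
    # Build (x1, x2, same_class) training pairs from individual samples.
    return [(samples[i], samples[j], int(labels[i] == labels[j]))
            for i, j in itertools.combinations(range(len(samples)), 2)]

def contrastive_loss(f1, f2, same, margin=1.0):
    # Converge same-class pairs, disperse different-class pairs in feature space.
    d = F.pairwise_distance(f1, f2)
    return torch.mean(same * d ** 2 + (1 - same) * torch.clamp(margin - d, min=0) ** 2)

f1, f2 = torch.randn(8, 16), torch.randn(8, 16)   # features of a batch of pairs
same = torch.randint(0, 2, (8,)).float()          # 1 = same class, 0 = different
loss = contrastive_loss(f1, f2, same)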


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xin Mao ◽  
Jun Kang Chow ◽  
Pin Siang Tan ◽  
Kuan-fu Liu ◽  
Jimmy Wu ◽  
...  

Abstract. Automatic bird detection in ornithological analyses is limited by the accuracy of existing models, owing to the lack of training data and the difficulty of extracting the fine-grained features required to distinguish bird species. Here we apply a domain randomization strategy to enhance the accuracy of deep learning models in bird detection. Trained with virtual birds of sufficient variation in different environments, the model tends to focus on the fine-grained features of birds and achieves higher accuracy. Based on 100 terabytes of continuous monitoring data of egrets collected over two months, our results reproduce findings obtained using conventional manual observations, e.g., the vertical stratification of egrets according to body size, and also open up opportunities for long-term bird surveys requiring intensive monitoring that would be impractical with conventional methods, e.g., the influence of weather on egrets and the relationship between the migration schedules of great egrets and little egrets.
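Domain randomization amounts to rendering the virtual birds under heavily varied nuisance factors so that the detector cannot rely on background or lighting cues. A toy sketch of the idea (all factor names and parameter ranges are our own illustrative assumptions):

import random

def randomize_scene(bird_model):
    # Vary everything except the bird's own fine-grained appearance.
    return {
        "model": bird_model,
        "background": random.choice(["water", "reeds", "mudflat", "sky"]),
        "scale": random.uniform(0.5, 2.0),
        "yaw_deg": random.uniform(0.0, 360.0),
        "brightness": random.uniform(0.6, 1.4),
        "occluded": random.random() < 0.3,
    }

scenes = [randomize_scene(m) for m in ("great_egret", "little_egret") for _ in range(1000)]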


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2611
Author(s):  
Andrew Shepley ◽  
Greg Falzon ◽  
Christopher Lawson ◽  
Paul Meek ◽  
Paul Kwan

Image data are one of the primary sources of ecological data used in biodiversity conservation and management worldwide. However, classifying and interpreting large numbers of images is expensive in time and resources, particularly in the context of camera trapping. Deep learning models have been used for this task but are often not suited to specific applications due to their inability to generalise to new environments and their inconsistent performance. Models need to be developed for specific species cohorts and environments, but the technical skills required to achieve this are a key barrier to the accessibility of this technology to ecologists. Thus, there is a strong need to democratise access to deep learning technologies by providing an easy-to-use software application that allows non-technical users to train custom object detectors. U-Infuse addresses this issue by providing ecologists with the ability to train customised models using publicly available images and/or their own images, without specific technical expertise. Auto-annotation and annotation-editing functionalities minimise the burden of manually annotating and pre-processing large numbers of images. U-Infuse is a free and open-source software solution that supports both multiclass and single-class training and object detection, allowing ecologists to access deep learning technologies usually available only to computer scientists, on their own device, customised for their application, without sharing intellectual property or sensitive data. It provides ecological practitioners with the ability to (i) easily achieve object detection within a user-friendly GUI, generating a species distribution report and other useful statistics; (ii) custom-train deep learning models using publicly available and custom training data; and (iii) achieve supervised auto-annotation of images for further training, with the ability to edit annotations to ensure quality datasets. Broad adoption of U-Infuse by ecological practitioners will improve ecological image analysis and processing by allowing significantly more image data to be processed with minimal expenditure of time and resources, particularly for camera trap images. Ease of training and the use of transfer learning mean that domain-specific models can be trained rapidly and frequently updated without the need for computer science expertise or data sharing, protecting intellectual property and privacy.
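Under the hood, the custom-training workflow described above is essentially transfer learning from a pretrained detector. A generic torchvision sketch of that step (U-Infuse wraps this kind of workflow behind a GUI and may use a different detector; the class count here is an assumption):

from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")   # detector pretrained on COCO
num_classes = 3                                      # background + 2 species (assumption)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
# The model can now be fine-tuned on camera-trap images with species-level boxes.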


2021 ◽  
Vol 13 (2) ◽  
pp. 274
Author(s):  
Guobiao Yao ◽  
Alper Yilmaz ◽  
Li Zhang ◽  
Fei Meng ◽  
Haibin Ai ◽  
...  

Available stereo matching algorithms produce a large number of false-positive matches, or only a few true positives, across oblique stereo images with large baselines. This undesired result is caused by the complex perspective deformation and radiometric distortion across the images. To address this problem, we propose a novel affine-invariant feature matching algorithm with subpixel accuracy based on an end-to-end convolutional neural network (CNN). In our method, we adopt and modify a Hessian affine network, which we refer to as IHesAffNet, to obtain affine-invariant Hessian regions within a deep learning framework. To improve the correlation between corresponding features, we introduce an empirical weighted loss function (EWLF) based on negative samples selected using K nearest neighbors, and then generate highly discriminative deep learning-based descriptors with our multiple hard network structure (MTHardNets). Following this step, conjugate features are produced using the Euclidean distance ratio as the matching metric, and the accuracy of the matches is optimized through deep learning transform-based least squares matching (DLT-LSM). Finally, experiments on large-baseline oblique stereo images acquired by ground close-range and unmanned aerial vehicle (UAV) platforms verify the effectiveness of the proposed approach, and comprehensive comparisons demonstrate that our matching algorithm outperforms state-of-the-art methods in terms of accuracy, distribution, and correct ratio. The main contributions of this article are: (i) the proposed MTHardNets can generate high-quality descriptors; and (ii) the IHesAffNet can produce substantial affine-invariant corresponding features with reliable transform parameters.
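The Euclidean distance ratio used as the matching metric is the classic nearest-neighbor ratio test. A minimal version over descriptor arrays (the 0.8 threshold is a common default, not necessarily the paper's):

import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    # Accept a match only when the nearest neighbor in desc2 is clearly
    # better (closer) than the second-nearest neighbor.
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

desc1 = np.random.rand(100, 128)   # descriptors from image 1
desc2 = np.random.rand(120, 128)   # descriptors from image 2
print(len(ratio_test_matches(desc1, desc2)))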


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1052
Author(s):  
Leang Sim Nguon ◽  
Kangwon Seo ◽  
Jung-Hyun Lim ◽  
Tae-Jun Song ◽  
Sung-Hyun Cho ◽  
...  

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients at two different hospitals. Data augmentation was used to enhance the size and quality of the training datasets. A fine-tuning training approach was utilized, adopting a pre-trained model via transfer learning while training selected layers. Testing of the network was conducted by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate the network's differentiation performance. The proposed network model achieved up to 82.75% accuracy and a 0.88 (95% CI: 0.817–0.930) area under the curve (AUC). The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images together with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model learned the features from the cyst region accurately. This study demonstrates the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.
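The fine-tuning recipe described above (pretrained backbone, a new two-class head, selected layers trained, augmented training data) might look like the following torchvision sketch; the exact layers unfrozen and augmentations used are our assumptions, not the authors' settings:

import torch.nn as nn
from torchvision import models, transforms

model = models.resnet50(weights="IMAGENET1K_V2")     # transfer learning from ImageNet
for p in model.parameters():
    p.requires_grad = False                          # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)        # new MCN-vs-SCN classification head
for p in model.layer4.parameters():
    p.requires_grad = True                           # train selected deeper layers

augment = transforms.Compose([                       # data augmentation for EUS images
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])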


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Xinyang Li ◽  
Guoxun Zhang ◽  
Hui Qiao ◽  
Feng Bao ◽  
Yue Deng ◽  
...  

Abstract. The development of deep learning and open access to a substantial collection of imaging data together provide a potential solution for computational image transformation, which is gradually changing the landscape of optical imaging and biomedical research. However, current implementations of deep learning usually operate in a supervised manner, and their reliance on laborious and error-prone data annotation procedures remains a barrier to more general applicability. Here, we propose an unsupervised image transformation approach to facilitate the utilization of deep learning for optical microscopy, even in some cases in which supervised models cannot be applied. Through the introduction of a saliency constraint, the unsupervised model, named Unsupervised content-preserving Transformation for Optical Microscopy (UTOM), can learn the mapping between two image domains without requiring paired training data while avoiding distortions of the image content. UTOM shows promising performance in a wide range of biomedical image transformation tasks, including in silico histological staining, fluorescence image restoration, and virtual fluorescence labeling. Quantitative evaluations reveal that UTOM achieves stable and high-fidelity image transformations across different imaging conditions and modalities. We anticipate that our framework will encourage a paradigm shift in training neural networks and enable more applications of artificial intelligence in biomedical imaging.
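In an unpaired, CycleGAN-style setup, the saliency constraint can be read as an extra loss term that keeps the salient content in place while appearance changes. A simplified sketch of the two content-preserving terms (our reading of the idea; adversarial losses and the actual networks are omitted):

import torch

def saliency_mask(x, thresh=0.0):
    # Binary saliency map: content above a fixed intensity threshold
    # (a simplified stand-in for UTOM's saliency constraint).
    return (x > thresh).float()

def saliency_constraint_loss(x, y_fake):
    # Penalize any shift in where the salient content lies.
    return torch.mean(torch.abs(saliency_mask(x) - saliency_mask(y_fake)))

def cycle_loss(x, x_cycled):
    # Standard cycle-consistency term for unpaired domain translation.
    return torch.mean(torch.abs(x - x_cycled))

x = torch.randn(1, 1, 64, 64)       # source-domain image
g_x = torch.randn(1, 1, 64, 64)     # stand-in for G(x), the translated image
f_g_x = torch.randn(1, 1, 64, 64)   # stand-in for F(G(x)), the round trip
total = cycle_loss(x, f_g_x) + 0.5 * saliency_constraint_loss(x, g_x)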

