Improving Registration of Augmented Reality by Incorporating DCNNs into Visual SLAM

Author(s):  
Yongbin Chen ◽  
Hanwu He ◽  
Heen Chen ◽  
Teng Zhu

Augmented reality (AR) analyzes the characteristics of a scene and adds computer-generated geometric information to the real environment through visual fusion, reinforcing the user's perception of the world. Three-dimensional (3D) registration is one of the core issues in AR. The key problem is to estimate the visual sensor's pose in the 3D environment and identify the objects in the scene. Computer vision has recently made significant progress, but registration based on natural feature points in 3D space remains a severe problem for AR systems. Estimating the mobile camera's pose in a 3D scene precisely is difficult because of unstable factors such as image noise, changing illumination, and complex background patterns. Designing a stable, reliable, and efficient scene recognition algorithm therefore remains very challenging. In this paper, we propose an algorithm that combines Visual Simultaneous Localization and Mapping (SLAM) and Deep Convolutional Neural Networks (DCNNs) to boost the performance of AR registration. Semantic segmentation is a dense prediction task that assigns a category to each pixel of an image; when applied to AR registration, it narrows the search range for feature points between two frames and thus enhances the stability of the system. Comparative experiments in this paper show that semantic scene information substantially improves AR interaction.
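
To make the idea concrete, here is a minimal sketch (not the authors' implementation) of semantic-label-filtered feature matching: assuming a DCNN wrapper `segment` that maps an image to a per-pixel class-label array, candidate ORB matches are kept only when both endpoints carry the same semantic class, which narrows the search range between frames.

```python
import cv2
import numpy as np

def semantic_filtered_matches(img1, img2, segment):
    """Match ORB features between two frames, keeping only pairs whose
    keypoints fall on pixels with the same semantic class.

    `segment` is a hypothetical DCNN wrapper: image -> (H, W) label map.
    """
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    labels1, labels2 = segment(img1), segment(img2)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    consistent = []
    for m in matches:
        x1, y1 = map(int, kp1[m.queryIdx].pt)
        x2, y2 = map(int, kp2[m.trainIdx].pt)
        # Discard candidate matches that cross semantic-class boundaries;
        # this shrinks the search space and suppresses spurious matches.
        if labels1[y1, x1] == labels2[y2, x2]:
            consistent.append(m)
    return consistent
```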

2021 ◽  
Vol 15 ◽  
Author(s):  
Xinglong Wu ◽  
Yuhang Tao ◽  
Guangzhi He ◽  
Dun Liu ◽  
Meiling Fan ◽  
...  

Deep convolutional neural networks (DCNNs) are widely utilized for the semantic segmentation of dense nerve tissues from light and electron microscopy (EM) image data; the goal of this technique is to achieve efficient and accurate three-dimensional reconstruction of the vasculature and neural networks in the brain. The success of these tasks heavily depends on the amount, and especially the quality, of the human-annotated labels fed into DCNNs. However, it is often difficult to acquire gold-standard human-annotated labels for dense nerve tissues; human annotations inevitably contain discrepancies or even errors, which substantially impact the performance of DCNNs. Thus, a novel boosting framework was proposed to systematically improve the quality of the annotated labels. It consists of a DCNN for multilabel semantic segmentation with a customized Dice-logarithmic loss function, a fusion module combining the annotated labels with the corresponding DCNN predictions, and a boosting algorithm that sequentially updates the sample weights during network training iterations; this framework eventually improved segmentation task performance. The micro-optical sectioning tomography (MOST) dataset was then employed to assess the effectiveness of the proposed framework. The results indicated that the framework, even trained with a dataset including some poor-quality human-annotated labels, achieved state-of-the-art performance in the segmentation of somata and vessels in the mouse brain. Thus, the proposed artificial intelligence technique could advance neuroscience research.
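
The exact form of the customized Dice-logarithmic loss is not given in this abstract; the PyTorch sketch below shows one plausible combination of a logarithmic soft-Dice term with binary cross-entropy for multilabel segmentation, with the weighting `alpha` an assumed parameter.

```python
import torch
import torch.nn.functional as F

def dice_log_loss(logits, targets, eps=1e-6, alpha=0.5):
    """Hedged sketch of a Dice + logarithmic loss for multilabel segmentation.

    logits:  (N, C, H, W) raw network outputs
    targets: (N, C, H, W) binary masks as float tensors, one channel per class
    alpha balances the two terms; the published loss may differ in detail.
    """
    probs = torch.sigmoid(logits)
    dims = (0, 2, 3)  # reduce over batch and spatial axes, per class
    intersection = (probs * targets).sum(dims)
    cardinality = probs.sum(dims) + targets.sum(dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    dice_term = -torch.log(dice.clamp_min(eps)).mean()  # logarithmic Dice
    bce_term = F.binary_cross_entropy_with_logits(logits, targets)
    return alpha * dice_term + (1.0 - alpha) * bce_term
```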


Author(s):  
B. Vishnyakov ◽  
Y. Blokhinov ◽  
I. Sgibnev ◽  
V. Sheverdin ◽  
A. Sorokin ◽  
...  

Abstract. In this paper we describe a new multi-sensor platform for data collection and algorithm testing, and propose several methods for solving the semantic scene understanding problem for autonomous land vehicles. We describe our approaches to automatic camera and LiDAR calibration; three-dimensional scene reconstruction and odometry calculation; semantic segmentation, which provides obstacle recognition and underlying-surface classification; object detection; and point cloud segmentation. We also describe our virtual simulation complex, based on Unreal Engine, which can be used for both data collection and algorithm testing. We collected a large database of field and virtual data: more than 1,000,000 real images with corresponding LiDAR data and more than 3,500,000 simulated images with corresponding LiDAR data. All proposed methods were implemented and tested on our autonomous platform, and accuracy estimates were obtained on the collected database.
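
As a hedged illustration of the calibration-dependent part of such a pipeline, the sketch below projects LiDAR points into the camera image given already-estimated intrinsics K and LiDAR-to-camera extrinsics R, t; this projection is the core operation for checking calibration and for transferring per-pixel semantic labels onto the point cloud.

```python
import numpy as np

def project_lidar_to_image(points, K, R, t, image_shape):
    """Project 3D LiDAR points into the camera image plane.

    points: (N, 3) coordinates in the LiDAR frame
    K: (3, 3) camera intrinsics; R, t: LiDAR-to-camera extrinsics
    Returns pixel coordinates and a mask of points visible in the image.
    """
    cam = points @ R.T + t          # LiDAR frame -> camera frame
    in_front = cam[:, 2] > 0.0      # keep points in front of the camera
    uvw = cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]   # perspective division
    h, w = image_shape[:2]
    visible = in_front & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                       & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv, visible
```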


Author(s):  
B. Vishnyakov ◽  
I. Sgibnev ◽  
V. Sheverdin ◽  
A. Sorokin ◽  
P. Masalov ◽  
...  

Abstract. In this paper we present a semantic SLAM method based on a bundle of deep convolutional neural networks. It provides real-time dense semantic scene reconstruction for the autonomous driving system of an off-road robotic vehicle. Most state-of-the-art neural networks require large computing resources that exceed the capabilities of many robotic platforms. Building on recent progress in computer vision, we propose an architecture for 3D semantic scene reconstruction that integrates SuperPoint, SuperGlue, Bi3D, DeepLabV3+, RTM3D, and an additional module whose pre-processing, inference, and post-processing operations are performed on the GPU. We also updated our simulated dataset for semantic segmentation and added disparity images.
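
The integration code itself is not described in the abstract; the following sketch is a purely illustrative per-frame flow, with `superpoint`, `superglue`, `bi3d`, `deeplab`, and `rtm3d` as hypothetical callables wrapping the respective pretrained networks.

```python
import torch

class SemanticSlamFrame:
    """Hedged sketch of one frame step in a DCNN-bundle semantic SLAM pipeline.

    All module arguments are hypothetical stand-ins for the named networks;
    the real glue code of the paper is not reproduced here.
    """
    def __init__(self, superpoint, superglue, bi3d, deeplab, rtm3d, device="cuda"):
        self.superpoint, self.superglue = superpoint, superglue
        self.bi3d, self.deeplab, self.rtm3d = bi3d, deeplab, rtm3d
        self.device = device
        self.prev = None  # features of the previous frame

    @torch.no_grad()
    def step(self, left, right):
        left = left.to(self.device)
        right = right.to(self.device)
        feats = self.superpoint(left)                    # keypoints + descriptors
        matches = (self.superglue(self.prev, feats)
                   if self.prev is not None else None)   # inter-frame matching
        disparity = self.bi3d(left, right)               # stereo depth
        labels = self.deeplab(left)                      # per-pixel semantics
        boxes3d = self.rtm3d(left)                       # monocular 3D boxes
        self.prev = feats
        # Matches drive odometry; disparity + labels give the dense semantic
        # reconstruction; 3D boxes add object-level scene understanding.
        return matches, disparity, labels, boxes3d
```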


Author(s):  
Daniele Gibelli ◽  
Andrea Palamenghi ◽  
Pasquale Poppa ◽  
Chiarella Sforza ◽  
Cristina Cattaneo ◽  
...  

Abstract. Personal identification of the living from video surveillance systems usually involves 2D images. However, the potential of three-dimensional facial models for personal identification through 3D-3D comparison still needs to be verified. This study aims at testing the reliability of a protocol for 3D-3D registration of facial models, potentially useful for personal identification. Fifty male subjects aged between 18 and 45 years were randomly chosen from a database of 3D facial models acquired through stereophotogrammetry. For each subject, two acquisitions were available; the 3D models of faces were then registered onto other models belonging to the same and different individuals according to the least point-to-point distance on the entire facial surface, for a total of 50 matches and 50 mismatches. The RMS (root mean square) value of point-to-point distance between the two models was then calculated with the VAM® software. Intra- and inter-observer errors were assessed through calculation of the relative technical error of measurement (rTEM). Statistically significant differences between matches and mismatches were assessed through the Mann–Whitney test (p < 0.05). Both intra- and inter-observer rTEM were between 2.2% and 5.2%. The average RMS point-to-point distance was 0.50 ± 0.28 mm in matches and 2.62 ± 0.56 mm in mismatches (p < 0.01). An RMS threshold of 1.50 mm distinguished matches from mismatches in 100% of cases. This study improves on existing 3D-3D superimposition methods and confirms the advantages that 3D facial analysis may bring to personal identification of the living.
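
For reference, the RMS point-to-point distance used as the matching criterion can be computed as in the following sketch (a generic nearest-neighbour formulation in SciPy, standing in for the VAM® computation), given two registered facial surfaces sampled as point clouds.

```python
import numpy as np
from scipy.spatial import cKDTree

def rms_point_to_point(model_a, model_b):
    """RMS of nearest-neighbour distances from model_a to model_b.

    model_a, model_b: (N, 3) and (M, 3) arrays of surface points from two
    registered 3D facial models. A threshold on this value (about 1.50 mm
    in the study) separates matches from mismatches.
    """
    tree = cKDTree(model_b)
    distances, _ = tree.query(model_a)  # closest point on the other surface
    return float(np.sqrt(np.mean(distances ** 2)))
```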


2021 ◽  
Vol 45 (5) ◽  
Author(s):  
Yuri Nagayo ◽  
Toki Saito ◽  
Hiroshi Oyama

Abstract. The surgical education environment has been changing significantly due to restricted work hours, limited resources, and increasing public concern for safety and quality, leading to the evolution of simulation-based training in surgery. Of the various simulators, low-fidelity simulators are widely used to practice surgical skills such as suturing because they are portable, inexpensive, and easy to use without requiring complicated settings. However, since low-fidelity simulators do not offer any teaching information, trainees practice with them on their own, referring to textbooks or videos, which are insufficient for learning open surgical procedures. This study aimed to develop a new suture training system for open surgery that provides trainees with three-dimensional information on exemplary procedures performed by experts and allows them to observe and imitate those procedures during self-practice. The proposed system consists of a motion capture system for surgical instruments and a three-dimensional replication system that reproduces captured procedures on the surgical field. Motion capture of surgical instruments was achieved inexpensively by using cylindrical augmented reality (AR) markers, and replication of captured procedures was realized by visualizing them three-dimensionally, at the same position and orientation as captured, using an AR device. For subcuticular interrupted sutures, the proposed system was confirmed to enable users to observe experts' procedures from any angle and imitate them by manipulating the actual surgical instruments during self-practice. We expect that this training system will contribute to a novel surgical training method that enables trainees to learn surgical skills by themselves in the absence of experts.
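
As a simplified sketch of marker-based instrument tracking (using OpenCV's classic ArUco API from opencv-contrib; newer OpenCV versions expose an ArucoDetector class instead, and the actual system's cylindrical markers are reduced here to a single planar marker), per-frame pose estimation might look like this.

```python
import cv2
import numpy as np

def estimate_marker_pose(frame, camera_matrix, dist_coeffs, marker_len=0.02):
    """Estimate the 6-DoF pose of an ArUco marker in one video frame.

    camera_matrix, dist_coeffs: from a prior camera calibration.
    marker_len: marker side length in metres (assumed value).
    Returns rotation and translation vectors, or None if no marker is seen.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    if ids is None:
        return None
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_len, camera_matrix, dist_coeffs)
    return rvecs[0], tvecs[0]  # pose of the first detected marker
```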


1994 ◽  
Vol 14 (5) ◽  
pp. 749-762 ◽  
Author(s):  
Jean-François Mangin ◽  
Vincent Frouin ◽  
Isabelle Bloch ◽  
Bernard Bendriem ◽  
Jaime Lopez-Krahe

We propose a fully unsupervised methodology dedicated to the fast registration of positron emission tomography (PET) and magnetic resonance images of the brain. First, discrete representations of the surfaces of interest (head or brain surface) are automatically extracted from both images. Then, a shape-independent surface-matching algorithm gives a rigid body transformation, which allows the transfer of information between both modalities. A three-dimensional (3D) extension of the chamfer-matching principle makes up the core of this surface-matching algorithm. The optimal transformation is inferred from the minimization of a quadratic generalized distance between discrete surfaces, taking into account between-modality differences in the localization of the segmented surfaces. The minimization process is efficiently performed via the precomputation of a 3D distance map. Validation studies using a dedicated brain-shaped phantom have shown that the maximum registration error was of the order of the PET pixel size (2 mm) for the wide variety of tested configurations. The software is routinely used today in a clinical context by the physicians of the Service Hospitalier Frédéric Joliot (>150 registrations performed). The entire registration process requires ∼5 min on a conventional workstation.
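
A minimal sketch of the chamfer-matching core (assuming a binary voxel volume of the reference surface, and using a plain sum of squared distances in place of the paper's quadratic generalized distance) shows how the precomputed 3D distance map makes each cost evaluation a simple lookup.

```python
import numpy as np
from scipy import ndimage
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def chamfer_cost_factory(reference_volume, surface_points):
    """Build a rigid-registration cost function from a precomputed
    3D distance map, in the spirit of 3D chamfer matching.

    reference_volume: binary 3D array, True on the reference surface
    surface_points: (N, 3) voxel coordinates of the floating surface
    """
    # Distance from every voxel to the nearest reference-surface voxel;
    # computed once, so each cost evaluation is just an array lookup.
    dist_map = ndimage.distance_transform_edt(~reference_volume)

    def cost(params):  # params = [rx, ry, rz, tx, ty, tz]
        Rm = Rotation.from_rotvec(params[:3]).as_matrix()
        moved = surface_points @ Rm.T + params[3:]
        idx = np.clip(np.round(moved).astype(int), 0,
                      np.array(dist_map.shape) - 1).T
        return np.sum(dist_map[tuple(idx)] ** 2)  # sum of squared distances

    return cost

# Usage sketch:
# result = minimize(chamfer_cost_factory(ref_volume, points),
#                   x0=np.zeros(6), method="Powell")
```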


2019 ◽  
Vol 18 (6) ◽  
pp. e2690 ◽  
Author(s):  
F. Porpiglia ◽  
E. Checcucci ◽  
D. Amparore ◽  
F. Piramide ◽  
P. Verri ◽  
...  

Author(s):  
Leonardo Tanzi ◽  
Pietro Piazzolla ◽  
Francesco Porpiglia ◽  
Enrico Vezzetti

Abstract. Purpose: The current study aimed to propose a Deep Learning (DL) and Augmented Reality (AR) based solution for in-vivo robot-assisted radical prostatectomy (RARP), improving on the precision of a previously published work from our group. We implemented a two-step automatic system to align a 3D virtual ad-hoc model of a patient's organ with its 2D endoscopic image, to assist surgeons during the procedure. Methods: This approach used a Convolutional Neural Network (CNN) based structure for semantic segmentation and a subsequent elaboration of the obtained output, which produced the parameters needed for anchoring the 3D model. We used a dataset obtained from 5 endoscopic videos (A, B, C, D, E), selected and tagged by our team's specialists. We then evaluated the best-performing combination of segmentation architecture and neural network and tested the overlay performance. Results: U-Net stood out as the most effective architecture for segmentation. ResNet and MobileNet obtained similar Intersection over Union (IoU) results, but MobileNet processed almost twice as many operations per second. This segmentation technique outperformed the former work, obtaining an average IoU for the catheter of 0.894 (σ = 0.076) compared with 0.339 (σ = 0.195). These modifications also improved the 3D overlay performance, in particular the Euclidean distance between the predicted and actual model's anchor point, from 12.569 (σ = 4.456) to 4.160 (σ = 1.448), and the geodesic distance between the predicted and actual model's rotations, from 0.266 (σ = 0.131) to 0.169 (σ = 0.073). Conclusion: This work is a further step toward the adoption of DL and AR in the surgery domain. In future work, we will overcome the limits of this approach and further improve every step of the surgical procedure.
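
For reference, the IoU metric reported above is computed for binary masks as in this short, generic sketch (not the authors' evaluation code).

```python
import numpy as np

def iou(pred_mask, true_mask, eps=1e-7):
    """Intersection over Union between two binary segmentation masks."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return (intersection + eps) / (union + eps)
```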

