Deep Light Direction Reconstruction from Single RGB Images

2021 ◽  
Author(s):  
Markus Miller ◽  
Alfred Nischwitz ◽  
Rüdiger Westermann

In augmented reality applications, consistent illumination between virtual and real objects is important for creating an immersive user experience. Consistency can be achieved by parameterising the virtual illumination model so that it matches real-world lighting conditions. In this study, we developed a method to reconstruct the general light direction from red-green-blue (RGB) images of real-world scenes using a modified VGG-16 neural network. We reconstructed the general light direction as azimuth and elevation angles. To avoid inaccurate results caused by coordinate uncertainty at steep elevation angles, we further introduced stereographically projected coordinates. Unlike recent deep-learning-based approaches for reconstructing the light source direction, our approach does not require depth information and thus does not rely on special red-green-blue-depth (RGB-D) images as input.
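For illustration, the stereographic parameterisation mentioned above can be sketched as follows. The abstract does not specify the projection convention; this minimal sketch assumes elevation is measured from the horizon and projects the unit light-direction vector from the south pole, so that the azimuth ambiguity near the zenith collapses to a small neighbourhood of the origin.

```python
import numpy as np

def angles_to_stereographic(azimuth, elevation):
    """Map (azimuth, elevation) to stereographic plane coordinates.

    Projects the unit light-direction vector from the south pole onto
    the z = 0 plane (undefined at the nadir, z = -1), which removes the
    azimuth ambiguity near the zenith.
    """
    x = np.cos(elevation) * np.cos(azimuth)
    y = np.cos(elevation) * np.sin(azimuth)
    z = np.sin(elevation)
    return x / (1.0 + z), y / (1.0 + z)

def stereographic_to_angles(px, py):
    """Inverse projection back to (azimuth, elevation)."""
    r2 = px * px + py * py
    x = 2.0 * px / (1.0 + r2)
    y = 2.0 * py / (1.0 + r2)
    z = (1.0 - r2) / (1.0 + r2)
    return np.arctan2(y, x), np.arcsin(z)
```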

2021 ◽  
Vol 11 (11) ◽  
pp. 4758
Author(s):  
Ana Malta ◽  
Mateus Mendes ◽  
Torres Farinha

Maintenance professionals and other technical staff regularly need to learn to identify new parts in car engines and other equipment. The present work proposes a model of a task assistant based on a deep learning neural network. A YOLOv5 network is used for recognizing some of the constituent parts of an automobile. A dataset of car engine images was created and eight car parts were annotated in the images. Then, the neural network was trained to detect each part. The results show that YOLOv5s is able to detect the parts in real-time video streams with high accuracy, making it useful as an aid for training professionals who are learning to deal with new equipment using augmented reality. The architecture of an object recognition system using augmented reality glasses is also designed.
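As a sketch of how such a detector could be deployed, the snippet below loads a fine-tuned YOLOv5s checkpoint through the official Ultralytics hub API and runs it on a live video stream. The checkpoint name 'engine_parts.pt' and the camera index are hypothetical; the paper's own training artifacts are not named in the abstract.

```python
import torch
import cv2

# Load a fine-tuned YOLOv5s checkpoint via the official hub API.
# 'engine_parts.pt' is a hypothetical name for weights trained on the car-part dataset.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='engine_parts.pt')

cap = cv2.VideoCapture(0)  # live video stream, e.g. from the AR-glasses camera
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    # results.xyxy[0]: tensor of [x1, y1, x2, y2, confidence, class] rows
    for *box, conf, cls in results.xyxy[0].tolist():
        print(model.names[int(cls)], conf, box)
cap.release()
```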


Electronics ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 900
Author(s):  
Hanseob Kim ◽  
Taehyung Kim ◽  
Myungho Lee ◽  
Gerard Jounghyun Kim ◽  
Jae-In Hwang

Augmented reality (AR) scenes often inadvertently contain real-world objects that are not relevant to the main AR content, such as arbitrary passersby on the street. We refer to these real-world objects as content-irrelevant real objects (CIROs). CIROs may distract users from focusing on the AR content and bring about perceptual issues (e.g., depth distortion or physicality conflict). In a prior work, we carried out a comparative experiment investigating how the degree of visual diminishment of such a CIRO affects user perception of the AR content. Our findings revealed that the diminished representation had positive impacts on perception, such as reducing distraction and increasing the presence of the AR objects in the real environment. However, in that work, the ground-truth test was staged with perfect, artifact-free diminishment. In this work, we applied an actual real-time object diminishment algorithm on a handheld AR platform, which cannot be completely artifact-free in practice, and evaluated its performance both objectively and subjectively. We found that the imperfect diminishment and visual artifacts can negatively affect the subjective user experience.
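The abstract does not describe the diminishment algorithm itself; as a rough sketch of the general technique (not the authors' method), a segmented CIRO can be removed from a camera frame by mask dilation followed by inpainting, e.g. with OpenCV:

```python
import cv2
import numpy as np

def diminish(frame_bgr, ciro_mask):
    """Remove a content-irrelevant real object by inpainting.

    frame_bgr:  HxWx3 camera frame
    ciro_mask:  HxW uint8 mask (255 where the CIRO was segmented)
    Dilating the mask first helps hide halo artifacts at the object boundary.
    """
    mask = cv2.dilate(ciro_mask, np.ones((9, 9), np.uint8))
    return cv2.inpaint(frame_bgr, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```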


Author(s):  
Leonardo Tanzi ◽  
Pietro Piazzolla ◽  
Francesco Porpiglia ◽  
Enrico Vezzetti

Abstract
Purpose: The current study aimed to propose a Deep Learning (DL) and Augmented Reality (AR) based solution for in-vivo robot-assisted radical prostatectomy (RARP), improving on the precision of a previously published work from our group. We implemented a two-step automatic system to align a 3D virtual ad-hoc model of a patient's organ with its 2D endoscopic image, to assist surgeons during the procedure.
Methods: This approach uses a Convolutional Neural Network (CNN) based structure for semantic segmentation and a subsequent elaboration of the obtained output, which produces the parameters needed for anchoring the 3D model. We used a dataset obtained from 5 endoscopic videos (A, B, C, D, E), selected and tagged by our team's specialists. We then evaluated the best-performing combination of segmentation architecture and backbone network, and tested the overlay performance.
Results: U-Net stood out as the most effective segmentation architecture. ResNet and MobileNet obtained similar Intersection over Union (IoU) results, but MobileNet was able to process almost twice as many operations per second. This segmentation technique outperformed the results from the former work, obtaining an average IoU for the catheter of 0.894 (σ = 0.076) compared to 0.339 (σ = 0.195). These modifications also led to an improvement in the 3D overlay performance, in particular in the Euclidean distance between the predicted and actual model's anchor point, from 12.569 (σ = 4.456) to 4.160 (σ = 1.448), and in the geodesic distance between the predicted and actual model's rotations, from 0.266 (σ = 0.131) to 0.169 (σ = 0.073).
Conclusion: This work is a further step towards the adoption of DL and AR in the surgical domain. In future work, we will address the limits of this approach and further improve every step of the surgical procedure.
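For reference, the Intersection over Union metric reported above can be computed for binary segmentation masks as follows (a generic sketch, not the authors' evaluation code):

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection over Union for binary segmentation masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, gt).sum() / union
```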


2018 ◽  
Author(s):  
Uri Korisky ◽  
Rony Hirschhorn ◽  
Liad Mudrik

Notice: a peer-reviewed version of this preprint has been published in Behavior Research Methods and is freely available at http://link.springer.com/article/10.3758/s13428-018-1162-0

Continuous Flash Suppression (CFS) is a popular method for suppressing visual stimuli from awareness for relatively long periods. Thus far, it has only been used for suppressing two-dimensional images presented on-screen. We present a novel variant of CFS, termed 'real-life CFS', with which the actual immediate surroundings of an observer – including three-dimensional, real-life objects – can be rendered unconscious. Real-life CFS uses augmented reality goggles to present subjects with CFS masks to their dominant eye, leaving their non-dominant eye exposed to the real world. In three experiments we demonstrate that real objects can indeed be suppressed from awareness using real-life CFS, and that suppression durations are comparable to those obtained using classic, on-screen CFS. We further provide an example of experimental code, which can be modified for future studies using 'real-life CFS'. This opens the gate for new questions in the study of consciousness and its functions.
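By way of illustration, the core stimulus of any CFS variant is a rapidly refreshed Mondrian mask shown to the dominant eye. The sketch below generates one such mask as a NumPy array; the rectangle counts and sizes are illustrative assumptions, not the authors' published parameters.

```python
import numpy as np

def mondrian_mask(height, width, n_rects=200, rng=None):
    """Generate one random Mondrian-style CFS mask as an RGB array.

    Flashing a fresh mask to the dominant eye at roughly 10 Hz is the
    usual CFS protocol; parameter values here are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.full((height, width, 3), 128, dtype=np.uint8)
    for _ in range(n_rects):
        h = rng.integers(height // 16, height // 4)
        w = rng.integers(width // 16, width // 4)
        y = rng.integers(0, height - h)
        x = rng.integers(0, width - w)
        mask[y:y + h, x:x + w] = rng.integers(0, 256, size=3)  # random colour patch
    return mask
```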


2019 ◽  
Vol 128 (5) ◽  
pp. 1286-1310 ◽  
Author(s):  
Oscar Mendez ◽  
Simon Hadfield ◽  
Nicolas Pugeault ◽  
Richard Bowden

Abstract The use of human-level semantic information to aid robotic tasks has recently become an important area for both Computer Vision and Robotics. This has been enabled by advances in Deep Learning that allow consistent and robust semantic understanding. Leveraging this semantic vision of the world has allowed human-level understanding to naturally emerge from many different approaches. In particular, the use of semantic information to aid localisation and reconstruction has been at the forefront of both fields. Like robots, humans also require the ability to localise within a structure. To aid this, humans have designed high-level semantic maps of these structures, called floorplans. We are extremely good at localising in them, even with limited access to the depth information used by robots, because we focus on the distribution of semantic elements rather than geometric ones. Evidence of this is that humans are normally able to localise in a floorplan that has not been scaled properly. In order to grant this ability to robots, it is necessary to use localisation approaches that leverage the same semantic information humans use. In this paper, we present a novel method for semantically enabled global localisation. Our approach relies on the semantic labels present in the floorplan. Deep Learning is leveraged to extract semantic labels from RGB images, which are compared to the floorplan for localisation. While our approach is able to use range measurements if available, we demonstrate that they are unnecessary, as we can achieve results comparable to the state of the art without them.
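As a schematic illustration of the idea (not the authors' implementation), a candidate pose on a semantic floorplan can be scored by casting rays and counting how often the first semantic cell hit along each ray agrees with the label the CNN predicted for that viewing direction:

```python
import numpy as np

def score_pose(floorplan, labels, x, y, theta, fov=np.pi / 2, n_rays=32, max_range=60):
    """Score a candidate pose on a semantic floorplan grid.

    floorplan: HxW int array of semantic class ids (0 = free space)
    labels:    per-ray semantic class observed by the CNN, len == n_rays
    Casts rays from (x, y) with heading theta and counts label agreements;
    a schematic stand-in for a semantic observation likelihood.
    """
    angles = theta + np.linspace(-fov / 2, fov / 2, n_rays)
    hits = 0
    for ang, obs in zip(angles, labels):
        dx, dy = np.cos(ang), np.sin(ang)
        for r in range(1, max_range):
            cx, cy = int(x + r * dx), int(y + r * dy)
            if not (0 <= cy < floorplan.shape[0] and 0 <= cx < floorplan.shape[1]):
                break
            if floorplan[cy, cx] != 0:  # first non-free cell along the ray
                hits += (floorplan[cy, cx] == obs)
                break
    return hits / n_rays
```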


Author(s):  
Paul Milgram ◽  
David Drascic

The concept of Augmented Reality (AR) displays is defined, in relation to the amount of real (unmodelled) and virtual (modelled) data presented in an image, as those displays in which real images, such as video, are enhanced with computer-generated graphics. For the important class of stereoscopic AR displays, however, several factors may cause perceptual ambiguities, which manifest themselves as decreased accuracy and precision whenever virtual objects must be aligned with real ones. A review is given of research conducted to assess both the magnitude of these perceptual effects and the effectiveness of a computer-assisted Virtual Tape Measure (VTM), which has been developed for performing quantitative 3D measurements on real-world stereo images.
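The VTM's implementation is not detailed here, but the stereo geometry underlying any such measurement is standard: depth follows from disparity as Z = fB/d, and a "tape measure" reading is the Euclidean distance between two triangulated points. A minimal sketch, assuming a rectified stereo rig with focal length f, baseline B, and principal point (cx, cy):

```python
import numpy as np

def triangulate(u_left, u_right, v, f, baseline, cx, cy):
    """Recover a 3D point from a matched pixel pair in a rectified stereo rig.

    Depth from disparity: Z = f * B / d, with d = u_left - u_right (d != 0).
    """
    d = u_left - u_right
    z = f * baseline / d
    x = (u_left - cx) * z / f
    y = (v - cy) * z / f
    return np.array([x, y, z])

def virtual_tape_measure(p_img1, p_img2, f, baseline, cx, cy):
    """Euclidean distance between two stereo-measured endpoints.

    Each endpoint is given as a (u_left, u_right, v) pixel triple.
    """
    a = triangulate(*p_img1, f, baseline, cx, cy)
    b = triangulate(*p_img2, f, baseline, cx, cy)
    return np.linalg.norm(a - b)
```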


2021 ◽  
Author(s):  
Ezgi Pelin Yildiz

Augmented reality is defined as a technology in which virtual objects are blended with the real world and interact with it. Although augmented reality applications are used in many areas, one of the most important of these is education. AR technology allows the combination of real objects and virtual information in order to increase students' interaction with physical environments and facilitate their learning. Developing technology enables students to learn complex topics in a fun and easy way through virtual reality devices. Students interact with objects in the virtual environment and can learn more about them. For example, by organizing digital tours of a museum or zoo in a completely different country, lessons can be taught in the company of a teacher as if the students were there at that moment. In light of all this, this study is a compilation (review) study. In this context, augmented reality technologies were introduced and attention was drawn to their use in different fields of education, with examples. As a suggestion at the end of the study, it was emphasized that the prepared sections should be read carefully by educators and put into practice in their lessons. It was also pointed out that AR should be preferred as a way to communicate effectively with students through real-time interaction, especially during the pandemic.


2021 ◽  
Vol 11 (21) ◽  
pp. 10301
Author(s):  
Muhammad Shoaib Farooq ◽  
Attique Ur Rehman ◽  
Muhammad Idrees ◽  
Muhammad Ahsan Raza ◽  
Jehad Ali ◽  
...  

COVID-19 has been difficult to diagnose and treat at an early stage all over the world. The number of patients showing symptoms of COVID-19 has caused medical facilities at hospitals to become unavailable or overcrowded, which is a major challenge. Recent studies have shown that COVID-19 can be diagnosed with the aid of chest X-ray images. To combat the COVID-19 outbreak, developing a deep learning (DL) based model for automated COVID-19 diagnosis on chest X-rays is beneficial. In this research, we propose a customized convolutional neural network (CNN) model to detect COVID-19 from chest X-ray images. The model consists of nine layers and uses binary classification to differentiate between COVID-19 and normal chest X-rays. It enables early COVID-19 detection so that patients can be admitted in a timely fashion. The proposed model was trained and tested on two publicly available datasets, and cross-dataset studies were used to assess its robustness in a real-world context. Six hundred X-ray images were used for training and two hundred for validation. The images were preprocessed to improve the results and visualized for better analysis. The developed model reached 98% precision, recall, and F1-score. The cross-dataset studies also demonstrate the resilience of deep learning models in a real-world context, with 98.5% accuracy. Furthermore, a comparison table shows that our proposed model outperforms related models in terms of accuracy. The speed and high performance of our customized DL model allow COVID-19 patients to be identified quickly, which is helpful in controlling the outbreak.
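The abstract does not give the exact nine-layer topology; the following is an illustrative PyTorch sketch of a binary-classification CNN of comparable depth for single-channel 224×224 chest X-rays, not the authors' architecture:

```python
import torch.nn as nn

# Illustrative CNN of comparable depth; the paper's exact layer
# configuration is not specified in the abstract.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 224 -> 112
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 112 -> 56
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 56 -> 28
    nn.Flatten(),
    nn.Linear(64 * 28 * 28, 128), nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(128, 1), nn.Sigmoid(),  # output: P(COVID-19) for binary classification
)
```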


Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2958
Author(s):  
Timotej Knez ◽  
Octavian Machidon ◽  
Veljko Pejović

Edge intelligence currently faces several important challenges hindering its performance, the major one being that resource-constrained edge computing devices struggle to meet the high resource requirements of deep learning. The most recent adaptive neural network compression techniques have demonstrated, in theory, the potential to facilitate the flexible deployment of deep learning models in real-world applications. However, their actual suitability and performance in ubiquitous or edge computing applications has not been evaluated to date. In this context, our work aims to bridge the gap between the theoretical resource savings promised by such approaches and the requirements of a real-world mobile application by introducing algorithms that dynamically guide the compression rate of a neural network according to the continuously changing context in which the mobile computation takes place. Through an in-depth trace-based investigation, we confirm the feasibility of our adaptation algorithms in offering a scalable trade-off between inference accuracy and resource usage. We then implement our approach on real-world edge devices and, through a human activity recognition application, confirm that it offers efficient neural network compression adaptation in highly dynamic environments. The results of our experiment with 21 participants show that, compared to static network compression, our approach uses 2.18× less energy with only a 1.5% drop in average classification accuracy.
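The adaptation algorithms themselves are not specified in the abstract; as a toy stand-in, a context-driven controller might map battery level and observed latency to the width of a slimmable network. A real controller would tune such thresholds from traces rather than hard-code them.

```python
def select_compression(battery_frac, latency_ms, target_ms=50.0,
                       widths=(0.25, 0.5, 0.75, 1.0)):
    """Pick a slimmable-network width from the current mobile context.

    A toy stand-in for a context-aware adaptation algorithm: run the
    widest configuration the context allows, backing off when energy is
    scarce or the last inference missed its latency target.
    """
    allowed = widths
    if battery_frac < 0.2:          # low battery: cap at half width
        allowed = [w for w in widths if w <= 0.5]
    if latency_ms > target_ms:      # behind schedule: drop one level
        allowed = allowed[:-1] or [widths[0]]
    return max(allowed)
```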


Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2353
Author(s):  
Xinyan Sun ◽  
Zhenye Li ◽  
Tingting Zhu ◽  
Chao Ni

Grading the quality of fresh-cut flowers is an important practice in the flower industry. A classification method based on deep learning and depth information was proposed for grading flower quality according to the maturity status of the flower bud. Firstly, the RGB image and the depth image of a flower bud were collected and fused into RGBD information. Then, the RGBD information of a flower was fed into a convolutional neural network to determine the bud's maturity status. Four convolutional neural network models (VGG16, ResNet18, MobileNetV2, and InceptionV3) were adjusted to accept a four-dimensional (4D) RGBD input, and their classification performance was compared with and without depth information. The experimental results show that classification accuracy improved with depth information, and the adjusted InceptionV3 network with RGBD input achieved the highest classification accuracy (up to 98%), which indicates that the depth information effectively reflects the characteristics of the flower bud and is helpful for classifying maturity status. These results have significance for the intelligent classification and sorting of fresh flowers.
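Adjusting a stock CNN for a 4D RGBD input mainly means widening its first convolution. Below is a hedged PyTorch sketch for ResNet18; initialising the extra depth channel from the mean of the RGB weights is a common heuristic, and the number of maturity classes shown is an assumption, as the abstract does not state it.

```python
import torch
import torchvision.models as models

# Widen the first convolution of a stock ResNet18 from 3 to 4 input channels.
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
old = net.conv1
net.conv1 = torch.nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                            stride=old.stride, padding=old.padding, bias=False)
with torch.no_grad():
    net.conv1.weight[:, :3] = old.weight                          # reuse RGB weights
    net.conv1.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)  # depth channel init
net.fc = torch.nn.Linear(net.fc.in_features, 4)  # class count is an assumption
```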

