Approximate Depth Shape Reconstruction for RGB-D Images Captured from HMDs for Mixed Reality Applications

2020 ◽  
Vol 6 (3) ◽  
pp. 11
Author(s):  
Naoyuki Awano

Depth sensors are important in several fields for recognizing real space. However, there are cases where most depth values in a captured depth image are missing because the depths of distal objects are not always captured. This often occurs when a low-cost or structured-light depth sensor is used, and it occurs frequently in applications where depth sensors are used to replicate human vision, e.g., when the sensors are mounted on head-mounted displays (HMDs). One ideal inpainting (repair or restoration) approach for depth images with large missing areas, such as partial foreground depths, is to inpaint only the foreground; however, conventional inpainting studies have attempted to inpaint entire images. Thus, under the assumption of an HMD-mounted depth sensor, we propose a method to partially inpaint and reconstruct an RGB-D depth image so that foreground shapes are preserved. The proposed method comprises a smoothing process for noise reduction, filling of defects in the foreground area, and refinement of the filled depths. Experimental results demonstrate that the inpainted results produced by the proposed method preserve object shapes in the foreground area, and that the inpainted depths are accurate with respect to the real depth as measured by the peak signal-to-noise ratio (PSNR) metric.
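As a rough sketch of this foreground-only inpainting idea, the following Python fragment (using OpenCV) fills defects only inside an assumed near-depth foreground mask and scores a result with PSNR. The thresholds, mask construction, and function choices are illustrative assumptions, not the authors' implementation:

```python
import cv2
import numpy as np

def inpaint_foreground(depth_mm, fg_max_mm=1500, noise_ksize=5):
    """Fill missing depths only inside an assumed foreground range."""
    # Smooth to suppress sensor noise before detecting defects
    # (medianBlur supports 16-bit input for ksize 3 or 5).
    smoothed = cv2.medianBlur(depth_mm.astype(np.uint16), noise_ksize)

    # Defects are zero-valued pixels; restrict repair to the foreground,
    # approximated here by pixels nearer than fg_max_mm.
    defects = (smoothed == 0).astype(np.uint8)
    foreground = ((smoothed > 0) & (smoothed < fg_max_mm)).astype(np.uint8)
    fg_mask = cv2.dilate(foreground, np.ones((15, 15), np.uint8))
    repair_mask = defects & fg_mask

    # Inpaint only the masked foreground defects; background holes stay empty.
    depth_8bit = cv2.convertScaleAbs(smoothed, alpha=255.0 / fg_max_mm)
    return cv2.inpaint(depth_8bit, repair_mask, 3, cv2.INPAINT_TELEA)

def psnr(reference, estimate):
    """PSNR between two 8-bit depth images."""
    mse = np.mean((reference.astype(np.float64) - estimate) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)
```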

Sensors ◽  
2019 ◽  
Vol 19 (2) ◽  
pp. 393 ◽  
Author(s):  
Jonha Lee ◽  
Dong-Wook Kim ◽  
Chee Won ◽  
Seung-Won Jung

Segmentation of human bodies in images is useful for a variety of applications, including background substitution, human activity recognition, security, and video surveillance. However, human body segmentation has been a challenging problem due to the complicated shape and motion of a non-rigid human body. Meanwhile, depth sensors with advanced pattern recognition algorithms provide human body skeletons in real time with reasonable accuracy. In this study, we propose an algorithm that projects the human body skeleton from a depth image to a color image, where the human body region is segmented by using the projected skeleton as a segmentation cue. Experimental results using the Kinect sensor demonstrate that the proposed method provides high-quality segmentation results and outperforms conventional methods.
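A minimal sketch of the skeleton-as-segmentation-cue idea, assuming the joints have already been mapped from the depth camera into color-image coordinates (e.g., by the Kinect SDK's coordinate mapper); GrabCut stands in here for the paper's segmentation machinery, and the stroke radius and iteration count are illustrative:

```python
import cv2
import numpy as np

def segment_with_skeleton(color_img, joints_color_px):
    """Seed GrabCut with skeleton joints projected into the color image.

    color_img: 8-bit BGR image.
    joints_color_px: (N, 2) array of joint pixel coordinates (x, y),
    already mapped from the depth camera to the color camera.
    """
    # Everything starts as "probably background".
    mask = np.full(color_img.shape[:2], cv2.GC_PR_BGD, np.uint8)

    # Mark disks around each projected joint as certain foreground.
    for x, y in joints_color_px.astype(int):
        cv2.circle(mask, (x, y), 15, cv2.GC_FGD, -1)

    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(color_img, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)

    # Binary human-body mask: certain or probable foreground.
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
```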


2002 ◽  
Vol 11 (2) ◽  
pp. 176-188 ◽  
Author(s):  
Yuichi Ohta ◽  
Yasuyuki Sugaya ◽  
Hiroki Igarashi ◽  
Toshikazu Ohtsuki ◽  
Kaito Taguchi

In mixed reality, occlusions and shadows are important for achieving a natural fusion between the real and virtual worlds. To achieve this, it is necessary to acquire dense depth information of the real world from the observer's viewing position. The depth sensor must be attached to the observer's see-through HMD because he/she moves around. The sensor should be small and light enough to be attached to the HMD and should produce a reliable dense depth map at video rate. Unfortunately, no such depth sensors are available. We propose a client/server depth-sensing scheme to solve this problem. A server sensor located at a fixed position in the real world acquires the 3-D information of the world, and a client sensor attached to each observer produces the depth map from his/her viewing position using the 3-D information supplied by the server. Multiple clients can share the 3-D information of the server; we call this scheme Share-Z. In this paper, the concept and merits of Share-Z are discussed. An experimental system developed to demonstrate the feasibility of Share-Z is also described.
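The client-side step of such a scheme can be sketched as reprojecting the server's world-space 3D points into the client's viewpoint under a pinhole camera model. This numpy fragment is an illustrative assumption of how a Share-Z client might form its depth map, not the paper's implementation:

```python
import numpy as np

def render_client_depth(world_pts, K, R, t, hw):
    """Z-buffer the server's world-space points into a client depth map.

    world_pts: (N, 3) points from the server sensor, in world coordinates.
    K: 3x3 client intrinsics; R, t: world-to-client rotation/translation.
    hw: (height, width) of the client depth map.
    """
    h, w = hw
    cam = world_pts @ R.T + t                  # world -> client camera frame
    z = cam[:, 2]
    valid = z > 1e-6                           # keep points in front of camera
    pix = (cam[valid] / z[valid, None]) @ K.T  # perspective projection
    u = pix[:, 0].astype(int)
    v = pix[:, 1].astype(int)

    depth = np.full((h, w), np.inf)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # Keep the nearest point per pixel (z-buffering).
    np.minimum.at(depth, (v[inside], u[inside]), z[valid][inside])
    return depth
```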


Author(s):  
W. Baumeister ◽  
R. Rachel ◽  
R. Guckenberger ◽  
R. Hegerl

Introduction

Correlation averaging (CAV) is now an established technique in the image processing of two-dimensional crystals /1,2/. The basic idea is to detect the real positions of unit cells in a crystalline array by means of correlation functions and to average them by real-space superposition of the aligned motifs. The signal-to-noise ratio improves in proportion to the number of motifs included in the average. Unlike filtering in the Fourier domain, CAV corrects for lateral displacements of the unit cells; it thus avoids the loss of resolution entailed by these distortions in the conventional approach. Here we report on some variants of the method, aimed at retrieving a maximum of information from images with very low signal-to-noise ratios (low-dose microscopy of unstained or lightly stained specimens) while keeping the procedure economical.
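The core loop of such a procedure can be sketched as follows: cross-correlate the image with a reference motif, pick correlation peaks as unit-cell positions, and average the aligned patches in real space. The peak count, window size, and suppression strategy below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import fftconvolve

def correlation_average(image, reference, n_peaks=200, half=16):
    """Locate unit cells by cross-correlation and average aligned motifs."""
    # Cross-correlation computed as FFT convolution with the flipped reference.
    cc = fftconvolve(image, reference[::-1, ::-1], mode="same")

    motifs = []
    cc_work = cc.copy()
    for _ in range(n_peaks):
        y, x = np.unravel_index(np.argmax(cc_work), cc_work.shape)
        # Only accept peaks whose full window lies inside the image.
        if half <= y < image.shape[0] - half and half <= x < image.shape[1] - half:
            motifs.append(image[y - half:y + half, x - half:x + half])
        # Suppress this peak so the next-strongest unit cell is found.
        cc_work[max(0, y - half):y + half, max(0, x - half):x + half] = -np.inf

    # Real-space superposition: the SNR improves as more motifs are averaged.
    return np.mean(motifs, axis=0)
```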


Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3747
Author(s):  
Adriana Lipovac ◽  
Vlatko Lipovac ◽  
Borivoj Modlic

Contemporary wireless networks dramatically enhance data rates and reduce latency, becoming a key enabler of massive communication among various low-cost devices of limited computational power, standardized in particular by the downscaled Long-Term Evolution (LTE) derivations LTE-M and narrowband Internet of Things (NB-IoT). Specifically, assessment of the physical-layer transmission performance is important for higher-layer protocols that determine the extent of potential error-recovery escalation up the protocol stack. End-points of low processing capacity therefore need to efficiently estimate the residual bit error rate (BER), which is solely determined by the main orthogonal frequency-division multiplexing (OFDM) impairment, carrier frequency offset (CFO), specifically in small cells, where the signal-to-noise ratio is large enough and the OFDM symbol cyclic prefix prevents inter-symbol interference. In contrast to earlier analytical models with computationally demanding estimation of BER from the phase deviation caused by CFO, in this paper, after identifying the optimal sampling instant in a power delay profile, we abstract the CFO by an equivalent time dispersion, i.e., by an additional spreading of the power delay profile that would produce the same BER degradation as the CFO. The proposed BER estimation is verified by means of an industry-standard LTE software simulator.
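For context, a common closed-form shortcut treats the CFO-induced inter-carrier interference as extra noise. The sketch below follows that textbook sinc-attenuation approximation for QPSK over AWGN; it is not the authors' power-delay-profile abstraction:

```python
import numpy as np
from scipy.special import erfc

def qpsk_ber_with_cfo(snr_db, eps):
    """Toy residual-BER estimate for OFDM QPSK under CFO.

    eps: CFO normalized to the subcarrier spacing.
    Uses the common sinc-attenuation / ICI-as-noise approximation.
    """
    snr = 10 ** (snr_db / 10)
    useful = np.sinc(eps) ** 2             # attenuated useful carrier power
    ici = 1 - useful                       # leaked power treated as noise
    sinr = useful * snr / (1 + ici * snr)  # effective post-CFO SINR
    return 0.5 * erfc(np.sqrt(sinr / 2))   # per-bit QPSK BER over AWGN

# Example: at 25 dB SNR, a 3% normalized CFO barely moves the BER,
# while a 20% CFO degrades it noticeably.
print(qpsk_ber_with_cfo(25.0, 0.03), qpsk_ber_with_cfo(25.0, 0.20))
```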


2021 ◽  
Vol 40 (3) ◽  
pp. 1-12
Author(s):  
Hao Zhang ◽  
Yuxiao Zhou ◽  
Yifei Tian ◽  
Jun-Hai Yong ◽  
Feng Xu

Reconstructing hand-object interactions is a challenging task due to strong occlusions and complex motions. This article proposes a real-time system that uses a single depth stream to simultaneously reconstruct hand poses, object shape, and rigid/non-rigid motions. To achieve this, we first train a joint learning network to segment the hand and object in a depth image and to predict the 3D keypoints of the hand. With most layers shared by the two tasks, computation cost is reduced, supporting real-time performance. A hybrid dataset is constructed to train the network with real data (to learn real-world distributions) and synthetic data (to cover variations of objects, motions, and viewpoints). Next, the depths of the two targets and the keypoints are used in a unified optimization to reconstruct the interacting motions. Benefiting from a novel tangential contact constraint, the system not only resolves the remaining ambiguities but also maintains real-time performance. Experiments show that our system handles different hand and object shapes, various interactive motions, and moving cameras.
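As a toy illustration of what a tangential contact term might look like, the energy below penalizes hand-object separation along the surface normal more strongly than tangential sliding. The weights and formulation are assumptions for illustration, not the paper's constraint:

```python
import numpy as np

def tangential_contact_energy(p_hand, p_obj, n_obj, w_n=10.0, w_t=1.0):
    """Toy contact energy for matched hand/object contact points.

    p_hand, p_obj: (N, 3) contact point positions.
    n_obj: (N, 3) unit surface normals at the object contact points.
    w_n, w_t: illustrative weights; normal-direction gaps (separation or
    penetration) cost more than tangential sliding along the surface.
    """
    d = p_hand - p_obj
    # Decompose the residual into normal and tangential components.
    d_normal = np.sum(d * n_obj, axis=1, keepdims=True) * n_obj
    d_tangent = d - d_normal
    return w_n * np.sum(d_normal ** 2) + w_t * np.sum(d_tangent ** 2)
```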


2021 ◽  
Vol 11 (6) ◽  
pp. 2666
Author(s):  
Hafiz Muhammad Usama Hassan Alvi ◽  
Muhammad Shahid Farid ◽  
Muhammad Hassan Khan ◽  
Marcin Grzegorzek

Emerging 3D-related technologies such as augmented reality, virtual reality, mixed reality, and stereoscopy have seen remarkable growth due to their numerous applications in the entertainment, gaming, and electromedical industries. In particular, 3D television (3DTV) and free-viewpoint television (FTV) enhance viewers' television experience by providing immersion. They would need an infinite number of views to provide full parallax to the viewer, which is impractical due to various financial and technological constraints. Therefore, novel 3D views are generated from a set of available views and their depth maps using depth-image-based rendering (DIBR) techniques. The quality of a DIBR-synthesized image may be compromised for several reasons, e.g., inaccurate depth estimation. Since depth is important in this application, inaccuracies in depth maps lead to textural and structural distortions that degrade the quality of the generated image and result in a poor quality of experience (QoE). Therefore, quality assessment of DIBR-generated images is essential to guarantee an appreciable QoE. This paper estimates the quality of DIBR-synthesized images and proposes a novel 3D objective image quality metric. The proposed algorithm measures both textural and structural distortions in the DIBR image by exploiting contrast sensitivity and the Hausdorff distance, respectively. The two measures are combined to estimate an overall quality score. Experimental evaluations performed on the benchmark MCL-3D dataset show that the proposed metric is reliable and accurate, and performs better than existing 2D and 3D quality assessment metrics.
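A simplified sketch of such a two-part metric, assuming grayscale uint8 inputs: structural distortion via the Hausdorff distance between Canny edge maps, and textural distortion via a crude local-contrast difference standing in for a contrast-sensitivity model. The edge thresholds and combination weight are illustrative assumptions:

```python
import cv2
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def structural_score(reference, synthesized):
    """Structural distortion: Hausdorff distance between edge maps."""
    e_ref = np.column_stack(np.nonzero(cv2.Canny(reference, 100, 200)))
    e_syn = np.column_stack(np.nonzero(cv2.Canny(synthesized, 100, 200)))
    # Symmetric Hausdorff distance over edge-pixel coordinates.
    return max(directed_hausdorff(e_ref, e_syn)[0],
               directed_hausdorff(e_syn, e_ref)[0])

def textural_score(reference, synthesized):
    """Textural distortion: difference of simple local-contrast maps."""
    def contrast(img):
        blur = cv2.GaussianBlur(img.astype(np.float32), (7, 7), 0)
        return cv2.absdiff(img.astype(np.float32), blur)
    return float(np.mean(np.abs(contrast(reference) - contrast(synthesized))))

def quality_score(reference, synthesized, alpha=0.5):
    # Combine the two measures; alpha is an illustrative weight.
    return (alpha * structural_score(reference, synthesized)
            + (1 - alpha) * textural_score(reference, synthesized))
```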


2021 ◽  
Vol 20 (3) ◽  
pp. 1-22
Author(s):  
David Langerman ◽  
Alan George

High-resolution, low-latency applications in computer vision are ubiquitous in today's world of mixed-reality devices. These innovations provide a platform that can leverage the improving technology of depth sensors and embedded accelerators to enable higher-resolution, lower-latency processing of 3D scenes using depth-upsampling algorithms. This research demonstrates that filter-based upsampling algorithms are feasible for mixed-reality applications on low-power hardware accelerators. The authors parallelized and evaluated a depth-upsampling algorithm on two different devices: a reconfigurable-logic FPGA embedded within a low-power SoC, and a fixed-logic embedded graphics processing unit. We demonstrate that both accelerators can meet the real-time requirement of 11 ms latency for mixed-reality applications.
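One widely used filter-based method is joint bilateral upsampling, in which a high-resolution guide image steers interpolation of the low-resolution depth map. This naive Python loop sketches the logic such accelerators would parallelize; a grayscale guide is assumed and the parameters are illustrative:

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, guide_hi, scale, r=2,
                             sigma_s=1.0, sigma_r=10.0):
    """Naive joint bilateral upsampling (O(H*W*r^2); real-time versions
    are heavily optimized and parallelized on FPGAs/GPUs)."""
    H, W = guide_hi.shape[:2]
    out = np.zeros((H, W), np.float32)
    for y in range(H):
        for x in range(W):
            yl, xl = y / scale, x / scale   # position on the low-res grid
            acc = norm = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    ys = min(max(int(round(yl)) + dy, 0), depth_lo.shape[0] - 1)
                    xs = min(max(int(round(xl)) + dx, 0), depth_lo.shape[1] - 1)
                    # Spatial weight on the low-res grid.
                    ws = np.exp(-((ys - yl) ** 2 + (xs - xl) ** 2)
                                / (2 * sigma_s ** 2))
                    # Range weight from the high-res guide image.
                    gy = min(int(ys * scale), H - 1)
                    gx = min(int(xs * scale), W - 1)
                    diff = float(guide_hi[y, x]) - float(guide_hi[gy, gx])
                    wr = np.exp(-(diff ** 2) / (2 * sigma_r ** 2))
                    w = ws * wr
                    acc += w * depth_lo[ys, xs]
                    norm += w
            out[y, x] = acc / max(norm, 1e-8)
    return out
```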


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 517
Author(s):  
Seong-heum Kim ◽  
Youngbae Hwang

Owing to recent advancements in deep learning methods and relevant databases, it is becoming increasingly easy to recognize 3D objects using only RGB images from single viewpoints. This study investigates the major breakthroughs and current progress in deep-learning-based monocular 3D object detection. For relatively low-cost data acquisition systems without depth sensors or cameras at multiple viewpoints, we first consider existing databases with 2D RGB photos and their relevant attributes. Based on this simple sensor modality for practical applications, deep-learning-based monocular 3D object detection methods that overcome significant research challenges are categorized and summarized. We present the key concepts and detailed descriptions of representative single-stage and multiple-stage detection solutions. In addition, we discuss the effectiveness of the detection models on their baseline benchmarks. Finally, we explore several directions for future research on monocular 3D object detection.


2021 ◽  
Vol 11 (4) ◽  
pp. 1499
Author(s):  
Bingchen Han ◽  
Junyu Xu ◽  
Pengfei Chen ◽  
Rongrong Guo ◽  
Yuanqi Gu ◽  
...  

An all-optical non-inverted parity generator and checker based on semiconductor optical amplifiers (SOAs) are proposed, exploiting the four-wave mixing (FWM) and cross-gain modulation (XGM) non-linear effects. A 2-bit parity generator and checker using exclusive NOR (XNOR) and exclusive OR (XOR) gates are implemented by the first and second SOA, respectively, with a 10 Gb/s return-to-zero (RZ) code. The parity and check bits are obtained by adjusting the center wavelength of the tunable optical bandpass filter (TOBPF). A saturable absorber (SA) is used to reduce the negative effect of the small-signal clock (Clk) probe light and thereby improve the extinction ratio (ER) and optical signal-to-noise ratio (OSNR). For Pe and Ce (the even parity and even check bits), which require no Clk probe light, the ER and OSNR maintain good performance because of the amplifying effect of the SOA. For Po (the odd parity bit), the ER and OSNR are improved to within 1 dB of the original value. For Co (the odd check bit), without the SA the ER deteriorates by 4 dB and the OSNR by 12 dB; with the SA, both are improved to within about 2 dB of the original value. The design has the advantages of a simple structure, good integration capability, and low cost.
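The logical behavior that the two SOAs realize optically is ordinary XOR/XNOR parity; a truth-table sketch in Python:

```python
def parity_bits(d1, d2):
    """2-bit parity generation: odd parity via XOR, even parity via XNOR
    (the logic the first and second SOA implement optically)."""
    po = d1 ^ d2       # odd parity bit (XOR)
    pe = 1 - po        # even parity bit (XNOR = NOT XOR)
    return po, pe

def check(d1, d2, parity_bit, even=True):
    """Parity checker: recompute and compare against the received bit."""
    po, pe = parity_bits(d1, d2)
    return parity_bit == (pe if even else po)

# Truth table for all 2-bit inputs.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, parity_bits(a, b))
```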


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2144
Author(s):  
Stefan Reitmann ◽  
Lorenzo Neumann ◽  
Bernhard Jung

Common machine-learning (ML) approaches for scene classification require a large amount of training data. However, for the classification of depth sensor data, in contrast to image data, relatively few databases are publicly available, and the manual generation of semantically labeled 3D point clouds is an even more time-consuming task. To simplify the training-data generation process for a wide range of domains, we have developed the BLAINDER add-on package for the open-source 3D modeling software Blender, which enables a largely automated generation of semantically annotated point-cloud data in virtual 3D environments. In this paper, we focus on the classical depth-sensing techniques of Light Detection and Ranging (LiDAR) and Sound Navigation and Ranging (Sonar). Within the BLAINDER add-on, different depth sensors can be loaded from presets, customized sensors can be implemented, and different environmental conditions (e.g., the influence of rain or dust) can be simulated. The semantically labeled data can be exported to various 2D and 3D formats and are thus optimized for different ML applications and visualizations. In addition, semantically labeled images can be exported using the rendering functionalities of Blender.
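A generic numpy sketch of the kind of labeled point-cloud generation such a tool automates: rays cast from a simulated LiDAR hit a ground plane, with random dropout standing in for environmental effects. This is an illustrative toy, not the BLAINDER/Blender API:

```python
import numpy as np

def simulate_lidar_rays(n_az=360, n_el=16, max_range=50.0,
                        dropout=0.05, noise_sigma=0.02, seed=0):
    """Toy labeled point-cloud generation: rays from the origin hit a
    ground plane at z = -1.5 m; a fraction drop out (e.g., rain/dust)."""
    rng = np.random.default_rng(seed)
    # Scan pattern: full azimuth sweep, 16 downward-looking elevation rings.
    az = np.repeat(np.linspace(0, 2 * np.pi, n_az, endpoint=False), n_el)
    el = np.tile(np.radians(np.linspace(-15, -1, n_el)), n_az)
    dirs = np.stack([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)], axis=1)

    # Ray/plane intersection: origin + t * dir hits the plane z = -1.5.
    t = -1.5 / dirs[:, 2]
    hit = (t > 0) & (t < max_range) & (rng.random(len(t)) > dropout)
    pts = dirs[hit] * t[hit, None] + rng.normal(0, noise_sigma, (hit.sum(), 3))
    labels = np.full(hit.sum(), "ground")  # semantic label per point
    return pts, labels
```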

