Compact single-shot metalens depth sensors inspired by eyes of jumping spiders

2019 ◽  
Vol 116 (46) ◽  
pp. 22959-22965 ◽  
Author(s):  
Qi Guo ◽  
Zhujun Shi ◽  
Yao-Wei Huang ◽  
Emma Alexander ◽  
Cheng-Wei Qiu ◽  
...  

Jumping spiders (Salticidae) rely on accurate depth perception for predation and navigation. They accomplish depth perception, despite their tiny brains, by using specialized optics. Each principal eye includes a multitiered retina that simultaneously receives multiple images with different amounts of defocus, and from these images, distance is decoded with relatively little computation. We introduce a compact depth sensor that is inspired by the jumping spider. It combines metalens optics, which modifies the phase of incident light at a subwavelength scale, with efficient computations to measure depth from image defocus. Instead of using a multitiered retina to transduce multiple simultaneous images, the sensor uses a metalens to split the light that passes through an aperture and concurrently form 2 differently defocused images at distinct regions of a single planar photosensor. We demonstrate a system that deploys a 3-mm-diameter metalens to measure depth over a 10-cm distance range, using fewer than 700 floating point operations per output pixel. Compared with previous passive depth sensors, our metalens depth sensor is compact, single-shot, and requires little computation. This integration of nanophotonics and efficient computation brings artificial depth sensing closer to being feasible on millimeter-scale, microwatt platforms such as microrobots and microsensor networks.
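The core computation, depth from differential defocus, needs only a handful of arithmetic operations per pixel. The following is a minimal sketch of that general idea, not the authors' exact reconstruction pipeline; the calibration constant `alpha` and the confidence threshold `eps` are illustrative assumptions.

```python
# Minimal sketch of depth-from-differential-defocus on two co-registered
# images with slightly different defocus. `alpha` is a per-device calibration
# constant and `eps` a confidence threshold; both are illustrative assumptions.
import numpy as np
from scipy.ndimage import laplace

def depth_from_defocus_pair(i_plus, i_minus, alpha=1.0, eps=1e-3):
    """Estimate per-pixel depth from two differently defocused images."""
    mean_img = 0.5 * (i_plus + i_minus)   # average image
    diff_img = i_plus - i_minus           # defocus difference signal
    lap = laplace(mean_img)               # Laplacian of the mean image (local contrast)
    confident = np.abs(lap) > eps         # only trust textured regions
    depth = np.full(i_plus.shape, np.nan)
    # Depth is proportional to the ratio of the difference image to the
    # Laplacian of the mean image, scaled by the calibration constant.
    depth[confident] = alpha * diff_img[confident] / lap[confident]
    return depth
```

Each output pixel costs only a few additions, multiplications, and one division, which is consistent with the sub-700-FLOP budget quoted above.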

Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2144
Author(s):  
Stefan Reitmann ◽  
Lorenzo Neumann ◽  
Bernhard Jung

Common Machine-Learning (ML) approaches for scene classification require a large amount of training data. However, for classification of depth sensor data, in contrast to image data, relatively few databases are publicly available, and manual generation of semantically labeled 3D point clouds is an even more time-consuming task. To simplify the training data generation process for a wide range of domains, we have developed the BLAINDER add-on package for the open-source 3D modeling software Blender, which enables a largely automated generation of semantically annotated point-cloud data in virtual 3D environments. In this paper, we focus on the classical depth-sensing techniques Light Detection and Ranging (LiDAR) and Sound Navigation and Ranging (Sonar). Within the BLAINDER add-on, different depth sensors can be loaded from presets, customized sensors can be implemented, and different environmental conditions (e.g., the influence of rain or dust) can be simulated. The semantically labeled data can be exported to various 2D and 3D formats and are thus optimized for different ML applications and visualizations. In addition, semantically labeled images can be exported using the rendering functionalities of Blender.
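For illustration, a virtual LiDAR scan inside Blender amounts to casting rays from a sensor origin over a grid of angles and recording each hit together with a semantic label. The sketch below is not the BLAINDER interface; it calls Blender's `bpy` API directly, assumes Blender ≥ 2.91 for the `scene.ray_cast` signature, and simply takes the hit object's name as the label.

```python
# Illustrative virtual LiDAR scan in Blender (not the BLAINDER API): cast rays
# over an azimuth/elevation grid and record hit points plus a semantic label.
import math
import bpy
from mathutils import Vector

def simulate_lidar(origin, h_fov=360.0, v_fov=30.0, h_steps=720, v_steps=32,
                   max_range=100.0):
    depsgraph = bpy.context.evaluated_depsgraph_get()
    scene = bpy.context.scene
    points = []  # (x, y, z, label) tuples
    for i in range(h_steps):
        az = math.radians(i * h_fov / h_steps)
        for j in range(v_steps):
            el = math.radians(-v_fov / 2 + j * v_fov / max(v_steps - 1, 1))
            direction = Vector((math.cos(el) * math.cos(az),
                                math.cos(el) * math.sin(az),
                                math.sin(el)))
            hit, loc, _n, _idx, obj, _mat = scene.ray_cast(
                depsgraph, Vector(origin), direction, distance=max_range)
            if hit:
                points.append((loc.x, loc.y, loc.z, obj.name))  # name used as label
    return points
```

Range noise or weather-dependent dropouts (rain, dust) could then be added by perturbing or discarding the returned ranges before export.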


2002 ◽  
Vol 11 (2) ◽  
pp. 176-188 ◽  
Author(s):  
Yuichi Ohta ◽  
Yasuyuki Sugaya ◽  
Hiroki Igarashi ◽  
Toshikazu Ohtsuki ◽  
Kaito Taguchi

In mixed reality, occlusions and shadows are important to realize a natural fusion between the real and virtual worlds. In order to achieve this, it is necessary to acquire dense depth information of the real world from the observer's viewing position. The depth sensor must be attached to the see-through HMD of the observer because he/she moves around. The sensor should be small and light enough to be attached to the HMD and should be able to produce a reliable dense depth map at video rate. Unfortunately, however, no such depth sensors are available. We propose a client/server depth-sensing scheme to solve this problem. A server sensor located at a fixed position in the real world acquires the 3-D information of the world, and a client sensor attached to each observer produces the depth map from his/her viewing position using the 3-D information supplied from the server. Multiple clients can share the 3-D information of the server; we call this scheme Share-Z. In this paper, the concept and merits of Share-Z are discussed. An experimental system developed to demonstrate the feasibility of Share-Z is also described.
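The client-side step can be pictured as reprojecting the server's 3-D points into the observer's camera and keeping the nearest depth per pixel (z-buffering). Below is a minimal numpy sketch under assumed pinhole intrinsics (fx, fy, cx, cy) and a known client pose (R, t); it illustrates the general reprojection idea, not the Share-Z implementation itself.

```python
import numpy as np

def render_depth_map(points_world, R, t, fx, fy, cx, cy, width, height):
    """Project server-supplied 3D points (N x 3, world frame) into the client
    camera (rotation R, translation t) and keep the nearest depth per pixel."""
    pts_cam = points_world @ R.T + t          # world -> camera coordinates
    z = pts_cam[:, 2]
    valid = z > 0                             # keep points in front of the camera
    u = np.round(fx * pts_cam[valid, 0] / z[valid] + cx).astype(int)
    v = np.round(fy * pts_cam[valid, 1] / z[valid] + cy).astype(int)
    zv = z[valid]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # Z-buffer: keep the closest point that lands on each pixel.
    np.minimum.at(depth, (v[inside], u[inside]), zv[inside])
    depth[np.isinf(depth)] = 0.0              # 0 marks pixels with no data
    return depth
```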


Micromachines ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1453
Author(s):  
Hyun Myung Kim ◽  
Min Seok Kim ◽  
Sehui Chang ◽  
Jiseong Jeong ◽  
Hae-Gon Jeon ◽  
...  

The light field camera provides a robust way to capture both spatial and angular information within a single shot. One of its important applications is 3D depth sensing, which can extract depth information from the acquired scene. However, conventional light field cameras suffer from a shallow depth of field (DoF). Here, a vari-focal light field camera (VF-LFC) with an extended DoF is proposed for mid-range 3D depth sensing applications. As the main lens of the system, a vari-focal lens with four different focal lengths is adopted to extend the DoF up to ~15 m. The focal length of the micro-lens array (MLA) is optimized by considering the DoF both in the image plane and in the object plane for each focal length. By dividing the measurement range among the focal lengths, highly reliable depth estimation is available within the entire DoF. The proposed VF-LFC is evaluated using disparity data extracted from images at different distances. Moreover, depth measurement in an outdoor environment demonstrates that our VF-LFC could be applied in various fields such as delivery robots, autonomous vehicles, and remote sensing drones.
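To see how several focus settings can tile an extended measurement range, the standard thin-lens depth-of-field formulas are sufficient. The sketch below uses example focal lengths, f-number, and circle of confusion, not the paper's actual design parameters.

```python
# Illustrative depth-of-field calculation using the standard thin-lens formulas
# (hyperfocal distance, near/far limits). All numeric values are example
# assumptions, not the VF-LFC design parameters.
def dof_limits(focal_mm, f_number, focus_dist_mm, coc_mm=0.005):
    h = focal_mm ** 2 / (f_number * coc_mm) + focal_mm       # hyperfocal distance
    near = focus_dist_mm * (h - focal_mm) / (h + focus_dist_mm - 2 * focal_mm)
    far = (focus_dist_mm * (h - focal_mm) / (h - focus_dist_mm)
           if focus_dist_mm < h else float("inf"))
    return near, far

# Four hypothetical focus settings covering an extended range:
for f, s in [(8, 1500), (12, 3000), (16, 6000), (25, 12000)]:
    n, fr = dof_limits(f, 2.8, s)
    print(f"f={f} mm, focus at {s/1000:.1f} m -> DoF {n/1000:.2f}-{fr/1000:.2f} m")
```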


Sensors ◽  
2019 ◽  
Vol 19 (2) ◽  
pp. 393 ◽  
Author(s):  
Jonha Lee ◽  
Dong-Wook Kim ◽  
Chee Won ◽  
Seung-Won Jung

Segmentation of human bodies in images is useful for a variety of applications, including background substitution, human activity recognition, security, and video surveillance. However, human body segmentation has been a challenging problem due to the complicated shape and motion of a non-rigid human body. Meanwhile, depth sensors with advanced pattern recognition algorithms provide human body skeletons in real time with reasonable accuracy. In this study, we propose an algorithm that projects the human body skeleton from a depth image to a color image, where the human body region is segmented in the color image by using the projected skeleton as a segmentation cue. Experimental results using the Kinect sensor demonstrate that the proposed method provides high-quality segmentation results and outperforms conventional methods.
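One common way to turn a projected skeleton into a segmentation cue is to seed GrabCut with the joint locations. The sketch below assumes the joints have already been mapped into color-image pixel coordinates; it illustrates the general idea rather than the authors' algorithm, and the seed radius is an assumption.

```python
import cv2
import numpy as np

def segment_body(color_img, skeleton_px, radius=15, iters=5):
    """Segment a person in `color_img` using projected 2D skeleton joints
    (list of (x, y) pixels) as foreground seeds for GrabCut."""
    mask = np.full(color_img.shape[:2], cv2.GC_PR_BGD, np.uint8)   # probable background
    for (x, y) in skeleton_px:
        cv2.circle(mask, (int(x), int(y)), radius, cv2.GC_FGD, -1) # sure foreground seeds
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(color_img, mask, None, bgd_model, fgd_model, iters,
                cv2.GC_INIT_WITH_MASK)
    # Pixels labelled as definite or probable foreground form the body mask.
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
```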


Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4581 ◽  
Author(s):  
Komagata ◽  
Kakinuma ◽  
Ishikawa ◽  
Shinoda ◽  
Kobayashi

With the aging of society, the number of fall accidents has increased in hospitals and care facilities, and some of these accidents happen around beds. To help prevent accidents, mat sensors and clip sensors have been used in these facilities, but they can be invasive, and their purpose may be misinterpreted. In recent years, research has been conducted using an infrared-image depth sensor as a bed-monitoring system for detecting a patient getting up, exiting the bed, and/or falling; however, some manual calibration was initially required to set up the sensor in each instance. We propose a bed-monitoring system that retains the infrared-image depth sensors but uses semi-automatic rather than manual calibration in each situation where it is applied. Our automated methods robustly calculate the bed region, the surrounding floor, and the sensor location and attitude, and can recognize the spatial position of the patient even when the sensor attachment is unconstrained. We also propose a means to reconfigure the spatial position that accounts for occlusion by parts of the bed and for the center of gravity of the patient's body. Experimental results of multi-view calibration and motion simulation showed that our methods were effective for recognizing the spatial position of the patient.
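The floor-estimation part of such a calibration is typically a robust plane fit to the depth point cloud. A generic RANSAC sketch (not the authors' exact procedure) is shown below; once the dominant floor plane is found, the sensor height and tilt follow from the plane normal.

```python
import numpy as np

def ransac_plane(points, n_iters=500, threshold=0.02, rng=None):
    """Fit a dominant plane (e.g., the floor) to an N x 3 point cloud.
    Returns (unit normal n, offset d) with n.p + d = 0 for plane points."""
    rng = rng or np.random.default_rng(0)
    best_inliers, best_plane = 0, None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue                          # degenerate sample, skip
        normal /= norm
        d = -normal @ p0
        dist = np.abs(points @ normal + d)    # point-to-plane distances
        inliers = np.count_nonzero(dist < threshold)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane
```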


2020 ◽  
Vol 6 (3) ◽  
pp. 11
Author(s):  
Naoyuki Awano

Depth sensors are important in several fields to recognize real space. However, there are cases where most depth values in a depth image captured by a sensor are missing because the depths of distal objects are not always captured. This often occurs when a low-cost depth sensor or structured-light depth sensor is used. It also occurs frequently in applications where depth sensors are used to replicate human vision, e.g., when the sensors are mounted on head-mounted displays (HMDs). One ideal inpainting (repair or restoration) approach for depth images with large missing areas, such as partial foreground depths, is to inpaint only the foreground; however, conventional inpainting studies have attempted to inpaint entire images. Thus, under the assumption of an HMD-mounted depth sensor, we propose a method to partially inpaint and reconstruct an RGB-D depth image so as to preserve foreground shapes. The proposed method comprises a smoothing process for noise reduction, filling of defects in the foreground area, and refinement of the filled depths. Experimental results demonstrate that the proposed method preserves object shapes in the foreground area and reconstructs the inpainted area accurately with respect to the real depth, as measured by the peak signal-to-noise ratio metric.
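As a rough illustration of foreground-restricted inpainting (not the proposed method itself), the sketch below denoises a depth image and then fills holes only inside a given foreground mask using OpenCV's inpainting; the 8-bit rescaling and the 4 m depth cap are assumptions made for the example.

```python
import cv2
import numpy as np

def inpaint_foreground_depth(depth_mm, fg_mask, max_depth_mm=4000):
    """Fill missing depths (value 0) only inside the foreground mask.
    depth_mm: uint16 depth in millimetres; fg_mask: uint8, 255 = foreground."""
    depth = cv2.medianBlur(depth_mm, 5)                        # suppress speckle noise
    holes = ((depth == 0) & (fg_mask > 0)).astype(np.uint8)    # foreground defects
    # cv2.inpaint only accepts 8-bit images, so inpaint a scaled copy.
    depth8 = np.clip(depth.astype(np.float32) / max_depth_mm * 255, 0, 255).astype(np.uint8)
    filled8 = cv2.inpaint(depth8, holes, 5, cv2.INPAINT_TELEA)
    filled = (filled8.astype(np.float32) / 255 * max_depth_mm).astype(depth_mm.dtype)
    out = depth.copy()
    out[holes > 0] = filled[holes > 0]                         # copy back only the holes
    return out
```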


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4488
Author(s):  
Otto Korkalo ◽  
Tapio Takala

Depth cameras are widely used in people tracking applications. They typically suffer from significant range measurement noise, which causes uncertainty in the detections of people. The data fusion, state estimation and data association tasks require that the measurement uncertainty be modelled, especially in multi-sensor systems. Measurement noise models for different kinds of depth sensors have been proposed; however, the existing approaches require manual calibration procedures, which can be impractical to conduct in real-life scenarios. In this paper, we present a new measurement noise model for depth camera-based people tracking. In our tracking solution, we utilise the so-called plan-view approach, where the 3D measurements are transformed to the floor plane and the tracking problem is solved in 2D. We directly model the measurement noise in the plan-view domain, combining the errors that originate from the imaging process and from the geometric transformations of the 3D data. We also present a method for defining the noise models directly from observations. Together with our depth sensor network self-calibration routine, the approach allows fast and practical deployment of depth-based people tracking systems.
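A plan-view measurement and an empirical estimate of its 2D uncertainty can be obtained directly from a person's 3D points. The sketch below illustrates this idea under an assumed known floor plane; it is not the paper's noise model.

```python
import numpy as np

def plan_view_measurement(points_world, floor_normal, floor_origin):
    """Project a person's 3D points onto the floor plane and return the 2D
    plan-view position together with an empirical 2x2 covariance."""
    n = floor_normal / np.linalg.norm(floor_normal)
    # Build two orthonormal axes spanning the floor plane.
    a = np.cross(n, [1.0, 0.0, 0.0])
    if np.linalg.norm(a) < 1e-6:
        a = np.cross(n, [0.0, 1.0, 0.0])
    a /= np.linalg.norm(a)
    b = np.cross(n, a)
    rel = points_world - floor_origin
    xy = np.stack([rel @ a, rel @ b], axis=1)   # plan-view coordinates
    mean = xy.mean(axis=0)                      # detection position on the floor
    cov = np.cov(xy.T)                          # empirical measurement covariance
    return mean, cov
```

Such a covariance is what a tracker's data-association gate and Kalman update would consume for each detection.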


2014 ◽  
Vol 644-650 ◽  
pp. 4162-4166
Author(s):  
Dan Dan Guo ◽  
Xi’an Zhu

An effective human action recognition method based on human skeletal information extracted by the Kinect depth sensor is proposed in this paper. Based on a study of the human skeletal structure, joint data, and human actions, the 3D space coordinates of skeleton joints and the angles between joints involved in the relevant actions are collected as action features. First, 3D information on the human skeleton is acquired by the Kinect depth sensor and the cosines of the angles at the relevant joints are calculated. Then, the skeletal information within a time window prior to the current state is stored in real time. Finally, the relative locations of the skeleton joints and the variation of the joint-angle cosines within this window are analyzed to recognize the human motion. Because the complicated sample training and recognition processes of traditional methods are not required, the algorithm has high adaptability and practicality. Experimental results indicate that the method achieves a high recognition rate.
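A sketch of the basic feature used here: the cosine of the angle at a joint, computed from three 3D joint positions and kept in a sliding window. The joint names and the 30-frame window length are assumptions for illustration.

```python
import numpy as np
from collections import deque

def joint_cosine(parent, joint, child):
    """Cosine of the angle at `joint` formed by the bones joint->parent and
    joint->child (each a 3D joint position from the depth sensor)."""
    u = np.asarray(parent) - np.asarray(joint)
    v = np.asarray(child) - np.asarray(joint)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

# Sliding window of recent joint-angle features, e.g. the right elbow angle
# (shoulder-elbow-wrist); hypothetical joint keys, 30-frame window assumed.
window = deque(maxlen=30)

def update(skeleton):
    window.append(joint_cosine(skeleton["shoulder_r"],
                               skeleton["elbow_r"],
                               skeleton["wrist_r"]))
```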


2019 ◽  
Author(s):  
David Herzig ◽  
Christos T Nakas ◽  
Janine Stalder ◽  
Christophe Kosinski ◽  
Céline Laesser ◽  
...  

BACKGROUND Quantification of dietary intake is key to the prevention and management of numerous metabolic disorders. Conventional approaches are challenging, laborious, and suffer from a lack of accuracy. The recent advent of depth-sensing smartphones in conjunction with computer vision has the potential to facilitate reliable quantification of food intake.
OBJECTIVE To evaluate the accuracy of a novel smartphone application combining depth-sensing hardware with computer vision to quantify meal macronutrient content.
METHODS The application ran on a smartphone with a built-in depth sensor applying structured light (iPhone X) and estimated the weight, macronutrient (carbohydrate, protein, fat) and energy content of 48 randomly chosen meals (types of meals: breakfast, cooked meals, snacks) encompassing 128 food items. Reference weight was generated by weighing individual food items using a precision scale. The study endpoints were fourfold: i) error of estimated meal weight; ii) error of estimated meal macronutrient content and energy content; iii) segmentation performance; and iv) processing time.
RESULTS Mean±SD absolute error of the application's estimate was 35.1±42.8 g (14.0±12.2%) for weight, 5.5±5.1 g (14.8±10.9%) for carbohydrate content, 2.4±5.6 g (13.0±13.8%) for protein content, 1.3±1.7 g (12.3±12.8%) for fat content, and 41.2±42.5 kcal (12.7±10.8%) for energy content. While estimation accuracy was not affected by the viewing angle, the type of meal mattered, with slightly worse performance for cooked meals compared to breakfasts and snacks. Segmentation required adjustment for 7 out of 128 items. Mean±SD processing time across all meals was 22.9±8.6 s.
CONCLUSIONS The present study evaluated the accuracy of a novel smartphone application with an integrated depth-sensing camera and found high accuracy in food estimation across all macronutrients. This was paralleled by a high segmentation performance and low processing time, corroborating the high usability of this system.
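For context, converting estimated macronutrient masses to energy commonly uses the standard Atwater factors (4 kcal/g for carbohydrate and protein, 9 kcal/g for fat); whether the app uses exactly these factors is an assumption. A tiny worked example of the conversion and of scoring an estimate against the weighed reference:

```python
# Illustrative arithmetic only: Atwater energy conversion and absolute
# percentage error against a reference value. Example numbers are hypothetical.
def energy_kcal(carb_g, protein_g, fat_g):
    return 4 * carb_g + 4 * protein_g + 9 * fat_g

def abs_percent_error(estimate, reference):
    return abs(estimate - reference) / reference * 100

est, ref = energy_kcal(45, 20, 15), 400          # hypothetical meal estimate vs. reference
print(f"{est} kcal vs {ref} kcal -> {abs_percent_error(est, ref):.1f}% error")
```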

