Optical flow estimation using the Fisher–Rao metric

2021 ◽  
Vol 1 (2) ◽  
pp. 024004
Author(s):  
Stephen J Maybank ◽  
Sio-Hoi Ieng ◽  
Davide Migliore ◽  
Ryad Benosman

Abstract The optical flow in an event camera is estimated using measurements in the address event representation (AER). Each measurement consists of a pixel address and the time at which a change in the pixel value equalled a given fixed threshold. The measurements in a small region of the pixel array and within a given window in time are approximated by a probability distribution defined on a finite set. The distributions obtained in this way form a three-dimensional family parameterized by the pixel addresses and by time. Each parameter value has an associated Fisher–Rao matrix obtained from the Fisher–Rao metric for the parameterized family of distributions. The optical flow vector at a given pixel and at a given time is obtained from the eigenvector of the associated Fisher–Rao matrix with the least eigenvalue. The Fisher–Rao algorithm for estimating optical flow is tested on eight datasets, of which six have ground truth optical flow. It is shown that the Fisher–Rao algorithm performs well in comparison with two state-of-the-art algorithms for estimating optical flow from AER measurements.
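The final step of the algorithm described above, extracting a flow vector from the eigenvector of the Fisher–Rao matrix with the least eigenvalue, can be sketched as follows. This is a minimal illustration: the 3×3 matrix here is a toy stand-in, not one derived from real AER measurements, and the normalisation convention is an assumption.

```python
import numpy as np

def flow_from_fisher_rao(G):
    """Return a candidate optical-flow vector from a symmetric 3x3
    Fisher-Rao matrix G: take the eigenvector of the least eigenvalue
    and normalise its spatial components by its temporal component."""
    w, v = np.linalg.eigh(G)    # eigh: eigenvalues in ascending order
    e = v[:, 0]                 # eigenvector of the least eigenvalue
    if abs(e[2]) < 1e-12:
        return None             # degenerate: no finite flow estimate
    return e[:2] / e[2]         # (vx, vy) in pixels per unit time

# Toy symmetric matrix whose least-eigenvalue eigenvector is known:
G = np.array([[ 2.0, 0.0, -1.0],
              [ 0.0, 3.0,  0.0],
              [-1.0, 0.0,  2.0]])
flow = flow_from_fisher_rao(G)   # -> [1.0, 0.0]
```

The sign ambiguity of the eigenvector cancels in the ratio, so the recovered flow direction is well defined whenever the temporal component is nonzero.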

Robotica ◽  
2019 ◽  
Vol 38 (2) ◽  
pp. 350-373 ◽  
Author(s):  
Hongling Wang ◽  
Chengjin Zhang ◽  
Yong Song ◽  
Bao Pang ◽  
Guangyuan Zhang

Summary Conventional simultaneous localization and mapping (SLAM) has concentrated on two-dimensional (2D) map building. To adapt it to urgent search and rescue (SAR) environments, it is necessary to combine fast and simple global 2D SLAM with local sub-maps of three-dimensional (3D) objects of interest (OOIs). The main novelty of the present work is a method for 3D OOI reconstruction based on a 2D map, thereby retaining the fast performance of the latter. A theory is established that is adapted to a SAR environment, including object identification, exploration area coverage (AC), and loop closure detection of revisited spots. Proposed for the first time is image optical flow calculation with a 2D/3D fusion method and RGB-D (red, green, blue + depth) transformation based on Joblove–Greenberg mathematics and OpenCV processing. The mathematical theories of optical flow calculation and wavelet transformation are used for the first time to solve the robotic SAR SLAM problem. The present contributions concern two aspects: (i) mobile robots depend on planar distance estimation to build 2D maps quickly and to provide SAR exploration AC; (ii) 3D OOIs are reconstructed using the proposed innovative methods of RGB-D iterative closest points (RGB-ICPs) and a 2D/3D principle of wavelet transformation. Different mobile robots are used to conduct indoor and outdoor SAR SLAM. Both the SLAM and the SAR OOI detection are implemented in simulations and ground-truth experiments, which provide strong evidence for the proposed 2D/3D reconstruction SAR SLAM approaches adapted to post-disaster environments.


Author(s):  
Seok Lee ◽  
Juyong Park ◽  
Dongkyung Nam

In this article, the authors present an image processing method to reduce three-dimensional (3D) crosstalk for eye-tracking-based 3D displays. Specifically, they considered 3D pixel crosstalk and offset crosstalk and applied different approaches based on their characteristics. For 3D pixel crosstalk, which depends on the viewer’s relative location, they proposed an output pixel value weighting scheme based on the viewer’s eye position; for offset crosstalk, they subtracted the luminance of crosstalk components according to the display crosstalk level measured in advance. Through simulations and experiments using 3D display prototypes, the authors evaluated the effectiveness of the proposed method.
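The two compensation steps described in this abstract can be sketched as follows. This is an illustrative reading under stated assumptions: the linear blending rule, the scalar offset level, and the 8-bit clamping range are guesses, not the authors' actual formulation.

```python
import numpy as np

def subtract_offset_crosstalk(image, offset_level):
    """Remove a position-independent crosstalk luminance measured on the
    display in advance, clamping to the valid 8-bit range. The scalar
    offset_level is a hypothetical stand-in for the measured value."""
    return np.clip(image.astype(np.float64) - offset_level, 0.0, 255.0)

def weight_view_pixels(left, right, w):
    """Blend left-eye/right-eye pixel values with a weight w in [0, 1]
    derived from the tracked eye position; the linear rule is illustrative."""
    return w * left + (1.0 - w) * right

row = np.array([10.0, 30.0, 250.0])
compensated = subtract_offset_crosstalk(row, 20.0)   # -> [0., 10., 230.]
blended = weight_view_pixels(100.0, 200.0, 0.25)     # -> 175.0
```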


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1137
Author(s):  
Ondřej Holešovský ◽  
Radoslav Škoviera ◽  
Václav Hlaváč ◽  
Roman Vítek

We compare event-cameras with fast (global shutter) frame-cameras experimentally, asking: “What is the application domain in which an event-camera surpasses a fast frame-camera?” Surprisingly, finding the answer has been difficult. Our methodology was to test event- and frame-cameras on generic computer vision tasks where event-camera advantages should manifest. We used two methods: (1) a controlled, cheap, and easily reproducible experiment (observing a marker on a rotating disk at varying speeds); (2) a challenging practical ballistic experiment (observing a flying bullet, with ground truth provided by an expensive ultra-high-speed frame-camera). The experimental results include sampling/detection rates and position estimation errors as functions of illuminance and motion speed, as well as the minimum pixel latency of two commercial state-of-the-art event-cameras (ATIS, DVS240). Event-cameras respond more slowly to positive than to negative large and sudden contrast changes. They outperformed a frame-camera in bandwidth efficiency in all our experiments. Both camera types provide comparable position estimation accuracy. The better event-camera was limited by pixel latency when tracking small objects, resulting in motion blur effects. Sensor bandwidth limited the event-camera in object recognition. However, future generations of event-cameras might alleviate bandwidth limitations.


2021 ◽  
Vol 8 (1) ◽  
pp. 205395172110135
Author(s):  
Florian Jaton

This theoretical paper considers the morality of machine learning algorithms and systems in the light of the biases that ground their correctness. It begins by presenting biases not as a priori negative entities but as contingent external referents—often gathered in benchmarked repositories called ground-truth datasets—that define what needs to be learned and allow for performance measures. I then argue that ground-truth datasets and their concomitant practices—that fundamentally involve establishing biases to enable learning procedures—can be described by their respective morality, here defined as the more or less accounted experience of hesitation when faced with what pragmatist philosopher William James called “genuine options”—that is, choices to be made in the heat of the moment that engage different possible futures. I then stress three constitutive dimensions of this pragmatist morality, as far as ground-truthing practices are concerned: (I) the definition of the problem to be solved (problematization), (II) the identification of the data to be collected and set up (databasing), and (III) the qualification of the targets to be learned (labeling). I finally suggest that this three-dimensional conceptual space can be used to map machine learning algorithmic projects in terms of the morality of their respective and constitutive ground-truthing practices. Such techno-moral graphs may, in turn, serve as equipment for greater governance of machine learning algorithms and systems.


2022 ◽  
Vol 41 (1) ◽  
pp. 1-17
Author(s):  
Xin Chen ◽  
Anqi Pang ◽  
Wei Yang ◽  
Peihao Wang ◽  
Lan Xu ◽  
...  

In this article, we present TightCap, a data-driven scheme to accurately capture both the human shape and dressed garments with only a single three-dimensional (3D) human scan, enabling numerous applications such as virtual try-on, biometrics, and body evaluation. To handle the severe variations of human poses and garments, we propose to model the clothing tightness field, i.e., the displacements from the garments to the underlying human shape, implicitly in the global UV texturing domain. To this end, we utilize an enhanced statistical human template and an effective multi-stage alignment scheme to map the 3D scan into a hybrid 2D geometry image. Based on this 2D representation, we propose a novel framework to predict the clothing tightness field via a novel tightness formulation, as well as an effective optimization scheme to further reconstruct multi-layer human shape and garments under various clothing categories and human postures. We further propose a new clothing tightness dataset of human scans with a large variety of clothing styles, poses, and corresponding ground-truth human shapes to stimulate further research. Extensive experiments demonstrate the effectiveness of TightCap for high-quality reconstruction of human shape and dressed garments, as well as further applications in clothing segmentation, retargeting, and animation.


10.2196/21105 ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. e21105
Author(s):  
Arpita Mallikarjuna Kappattanavar ◽  
Nico Steckhan ◽  
Jan Philipp Sachs ◽  
Harry Freitas da Cruz ◽  
Erwin Böttinger ◽  
...  

Background A majority of employees in the industrial world spend most of their working time in a seated position. Monitoring sitting postures can provide insights into the underlying causes of occupational discomforts such as low back pain. Objective This study focuses on the technologies and algorithms used to classify sitting postures on a chair with respect to spine and limb movements. Methods A total of three electronic literature databases were surveyed to identify studies classifying sitting postures in adults. Quality appraisal was performed to extract critical details and assess biases in the shortlisted papers. Results A total of 14 papers were shortlisted from 952 papers obtained after a systematic search. The majority of the studies used pressure sensors to measure sitting postures, whereas neural networks were the most frequently used approaches for classification tasks in this context. Only 2 studies were performed in a free-living environment. Most studies presented ethical and methodological shortcomings. Moreover, the findings indicate that the strategic placement of sensors can lead to better performance and lower costs. Conclusions The included studies differed in various aspects of design and analysis. The majority of studies were rated as medium quality according to our assessment. Our study suggests that future work on posture classification can benefit from using inertial measurement unit sensors, since they make it possible to differentiate among spine movements and similar postures; from considering transitional movements between postures; and from using three-dimensional cameras to annotate the data for ground truth. Finally, comparing such studies is challenging, as there are no standard definitions of sitting postures that could be used for classification. In addition, this study identifies five basic sitting postures along with different combinations of limb and spine movements to help guide future research efforts.


2020 ◽  
Author(s):  
Jiji Chen ◽  
Hideki Sasaki ◽  
Hoyin Lai ◽  
Yijun Su ◽  
Jiamin Liu ◽  
...  

Abstract We demonstrate residual channel attention networks (RCAN) for restoring and enhancing volumetric time-lapse (4D) fluorescence microscopy data. First, we modify RCAN to handle image volumes, showing that our network enables denoising competitive with three other state-of-the-art neural networks. We use RCAN to restore noisy 4D super-resolution data, enabling image capture over tens of thousands of images (thousands of volumes) without apparent photobleaching. Second, using simulations we show that RCAN enables class-leading resolution enhancement, superior to other networks. Third, we exploit RCAN for denoising and resolution improvement in confocal microscopy, enabling ~2.5-fold lateral resolution enhancement using stimulated emission depletion (STED) microscopy ground truth. Fourth, we develop methods to improve spatial resolution in structured illumination microscopy using expansion microscopy ground truth, achieving improvements of ~1.4-fold laterally and ~3.4-fold axially. Finally, we characterize the limits of denoising and resolution enhancement, suggesting practical benchmarks for evaluating and further enhancing network performance.
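The channel-attention mechanism at the core of RCAN can be sketched schematically as follows. This is a simplified numpy illustration, not the authors' network: the convolutional trunk is omitted, the squeeze/excite matrices are arbitrary stand-ins for trained weights, and the block operates on a single (C, H, W) feature volume.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def residual_channel_attention(x, w_down, w_up):
    """Schematic RCAN-style block: global-average-pool each channel,
    squeeze/excite through two small matrices, gate the features with
    a sigmoid, and add the skip connection back."""
    s = x.mean(axis=(1, 2))            # (C,) channel descriptors
    z = np.maximum(w_down @ s, 0.0)    # channel reduction + ReLU
    a = sigmoid(w_up @ z)              # per-channel attention in (0, 1)
    return x + x * a[:, None, None]    # gated features plus skip

x = np.ones((2, 4, 4))
out = residual_channel_attention(x, np.eye(2), np.zeros((2, 2)))
# with zero up-weights the gate is sigmoid(0) = 0.5, so out == 1.5 * x
```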


Author(s):  
V. V. Kniaz ◽  
V. A. Mizginov ◽  
L. V. Grodzitkiy ◽  
N. A. Fomin ◽  
V. A. Knyaz

Abstract. Structured light scanners are intensively exploited in various applications such as non-destructive quality control at an assembly line, optical metrology, and cultural heritage documentation. While more than 20 companies develop commercially available structured light scanners, the accuracy of structured light technology remains limited for fast systems. Surface discrepancies in the model are often present when the texture of the object has severe changes in brightness or in its reflective properties. The primary source of such discrepancies is errors in stereo matching caused by complex surface texture. These errors result in ridge-like structures on the surface of the reconstructed 3D model. This paper is focused on the development of a deep neural network, LineMatchGAN, for error reduction in 3D models produced by a structured light scanner. We use the pix2pix model as a starting point for our research. The aim of our LineMatchGAN is the refinement of the rough optical flow A and the generation of an error-free optical flow B̂. We collected a dataset (which we term ZebraScan) consisting of 500 samples to train our LineMatchGAN model. Each sample includes image sequences (Sl, Sr), a ground-truth optical flow B and a ground-truth 3D model. We evaluate our LineMatchGAN on a test split of our ZebraScan dataset that includes 50 samples. The evaluation proves that our LineMatchGAN improves the stereo matching accuracy (optical flow end point error, EPE) from 0.05 pixels to 0.01 pixels.
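The end point error (EPE) metric used in this evaluation has a standard definition: the Euclidean distance between the estimated and ground-truth flow vectors, averaged over the field. A minimal sketch (the toy flow fields are illustrative, not ZebraScan data):

```python
import numpy as np

def endpoint_error(flow_est, flow_gt):
    """Mean end point error (EPE): average Euclidean distance between
    estimated and ground-truth flow vectors over an (H, W, 2) field."""
    return float(np.mean(np.linalg.norm(flow_est - flow_gt, axis=-1)))

gt = np.zeros((4, 4, 2))
est = gt + np.array([0.6, 0.8])   # every vector off by a (0.6, 0.8) shift
epe = endpoint_error(est, gt)     # -> 1.0, since |(0.6, 0.8)| = 1.0
```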


2017 ◽  
Vol 10 (3) ◽  
pp. 285-289 ◽  
Author(s):  
Katrina L Ruedinger ◽  
David R Rutkowski ◽  
Sebastian Schafer ◽  
Alejandro Roldán-Alzate ◽  
Erick L Oberstar ◽  
...  

Background and purpose Safe and effective use of newly developed devices for aneurysm treatment requires the ability to make accurate measurements in the angiographic suite. Our purpose was to determine the parameters that optimize the geometric accuracy of three-dimensional (3D) vascular reconstructions. Methods An in vitro flow model consisting of a peristaltic pump, plastic tubing, and 3D-printed patient-specific aneurysm models was used to simulate blood flow in an intracranial aneurysm. Flow rates were adjusted to match values reported in the literature for the internal carotid artery. 3D digital subtraction angiography acquisitions were obtained using a commercially available biplane angiographic system. Reconstructions were done using Edge Enhancement (EE) or Hounsfield Unit (HU) kernels and a Normal or Smooth image characteristic. Reconstructed images were analyzed using the vendor's aneurysm analysis tool. Ground truth measurements were derived from metrological scans of the models with a microCT. Aneurysm volume, surface area, dome height, and minimum and maximum ostium diameter were determined for the five models. Results In all cases, measurements made with the EE kernel most closely matched ground truth values. Differences in values derived from reconstructions displayed with Smooth or Normal image characteristics were small and had only little impact on the geometric parameters considered. Conclusions Reconstruction parameters impact the accuracy of measurements made using the aneurysm analysis tool of a commercially available angiographic system. Absolute differences between measurements made using reconstruction parameters determined as optimal in this study were, overall, very small. The significance of these differences, if any, will depend on the details of each individual case.
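Comparing a reconstruction-derived measurement against its microCT ground truth, as done for each kernel and image characteristic above, reduces to a simple relative-error computation. A minimal sketch; the dome-height readings below are hypothetical values, not figures from the study:

```python
def percent_difference(measured, ground_truth):
    """Signed percent difference of a reconstruction-derived measurement
    (e.g. aneurysm dome height in mm) relative to microCT ground truth."""
    return 100.0 * (measured - ground_truth) / ground_truth

# Hypothetical dome-height readings (mm) against a 7.0 mm ground truth:
ee_err = percent_difference(7.1, 7.0)   # EE-kernel reconstruction
hu_err = percent_difference(7.6, 7.0)   # HU-kernel reconstruction
```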

