Spatiotemporal Correlation-Based Accurate 3D Face Imaging Using Speckle Projection and Real-Time Improvement

2021 ◽  
Vol 11 (18) ◽  
pp. 8588
Author(s):  
Wei Xiong ◽  
Hongyu Yang ◽  
Pei Zhou ◽  
Keren Fu ◽  
Jiangping Zhu

The reconstruction of 3D face data is widely used in the fields of biometric recognition and virtual reality. However, rapid acquisition of accurate 3D data remains difficult: contemporary reconstruction technology struggles with limited accuracy, slow speed and demanding capture conditions. To solve this problem, an accurate 3D face-imaging framework based on coarse-to-fine spatiotemporal correlation is designed, improving the spatiotemporal correlation stereo matching process and accelerating it with a spatiotemporal box filter. The reliability of the reconstruction parameters is further verified in order to resolve the tension between measurement accuracy and time cost. A binocular 3D data acquisition device with a rotary speckle projector continuously and synchronously acquires an infrared speckle stereo image sequence for reconstructing an accurate 3D face model. Using face mask data obtained with a high-precision industrial 3D scanner as ground truth, the relationship between the number of projected speckle patterns, the matching window size, the reconstruction accuracy and the time cost is quantitatively analysed, and an optimal combination of parameters is chosen to balance reconstruction speed and accuracy. To overcome the long acquisition time caused by switching the rotary speckle pattern, a compact 3D face acquisition device using a fixed three-speckle projector is then designed. Using the optimal parameter combination for the three speckles, a parallel pipeline strategy is adopted in each core processing unit to maximise system resource utilisation and data throughput, and the most time-consuming stage, spatiotemporal correlation stereo matching, is accelerated on the graphics processing unit. The results show that the system achieves real-time image acquisition and 3D face reconstruction while maintaining acceptable systematic precision.
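A minimal NumPy sketch of the spatiotemporal correlation matching idea described above, assuming rectified stereo sequences of shape (T, H, W) captured under T projected speckle patterns. The window size, disparity range and ZNCC-style score are illustrative assumptions, not the paper's tuned parameters; the spatial averaging step plays the role of the spatiotemporal box filter.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatiotemporal_correlation(left, right, max_disp=64, win=5):
    """Winner-take-all disparity from temporal ZNCC + spatial box filter."""
    T, H, W = left.shape
    best_score = np.full((H, W), -np.inf)
    disparity = np.zeros((H, W), dtype=np.int32)
    # Zero-mean the per-pixel temporal intensity vectors once.
    lz = left - left.mean(axis=0)
    rz = right - right.mean(axis=0)
    ln = np.sqrt((lz ** 2).sum(axis=0))
    for d in range(max_disp):
        # Align right pixel (x - d) with left pixel x (border wrap-around
        # from np.roll is ignored in this sketch).
        r_shift = np.roll(rz, d, axis=2)
        num = (lz * r_shift).sum(axis=0)                 # temporal correlation
        den = ln * np.sqrt((r_shift ** 2).sum(axis=0)) + 1e-8
        # Spatiotemporal box filter: average the normalized correlation
        # over the spatial matching window.
        score = uniform_filter(num / den, size=win)
        update = score > best_score
        best_score[update] = score[update]
        disparity[update] = d
    return disparity
```

Because the per-disparity work is independent, this loop maps directly onto a GPU, which is where the paper reports its largest speedup.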

Author(s):  
Bo Yu ◽  
Ian Lane ◽  
Fang Chen

There are multiple challenges in face detection, including varying illumination conditions and diverse user poses. Prior works tend to detect faces by pixel-level segmentation, which is generally not computationally efficient. When a person is sitting in a car, which can be treated as a single-face situation, most face detectors fail under varying poses and illumination. In this paper, we propose a simple but efficient approach for single-face detection. We train a deep learning model that reconstructs the face directly from the input image by removing the background and synthesizing 3D data for the face region only. We apply the proposed model to two public 3D face datasets and obtain significant improvements in false rejection rate (FRR) of 4.6% (from 4.6% to 0.0%) and 21.7% (from 30.2% to 8.5%), respectively, compared with state-of-the-art performance on the two datasets. Furthermore, we show that our reconstruction approach runs in half the time of a widely used real-time face detector. These results demonstrate that the proposed Reconstruction ConNet (RN) is both more accurate and more efficient for real-time face detection than prior works.
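A minimal PyTorch sketch of the reconstruction idea, assuming a network that maps the input image to a depth map trained to be zero everywhere except the face region; the encoder-decoder architecture and layer sizes here are illustrative placeholders, not the paper's RN.

```python
import torch
import torch.nn as nn

class FaceReconstructionNet(nn.Module):
    """Toy encoder-decoder: image in, face-only depth map out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        # Background pixels are trained toward zero depth, so detection
        # reduces to thresholding the reconstructed output.
        return self.decoder(self.encoder(x))

# Usage sketch: depth = model(image); face_mask = depth > eps
```

Framing detection as reconstruction avoids a separate per-pixel classification pass, which is consistent with the runtime advantage the abstract reports.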


2013 ◽  
Vol 21 (4) ◽  
Author(s):  
T. Hachaj ◽  
M. Ogiela

In this paper we investigate stereovision algorithms that are suitable for multimedia video devices. The main novel contribution of this article is a detailed analysis of a modern graphics processing unit (GPU)-based dense local stereovision matching algorithm for real-time multimedia applications. We considered two GPU-based implementations and one CPU implementation (as the baseline). The results (in frames per second, fps) were measured twenty times per algorithm configuration and then averaged (the standard deviation was below 5%). The disparity ranges were [0,20], [0,40], [0,60], [0,80], [0,100] and [0,120]. We also used three matching window sizes (3×3, 5×5 and 7×7) and three stereo-pair image resolutions: 320×240, 640×480 and 1024×768. We developed our algorithm under the assumption that it should process data at the same rate at which it arrives from the capture devices. Because the most popular off-the-shelf video cameras (multimedia video devices) capture data at 30 Hz, this frequency was used as the threshold for considering an implementation of our algorithm to be "real time". We have shown that our GPU algorithm, which uses only global memory, can be used successfully for such tasks. This is important because such an implementation is more hardware-independent than algorithms that operate on shared memory; knowing this, we can avoid algorithm failures when moving a multimedia application between machines with different hardware. To our knowledge, this type of research has not yet been reported.
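A minimal benchmarking harness in the spirit of the evaluation above, assuming a stereo routine `match(left, right, max_disp, win)` under test; the twenty-run averaging and the 30 Hz real-time threshold follow the text, everything else is an illustrative assumption.

```python
import time
import statistics

def benchmark(match, left, right, max_disp, win, runs=20):
    """Measure fps over repeated runs; flag the 30 Hz real-time threshold."""
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        match(left, right, max_disp=max_disp, win=win)
        times.append(time.perf_counter() - t0)
    fps = [1.0 / t for t in times]
    mean_fps = statistics.mean(fps)
    rel_std = statistics.stdev(fps) / mean_fps   # paper reports < 5%
    real_time = mean_fps >= 30.0                 # camera capture rate
    return mean_fps, rel_std, real_time
```

Sweeping this harness over the stated disparity ranges, window sizes and resolutions reproduces the shape of the paper's experiment grid.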


This paper presents an improvement in the processing speed of stereo matching. The time required for stereo matching is a problem for many real-time applications such as robot navigation, self-driving vehicles and object tracking. In this work, a real-time stereo matching system is proposed that exploits the parallelism of the Graphics Processing Unit (GPU). An area-based stereo matching system is used to generate the disparity map. Four different sequential and parallel computational models are used to analyze the time consumed by stereo matching: 1) sequential CPU, 2) parallel multi-core CPU, 3) parallel GPU and 4) parallel heterogeneous CPU/GPU. The dense disparity image is calculated, and the time is greatly reduced using the heterogeneous CPU/GPU model while maintaining the same accuracy as the other models. A static partitioning of the CPU and GPU workload is designed based on time analysis. Different cost functions are used to measure correspondence and to generate the disparity map, and a sliding window is used to calculate the cost functions efficiently. A speed of more than 100 frames per second (fps) is achieved using the parallel heterogeneous CPU/GPU model for 640×480 image resolution and a disparity range of 50.
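A minimal sketch of static CPU/GPU workload partitioning, assuming measured per-device throughputs (rows/s) and user-supplied `gpu_match`/`cpu_match` worker functions; the proportional split rule is an illustrative assumption, not the paper's exact design.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def static_split(height, cpu_rows_per_s, gpu_rows_per_s):
    # Proportional split: each device gets rows in proportion to its
    # measured throughput, so both finish at roughly the same time.
    gpu_share = gpu_rows_per_s / (cpu_rows_per_s + gpu_rows_per_s)
    return int(height * gpu_share)   # rows [0, split) -> GPU, rest -> CPU

def match_heterogeneous(left, right, split, gpu_match, cpu_match):
    # Run both halves concurrently; a real implementation would overlap
    # the halves by the matching-window radius to avoid a seam.
    with ThreadPoolExecutor(max_workers=2) as pool:
        gpu_job = pool.submit(gpu_match, left[:split], right[:split])
        cpu_job = pool.submit(cpu_match, left[split:], right[split:])
        return np.vstack([gpu_job.result(), cpu_job.result()])
```

Because the split is fixed ahead of time from the timing analysis, there is no scheduling overhead per frame, which matters at 100+ fps.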


Author(s):  
Muhammad Hanif Ahmad Nizar ◽  
Chow Khuen Chan ◽  
Azira Khalil ◽  
Ahmad Khairuddin Mohamed Yusof ◽  
Khin Wee Lai

Background: Valvular heart disease is a serious disease leading to mortality and increased medical care costs. The aortic valve is the valve most commonly affected by this disease. Doctors rely on echocardiography for diagnosing and evaluating valvular heart disease; however, echocardiographic images are poor in comparison to Computerized Tomography and Magnetic Resonance Imaging scans. This study proposes the development of Convolutional Neural Networks (CNN) that can function optimally during a live echocardiographic examination for detection of the aortic valve. An automated detection system in an echocardiogram will improve the accuracy of medical diagnosis and can provide further medical analysis from the resulting detections. Methods: Two detection architectures, the Single Shot Multibox Detector (SSD) and the Faster Region-based Convolutional Neural Network (Faster R-CNN), with various feature extractors, were trained on echocardiography images from 33 patients. Thereafter, the models were tested on 10 echocardiography videos. Results: Faster R-CNN Inception v2 showed the highest accuracy (98.6%), followed closely by SSD MobileNet v2. In terms of speed, SSD MobileNet v2 suffered a 46.81% loss in frames per second (fps) during real-time detection but still performed better than the other neural network models. Additionally, SSD MobileNet v2 used the least Graphics Processing Unit (GPU) resources, while Central Processing Unit (CPU) usage was relatively similar across all models. Conclusion: Our findings provide a foundation for implementing a convolutional detection system in echocardiography for medical purposes.
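A minimal sketch of running a trained SSD-style detector over an echo video and measuring throughput, assuming a frozen TensorFlow graph plus config exported from training and loaded through OpenCV's DNN module; the file names are placeholders, not artifacts from the study.

```python
import time
import cv2

# Placeholder paths for an exported SSD MobileNet v2 model.
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "ssd_mobilenet_v2.pbtxt")
cap = cv2.VideoCapture("echo_video.mp4")

frames, t0 = 0, time.perf_counter()
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # SSD expects a fixed-size input blob; 300x300 is the usual SSD size.
    blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True)
    net.setInput(blob)
    detections = net.forward()   # shape (1, 1, N, 7): class, score, box
    frames += 1

fps = frames / (time.perf_counter() - t0)
print(f"mean throughput: {fps:.1f} fps")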


2021 ◽  
Vol 20 (3) ◽  
pp. 1-22
Author(s):  
David Langerman ◽  
Alan George

High-resolution, low-latency apps in computer vision are ubiquitous in today's world of mixed-reality devices. These innovations provide a platform that can leverage improving depth-sensor and embedded-accelerator technology to enable higher-resolution, lower-latency processing of 3D scenes using depth-upsampling algorithms. This research demonstrates that filter-based upsampling algorithms are feasible for mixed-reality apps using low-power hardware accelerators. The authors parallelized and evaluated a depth-upsampling algorithm on two different devices: a reconfigurable-logic FPGA embedded within a low-power SoC, and a fixed-logic embedded graphics processing unit. We demonstrate that both accelerators can meet the real-time requirement of 11 ms latency for mixed-reality apps.
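A sketch of joint bilateral upsampling, one common filter-based depth-upsampling scheme (not necessarily the exact algorithm the authors accelerate): a low-resolution depth map is interpolated under the guidance of the high-resolution image so that depth edges follow image edges. All parameters below are illustrative.

```python
import numpy as np

def jbu(depth_lo, guide_hi, scale, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Joint bilateral upsampling of depth_lo guided by grayscale guide_hi."""
    H, W = guide_hi.shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Low-res neighbor, clipped to the depth-map bounds.
                    ly = np.clip(y // scale + dy, 0, depth_lo.shape[0] - 1)
                    lx = np.clip(x // scale + dx, 0, depth_lo.shape[1] - 1)
                    # Spatial weight from the offset, range weight from the
                    # guide-image intensity difference.
                    gs = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                    diff = guide_hi[y, x] - guide_hi[min(ly * scale, H - 1),
                                                     min(lx * scale, W - 1)]
                    gr = np.exp(-(diff * diff) / (2 * sigma_r ** 2))
                    num += gs * gr * depth_lo[ly, lx]
                    den += gs * gr
            out[y, x] = num / max(den, 1e-8)
    return out
```

The per-pixel inner loop is fully independent, which is exactly what makes this class of filter a good fit for both FPGA pipelines and embedded GPU threads.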


2020 ◽  
Vol 32 ◽  
pp. 03054
Author(s):  
Akshata Parab ◽  
Rashmi Nagare ◽  
Omkar Kolambekar ◽  
Parag Patil

Vision is one of the most essential human senses and plays a major role in human perception of the surrounding environment, but for people with visual impairment the situation is very different. Visually impaired people are often unaware of dangers in front of them, even in familiar environments. This study proposes a real-time guiding system that solves the navigation problem for visually impaired people so that they can travel without difficulty. The system helps by detecting objects and giving the necessary information about each object: what it is, its location, the detection confidence, its distance from the user and so on. All of this information is conveyed through audio commands so that users can navigate freely anywhere, anytime, with little or no assistance. Object detection is done using the You Only Look Once (YOLO) algorithm. Because capturing the video and sending it to the main module must happen at high speed, a Graphics Processing Unit (GPU) is used; this raises the overall speed of the system and helps deliver the necessary instructions to the visually impaired user as quickly as possible. The process starts with capturing real-time video, sending it for analysis and processing, and obtaining the calculated results, which are conveyed to the user through a hearing aid. As a result, blind or visually impaired people can perceive the surrounding environment and travel freely from source to destination on their own.
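A minimal sketch of the detect-and-announce loop, assuming an OpenCV DNN YOLO model and the pyttsx3 text-to-speech package; the model files, class list and left/right heuristic are placeholders, and the CUDA backend lines require an OpenCV build with CUDA support.

```python
import cv2
import pyttsx3

# Placeholder YOLO files; any Darknet-format model works the same way.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)   # GPU inference
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

classes = open("coco.names").read().splitlines()
tts = pyttsx3.init()
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    class_ids, scores, boxes = model.detect(frame, confThreshold=0.5)
    for cid, box in zip(class_ids.flatten(), boxes):
        x, y, w, h = box
        # Crude spatial cue from the box center; a real system would add
        # a distance estimate (e.g. from box size or a depth sensor).
        side = "left" if x + w / 2 < frame.shape[1] / 2 else "right"
        tts.say(f"{classes[int(cid)]} on your {side}")
    tts.runAndWait()
```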


2001 ◽  
Author(s):  
Mitchell Parry ◽  
Brendan Hannigan ◽  
William Ribarsky ◽  
Christopher D. Shaw ◽  
Nickolas L. Faust

2012 ◽  
Vol 3 (7) ◽  
pp. 1557 ◽  
Author(s):  
Kenneth K. C. Lee ◽  
Adrian Mariampillai ◽  
Joe X. Z. Yu ◽  
David W. Cadotte ◽  
Brian C. Wilson ◽  
...  
