Adaptive Depth Guided Image Completion for Structure and Texture Synthesis

2021 ◽  
Author(s):  
Michael Luigi Ciotta

The synthesis of missing image parts is an interesting and challenging problem in image processing and computer vision with significant potential. This thesis focuses on an adaptive depth-guided image completion method that fills the missing region using information contained in the rest of the image. The completion process is separated into structure and texture synthesis. First, a method is introduced for completing the corresponding depth map through a diffusion-based operation, preserving global image structure within the unknown region. Building upon the state-of-the-art exemplar-based inpainting technique of Barnes et al., we complete the target (unknown) region by matching and blending source patches drawn from the rest of the image, using the reconstructed depth information to guide the completion process. Second, for each target patch, we formulate adaptive patch size determination as an optimization problem that minimizes an objective function involving local image gradient magnitudes and orientations. An extension to the coherence-based objective function introduced by Wexler et al. is then presented, which encourages coherence of the target region with respect to the source region not only in colour but also in depth. We further incorporate the variance between patches into the SSD criterion to prevent error accumulation and propagation. Experimental results show that our method provides a significant improvement over patch-based image completion algorithms, as demonstrated by PSNR and SSIM measurements and a qualitative subjective study.
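The abstract does not give the exact formulation, but the idea of depth-guided patch matching can be illustrated with a minimal sketch: the usual colour SSD between a target patch and a candidate source patch is augmented with a term computed from the reconstructed depth map. The function names and the depth_weight parameter below are illustrative assumptions, not the thesis' actual implementation.

import numpy as np

def patch_distance(target_rgb, target_depth, source_rgb, source_depth,
                   mask, depth_weight=0.5):
    # SSD restricted to the known pixels of the target patch (mask == 1),
    # plus a weighted depth term (depth_weight is an assumed default).
    known = mask.astype(bool)
    colour_ssd = np.sum((target_rgb[known] - source_rgb[known]) ** 2)
    depth_ssd = np.sum((target_depth[known] - source_depth[known]) ** 2)
    return (colour_ssd + depth_weight * depth_ssd) / max(known.sum(), 1)

def best_source_patch(target_rgb, target_depth, mask, source_patches):
    # Pick the candidate (rgb, depth) patch pair minimising the combined distance.
    dists = [patch_distance(target_rgb, target_depth, rgb, d, mask)
             for rgb, d in source_patches]
    return int(np.argmin(dists))

In practice the search over source patches would be accelerated, e.g. with the randomized nearest-neighbour search of Barnes et al., rather than scanned exhaustively as above.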


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Hasan Mahmud ◽  
Md. Kamrul Hasan ◽  
Abdullah-Al-Tariq ◽  
Md. Hasanul Kabir ◽  
M. A. Mottalib

Symbolic gestures are hand postures with conventionalized meanings. They are static gestures that can be performed without voice, even in very complex environments with variations in rotation and scale. The gestures may be produced under different illumination conditions or against occluding backgrounds. Any hand gesture recognition system should find sufficiently discriminative features, such as hand-finger contextual information. However, existing approaches make only limited use of the depth information of hand fingers, which represents finger shape, when extracting discriminative finger features. If finger bending information (i.e., a finger that overlaps the palm) is extracted from the depth map and used as a local feature, static gestures that vary only slightly become distinguishable. Our work corroborates this idea: we generate depth silhouettes with varied contrast to obtain more discriminative keypoints, which improves the recognition accuracy to 96.84%. We apply the Scale-Invariant Feature Transform (SIFT) algorithm, which takes the generated depth silhouettes as input and produces robust feature descriptors as output. These features, after conversion into feature vectors of unified dimension, are fed into a multiclass Support Vector Machine (SVM) classifier to measure the accuracy. We test our method on a standard dataset containing 10 symbolic gestures representing the 10 numeric symbols (0-9), and then verify and compare our results among depth images, binary images, and images consisting of hand-finger edge information generated from the same dataset. Our results show higher accuracy when applying SIFT features to depth images. Accurately recognizing numeric symbols performed through hand gestures has a large impact on Human-Computer Interaction (HCI) applications, including augmented reality, virtual reality, and other fields.
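As a rough illustration of the pipeline described above, the sketch below runs SIFT on depth silhouettes and feeds fixed-length vectors to a multiclass SVM. The abstract does not state how descriptors are unified into a single vector, so a bag-of-visual-words step is assumed here; the vocabulary size and kernel are placeholders.

import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

sift = cv2.SIFT_create()

def sift_descriptors(silhouette):
    # silhouette: uint8 grayscale depth silhouette.
    _, desc = sift.detectAndCompute(silhouette, None)
    return desc if desc is not None else np.zeros((0, 128), np.float32)

def bow_vectors(images, codebook):
    # Histogram of visual words per image (fixed-length feature vector).
    n_words = codebook.n_clusters
    vecs = []
    for im in images:
        desc = sift_descriptors(im)
        hist = np.zeros(n_words)
        if len(desc):
            words = codebook.predict(desc.astype(np.float32))
            hist = np.bincount(words, minlength=n_words).astype(float)
        vecs.append(hist)
    return np.array(vecs)

# train_images: depth silhouettes, train_labels: digits 0-9 (assumed data)
# codebook = KMeans(n_clusters=64, n_init=10).fit(
#     np.vstack([sift_descriptors(im) for im in train_images]))
# clf = SVC(kernel='rbf').fit(bow_vectors(train_images, codebook), train_labels)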


Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6095
Author(s):  
Xiaojing Sun ◽  
Bin Wang ◽  
Longxiang Huang ◽  
Qian Zhang ◽  
Sulei Zhu ◽  
...  

Despite recent successes in hand pose estimation from RGB images or depth maps, inherent challenges remain. RGB-based methods suffer from heavy self-occlusion and depth ambiguity, while depth sensors depend heavily on distance and can generally only be used indoors, which limits the practical application of depth-based methods. These challenges inspired us to combine the two modalities so that each offsets the shortcomings of the other. In this paper, we propose CrossFuNet, a novel RGB and depth information fusion network that improves the accuracy of 3D hand pose estimation. Specifically, the RGB image and the paired depth map are fed into two separate subnetworks, and their feature maps are combined in a fusion module in which we propose a completely new approach to merging the information from the two modalities. The 3D keypoints are then regressed via heatmaps in the standard way. We validate our model on two public datasets, and the results show that it outperforms state-of-the-art methods.
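The paper's fusion module is not reproduced here, but the overall layout it describes (two modality-specific subnetworks whose feature maps are concatenated and fused before heatmap regression) can be sketched in a few lines of PyTorch. The layer sizes, the number of joints, and the simple 1x1-convolution fusion are assumptions for illustration only, not the CrossFuNet architecture.

import torch
import torch.nn as nn

class SimpleRGBDFusion(nn.Module):
    def __init__(self, n_joints=21):  # 21 joints is a common hand model; assumed here
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.rgb_branch = branch(3)     # subnetwork for the RGB image
        self.depth_branch = branch(1)   # subnetwork for the paired depth map
        self.fuse = nn.Conv2d(128, 64, 1)       # naive fusion of concatenated features
        self.head = nn.Conv2d(64, n_joints, 1)  # one heatmap per keypoint

    def forward(self, rgb, depth):
        feats = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.head(torch.relu(self.fuse(feats)))

# heatmaps = SimpleRGBDFusion()(torch.randn(1, 3, 128, 128), torch.randn(1, 1, 128, 128))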


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Li-fen Tu ◽  
Qi Peng

Robot detection, recognition, positioning, and other applications require not only real-time video image information but also the distance from the target to the camera, that is, depth information. This paper proposes a method to automatically generate a depth map for any monocular camera based on RealSense camera data. With this method, any existing single-camera detection system can be upgraded online: without changing the original system, depth information for the original monocular camera is obtained simply, and the transition from 2D to 3D detection can be realized. To verify the effectiveness of the proposed method, a hardware system was constructed with a Micro-vision RS-A14K-GC8 industrial camera and an Intel RealSense D415 depth camera, and the depth map fitting algorithm proposed in this paper was used to test the system. The results show that, apart from a few areas with missing depth, the remaining areas receive good depth values that essentially describe the distance between the target and the camera. In addition, to verify the scalability of the method, a new hardware system was constructed with different cameras, and images were collected in a complex farmland environment. The generated depth map was again of good quality and essentially described the distance between the target and the camera.
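The abstract does not describe the depth map fitting algorithm itself; as a loose illustration of the general idea of transferring RealSense depth into the monocular camera's image plane, one simple, assumption-laden approach is to match features between the two colour views, fit a homography, and warp the depth map. A homography is only strictly valid for near-planar scenes or small baselines, so this is a sketch of the concept, not the proposed method.

import cv2
import numpy as np

def fit_depth_to_mono(mono_gray, rs_gray, rs_depth):
    # Match ORB features between the RealSense and monocular colour views.
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(rs_gray, None)
    k2, d2 = orb.detectAndCompute(mono_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # Fit a robust homography and warp the RealSense depth into the mono frame.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = mono_gray.shape[:2]
    return cv2.warpPerspective(rs_depth, H, (w, h), flags=cv2.INTER_NEAREST)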


Optik ◽  
2019 ◽  
Vol 185 ◽  
pp. 896-909 ◽  
Author(s):  
Qiaochuan Chen ◽  
Guangyao Li ◽  
Li Xie ◽  
Qingguo Xiao ◽  
Mang Xiao

Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5242
Author(s):  
Loris Nanni ◽  
Sheryl Brahnam ◽  
Alessandra Lumini

A fundamental problem in computer vision is face detection. In this paper, an experimentally derived ensemble of six face detectors is presented that maximizes the number of true positives while simultaneously reducing the number of false positives produced by the ensemble. False positives are removed using several filtering steps based primarily on characteristics of the depth map within the subwindows of the whole image that contain candidate faces. A new filtering approach based on processing the image with different wavelets is also proposed. The experimental results show that the filtering steps used in our best ensemble reduce the number of false positives without decreasing the detection rate. This finding is validated on a combined dataset composed of four other datasets, totalling 549 images with 614 upright frontal faces acquired in unconstrained environments; the dataset provides both 2D and depth data. For further validation, the proposed ensemble is tested on the well-known BioID benchmark dataset, where it obtains a 100% detection rate with an acceptable number of false positives.
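The filtering rules themselves are not detailed in the abstract; the toy sketch below shows one way depth statistics inside a candidate subwindow could be used to reject false positives, on the assumption that a real face appears as a compact foreground region with a small depth spread. The threshold and the box format are illustrative, not the paper's filters.

import numpy as np

def depth_filter(candidates, depth_map, max_rel_spread=0.15):
    # candidates: list of (x, y, w, h) boxes produced by the detector ensemble.
    kept = []
    for (x, y, w, h) in candidates:
        window = depth_map[y:y + h, x:x + w]
        valid = window[window > 0]          # ignore pixels with missing depth
        if valid.size == 0:
            continue
        spread = (valid.max() - valid.min()) / valid.mean()
        if spread <= max_rel_spread:        # roughly compact, fronto-parallel region
            kept.append((x, y, w, h))
    return kept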


Author(s):  
XIAOWU CHEN ◽  
BIN ZHOU ◽  
FANG XU ◽  
QINPING ZHAO

In this paper, we present a novel automatic image completion solution that proceeds in a greedy manner, inspired by a primal sketch representation model. First, the image is divided into structure (sketchable) and texture (non-sketchable) components, and the missing structures, such as curves and corners, are predicted by tensor voting. Second, the textures along the structural sketches are synthesized from patches sampled from the known structure components. Then, using texture completion priorities determined by a confidence term, a data term, and a distance term, similar image patches from the known texture components are found by selecting the point with the maximum priority on the boundary of the hole region. Finally, these patches seamlessly inpaint the missing textures of the hole region through graph cuts. The characteristics of this solution are: (1) the primal sketch representation model is introduced to guide completion for visual consistency; and (2) completion is fully automatic. Experiments on natural images demonstrate satisfying completion results.
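As a rough sketch of how such boundary priorities can be computed, in the spirit of the confidence and data terms mentioned above (the paper's exact terms, including its distance term, are not reproduced), the following assumes a grayscale image and a binary hole mask:

import numpy as np
from scipy import ndimage

def priorities(image_gray, hole_mask, patch=9):
    # hole_mask: 1 inside the missing region, 0 where the image is known.
    known = 1.0 - hole_mask
    # Confidence term: fraction of known pixels in the patch around each pixel.
    confidence = ndimage.uniform_filter(known, size=patch)
    # Data term (simplified): gradient magnitude of the known part of the image.
    gy, gx = np.gradient(image_gray * known)
    data = np.hypot(gx, gy)
    # Fill front: hole pixels adjacent to the known region.
    front = hole_mask.astype(bool) & ndimage.binary_dilation(known.astype(bool))
    return np.where(front, confidence * data, -np.inf)

# The boundary point with maximum priority is filled next:
# y, x = np.unravel_index(np.argmax(priorities(img, mask)), img.shape)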

