Joint 3D facial shape reconstruction and texture completion from a single image

2021 ◽  
Vol 8 (2) ◽  
pp. 239-256
Author(s):  
Xiaoxing Zeng ◽  
Zhelun Wu ◽  
Xiaojiang Peng ◽  
Yu Qiao

Abstract: Recent years have witnessed significant progress in image-based 3D face reconstruction using deep convolutional neural networks. However, current reconstruction methods often perform poorly in self-occluded regions and can produce inaccurate correspondences between a 2D input image and a 3D face template, hindering their use in real applications. To address these problems, we propose a deep shape reconstruction and texture completion network, SRTC-Net, which jointly reconstructs 3D facial geometry and completes the texture with correspondences from a single input face image. In SRTC-Net, we leverage the geometric cues from the completed 3D texture to reconstruct detailed structures of 3D shapes. The SRTC-Net pipeline has three stages. The first introduces a correspondence network that identifies pixel-wise correspondence between the input 2D image and a 3D template model, and transfers the input 2D image to a U-V texture map. We then complete the invisible and occluded areas of the U-V texture map with an inpainting network. To obtain the 3D facial geometry, we predict a coarse shape (a U-V position map) from the face segmented by the correspondence network using a shape network, and then refine this coarse shape by regressing a U-V displacement map from the completed U-V texture map in a pixel-to-pixel way. We evaluate our method on 3D reconstruction tasks as well as on face frontalization and pose-invariant face recognition, using both in-the-lab datasets (MICC, Multi-PIE) and in-the-wild datasets (CFP). The qualitative and quantitative results demonstrate the effectiveness of our method at inferring 3D facial geometry and complete texture; it outperforms or is comparable to the state of the art.
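A reading aid, not the authors' code: the sketch below traces the three-stage data flow the abstract describes as PyTorch-style modules. The submodule names (corr_net, inpaint_net, shape_net, refine_net), the unwarp_to_uv helper, and all tensor shapes are assumptions; only the stage structure (correspondence, then texture inpainting, then coarse shape plus displacement refinement) comes from the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def unwarp_to_uv(image, uv_to_image_grid):
    """Pull image colors into U-V texture space.

    Assumes the correspondence network predicts, for every U-V texel,
    the normalized image coordinate it corresponds to, so grid_sample
    can resample the image into the texture domain.
    """
    return F.grid_sample(image, uv_to_image_grid, align_corners=False)

class SRTCNetSketch(nn.Module):
    def __init__(self, corr_net, inpaint_net, shape_net, refine_net):
        super().__init__()
        self.corr_net = corr_net        # image -> (U-V sampling grid, face mask)
        self.inpaint_net = inpaint_net  # partial U-V texture -> completed texture
        self.shape_net = shape_net      # segmented face -> coarse U-V position map
        self.refine_net = refine_net    # completed texture -> U-V displacement map

    def forward(self, image):
        # Stage 1: pixel-wise correspondence to the template's U-V space.
        uv_grid, face_mask = self.corr_net(image)
        partial_texture = unwarp_to_uv(image, uv_grid)

        # Stage 2: inpaint the invisible / self-occluded texture regions.
        completed_texture = self.inpaint_net(partial_texture)

        # Stage 3: coarse shape from the segmented face, refined
        # pixel-to-pixel by a displacement map regressed from the
        # completed texture.
        coarse_position_map = self.shape_net(image * face_mask)
        displacement_map = self.refine_net(completed_texture)
        return coarse_position_map + displacement_map, completed_texture
```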

2021 ◽  
Author(s):  
Huiwen Luo ◽  
Koki Nagano ◽  
Han-Wei Kung ◽  
Mclean Goldwhite

We introduce a highly robust GAN-based framework for digitizing a normalized 3D avatar of a person from a single unconstrained photo. Even when the input image shows a smiling person or was taken in extreme lighting conditions, our method reliably produces a high-quality textured model of the person's face with a neutral expression and skin textures under diffuse lighting. Cutting-edge 3D face reconstruction methods use non-linear morphable face models combined with GAN-based decoders to capture the likeness and details of a person, but they fail to produce neutral head models with unshaded albedo textures, which are critical for creating relightable and animation-friendly avatars that can be integrated into virtual environments. The key obstacle for existing methods is the lack of training and ground-truth data containing normalized 3D faces. We propose a two-stage approach to address this problem. First, we adopt a highly robust normalized 3D face generator by embedding a non-linear morphable face model into a StyleGAN2 network, which allows us to generate detailed but normalized facial assets. This inference step is followed by a perceptual refinement step that uses the generated assets as regularization to cope with the limited available training samples of normalized faces. We further introduce a Normalized Face Dataset, which consists of a combination of photogrammetry scans, carefully selected photographs, and synthetically generated subjects with neutral expressions under diffuse lighting. Although our dataset contains two orders of magnitude fewer subjects than those used by cutting-edge GAN-based 3D facial reconstruction methods, we show that it is possible to produce high-quality normalized face models for very challenging unconstrained input images, and we demonstrate superior performance to the current state of the art.
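The second stage lends itself to a compact sketch. Under stated assumptions (a differentiable renderer and a perceptual loss such as LPIPS are available as callables, and the stage-one generator is already trained), the refinement can be pictured as a regularized fitting loop; this is a generic illustration, not the authors' implementation.

```python
import torch

def refine_assets(photo, generator, latent, renderer, perceptual_loss,
                  steps=200, reg_weight=0.1, lr=1e-2):
    """Perceptual refinement regularized toward the generated assets."""
    anchor = generator(latent).detach()           # stage-1 normalized assets
    assets = anchor.clone().requires_grad_(True)  # copy to be refined
    opt = torch.optim.Adam([assets], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        rendered = renderer(assets)  # differentiable render of the avatar
        # Match the photo perceptually, but stay close to the generator's
        # output so refinement cannot drift toward unnormalized (lit,
        # expressive) solutions despite scarce normalized training data.
        loss = (perceptual_loss(rendered, photo)
                + reg_weight * torch.mean((assets - anchor) ** 2))
        loss.backward()
        opt.step()
    return assets.detach()
```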


2020 ◽  
pp. 1-1
Author(s):  
Xiaoguang Tu ◽  
Jian Zhao ◽  
Mei Xie ◽  
Zihang Jiang ◽  
Akshaya Balamurugan ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1841
Author(s):  
Leyuan Liu ◽  
Zeran Ke ◽  
Jiao Huo ◽  
Jingying Chen

Mainstream methods treat head pose estimation as a supervised classification/regression problem, whose performance depends heavily on the accuracy of the ground-truth labels of the training data. However, accurate head pose labels are difficult to obtain in practice, owing to the lack of effective equipment and reasonable approaches for head pose labeling. In this paper, we propose a head pose estimation method that requires no head pose labels for training; instead, it matches keypoints between a reconstructed 3D face model and the 2D input image. The proposed method consists of two components: 3D face reconstruction and 3D–2D keypoint matching. In the 3D face reconstruction phase, a personalized 3D face model is reconstructed from the input head image using convolutional neural networks, which are jointly optimized by an asymmetric Euclidean loss and a keypoint loss. In the 3D–2D keypoint-matching phase, an iterative optimization algorithm efficiently matches keypoints between the reconstructed 3D face model and the 2D input image under the constraint of perspective transformation. The proposed method is extensively evaluated on five widely used head pose estimation datasets: Pointing'04, BIWI, AFLW2000, Multi-PIE, and Pandora. The experimental results demonstrate that the proposed method achieves excellent cross-dataset performance and surpasses most existing state-of-the-art approaches, with average MAEs of 4.78° on Pointing'04, 6.83° on BIWI, 7.05° on AFLW2000, 5.47° on Multi-PIE, and 5.06° on Pandora, even though the model is not trained on any of these five datasets.
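The abstract does not spell out the iterative matching algorithm, so the sketch below substitutes a standard tool for the same sub-problem: OpenCV's solvePnP, which recovers a perspective pose from 3D–2D keypoint correspondences. The pinhole intrinsics (focal length roughly the image width, principal point at the image center) and the Euler-angle convention are common approximations, not details from the paper.

```python
import cv2
import numpy as np

def head_pose_from_keypoints(model_points_3d, image_points_2d, image_size):
    """Estimate (yaw, pitch, roll) in degrees from 3D-2D keypoint pairs.

    model_points_3d: (N, 3) keypoints on the reconstructed face model.
    image_points_2d: (N, 2) matching keypoints in the input image.
    image_size: (width, height) of the input image.
    """
    w, h = image_size
    camera_matrix = np.array([[w, 0.0, w / 2.0],
                              [0.0, w, h / 2.0],
                              [0.0, 0.0, 1.0]])
    dist_coeffs = np.zeros(4)  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_points_3d, dtype=np.float64),
        np.asarray(image_points_2d, dtype=np.float64),
        camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("PnP failed to converge")
    rot_mat, _ = cv2.Rodrigues(rvec)  # axis-angle -> rotation matrix
    # One common Z-Y-X Euler decomposition; head pose datasets differ
    # in their angle conventions, so adapt as needed.
    yaw = np.degrees(np.arctan2(rot_mat[1, 0], rot_mat[0, 0]))
    pitch = np.degrees(np.arctan2(-rot_mat[2, 0],
                                  np.hypot(rot_mat[2, 1], rot_mat[2, 2])))
    roll = np.degrees(np.arctan2(rot_mat[2, 1], rot_mat[2, 2]))
    return yaw, pitch, roll
```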


2009 ◽  
Vol 09 (02) ◽  
pp. 217-250 ◽  
Author(s):  
GEORGIOS STYLIANOU ◽  
ANDREAS LANITIS

The use of 3D data in face image processing applications has received considerable attention during the last few years. A major issue in implementing 3D face processing systems is the accurate, real-time acquisition of 3D faces using low-cost equipment. In this paper we survey 3D reconstruction methods used for generating the 3D appearance of a face from either a single 2D image or multiple 2D images captured with ordinary equipment, such as digital cameras and camcorders. In this context we discuss various issues pertaining to the general problem of 3D face reconstruction, such as the existence of suitable 3D face databases, correspondence of 3D faces, feature detection, deformable 3D models, and typical assumptions used during the reconstruction process. Different approaches to the problem are presented, and for each category the most important advantages and disadvantages are outlined. In particular, we describe example-based methods, stereo methods, video-based methods, and silhouette-based methods. The performance evaluation of 3D face reconstruction algorithms, the state of the art, and future trends are also discussed.


Author(s):  
Mehdi Bahri ◽  
Eimear O’ Sullivan ◽  
Shunwang Gong ◽  
Feng Liu ◽  
Xiaoming Liu ◽  
...  

Abstract: Standard registration algorithms must be applied independently to each surface to be registered, following careful pre-processing and hand-tuning. Recently, learning-based approaches have emerged that reduce the registration of new scans to running inference with a previously trained model. The potential benefits are manifold: inference is typically orders of magnitude faster than solving a new instance of a difficult optimization problem, deep learning models can be made robust to noise and corruption, and the trained model may be re-used for other tasks, e.g., through transfer learning. In this paper, we cast the registration task as a surface-to-surface translation problem, and design a model to reliably capture the latent geometric information directly from raw 3D face scans. We introduce Shape-My-Face (SMF), a powerful encoder-decoder architecture based on an improved point cloud encoder, a novel visual attention mechanism, graph convolutional decoders with skip connections, and a specialized mouth model that we smoothly integrate with the mesh convolutions. Compared to the previous state-of-the-art learning algorithms for non-rigid registration of face scans, SMF only requires the raw data to be rigidly aligned (with scaling) to a pre-defined face template. Additionally, our model provides topologically sound meshes with minimal supervision, offers faster training, has orders of magnitude fewer trainable parameters, is more robust to noise, and can generalize to previously unseen datasets. We extensively evaluate the quality of our registrations on diverse data, and demonstrate the robustness and generalizability of our model with in-the-wild face scans across different modalities, sensor types, and resolutions. Finally, we show that, by learning to register scans, SMF produces a hybrid linear and non-linear morphable model: manipulating the latent space of SMF allows for shape generation and morphing applications such as expression transfer in the wild. We train SMF on commodity hardware, on a dataset of human faces comprising nine large-scale databases.
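To make the "registration as translation" framing concrete, here is a schematic PyTorch module under stated assumptions: the point-cloud encoder and mesh decoder internals (including the paper's attention and mouth components) are placeholder submodules, and only the overall computation, encoding a raw scan to a latent code and decoding per-vertex offsets on a fixed template topology, follows the abstract.

```python
import torch
import torch.nn as nn

class RegistrationAsTranslation(nn.Module):
    def __init__(self, point_encoder, mesh_decoder, template_vertices):
        super().__init__()
        self.point_encoder = point_encoder  # raw scan points -> latent code
        self.mesh_decoder = mesh_decoder    # latent code -> per-vertex offsets
        # Decoding offsets from a fixed template makes every output mesh
        # topologically sound by construction.
        self.register_buffer("template", template_vertices)  # (V, 3)

    def forward(self, scan_points):
        z = self.point_encoder(scan_points)  # (B, latent_dim)
        offsets = self.mesh_decoder(z)       # (B, V, 3)
        return self.template.unsqueeze(0) + offsets
```

Because every scan passes through the same latent code z, interpolating or editing z yields the morphing and expression-transfer behavior the abstract mentions: the trained registration network doubles as a hybrid morphable model.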

