Joint 3D facial shape reconstruction and texture completion from a single image

2021 ◽  
Vol 8 (2) ◽  
pp. 239-256
Author(s):  
Xiaoxing Zeng ◽  
Zhelun Wu ◽  
Xiaojiang Peng ◽  
Yu Qiao

Abstract: Recent years have witnessed significant progress in image-based 3D face reconstruction using deep convolutional neural networks. However, current reconstruction methods often perform poorly in self-occluded regions and can produce inaccurate correspondences between a 2D input image and a 3D face template, hindering their use in real applications. To address these problems, we propose a deep shape reconstruction and texture completion network, SRTC-Net, which jointly reconstructs 3D facial geometry and completes the texture with correspondences from a single input face image. In SRTC-Net, we leverage the geometric cues from the completed 3D texture to reconstruct detailed structures of 3D shapes. The SRTC-Net pipeline has three stages. The first introduces a correspondence network that identifies pixel-wise correspondence between the input 2D image and a 3D template model, and transfers the input 2D image to a U-V texture map. We then complete the invisible and occluded areas of the U-V texture map with an inpainting network. To obtain the 3D facial geometry, we predict a coarse shape (a U-V position map) from the face segmented by the correspondence network using a shape network, and then refine this coarse shape by regressing a U-V displacement map from the completed U-V texture map in a pixel-to-pixel way. We evaluate our method on 3D reconstruction tasks as well as on face frontalization and pose-invariant face recognition, using both in-the-lab datasets (MICC, Multi-PIE) and in-the-wild datasets (CFP). The qualitative and quantitative results demonstrate the effectiveness of our method at inferring 3D facial geometry and complete texture; it outperforms or is comparable to the state of the art.
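A reading aid, not the authors' code: the sketch below traces the three-stage data flow the abstract describes as PyTorch-style modules. The submodule names (corr_net, inpaint_net, shape_net, refine_net), the unwarp_to_uv helper, and all tensor shapes are assumptions; only the stage structure (correspondence, then texture inpainting, then coarse shape plus displacement refinement) comes from the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def unwarp_to_uv(image, uv_to_image_grid):
    """Pull image colors into U-V texture space.

    Assumes the correspondence network predicts, for every U-V texel,
    the normalized image coordinate it corresponds to, so grid_sample
    can resample the image into the texture domain.
    """
    return F.grid_sample(image, uv_to_image_grid, align_corners=False)

class SRTCNetSketch(nn.Module):
    def __init__(self, corr_net, inpaint_net, shape_net, refine_net):
        super().__init__()
        self.corr_net = corr_net        # image -> (U-V sampling grid, face mask)
        self.inpaint_net = inpaint_net  # partial U-V texture -> completed texture
        self.shape_net = shape_net      # segmented face -> coarse U-V position map
        self.refine_net = refine_net    # completed texture -> U-V displacement map

    def forward(self, image):
        # Stage 1: pixel-wise correspondence to the template's U-V space.
        uv_grid, face_mask = self.corr_net(image)
        partial_texture = unwarp_to_uv(image, uv_grid)

        # Stage 2: inpaint the invisible / self-occluded texture regions.
        completed_texture = self.inpaint_net(partial_texture)

        # Stage 3: coarse shape from the segmented face, refined
        # pixel-to-pixel by a displacement map regressed from the
        # completed texture.
        coarse_position_map = self.shape_net(image * face_mask)
        displacement_map = self.refine_net(completed_texture)
        return coarse_position_map + displacement_map, completed_texture
```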

2021 ◽  
Author(s):  
Huiwen Luo ◽  
Koki Nagano ◽  
Han-Wei Kung ◽  
Mclean Goldwhite

We introduce a highly robust GAN-based framework for digitizing a normalized 3D avatar of a person from a single unconstrained photo. Even when the input image shows a smiling person or was taken in extreme lighting conditions, our method reliably produces a high-quality textured model of the person's face with a neutral expression and skin textures under diffuse lighting. Cutting-edge 3D face reconstruction methods use non-linear morphable face models combined with GAN-based decoders to capture the likeness and details of a person, but they fail to produce neutral head models with unshaded albedo textures, which are critical for creating relightable and animation-friendly avatars that can be integrated into virtual environments. The key obstacle for existing methods is the lack of training and ground-truth data containing normalized 3D faces. We propose a two-stage approach to address this problem. First, we adopt a highly robust normalized 3D face generator by embedding a non-linear morphable face model into a StyleGAN2 network, which allows us to generate detailed but normalized facial assets. This inference step is followed by a perceptual refinement step that uses the generated assets as regularization to cope with the limited available training samples of normalized faces. We further introduce a Normalized Face Dataset, which consists of a combination of photogrammetry scans, carefully selected photographs, and synthetically generated subjects with neutral expressions under diffuse lighting. Although our dataset contains two orders of magnitude fewer subjects than those used by cutting-edge GAN-based 3D facial reconstruction methods, we show that it is possible to produce high-quality normalized face models for very challenging unconstrained input images, and we demonstrate superior performance to the current state of the art.
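The second stage lends itself to a compact sketch. Under stated assumptions (a differentiable renderer and a perceptual loss such as LPIPS are available as callables, and the stage-one generator is already trained), the refinement can be pictured as a regularized fitting loop; this is a generic illustration, not the authors' implementation.

```python
import torch

def refine_assets(photo, generator, latent, renderer, perceptual_loss,
                  steps=200, reg_weight=0.1, lr=1e-2):
    """Perceptual refinement regularized toward the generated assets."""
    anchor = generator(latent).detach()           # stage-1 normalized assets
    assets = anchor.clone().requires_grad_(True)  # copy to be refined
    opt = torch.optim.Adam([assets], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        rendered = renderer(assets)  # differentiable render of the avatar
        # Match the photo perceptually, but stay close to the generator's
        # output so refinement cannot drift toward unnormalized (lit,
        # expressive) solutions despite scarce normalized training data.
        loss = (perceptual_loss(rendered, photo)
                + reg_weight * torch.mean((assets - anchor) ** 2))
        loss.backward()
        opt.step()
    return assets.detach()
```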


2020 ◽  
pp. 1-1
Author(s):  
Xiaoguang Tu ◽  
Jian Zhao ◽  
Mei Xie ◽  
Zihang Jiang ◽  
Akshaya Balamurugan ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1841
Author(s):  
Leyuan Liu ◽  
Zeran Ke ◽  
Jiao Huo ◽  
Jingying Chen

Mainstream methods treat head pose estimation as a supervised classification/regression problem, whose performance depends heavily on the accuracy of the ground-truth labels of the training data. However, accurate head pose labels are difficult to obtain in practice, owing to the lack of effective equipment and reasonable approaches for head pose labeling. In this paper, we propose a head pose estimation method that requires no head pose labels for training; instead, it matches keypoints between a reconstructed 3D face model and the 2D input image. The proposed method consists of two components: 3D face reconstruction and 3D–2D keypoint matching. In the 3D face reconstruction phase, a personalized 3D face model is reconstructed from the input head image using convolutional neural networks, which are jointly optimized by an asymmetric Euclidean loss and a keypoint loss. In the 3D–2D keypoint-matching phase, an iterative optimization algorithm efficiently matches keypoints between the reconstructed 3D face model and the 2D input image under the constraint of perspective transformation. The proposed method is extensively evaluated on five widely used head pose estimation datasets: Pointing'04, BIWI, AFLW2000, Multi-PIE, and Pandora. The experimental results demonstrate that the proposed method achieves excellent cross-dataset performance and surpasses most existing state-of-the-art approaches, with average MAEs of 4.78° on Pointing'04, 6.83° on BIWI, 7.05° on AFLW2000, 5.47° on Multi-PIE, and 5.06° on Pandora, even though the model is not trained on any of these five datasets.
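The abstract does not spell out the iterative matching algorithm, so the sketch below substitutes a standard tool for the same sub-problem: OpenCV's solvePnP, which recovers a perspective pose from 3D–2D keypoint correspondences. The pinhole intrinsics (focal length roughly the image width, principal point at the image center) and the Euler-angle convention are common approximations, not details from the paper.

```python
import cv2
import numpy as np

def head_pose_from_keypoints(model_points_3d, image_points_2d, image_size):
    """Estimate (yaw, pitch, roll) in degrees from 3D-2D keypoint pairs.

    model_points_3d: (N, 3) keypoints on the reconstructed face model.
    image_points_2d: (N, 2) matching keypoints in the input image.
    image_size: (width, height) of the input image.
    """
    w, h = image_size
    camera_matrix = np.array([[w, 0.0, w / 2.0],
                              [0.0, w, h / 2.0],
                              [0.0, 0.0, 1.0]])
    dist_coeffs = np.zeros(4)  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_points_3d, dtype=np.float64),
        np.asarray(image_points_2d, dtype=np.float64),
        camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("PnP failed to converge")
    rot_mat, _ = cv2.Rodrigues(rvec)  # axis-angle -> rotation matrix
    # One common Z-Y-X Euler decomposition; head pose datasets differ
    # in their angle conventions, so adapt as needed.
    yaw = np.degrees(np.arctan2(rot_mat[1, 0], rot_mat[0, 0]))
    pitch = np.degrees(np.arctan2(-rot_mat[2, 0],
                                  np.hypot(rot_mat[2, 1], rot_mat[2, 2])))
    roll = np.degrees(np.arctan2(rot_mat[2, 1], rot_mat[2, 2]))
    return yaw, pitch, roll
```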


2009 ◽  
Vol 09 (02) ◽  
pp. 217-250 ◽  
Author(s):  
GEORGIOS STYLIANOU ◽  
ANDREAS LANITIS

The use of 3D data in face image processing applications has received considerable attention during the last few years. A major issue in implementing 3D face processing systems is the accurate, real-time acquisition of 3D faces using low-cost equipment. In this paper we survey 3D reconstruction methods used for generating the 3D appearance of a face from either a single 2D image or multiple 2D images captured with ordinary equipment, such as digital cameras and camcorders. In this context we discuss various issues pertaining to the general problem of 3D face reconstruction, such as the existence of suitable 3D face databases, correspondence of 3D faces, feature detection, deformable 3D models, and typical assumptions used during the reconstruction process. Different approaches to the problem are presented, and for each category the most important advantages and disadvantages are outlined. In particular, we describe example-based methods, stereo methods, video-based methods, and silhouette-based methods. The performance evaluation of 3D face reconstruction algorithms, the state of the art, and future trends are also discussed.


Author(s):  
Mehdi Bahri ◽  
Eimear O’ Sullivan ◽  
Shunwang Gong ◽  
Feng Liu ◽  
Xiaoming Liu ◽  
...  

Abstract: Standard registration algorithms must be applied independently to each surface to be registered, following careful pre-processing and hand-tuning. Recently, learning-based approaches have emerged that reduce the registration of new scans to running inference with a previously trained model. The potential benefits are manifold: inference is typically orders of magnitude faster than solving a new instance of a difficult optimization problem, deep learning models can be made robust to noise and corruption, and the trained model may be re-used for other tasks, e.g., through transfer learning. In this paper, we cast the registration task as a surface-to-surface translation problem, and design a model to reliably capture the latent geometric information directly from raw 3D face scans. We introduce Shape-My-Face (SMF), a powerful encoder-decoder architecture based on an improved point cloud encoder, a novel visual attention mechanism, graph convolutional decoders with skip connections, and a specialized mouth model that we smoothly integrate with the mesh convolutions. Compared to the previous state-of-the-art learning algorithms for non-rigid registration of face scans, SMF only requires the raw data to be rigidly aligned (with scaling) to a pre-defined face template. Additionally, our model provides topologically sound meshes with minimal supervision, offers faster training, has orders of magnitude fewer trainable parameters, is more robust to noise, and can generalize to previously unseen datasets. We extensively evaluate the quality of our registrations on diverse data, and demonstrate the robustness and generalizability of our model with in-the-wild face scans across different modalities, sensor types, and resolutions. Finally, we show that, by learning to register scans, SMF produces a hybrid linear and non-linear morphable model: manipulating the latent space of SMF allows for shape generation and morphing applications such as expression transfer in the wild. We train SMF on commodity hardware, on a dataset of human faces comprising nine large-scale databases.
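To make the "registration as translation" framing concrete, here is a schematic PyTorch module under stated assumptions: the point-cloud encoder and mesh decoder internals (including the paper's attention and mouth components) are placeholder submodules, and only the overall computation, encoding a raw scan to a latent code and decoding per-vertex offsets on a fixed template topology, follows the abstract.

```python
import torch
import torch.nn as nn

class RegistrationAsTranslation(nn.Module):
    def __init__(self, point_encoder, mesh_decoder, template_vertices):
        super().__init__()
        self.point_encoder = point_encoder  # raw scan points -> latent code
        self.mesh_decoder = mesh_decoder    # latent code -> per-vertex offsets
        # Decoding offsets from a fixed template makes every output mesh
        # topologically sound by construction.
        self.register_buffer("template", template_vertices)  # (V, 3)

    def forward(self, scan_points):
        z = self.point_encoder(scan_points)  # (B, latent_dim)
        offsets = self.mesh_decoder(z)       # (B, V, 3)
        return self.template.unsqueeze(0) + offsets
```

Because every scan passes through the same latent code z, interpolating or editing z yields the morphing and expression-transfer behavior the abstract mentions: the trained registration network doubles as a hybrid morphable model.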

