Disentangled Variational Representation for Heterogeneous Face Recognition

Author(s):  
Xiang Wu ◽  
Huaibo Huang ◽  
Vishal M. Patel ◽  
Ran He ◽  
Zhenan Sun

Visible (VIS) to near-infrared (NIR) face matching is a challenging problem due to the significant domain discrepancy between the two modalities and a lack of sufficient data for training cross-modal matching algorithms. Existing approaches attempt to tackle this problem by synthesizing visible faces from NIR faces, extracting domain-invariant features from these modalities, or projecting heterogeneous data onto a common latent space for cross-modal matching. In this paper, we take a different approach in which we make use of the Disentangled Variational Representation (DVR) for cross-modal matching. First, we model a face representation as intrinsic identity information together with its within-person variations. By exploring the disentangled latent variable space, a variational lower bound is employed to optimize the approximate posterior for NIR and VIS representations. Second, to obtain a more compact and discriminative disentangled latent space, we impose a minimization of the identity information for the same subject and a relaxed correlation alignment constraint between the NIR and VIS modality variations. An alternating optimization scheme is proposed for the disentangled variational representation part and the heterogeneous face recognition network part. The mutual promotion between these two parts effectively reduces the NIR-VIS domain discrepancy and alleviates over-fitting. Extensive experiments on three challenging NIR-VIS heterogeneous face recognition databases demonstrate that the proposed method achieves significant improvements over state-of-the-art methods.
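
A minimal sketch of how the two constraints described above could be combined into one training objective, assuming PyTorch. All names (kl_standard_normal, dvr_loss, the lambda weights) and the CORAL-style covariance matching are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def kl_standard_normal(mu, logvar):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior (the
    # regularizer inside the variational lower bound).
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()

def dvr_loss(id_nir, id_vis,
             var_mu_nir, var_logvar_nir, var_mu_vis, var_logvar_vis,
             lambda_id=1.0, lambda_corr=0.1):
    # Variational term: keep the within-person variation codes of both
    # modalities close to the prior.
    kl = (kl_standard_normal(var_mu_nir, var_logvar_nir)
          + kl_standard_normal(var_mu_vis, var_logvar_vis))
    # Identity-information minimization: identity codes of the same
    # subject in NIR and VIS should coincide.
    id_term = F.mse_loss(id_nir, id_vis)
    # Relaxed correlation alignment: match second-order statistics of the
    # NIR and VIS variation codes (rows are samples, so transpose for cov).
    corr_term = (torch.cov(var_mu_nir.T) - torch.cov(var_mu_vis.T)).pow(2).mean()
    return kl + lambda_id * id_term + lambda_corr * corr_term
```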

Author(s):  
Tarek Iraki ◽  
Norbert Link

Variations of dedicated process conditions (such as workpiece and tool properties) yield different process state evolutions, which are reflected by different time series of the observable quantities (process curves). A novel method is presented that, first, extracts the statistical influence of these conditions on the process curves and represents it via generative models, and, second, represents their influence on the ensemble of curves by transformations of the representation space. A latent variable space is derived from sampled process data, which represents the curves with only a few features. Generative models are formed based on conditional probability functions estimated in this space. Furthermore, the influence of conditions on the ensemble of process curves is represented by estimated transformations of the feature space, which map the process curve densities under different conditions onto each other. The latent space is formed via multi-task learning of an auto-encoder and condition detectors; the latter classify the latent space representations of the process curves into the considered conditions. The Bayes framework and the multi-task learning models are used to obtain the process curve probability densities from the latent space densities. The methods are shown to reveal and represent the influence of combinations of workpiece and tool properties on resistance spot welding process curves.
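
The latent-space construction described above can be sketched as an auto-encoder trained jointly with a condition detector on its bottleneck. The layer sizes, names, and loss weighting below are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CurveAutoencoder(nn.Module):
    def __init__(self, curve_len=200, latent_dim=8, n_conditions=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(curve_len, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, curve_len))
        # Condition detector: classifies the latent representation of a
        # process curve into the considered conditions (multi-task head).
        self.detector = nn.Linear(latent_dim, n_conditions)

    def forward(self, curve):
        z = self.encoder(curve)
        return self.decoder(z), self.detector(z)

def multitask_loss(model, curve, condition, alpha=0.5):
    # Reconstruction keeps the latent space faithful to the curves;
    # classification structures it by process condition.
    recon, logits = model(curve)
    return F.mse_loss(recon, curve) + alpha * F.cross_entropy(logits, condition)
```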


Author(s):  
Junchi Yu ◽  
Jie Cao ◽  
Yi Li ◽  
Xiaofei Jia ◽  
Ran He

To narrow the inherent sensing gap in heterogeneous face recognition (HFR), recent methods have resorted to generative models and explored the "recognition via generation" framework. Even so, it remains a very challenging task to synthesize photo-realistic visible (VIS) faces from near-infrared (NIR) images, especially when paired training data are unavailable. We present an approach that averts the data misalignment problem and faithfully preserves pose, expression, and identity information during cross-spectral face hallucination. At the pixel level, we introduce an unsupervised attention mechanism for warping that is jointly learned with the generator to derive pixel-wise correspondence from unaligned data. At the image level, an auxiliary generator is employed to facilitate the learning of the mapping from the NIR to the VIS domain. At the domain level, we apply a mutual information constraint to explicitly measure the correlation between domains and thus benefit synthesis. Extensive experiments on three heterogeneous face datasets demonstrate that our approach not only outperforms current state-of-the-art HFR methods but also produces visually appealing results at high resolution.
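
The pixel-level idea above can be illustrated with a toy module in which an attention map, learned jointly with the generator, blends a warped version of the translated image with the raw translation so that unaligned pairs can still provide pixel-wise supervision. Every module and layer below is a hypothetical placeholder, not the paper's network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveWarpGenerator(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.translate = nn.Conv2d(channels, channels, 3, padding=1)  # NIR -> VIS stub
        self.flow_head = nn.Conv2d(channels, 2, 3, padding=1)   # per-pixel offsets
        self.attn_head = nn.Conv2d(channels, 1, 3, padding=1)   # blending mask

    def forward(self, nir):
        vis = self.translate(nir)
        # Identity sampling grid plus predicted offsets: warping the output
        # yields pixel-wise correspondence for unaligned training pairs.
        b = nir.size(0)
        theta = torch.eye(2, 3, device=nir.device).unsqueeze(0).repeat(b, 1, 1)
        base = F.affine_grid(theta, nir.size(), align_corners=False)
        flow = self.flow_head(nir).permute(0, 2, 3, 1)  # (B, H, W, 2)
        warped = F.grid_sample(vis, base + flow, align_corners=False)
        # Attention decides, per pixel, how much of the warped image to keep.
        attn = torch.sigmoid(self.attn_head(nir))
        return attn * warped + (1 - attn) * vis
```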


2021 ◽  
Vol 11 (3) ◽  
pp. 987
Author(s):  
Pengcheng Zhao ◽  
Fuping Zhang ◽  
Jianming Wei ◽  
Yingbo Zhou ◽  
Xiao Wei

Heterogeneous face recognition (HFR) has aroused significant interest in recent years, but faces challenges such as misalignment and limited HFR data. Misalignment occurs among images of different modalities mainly because of misaligned semantics. Although recent methods have attempted to settle the low-shot problem, they suffer from misalignment between paired near-infrared (NIR) and visible (VIS) images, which degrades the performance of most image-to-image translation networks. In this work, we propose a self-aligned dual generation (SADG) architecture for generating semantics-aligned pairwise NIR-VIS images with the same identity, without the additional guidance of external information learning. Specifically, we propose a self-aligned generator to align the data distributions between the two modalities. Then, we present a multiscale patch discriminator to obtain high-quality images. Furthermore, we propose the mean landmark distance (MLD) to measure the alignment between NIR and VIS images of the same identity. Extensive experiments and an ablation study of SADG on three public datasets show significant alignment performance and recognition results; in particular, the Rank-1 accuracy achieved was close to 99.9% on the CASIA NIR-VIS 2.0, Oulu-CASIA NIR-VIS, and BUAA VIS-NIR datasets.
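
A minimal sketch of the MLD metric described above, assuming landmarks have already been detected on a NIR-VIS pair of the same identity; the 5-point layout and the coordinates in the example are invented for illustration.

```python
import numpy as np

def mean_landmark_distance(landmarks_nir, landmarks_vis):
    """Mean Euclidean distance between corresponding facial landmarks.

    landmarks_*: (K, 2) arrays of K corresponding (x, y) coordinates.
    """
    diffs = (np.asarray(landmarks_nir, dtype=float)
             - np.asarray(landmarks_vis, dtype=float))
    return float(np.mean(np.linalg.norm(diffs, axis=1)))

# Hypothetical 5-point landmarks (eyes, nose, mouth corners) on a 128x128 pair.
nir_pts = [[42, 52], [86, 52], [64, 74], [48, 98], [80, 98]]
vis_pts = [[43, 53], [85, 51], [64, 76], [47, 97], [81, 99]]
print(mean_landmark_distance(nir_pts, vis_pts))  # small value => well aligned
```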


2021 ◽  
Vol 13 (2) ◽  
pp. 51
Author(s):  
Lili Sun ◽  
Xueyan Liu ◽  
Min Zhao ◽  
Bo Yang

Variational graph autoencoders, which encode the structural and attribute information of a graph into low-dimensional representations, have become a powerful method for studying graph-structured data. However, most existing methods based on variational (graph) autoencoders assume that the prior of the latent variables is a standard normal distribution, which encourages all nodes to gather around 0 and therefore prevents the latent space from being fully utilized. Choosing a suitable prior without incorporating additional expert knowledge thus becomes a challenge. Given this, we propose a novel noninformative prior-based interpretable variational graph autoencoder (NPIVGAE). Specifically, we use a noninformative prior as the prior distribution of the latent variables, which enables the posterior distribution parameters to be learned almost entirely from the sample data. Furthermore, we regard each dimension of a latent variable as the probability that the node belongs to each block, thereby improving the interpretability of the model. The correlations within and between blocks are described by a block-block correlation matrix. We compare our model with state-of-the-art methods on three real datasets, verifying its effectiveness and superiority.
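
One way to read the noninformative prior is that the KL term of the usual VGAE objective, which pulls every node toward 0 under a standard normal prior, is replaced by the term induced by a flat prior, i.e. the negative posterior entropy up to a constant. The sketch below illustrates that reading; it is not the authors' exact NPIVGAE formulation.

```python
import torch
import torch.nn.functional as F

def gaussian_entropy(logvar):
    # Entropy of a diagonal Gaussian posterior, up to an additive constant.
    return 0.5 * torch.sum(logvar, dim=1).mean()

def npivgae_style_loss(adj_logits, adj_labels, logvar):
    # Reconstruction of the adjacency matrix, as in a standard VGAE.
    recon = F.binary_cross_entropy_with_logits(adj_logits, adj_labels)
    # Flat (noninformative) prior: KL(q || flat) = -H(q) + const, so the
    # regularizer no longer drags every latent code toward 0.
    return recon - gaussian_entropy(logvar)
```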


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yunjun Nam ◽  
Takayuki Sato ◽  
Go Uchida ◽  
Ekaterina Malakhova ◽  
Shimon Ullman ◽  
...  

Humans recognize individual faces regardless of variation in the facial view. The view-tuned face neurons in the inferior temporal (IT) cortex are regarded as the neural substrate for view-invariant face recognition. This study approximated the visual features encoded by these neurons as combinations of local orientations and colors derived from natural image fragments. The resulting features reproduced the preference of these neurons for particular facial views. We also found that faces of one identity were separable from faces of other identities in a space in which each axis represented one of these features. These results suggest that view-invariant face representation is established by combining view-sensitive visual features. This feature-based face representation further suggests that, with respect to view-invariant face representation, the seemingly complex and deeply layered ventral visual pathway can be approximated by a shallow network comprising layers of low-level processing for local orientations and colors (V1/V2-level) and layers that detect particular sets of low-level elements derived from natural image fragments (IT-level).
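
The shallow two-stage approximation described above can be caricatured as an orientation filter bank followed by fragment template matching. The Gabor parameters and the similarity measure below are illustrative assumptions, not the study's fitted features.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(theta, size=9, sigma=2.0, freq=0.3):
    # Oriented local filter standing in for V1/V2-level processing.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def orientation_maps(image, n_orient=4):
    # Local orientation energy at a few preferred angles.
    return [np.abs(convolve2d(image, gabor_kernel(t), mode='same'))
            for t in np.linspace(0, np.pi, n_orient, endpoint=False)]

def fragment_response(maps, fragment_maps):
    # IT-level stand-in: cosine similarity between the observed orientation
    # pattern and a stored natural-image fragment across channels.
    num = sum(float(np.sum(m * f)) for m, f in zip(maps, fragment_maps))
    den = np.sqrt(sum(np.sum(m**2) for m in maps)
                  * sum(np.sum(f**2) for f in fragment_maps)) + 1e-8
    return num / den
```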


2016 ◽  
Vol 7 (3) ◽  
pp. 1-23 ◽  
Author(s):  
Zhifeng Li ◽  
Dihong Gong ◽  
Qiang Li ◽  
Dacheng Tao ◽  
Xuelong Li
