Face Verification with Multi-Task and Multi-Scale Features Fusion

Author(s):  
Xiaojun Lu ◽  
Yue Yang ◽  
Weilin Zhang ◽  
Qi Wang ◽  
Yang Wang

Face verification for unrestricted faces in the wild is a challenging task. This paper proposes a face verification method based on two deep convolutional neural networks (CNNs). We use an identification signal to supervise one CNN and a combination of semi-verification and identification signals to train the other. To estimate the semi-verification loss at low computational cost, a circle composed of all faces is used to select face pairs from the pairwise samples. During face normalization, we propose using different facial landmarks to address problems caused by pose. The final face representation is formed by concatenating the features of each deep CNN after PCA reduction, and each feature is itself a combination of multi-scale representations obtained through auxiliary classifiers. For the final verification, we adopt the representation of only one region and one resolution of a face together with a Joint Bayesian classifier. Experiments show that our method extracts effective face representations from a small training dataset, and our algorithm achieves 99.71% verification accuracy on the LFW dataset.
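For illustration only, the fusion step this abstract describes (per-network PCA reduction followed by concatenation) might look roughly like the sketch below; the function and variable names are hypothetical placeholders, not the authors' code.

```python
# A minimal sketch of the feature-fusion step, assuming two pre-trained CNNs
# that each yield one embedding per face image.
import numpy as np
from sklearn.decomposition import PCA

def fuse_embeddings(cnn_a_features, cnn_b_features, n_components=128):
    """PCA-reduce each network's embeddings, then concatenate them.

    cnn_a_features, cnn_b_features: (n_samples, dim) arrays of deep features.
    Returns a (n_samples, 2 * n_components) fused representation.
    """
    reduced_a = PCA(n_components=n_components).fit_transform(cnn_a_features)
    reduced_b = PCA(n_components=n_components).fit_transform(cnn_b_features)
    return np.concatenate([reduced_a, reduced_b], axis=1)

# The fused vectors would then be scored by a Joint Bayesian classifier
# (not shown here) to decide whether two faces share an identity.
```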

Author(s):  
Fadhlan Hafizhelmi Kamaru Zaman ◽  
Juliana Johari ◽  
Ahmad Ihsan Mohd Yassin

Face verification focuses on determining whether two face images belong to the same identity. For unrestricted faces in the wild, this is a very challenging task. Besides significant degradation due to large variations in pose, illumination, expression, aging, and occlusion, it also suffers from the large-scale, ever-expanding data needed to perform one-to-many recognition tasks. In this paper, we propose a face verification method that learns face similarities using a Convolutional Neural Network (ConvNet). Instead of extracting features from each face image separately, our ConvNet model jointly extracts relational visual features from the two face images being compared. We train four hybrid ConvNet models to learn how to distinguish similarities between face pairs of four different face portions and join them at the top-layer classifier level. At that level we use a binary classifier to identify the similarity of face pairs, choosing among a conventional Multi-Layer Perceptron (MLP), Support Vector Machines (SVM), Naïve Bayes, and another ConvNet. Three face pairing configurations are discussed in this paper. Results from experiments using the Labeled Faces in the Wild (LFW) and CelebA datasets indicate that our hybrid ConvNet increases face verification accuracy by as much as 27% compared to the individual ConvNet approach. We also found that the lateral face pair configuration yields the best LFW test accuracy on a very strict test protocol without any face alignment, reaching 87.89% with an MLP as the top-layer classifier, which is on par with the state of the art. We show that our approach is more flexible in terms of running inference with the learned models on out-of-sample data by testing LFW and CelebA on either model.
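A rough sketch of the joint relational-feature idea follows, assuming the two RGB faces of a pair are stacked into a single 6-channel input before the first convolution; the layer sizes are illustrative guesses, not the authors' architecture.

```python
# Joint feature extraction from a face pair, rather than per-image extraction.
import torch
import torch.nn as nn

class PairConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=5, padding=2),  # 6 = two stacked RGB faces
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 2)  # same identity vs. different

    def forward(self, face_a, face_b):
        pair = torch.cat([face_a, face_b], dim=1)  # relational features from both faces
        return self.classifier(self.features(pair).flatten(1))

logits = PairConvNet()(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```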


Author(s):  
Yu-Sheng Lin ◽  
Zhe-Yu Liu ◽  
Yu-An Chen ◽  
Yu-Siang Wang ◽  
Ya-Liang Chang ◽  
...  

We study explainable AI (XAI) for the face recognition task, particularly face verification. Face verification has become a crucial task in recent years and has been deployed in many applications, such as access control, surveillance, and automatic personal log-on for mobile devices. With the increasing amount of data, deep convolutional neural networks can achieve very high accuracy for the face verification task. Beyond exceptional performance, deep face verification models need more interpretability so that we can trust the results they generate. In this article, we propose a novel similarity metric, called explainable cosine (xCos), that comes with a learnable module that can be plugged into most verification models to provide meaningful explanations. With the help of xCos, we can see which parts of the two input faces are similar, where the model focuses its attention, and how the local similarities are weighted to form the output xCos score. We demonstrate the effectiveness of our proposed method on LFW and various competitive benchmarks: it not only provides novel and desirable model interpretability for face verification but also maintains accuracy when plugged into existing face recognition models.
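As an interpretation of the abstract rather than the published xCos code, the core scoring idea (patch-wise cosine similarities weighted by an attention map) might be sketched like this:

```python
# Hedged sketch: local similarity map weighted by learned attention weights.
import torch
import torch.nn.functional as F

def xcos_like_score(feat_a, feat_b, attention):
    """feat_a, feat_b: (C, H, W) feature maps; attention: (H, W) weights."""
    local_sim = F.cosine_similarity(feat_a, feat_b, dim=0)  # (H, W) similarity map
    weights = attention / attention.sum()                   # normalize the attention
    return (local_sim * weights).sum()                      # scalar verification score

score = xcos_like_score(torch.rand(128, 7, 7), torch.rand(128, 7, 7), torch.rand(7, 7))
```

The similarity map and the weights are exactly the two pieces of evidence the abstract says a user can inspect: which locations match, and how much each location counts.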


Author(s):  
Jonathan Koss ◽  
Anthony Jiang ◽  
Patrick Sweeney ◽  
Nelson Rios ◽  
Aaron Dollar

There is much excitement across a broad range of biological disciplines over the prospect of using deep learning and similar modern statistical methods to label research data. The extensive time, effort, and cost required for humans to label a dataset drastically limit the type and amount of data that can reasonably be utilized, and are currently a major bottleneck to the extensive application of biological datasets such as specimen imagery, video, and audio recordings. While a number of researchers have shown how deep convolutional neural networks (CNNs) can be trained to classify image data with 80-90% accuracy, that range of accuracy is still too low for most research applications. Furthermore, applying these classifiers to new, unlabeled data from a dataset other than the one used for training would likely result in even lower accuracy. As a result, these classifiers have still not generally been applied to unlabeled data, which is where they could be most useful.

In this talk, we will present a method for determining a confidence metric on predicted classifications (i.e., "labels") from a deep CNN classifier that can inform a user whether to trust a particular automatic label or discard it, thereby giving a reasonable and straightforward way to label a previously unlabeled dataset with high confidence. Essentially, the approach allows an imperfect classification method to be used in a useful way that can save an enormous amount of time and effort and/or greatly increase the amount of data that can reasonably be utilized.

In this work, the training dataset consisted of records of flowering plant species that collectively exhibited a range of reproductive morphologies, represented multiple taxonomic groups, and could be easily scored by humans for reproductive condition from specimen images. The records were labeled as reproductive, budding, flowering, and/or fruiting. All of the data and images were obtained from the Consortium of Northeastern Herbaria (CNH) portal. Two unscored datasets were used to evaluate the classifiers: one contained the same taxa as the training dataset, and the second contained all remaining flowering plant taxa in the CNH portal database that were not included in the other two datasets. Records of families with obscure flowers (i.e., those lacking petals and sepals or having vestigial structures) were excluded.

To label the reproductive state of the plants, we trained one deep CNN classifier using the Xception architecture for the binary classification of each state (e.g., budding vs. not budding). This method and architecture were chosen because of their success in similar image-classification tasks. Each of these networks takes an image of a herbarium sheet as input and outputs a value in the interval [0,1]. The output is typically thresholded to generate a binary label, but we found it could also serve as an approximate measure of confidence in the network's classification. By treating this value as a confidence metric, we can feed a large unlabeled dataset into the classifier, trust the labels that were assigned with high confidence, and leave the remainder unlabeled. After training, the four classifiers (reproductive, budding, flowering, fruiting) achieved 85-90% accuracy compared to expert-labeled data.
However, as described above, the real value of these approaches comes from their prospects for labeling previously unlabeled data, thus helping to replace expensive and time-consuming human labor. We then applied our confidence-based approach to a collection of 600k images and were able to label 35-70% of the samples at a chosen confidence threshold of 95%. In other words, we could use the high-confidence labels and simply not automatically label the remaining unclassifiable samples; the data from those samples could then be labeled manually or, if appropriate, not labeled at all.
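A minimal sketch of the thresholding rule described above: accept a binary label only when the network's sigmoid output is far from 0.5, and otherwise leave the record unlabeled. The 0.95 threshold matches the text; the function name is hypothetical.

```python
# Turn raw sigmoid scores into high-confidence labels or "leave unlabeled".
import numpy as np

def label_with_confidence(scores, threshold=0.95):
    """scores: array of sigmoid outputs in [0, 1] from one binary classifier.

    Returns 1/0 for high-confidence labels and -1 for 'leave unlabeled'.
    """
    labels = np.full(scores.shape, -1)       # default: unlabeled
    labels[scores >= threshold] = 1          # confidently positive
    labels[scores <= 1.0 - threshold] = 0    # confidently negative
    return labels

print(label_with_confidence(np.array([0.99, 0.50, 0.02])))  # [ 1 -1  0]
```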


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1390
Author(s):  
Ruoqi Wei ◽  
Ausif Mahmood

Despite the importance of few-shot learning, the lack of labeled training data in the real world makes it extremely challenging for existing machine learning methods, because such a limited dataset does not represent the data variance well. In this research, we suggest employing a generative approach using variational autoencoders (VAEs), which can specifically optimize few-shot learning tasks by generating new samples with more intra-class variation on the Labeled Faces in the Wild (LFW) dataset. The purpose of our research is to increase the size of the training dataset using various methods so as to improve the accuracy and robustness of few-shot face recognition. Specifically, we employ the VAE generator to enlarge the training dataset, including both the base and the novel sets, while utilizing transfer learning as the backend. Based on extensive experiments, we analyze various data augmentation methods to observe how each affects the accuracy of face recognition. The face generation method based on VAEs with perceptual loss can effectively improve the recognition accuracy to 96.47% using both the base and the novel sets.
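A sketch of how a trained VAE can synthesize extra intra-class samples for this kind of augmentation: perturb the latent code of a real image and decode. The `encoder` and `decoder` modules are assumed to be pre-trained and are not provided here.

```python
# Generate additional same-identity variants from a trained VAE.
import torch

@torch.no_grad()
def augment_with_vae(encoder, decoder, image, n_samples=5, noise_scale=0.5):
    """Generate n_samples variants of `image` by jittering its latent code."""
    mu, logvar = encoder(image)                  # approximate posterior parameters
    std = torch.exp(0.5 * logvar)
    variants = []
    for _ in range(n_samples):
        z = mu + noise_scale * std * torch.randn_like(std)  # reparameterized sample
        variants.append(decoder(z))
    return torch.cat(variants, dim=0)            # extra intra-class images
```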


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1269
Author(s):  
Jiabin Luo ◽  
Wentai Lei ◽  
Feifei Hou ◽  
Chenghao Wang ◽  
Qiang Ren ◽  
...  

Ground-penetrating radar (GPR), as a non-invasive instrument, has been widely used in civil engineering. GPR B-scan images may contain random noise due to the influence of the environment and equipment hardware, which complicates the interpretation of the useful information. Many methods have been proposed to eliminate or suppress this random noise, but the existing ones have an unsatisfactory denoising effect when the image is severely contaminated. This paper proposes a multi-scale convolutional autoencoder (MCAE) to denoise GPR data. To address the problem of insufficient training data, we also designed a data augmentation strategy based on a Wasserstein generative adversarial network (WGAN) to enlarge the MCAE training dataset. Experimental results on simulated, WGAN-generated, and field datasets demonstrate that the proposed scheme has promising denoising performance. In terms of three indexes, the peak signal-to-noise ratio (PSNR), the time cost, and the structural similarity index (SSIM), the proposed scheme achieves better random-noise suppression than state-of-the-art competing methods (e.g., CAE, BM3D, WNNM).
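An illustrative multi-scale convolutional autoencoder in the spirit of the MCAE described above: parallel encoder branches with different kernel sizes capture features at several scales before a shared decoder reconstructs the clean B-scan. Channel counts and depths are assumptions, not the paper's configuration.

```python
# Multi-scale encoding via parallel branches with different receptive fields.
import torch
import torch.nn as nn

class MultiScaleCAE(nn.Module):
    def __init__(self):
        super().__init__()
        # One encoder branch per scale; padding keeps spatial size constant.
        self.branches = nn.ModuleList([
            nn.Conv2d(1, 16, kernel_size=k, padding=k // 2) for k in (3, 5, 7)
        ])
        self.decoder = nn.Sequential(
            nn.Conv2d(48, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),  # denoised B-scan
        )

    def forward(self, x):
        multi_scale = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return self.decoder(multi_scale)

denoised = MultiScaleCAE()(torch.rand(1, 1, 128, 128))
```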


2021 ◽  
Vol 13 (7) ◽  
pp. 1243
Author(s):  
Wenxin Yin ◽  
Wenhui Diao ◽  
Peijin Wang ◽  
Xin Gao ◽  
Ya Li ◽  
...  

The detection of Thermal Power Plants (TPPs) is a meaningful task for remote sensing image interpretation. It is also a challenging one because, as facility objects, TPPs are composed of various distinctive and irregular components. In this paper, we propose a novel end-to-end detection framework for TPPs based on deep convolutional neural networks. Specifically, building on the RetinaNet one-stage detector, we propose a context attention multi-scale feature extraction network that fuses global spatial attention to strengthen the representation of irregular objects. In addition, we design a part-based attention module to adapt to TPPs containing distinctive components. Experiments show that the proposed method outperforms state-of-the-art methods, achieving 68.15% mean average precision.
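A hedged sketch of the general mechanism behind a global spatial attention block of the kind alluded to here: a learned single-channel map reweights feature locations before the detection heads. This illustrates the mechanism only, not the paper's exact network.

```python
# Spatial attention: squeeze features to one map, gate locations with it.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),  # squeeze to one map
            nn.Sigmoid(),                                      # weights in (0, 1)
        )

    def forward(self, feat):
        return feat * self.attn(feat)  # emphasize informative spatial positions

out = SpatialAttention(256)(torch.rand(1, 256, 32, 32))
```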


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1388
Author(s):  
Sk Mahmudul Hassan ◽  
Arnab Kumar Maji ◽  
Michał Jasiński ◽  
Zbigniew Leonowicz ◽  
Elżbieta Jasińska

The timely identification and early prevention of crop diseases are essential for improving production. In this paper, deep convolutional neural network (CNN) models are implemented to identify and diagnose diseases in plants from their leaves, since CNNs have achieved impressive results in machine vision. Standard CNN models require a large number of parameters and have a high computation cost, so we replaced standard convolution with depthwise separable convolution, which reduces both the parameter count and the computation cost. The implemented models were trained on an open dataset consisting of 14 plant species, with 38 categorical disease classes plus healthy plant leaves. To evaluate the performance of the models, different parameters such as batch size, dropout, and the number of epochs were varied. The implemented models achieved disease-classification accuracies of 98.42%, 99.11%, 97.02%, and 99.56% using InceptionV3, InceptionResNetV2, MobileNetV2, and EfficientNetB0, respectively, which exceed those of traditional handcrafted-feature-based approaches. In comparison with other deep-learning models, the implemented models achieved better accuracy and required less training time. Moreover, the MobileNetV2 architecture is compatible with mobile devices using the optimized parameters. The accuracy results in disease identification show that the deep CNN model is promising, can greatly impact the efficient identification of diseases, and may have potential for disease detection in real-time agricultural systems.
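A minimal sketch of the substitution described above: a depthwise separable convolution factorizes a standard convolution into a depthwise step (one filter per input channel) and a pointwise 1x1 step, cutting the parameter count. The sizes here are illustrative, not from the paper.

```python
# Standard conv vs. depthwise separable conv (depthwise + pointwise).
import torch.nn as nn

def separable_conv(in_ch, out_ch, kernel_size=3):
    return nn.Sequential(
        # Depthwise: one spatial filter per input channel (groups=in_ch).
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2, groups=in_ch),
        # Pointwise: 1x1 convolution mixes the channels.
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )

standard = nn.Conv2d(64, 128, 3, padding=1)
separable = separable_conv(64, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # 73856 vs. 8960 parameters
```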


Cancers ◽  
2021 ◽  
Vol 13 (13) ◽  
pp. 3308
Author(s):  
Won Sang Shim ◽  
Kwangil Yim ◽  
Tae-Jung Kim ◽  
Yeoun Eun Sung ◽  
Gyeongyun Lee ◽  
...  

The prognosis of patients with lung adenocarcinoma (LUAD), especially early-stage LUAD, depends on clinicopathological features, but their predictive utility is limited. In this study, we developed and trained a DeepRePath model based on a deep convolutional neural network (CNN) using multi-scale pathology images to predict the prognosis of patients with early-stage LUAD. DeepRePath was pre-trained on 1067 hematoxylin and eosin-stained whole-slide images of LUAD from The Cancer Genome Atlas. It was then further trained and validated using two separate CNNs and multi-scale pathology images of 393 resected lung cancer specimens from patients with stage I and II LUAD. Of the 393 patients, 95 developed recurrence after surgical resection. The DeepRePath model showed average area-under-the-curve (AUC) scores of 0.77 and 0.76 in cohort I and cohort II (the external validation set), respectively; owing to this modest performance, DeepRePath cannot yet be used as an automated tool in a clinical setting. When gradient-weighted class activation mapping was applied, DeepRePath indicated an association between recurrence and atypical nuclei, discohesive tumor cells, and tumor necrosis in the pathology images. Despite the limitations associated with a relatively small number of patients, the DeepRePath model based on CNNs with transfer learning could predict recurrence after the curative resection of early-stage LUAD using multi-scale pathology images.
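A speculative sketch of fusing two CNN branches that see the same slide at different magnifications, echoing the two-CNN multi-scale design described above; the backbones and the fusion head are assumptions for illustration, not the published DeepRePath architecture.

```python
# Two backbones, one per magnification, fused by a linear recurrence head.
import torch
import torch.nn as nn
from torchvision import models

class TwoScaleRecurrenceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.low_mag = models.resnet18(weights=None)   # coarse-scale patches
        self.high_mag = models.resnet18(weights=None)  # fine-scale patches
        self.low_mag.fc = nn.Identity()                # keep 512-d features
        self.high_mag.fc = nn.Identity()
        self.head = nn.Linear(512 * 2, 1)              # recurrence logit

    def forward(self, patch_low, patch_high):
        fused = torch.cat([self.low_mag(patch_low), self.high_mag(patch_high)], dim=1)
        return self.head(fused)
```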


2021 ◽  
pp. 108107
Author(s):  
Qianfen Jiao ◽  
Rui Li ◽  
Wenming Cao ◽  
Jian Zhong ◽  
Si Wu ◽  
...  

Author(s):  
Ojasvi Yadav ◽  
Koustav Ghosal ◽  
Sebastian Lutz ◽  
Aljosa Smolic

We address the problem of exposure correction for dark, blurry, and noisy images captured in low-light conditions in the wild. Classical image-denoising filters work well in the frequency space but are constrained by several factors, such as the correct choice of thresholds and frequency estimates. On the other hand, traditional deep networks are trained end to end in RGB space by formulating this task as an image translation problem; however, this is done without any explicit constraints on the inherent noise of the dark images and thus produces noisy and blurry outputs. To this end, we propose a DCT/FFT-based multi-scale loss function which, when combined with traditional losses, trains a network to translate the important features for visually pleasing output. Our loss function is end-to-end differentiable, scale-agnostic, and generic; i.e., it can be applied to both RAW and JPEG images in most existing frameworks without additional overhead. Using this loss function, we report significant improvements over the state of the art according to quantitative metrics and subjective tests.
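A hedged reconstruction of what an FFT-based multi-scale frequency loss of this kind could look like: compare prediction and target in the Fourier domain at several downsampled scales and sum the L1 differences of their magnitude spectra. This follows the abstract's description rather than the authors' released code, and the FFT here stands in for the DCT variant as well.

```python
# Differentiable multi-scale frequency-domain loss sketch.
import torch
import torch.nn.functional as F

def multiscale_fft_loss(pred, target, scales=(1, 2, 4)):
    """pred, target: (N, C, H, W) images; returns a scalar, differentiable loss."""
    loss = 0.0
    for s in scales:
        p = F.avg_pool2d(pred, s) if s > 1 else pred
        t = F.avg_pool2d(target, s) if s > 1 else target
        # L1 between magnitude spectra penalizes residual high-frequency noise.
        loss = loss + (torch.fft.fft2(p).abs() - torch.fft.fft2(t).abs()).abs().mean()
    return loss / len(scales)

loss = multiscale_fft_loss(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```

In practice this term would be added to a conventional pixel-space loss (e.g., L1 in RGB), matching the abstract's "combined with traditional losses" framing.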

