CNN Customizations With Transfer Learning for Face Recognition Task

Author(s):  
Chantana Chantrapornchai ◽  
Samrid Duangkaew

Several kinds of pretrained convolutional neural networks (CNNs) are now available. Applying these networks to a new classification task requires retraining them with new data sets, and a large network cannot be deployed on a small embedded device. The authors study how to use pretrained models and customize them for accuracy and size on face recognition tasks. The results 1) report the accuracy and size of existing pretrained networks (e.g., AlexNet, GoogLeNet, CaffeNet, SqueezeNet) and 2) demonstrate how layer customization affects model size and accuracy. Across the networks and data sets studied, SqueezeNet achieves the same accuracy (0.99) as the others at a much smaller size (up to 25 times smaller). Two customizations based on layer skipping are then presented. The experiments show an example of SqueezeNet layer customization that reduces the network size while keeping the accuracy (i.e., a 7% size reduction at the cost of slower convergence). All experiments were run on Caffe 0.15.14.

2018 ◽  
Vol 23 (2) ◽  
pp. 141-149 ◽  
Author(s):  
Vadim Romanuke

Scene recognition, a complex classification task, is considered in the present research. Scene recognition tasks are successfully solved by transfer learning from pretrained convolutional neural networks, but the resulting network is huge even though a common scene recognition task has at most a few tens of scene categories. Thus, the goal is to ascertain whether the network size can be reduced. The model recognition task is a small dataset of 4485 grayscale images divided into 15 image categories. The pretrained network is AlexNet, which handles 1000 much simpler image categories. This network has two fully connected layers that can potentially be reduced or deleted. A regular transfer learning network occupies about 202.6 MB and reaches an accuracy of up to 92 % on the scene recognition task. Deleting the layers turns out not to be reasonable. Instead, the network size is reduced by setting fewer filters in the 17th and 20th layers of the AlexNet-based networks, using a dichotomy principle or similar. The best truncated network, with 384 and 192 filters in those layers, achieves 93.3 % accuracy, and its size is 21.63 MB.
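The size reduction described in the abstract above can be illustrated with a rough parameter count. The following is a minimal sketch, not the author's procedure: it assumes that the 17th and 20th layers are the two fully connected layers, that they originally have 4096 units each, that the first of them receives 9216 inputs (6 x 6 x 256 activations), and that each parameter is stored as a 32-bit float. The helper names are hypothetical. The estimate covers only the fully connected stack without compression, so it is not expected to reproduce the reported file sizes.

```python
def fc_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def fc_stack_megabytes(n_in, widths, bytes_per_param=4):
    """Storage of a chain of fully connected layers, in MB (10**6 bytes)."""
    total = 0
    for n_out in widths:
        total += fc_params(n_in, n_out)
        n_in = n_out  # the output of one layer feeds the next
    return total * bytes_per_param / 1e6


FC_INPUT = 9216   # assumption: 6 x 6 x 256 activations entering the first FC layer
N_CLASSES = 15    # scene categories, from the abstract

# Assumed original widths (4096, 4096) versus the truncated widths (384, 192).
regular = fc_stack_megabytes(FC_INPUT, [4096, 4096, N_CLASSES])
truncated = fc_stack_megabytes(FC_INPUT, [384, 192, N_CLASSES])

print(f"regular FC stack:   {regular:.1f} MB")
print(f"truncated FC stack: {truncated:.1f} MB")
```

Under these assumptions most of the parameters sit in the first fully connected layer, which is consistent with narrowing it having a large effect on the total size.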


2021 ◽  
Vol 9 ◽  
Author(s):  
Hui Liu ◽  
Zi-Hua Mo ◽  
Hang Yang ◽  
Zheng-Fu Zhang ◽  
Dian Hong ◽  
...  

Background: Williams-Beuren syndrome (WBS) is a rare genetic syndrome with a characteristic “elfin” facial gestalt. The “elfin” facial characteristics include a broad forehead, periorbital puffiness, flat nasal bridge, short upturned nose, wide mouth, thick lips, and pointed chin. Recently, deep convolutional neural networks (CNNs) have been successfully applied to facial recognition for diagnosing genetic syndromes. However, there is little research on WBS facial recognition using deep CNNs.
Objective: The purpose of this study was to construct an automatic facial recognition model for WBS diagnosis based on deep CNNs.
Methods: The study enrolled 104 WBS children, 91 cases with other genetic syndromes, and 145 healthy children. The photo dataset used only one frontal facial photo from each participant. Five face recognition frameworks for WBS were constructed by adopting the VGG-16, VGG-19, ResNet-18, ResNet-34, and MobileNet-V2 architectures, respectively. ImageNet transfer learning was used to avoid over-fitting. The classification performance of the facial recognition models was assessed by five-fold cross validation, and comparison with human experts was performed.
Results: The five face recognition frameworks for WBS were constructed. The VGG-19 model achieved the best performance. The accuracy, precision, recall, F1 score, and area under curve (AUC) of the VGG-19 model were 92.7 ± 1.3%, 94.0 ± 5.6%, 81.7 ± 3.6%, 87.2 ± 2.0%, and 89.6 ± 1.3%, respectively. The highest accuracy, precision, recall, F1 score, and AUC of human experts were 82.1, 65.9, 85.6, 74.5, and 83.0%, respectively. The AUCs of each human expert were inferior to the AUCs of the VGG-16 (88.6 ± 3.5%), VGG-19 (89.6 ± 1.3%), ResNet-18 (83.6 ± 8.2%), and ResNet-34 (86.3 ± 4.9%) models.
Conclusions: This study highlighted the possibility of using deep CNNs for diagnosing WBS in clinical practice. The facial recognition framework based on VGG-19 could play a prominent role in WBS diagnosis. Transfer learning technology can help to construct facial recognition models of genetic syndromes with small-scale datasets.
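The evaluation protocol named in the abstract above (five-fold cross validation scored by accuracy, precision, recall, and F1) can be sketched in plain Python. This is an illustrative sketch only: the helper names `kfold_indices` and `binary_metrics` are hypothetical, the folds are contiguous and unshuffled for simplicity, and the labels in the usage lines are made-up toy values rather than study data. The study's own splitting and scoring code is not described in the source.

```python
def kfold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# 104 + 91 + 145 = 340 participants, one photo each, per the abstract.
folds = list(kfold_indices(340, k=5))

# Toy labels, purely illustrative.
scores = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

In practice the metrics would be computed once per held-out fold and then summarized as a mean and standard deviation, which is the form in which the abstract reports them.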


Information ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 191
Author(s):  
Wenting Liu ◽  
Li Zhou ◽  
Jie Chen

Face recognition algorithms based on deep learning methods have become increasingly popular. Most of these are based on highly precise but complex convolutional neural networks (CNNs), which require significant computing resources and storage, and are difficult to deploy on mobile devices or embedded terminals. In this paper, we propose several methods to improve the algorithms for face recognition based on a lightweight CNN, which is further optimized in terms of the network architecture and training pattern on the basis of MobileFaceNet. Regarding the network architecture, we introduce the Squeeze-and-Excitation (SE) block and propose three improved structures via a channel attention mechanism—the depthwise SE module, the depthwise separable SE module, and the linear SE module—which are able to learn the correlation of information between channels and assign them different weights. In addition, a novel training method for the face recognition task combined with an additive angular margin loss function is proposed that performs the compression and knowledge transfer of the deep network for face recognition. Finally, we obtained high-precision and lightweight face recognition models with fewer parameters and calculations that are more suitable for applications. Through extensive experiments and analysis, we demonstrate the effectiveness of the proposed methods.
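The additive angular margin loss mentioned in the abstract above can be sketched for a single training sample as follows. This is a simplified illustration, not the paper's training method: the function name is hypothetical, the default scale and margin values are assumptions (the abstract gives none), and the sketch omits the special handling that practical implementations apply when the angle plus the margin exceeds pi.

```python
import math


def additive_angular_margin_loss(cosines, target, scale=64.0, margin=0.5):
    """Cross-entropy over scaled cosines, with a margin added to the target angle.

    cosines: cos(theta_j) between the normalized embedding and each normalized
             class weight vector.
    target:  index of the true class.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))      # guard acos against rounding error
        if j == target:
            theta = math.acos(c)
            c = math.cos(theta + margin)  # penalize the true class by the margin
        logits.append(scale * c)
    # Numerically stable log-sum-exp for the softmax denominator.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[target]


# Toy cosines, purely illustrative; class 0 is the true class.
with_margin = additive_angular_margin_loss([0.8, 0.1, -0.2], 0, scale=10.0, margin=0.5)
no_margin = additive_angular_margin_loss([0.8, 0.1, -0.2], 0, scale=10.0, margin=0.0)
```

With the margin set to zero the function reduces to ordinary softmax cross-entropy on scaled cosines, so the margin can be read as an extra penalty that forces the true-class angle to be smaller before the loss becomes low.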


Author(s):  
Ilya Sochenkov ◽  
Anastasiia S. Sochenkova ◽  
Artyom Makovetskii ◽  
Andrey Melnikov ◽  
Alexander Vokhmintsev

2021 ◽  
Vol 2 (3) ◽  
Author(s):  
Gustaf Halvardsson ◽  
Johanna Peterson ◽  
César Soto-Valero ◽  
Benoit Baudry

The automatic interpretation of sign languages is a challenging task, as it requires high-level vision and high-level motion processing systems to provide accurate image perception. In this paper, we use Convolutional Neural Networks (CNNs) and transfer learning to enable computers to interpret signs of the Swedish Sign Language (SSL) hand alphabet. Our model consists of the implementation of a pre-trained InceptionV3 network, and the usage of the mini-batch gradient descent optimization algorithm. We rely on transfer learning during the pre-training of the model and its data. The final accuracy of the model, based on 8 study subjects and 9400 images, is 85%. Our results indicate that the usage of CNNs is a promising approach to interpret sign languages, and transfer learning can be used to achieve high testing accuracy despite using a small training dataset. Furthermore, we describe the implementation details of our model to interpret signs as a user-friendly web application.
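The mini-batch gradient descent algorithm named in the abstract above can be illustrated on a toy problem. The sketch below fits a one-variable linear model by mean squared error; it is not the InceptionV3 training pipeline, and the function name, learning rate, batch size, and data are all assumptions chosen for illustration.

```python
import random


def minibatch_gradient_descent(xs, ys, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w * x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                      # new batch composition each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            # Average the gradient over the batch, then take one step.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


# Noise-free toy data generated from y = 3x + 1.
xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = minibatch_gradient_descent(xs, ys)
```

Each update uses only a small batch of samples rather than the full data set, which is the property that makes the method practical for image data sets that do not fit in memory at once.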

