Learned versus Handcrafted Features for Person Re-identification

Author(s):  
C. Chahla ◽  
H. Snoussi ◽  
F. Abdallah ◽  
F. Dornaika

Person re-identification is one of the indispensable elements of visual surveillance. It consists of assigning consistent labels to the same person within the field of view of a single camera or across multiple cameras. While handcrafted feature extraction is certainly one way of approaching this problem, such features are becoming increasingly complex. Moreover, training a deep convolutional neural network (CNN) from scratch is difficult because it requires a large amount of labeled training data and considerable expertise to ensure proper convergence. This paper explores three main strategies for solving the person re-identification problem: (i) using handcrafted features, (ii) using transfer learning based on a deep CNN pre-trained for object categorization, and (iii) training a deep CNN from scratch. Our experiments consistently demonstrated that: (1) handcrafted features may still offer favorable characteristics and benefits, especially when the available database is too small to train a deep network; (2) a fully trained Siamese CNN outperforms both the handcrafted approaches and the combinations of a pre-trained CNN with different re-identification processes; and (3) pre-trained features and handcrafted features perform equally well. These experiments also revealed the most discriminative parts of the human body.
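
As a hedged illustration of strategy (iii), the sketch below shows a minimal Siamese CNN trained with a contrastive loss; the architecture, input size, and margin are assumptions for the example, not the network described in the paper.

```python
# Minimal sketch of a Siamese CNN for person re-identification, trained
# with a contrastive loss. Illustrative architecture only, not the
# paper's exact network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Small CNN that maps a pedestrian crop to an embedding vector."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(64 * 4 * 4, dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

def contrastive_loss(z1, z2, same_id, margin=1.0):
    """same_id = 1 for matching pairs, 0 otherwise."""
    d = F.pairwise_distance(z1, z2)
    return (same_id * d.pow(2) +
            (1 - same_id) * F.relu(margin - d).pow(2)).mean()

# Both branches share the same weights (the "Siamese" property):
net = EmbeddingNet()
a = torch.randn(8, 3, 128, 64)   # batch of probe crops
b = torch.randn(8, 3, 128, 64)   # batch of gallery crops
y = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(net(a), net(b), y)
loss.backward()
```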

Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2639
Author(s):  
Quan T. Ngo ◽  
Seokhoon Yoon

Facial expression recognition (FER) is a challenging problem in the fields of pattern recognition and computer vision. The recent success of convolutional neural networks (CNNs) in object detection and object segmentation tasks has shown promise in building an automatic deep CNN-based FER model. However, in real-world scenarios, performance degrades dramatically owing to the great diversity of factors unrelated to facial expressions, a lack of training data, and an intrinsic imbalance in the existing facial emotion datasets. To tackle these problems, this paper not only applies deep transfer learning techniques, but also proposes a novel loss function, called weighted-cluster loss, which is used during the fine-tuning phase. Specifically, the weighted-cluster loss function simultaneously improves intra-class compactness and inter-class separability by learning a class center for each emotion class. It also takes the imbalance of a facial expression dataset into account by giving each emotion class a weight based on its proportion of the total number of images. In addition, a recent, successful deep CNN architecture, pre-trained on the task of face identification with the VGGFace2 database from the Visual Geometry Group at Oxford University, is employed and fine-tuned with the proposed loss function to recognize eight basic facial emotions from AffectNet, a database of facial expression, valence, and arousal computing in the wild. Experiments on the AffectNet real-world facial dataset demonstrate that our method outperforms baseline CNN models that use either weighted-softmax loss or center loss.
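
As a hedged sketch of the idea described above, the snippet below implements a class-weighted center loss: each feature is pulled toward its learned class center, with rare classes up-weighted by inverse frequency. The authors' exact weighted-cluster formulation (in particular its inter-class separability term) may differ, and all names and hyperparameters here are illustrative.

```python
# Hedged sketch of a class-weighted center loss in the spirit of the
# weighted-cluster loss described above; the paper's exact formulation
# may differ.
import torch
import torch.nn as nn

class WeightedCenterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim, class_counts):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        counts = torch.as_tensor(class_counts, dtype=torch.float)
        # Rarer classes receive larger weights (inverse-frequency weighting).
        w = counts.sum() / (len(counts) * counts)
        self.register_buffer("class_weights", w)

    def forward(self, feats, labels):
        # Squared distance of each feature to its own class center,
        # scaled by that class's imbalance weight.
        diffs = feats - self.centers[labels]
        per_sample = diffs.pow(2).sum(dim=1) * self.class_weights[labels]
        return per_sample.mean()

# Typically combined with a (weighted) softmax cross-entropy term, e.g.
# total_loss = ce_loss + lambda_c * center_loss(features, labels)
```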


2019 ◽  
Author(s):  
Raphael Prates ◽  
William Robson Schwartz

This work addresses the person re-identification problem, which consists of matching images of individuals captured by multiple, non-overlapping surveillance cameras. Works from the literature tackle this problem by proposing robust feature descriptors and matching functions, where the latter are responsible for assigning the correct identity to individuals and are the focus of this work. Specifically, we propose two matching methods: the Kernel MBPLS and the Kernel X-CRC. The Kernel MBPLS is a nonlinear regression model that is scalable with respect to the number of cameras and allows the inclusion of additional labelled information (e.g., attributes). By contrast, the Kernel X-CRC is a nonlinear, multitask matching function that can be used jointly with subspace learning approaches to boost matching rates. We present an extensive experimental evaluation of both approaches on four datasets (VIPeR, PRID450S, WARD and Market-1501). Experimental results demonstrate that the Kernel MBPLS and the Kernel X-CRC outperform approaches from the literature. Furthermore, we show that the Kernel X-CRC can be successfully applied to large-scale, multi-camera datasets.
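
As a rough illustration of the kernelized coding that Kernel X-CRC builds on, the sketch below shows a basic single-view kernel collaborative representation step: a probe is coded over the gallery via kernel ridge regression and the coefficients are used as affinity scores. The cross-view/multitask coupling that distinguishes X-CRC is omitted, and the RBF kernel, regularizer, and scoring rule are assumptions of this example.

```python
# Minimal sketch of kernelized collaborative representation matching.
# The paper's Kernel X-CRC couples probe and gallery codings across
# views; shown here is only the basic kernel-ridge coding it builds on.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_crc_scores(probe, gallery, lam=0.1, gamma=1.0):
    """Rank gallery entries for one probe feature vector."""
    K = rbf_kernel(gallery, gallery, gamma)          # n x n Gram matrix
    k = rbf_kernel(gallery, probe[None, :], gamma)   # n x 1
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), k).ravel()
    # Larger coding coefficients indicate stronger affinity to that entry
    # (each gallery identity contributes a single sample here).
    return alpha

gallery = np.random.randn(50, 64)   # 50 gallery descriptors (placeholder)
probe = np.random.randn(64)
ranking = np.argsort(-kernel_crc_scores(probe, gallery))
```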


2020 ◽  
Vol 39 (3) ◽  
pp. 4405-4418
Author(s):  
Yao-Liang Chung ◽  
Hung-Yuan Chung ◽  
Wei-Feng Tsai

In the present study, we sought to enable instant tracking of the hand region as a region of interest (ROI) within the image range of a webcam, while also identifying specific hand gestures, in order to facilitate the control of home appliances in smart homes or the issuing of commands in human-computer interaction fields. To accomplish this objective, we first applied skin color detection and noise processing to remove unnecessary background information from the captured image, before applying background subtraction to detect the ROI. Then, to prevent background objects or noise from influencing the ROI, we utilized the kernelized correlation filters (KCF) algorithm to track the detected ROI. Next, the ROI image was resized to 100×120 and input into a deep convolutional neural network (CNN) to identify various hand gestures. Two deep CNN architectures, modified from the AlexNet and VGGNet CNNs, respectively, were developed by substantially reducing the number of network parameters and appropriately adjusting the internal network configuration. The tracking and recognition process described above was then continuously repeated to achieve real-time operation, with the system running until the hand leaves the camera range. The results indicated excellent performance by both of the proposed deep CNN architectures. In particular, the modified VGGNet achieved better performance, with a recognition rate of 99.90% on the training set and 95.61% on the test set, indicating the feasibility of the system for practical applications.
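
A hedged sketch of the tracking half of this pipeline is given below: a rough skin-color mask yields an initial hand ROI, which is then tracked with OpenCV's KCF implementation and resized to 100×120 for the (omitted) gesture CNN. The color thresholds and `classify` placeholder are assumptions, and the paper's background-subtraction step is left out; depending on the OpenCV build, the tracker factory may be `cv2.TrackerKCF_create` or `cv2.legacy.TrackerKCF_create` (opencv-contrib-python is required).

```python
# Hedged sketch: skin-color segmentation for an initial hand ROI,
# then KCF tracking of that ROI. The gesture CNN itself is omitted.
import cv2
import numpy as np

def skin_roi(frame):
    """Rough skin mask in YCrCb space; thresholds are illustrative."""
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not cnts:
        return None
    return cv2.boundingRect(max(cnts, key=cv2.contourArea))

cap = cv2.VideoCapture(0)
tracker = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if tracker is None:
        box = skin_roi(frame)
        if box is not None:
            tracker = cv2.TrackerKCF_create()  # or cv2.legacy.TrackerKCF_create
            tracker.init(frame, box)
    else:
        ok, box = tracker.update(frame)
        if not ok:
            tracker = None       # lost the hand; re-detect next frame
            continue
        x, y, w, h = map(int, box)
        crop = cv2.resize(frame[y:y + h, x:x + w], (100, 120))
        # crop would now be fed to the gesture CNN, e.g. classify(crop)
```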


2003 ◽  
Vol 69 (680) ◽  
pp. 1011-1018 ◽  
Author(s):  
Toshio FUKUDA ◽  
Tatsuya SUZUKI ◽  
Yasuhisa HASEGAWA ◽  
Fumihito ARAI ◽  
Masaru NEGI

Author(s):  
WOJCIECH ZAJDEL ◽  
BEN J. A. KRÖSE

Visual surveillance in wide areas (e.g. airports) relies on sparsely distributed cameras, that is, cameras that observe non-overlapping scenes. In this setup, multi-object tracking requires re-identification of an object when it leaves one field of view and later appears in another. Although similar association problems are common in multi-object tracking scenarios, in the distributed case one has to cope with asynchronous observations and cannot assume smooth motion of the objects. In this paper, we propose a method for indoor tracking of humans. The method is based on a Dynamic Bayes Network (DBN) as a probabilistic model for the observations. The edges of the network define the correspondences between observations of the same object. Accordingly, we derive an approximate EM-like method for selecting the most likely structure of the DBN and learning the model parameters. The presented algorithm is tested on a collection of real-world observations gathered by a system of cameras in an office building.
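
The DBN and its EM-like structure search are beyond a short sketch; as a simpler stand-in for the core association step, the snippet below matches appearance features of people exiting one camera to those entering another, using a Gaussian appearance cost and the Hungarian algorithm. This is explicitly not the authors' method, only an illustration of the correspondence problem.

```python
# Illustration of the cross-camera association step only, with a
# Gaussian appearance likelihood and the Hungarian algorithm. Not the
# authors' DBN-based approach.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def associate(exits, entries, sigma=1.0):
    """Match appearance features of people leaving one camera (exits)
    to those appearing at another (entries)."""
    # Negative log-likelihood under an isotropic Gaussian appearance model.
    cost = cdist(exits, entries, "sqeuclidean") / (2 * sigma ** 2)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))

exits = np.random.randn(4, 16)    # e.g. mean color histograms (placeholder)
entries = np.random.randn(4, 16)
print(associate(exits, entries))
```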


2013 ◽  
Vol 467 ◽  
pp. 323-326
Author(s):  
Jong Eun Ha

A fish-eye lens is used in various applications due to its wide coverage of the scene. In particular, it can be effectively used in visual surveillance and in surround monitoring for automotive applications. It has large radial distortion compared to conventional lenses with a small field of view. In this paper, we present comparison results for the calibration of fish-eye lenses. We compare two algorithms: one available in OpenCV and the other by Devernay and Faugeras. We also present experimental results according to the number of calibration points and the initialization values. We evaluate the accuracy of calibration through 3D reconstruction by a stereo system, which gives a more reliable evaluation than using the reprojection error from a single camera.
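
A hedged sketch of one of the two compared approaches, the fish-eye calibration available in OpenCV (`cv2.fisheye`), is shown below; the checkerboard size and image folder are placeholder assumptions.

```python
# Hedged sketch of fish-eye calibration with OpenCV's cv2.fisheye module,
# one of the two approaches compared above.
import cv2
import glob
import numpy as np

CB = (9, 6)  # inner corners of the checkerboard (assumed)
objp = np.zeros((1, CB[0] * CB[1], 3), np.float32)
objp[0, :, :2] = np.mgrid[0:CB[0], 0:CB[1]].T.reshape(-1, 2)

obj_pts, img_pts, shape = [], [], None
for path in glob.glob("calib/*.jpg"):      # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, CB)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners.reshape(1, -1, 2))
        shape = gray.shape[::-1]           # (width, height)

K, D = np.zeros((3, 3)), np.zeros((4, 1))
rms, K, D, _, _ = cv2.fisheye.calibrate(
    obj_pts, img_pts, shape, K, D,
    flags=cv2.fisheye.CALIB_RECOMPUTE_EXTRINSIC,
    criteria=(cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-6))
print("RMS reprojection error:", rms)  # single-camera error; the paper
# argues stereo 3D reconstruction gives a more reliable evaluation.
```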


2020 ◽  
Author(s):  
Yaoda Xu ◽  
Maryam Vaziri-Pashkam

Existing single-cell neural recording findings predict that, as information ascends the visual processing hierarchy in the primate brain, the relative similarity among objects would be increasingly preserved across identity-preserving image transformations. Here we confirm this prediction and show that object category representational structure becomes increasingly invariant across position and size changes as information ascends the human ventral visual processing pathway. Such a representation, however, is not found in 14 different convolutional neural networks (CNNs) trained for object categorization that varied in architecture, depth, and the presence or absence of recurrent processing. CNNs thus do not appear to form or maintain brain-like, transformation-tolerant object identity representations at higher levels of visual processing, despite the fact that CNNs may classify objects under various transformations. This limitation could potentially contribute to the large amount of training data required to train CNNs and to their limited ability to generalize to objects not included in training.
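
As a hedged sketch of the representational similarity logic behind this result, the snippet below builds a representational dissimilarity matrix (RDM) from a layer's responses to the same objects at two positions and correlates the two RDMs; the random feature matrices are stand-ins for real CNN activations, and the exact analysis in the paper may differ.

```python
# Sketch of an RDM-based invariance test: compare the representational
# structure of the same objects shown at two positions. Random features
# stand in for real CNN (or neural) responses.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """1 - Pearson correlation between response patterns, per object pair."""
    return pdist(features, metric="correlation")

n_objects, n_units = 20, 512
resp_pos1 = np.random.randn(n_objects, n_units)   # objects at position 1
resp_pos2 = np.random.randn(n_objects, n_units)   # same objects at position 2

# A high correlation would indicate transformation-tolerant
# representational structure; the study finds this in the brain
# but not in the tested CNNs.
rho, _ = spearmanr(rdm(resp_pos1), rdm(resp_pos2))
print(f"RDM similarity across position change: {rho:.2f}")
```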

