Stacked Denoising Autoencoder for feature representation learning in pose-based action recognition

Owing to the merits of high-order statistics and Riemannian geometry, the covariance matrix has become a generic feature representation for action recognition. An independent action can be represented by empirical statistics over all of its pose samples. Covariance has two major problems: (1) it is prone to singularity, so actions fail to be represented properly, and (2) it lacks global action/pose-aware information, so its expressive and discriminative power is limited. In this article, we propose a novel Bayesian covariance representation that uses prior regularization to solve these problems. Specifically, covariance is viewed as a parametric maximum likelihood estimate of a Gaussian distribution over the local poses of an independent action. Then, a Global Informative Prior (GIP) is generated over global poses with sufficient statistics to regularize the covariance. In this way, (1) singularity is greatly relieved owing to sufficient statistics, and (2) the global pose information of the GIP makes Bayesian covariance theoretically equivalent to a saliency-weighted covariance over global action poses, so that the discriminative characteristics of actions are represented more clearly. Experimental results show that our Bayesian covariance with GIP efficiently improves the performance of action recognition. On some databases, it outperforms state-of-the-art variant methods based on kernels, temporal-order structures, and saliency-weighting attention, among others.
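
The regularization idea can be illustrated with a minimal sketch. The abstract does not give the GIP formula, so the blend below is a generic shrinkage estimate of the kind a conjugate prior produces, not the authors' method; `kappa` (the prior strength), the function names, and the toy 2-D "poses" are all assumptions made for illustration.

```python
def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance(rows):
    """Maximum-likelihood covariance (divides by n, not n - 1)."""
    n, d = len(rows), len(rows[0])
    mu = mean(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n
             for j in range(d)] for i in range(d)]

def regularize(local_cov, n_local, prior_cov, kappa):
    """Blend the local estimate with a prior covariance.

    kappa is a hypothetical prior strength: larger values trust the
    global prior more than the local samples.
    """
    d = len(local_cov)
    total = n_local + kappa
    return [[(n_local * local_cov[i][j] + kappa * prior_cov[i][j]) / total
             for j in range(d)] for i in range(d)]

def det2(m):
    """Determinant of a 2x2 matrix; zero means the matrix is singular."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Toy data (assumed): one action with only two poses, and a larger global pool.
local_poses = [[0.0, 0.0], [1.0, 2.0]]
global_poses = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0], [0.0, 3.0], [3.0, 1.0]]

local_cov = covariance(local_poses)    # rank 1, hence singular
prior_cov = covariance(global_poses)   # full rank
bayes_cov = regularize(local_cov, len(local_poses), prior_cov, kappa=1.0)
```

With two samples in two dimensions the local covariance has rank one, so `det2(local_cov)` is zero, while the blended matrix has a positive determinant because the global term is full rank.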

Semantic Relationships Guided Representation Learning for Facial Action Unit Recognition

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33018594 ◽ 2019 ◽ Vol 33 ◽ pp. 8594-8601 ◽ Cited By ~ 5

Author(s): Guanbin Li ◽ Xin Zhu ◽ Yirui Zeng ◽ Qing Wang ◽ Liang Lin

Keyword(s): Neural Network ◽ Facial Expressions ◽ Mutual Exclusion ◽ Representation Learning ◽ Feature Representation ◽ Semantic Relationship ◽ Action Unit ◽ Facial Action ◽ Feature Representations ◽ Regional Feature

Facial action unit (AU) recognition is a crucial task for facial expression analysis and has attracted extensive attention in the fields of artificial intelligence and computer vision. Existing works have either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Despite varying degrees of progress, existing methods still struggle to handle complex situations. In this paper, we investigate how to integrate semantic relationship propagation between AUs into a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) into a multi-scale CNN framework to propagate node information through the graph and generate enhanced AU representations. As the learned features involve both appearance characteristics and AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination change and partial occlusion. Extensive experiments on two public benchmarks demonstrate that our method outperforms previous work and achieves state-of-the-art performance.
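
The propagation step can be sketched as a GRU-style gated update over a small AU graph. This is a simplified toy, not the SRERL implementation: node features are single scalars rather than CNN feature vectors, every weight is fixed to 1 instead of being learned, and the edges (AU6 with AU12 as co-occurring, AU12 with AU15 as mutually exclusive) are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical AU graph: positive weight = co-occurrence (symbiosis),
# negative weight = mutual exclusion. Edges are treated as undirected.
edges = {
    ("AU6", "AU12"): 1.0,
    ("AU12", "AU15"): -1.0,
}

def propagate(h, edges, steps=1):
    """Run `steps` rounds of gated message passing over scalar node states."""
    for _ in range(steps):
        # Aggregate signed messages from neighbours.
        msg = {v: 0.0 for v in h}
        for (u, v), w in edges.items():
            msg[v] += w * h[u]
            msg[u] += w * h[v]
        new_h = {}
        for v in h:
            z = sigmoid(msg[v] + h[v])            # update gate
            r = sigmoid(msg[v] + h[v])            # reset gate (same weights here)
            cand = math.tanh(msg[v] + r * h[v])   # candidate state
            new_h[v] = (1.0 - z) * h[v] + z * cand
        h = new_h
    return h

# Assumed initial evidence: only AU12 is active.
h0 = {"AU6": 0.0, "AU12": 1.0, "AU15": 0.0}
h1 = propagate(h0, edges, steps=1)
```

After one step the state of AU6 becomes positive and the state of AU15 becomes negative, which is the qualitative behaviour expected from co-occurrence and exclusion edges.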

Redundancy Removal Adversarial Active Learning Based on Norm Online Uncertainty Indicator

Computational Intelligence and Neuroscience ◽ 10.1155/2021/4752568 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-10

Author(s): Jifeng Guo ◽ Zhiqi Pang ◽ Wenbo Sun ◽ Shi Li ◽ Yu Chen

Keyword(s): Active Learning ◽ Feature Vector ◽ State Of The Art ◽ Representation Learning ◽ The State ◽ Feature Representation ◽ Learning Ability ◽ Redundancy Removal ◽ Previous State ◽ Uncertainty Score

Active learning aims to select the most valuable unlabelled samples for annotation. In this paper, we propose a redundancy removal adversarial active learning (RRAAL) method based on a norm online uncertainty indicator, which selects samples based on their distribution, uncertainty, and redundancy. RRAAL includes a representation generator, a state discriminator, and a redundancy removal module (RRM). The representation generator learns the feature representation of a sample, and the state discriminator predicts the state of the feature vector after concatenation. We added a sample discriminator to the representation generator to improve its representation learning ability and designed a norm online uncertainty indicator (Norm-OUI) to provide a more accurate uncertainty score for the state discriminator. In addition, we designed an RRM based on a greedy algorithm to reduce the number of redundant samples in the labelled pool. The experimental results on four datasets show that the state discriminator, Norm-OUI, and RRM each improve the performance of RRAAL, and that RRAAL outperforms previous state-of-the-art active learning methods.
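
One way a greedy redundancy filter could work is sketched below. The abstract does not describe the RRM's actual criterion, so the Euclidean distance threshold `min_dist`, the `budget` parameter, and the function name `greedy_select` are assumptions chosen for illustration.

```python
import math

def greedy_select(candidates, budget, min_dist):
    """Pick up to `budget` samples, most uncertain first, skipping near-duplicates.

    candidates: list of (sample_id, feature_vector, uncertainty_score).
    A candidate is skipped if it lies closer than `min_dist` to any
    sample that has already been chosen.
    """
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    chosen = []
    for sample_id, feat, _ in ranked:
        if len(chosen) == budget:
            break
        if all(math.dist(feat, f) >= min_dist for _, f in chosen):
            chosen.append((sample_id, feat))
    return [sample_id for sample_id, _ in chosen]

# Toy pool (assumed values): "b" is a near-duplicate of "a".
candidates = [
    ("a", [0.0, 0.0], 0.9),
    ("b", [0.1, 0.0], 0.8),
    ("c", [1.0, 1.0], 0.7),
    ("d", [2.0, 0.0], 0.2),
]
selected = greedy_select(candidates, budget=2, min_dist=0.5)
```

Here "b" is dropped despite its high uncertainty because it sits next to "a", so the annotation budget goes to "a" and "c" instead.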

Deep Learning for Person Reidentification Using Support Vector Machines

Advances in Multimedia ◽ 10.1155/2017/9874345 ◽ 2017 ◽ Vol 2017 ◽ pp. 1-12 ◽ Cited By ~ 2

Author(s): Mengyu Xu ◽ Zhenmin Tang ◽ Yazhou Yao ◽ Lingxiang Yao ◽ Huafeng Liu ◽ ...

Keyword(s): Data Augmentation ◽ Metric Learning ◽ Representation Learning ◽ Activation Function ◽ Feature Representation ◽ Support Vector ◽ Similarity Learning ◽ Vector Machines ◽ Augmentation Techniques ◽ Suboptimal Solution

Due to variations in viewpoint, pose, and illumination, a given individual may appear considerably different across camera views. Tracking individuals across camera networks with no overlapping fields of view is still a challenging problem. Previous works mainly treat feature representation and metric learning separately, which tends to yield a suboptimal solution. To address this issue, we propose a novel framework that performs feature representation learning and metric learning jointly. Unlike previous works, we represent each pair of pedestrian images as a new resized input and use a linear Support Vector Machine in place of the softmax activation function for similarity learning. In addition, dropout and data augmentation techniques are employed in this model to prevent the network from overfitting. Extensive experiments on two publicly available datasets, VIPeR and CUHK01, demonstrate the effectiveness of our proposed approach.
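
Replacing softmax with a linear SVM amounts to scoring each image pair with a linear function and training with a margin-based loss. The sketch below uses the standard L2-regularized hinge loss; the abstract does not state which variant the authors use, and the feature vectors, the weights `w` and `b`, and the trade-off constant `c` are hypothetical values standing in for the network's last-layer outputs and learned parameters.

```python
def svm_score(w, b, x):
    """Linear decision value for one pair-feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hinge_loss(w, b, batch, c=1.0):
    """L2-regularized hinge loss over a batch.

    batch: list of (features, label), where label is +1 for a pair showing
    the same person and -1 for a pair showing different people.
    """
    reg = 0.5 * sum(wi * wi for wi in w)
    slack = sum(max(0.0, 1.0 - y * svm_score(w, b, x)) for x, y in batch)
    return reg + c * slack

# Toy pair features (assumed) and a fixed classifier.
w, b = [1.0, -1.0], 0.0
batch = [
    ([2.0, 0.0], +1),   # score  2.0, outside the margin, no penalty
    ([0.0, 2.0], -1),   # score -2.0, outside the margin, no penalty
    ([0.5, 0.0], +1),   # score  0.5, inside the margin, penalty 0.5
]
loss = hinge_loss(w, b, batch)
```

Only the third pair contributes to the slack term, so the loss here is the regularizer (1.0) plus 0.5. Unlike softmax cross-entropy, pairs that are already classified with a margin of at least 1 produce no gradient.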
