Mutual information guided 3D ResNet for self-supervised video representation learning

2020, Vol 14 (13), pp. 3066-3075
Author(s): Fei Xue, Hongbing Ji, Wenbo Zhang

Symmetry, 2021, Vol 13 (1), pp. 115
Author(s): Yongjun Jing, Hao Wang, Kun Shao, Xing Huo

Trust prediction is essential for enhancing reliability and reducing the risk posed by unreliable nodes, especially for online applications in open network environments. A key requirement of trust prediction is to accurately measure the relation between the two interacting entities. However, most existing methods infer the trust relation between interacting entities by modeling the similarity between nodes on a graph, ignoring both the semantic relation and the influence of negative links (e.g., distrust relations). In this paper, we propose a relation representation learning method based on signed graph mutual information maximization (SGMIM). In SGMIM, we incorporate a translation model and positive pointwise mutual information to enhance the relation representations, and we adopt mutual information maximization to align the entity and relation semantic spaces. We further develop a sign prediction model for accurate trust prediction, performing link sign prediction in trust networks based on the learned relation representations. Extensive experiments on four real-world datasets for the trust prediction task show that SGMIM significantly outperforms state-of-the-art baseline methods.
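Positive pointwise mutual information (PPMI), one of the ingredients SGMIM uses to enhance relation representations, has a standard definition: PPMI(i, j) = max(0, log(p(i, j) / (p(i) p(j)))). The minimal sketch below computes it from a sparse table of co-occurrence counts. The abstract does not say how SGMIM obtains these counts from a signed trust network, so the input format (a dict of node-pair counts, e.g. gathered from random walks) and the function name `ppmi` are assumptions for illustration only.

```python
import math
from collections import Counter


def ppmi(cooc):
    """Positive pointwise mutual information from co-occurrence counts.

    cooc: dict mapping (i, j) -> count (hypothetical input format; how
    SGMIM builds these counts is not specified in the abstract).
    Returns a dict mapping (i, j) -> max(0, PMI(i, j)).
    """
    total = sum(cooc.values())
    row = Counter()  # marginal count of each "row" node i
    col = Counter()  # marginal count of each "column" node j
    for (i, j), c in cooc.items():
        row[i] += c
        col[j] += c
    out = {}
    for (i, j), c in cooc.items():
        if c <= 0:
            continue  # PMI is undefined for zero counts; skip them
        # log( p(i,j) / (p(i) p(j)) ) = log( c * N / (row_i * col_j) )
        pmi = math.log(c * total / (row[i] * col[j]))
        out[(i, j)] = max(pmi, 0.0)  # clip negative associations to zero
    return out


counts = {("a", "b"): 2, ("a", "c"): 1, ("d", "b"): 1}
scores = ppmi(counts)
# ("a", "b") is clipped to 0.0 because its PMI is log(8/9) < 0
print(scores)
```

Clipping at zero is what makes the measure "positive": pairs that co-occur less often than chance would predict contribute nothing rather than a noisy negative value.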


Author(s): Zhipeng Wang, Chunping Hou, Guanghui Yue, Qingyuan Yang

Author(s): Chenrui Zhang, Yuxin Peng

Video representation learning is a vital problem for classification tasks. Recently, a promising unsupervised paradigm termed self-supervised learning has emerged, which exploits the supervisory signals inherent in massive data to learn features by solving auxiliary tasks. However, existing methods of this kind suffer from two limitations when extended to video classification. First, they focus on a single task, ignoring the complementarity among different task-specific features and thus producing suboptimal video representations. Second, their high computational and memory costs hinder their application in real-world scenarios. In this paper, we propose a graph-based distillation framework to address these problems: (1) we propose a logits graph and a representation graph to transfer knowledge from multiple self-supervised tasks, where the former distills classifier-level knowledge by solving a multi-distribution joint matching problem, and the latter distills internal feature knowledge from pairwise ensembled representations while tackling the heterogeneity among different features; (2) by adopting a teacher-student framework, the proposal dramatically reduces the redundancy of knowledge learned from the teachers, yielding a lighter student model that solves the classification task more efficiently. Experimental results on three video datasets validate that our proposal not only helps learn better video representations but also compresses the model for faster inference.
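The teacher-student transfer described in point (2) builds on knowledge distillation. The minimal sketch below shows the standard temperature-softened distillation loss, extended to several teachers by uniformly averaging their softened class distributions. It is not the paper's logits graph, which the abstract describes as solving a multi-distribution joint matching problem; the uniform averaging, the default temperature of 4.0, and the function names are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (numerically stabilized)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def multi_teacher_kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Hypothetical multi-teacher distillation loss.

    teacher_logits: list of logit vectors, one per self-supervised teacher.
    student_logits: logit vector of the lighter student model.
    """
    soft = [softmax(t, temperature) for t in teacher_logits]
    n = len(soft)
    # Uniform average of teacher distributions (an assumption, not the
    # paper's multi-distribution joint matching).
    target = [sum(p[k] for p in soft) / n for k in range(len(student_logits))]
    student = softmax(student_logits, temperature)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl_divergence(target, student)


teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
student = [1.0, 0.8, -0.2]
print(multi_teacher_kd_loss(teachers, student))
```

A higher temperature flattens each distribution so the student also learns from the teachers' relative scores on non-target classes; the loss is zero only when the student's softened distribution matches the averaged target.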

