Graph Consistency Based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification

Author(s):  
Xiaobin Liu ◽  
Shiliang Zhang

Recent works show that mean-teaching is an effective framework for unsupervised domain adaptive person re-identification. However, existing methods perform contrastive learning only on selected samples between the teacher and student networks, which is sensitive to noise in pseudo labels and neglects the relationships among most samples. Moreover, these methods do not effectively coordinate multiple teacher networks. To handle these issues, this paper proposes a Graph Consistency based Mean-Teaching (GCMT) method that constructs a Graph Consistency Constraint (GCC) between teacher and student networks. Specifically, given unlabeled training images, we apply the teacher networks to extract the corresponding features and construct a teacher graph for each teacher network that describes the similarity relationships among training images. To boost representation learning, the different teacher graphs are fused to provide a supervision signal for optimizing the student networks. GCMT fuses the similarity relationships predicted by different teacher networks as supervision and effectively optimizes the student networks with more sample relationships involved. Experiments on three datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, show that the proposed GCMT outperforms state-of-the-art methods by a clear margin. Notably, GCMT even outperforms a previous method that uses a deeper backbone. Experimental results also show that GCMT can effectively boost performance with multiple teacher and student networks. Our code is available at https://github.com/liu-xb/GCMT .
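To make the graph-consistency idea concrete, here is a minimal PyTorch sketch written for this summary rather than taken from the GCMT repository: each teacher's batch features define a row-normalized similarity graph, the teacher graphs are fused by simple averaging (one plausible fusion choice), and the student graph is pulled toward the fused graph with a KL divergence. All names, shapes, and the averaging scheme are assumptions.

```python
# Illustrative sketch of a graph-consistency constraint between teacher and
# student networks; names and shapes are assumptions, not the authors' code.
import torch
import torch.nn.functional as F

def similarity_graph(features):
    """Row-normalized cosine-similarity graph over a batch of features."""
    features = F.normalize(features, dim=1)          # unit-length rows
    sim = features @ features.t()                    # pairwise similarities
    return F.softmax(sim, dim=1)                     # each row sums to 1

def graph_consistency_loss(student_feats, teacher_feats_list):
    """Fuse teacher graphs (simple average) and match the student graph to it."""
    teacher_graphs = [similarity_graph(f) for f in teacher_feats_list]
    fused = torch.stack(teacher_graphs).mean(dim=0)  # fused supervision signal
    student_graph = similarity_graph(student_feats)
    # KL divergence between the student graph and the fused teacher graph
    return F.kl_div(student_graph.log(), fused, reduction="batchmean")

# Toy usage: a batch of 8 images, 128-d features, two teacher networks.
student = torch.randn(8, 128, requires_grad=True)
teachers = [torch.randn(8, 128), torch.randn(8, 128)]
loss = graph_consistency_loss(student, teachers)
loss.backward()
```

Because the loss is defined over all pairwise similarities in the batch rather than over selected sample pairs, every sample relationship contributes to the gradient, which is the property the abstract emphasizes.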

Author(s):  
Guoqiang Gong ◽  
Liangfeng Zheng ◽  
Wenhao Jiang ◽  
Yadong Mu

Weakly-supervised temporal action localization aims to locate the intervals of action instances using only video-level action labels for training. However, the localization results generated by video classification networks are often inaccurate due to the lack of temporal boundary annotations for actions. Our motivating insight is that the temporal boundary of an action should be predicted stably under various temporal transforms. This inspires a self-supervised equivariant transform consistency constraint. We design a set of temporal transform operations, ranging from naive temporal down-sampling to learnable attention-piloted time warping. In our model, a localization network aims to perform well under all transforms, while a policy network learns to choose, at each iteration, the temporal transform that adversarially makes the localization results most inconsistent with those of the localization network. Additionally, we devise a self-refinement module that enhances the completeness of action intervals by harnessing temporal and semantic contexts. Experimental results on THUMOS14 and ActivityNet demonstrate that our model consistently outperforms state-of-the-art weakly-supervised temporal action localization methods.
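The equivariance constraint can be illustrated with a small PyTorch sketch using the simplest transform mentioned above, naive temporal down-sampling: the prediction on a down-sampled clip should match the down-sampled prediction on the full clip. The network, loss, and shapes below are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the equivariant transform consistency idea: per-frame
# action scores should be stable under a temporal transform such as naive
# down-sampling. Network and loss choices are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalizationNet(nn.Module):
    """Toy per-frame action scorer over a feature sequence (B, T, D)."""
    def __init__(self, dim=128):
        super().__init__()
        self.head = nn.Conv1d(dim, 1, kernel_size=3, padding=1)

    def forward(self, x):                        # x: (B, T, D)
        scores = self.head(x.transpose(1, 2))    # (B, 1, T)
        return torch.sigmoid(scores.squeeze(1))  # per-frame actionness (B, T)

def downsample(x, factor=2):
    """Naive temporal down-sampling: keep every `factor`-th frame."""
    return x[:, ::factor, :]

def consistency_loss(net, x, factor=2):
    """Equivariance: predicting on the transformed video should agree with
    transforming the prediction on the original video."""
    pred_full = net(x)                            # (B, T)
    pred_down = net(downsample(x, factor))        # (B, T // factor)
    target = pred_full[:, ::factor]               # transform the prediction
    return F.mse_loss(pred_down, target)

net = LocalizationNet()
video_feats = torch.randn(4, 64, 128)             # 4 clips, 64 frames each
loss = consistency_loss(net, video_feats)
```

In the full model, a policy network would pick which transform to apply at each iteration so as to maximize this inconsistency, turning the constraint into an adversarial game.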


Author(s):  
Penghui Wei ◽  
Wenji Mao ◽  
Guandan Chen

Analyzing public attitudes plays an important role in opinion mining systems. Stance detection aims to determine from a text whether its author is in favor of, against, or neutral towards a given target. One challenge of this task is that a text may not explicitly express an attitude towards the target, yet existing approaches build models from the target content alone. Moreover, although weakly supervised approaches have been proposed to ease the burden of manually annotating large-scale training data, such approaches are confronted with the problem of noisy labels. To address these two issues, in this paper we propose a Topic-Aware Reinforced Model (TARM) for weakly supervised stance detection. Our model consists of two complementary components: (1) a detection network that incorporates target-related topic information into representation learning to identify stance effectively; (2) a policy network that learns to eliminate noisy instances from auto-labeled data based on off-policy reinforcement learning. The two networks are alternately optimized to improve each other's performance. Experimental results demonstrate that our proposed model TARM outperforms the state-of-the-art approaches.
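As a rough illustration of the policy network's role, the sketch below scores auto-labeled instances and samples binary keep/drop actions with a REINFORCE-style update; TARM's actual reward design and off-policy correction are not reproduced, and every name here is an assumption.

```python
# Hedged sketch of the keep/drop idea behind the policy network: score each
# auto-labeled instance and sample a binary keep action. Everything here is
# illustrative; the paper's reward and off-policy machinery are omitted.
import torch
import torch.nn as nn

class InstanceSelector(nn.Module):
    """Scores an instance representation and emits a keep probability."""
    def __init__(self, dim=256):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, reps):                       # reps: (N, dim)
        return torch.sigmoid(self.scorer(reps)).squeeze(-1)

selector = InstanceSelector()
reps = torch.randn(32, 256)                        # 32 auto-labeled instances
keep_prob = selector(reps)
actions = torch.bernoulli(keep_prob)               # 1 = keep, 0 = drop as noise
# REINFORCE-style update: the reward could be the detection network's gain
# when trained on the kept subset (assumed here as a given scalar).
reward = torch.tensor(0.1)
log_prob = actions * keep_prob.clamp_min(1e-8).log() + \
           (1 - actions) * (1 - keep_prob).clamp_min(1e-8).log()
policy_loss = -(reward * log_prob).mean()
policy_loss.backward()
```

Alternating this selection step with retraining the detection network on the kept instances gives the mutual-improvement loop the abstract describes.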


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2170 ◽  
Author(s):  
Yuya Moroto ◽  
Keisuke Maeda ◽  
Takahiro Ogawa ◽  
Miki Haseyama

A few-shot personalized saliency prediction method based on adaptive image selection, considering objects and visual attention, is presented in this paper. Since general methods for predicting personalized saliency maps (PSMs) require a large number of training images, an approach that works with only a small number of training images is needed. Although finding persons whose visual attention is similar to that of a target person is effective for this, it requires all persons to gaze at many common images, which is difficult and unrealistic given the burden placed on them. This paper therefore introduces a novel adaptive image selection (AIS) scheme that focuses on the relationship between human visual attention and objects in images. AIS considers both the diversity of objects in images and the variance of the PSMs for those objects. Specifically, AIS selects images containing various kinds of objects so as to maintain diversity. Moreover, AIS favors a high variance of PSMs across persons, since such variance distinguishes the regions that many persons commonly gaze at from those they do not. By selecting images with high diversity and variance, the proposed method enables similar users to be identified from a small number of images; this is the technical contribution of this paper. Experimental results show the effectiveness of our personalized saliency prediction including the new image selection scheme.
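A greedy toy version of the selection criterion might look like the following, where each candidate image is scored by the number of new object kinds it adds plus its PSM variance across persons; the weighting, inputs, and greedy strategy are assumptions made for this sketch, not the paper's algorithm.

```python
# Illustrative greedy adaptive image selection: prefer images whose object
# sets add diversity and whose PSMs vary most across persons.
import numpy as np

def select_images(object_sets, psm_variance, k=5, alpha=0.5):
    """object_sets: list of sets of object labels per image.
    psm_variance: per-image variance of PSMs across persons (higher = more
    useful for telling users apart). Returns indices of k selected images."""
    selected, covered = [], set()
    for _ in range(k):
        best, best_score = None, -np.inf
        for i, objs in enumerate(object_sets):
            if i in selected:
                continue
            diversity = len(objs - covered)          # new object kinds added
            score = alpha * diversity + (1 - alpha) * psm_variance[i]
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        covered |= object_sets[best]
    return selected

# Toy usage: 6 images with detected objects and precomputed PSM variances.
objects = [{"car", "dog"}, {"dog"}, {"tree", "car"}, {"cat"}, {"car"}, {"cat", "tree"}]
variance = np.array([0.9, 0.2, 0.6, 0.8, 0.1, 0.5])
print(select_images(objects, variance, k=3))
```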


Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 969 ◽
Author(s):  
JongGeun Oh ◽  
Min-Cheol Hong

This paper introduces adaptive image rendering using a parametric nonlinear mapping function based on the retinex model for low-light sources. In this study, only the luminance channel is used to estimate the reflectance component of an observed low-light image, which reduces the halo artifacts that arise from using multiple center/surround Gaussian filters. A new nonlinear mapping function that incorporates the statistics of the luminance and the estimated reflectance into the reconstruction process is proposed. In addition, a new method for determining the gain and offset of the mapping function is described, which adaptively controls the contrast ratio. Finally, the relationship between the estimated luminance and the reconstructed luminance is used to reconstruct the chrominance channels. The experimental results demonstrate that the proposed method yields subjective and objective improvements over state-of-the-art, scale-based retinex methods.
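The sketch below loosely follows the pipeline described above: a single center/surround Gaussian on the luminance channel estimates reflectance, and a gain/offset mapping stretches it into display range. The default gain/offset (a plain min/max stretch) and the Gaussian scale are assumptions for illustration, not the paper's adaptive method.

```python
# Rough sketch of luminance-only retinex rendering: estimate reflectance
# with one center/surround Gaussian, then apply a gain/offset mapping.
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_reflectance(luminance, sigma=80.0):
    """Single-scale retinex: log(L) - log(Gaussian-blurred L)."""
    lum = luminance.astype(np.float64) + 1.0        # avoid log(0)
    surround = gaussian_filter(lum, sigma=sigma)    # center/surround estimate
    return np.log(lum) - np.log(surround)

def nonlinear_mapping(reflectance, gain=None, offset=None):
    """Map reflectance into [0, 255]; gain/offset default to a simple
    min/max stretch (an assumption standing in for the adaptive rule)."""
    if gain is None:
        gain = 255.0 / (reflectance.max() - reflectance.min() + 1e-8)
    if offset is None:
        offset = -reflectance.min() * gain
    return np.clip(gain * reflectance + offset, 0, 255).astype(np.uint8)

# Toy usage on a synthetic low-light luminance image.
lum = (np.random.rand(64, 64) * 40).astype(np.uint8)   # dark image
out = nonlinear_mapping(estimate_reflectance(lum))
```

Operating on luminance alone is what avoids running multiple Gaussian filters per color channel, the source of the halo artifacts the abstract mentions.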


Author(s):  
Hao Zhu ◽  
Man-Di Luo ◽  
Rui Wang ◽  
Ai-Hua Zheng ◽  
Ran He

Audio-visual learning, aimed at exploiting the relationship between the audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully. Researchers tend to leverage these two modalities either to improve the performance of previously considered single-modality tasks or to address new challenging problems. In this paper, we provide a comprehensive survey of recent audio-visual learning developments. We divide current audio-visual learning tasks into four subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are discussed. Finally, we summarize the commonly used datasets and challenges.


2020 ◽  
Author(s):  
Jagriti Mishra ◽  
Takuya Inoue

Several studies have pointed to the importance of bed roughness for alluvial cover, and several mathematical models have been introduced to capture the effect that bed roughness may have on alluvial cover. Here, we provide a state-of-the-art review of research exploring the relationship between alluvial cover, sediment supply, and bed topography, and describe the various mathematical models used to analyse the deposition of alluvium. To assess the performance of the available mathematical models, we performed laboratory-scale experiments and compared the results with the models' predictions. Our experiments show that alluvial cover is not governed merely by increasing sediment supply; bed topography is also an important controlling factor. Testing the experimental results against the various theoretical models suggests that certain models fit particular bed topographies but are poor at predicting higher-roughness topographies. Three models predict the experimental observations efficiently, albeit with limitations, which we discuss here in detail.


Author(s):  
Zhihao Fan ◽  
Zhongyu Wei ◽  
Siyuan Wang ◽  
Ruize Wang ◽  
Zejun Li ◽  
...  

Existing research on image captioning usually represents an image using a scene graph with low-level facts (objects and relations) and fails to capture high-level semantics. In this paper, we propose a Theme Concepts extended Image Captioning (TCIC) framework that incorporates theme concepts to represent high-level cross-modality semantics. In practice, we model theme concepts as memory vectors and propose the Transformer with Theme Nodes (TTN) to incorporate those vectors for image captioning. Considering that theme concepts can be learned from both images and captions, we propose two settings for their representation learning based on TTN. On the vision side, TTN takes both scene-graph-based features and theme concepts as input for visual representation learning. On the language side, TTN takes both captions and theme concepts as input for text representation reconstruction. Both settings aim to generate target captions with the same transformer-based decoder. During training, we further align the representations of theme concepts learned from images and from the corresponding captions to enforce cross-modality learning. Experimental results on MS COCO show the effectiveness of our approach compared to some state-of-the-art models.
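The "theme nodes" idea can be sketched as learnable memory vectors concatenated to the input sequence of a standard transformer encoder, as below; the dimensions, node counts, and concatenation scheme are illustrative assumptions rather than the TCIC implementation.

```python
# Minimal sketch of theme nodes: learnable memory vectors appended to the
# input of a standard transformer encoder, so they can attend to (and be
# attended by) the content tokens. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TransformerWithThemeNodes(nn.Module):
    def __init__(self, dim=256, n_themes=8, n_heads=4, n_layers=2):
        super().__init__()
        self.theme_nodes = nn.Parameter(torch.randn(n_themes, dim))  # memory vectors
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, features):                    # features: (B, N, dim)
        themes = self.theme_nodes.unsqueeze(0).expand(features.size(0), -1, -1)
        x = torch.cat([features, themes], dim=1)    # attach theme nodes
        return self.encoder(x)                      # joint self-attention

ttn = TransformerWithThemeNodes()
scene_graph_feats = torch.randn(2, 20, 256)         # e.g., 20 object/relation nodes
encoded = ttn(scene_graph_feats)                    # (2, 28, 256)
```

In the two settings the abstract describes, the same module would take scene-graph features on the vision side and caption tokens on the language side, with the theme-node outputs from the two sides aligned during training.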


Author(s):  
Hansheng Xue ◽  
Jiajie Peng ◽  
Xuequn Shang

Multi-network integration methods have achieved prominent performance on many network-based tasks, but these approaches often suffer from information loss. In this paper, we propose a novel multi-network representation learning method based on a semi-supervised autoencoder, termed DeepMNE, which captures the complex topological structure of each network and takes the correlation among multiple networks into account. The experimental results on two real-world datasets indicate that DeepMNE outperforms existing state-of-the-art algorithms.
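In the spirit of the description above, the following sketch trains one autoencoder per network on its adjacency rows and adds a simple agreement term between the learned embeddings; DeepMNE's actual semi-supervised constraint passing is not reproduced here, and all names and weights are assumptions.

```python
# Hedged sketch: per-network autoencoders with a cross-network consistency
# term, loosely in the spirit of multi-network embedding methods.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetAutoencoder(nn.Module):
    """Encodes one network's adjacency rows into a shared embedding space."""
    def __init__(self, n_nodes, dim=64):
        super().__init__()
        self.enc = nn.Linear(n_nodes, dim)
        self.dec = nn.Linear(dim, n_nodes)

    def forward(self, adj_rows):
        z = torch.relu(self.enc(adj_rows))          # node embeddings
        return z, torch.sigmoid(self.dec(z))        # and reconstructions

n_nodes = 100
adj1, adj2 = torch.rand(n_nodes, n_nodes), torch.rand(n_nodes, n_nodes)
ae1, ae2 = NetAutoencoder(n_nodes), NetAutoencoder(n_nodes)
z1, rec1 = ae1(adj1)
z2, rec2 = ae2(adj2)
loss = (F.mse_loss(rec1, adj1) + F.mse_loss(rec2, adj2)   # reconstruction
        + 0.1 * F.mse_loss(z1, z2))                       # cross-network agreement
loss.backward()
```

The reconstruction terms preserve each network's topology (countering information loss), while the agreement term couples the embeddings across networks.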


Author(s):  
J. Romero ◽  
L. Diago ◽  
J. Shinoda ◽  
I. Hagiwara

People rapidly form impressions from facial appearance, and these impressions affect social decisions. Data-driven computational models are the best available tools for identifying the source of such impressions. However, computational models cannot be accepted unless they have passed validation tests that ascertain their credibility. In this paper, the condition of a person's eyes is used to validate the fuzzy rules extracted from the computational models. A simple and effective classifier is proposed to evaluate whether the eyes are closed during the evaluation of a small database of portraits. The experimental results show that closed eyes can be detected only after the proposed shift of the normalized histogram is applied. Although it is very simple, the proposed classifier achieves better accuracy than other state-of-the-art classifiers. The relationship between eye closure and the subjects' evaluations is also analyzed.
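One hypothetical reading of the histogram-shift step is sketched below: normalize the eye-region histogram, shift it so the dominant intensity peak is aligned to a fixed position, and threshold the remaining dark-pixel mass to decide closed versus open. The statistic, the alignment rule, and the thresholds are all assumptions for illustration, not the paper's classifier.

```python
# Illustrative take on the histogram-shift idea: align the normalized
# eye-region histogram, then threshold a dark-pixel statistic. The specific
# statistic and thresholds are assumptions, not the paper's method.
import numpy as np

def is_eye_closed(eye_region, shift=None, dark_thresh=0.35):
    """eye_region: 2-D grayscale array. Returns True if classified as closed."""
    hist, _ = np.histogram(eye_region, bins=256, range=(0, 256))
    hist = hist / hist.sum()                        # normalized histogram
    if shift is None:
        shift = 128 - int(np.argmax(hist))          # align the dominant peak
    shifted = np.roll(hist, shift)
    # Open eyes leave dark pupil/iris mass below the aligned (skin) peak;
    # closed eyes leave little mass there.
    dark_mass = shifted[:128].sum()
    return dark_mass < dark_thresh

closed_like = np.full((20, 40), 140, dtype=np.uint8)     # uniform skin-like patch
print(is_eye_closed(closed_like))                        # likely True
```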

