RefNet: A Reference-Aware Network for Background Based Conversation

In this paper, we developed a practical approach for automatic detection of discrimination actions from social images. Firstly, an image set is established, in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Secondly, a practical approach is developed to achieve automatic detection and identification of discrimination actions and relationships from social images. Thirdly, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into one single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and our experimental results demonstrated that our proposed methods can significantly outperform state-of-the-art approaches.

Dual-Stream Guided-Learning via a Priori Optimization for Person Re-identification

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3447715 ◽

2021 ◽

Vol 17 (4) ◽

pp. 1-22

Author(s):

Junyi Wu ◽

Yan Huang ◽

Qiang Wu ◽

Zhipeng Gao ◽

Jianqiang Zhao ◽

...

Keyword(s):

Learning Strategy ◽

State Of The Art ◽

A Priori ◽

Background Information ◽

Stream Network ◽

Related Information ◽

Guided Learning ◽

Segmentation Algorithms ◽

Art Methods ◽

Background Clutter

The task of person re-identification (re-ID) is to find the same pedestrian across non-overlapping camera views. Generally, the performance of person re-ID can be affected by background clutter. However, existing segmentation algorithms cannot obtain perfect foreground masks that cleanly exclude the background information. In addition, if the background is completely removed, some discriminative ID-related cues (e.g., a backpack or a companion) may be lost. In this article, we design a dual-stream network consisting of a Provider Stream (P-Stream) and a Receiver Stream (R-Stream). The R-Stream performs an a priori optimization operation on foreground information. The P-Stream acts as a pusher to guide the R-Stream to concentrate on foreground information and some useful ID-related cues in the background. The proposed dual-stream network makes full use of the a priori optimization and guided-learning strategy to learn foreground information and some useful ID-related information in the background. Our method achieves Rank-1 accuracy of 95.4% on Market-1501, 89.0% on DukeMTMC-reID, 78.9% on CUHK03 (labeled), and 75.4% on CUHK03 (detected), outperforming state-of-the-art methods.
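
For readers unfamiliar with the Rank-1 figures quoted above, the following is a minimal sketch of how Rank-1 accuracy can be computed from feature vectors. It is not the authors' evaluation code: the function name and the use of Euclidean distance are assumptions, and the sketch omits the camera-based filtering of gallery images that the standard benchmark protocols apply.

```python
import math

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery feature shares their identity.

    Simplified illustration: Euclidean distance, no camera-ID filtering.
    """
    hits = 0
    for q_feat, q_id in zip(query_feats, query_ids):
        # Index of the gallery item closest to this query.
        nearest = min(range(len(gallery_feats)),
                      key=lambda i: math.dist(q_feat, gallery_feats[i]))
        hits += int(gallery_ids[nearest] == q_id)
    return hits / len(query_feats)
```

A query counts as a hit only when its single closest gallery image carries the same identity label, so the metric rewards features that keep the same person close across cameras.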

Feature Prioritization and Regularization Improve Standard Accuracy and Adversarial Robustness

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/415 ◽

2019 ◽

Author(s):

Chihuang Liu ◽

Joseph JaJa

Keyword(s):

Classification Accuracy ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Trade Off ◽

Experimental Strategy ◽

Art Methods ◽

Adversarial Training ◽

Standard Classification ◽

Improve Standard

Adversarial training has been successfully applied to build robust models at a certain cost. While the robustness of a model increases, the standard classification accuracy declines. This phenomenon is suggested to be an inherent trade-off. We propose a model that employs feature prioritization by a nonlinear attention module and L2 feature regularization to improve the adversarial robustness and the standard accuracy relative to adversarial training. The attention module encourages the model to rely heavily on robust features by assigning larger weights to them while suppressing non-robust features. The regularizer encourages the model to extract similar features for the natural and adversarial images, effectively ignoring the added perturbation. In addition to evaluating the robustness of our model, we provide justification for the attention module and propose a novel experimental strategy that quantitatively demonstrates that our model is almost ideally aligned with salient data characteristics. Additional experimental results illustrate the power of our model relative to state-of-the-art methods.
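
The L2 feature regularizer described above can be illustrated with a short sketch. The abstract does not give the exact formulation, so the function names, the averaging over the batch, and the `weight` parameter are assumptions made for illustration only.

```python
def l2_feature_penalty(natural_feats, adversarial_feats):
    """Mean squared L2 distance between paired natural/adversarial features."""
    total = 0.0
    for f_nat, f_adv in zip(natural_feats, adversarial_feats):
        total += sum((a - b) ** 2 for a, b in zip(f_nat, f_adv))
    return total / len(natural_feats)

def total_loss(classification_loss, natural_feats, adversarial_feats, weight=0.1):
    """Classification loss plus the weighted feature penalty (weight is hypothetical)."""
    return classification_loss + weight * l2_feature_penalty(natural_feats, adversarial_feats)
```

The penalty is zero when a natural image and its adversarial counterpart map to the same feature vector, which is the sense in which the model is pushed to ignore the added perturbation.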

Query Expansion based on Central Tendency and PRF for Monolingual Retrieval

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2016100103 ◽

2016 ◽

Vol 6 (4) ◽

pp. 30-50

Author(s):

Rekha Vaidyanathan ◽

Sujoy Das ◽

Namita Srivastava

Keyword(s):

Statistical Method ◽

Query Expansion ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Central Tendency ◽

Relevant Document ◽

Retrieval Engine ◽

The Right ◽

A Performance

Query expansion is the process of selecting relevant words that are closest in meaning and context to the keyword(s) of a query. In this paper, a statistical method is proposed for automatically selecting contextually related words for expansion after identifying a pattern in their scores. Words appearing in the top 10 relevant documents are given a score w.r.t. the partitions they appear in. The proposed statistical method identifies a pattern of central tendency in the high scores and selects the right group of words for query expansion. The objective of the method is to keep the expanded query short (light) while still giving statistically significant MAP values compared to the original query. Experimental results show a 17-21% improvement in MAP over the original unexpanded query as the baseline, with performance similar to that of the state-of-the-art query expansion models Bo1 and KL. FIRE 2011 Adhoc English and Hindi data, with 50 topics each, were used for the experiments, with Terrier as the retrieval engine.
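
The results above are reported in terms of MAP (mean average precision). As a reminder of what that metric measures, here is a minimal sketch of the standard definition. The function names and the input layout (a ranked list of document IDs plus a set of relevant IDs per topic) are assumptions, and this is not the evaluation tooling used in the paper.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant_ids) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

An expanded query improves MAP when it moves relevant documents closer to the top of the ranking across topics.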

Query Expansion Based on Central Tendency and PRF for Monolingual Retrieval

Information Retrieval and Management ◽

10.4018/978-1-5225-5191-1.ch022 ◽

2018 ◽

pp. 479-501

Author(s):

Rekha Vaidyanathan ◽

Sujoy Das ◽

Namita Srivastava

Keyword(s):

Statistical Method ◽

Query Expansion ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Central Tendency ◽

Relevant Document ◽

Retrieval Engine ◽

The Right ◽

A Performance

Query expansion is the process of selecting relevant words that are closest in meaning and context to the keyword(s) of a query. In this paper, a statistical method is proposed for automatically selecting contextually related words for expansion after identifying a pattern in their scores. Words appearing in the top 10 relevant documents are given a score w.r.t. the partitions they appear in. The proposed statistical method identifies a pattern of central tendency in the high scores and selects the right group of words for query expansion. The objective of the method is to keep the expanded query short (light) while still giving statistically significant MAP values compared to the original query. Experimental results show a 17-21% improvement in MAP over the original unexpanded query as the baseline, with performance similar to that of the state-of-the-art query expansion models Bo1 and KL. FIRE 2011 Adhoc English and Hindi data, with 50 topics each, were used for the experiments, with Terrier as the retrieval engine.

Modeling Multi-Purpose Sessions for Next-Item Recommendations via Mixture-Channel Purpose Routing Networks

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/523 ◽

2019 ◽

Cited By ~ 8

Author(s):

Shoujin Wang ◽

Liang Hu ◽

Yan Wang ◽

Quan Z. Sheng ◽

Mehmet Orgun ◽

...

Keyword(s):

Recommender System ◽

Channel Model ◽

State Of The Art ◽

Recurrent Network ◽

The State ◽

Experimental Results ◽

Art Methods ◽

Recommendation Accuracy ◽

The Difference

A session-based recommender system (SBRS) suggests the next item by modeling the dependencies between items in a session. Most existing SBRSs assume that the items inside a session are associated with one (implicit) purpose. However, this may not always be true in reality, and a session may often consist of multiple subsets of items for different purposes (e.g., breakfast and decoration). Specifically, items (e.g., bread and milk) in a subset have strong purpose-specific dependencies, whereas items (e.g., bread and vase) from different subsets have much weaker or even no dependencies due to the difference in purposes. Therefore, we propose a mixture-channel model to accommodate the multi-purpose item subsets for more precisely representing a session. Filling gaps in existing SBRSs, this model recommends more diverse items to satisfy different purposes. Accordingly, we design effective mixture-channel purpose routing networks (MCPRN) with a purpose routing network to detect the purposes of each item and assign it to the corresponding channels. Moreover, a purpose-specific recurrent network is devised to model the dependencies between items within each channel for a specific purpose. The experimental results show the superiority of MCPRN over the state-of-the-art methods in terms of both recommendation accuracy and diversity.
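
The idea of routing each item to purpose channels can be sketched as a soft assignment. The abstract does not specify how the routing network scores purposes, so the dot-product scoring, the softmax normalization, and the names `route_item` and `purpose_vecs` are assumptions for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_item(item_vec, purpose_vecs):
    """Return one weight per channel; a higher weight means a stronger match.

    Hypothetical scoring: dot product between the item embedding and each
    purpose embedding, normalized with softmax.
    """
    scores = [sum(a * b for a, b in zip(item_vec, p)) for p in purpose_vecs]
    return softmax(scores)
```

With such weights, an item like bread would contribute mostly to a breakfast channel and only weakly to a decoration channel, so each channel's recurrent model sees a purpose-specific view of the session.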

Cross-Modality Paired-Images Generation for RGB-Infrared Person Re-Identification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6894 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12144-12151

Author(s):

Guan-An Wang ◽

Tianzhu Zhang ◽

Yang Yang ◽

Jian Cheng ◽

Jianlong Chang ◽

...

Keyword(s):

State Of The Art ◽

Experimental Results ◽

Fine Grained ◽

Invariant Features ◽

Proposed Model ◽

Art Methods ◽

Conventional Methods

RGB-Infrared (IR) person re-identification is very challenging due to the large cross-modality variations between RGB and IR images. The key solution is to learn aligned features to bridge the RGB and IR modalities. However, due to the lack of correspondence labels between every pair of RGB and IR images, most methods try to alleviate the variations with set-level alignment by reducing the distance between the entire RGB and IR sets. This set-level alignment may lead to misalignment of some instances, which limits the performance of RGB-IR Re-ID. Different from existing methods, in this paper, we propose to generate cross-modality paired images and perform both global set-level and fine-grained instance-level alignments. Our proposed method enjoys several merits. First, our method can perform set-level alignment by disentangling modality-specific and modality-invariant features. Compared with conventional methods, ours can explicitly remove the modality-specific features, so the modality variation can be better reduced. Second, given cross-modality unpaired images of a person, our method can generate cross-modality paired images from exchanged images. With them, we can directly perform instance-level alignment by minimizing the distance between every pair of images. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favourably against state-of-the-art methods. In particular, on the SYSU-MM01 dataset, our model achieves gains of 9.2% and 7.7% in terms of Rank-1 and mAP, respectively. Code is available at https://github.com/wangguanan/JSIA-ReID.
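
The difference between set-level and instance-level alignment can be made concrete with a toy sketch. It is not taken from the linked repository: comparing set means is only one simple stand-in for a set-level distance, and the function names and Euclidean distance are assumptions.

```python
import math

def set_level_distance(rgb_feats, ir_feats):
    """Distance between the mean RGB feature and the mean IR feature."""
    dim = len(rgb_feats[0])
    rgb_mean = [sum(f[d] for f in rgb_feats) / len(rgb_feats) for d in range(dim)]
    ir_mean = [sum(f[d] for f in ir_feats) / len(ir_feats) for d in range(dim)]
    return math.dist(rgb_mean, ir_mean)

def instance_level_distance(rgb_feats, ir_feats):
    """Mean distance between paired features; pair i is (rgb_feats[i], ir_feats[i])."""
    pairs = list(zip(rgb_feats, ir_feats))
    return sum(math.dist(r, i) for r, i in pairs) / len(pairs)
```

For example, with `rgb = [[0, 0], [2, 2]]` and `ir = [[2, 2], [0, 0]]` the two set means coincide, so the set-level distance is zero even though every individual pair is far apart. Paired images make the second quantity available as a training signal.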

Panoptic Segmentation-Based Attention for Image Captioning

Applied Sciences ◽

10.3390/app10010391 ◽

2020 ◽

Vol 10 (1) ◽

pp. 391

Author(s):

Wenjie Cai ◽

Zheng Xiong ◽

Xianfang Sun ◽

Paul L. Rosin ◽

Longcun Jin ◽

...

Keyword(s):

Main Part ◽

State Of The Art ◽

Image Representation ◽

Experimental Results ◽

Competitive Performance ◽

Image Captioning ◽

Feature Vectors ◽

Fine Grained ◽

Art Methods

Image captioning is the task of generating textual descriptions of images. In order to obtain a better image representation, attention mechanisms have been widely adopted in image captioning. However, in existing models with detection-based attention, the rectangular attention regions are not fine-grained, as they contain irrelevant regions (e.g., background or overlapped regions) around the object, making the model generate inaccurate captions. To address this issue, we propose panoptic segmentation-based attention that performs attention at the mask level (i.e., the shape of the main part of an instance). Our approach extracts feature vectors from the corresponding segmentation regions, which are more fine-grained than the regions used by current attention mechanisms. Moreover, in order to process features of different classes independently, we propose a dual-attention module that is generic and can be applied to other frameworks. Experimental results showed that our model could recognize overlapped objects and understand the scene better. Our approach achieved competitive performance against state-of-the-art methods. We made our code available.
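
Extracting a feature vector from a segmentation region rather than a rectangle can be sketched as masked average pooling. The abstract does not state which pooling operator the authors use, so averaging, the nested-list layout (height x width x channels), and the function name are assumptions.

```python
def masked_average_pool(feature_map, mask):
    """Average the feature vectors at positions where the mask is truthy.

    feature_map: list of rows, each a list of per-pixel channel lists.
    mask: list of rows of 0/1 values with the same height and width.
    """
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    count = 0
    for row_feats, row_mask in zip(feature_map, mask):
        for feat, inside in zip(row_feats, row_mask):
            if inside:
                count += 1
                for c in range(channels):
                    sums[c] += feat[c]
    if count == 0:
        return sums  # empty mask: return the zero vector
    return [s / count for s in sums]
```

Passing an all-ones mask over the same window reproduces box-style pooling, which mixes in background pixels; a segmentation mask leaves them out.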

Adaptive Graph Guided Embedding for Multi-label Annotation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/388 ◽

2018 ◽

Cited By ~ 8

Author(s):

Lichen Wang ◽

Zhengming Ding ◽

Yun Fu

Keyword(s):

Large Scale ◽

State Of The Art ◽

Unlabeled Data ◽

Label Propagation ◽

Experimental Results ◽

Training Data ◽

Learning Performance ◽

Intrinsic Structure ◽

Latent Space ◽

Art Methods

Multi-label annotation is challenging since a large amount of well-labeled training data is required to achieve promising performance. However, providing such data is expensive, while unlabeled data are widely available. To this end, we propose a novel Adaptive Graph Guided Embedding (AG2E) approach for multi-label annotation in a semi-supervised fashion, which utilizes limited labeled data together with large-scale unlabeled data to improve learning performance. Specifically, a multi-label propagation scheme and an effective embedding are jointly learned to seek a latent space where unlabeled instances tend to be correctly assigned multiple labels. Furthermore, a locality structure regularizer is designed to preserve the intrinsic structure and enhance the multi-label annotation. We evaluate our model in both conventional multi-label learning and zero-shot learning scenarios. Experimental results demonstrate that our approach outperforms the other state-of-the-art methods compared.
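
To give a feel for label propagation in general, the sketch below runs classic iterative propagation on a fixed graph. This is a deliberately simpler technique than AG2E, which learns the graph and the embedding jointly; the update rule, the row normalization, and the `alpha` and `steps` parameters are assumptions chosen for illustration.

```python
def propagate_labels(weights, labels, alpha=0.8, steps=50):
    """Iterate scores = alpha * (normalized weights @ scores) + (1 - alpha) * labels.

    weights: square affinity matrix as nested lists.
    labels: one row per node, one column per label; unlabeled nodes are all zeros.
    """
    n = len(weights)
    k = len(labels[0])
    # Row-normalize so that each node averages over its neighbours.
    norm = []
    for row in weights:
        total = sum(row)
        norm.append([w / total if total else 0.0 for w in row])
    scores = [list(row) for row in labels]
    for _ in range(steps):
        scores = [
            [alpha * sum(norm[i][j] * scores[j][c] for j in range(n))
             + (1 - alpha) * labels[i][c]
             for c in range(k)]
            for i in range(n)
        ]
    return scores
```

An unlabeled node ends up with higher scores for the labels of the neighbours it is most strongly connected to, and because the scores are kept per label, a node can receive several labels at once.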

Reliable Memory Model for Visual Tracking

Electronics ◽

10.3390/electronics10202488 ◽

2021 ◽

Vol 10 (20) ◽

pp. 2488

Author(s):

Daohui Ge ◽

Ruyi Liu ◽

Yunan Li ◽

Qiguang Miao

Keyword(s):

Visual Tracking ◽

State Of The Art ◽

Experimental Results ◽

Memory Model ◽

Background Information ◽

Evaluation Strategy ◽

Active Memory ◽

Training Samples ◽

Art Performance ◽

Similarity Distance

Effectively learning the appearance change of a target is the key point of an online tracker. When occlusion and misalignment occur, the tracking results usually contain a great amount of background information, which heavily affects the ability of a tracker to distinguish between targets and backgrounds, eventually leading to tracking failure. To solve this problem, we propose a simple and robust reliable memory model. In particular, an adaptive evaluation strategy (AES) is proposed to assess the reliability of tracking results. AES combines the confidence of the tracker predictions with the similarity distance between the current predicted result and the existing tracking results. Based on the reliable results selected by AES, we designed an active–frozen memory model to store reliable results. Training samples stored in active memory are used to update the tracker, while frozen memory temporarily stores inactive samples. The active–frozen memory model maintains the diversity of samples while satisfying the storage limit. We performed comprehensive experiments on five benchmarks: OTB-2013, OTB-2015, UAV123, Temple-color-128, and VOT2016. The experimental results show that our tracker achieves state-of-the-art performance.
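
The interplay between a reliability score and a two-tier memory can be sketched as follows. The abstract does not give the AES formula or the memory update rules, so everything here is hypothetical: the exponential discount, the threshold, the first-in-first-out move from active to frozen memory, and all names.

```python
import math
from collections import deque

def reliability(confidence, feature, stored_features):
    """Hypothetical score: confidence discounted by mean distance to stored results."""
    if not stored_features:
        return confidence
    mean_dist = sum(math.dist(feature, s) for s in stored_features) / len(stored_features)
    return confidence * math.exp(-mean_dist)

class ActiveFrozenMemory:
    """Bounded active memory; evicted samples move to a bounded frozen memory."""

    def __init__(self, active_size=3, frozen_size=3, threshold=0.5):
        self.active = deque()
        self.frozen = deque(maxlen=frozen_size)  # oldest frozen sample is dropped when full
        self.active_size = active_size
        self.threshold = threshold

    def add(self, confidence, feature):
        """Store the sample if it is judged reliable; return whether it was stored."""
        if reliability(confidence, feature, list(self.active)) < self.threshold:
            return False
        if len(self.active) == self.active_size:
            self.frozen.append(self.active.popleft())
        self.active.append(feature)
        return True
```

In this sketch a confident prediction that looks very different from the stored samples (as can happen under occlusion) is rejected, so it never reaches the samples used to update the tracker.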
