Fine-Grained Fashion Similarity Learning by Attribute-Specific Embedding Network

2020 ◽  
Vol 34 (07) ◽  
pp. 11741-11748 ◽  
Author(s):  
Zhe Ma ◽  
Jianfeng Dong ◽  
Zhongzi Long ◽  
Yao Zhang ◽  
Yuan He ◽  
...  

This paper strives to learn fine-grained fashion similarity. In this similarity paradigm, one should pay more attention to the similarity in terms of a specific design/attribute among fashion items, which has potential value in many fashion-related applications such as fashion copyright protection. To this end, we propose an Attribute-Specific Embedding Network (ASEN) that jointly learns multiple attribute-specific embeddings in an end-to-end manner and thus measures the fine-grained similarity in the corresponding space. With two attention modules, i.e., Attribute-aware Spatial Attention and Attribute-aware Channel Attention, ASEN is able to locate the related regions and capture the essential patterns under the guidance of the specified attribute, thus making the learned attribute-specific embeddings better reflect the fine-grained similarity. Extensive experiments on four fashion-related datasets show the effectiveness of ASEN for fine-grained fashion similarity learning and its potential for fashion reranking. Code and data are available at https://github.com/Maryeon/asen.
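The idea behind attribute-aware channel attention can be pictured with a toy sketch: feature channels are re-weighted by their affinity to an embedding of the queried attribute, so the same feature vector yields different representations for different attributes. This is a minimal pure-Python illustration, not the paper's implementation; `attribute_channel_attention`, the dot-product scoring, and the single-position simplification are all assumptions for exposition.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attribute_channel_attention(features, attr_embedding):
    """Gate each feature channel by its affinity to the attribute embedding.

    features: list of C channel activations (one spatial position, for brevity).
    attr_embedding: list of C weights describing the queried attribute.
    Returns the attribute-modulated feature vector.
    """
    # affinity of each channel to the attribute, then normalized to weights
    scores = [f * a for f, a in zip(features, attr_embedding)]
    weights = softmax(scores)
    # re-weight the original channels
    return [w * f for w, f in zip(weights, features)]
```

Querying with a different attribute embedding shifts the weight mass to different channels, which is the mechanism by which one backbone can serve several attribute-specific embedding spaces.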

eLife ◽  
2016 ◽  
Vol 5 ◽  
Author(s):  
Tao Yao ◽  
Madhura Ketkar ◽  
Stefan Treue ◽  
B Suresh Krishna

Maintaining attention at a task-relevant spatial location while making eye-movements necessitates a rapid, saccade-synchronized shift of attentional modulation from the neuronal population representing the task-relevant location before the saccade to the one representing it after the saccade. Currently, the precise time at which spatial attention becomes fully allocated to the task-relevant location after the saccade remains unclear. Using a fine-grained temporal analysis of human peri-saccadic detection performance in an attention task, we show that spatial attention is fully available at the task-relevant location within 30 milliseconds after the saccade. Subjects tracked the attentional target veridically throughout our task: i.e. they almost never responded to non-target stimuli. Spatial attention and saccadic processing therefore co-ordinate well to ensure that relevant locations are attentionally enhanced soon after the beginning of each eye fixation.
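The fine-grained temporal analysis amounts to binning detection outcomes by probe time relative to saccade offset and inspecting when the hit rate recovers. The sketch below is illustrative only; `peri_saccadic_hit_rate`, the trial format, and the 30 ms bin width are assumptions, not the authors' analysis code.

```python
import math

def peri_saccadic_hit_rate(trials, bin_ms=30):
    """Bin detection outcomes by probe time relative to saccade offset.

    trials: list of (t_rel_ms, hit), where t_rel_ms is probe onset relative
    to the end of the saccade (negative = probe before the saccade ends) and
    hit is True if the target change was detected.
    Returns {bin_start_ms: hit_rate}.
    """
    bins = {}
    for t, hit in trials:
        b = math.floor(t / bin_ms) * bin_ms  # left edge of the time bin
        n, k = bins.get(b, (0, 0))
        bins[b] = (n + 1, k + (1 if hit else 0))
    return {b: k / n for b, (n, k) in bins.items()}
```

A hit rate already near its asymptote in the first post-saccadic bin would correspond to the paper's finding that attention is fully available within 30 ms of the saccade.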


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Shaoqi Hou ◽  
Chunhui Liu ◽  
Kangning Yin ◽  
Yiyin Ding ◽  
Zhiguo Wang ◽  
...  

Person Re-identification (Re-ID) aims to match the same pedestrian across different times and places. Under cross-camera conditions, different pedestrians can appear highly similar, and matching on global pedestrian features alone often performs poorly. To address these problems, we designed a Spatial Attention Network Guided by Attribute Label (SAN-GAL), a dual-trace network containing both attribute classification and Re-ID. Different from the previous approach of simply adding an attribute binary-classification branch, our SAN-GAL proceeds in two connected steps. First, with attribute labels as guidance, we generate an Attribute Attention Heat map (AAH) through the Grad-CAM algorithm to accurately locate fine-grained attribute areas of pedestrians. Then, the Attribute Spatial Attention Module (ASAM) is constructed from the AAH, which serves as prior knowledge and is introduced into the Re-ID network to assist the discrimination of the Re-ID task. In particular, our SAN-GAL network can integrate the local attribute information and global ID information of pedestrians without introducing additional attribute region annotation, which gives it good flexibility and adaptability. Test results on Market1501 and DukeMTMC-reID show that SAN-GAL achieves 85.8% Rank-1 accuracy on the DukeMTMC-reID dataset, which is clearly competitive with most Re-ID algorithms.
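The Grad-CAM step used to build the attribute heat map can be sketched as follows: each channel is weighted by the spatial average of the gradient of the attribute score with respect to that channel, and the heat map is the ReLU of the weighted channel sum. This is a minimal pure-Python sketch of the standard Grad-CAM formulation, not SAN-GAL's code; `grad_cam_heatmap` is hypothetical and omits the upsampling and normalization a full pipeline would apply.

```python
def grad_cam_heatmap(activations, gradients):
    """Grad-CAM-style heat map from feature maps and their gradients.

    activations: list of C feature maps, each an HxW list of lists.
    gradients: same shape; gradients of the attribute score w.r.t. activations.
    Returns an HxW heat map highlighting attribute-relevant regions.
    """
    C = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    # channel weight = global-average-pooled gradient for that channel
    weights = [sum(sum(row) for row in g) / (H * W) for g in gradients]
    # ReLU of the weighted sum over channels at each spatial position
    heat = [[max(0.0, sum(weights[c] * activations[c][i][j] for c in range(C)))
             for j in range(W)]
            for i in range(H)]
    return heat
```

In SAN-GAL's setting, the "score" is an attribute classifier's logit, so the resulting map localizes the attribute region without any region-level annotation.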


Author(s):  
Zhu Zhang ◽  
Zhou Zhao ◽  
Zhijie Lin ◽  
Jingkuan Song ◽  
Deng Cai

Action localization in untrimmed videos is an important topic in the field of video understanding. However, existing action localization methods are restricted to a pre-defined set of actions and cannot localize unseen activities. Thus, we consider a new task to localize unseen activities in videos via image queries, named Image-Based Activity Localization. This task faces three inherent challenges: (1) how to eliminate the influence of semantically inessential contents in image queries; (2) how to deal with the fuzzy localization of inaccurate image queries; (3) how to determine the precise boundaries of target segments. We then propose a novel self-attention interaction localizer to retrieve unseen activities in an end-to-end fashion. Specifically, we first devise a region self-attention method with relative position encoding to learn fine-grained image region representations. Then, we employ a local transformer encoder to build multi-step fusion and reasoning of image and video contents. We next adopt an order-sensitive localizer to directly retrieve the target segment. Furthermore, we construct a new dataset ActivityIBAL by reorganizing the ActivityNet dataset. The extensive experiments show the effectiveness of our method.
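The region self-attention step can be illustrated with a single-head, scaled dot-product sketch in which a relative-position term is added to the attention logits. This is a simplified stand-in for the paper's method: `region_self_attention` is hypothetical, regions serve as queries, keys, and values at once, and `rel_bias` is a fixed matrix standing in for the learned relative position encoding.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def region_self_attention(regions, rel_bias):
    """Single-head self-attention over N image-region feature vectors.

    regions: list of N feature vectors (lists of floats), shared as
    queries, keys, and values for brevity.
    rel_bias: NxN additive logit bias encoding relative region positions.
    Returns the N attended region representations.
    """
    d = len(regions[0])
    out = []
    for i, q in enumerate(regions):
        # scaled dot-product logits plus the relative-position bias
        logits = [sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d)
                  + rel_bias[i][j]
                  for j, k in enumerate(regions)]
        attn = softmax(logits)
        # attention-weighted sum of the value vectors
        out.append([sum(a * v[t] for a, v in zip(attn, regions))
                    for t in range(d)])
    return out
```

Each output vector mixes information from all regions, which is what lets the localizer suppress semantically inessential image content before fusing with the video stream.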


Author(s):  
Raphael V. Rosa ◽  
Christian Esteve Rothenberg

Towards end-to-end network slicing, the diverse envisioned 5G services (e.g., augmented reality, vehicular communications, IoT) call for advanced multi-administrative-domain service deployments, opening challenges in vertical Service Level Agreement (SLA)-based orchestration. Through different proposed methodologies and demonstrated prototypes, this work showcases: the automated extraction of network function profiles; the manner in which such profiles compose programmable network slice footprints; and the means to perform fine-grained, auditable SLAs for end-to-end network slicing among multiple administrative domains. Grounded in state-of-the-art networking concepts, this work presents contributions rooted in standardization efforts and best-of-breed open-source embodiments, each one pointing to prominent future work topics through its shortcomings.
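The fine-grained SLA auditing idea can be pictured as checking per-domain slice measurements against the agreed targets and reporting every violation with its responsible domain. This is a toy sketch, not the work's prototype; `audit_sla`, the metric names, and the dictionary formats are all illustrative assumptions.

```python
def audit_sla(measurements, sla):
    """Check per-domain slice measurements against SLA targets.

    measurements: {domain: {metric: measured_value}}
    sla: {metric: (operator, target)} where operator is '<=' or '>='.
    Returns a list of (domain, metric, value) violations; an empty list
    means the end-to-end slice currently satisfies the agreement.
    """
    violations = []
    for domain, metrics in measurements.items():
        for metric, value in metrics.items():
            if metric not in sla:
                continue  # metric not covered by the agreement
            op, target = sla[metric]
            ok = value <= target if op == '<=' else value >= target
            if not ok:
                violations.append((domain, metric, value))
    return violations
```

Attributing each violation to a domain is the point of the multi-administrative-domain setting: an end-to-end latency breach only becomes auditable once it can be traced to the segment that caused it.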


Author(s):  
Qiuxia Lai ◽  
Yu Li ◽  
Ailing Zeng ◽  
Minhao Liu ◽  
Hanqiu Sun ◽  
...  

The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information to reach visual awareness for perceiving natural scenes, allowing near real-time information processing with limited computational capacity. This kind of selectivity acts as an ‘Information Bottleneck (IB)’, which seeks a trade-off between information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanism for deep neural networks (DNNs). In this paper, we propose an IB-inspired spatial attention module for DNN structures built for visual recognition. The module takes as input an intermediate representation of the input image, and outputs a variational 2D attention map that minimizes the mutual information (MI) between the attention-modulated representation and the input, while maximizing the MI between the attention-modulated representation and the task label. To further restrict the information bypassed by the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training. Extensive experiments show that the proposed IB-inspired spatial attention mechanism can yield attention maps that neatly highlight the regions of interest while suppressing backgrounds, and bootstrap standard DNN structures for visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The attention maps are interpretable for the decision making of the DNNs as verified in the experiments. Our code is available at this https URL.
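The anchor-quantization step can be illustrated in isolation: each continuous attention score is snapped to the nearest value in a small anchor set, restricting how much information the attention map can carry. This sketch is a simplification; in the paper the anchors are learnable and the snapping must stay differentiable during training, whereas `quantize_to_anchors` here uses a fixed anchor set and a hard nearest-neighbor rule.

```python
def quantize_to_anchors(attention, anchors):
    """Snap continuous attention scores to the nearest anchor value.

    attention: 2D attention map, a list of lists of floats in [0, 1].
    anchors: the small set of values the scores are restricted to.
    Returns the quantized attention map.
    """
    return [[min(anchors, key=lambda a: abs(a - s)) for s in row]
            for row in attention]
```

With, say, three anchors the map can encode at most log2(3) bits per position, which is exactly the kind of cap on bypassed information the IB formulation calls for.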

