Siamese anchor-free object tracking with multiscale spatial attentions

2021, Vol 11 (1)
Author(s): Jianming Zhang, Benben Huang, Zi Ye, Li-Dan Kuang, Xin Ning

Recently, object trackers based on Siamese networks have attracted considerable attention due to their remarkable tracking performance and widespread applicability. In particular, anchor-based methods exploit a region proposal subnetwork to obtain accurate target predictions and achieve substantial performance improvements. However, these trackers cannot capture spatial information well, and the predefined anchors hinder robustness. To solve these problems, we propose a Siamese-based anchor-free object tracking algorithm with multiscale spatial attentions in this paper. Firstly, we take ResNet-50 as the backbone network to generate multiscale features of both the template patch and the search region. Secondly, we propose a spatial attention extraction (SAE) block to capture the spatial information among all positions in the template and search-region feature maps. Thirdly, we feed these features into the SAE block to obtain multiscale spatial attentions. Finally, an anchor-free classification and regression subnetwork is used to predict the location of the target. Unlike anchor-based methods, our tracker directly predicts the target position without predefined anchor parameters. Extensive experiments against state-of-the-art trackers are carried out on four challenging visual object tracking benchmarks: OTB100, UAV123, VOT2016 and GOT-10k. The experimental results confirm the effectiveness of the proposed tracker.
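For illustration, below is a minimal sketch of a spatial-attention block of the kind the SAE block described above might compute, attending over all spatial positions of a feature map. The non-local-style formulation, layer names, and tensor shapes are assumptions for the sketch, not the authors' released implementation.

```python
# Sketch only: a spatial-attention block that relates all positions of a feature map.
import torch
import torch.nn as nn


class SpatialAttentionExtraction(nn.Module):
    """Computes attention over all spatial positions of a feature map (assumed design)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).permute(0, 2, 1)   # (B, HW, C/r)
        k = self.key(x).flatten(2)                       # (B, C/r, HW)
        attn = self.softmax(q @ k)                       # (B, HW, HW) pairwise weights
        v = self.value(x).flatten(2)                     # (B, C, HW)
        out = (v @ attn.permute(0, 2, 1)).view(b, c, h, w)
        return out + x                                   # residual connection


if __name__ == "__main__":
    feat = torch.randn(1, 256, 31, 31)   # e.g. a search-region feature map
    print(SpatialAttentionExtraction(256)(feat).shape)   # torch.Size([1, 256, 31, 31])
```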

2020, Vol 8 (1), pp. 35-46
Author(s): Yongpeng Zhao, Lasheng Yu, Xiaopeng Zheng

Siamese networks have drawn increasing interest in the field of visual object tracking due to their balance of precision and efficiency. However, Siamese trackers use relatively shallow backbone networks, such as AlexNet, and therefore do not take full advantage of the capabilities of modern deep convolutional neural networks (CNNs). Moreover, the feature representations of the target object in a Siamese tracker are extracted from the last layer of the CNN and mainly capture semantic information, which makes the tracker's precision relatively low and causes it to drift easily in the presence of similar distractors. In this paper, a new nonpadding residual unit (NPRU) is designed and used to stack a 22-layer deep ResNet, referred to as ResNet22. Using ResNet22 as the backbone network, we can build a deep Siamese network that greatly enhances tracking performance. Because different levels of CNN feature maps represent different aspects of the target object, we aggregate several deep convolutional layers to exploit ResNet22's multilevel feature maps, which form hyperfeature representations of the target. The designed deep hyper Siamese network is named DHSiam. Experimental results show that DHSiam achieves significant improvements on multiple benchmark datasets.
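A minimal sketch of a residual unit without zero-padding, in the spirit of the NPRU described above, is shown below; the exact layer ordering and the center-crop used to match spatial sizes are assumptions rather than the authors' released code.

```python
# Sketch only: a residual unit whose 3x3 convolution uses no padding.
import torch
import torch.nn as nn


class NoPaddingResidualUnit(nn.Module):
    """Residual unit without zero-padding; the shortcut is center-cropped to match."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=0, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.body(x)                 # spatial size shrinks by 2 (no padding)
        identity = x[:, :, 1:-1, 1:-1]     # center-crop the shortcut to match
        return self.relu(out + identity)


if __name__ == "__main__":
    print(NoPaddingResidualUnit(64)(torch.randn(1, 64, 24, 24)).shape)  # (1, 64, 22, 22)
```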


Author(s): Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, ...

2018, Vol 77 (17), pp. 22131-22143
Author(s): Longchao Yang, Peilin Jiang, Fei Wang, Xuan Wang

Electronics, 2019, Vol 8 (10), pp. 1084
Author(s): Dong-Hyun Lee

The visual object tracking problem seeks to track an arbitrary object in a video, and many deep convolutional neural network-based algorithms have achieved significant performance improvements in recent years. However, most of them do not guarantee real-time operation because of the large computational overhead of deep feature extraction. This paper presents a single-crop visual object tracking algorithm based on the fully convolutional Siamese network (SiamFC). The proposed algorithm significantly reduces the computational burden by extracting multiple scale feature maps from a single image crop. Experimental results show that the proposed algorithm runs substantially faster than SiamFC.
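The abstract does not spell out how multiple scale responses are derived from one crop, so the sketch below only illustrates the general idea of avoiding repeated backbone passes: the search image is embedded once and scale handling is done on the feature/response side. The interpolation-based scheme and all names are assumptions, not the paper's exact method.

```python
# Sketch only: one backbone pass for the search crop, scale search via resampled features.
import torch
import torch.nn.functional as F


def cross_correlation(z_feat: torch.Tensor, x_feat: torch.Tensor) -> torch.Tensor:
    """SiamFC-style score map: template features act as a correlation kernel."""
    return F.conv2d(x_feat, z_feat)


def single_crop_multiscale(z_feat, x_feat, scales=(0.9, 1.0, 1.1)):
    """Scale responses from a single search-crop embedding (assumed scheme)."""
    responses = []
    for s in scales:
        size = [max(1, round(d * s)) for d in z_feat.shape[-2:]]
        z_s = F.interpolate(z_feat, size=size, mode="bilinear", align_corners=False)
        responses.append(cross_correlation(z_s, x_feat))
    return responses  # one score map per scale, no extra backbone passes


if __name__ == "__main__":
    z = torch.randn(1, 256, 6, 6)      # template embedding
    x = torch.randn(1, 256, 22, 22)    # search embedding from a single crop
    for r in single_crop_multiscale(z, x):
        print(r.shape)
```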


Complexity, 2021, Vol 2021, pp. 1-16
Author(s): Jinping Sun

The target and background change continuously during long-term tracking, which makes accurate target prediction challenging. Correlation filter algorithms based on handcrafted features struggle to meet practical needs because of their limited feature representation ability. Thus, to improve tracking performance and robustness, an improved hierarchical convolutional feature model is incorporated into a correlation filter framework for visual object tracking. First, the objective function is designed with lasso regression modeling, and a sparse, time-series low-rank filter is learned to increase the interpretability of the model. Second, features from the last convolutional layer and the second pooling layer of the convolutional neural network are extracted to predict the target position from coarse to fine. In addition, response maps are calculated with the filters learned from the first frame and from the current frame, respectively, and the target position is obtained by finding the maximum value in the response map. The filter model is updated only when both maximum responses meet a threshold condition. The proposed tracker is evaluated on the TC-128 and OTB2015 benchmarks, which include more than 100 video sequences. Extensive experiments demonstrate that the proposed tracker achieves competitive performance against state-of-the-art trackers. The distance precision rate and overlap success rate of the proposed algorithm on OTB2015 are 0.829 and 0.695, respectively. The proposed algorithm effectively addresses long-term object tracking in complex scenes.
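A minimal sketch of the dual-response update rule described above follows: responses are computed with the first-frame filter and the current filter, the target position is taken at the peak, and the model is updated only when both peaks clear a threshold. Filter learning itself (the lasso-regularized, low-rank objective) is omitted, and the FFT-based correlation form, names, and threshold value are assumptions.

```python
# Sketch only: coarse update gating from two correlation response maps.
import numpy as np


def response_map(filt: np.ndarray, feat: np.ndarray) -> np.ndarray:
    """Correlation response in the Fourier domain (standard DCF inference step)."""
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(filt)) * np.fft.fft2(feat)))


def track_step(filt_first, filt_curr, feat, threshold=0.25):
    r_first = response_map(filt_first, feat)
    r_curr = response_map(filt_curr, feat)
    # Target position: peak of the current-filter response map.
    position = np.unravel_index(np.argmax(r_curr), r_curr.shape)
    # Update the filter model only when both peaks are confident enough.
    do_update = r_first.max() >= threshold and r_curr.max() >= threshold
    return position, do_update


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f0, f, x = (rng.standard_normal((32, 32)) for _ in range(3))
    print(track_step(f0, f, x))
```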


Electronics, 2020, Vol 9 (5), pp. 854
Author(s): Yuxiang Yang, Weiwei Xing, Shunli Zhang, Qi Yu, Xiaoyu Guo, ...

Visual object tracking by Siamese networks has achieved favorable performance in accuracy and speed. However, the features used in Siamese networks contain spatially redundant information, which increases computation and limits the networks' discriminative ability. To address this issue, we present a novel frequency-aware feature (FAF) method for robust visual object tracking in complex scenes. Unlike previous works, which select features from different channels or layers, the proposed method factorizes the feature map into multiple frequencies and reduces the spatially redundant low-frequency information. By reducing the resolution of the low-frequency map, computation is saved and the receptive field of the layer is enlarged, yielding more discriminative information. To further improve the performance of the FAF, we design a data-independent augmentation for object tracking that improves the tracker's discriminative ability by enhancing linear relations among training samples through convex combinations of images and their labels. Finally, a joint judgment strategy that combines intersection-over-union (IoU) and classification scores is proposed to adjust the bounding-box result and improve tracking accuracy. Extensive experiments on five challenging benchmarks demonstrate that our FAF method performs favorably against state-of-the-art tracking methods while running at around 45 frames per second.
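Below is a minimal sketch of a mixup-style augmentation of the kind the abstract describes (convex combinations of training images and their labels); the Beta-distributed mixing weight, tensor layout, and names are assumptions rather than the authors' exact recipe.

```python
# Sketch only: blend each training sample with a random partner from the same batch.
import torch


def convex_mix(images: torch.Tensor, labels: torch.Tensor, alpha: float = 0.4):
    """Convex combination of images and labels within a batch (assumed scheme)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels


if __name__ == "__main__":
    imgs = torch.rand(8, 3, 127, 127)   # a batch of template patches
    lbls = torch.rand(8, 1, 17, 17)     # soft classification maps
    mixed_imgs, mixed_lbls = convex_mix(imgs, lbls)
    print(mixed_imgs.shape, mixed_lbls.shape)
```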


2020, pp. 107698
Author(s): Shiyu Xuan, Shengyang Li, Zifei Zhao, Longxuan Kou, Zhuang Zhou, ...
