Siamese anchor-free object tracking with multiscale spatial attentions

2021, Vol 11 (1)
Author(s): Jianming Zhang, Benben Huang, Zi Ye, Li-Dan Kuang, Xin Ning

Recently, object trackers based on Siamese networks have attracted considerable attention due to their remarkable tracking performance and widespread applicability. In particular, anchor-based methods exploit a region proposal subnetwork to obtain accurate target predictions and achieve substantial performance improvements. However, these trackers cannot capture spatial information well, and the predefined anchors hinder robustness. To solve these problems, we propose a Siamese-based anchor-free object tracking algorithm with multiscale spatial attentions in this paper. Firstly, we take ResNet-50 as the backbone network to generate multiscale features of both the template patch and the search region. Secondly, we propose a spatial attention extraction (SAE) block to capture the spatial information among all positions in the template and search-region feature maps. Thirdly, we feed these features into the SAE block to obtain multiscale spatial attentions. Finally, an anchor-free classification and regression subnetwork is used to predict the location of the target. Unlike anchor-based methods, our tracker directly predicts the target position without predefined anchor parameters. Extensive experiments against state-of-the-art trackers are carried out on four challenging visual object tracking benchmarks: OTB100, UAV123, VOT2016 and GOT-10k. The experimental results confirm the effectiveness of the proposed tracker.
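For illustration, below is a minimal sketch of a spatial-attention block of the kind the SAE block described above might compute, attending over all spatial positions of a feature map. The non-local-style formulation, layer names, and tensor shapes are assumptions for the sketch, not the authors' released implementation.

```python
# Sketch only: a spatial-attention block that relates all positions of a feature map.
import torch
import torch.nn as nn


class SpatialAttentionExtraction(nn.Module):
    """Computes attention over all spatial positions of a feature map (assumed design)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).permute(0, 2, 1)   # (B, HW, C/r)
        k = self.key(x).flatten(2)                       # (B, C/r, HW)
        attn = self.softmax(q @ k)                       # (B, HW, HW) pairwise weights
        v = self.value(x).flatten(2)                     # (B, C, HW)
        out = (v @ attn.permute(0, 2, 1)).view(b, c, h, w)
        return out + x                                   # residual connection


if __name__ == "__main__":
    feat = torch.randn(1, 256, 31, 31)   # e.g. a search-region feature map
    print(SpatialAttentionExtraction(256)(feat).shape)   # torch.Size([1, 256, 31, 31])
```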

2020, Vol 8 (1), pp. 35-46
Author(s): Yongpeng Zhao, Lasheng Yu, Xiaopeng Zheng

Siamese networks have drawn increasing interest in the field of visual object tracking due to their balance of precision and efficiency. However, Siamese trackers use relatively shallow backbone networks, such as AlexNet, and therefore do not take full advantage of the capabilities of modern deep convolutional neural networks (CNNs). Moreover, the feature representations of the target object in a Siamese tracker are extracted from the last layer of the CNN and mainly capture semantic information, which makes the tracker's precision relatively low and causes it to drift easily in the presence of similar distractors. In this paper, a new nonpadding residual unit (NPRU) is designed and used to stack a 22-layer deep ResNet, referred to as ResNet22. Using ResNet22 as the backbone network, we can build a deep Siamese network that greatly enhances tracking performance. Because different levels of CNN feature maps represent different aspects of the target object, we aggregate several deep convolutional layers to exploit ResNet22's multilevel feature maps, which form hyperfeature representations of the target. The designed deep hyper Siamese network is named DHSiam. Experimental results show that DHSiam achieves significant improvements on multiple benchmark datasets.
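A minimal sketch of a residual unit without zero-padding, in the spirit of the NPRU described above, is shown below; the exact layer ordering and the center-crop used to match spatial sizes are assumptions rather than the authors' released code.

```python
# Sketch only: a residual unit whose 3x3 convolution uses no padding.
import torch
import torch.nn as nn


class NoPaddingResidualUnit(nn.Module):
    """Residual unit without zero-padding; the shortcut is center-cropped to match."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=0, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.body(x)                 # spatial size shrinks by 2 (no padding)
        identity = x[:, :, 1:-1, 1:-1]     # center-crop the shortcut to match
        return self.relu(out + identity)


if __name__ == "__main__":
    print(NoPaddingResidualUnit(64)(torch.randn(1, 64, 24, 24)).shape)  # (1, 64, 22, 22)
```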


Author(s): Zheng Zhu, Qiang Wang, Bo Li, Wei Wu, Junjie Yan, ...

2018, Vol 77 (17), pp. 22131-22143
Author(s): Longchao Yang, Peilin Jiang, Fei Wang, Xuan Wang

Electronics, 2019, Vol 8 (10), pp. 1084
Author(s): Dong-Hyun Lee

The visual object tracking problem seeks to track an arbitrary object in a video, and many deep convolutional neural network-based algorithms have achieved significant performance improvements in recent years. However, most of them do not guarantee real-time operation because of the large computational overhead of deep feature extraction. This paper presents a single-crop visual object tracking algorithm based on the fully convolutional Siamese network (SiamFC). The proposed algorithm significantly reduces the computational burden by extracting multiple scale feature maps from a single image crop. Experimental results show that the proposed algorithm runs substantially faster than SiamFC.
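The abstract does not spell out how multiple scale responses are derived from one crop, so the sketch below only illustrates the general idea of avoiding repeated backbone passes: the search image is embedded once and scale handling is done on the feature/response side. The interpolation-based scheme and all names are assumptions, not the paper's exact method.

```python
# Sketch only: one backbone pass for the search crop, scale search via resampled features.
import torch
import torch.nn.functional as F


def cross_correlation(z_feat: torch.Tensor, x_feat: torch.Tensor) -> torch.Tensor:
    """SiamFC-style score map: template features act as a correlation kernel."""
    return F.conv2d(x_feat, z_feat)


def single_crop_multiscale(z_feat, x_feat, scales=(0.9, 1.0, 1.1)):
    """Scale responses from a single search-crop embedding (assumed scheme)."""
    responses = []
    for s in scales:
        size = [max(1, round(d * s)) for d in z_feat.shape[-2:]]
        z_s = F.interpolate(z_feat, size=size, mode="bilinear", align_corners=False)
        responses.append(cross_correlation(z_s, x_feat))
    return responses  # one score map per scale, no extra backbone passes


if __name__ == "__main__":
    z = torch.randn(1, 256, 6, 6)      # template embedding
    x = torch.randn(1, 256, 22, 22)    # search embedding from a single crop
    for r in single_crop_multiscale(z, x):
        print(r.shape)
```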


Complexity, 2021, Vol 2021, pp. 1-16
Author(s): Jinping Sun

The target and background change continuously during long-term tracking, which makes accurate target prediction challenging. Correlation filter algorithms based on handcrafted features struggle to meet practical needs because of their limited feature representation ability. Thus, to improve tracking performance and robustness, an improved hierarchical convolutional feature model is incorporated into a correlation filter framework for visual object tracking. First, the objective function is designed with lasso regression modeling, and a sparse, time-series low-rank filter is learned to increase the interpretability of the model. Second, features from the last convolutional layer and the second pooling layer of the convolutional neural network are extracted to predict the target position from coarse to fine. In addition, response maps are calculated with the filters learned from the first frame and from the current frame, respectively, and the target position is obtained by finding the maximum value in the response map. The filter model is updated only when both maximum responses meet a threshold condition. The proposed tracker is evaluated on the TC-128 and OTB2015 benchmarks, which include more than 100 video sequences. Extensive experiments demonstrate that the proposed tracker achieves competitive performance against state-of-the-art trackers. The distance precision rate and overlap success rate of the proposed algorithm on OTB2015 are 0.829 and 0.695, respectively. The proposed algorithm effectively addresses long-term object tracking in complex scenes.
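A minimal sketch of the dual-response update rule described above follows: responses are computed with the first-frame filter and the current filter, the target position is taken at the peak, and the model is updated only when both peaks clear a threshold. Filter learning itself (the lasso-regularized, low-rank objective) is omitted, and the FFT-based correlation form, names, and threshold value are assumptions.

```python
# Sketch only: coarse update gating from two correlation response maps.
import numpy as np


def response_map(filt: np.ndarray, feat: np.ndarray) -> np.ndarray:
    """Correlation response in the Fourier domain (standard DCF inference step)."""
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(filt)) * np.fft.fft2(feat)))


def track_step(filt_first, filt_curr, feat, threshold=0.25):
    r_first = response_map(filt_first, feat)
    r_curr = response_map(filt_curr, feat)
    # Target position: peak of the current-filter response map.
    position = np.unravel_index(np.argmax(r_curr), r_curr.shape)
    # Update the filter model only when both peaks are confident enough.
    do_update = r_first.max() >= threshold and r_curr.max() >= threshold
    return position, do_update


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f0, f, x = (rng.standard_normal((32, 32)) for _ in range(3))
    print(track_step(f0, f, x))
```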


Electronics, 2020, Vol 9 (5), pp. 854
Author(s): Yuxiang Yang, Weiwei Xing, Shunli Zhang, Qi Yu, Xiaoyu Guo, ...

Visual object tracking by Siamese networks has achieved favorable performance in accuracy and speed. However, the features used in Siamese networks contain spatially redundant information, which increases computation and limits the networks' discriminative ability. To address this issue, we present a novel frequency-aware feature (FAF) method for robust visual object tracking in complex scenes. Unlike previous works, which select features from different channels or layers, the proposed method factorizes the feature map into multiple frequencies and reduces the spatially redundant low-frequency information. By reducing the resolution of the low-frequency map, computation is saved and the receptive field of the layer is enlarged, yielding more discriminative information. To further improve the performance of the FAF, we design a data-independent augmentation for object tracking that improves the tracker's discriminative ability by enhancing linear relations among training samples through convex combinations of images and their labels. Finally, a joint judgment strategy that combines intersection-over-union (IoU) and classification scores is proposed to adjust the bounding-box result and improve tracking accuracy. Extensive experiments on five challenging benchmarks demonstrate that our FAF method performs favorably against state-of-the-art tracking methods while running at around 45 frames per second.
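Below is a minimal sketch of a mixup-style augmentation of the kind the abstract describes (convex combinations of training images and their labels); the Beta-distributed mixing weight, tensor layout, and names are assumptions rather than the authors' exact recipe.

```python
# Sketch only: blend each training sample with a random partner from the same batch.
import torch


def convex_mix(images: torch.Tensor, labels: torch.Tensor, alpha: float = 0.4):
    """Convex combination of images and labels within a batch (assumed scheme)."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels


if __name__ == "__main__":
    imgs = torch.rand(8, 3, 127, 127)   # a batch of template patches
    lbls = torch.rand(8, 1, 17, 17)     # soft classification maps
    mixed_imgs, mixed_lbls = convex_mix(imgs, lbls)
    print(mixed_imgs.shape, mixed_lbls.shape)
```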


2020, pp. 107698
Author(s): Shiyu Xuan, Shengyang Li, Zifei Zhao, Longxuan Kou, Zhuang Zhou, ...
