MFCFSiam: A Correlation-Filter-Guided Siamese Network with Multifeature for Visual Tracking

With the development of deep learning, trackers based on convolutional neural networks (CNNs) have made significant achievements in visual tracking over the years. The fully convolutional Siamese network (SiamFC) is a typical representative of those trackers. SiamFC uses a two-branch CNN architecture and models visual tracking as a general similarity-learning problem. However, the feature maps it uses for visual tracking come only from the last layer of the CNN. Those features contain high-level semantic information but lack sufficiently detailed texture information. As a result, the SiamFC tracker tends to drift when other objects of the same category are present or when the contrast between the target and the background is very low. To address this problem, we design a novel tracking algorithm that combines a correlation filter tracker and the SiamFC tracker in one framework. In this framework, the correlation filter tracker uses Histogram of Oriented Gradients (HOG) and color name (CN) features to guide the SiamFC tracker. The framework also contains an evaluation criterion that we design to evaluate the tracking results of the two trackers. If this criterion finds that the SiamFC tracker has failed, our framework uses the tracking result from the correlation filter tracker to correct SiamFC. In this way, the defects of SiamFC’s high-level semantic features are remedied by the HOG and CN features. Our algorithm thus provides a framework that combines two trackers and makes them complement each other in visual tracking. To the best of our knowledge, our algorithm is also the first to design an evaluation criterion that uses a correlation filter and zero padding to evaluate the tracking result. Comprehensive experiments are conducted on the Online Tracking Benchmark (OTB), Temple Color (TC128), Benchmark for UAV Tracking (UAV-123), and Visual Object Tracking (VOT) Benchmark. The results show that our algorithm achieves competitive performance compared with the baseline tracker and several other state-of-the-art trackers.
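
The similarity-learning view of tracking mentioned in this abstract can be illustrated with a small sketch. It is not the authors' code: it only shows the general idea of sliding a template feature map over a search feature map and taking the peak of the response as the target position. The function names `cross_correlate` and `argmax2d` and the toy single-channel maps are assumptions made for illustration; a real tracker would correlate multi-channel CNN features.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (no padding) and return the response map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Inner product between the template and the window at (i, j).
            score = sum(
                search[i + di][j + dj] * template[di][dj]
                for di in range(th)
                for dj in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def argmax2d(response):
    """Return the (row, col) position of the highest score in a 2-D map."""
    positions = (
        (r, c) for r in range(len(response)) for c in range(len(response[0]))
    )
    return max(positions, key=lambda rc: response[rc[0]][rc[1]])


# Toy single-channel "feature maps" (hypothetical data): the template
# pattern appears in the search map at offset (1, 2).
template = [[1, 2], [3, 4]]
search = [
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
]
response = cross_correlate(search, template)
peak = argmax2d(response)  # offset where the template matches best
```

In this toy case the response peaks at the offset where the pattern was placed. The drift problem described in the abstract corresponds to a second, similar-looking region producing a comparable peak.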

An Anchor-Free Siamese Network with Multi-Template Update for Object Tracking

Electronics ◽

10.3390/electronics10091067 ◽

2021 ◽

Vol 10 (9) ◽

pp. 1067

Author(s):

Tongtong Yuan ◽

Wenzhu Yang ◽

Qian Li ◽

Yuxia Wang

Keyword(s):

Object Tracking ◽

Correlation Energy ◽

Feature Maps ◽

Siamese Network ◽

Template Update ◽

Free Network ◽

Multiple Prediction ◽

Bounding Boxes ◽

High Level ◽

Speed And Accuracy

Siamese trackers are widely used in various fields because they balance speed and accuracy. Compared with anchor-based methods, the anchor-free approach can reach faster speeds without any drop in precision. Inspired by the Siamese network and the anchor-free idea, an anchor-free Siamese network (AFSN) with multi-template updates for object tracking is proposed. To improve tracking performance, a dual-fusion method is adopted in which the multi-layer features and the multiple prediction results are each combined. The low-level feature maps are concatenated with the high-level feature maps to make full use of both spatial and semantic information. To make the results as stable as possible, the final results are obtained by combining multiple prediction results. For template updating, a high-confidence multi-template update mechanism is used: the average peak-to-correlation energy (APCE) determines whether the template should be updated. We use the anchor-free network to implement object tracking in a per-pixel manner, which computes the object category and bounding boxes directly. Experimental results indicate that the average overlap and success rate of the proposed algorithm increase by about 5% and 10%, respectively, compared to the SiamRPN++ algorithm on the GOT-10k (Generic Object Tracking Benchmark) dataset.
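
The average peak-to-correlation energy mentioned above is commonly defined as the squared gap between the maximum and minimum of a response map, divided by the mean squared deviation of all responses from the minimum. The following is a minimal sketch of that common definition. The helper `should_update_template` and its `threshold` parameter are hypothetical: the abstract does not say how the APCE value is turned into an update decision.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    # Mean squared deviation of every response value from the minimum.
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0:
        return 0.0  # completely flat map: there is no usable peak
    return (f_max - f_min) ** 2 / energy


def should_update_template(response, threshold):
    """Hypothetical rule: update only when the response is confident enough."""
    return apce(response) >= threshold


# A single sharp peak gives a high APCE; a spread-out response gives a low one.
sharp = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
spread = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
sharp_score = apce(sharp)
spread_score = apce(spread)
```

A sharp, unimodal response suggests a reliable localization, so its frame is a safer source for a new template than a frame whose response is spread out or multimodal.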

Distractor-Aware Tracking with Multi-Task and Dynamic Feature Learning

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126621500316 ◽

2020 ◽

pp. 2150031

Author(s):

Weichun Liu ◽

Xiaoan Tang ◽

Chenglin Zhao

Keyword(s):

Correlation Filter ◽

Coarse Grained ◽

Dynamic Feature ◽

Semantic Features ◽

Low Level ◽

Fine Grained ◽

Semantic Embedding ◽

Training Stage ◽

Online Tracking ◽

High Level

Recently, deep trackers based on Siamese networks have been enjoying increasing popularity in the tracking community. Generally, those trackers learn a high-level semantic embedding space for feature representation but lose low-level fine-grained details. Moreover, the learned high-level semantic features are not updated during online tracking, which results in tracking drift in the presence of target appearance variation and similar distractors. In this paper, we present a novel end-to-end trainable convolutional neural network (CNN) based on the Siamese network for distractor-aware tracking. It enhances target appearance representation in both the offline training stage and the online tracking stage. In the offline training stage, the network learns both low-level fine-grained details and high-level coarse-grained semantics simultaneously in a multi-task learning framework. The low-level features, which have better resolution, are complementary to the semantic features and able to distinguish the foreground target from background distractors. In the online stage, the learned low-level features are fed into a correlation filter layer and updated in an interpolated manner to encode target appearance variation adaptively. The learned high-level features are fed into a cross-correlation layer without online update. Therefore, the proposed tracker benefits from both the adaptability of the fine-grained correlation filter and the generalization capability of the semantic embedding. Extensive experiments are conducted on the public OTB100 and UAV123 benchmark datasets. Our tracker achieves state-of-the-art performance while running at a real-time frame rate.
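
An update "in an interpolated manner" is usually understood in the correlation filter literature as a running average between the existing model and the model estimated from the newest frame. The sketch below assumes that reading. The function name `interpolate_update`, the flat-list model representation, and the `learning_rate` values are illustrative assumptions, not details taken from the paper.

```python
def interpolate_update(model, observation, learning_rate=0.01):
    """Blend the current model with a new observation (linear interpolation).

    A small learning_rate keeps the model stable; a larger one adapts faster
    to appearance change but is more easily corrupted by bad frames.
    """
    return [
        (1.0 - learning_rate) * m + learning_rate * o
        for m, o in zip(model, observation)
    ]


# Hypothetical filter coefficients, flattened to a list for simplicity.
model = [1.0, 0.0]
new_frame_estimate = [0.0, 1.0]
updated = interpolate_update(model, new_frame_estimate, learning_rate=0.25)
```

With a learning rate of 0.25, each coefficient moves one quarter of the way toward the new estimate, so the model adapts gradually rather than being replaced.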

Visual Tracking Based on Complementary Learners with Distractor Handling

Mathematical Problems in Engineering ◽

10.1155/2017/5295601 ◽

2017 ◽

Vol 2017 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

Suryo Adhi Wibowo ◽

Hansoo Lee ◽

Eun Kyeong Kim ◽

Sungshin Kim

Keyword(s):

Visual Tracking ◽

Object Representation ◽

Target Location ◽

Target Object ◽

Tracking Algorithm ◽

Color Histogram ◽

Correlation Filter ◽

Visual Object ◽

Visual Object Tracking ◽

Benchmark Datasets

The representation of the object is an important factor in building a robust visual object tracking algorithm. Complementary learners that represent the target object with both color-histogram-based and correlation-filter-based representations can be used for this purpose, since each has advantages that can compensate for the other’s drawbacks in visual tracking. However, a tracking algorithm can still fail because of a distractor, even when complementary learners are used for the target object representation. In this study, we show that, in order to handle the distractor, the distractor must first be detected by learning the responses from the color-histogram-based and correlation-filter-based representations. Then, to determine the target location, we decide whether the responses from both representations should be merged or only the response from the correlation filter should be used. This decision depends on the result of the distractor detection process. Experiments were performed on the widely used VOT2014 and VOT2015 benchmark datasets. The results verify that our proposed method performs favorably compared with several state-of-the-art visual tracking algorithms.
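
The decision step described in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' method: the abstract does not say which branch is taken when a distractor is found, so the sketch assumes the correlation filter response is used alone in that case and that the two responses are blended otherwise. The function `locate_target`, the `merge_weight` parameter, and the precomputed `distractor_detected` flag are all hypothetical.

```python
def locate_target(cf_response, hist_response, distractor_detected, merge_weight=0.5):
    """Pick the target position from two response maps of equal size.

    Assumption: when a distractor is detected, only the correlation filter
    response is trusted; otherwise the two maps are linearly blended.
    """
    if distractor_detected:
        combined = cf_response
    else:
        combined = [
            [(1.0 - merge_weight) * c + merge_weight * h for c, h in zip(c_row, h_row)]
            for c_row, h_row in zip(cf_response, hist_response)
        ]
    positions = (
        (r, c) for r in range(len(combined)) for c in range(len(combined[0]))
    )
    return max(positions, key=lambda rc: combined[rc[0]][rc[1]])


# Hypothetical responses: the color histogram fires strongly at (1, 0),
# while the correlation filter prefers (0, 1).
cf = [[0.0, 1.0], [0.0, 0.0]]
hist = [[0.0, 0.0], [4.0, 0.0]]
merged_position = locate_target(cf, hist, distractor_detected=False)
cf_only_position = locate_target(cf, hist, distractor_detected=True)
```

The two calls return different positions, which shows why the outcome of distractor detection matters for the final localization.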

Release the Power of Online-Training for Robust Visual Tracking

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6956 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12645-12652

Author(s):

Yifan Yang ◽

Guorong Li ◽

Yuankai Qi ◽

Qingming Huang

Keyword(s):

Visual Tracking ◽

High Performance ◽

Large Scale ◽

Feature Space ◽

Online Training ◽

Training Data ◽

Semantic Features ◽

Tracking Accuracy ◽

Tightly Coupled ◽

High Level

Convolutional neural networks (CNNs) have been widely adopted in the visual tracking community, significantly improving the state of the art. However, most CNN-based trackers ignore the important cues that lie in the distribution of training data and in high-level features that are tightly coupled with the target/background classification. In this paper, we propose to improve the tracking accuracy via online training. On the one hand, we squeeze out redundant training data by analyzing the dataset distribution in low-level feature space. On the other hand, we design statistic-based losses to increase the inter-class distance while decreasing the intra-class variance of high-level semantic features. We demonstrate the effectiveness of our approach on top of two high-performance tracking methods: MDNet and DAT. Experimental results on the challenging large-scale OTB2015 and UAVDT datasets demonstrate the outstanding performance of our tracking method.

Siamese High-Level Feature Refine Network for Visual Object Tracking

Electronics ◽

10.3390/electronics9111918 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1918 ◽

Cited By ~ 1

Author(s):

Md. Maklachur Rahman ◽

Md Rishad Ahmed ◽

Lamyanba Laishram ◽

Seock Ho Kim ◽

Soon Ki Jung

Keyword(s):

Visual Tracking ◽

Feature Representation ◽

Visual Object ◽

Target Feature ◽

Discriminative Ability ◽

Visual Object Tracking ◽

Discrimination Ability ◽

Proposed Model ◽

Real Time Tracking ◽

High Level

Siamese network-based trackers are broadly applied to visual tracking problems because of their balanced performance in terms of speed and accuracy. Tracking desired objects in challenging scenarios is still one of the fundamental concerns in visual tracking. This paper proposes a feature-refined end-to-end tracking framework with real-time tracking speed and considerable performance. A feature refine network is incorporated to enhance the representation power of the target feature by utilizing high-level semantic information. It also allows the network to capture the salient information needed to locate the target and learns to represent the target feature in a more generalized way, improving the overall tracking performance, particularly on challenging sequences. However, the feature refine module alone cannot handle such challenges because of its limited discriminative ability. To overcome this difficulty, we employ an attention module inside the feature refine network that strengthens the tracker’s ability to discriminate between the target and the background. Furthermore, we conduct extensive experiments on several popular tracking benchmarks to verify the proposed tracker’s effectiveness, demonstrating that our proposed model achieves state-of-the-art performance compared with other trackers.

A Robust Visual Tracking Algorithm Based on Spatial-Temporal Context Hierarchical Response Fusion

Algorithms ◽

10.3390/a12010008 ◽

2018 ◽

Vol 12 (1) ◽

pp. 8 ◽

Cited By ~ 2

Author(s):

Wancheng Zhang ◽

Yanmin Luo ◽

Zhi Chen ◽

Yongzhao Du ◽

Daxin Zhu ◽

...

Keyword(s):

Visual Tracking ◽

Correlation Filter ◽

Temporal Context ◽

Visual Object ◽

Correlation Filters ◽

Visual Object Tracking ◽

Illumination Changes ◽

Model Update ◽

Benchmark Datasets ◽

Hierarchical Features

Discriminative correlation filters (DCFs) have been shown to perform very well in visual object tracking. However, visual tracking is still challenging when the target objects undergo complex scenarios such as occlusion, deformation, scale changes and illumination changes. In this paper, we utilize the hierarchical features of convolutional neural networks (CNNs) and learn a spatial-temporal context correlation filter on convolutional layers. Then, the translation is estimated by fusing the response scores of the filters on the three convolutional layers. For scale estimation, we learn a discriminative correlation filter to estimate scale from the best confidence results. Furthermore, we propose a re-detection activation discrimination method to improve the robustness of visual tracking in the case of tracking failure, and an adaptive model update method to reduce tracking drift caused by noisy updates. We evaluate the proposed tracker with DCFs and deep features on the OTB benchmark datasets. The tracking results demonstrate that the proposed algorithm is superior to several state-of-the-art DCF methods in terms of accuracy and robustness.
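
Fusing the response scores of filters learned on several convolutional layers can be illustrated with a weighted sum of the per-layer response maps, followed by a search for the peak. This is only a sketch of one common fusion scheme: the abstract does not state the fusion rule, so the weighted sum, the function names `fuse_responses` and `peak_position`, and the specific weights are assumptions. The sketch also assumes the maps have already been resized to a common resolution.

```python
def fuse_responses(responses, weights):
    """Weighted element-wise sum of equally sized 2-D response maps."""
    rows, cols = len(responses[0]), len(responses[0][0])
    return [
        [
            sum(w * resp[r][c] for resp, w in zip(responses, weights))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def peak_position(response):
    """Return the (row, col) of the maximum value in a 2-D response map."""
    positions = (
        (r, c) for r in range(len(response)) for c in range(len(response[0]))
    )
    return max(positions, key=lambda rc: response[rc[0]][rc[1]])


# Hypothetical responses from three layers, with hypothetical weights that
# favor the first map.
layer_responses = [
    [[0, 4], [0, 0]],
    [[0, 2], [4, 0]],
    [[8, 0], [0, 0]],
]
layer_weights = [1.0, 0.5, 0.25]
fused = fuse_responses(layer_responses, layer_weights)
translation = peak_position(fused)
```

Although the second and third maps peak elsewhere, the fused map peaks where the most heavily weighted layer and one other layer agree, which is the intended effect of combining layers.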

CFNN: Correlation Filter Neural Network for Visual Object Tracking

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/309 ◽

2017 ◽

Cited By ~ 2

Author(s):

Yang Li ◽

Zhan Xu ◽

Jianke Zhu

Keyword(s):

Neural Network ◽

Visual Tracking ◽

Network Architecture ◽

Back Propagation ◽

Correlation Filter ◽

Visual Object ◽

Neural Network Architecture ◽

Visual Object Tracking ◽

Single Target ◽

Wide Range

Although convolutional neural networks (CNNs) have shown promising capacity in many computer vision tasks, applying them to visual tracking is still far from solved. Existing methods either employ a large external dataset for exhaustive pre-training or suffer from less satisfactory results in terms of accuracy and robustness. To track a single target in a wide range of videos, we present a novel Correlation Filter Neural Network architecture, as well as a complete visual tracking pipeline. The proposed approach is a special case of CNN whose initialization does not need any pre-training on an external dataset. The initialization of the network enjoys the merits of cyclic sampling to achieve appealing discriminative capability, while the network updating scheme takes advantage of back-propagation in order to capture new appearance variations. The tracking pipeline integrates both aspects well by making them complementary to each other. We validate our tracker on the OTB-2013 benchmark. The proposed tracker obtains promising results compared with most existing representative trackers.

The Multi-task Fully Convolutional Siamese Network with Correlation Filter Layer for Real-Time Visual Tracking

Pattern Recognition and Computer Vision - Lecture Notes in Computer Science ◽

10.1007/978-3-030-31654-9_11 ◽

2019 ◽

pp. 123-134

Author(s):

Shiyu Xuan ◽

Shengyang Li ◽

Zifei Zhao ◽

Mingfei Han

Keyword(s):

Real Time ◽

Visual Tracking ◽

Correlation Filter ◽

Siamese Network ◽

Filter Layer

A Deep Hyper Siamese Network for Real-Time Object Tracking

Transactions on Machine Learning and Artificial Intelligence ◽

10.14738/tmlai.81.8020 ◽

2020 ◽

Vol 8 (1) ◽

pp. 35-46

Author(s):

Yongpeng Zhao ◽

Lasheng Yu ◽

Xiaopeng Zheng

Keyword(s):

Object Tracking ◽

Target Object ◽

Visual Object ◽

Feature Maps ◽

Deep Convolutional Neural Networks ◽

Backbone Networks ◽

Feature Representations ◽

Siamese Network ◽

Benchmark Datasets ◽

Siamese Networks

Siamese networks have drawn increasing interest in the field of visual object tracking due to their balance of precision and efficiency. However, Siamese trackers use relatively shallow backbone networks, such as AlexNet, and therefore do not take full advantage of the capabilities of modern deep convolutional neural networks (CNNs). Moreover, the feature representations of the target object in a Siamese tracker are extracted from the last layer of the CNN and mainly capture semantic information, which causes the tracker's precision to be relatively low and makes it drift easily in the presence of similar distractors. In this paper, a new nonpadding residual unit (NPRU) is designed and stacked to build a 22-layer deep ResNet, referred to as ResNet22. Using ResNet22 as the backbone network, we build a deep Siamese network that can greatly enhance the tracking performance. Considering that the different levels of the CNN's feature maps represent different aspects of the target object, we aggregate different deep convolutional layers to make use of ResNet22's multilevel feature maps, which form hyperfeature representations of targets. The designed deep hyper Siamese network is named DHSiam. Experimental results show that DHSiam achieves significant improvement on multiple benchmark datasets.

Correlation Filter and Deep Siamese Network Hybrid Algorithm for Visual Object Tracking

2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP) ◽

10.1109/icsp51882.2021.9408815 ◽

2021 ◽

Author(s):

Ying Hou ◽

Xinyu Lin ◽

Jiao Li

Keyword(s):

Object Tracking ◽

Hybrid Algorithm ◽

Correlation Filter ◽

Visual Object ◽

Visual Object Tracking ◽

Siamese Network
