Proposal-Based Visual Tracking Using Spatial Cascaded Transformed Region Proposal Network

Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4810
Author(s):  
Ximing Zhang ◽  
Shujuan Luo ◽  
Xuewu Fan

Region proposal network (RPN)-based trackers employ a classification and regression block to generate proposals; the proposal with the highest similarity score is taken as the ground-truth candidate for the next frame. However, RPN-based trackers cannot make full use of the features from different convolutional layers, and the original loss function cannot alleviate the data-imbalance issue of the training procedure. We propose the Spatial Cascaded Transformed RPN, which combines the RPN with a spatial transformer network (STN) to obtain high-quality proposals while simultaneously improving robustness. The STN transfers the spatially transformed features through the different stages, which extends the spatial representation capability of such networks in handling complex scenarios such as scale variation and affine transformation. We address this restriction by replacing the smooth L1 function with an easy-sample penalization loss (shrinkage loss). Moreover, we perform multi-cue proposal re-ranking to guarantee the accuracy of the proposed tracker. We extensively demonstrate the effectiveness of our method through ablation studies on tracking datasets including OTB-2015 (Object Tracking Benchmark 2015), VOT-2018 (Visual Object Tracking 2018), LaSOT (Large Scale Single Object Tracking), TrackingNet (A Large-Scale Dataset and Benchmark for Object Tracking in the Wild), and UAV123 (UAV Tracking Dataset).
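The shrinkage loss mentioned in the abstract can be sketched as follows. This is a minimal NumPy illustration of the standard shrinkage-loss formulation (a sigmoid-modulated squared error that down-weights easy samples with small absolute error); the hyperparameter values `a` and `c` are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def shrinkage_loss(pred, target, a=10.0, c=0.2):
    """Easy-sample penalization (shrinkage) loss.

    A sigmoid-shaped modulating factor suppresses the contribution of
    easy samples (small absolute error l) while leaving hard samples
    close to their plain squared error.  `a` controls the shrinkage
    speed, `c` shifts where shrinkage kicks in (values illustrative).
    """
    l = np.abs(pred - target)                      # absolute error per sample
    return l ** 2 / (1.0 + np.exp(a * (c - l)))   # modulated squared error

# Easy samples (small error) are strongly suppressed relative to l**2,
# while hard samples keep nearly their full squared-error penalty.
errors = np.array([0.05, 0.5])
losses = shrinkage_loss(errors, np.zeros_like(errors))
```

Compared with smooth L1, this keeps the gradient focused on hard examples, which is how the abstract's data-imbalance issue is mitigated during training.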

Author(s):  
Matthias Müller ◽  
Adel Bibi ◽  
Silvio Giancola ◽  
Salman Alsubaihi ◽  
Bernard Ghanem

Symmetry ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1665
Author(s):  
Fei Chen ◽  
Xiaodong Wang

Recently, Discriminative Correlation Filters (DCF) have shown excellent performance in visual object tracking. The correlation for computing a response map can be conducted efficiently in the Fourier domain via the Discrete Fourier Transform (DFT) of the inputs, where the DFT of an image exhibits symmetry in the Fourier domain. To enhance the robustness and discriminative ability of the filters, many efforts have been devoted to optimizing the learning process. Regularization methods used in existing DCF trackers, such as spatial or temporal regularization, aim to enhance the capacity of the filters, yet most existing methods still fail to handle severe appearance variations, in particular large scale and aspect-ratio changes. In this paper, we propose a novel framework that employs adaptive spatial and temporal regularization to learn reliable filters in both the spatial and temporal domains for tracking. To alleviate the influence of the background and distractors on non-rigid target objects, two sub-models are combined, and multiple features are utilized to learn robust correlation filters. In addition, most DCF trackers that apply a 1-dimensional scale-space search method suffer from appearance changes such as non-rigid deformation. We propose a 2-dimensional scale-space search method to find appropriate scales that adapt to large scale and aspect-ratio changes. We perform comprehensive experiments on four benchmarks: OTB-100, VOT-2016, VOT-2018, and LaSOT. The experimental results illustrate the effectiveness of our tracker, which achieves competitive tracking performance. On OTB-100, our tracker achieves a gain of 0.8% in success compared to the best existing DCF trackers. On VOT-2018, it outperforms the top DCF trackers with a gain of 1.1% in Expected Average Overlap (EAO). On LaSOT, we obtain a gain of 5.2% in success compared to the best DCF trackers.
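The difference between a 1-D and a 2-D scale-space search can be sketched as follows. This is an illustrative enumeration of candidate target sizes, not the paper's implementation: by varying the width and height scale factors independently rather than applying a single factor to both axes, the tracker can follow aspect-ratio changes. The step size and number of steps are assumed values.

```python
import numpy as np

def scale_grid_2d(prev_w, prev_h, step=1.03, n=3):
    """Sketch of a 2-D scale-space search.

    A 1-D search would apply one factor to both axes (2n+1 candidates);
    here width and height factors are enumerated independently, giving
    (2n+1)**2 candidate sizes so aspect-ratio changes can be tracked.
    `step` and `n` are illustrative hyperparameters.
    """
    factors = step ** np.arange(-n, n + 1)     # symmetric factors around 1.0
    return [(prev_w * fw, prev_h * fh)
            for fw in factors for fh in factors]

# 7 x 7 = 49 candidate (width, height) pairs instead of 7 in a 1-D search;
# each candidate would be scored by the filter response in a real tracker.
sizes = scale_grid_2d(64, 32)
```

The extra cost over 1-D search grows quadratically in the number of steps, which is why the candidate grid is usually kept small.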


Author(s):  
Tianyang Xu ◽  
Zhenhua Feng ◽  
Xiao-Jun Wu ◽  
Josef Kittler

Discriminative Correlation Filters (DCF) have been shown to achieve impressive performance in visual object tracking. However, existing DCF-based trackers rely heavily on learning regularised appearance models from invariant image feature representations. To further improve the accuracy of DCF and provide a parsimonious model from the attribute perspective, we propose to gauge the relevance of multi-channel features for the purpose of channel selection. This is achieved by assessing the information conveyed by the features of each channel as a group, using an adaptive group elastic net that induces independent sparsity and temporal smoothness on the DCF solution. The robustness and stability of the learned appearance model are significantly enhanced by the proposed method, as the process of channel selection performs implicit spatial regularisation. We use the augmented Lagrangian method to optimise the discriminative filters efficiently. The experimental results obtained on a number of well-known benchmarking datasets demonstrate the effectiveness and stability of the proposed method. A superior performance over the state-of-the-art trackers is achieved using less than 10% of the deep feature channels.
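The group-sparsity mechanism behind channel selection can be sketched with a proximal step for a group elastic net. This is a generic illustration under assumed penalty weights, not the paper's adaptive formulation or its augmented Lagrangian solver: each feature channel forms one group, the l2,1 term zeroes out weak groups (selecting channels), and the squared-l2 term adds ridge-style shrinkage.

```python
import numpy as np

def group_elastic_net_prox(W, lam1=0.1, lam2=0.01):
    """Sketch of channel selection via a group elastic net proximal step.

    W has shape (channels, filter_size); each channel's coefficients are
    one group.  Group soft-thresholding (the l2,1 term) zeroes entire
    weak channels, and the ridge term (squared l2) rescales the rest.
    lam1/lam2 are illustrative penalty weights.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)          # per-group l2 norm
    shrink = np.maximum(0.0, 1.0 - lam1 / np.maximum(norms, 1e-12))
    return (shrink * W) / (1.0 + lam2)                        # weak channels -> 0

# Toy example: 4 near-zero channels and 4 informative ones.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)) * np.array([[1e-3]] * 4 + [[1.0]] * 4)
W_sel = group_elastic_net_prox(W)
kept = int(np.count_nonzero(np.linalg.norm(W_sel, axis=1)))
```

Because whole channels are driven exactly to zero, the surviving groups act as the selected feature channels, which is how a tracker can run on a small fraction of the deep feature channels.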


2021 ◽  
Vol 434 ◽  
pp. 268-284
Author(s):  
Muxi Jiang ◽  
Rui Li ◽  
Qisheng Liu ◽  
Yingjing Shi ◽  
Esteban Tlelo-Cuautle

IEEE Access ◽  
2020 ◽  
pp. 1-1
Author(s):  
Ershen Wang ◽  
Donglei Wang ◽  
Yufeng Huang ◽  
Gang Tong ◽  
Song Xu ◽  
...  

2015 ◽  
Vol 10 (1) ◽  
pp. 167-188 ◽  
Author(s):  
Ahmad Ali ◽  
Abdul Jalil ◽  
Jianwei Niu ◽  
Xiaoke Zhao ◽  
Saima Rathore ◽  
...  
