HRSiam: High-Resolution Siamese Network, Towards Space-Borne Satellite Video Tracking

2021 ◽  
Vol 30 ◽  
pp. 3056-3068
Author(s):  
Jia Shao ◽  
Bo Du ◽  
Chen Wu ◽  
Mingming Gong ◽  
Tongliang Liu


2021 ◽  
Vol 13 (7) ◽  
pp. 1298
Author(s):  
Kun Zhu ◽  
Xiaodong Zhang ◽  
Guanzhou Chen ◽  
Xiaoliang Tan ◽  
Puyun Liao ◽  
...  

Satellite video single object tracking has attracted wide attention. The development of remote sensing platforms for Earth observation has made it increasingly convenient to acquire high-resolution satellite videos, which greatly facilitates ground target tracking. However, oversized images with small object sizes, high similarity among multiple moving targets, and poor distinguishability between the objects and the background make this task especially challenging. To address these problems, this paper proposes a deep Siamese network (DSN) incorporating an interframe difference centroid inertia motion (ID-CIM) model. In object tracking tasks, the DSN inherently includes a template branch and a search branch; it extracts features from both branches and employs a Siamese region proposal network to locate the target in the search branch. The ID-CIM mechanism is proposed to alleviate model drift. These two modules constitute the ID-DSN framework and mutually reinforce the final tracking results. In addition, we adopted existing object detection datasets for remotely sensed images to generate training datasets suitable for satellite video single object tracking. Ablation experiments were performed on six high-resolution satellite videos acquired from the International Space Station and “Jilin-1” satellites. We compared the proposed ID-DSN with 11 other state-of-the-art trackers built on different networks and backbones. The comparison shows that ID-DSN achieves a precision criterion of 0.927 and a success criterion of 0.694 at 32.117 frames per second (FPS) on a single NVIDIA GTX 1070 Ti GPU.
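The interframe difference and inertia ideas behind ID-CIM can be illustrated with a minimal sketch: difference two consecutive grayscale frames, take the centroid of the changed pixels, and extrapolate the next position under a constant-velocity ("inertia") assumption. This is a hedged toy illustration, not the paper's exact ID-CIM formulation; the threshold value and the constant-velocity model are assumptions.

```python
import numpy as np

def diff_centroid(prev_frame, curr_frame, thresh=25):
    """Centroid of all pixels that changed between two grayscale frames
    (both vacated and newly occupied positions). Returns None if no
    pixel exceeds the threshold."""
    diff = np.abs(curr_frame.astype(np.int32) - prev_frame.astype(np.int32))
    ys, xs = np.nonzero(diff > thresh)
    if len(xs) == 0:
        return None  # no motion detected
    return np.array([xs.mean(), ys.mean()])

def inertia_predict(c_prev, c_curr):
    """Constant-velocity ('inertia') extrapolation of the next centroid."""
    return c_curr + (c_curr - c_prev)

# Toy example: a bright 2x2 blob moving right by 2 px per frame.
def frame(x):
    f = np.zeros((16, 16), dtype=np.uint8)
    f[7:9, x:x + 2] = 255
    return f

c01 = diff_centroid(frame(2), frame(4))   # centroid of changed pixels
c12 = diff_centroid(frame(4), frame(6))
pred = inertia_predict(c01, c12)          # predicted centroid for the next step
```

In a tracker, a prediction like `pred` can constrain or re-center the Siamese search region when the appearance model starts to drift.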


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4807
Author(s):  
Dawei Zhang ◽  
Zhonglong Zheng ◽  
Tianxiang Wang ◽  
Yiran He

Siamese network-based trackers cast tracking as cross-correlation between features of the target template and the search region, so feature representation plays an important role in constructing a high-performance tracker. However, existing Siamese networks extract deep but low-resolution features of the entire patch, which are not robust enough to estimate the target bounding box accurately. In this work, to address this issue, we propose a novel high-resolution Siamese network that connects high-to-low resolution convolution streams in parallel and repeatedly exchanges information across resolutions to maintain high-resolution representations. Through a simple yet effective multi-scale feature fusion strategy, the resulting representation is semantically richer and spatially more precise. Moreover, we exploit attention mechanisms to learn object-aware masks for adaptive feature refinement and use deformable convolution to handle complex geometric transformations, making the target more discriminative against distractors and background. Without bells and whistles, extensive experiments on the popular tracking benchmarks OTB100, UAV123, VOT2018, and LaSOT demonstrate that the proposed tracker achieves state-of-the-art performance and runs in real time, confirming its efficiency and effectiveness.
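The cross-correlation matching that both abstracts rely on can be sketched as a naive sliding-window correlation of a template feature map over a search feature map; the peak of the resulting response map indicates the target location. This is a minimal illustration only; real trackers use fast (often depthwise) correlation on learned deep features, and the feature maps here are random stand-ins.

```python
import numpy as np

def xcorr(template, search):
    """Naive cross-correlation of a template feature map over a search
    feature map, both shaped (channels, H, W). Returns a 2-D response
    map of shape (sH - tH + 1, sW - tW + 1)."""
    c, th, tw = template.shape
    _, sh, sw = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # inner product of the template with one search window
            out[i, j] = np.sum(template * search[:, i:i + th, j:j + tw])
    return out

# Toy check: plant the template at a known offset inside the search map.
rng = np.random.default_rng(0)
t = rng.standard_normal((4, 3, 3))
s = rng.standard_normal((4, 8, 8)) * 0.1   # low-energy background
s[:, 2:5, 4:7] = t                         # target at row 2, col 4
resp = xcorr(t, s)
peak = np.unravel_index(resp.argmax(), resp.shape)
```

Because the template correlates most strongly with itself, `peak` recovers the planted offset; in a Siamese tracker the same response map is refined by a region proposal head rather than read off directly.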


2020 ◽  
Vol 12 (9) ◽  
pp. 1441 ◽  
Author(s):  
Lijun Huang ◽  
Ru An ◽  
Shengyin Zhao ◽  
Tong Jiang ◽  
Hao Hu

Very high-resolution remote sensing change detection remains an important research issue because of registration error, method robustness, and monitoring accuracy. This paper proposes a robust and more accurate approach to change detection (CD), which is first applied to a smaller experimental area and then extended to a wider range. A feature space is constructed that includes object features, Visual Geometry Group (VGG) depth features, and texture features. The difference image is obtained by considering contextual information within a circular neighborhood of scalable radius, which overcomes the registration error caused by rotation and shift of the instantaneous field of view and also improves the reliability and robustness of the CD. To enhance the robustness of the U-Net model, the training dataset is constructed manually via operations such as blurring the image, adding noise, and rotating the image. The trained model is then used to predict the experimental areas, achieving 92.3% accuracy. Compared with a Support Vector Machine (SVM) and a Siamese network, the check error rate dropped to 7.86% while the Kappa coefficient increased to 0.8254, revealing that our method outperforms both.
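The manual dataset-construction step (blur, added noise, rotation) can be sketched as a small augmentation routine that yields extra (image, label) pairs for U-Net training. The specific kernel size, noise level, and use of 90-degree rotations are assumptions for illustration; the paper does not state its exact augmentation parameters.

```python
import numpy as np

def augment(image, label, rng):
    """Return a list of (image, label) pairs: the original plus blurred,
    noisy, and rotated variants. Images are floats in [0, 1]."""
    pairs = [(image, label)]

    # 3x3 box blur applied to the image only -- labels must stay crisp
    pad = np.pad(image, 1, mode="edge")
    h, w = image.shape
    blurred = sum(pad[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    pairs.append((blurred, label))

    # additive Gaussian noise (std 0.05 is an assumed, illustrative value)
    noisy = np.clip(image + rng.normal(0.0, 0.05, image.shape), 0.0, 1.0)
    pairs.append((noisy, label))

    # 90-degree rotations -- the label must rotate with the image
    for k in (1, 2, 3):
        pairs.append((np.rot90(image, k), np.rot90(label, k)))

    return pairs

rng = np.random.default_rng(42)
img = rng.random((64, 64))
lbl = (img > 0.5).astype(np.uint8)   # toy binary change mask
batch = augment(img, lbl, rng)       # 6 training pairs from 1 original
```

Keeping the geometric transforms synchronized between image and label, while restricting photometric ones (blur, noise) to the image, is what makes such augmentation safe for dense-prediction models like U-Net.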

