Object Relocation Visual Tracking Based On Histogram Filter And Siamese Network

Author(s):  
Jianlong Zhang ◽  
Qiao Li ◽  
Bin Wang ◽  
Chen Chen ◽  
Tianhong Wang ◽  
...  

Siamese network based trackers formulate visual tracking as an image matching process carried out by regression and classification branches, which simplifies the network structure and improves tracking accuracy. However, several problems remain. 1) Lightweight neural networks decrease feature representation ability, so the tracker easily fails under disturbing distractors (e.g., deformation and similar objects) or large changes in viewing angle. 2) The tracker cannot adapt to variations of the object. 3) The tracker cannot reposition an object once tracking has failed. To address these issues, we first propose a novel match filter arbiter based on the histogram of Euclidean distances between the centers of multiple candidate objects, which automatically determines whether the tracker has failed. Secondly, the Hopcroft–Karp algorithm is introduced to select the winners from the dynamic template set through a backtracking process, and object relocation is achieved by comparing the Gradient Magnitude Similarity Deviation (GMSD) between the template and the winners. Experiments show that our method obtains better performance on several tracking benchmarks, i.e., OTB100, VOT2018, GOT-10k, and LaSOT, compared with state-of-the-art methods.
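As a rough illustration of the arbiter's core idea, the sketch below histograms the pairwise Euclidean distances between candidate centers and flags a failure when too little mass sits near zero, i.e., when the candidates disagree on the target's location. All names, bin widths, and thresholds here are hypothetical, not the authors' settings; for the template-selection step, SciPy's `scipy.sparse.csgraph.maximum_bipartite_matching` offers a Hopcroft–Karp-based matching implementation.

```python
import numpy as np

def tracker_failed(candidate_centers, bin_width=8.0, agree_thresh=0.5):
    """Toy distance-histogram arbiter (hypothetical parameters).

    candidate_centers: (N, 2) array of (x, y) centers, N >= 2.
    Returns True when the histogram of pairwise Euclidean distances
    puts too little mass in the first bin, i.e., the candidate
    objects do not cluster around a single location.
    """
    centers = np.asarray(candidate_centers, dtype=float)
    diff = centers[:, None, :] - centers[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each pair once (upper triangle, excluding the diagonal).
    pairwise = dists[np.triu_indices(len(centers), k=1)]
    edges = np.arange(0.0, pairwise.max() + 2 * bin_width, bin_width)
    hist, _ = np.histogram(pairwise, bins=edges)
    mass_near_zero = hist[0] / max(hist.sum(), 1)
    return mass_near_zero < agree_thresh
```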

2020 ◽  
Vol 10 (21) ◽  
pp. 7780
Author(s):  
Dokyeong Kwon ◽  
Junseok Kwon

In this study, we present a novel tracking system in which tracking accuracy can be considerably enhanced by state prediction. Accordingly, we present a new Q-learning-based reinforcement method augmented by Wang–Landau sampling. In the proposed method, reinforcement learning is used to predict the target configuration for the subsequent frame, while the Wang–Landau sampler balances the exploitation and exploration degrees of the prediction. Our method can adaptively control the randomness of the policy using statistics on the number of visits to a particular state. Thus, it considerably enhances the performance of the conventional Q-learning algorithm, which in turn enhances visual tracking performance. Numerical results demonstrate that our method substantially outperforms other state-of-the-art visual trackers and runs in real time because it contains no complicated deep neural network architectures.
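The interplay between the Q-learning update and the Wang–Landau visit statistics can be sketched as below; the environment interface, state indexing, and all constants are hypothetical stand-ins, not the authors' formulation.

```python
import numpy as np

def wl_q_step(Q, log_g, s, env, alpha=0.1, gamma=0.95, log_f=0.05, rng=None):
    """One toy Q-learning step with Wang-Landau-flavored exploration.

    Q: (n_states, n_actions) value table; log_g: running log
    density-of-states per state (the Wang-Landau visit histogram).
    env.step is assumed to return (next_state, reward, done) -
    a hypothetical interface.
    """
    rng = rng or np.random.default_rng()
    # Under-visited states (log_g below average) get more random actions.
    eps = float(np.clip(np.exp(log_g.mean() - log_g[s]), 0.05, 1.0))
    a = int(rng.integers(Q.shape[1])) if rng.random() < eps else int(Q[s].argmax())
    s_next, r, done = env.step(a)
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])   # standard Q-learning update
    log_g[s] += log_f                       # Wang-Landau histogram update
    return s_next, done
```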


Author(s):  
Donghao Luo ◽  
Bingbing Ni ◽  
Yichao Yan ◽  
Xiaokang Yang

Most existing matching algorithms are one-off algorithms, i.e., they usually measure the distance between two image feature representation vectors only once. In contrast, the human vision system accomplishes image matching by recursively looking at specific/related parts of both images and then making the final judgement. Towards this end, we propose a novel loopy recurrent neural network (Loopy RNN), which is capable of aggregating relationship information of two input images in a progressive/iterative manner and outputting the consolidated matching score in the final iteration. A Loopy RNN has two unique properties. First, built on conventional long short-term memory (LSTM) nodes, it links the output gate of the tail node to the input gate of the head node, thereby providing the symmetry property required for matching. Second, a monotonic loss designed for the proposed network guarantees increasing confidence during the recursive matching process. Extensive experiments on several image matching benchmarks demonstrate the great potential of the proposed method.
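A loose sketch of the loopy recurrence and the monotonic loss follows, simplified to a single LSTM cell whose state feeds back into itself (the paper links gates across a chain of LSTM nodes); feature dimensions, step count, and the margin are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class LoopyMatcher(nn.Module):
    """Toy loopy-RNN matcher: one LSTMCell applied `steps` times, its
    state looping back on itself, emitting a matching score per pass."""
    def __init__(self, feat_dim=256, hidden=128, steps=4):
        super().__init__()
        self.steps = steps
        self.cell = nn.LSTMCell(2 * feat_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, feat_a, feat_b):
        x = torch.cat([feat_a, feat_b], dim=-1)       # fused pair features
        h = x.new_zeros(x.size(0), self.cell.hidden_size)
        c = torch.zeros_like(h)
        scores = []
        for _ in range(self.steps):                   # recursive matching loop
            h, c = self.cell(x, (h, c))
            scores.append(self.score(h).squeeze(-1))
        return torch.stack(scores, dim=1)             # (batch, steps)

def monotonic_loss(scores, margin=0.05):
    """Penalize any drop in confidence between consecutive iterations."""
    return torch.relu(margin - (scores[:, 1:] - scores[:, :-1])).mean()
```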


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 119094-119104
Author(s):  
Chaoyi Zhang ◽  
Howard Wang ◽  
Jiwei Wen ◽  
Li Peng

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1466
Author(s):  
Chuanming Tang ◽  
Peng Qin ◽  
Jianlin Zhang

Most existing trackers address the visual tracking problem by extracting an appearance template from the first frame and using it to localize the target in the current frame. Unfortunately, they typically face the model degeneration challenge, which easily results in model drift and target loss. To address this issue, a novel Template Adjustment Siamese Network (TA-Siam) is proposed in this paper. The proposed TA-Siam framework consists of two simple subnetworks: a template adjustment subnetwork for feature extraction and a classification-regression subnetwork for bounding box prediction. The template adjustment module adaptively uses features from subsequent frames to adjust the current template, making the template follow the target's appearance variation over long sequences and effectively overcoming the model drift problem of Siamese networks. To reduce classification errors, rhombus labels are proposed in TA-Siam. For more efficient learning and faster convergence, the tracker uses a more effective regression loss during training. Extensive experiments and comparisons with other trackers are conducted on the challenging benchmarks VOT2016, VOT2018, OTB50, OTB100, GOT-10k, and LaSOT. TA-Siam achieves state-of-the-art performance at a speed of 45 FPS.
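A minimal sketch of the template-adjustment idea as a confidence-gated blend of template and frame features; the gating rule, threshold, and rate are assumptions for illustration, not TA-Siam's actual subnetwork.

```python
import numpy as np

def adjust_template(template_feat, frame_feat, confidence,
                    rate=0.1, conf_thresh=0.8):
    """Toy template-adjustment step: when the current prediction is
    confident, blend features from the new frame into the template so
    it follows appearance changes; otherwise keep the template fixed.
    The blending rule is illustrative, not the paper's module."""
    if confidence < conf_thresh:
        return template_feat                  # unreliable frame: no update
    return (1.0 - rate) * np.asarray(template_feat) + rate * np.asarray(frame_feat)
```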


Author(s):  
Zihao Wang ◽  
Xueyi Li ◽  
Zhen Li

Significant progress has been made on descriptors and detectors for local features, but several challenging limitations remain, such as insufficient localization accuracy and non-discriminative description, especially in repetitive- or blank-texture regions, which have not been well addressed. Coarse feature representation and a limited receptive field are considered the main causes of these limitations. To address them, we propose a novel Soft Point-Wise Transformer for Descriptor and Detector that simultaneously mines long-range intrinsic and cross-scale dependencies of local features. Furthermore, our model leverages distinct transformers based on soft point-wise attention, substantially decreasing memory and computation complexity, especially for high-resolution feature maps. In addition, a multi-level decoder is constructed to guarantee high detection accuracy and discriminative description. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods on image matching and visual localization benchmarks.
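For orientation, the sketch below shows plain scaled dot-product attention over feature-map points, which is the computation the soft point-wise variant restructures to cut memory; the random projections stand in for learned ones, and the paper's memory-saving reformulation is not reproduced here.

```python
import numpy as np

def point_wise_attention(feats, dk=64, rng=np.random.default_rng(0)):
    """Minimal scaled dot-product attention over feature-map points.

    feats: (H*W, C) flattened feature map. Full (N, N) attention is
    quadratic in the number of points, which is exactly the cost the
    soft point-wise formulation is designed to reduce."""
    n, c = feats.shape
    Wq, Wk, Wv = (rng.standard_normal((c, dk)) / np.sqrt(c) for _ in range(3))
    q, k, v = feats @ Wq, feats @ Wk, feats @ Wv
    attn = q @ k.T / np.sqrt(dk)               # (N, N) point-to-point scores
    attn = np.exp(attn - attn.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)        # softmax over key points
    return attn @ v                            # (N, dk) attended descriptors
```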


2019 ◽  
Vol 55 (13) ◽  
pp. 742-745 ◽  
Author(s):  
Kang Yang ◽  
Huihui Song ◽  
Kaihua Zhang ◽  
Jiaqing Fan

Author(s):  
Jianhai Zhang ◽  
Zhiyong Feng ◽  
Yong Su ◽  
Meng Xing

Owing to the merits of high-order statistics and Riemannian geometry, the covariance matrix has become a generic feature representation for action recognition: an individual action can be represented by empirical statistics over all of its pose samples. Covariance has two major problems: (1) it is prone to be singular, so that actions fail to be represented properly, and (2) it lacks global action/pose-aware information, so that its expressive and discriminative power is limited. In this article, we propose a novel Bayesian covariance representation that solves these problems through a prior regularization method. Specifically, covariance is viewed as a parametric maximum likelihood estimate of a Gaussian distribution over local poses from an individual action. Then, a Global Informative Prior (GIP) is generated over global poses with sufficient statistics to regularize covariance. In this way, (1) singularity is greatly relieved thanks to the sufficient statistics, and (2) the global pose information in the GIP makes Bayesian covariance theoretically equivalent to a saliency-weighted covariance over global action poses, so that the discriminative characteristics of actions can be represented more clearly. Experimental results show that our Bayesian covariance with GIP efficiently improves the performance of action recognition and, on some databases, outperforms state-of-the-art variants based on kernels, temporal-order structures, and saliency-weighted attention, among others.
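The prior-regularized covariance can be sketched as a standard MAP-style shrinkage of the empirical covariance toward a global prior; here `prior_cov` and the pseudo-count `lam` stand in for the GIP and its weighting, whose exact construction is not reproduced.

```python
import numpy as np

def bayesian_covariance(local_poses, prior_cov, lam=10.0):
    """Toy MAP-style covariance: shrink the empirical covariance of one
    action's poses toward a global prior covariance (standing in for
    the Global Informative Prior).

    local_poses: (N, d) pose features, N >= 2. The prior's pseudo-
    observations keep the estimate well-conditioned even when N < d,
    which is how regularization relieves singularity."""
    X = np.asarray(local_poses, dtype=float)
    n = len(X)
    emp = np.cov(X, rowvar=False, bias=True)   # empirical covariance
    # Convex combination of data and prior sufficient statistics.
    return (n * emp + lam * np.asarray(prior_cov)) / (n + lam)
```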


2021 ◽  
Vol 5 (4) ◽  
pp. 783-793
Author(s):  
Muhammad Muttabi Hudaya ◽  
Siti Saadah ◽  
Hendy Irawan

EKTP (Indonesian electronic identity card) image submission needs a solid validation process that verifies and matches the uploaded images. To solve this problem, this paper implements a detection model using Faster R-CNN and a matching method using ORB (Oriented FAST and Rotated BRIEF) and KNN-BFM (K-Nearest Neighbor Brute Force Matcher). The goal of the implementation is to reach an 80% mark of accuracy and to prove that matching with ORB alone can replace the OCR technique. The detection model reaches a mAP (mean Average Precision) of 94%, but the matching process only achieves an accuracy of 43.46%. Matching on image features alone underperforms the previous OCR technique but improves processing time from 4510 ms to 60 ms. Image matching accuracy has been proven to increase by using a high-quality and high-quantity dataset and by extracting features in the important areas of EKTP card images.
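The ORB and KNN-BFM pipeline described above maps onto standard OpenCV calls; the sketch below uses common defaults (Hamming distance for binary ORB descriptors, Lowe's 0.75 ratio test), not the paper's tuned values, and the match-ratio score is an illustrative stand-in for its accuracy metric.

```python
import cv2

def orb_match_score(img_a, img_b, n_features=1000, ratio=0.75):
    """ORB + brute-force KNN matching with a ratio test, mirroring the
    abstract's ORB/KNN-BFM pipeline. Inputs are grayscale images."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0                               # no features detected
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)         # Hamming for binary ORB
    matches = bf.knnMatch(des_a, des_b, k=2)
    # Lowe's ratio test keeps matches clearly better than the runner-up.
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / max(len(kp_a), 1)         # crude match ratio
```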

