Object Relocation Visual Tracking Based On Histogram Filter And Siamese Network

Author(s):  
Jianlong Zhang ◽  
Qiao Li ◽  
Bin Wang ◽  
Chen Chen ◽  
Tianhong Wang ◽  
...  

Siamese network based trackers formulate visual tracking as an image matching process carried out by regression and classification branches, which simplifies the network structure and improves tracking accuracy. However, several problems remain. 1) Lightweight neural networks decrease feature representation ability, so the tracker easily fails under disturbing distractors (e.g., deformation and similar objects) or large changes in viewing angle. 2) The tracker cannot adapt to variations of the object. 3) The tracker cannot reposition an object once tracking has failed. To address these issues, we first propose a novel match filter arbiter based on the histogram of Euclidean distances between the centers of multiple candidate objects, which automatically determines whether the tracker has failed. Secondly, the Hopcroft–Karp algorithm is introduced to select the winners from the dynamic template set through a backtracking process, and object relocation is achieved by comparing the Gradient Magnitude Similarity Deviation (GMSD) between the template and the winners. Experiments show that our method obtains better performance on several tracking benchmarks, i.e., OTB100, VOT2018, GOT-10k, and LaSOT, compared with state-of-the-art methods.
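As a rough illustration of the arbiter's core idea, the sketch below histograms the pairwise Euclidean distances between candidate centers and flags a failure when too little mass sits near zero, i.e., when the candidates disagree on the target's location. All names, bin widths, and thresholds here are hypothetical, not the authors' settings; for the template-selection step, SciPy's `scipy.sparse.csgraph.maximum_bipartite_matching` offers a Hopcroft–Karp-based matching implementation.

```python
import numpy as np

def tracker_failed(candidate_centers, bin_width=8.0, agree_thresh=0.5):
    """Toy distance-histogram arbiter (hypothetical parameters).

    candidate_centers: (N, 2) array of (x, y) centers, N >= 2.
    Returns True when the histogram of pairwise Euclidean distances
    puts too little mass in the first bin, i.e., the candidate
    objects do not cluster around a single location.
    """
    centers = np.asarray(candidate_centers, dtype=float)
    diff = centers[:, None, :] - centers[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each pair once (upper triangle, excluding the diagonal).
    pairwise = dists[np.triu_indices(len(centers), k=1)]
    edges = np.arange(0.0, pairwise.max() + 2 * bin_width, bin_width)
    hist, _ = np.histogram(pairwise, bins=edges)
    mass_near_zero = hist[0] / max(hist.sum(), 1)
    return mass_near_zero < agree_thresh
```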

2020 ◽  
Vol 10 (21) ◽  
pp. 7780
Author(s):  
Dokyeong Kwon ◽  
Junseok Kwon

In this study, we present a novel tracking system in which tracking accuracy can be considerably enhanced by state prediction. Accordingly, we present a new Q-learning-based reinforcement method augmented by Wang–Landau sampling. In the proposed method, reinforcement learning is used to predict the target configuration for the subsequent frame, while the Wang–Landau sampler balances the exploitation and exploration degrees of the prediction. Our method can adaptively control the randomness of the policy using statistics on the number of visits to a particular state. Thus, it considerably enhances the performance of the conventional Q-learning algorithm, which in turn enhances visual tracking performance. Numerical results demonstrate that our method substantially outperforms other state-of-the-art visual trackers and runs in real time because it contains no complicated deep neural network architectures.
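The interplay between the Q-learning update and the Wang–Landau visit statistics can be sketched as below; the environment interface, state indexing, and all constants are hypothetical stand-ins, not the authors' formulation.

```python
import numpy as np

def wl_q_step(Q, log_g, s, env, alpha=0.1, gamma=0.95, log_f=0.05, rng=None):
    """One toy Q-learning step with Wang-Landau-flavored exploration.

    Q: (n_states, n_actions) value table; log_g: running log
    density-of-states per state (the Wang-Landau visit histogram).
    env.step is assumed to return (next_state, reward, done) -
    a hypothetical interface.
    """
    rng = rng or np.random.default_rng()
    # Under-visited states (log_g below average) get more random actions.
    eps = float(np.clip(np.exp(log_g.mean() - log_g[s]), 0.05, 1.0))
    a = int(rng.integers(Q.shape[1])) if rng.random() < eps else int(Q[s].argmax())
    s_next, r, done = env.step(a)
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])   # standard Q-learning update
    log_g[s] += log_f                       # Wang-Landau histogram update
    return s_next, done
```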


Author(s):  
Donghao Luo ◽  
Bingbing Ni ◽  
Yichao Yan ◽  
Xiaokang Yang

Most existing matching algorithms are one-off algorithms, i.e., they usually measure the distance between two image feature representation vectors only once. In contrast, the human vision system accomplishes image matching by recursively looking at specific/related parts of both images and then making the final judgement. Towards this end, we propose a novel loopy recurrent neural network (Loopy RNN), which is capable of aggregating relationship information of two input images in a progressive/iterative manner and outputting the consolidated matching score in the final iteration. A Loopy RNN has two unique properties. First, built on conventional long short-term memory (LSTM) nodes, it links the output gate of the tail node to the input gate of the head node, thereby providing the symmetry property required for matching. Second, a monotonic loss designed for the proposed network guarantees increasing confidence during the recursive matching process. Extensive experiments on several image matching benchmarks demonstrate the great potential of the proposed method.
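A loose sketch of the loopy recurrence and the monotonic loss follows, simplified to a single LSTM cell whose state feeds back into itself (the paper links gates across a chain of LSTM nodes); feature dimensions, step count, and the margin are illustrative, not the paper's settings.

```python
import torch
import torch.nn as nn

class LoopyMatcher(nn.Module):
    """Toy loopy-RNN matcher: one LSTMCell applied `steps` times, its
    state looping back on itself, emitting a matching score per pass."""
    def __init__(self, feat_dim=256, hidden=128, steps=4):
        super().__init__()
        self.steps = steps
        self.cell = nn.LSTMCell(2 * feat_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, feat_a, feat_b):
        x = torch.cat([feat_a, feat_b], dim=-1)       # fused pair features
        h = x.new_zeros(x.size(0), self.cell.hidden_size)
        c = torch.zeros_like(h)
        scores = []
        for _ in range(self.steps):                   # recursive matching loop
            h, c = self.cell(x, (h, c))
            scores.append(self.score(h).squeeze(-1))
        return torch.stack(scores, dim=1)             # (batch, steps)

def monotonic_loss(scores, margin=0.05):
    """Penalize any drop in confidence between consecutive iterations."""
    return torch.relu(margin - (scores[:, 1:] - scores[:, :-1])).mean()
```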


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 119094-119104
Author(s):  
Chaoyi Zhang ◽  
Howard Wang ◽  
Jiwei Wen ◽  
Li Peng

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1466
Author(s):  
Chuanming Tang ◽  
Peng Qin ◽  
Jianlin Zhang

Most existing trackers address the visual tracking problem by extracting an appearance template from the first frame and using it to localize the target in the current frame. Unfortunately, they typically face the model degeneration challenge, which easily results in model drift and target loss. To address this issue, a novel Template Adjustment Siamese Network (TA-Siam) is proposed in this paper. The proposed TA-Siam framework consists of two simple subnetworks: a template adjustment subnetwork for feature extraction and a classification-regression subnetwork for bounding box prediction. The template adjustment module adaptively uses features from subsequent frames to adjust the current template, making the template follow the target's appearance variation over long sequences and effectively overcoming the model drift problem of Siamese networks. To reduce classification errors, rhombus labels are proposed in TA-Siam. For more efficient learning and faster convergence, the tracker uses a more effective regression loss during training. Extensive experiments and comparisons with other trackers are conducted on the challenging benchmarks VOT2016, VOT2018, OTB50, OTB100, GOT-10k, and LaSOT. TA-Siam achieves state-of-the-art performance at a speed of 45 FPS.
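A minimal sketch of the template-adjustment idea as a confidence-gated blend of template and frame features; the gating rule, threshold, and rate are assumptions for illustration, not TA-Siam's actual subnetwork.

```python
import numpy as np

def adjust_template(template_feat, frame_feat, confidence,
                    rate=0.1, conf_thresh=0.8):
    """Toy template-adjustment step: when the current prediction is
    confident, blend features from the new frame into the template so
    it follows appearance changes; otherwise keep the template fixed.
    The blending rule is illustrative, not the paper's module."""
    if confidence < conf_thresh:
        return template_feat                  # unreliable frame: no update
    return (1.0 - rate) * np.asarray(template_feat) + rate * np.asarray(frame_feat)
```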


Author(s):  
Zihao Wang ◽  
Xueyi Li ◽  
Zhen Li

Significant progress has been made on descriptors and detectors for local features, but several challenging limitations remain, such as insufficient localization accuracy and non-discriminative description, especially in repetitive- or blank-texture regions, which have not been well addressed. Coarse feature representation and a limited receptive field are considered the main causes of these limitations. To address them, we propose a novel Soft Point-Wise Transformer for Descriptor and Detector that simultaneously mines long-range intrinsic and cross-scale dependencies of local features. Furthermore, our model leverages distinct transformers based on soft point-wise attention, substantially decreasing memory and computation complexity, especially for high-resolution feature maps. In addition, a multi-level decoder is constructed to guarantee high detection accuracy and discriminative description. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods on image matching and visual localization benchmarks.
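For orientation, the sketch below shows plain scaled dot-product attention over feature-map points, which is the computation the soft point-wise variant restructures to cut memory; the random projections stand in for learned ones, and the paper's memory-saving reformulation is not reproduced here.

```python
import numpy as np

def point_wise_attention(feats, dk=64, rng=np.random.default_rng(0)):
    """Minimal scaled dot-product attention over feature-map points.

    feats: (H*W, C) flattened feature map. Full (N, N) attention is
    quadratic in the number of points, which is exactly the cost the
    soft point-wise formulation is designed to reduce."""
    n, c = feats.shape
    Wq, Wk, Wv = (rng.standard_normal((c, dk)) / np.sqrt(c) for _ in range(3))
    q, k, v = feats @ Wq, feats @ Wk, feats @ Wv
    attn = q @ k.T / np.sqrt(dk)               # (N, N) point-to-point scores
    attn = np.exp(attn - attn.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)        # softmax over key points
    return attn @ v                            # (N, dk) attended descriptors
```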


2019 ◽  
Vol 55 (13) ◽  
pp. 742-745 ◽  
Author(s):  
Kang Yang ◽  
Huihui Song ◽  
Kaihua Zhang ◽  
Jiaqing Fan

Author(s):  
Jianhai Zhang ◽  
Zhiyong Feng ◽  
Yong Su ◽  
Meng Xing

Owing to the merits of high-order statistics and Riemannian geometry, the covariance matrix has become a generic feature representation for action recognition: an individual action can be represented by empirical statistics over all of its pose samples. Covariance has two major problems: (1) it is prone to be singular, so that actions fail to be represented properly, and (2) it lacks global action/pose-aware information, so that its expressive and discriminative power is limited. In this article, we propose a novel Bayesian covariance representation that solves these problems through a prior regularization method. Specifically, covariance is viewed as a parametric maximum likelihood estimate of a Gaussian distribution over local poses from an individual action. Then, a Global Informative Prior (GIP) is generated over global poses with sufficient statistics to regularize covariance. In this way, (1) singularity is greatly relieved thanks to the sufficient statistics, and (2) the global pose information in the GIP makes Bayesian covariance theoretically equivalent to a saliency-weighted covariance over global action poses, so that the discriminative characteristics of actions can be represented more clearly. Experimental results show that our Bayesian covariance with GIP efficiently improves the performance of action recognition and, on some databases, outperforms state-of-the-art variants based on kernels, temporal-order structures, and saliency-weighted attention, among others.
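The prior-regularized covariance can be sketched as a standard MAP-style shrinkage of the empirical covariance toward a global prior; here `prior_cov` and the pseudo-count `lam` stand in for the GIP and its weighting, whose exact construction is not reproduced.

```python
import numpy as np

def bayesian_covariance(local_poses, prior_cov, lam=10.0):
    """Toy MAP-style covariance: shrink the empirical covariance of one
    action's poses toward a global prior covariance (standing in for
    the Global Informative Prior).

    local_poses: (N, d) pose features, N >= 2. The prior's pseudo-
    observations keep the estimate well-conditioned even when N < d,
    which is how regularization relieves singularity."""
    X = np.asarray(local_poses, dtype=float)
    n = len(X)
    emp = np.cov(X, rowvar=False, bias=True)   # empirical covariance
    # Convex combination of data and prior sufficient statistics.
    return (n * emp + lam * np.asarray(prior_cov)) / (n + lam)
```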


2021 ◽  
Vol 5 (4) ◽  
pp. 783-793
Author(s):  
Muhammad Muttabi Hudaya ◽  
Siti Saadah ◽  
Hendy Irawan

EKTP (Indonesian electronic identity card) image submission needs a solid validation process that verifies and matches the uploaded images. To solve this problem, this paper implements a detection model using Faster R-CNN and a matching method using ORB (Oriented FAST and Rotated BRIEF) and KNN-BFM (K-Nearest Neighbor Brute Force Matcher). The goal of the implementation is to reach an 80% mark of accuracy and to prove that matching with ORB alone can replace the OCR technique. The detection model reaches a mAP (mean Average Precision) of 94%, but the matching process only achieves an accuracy of 43.46%. Matching on image features alone underperforms the previous OCR technique but improves processing time from 4510 ms to 60 ms. Image matching accuracy has been proven to increase by using a high-quality and high-quantity dataset and by extracting features in the important areas of EKTP card images.
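The ORB and KNN-BFM pipeline described above maps onto standard OpenCV calls; the sketch below uses common defaults (Hamming distance for binary ORB descriptors, Lowe's 0.75 ratio test), not the paper's tuned values, and the match-ratio score is an illustrative stand-in for its accuracy metric.

```python
import cv2

def orb_match_score(img_a, img_b, n_features=1000, ratio=0.75):
    """ORB + brute-force KNN matching with a ratio test, mirroring the
    abstract's ORB/KNN-BFM pipeline. Inputs are grayscale images."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0                               # no features detected
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)         # Hamming for binary ORB
    matches = bf.knnMatch(des_a, des_b, k=2)
    # Lowe's ratio test keeps matches clearly better than the runner-up.
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / max(len(kp_a), 1)         # crude match ratio
```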

