Auto-attentional mechanism in multi-domain convolutional neural networks for improving object tracking

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jinchao Huang

Purpose: The multi-domain convolutional neural network (MDCNN) model has been widely used for object recognition and tracking in computer vision. However, if the tracked objects move rapidly or their appearance varies dramatically, the conventional MDCNN model suffers from model drift. To address this problem, this paper proposes an auto-attentional mechanism-based MDCNN (AA-MDCNN) model for tracking rapidly moving and changing objects under limiting conditions.
Design/methodology/approach: First, to distinguish the foreground object from the background and from other similar objects, the auto-attentional mechanism selectively aggregates a weighted summation of all feature maps so that similar features are related to each other. Then, a bidirectional gated recurrent unit (Bi-GRU) architecture integrates all the feature maps to selectively emphasize the importance of the correlated feature maps. Finally, the final feature map used for object tracking is obtained by fusing the above two feature maps. In addition, a composite loss function is constructed to address the tracking of sequences with similar but different attributes, which is a problem for the conventional MDCNN model.
Findings: To validate the effectiveness and feasibility of the proposed AA-MDCNN model, the ImageNet-Vid dataset was used to train the tracking model and the OTB-50 dataset was used to validate it. Experimental results show that adding the auto-attentional mechanism improves the accuracy rate by 2.75% and the success rate by 2.41%. In addition, the authors selected six complex tracking scenarios in the OTB-50 dataset; of the eleven attributes evaluated, the proposed AA-MDCNN model outperformed the comparative models on nine. Moreover, except for the scenario of multiple objects moving with each other, the proposed AA-MDCNN model handled the majority of rapidly moving object tracking scenarios and outperformed the comparative models on these complex scenarios.
Originality/value: This paper introduces the auto-attentional mechanism into the MDCNN model and adopts a Bi-GRU architecture to extract key features. With the proposed AA-MDCNN model, rapid object tracking under complex backgrounds, motion blur and occlusion performs better, and the model is expected to be applied further to rapid object tracking in the real world.
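The abstract describes aggregating a weighted summation of all feature maps so that similar features reinforce each other. A minimal sketch of that idea follows, assuming generic dot-product self-attention over flattened feature maps; the function names and the use of a softmax are illustrative assumptions, not the paper's published architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_aggregate(feature_maps):
    """Replace each map with a similarity-weighted sum of all maps.

    feature_maps: list of equal-length lists of floats (flattened maps).
    Similarity is a plain dot product (an assumption for this sketch).
    """
    aggregated = []
    for query in feature_maps:
        # Score every map against the current one.
        scores = [sum(q * k for q, k in zip(query, key)) for key in feature_maps]
        weights = softmax(scores)
        # Weighted summation over all maps, element by element.
        aggregated.append([
            sum(w * fm[i] for w, fm in zip(weights, feature_maps))
            for i in range(len(query))
        ])
    return aggregated
```

In this sketch, maps that resemble the query receive larger weights, so each output is dominated by features similar to it.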

2019 ◽  
Vol 50 (1) ◽  
pp. 27-32
Author(s):  
Yingwu Fang

The objective is to present an adaptive approach to tracking visual objects within the framework of particle filters (PF), using a conflict redistribution strategy based on Dezert-Smarandache theory (DSmT). A combination rule with conflict redistribution was introduced into visual object tracking: the position and color information of moving objects were combined using a modified combination strategy, and an object tracking model that fuses multiple sources of information under occlusion was established. After building the execution panel of an object tracking simulation platform, many tracking experiments in complicated scenes were carried out to validate the correctness and applicability of the introduced method by comparison with other tracking methods. Results showed that the introduced approach adapts well to high conflict among evidence under different occlusion conditions and could partly solve high-level vision tracking problems efficiently.


2016 ◽  
Vol 11 (4) ◽  
pp. 324
Author(s):  
Nor Nadirah Abdul Aziz ◽  
Yasir Mohd Mustafah ◽  
Amelia Wong Azman ◽  
Amir Akramin Shafie ◽  
Muhammad Izad Yusoff ◽  
...  

Author(s):  
Wei Huang ◽  
Xiaoshu Zhou ◽  
Mingchao Dong ◽  
Huaiyu Xu

Robust and high-performance visual multi-object tracking is a major challenge in computer vision, especially in drone scenarios. In this paper, an online Multi-Object Tracking (MOT) approach for UAV systems is proposed to handle the challenges of small-target detection and class imbalance; it integrates the merits of a deep high-resolution representation network and a data association method in a unified framework. Specifically, within a tracking-by-detection architecture, a Hierarchical Deep High-resolution network (HDHNet) is proposed, which encourages the model to handle different types and scales of targets and to extract more effective and comprehensive features during online learning. The extracted features are then fed into different prediction networks to recognize the targets of interest. In addition, an adjustable fusion loss function is proposed that combines focal loss and GIoU loss to address class imbalance and hard samples. During tracking, the detection results in each frame are passed to an improved DeepSORT MOT algorithm, which makes full use of the target appearance features for one-to-one matching in practice. The experimental results on the VisDrone2019 MOT benchmark show that the proposed UAV MOT system achieves the highest accuracy and the best robustness compared with state-of-the-art methods.
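The fusion loss mentioned above combines focal loss and GIoU loss. The sketch below uses the standard definitions of both terms for a single prediction; the `weight` parameter and the simple additive combination are assumptions for illustration, since the abstract does not say how the two terms are balanced.

```python
import math


def giou(a, b):
    """Generalized IoU of two boxes (x1, y1, x2, y2) with positive area."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (enclose - union) / enclose


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for probability p in (0, 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights easy, well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def fusion_loss(p, y, pred_box, gt_box, weight=1.0):
    """Hypothetical combination: focal term plus weighted (1 - GIoU) term."""
    return focal_loss(p, y) + weight * (1.0 - giou(pred_box, gt_box))
```

For identical boxes `giou` returns 1.0, so the box term vanishes and only the classification term remains.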


Symmetry ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 266 ◽  
Author(s):  
Yifeng Wang ◽  
Zhijiang Zhang ◽  
Ning Zhang ◽  
Dan Zeng

The one-shot multiple object tracking (MOT) framework has drawn increasing attention in the MOT research community because of its advantage in inference speed. However, the tracking accuracy of current one-shot approaches can be inferior to that of their two-stage counterparts. The reasons are two-fold. First, motion information is often neglected because the input is a single image. Second, detection and re-identification (ReID) are two different tasks with different focuses, so training them jointly can lead to suboptimal performance. To alleviate these limitations, we propose a one-shot network named Motion and Correlation-Multiple Object Tracking (MAC-MOT). MAC-MOT introduces a motion enhance attention module (MEA) and a dual correlation attention module (DCA). MEA computes differences between adjacent feature maps, which enhances the motion-related features while suppressing irrelevant information. The DCA module focuses on decoupling the detection task and the re-identification task to strike a balance and reduce the competition between them. Moreover, symmetry is a core design idea in our proposed framework, reflected in the Siamese-based deep learning backbone networks, the dual-stream image input, and the dual correlation attention module. Our proposed approach is evaluated on the popular multiple object tracking benchmarks MOT16 and MOT17. We demonstrate that the proposed MAC-MOT achieves better performance than the state-of-the-art (SOTA) baselines.
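The abstract says only that MEA takes differences between adjacent feature maps to emphasize motion. One way this could look is sketched below, assuming a sigmoid gate on the absolute frame difference; the gating form and the function name are guesses for illustration and are not taken from the paper.

```python
import math


def motion_enhance(prev_map, curr_map):
    """Gate current features by how much they changed since the last frame.

    prev_map, curr_map: equal-length lists of floats (flattened maps).
    Static positions get a gate of 0.5; strongly changing ones approach 1.
    """
    enhanced = []
    for prev, curr in zip(prev_map, curr_map):
        gate = 1.0 / (1.0 + math.exp(-abs(curr - prev)))
        enhanced.append(curr * gate)
    return enhanced
```

With this gate, positions that do not change between frames are attenuated relative to positions where the features move.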


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2894
Author(s):  
Minh-Quan Dao ◽  
Vincent Frémont

Multi-Object Tracking (MOT) is an integral part of any autonomous driving pipeline because it produces trajectories of other moving objects in the scene and predicts their future motion. Thanks to recent advances in 3D object detection enabled by deep learning, track-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, an MOT system essentially consists of an object detector and a data association algorithm that establishes track-to-detection correspondence. While 3D object detection has been actively researched, association algorithms for 3D MOT have settled on bipartite matching formulated as a Linear Assignment Problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt to the 3D setting a two-stage data association method that was successfully applied to image-based tracking, thus providing an alternative for data association in 3D MOT. Our method outperforms the baseline using one-stage bipartite matching for data association, achieving 0.587 Average Multi-Object Tracking Accuracy (AMOTA) on the NuScenes validation set and 0.365 AMOTA (at level 2) on the Waymo test set.
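To make the baseline formulation concrete, the sketch below solves a tiny track-to-detection Linear Assignment Problem. It uses exhaustive search as a stand-in for the Hungarian algorithm, which reaches the same optimum far more efficiently (for example via `scipy.optimize.linear_sum_assignment`). The cost values and the function name are illustrative assumptions.

```python
from itertools import permutations


def assign_tracks(cost):
    """Minimum-cost one-to-one matching of tracks (rows) to detections (cols).

    cost[t][d] is the association cost, e.g. a distance between track t
    and detection d. Exhaustive search: only suitable for tiny inputs.
    """
    n_tracks, n_dets = len(cost), len(cost[0])
    if n_tracks > n_dets:
        raise ValueError("this sketch assumes at least as many detections as tracks")
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n_dets), n_tracks):
        total = sum(cost[t][d] for t, d in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(enumerate(best_perm)), best_total


# Greedy matching would pair track 0 with detection 0 (cost 1.0) and then be
# forced into the expensive pair (1, 1); the optimal assignment avoids that.
pairs, total = assign_tracks([[1.0, 2.0], [1.1, 10.0]])
```

Here `pairs` matches track 0 to detection 1 and track 1 to detection 0, which is the globally cheapest assignment.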


2020 ◽  
Vol 13 (1) ◽  
pp. 60
Author(s):  
Chenjie Wang ◽  
Chengyuan Li ◽  
Jun Liu ◽  
Bin Luo ◽  
Xin Su ◽  
...  

Most scenes in practical applications are dynamic scenes containing moving objects, so accurately segmenting moving objects is crucial for many computer vision applications. In order to efficiently segment all the moving objects in the scene, regardless of whether the object has a predefined semantic label, we propose a two-level nested octave U-structure network with a multi-scale attention mechanism, called U2-ONet. U2-ONet takes two RGB frames, the optical flow between these frames, and the instance segmentation of the frames as inputs. Each stage of U2-ONet is filled with the newly designed octave residual U-block (ORSU block) to enhance the ability to obtain more contextual information at different scales while reducing the spatial redundancy of the feature maps. In order to efficiently train the multi-scale deep network, we introduce a hierarchical training supervision strategy that calculates the loss at each level while adding a knowledge-matching loss to keep the optimization consistent. The experimental results show that the proposed U2-ONet method achieves state-of-the-art performance on several general moving object segmentation datasets.


Electronics ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 1067
Author(s):  
Tongtong Yuan ◽  
Wenzhu Yang ◽  
Qian Li ◽  
Yuxia Wang

Siamese trackers are widely used in various fields because they balance speed and accuracy. Compared with anchor-based methods, anchor-free approaches can reach faster speeds without any drop in precision. Inspired by the Siamese network and the anchor-free idea, an anchor-free Siamese network (AFSN) with multi-template updates for object tracking is proposed. To improve tracking performance, a dual-fusion method is adopted in which the multi-layer features and the multiple prediction results are each combined. The low-level feature maps are concatenated with the high-level feature maps to make full use of both spatial and semantic information. To make the results as stable as possible, the final results are obtained by combining multiple prediction results. For template updating, a high-confidence multi-template update mechanism is used: the average peak to correlation energy (APCE) determines whether the template should be updated. We use the anchor-free network to implement object tracking in a per-pixel manner, which computes the object category and bounding boxes directly. Experimental results indicate that the average overlap and success rate of the proposed algorithm increase by about 5% and 10%, respectively, compared with the SiamRPN++ algorithm on the GOT-10k (Generic Object Tracking Benchmark) dataset.
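The average peak to correlation energy used for the update decision is commonly defined as the squared peak-to-minimum range of the response map divided by the mean squared deviation from the minimum. The sketch below computes it for a response map given as nested lists. The `should_update` rule, with its `ratio` parameter and running history, is a hypothetical gating scheme, since the abstract does not state the exact criterion.

```python
def apce(response):
    """Average peak to correlation energy of a 2-D response map."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    mean_sq = sum((v - f_min) ** 2 for v in values) / len(values)
    if mean_sq == 0.0:
        return 0.0  # Completely flat map: no peak to trust.
    return (f_max - f_min) ** 2 / mean_sq


def should_update(current, history, ratio=0.5):
    """Hypothetical rule: update when APCE is not far below its past mean."""
    if not history:
        return True
    return current >= ratio * (sum(history) / len(history))
```

A single sharp peak on a 3x3 map yields an APCE of about 9, while a map with several equally strong responses scores much lower, which is why a low value suggests an unreliable detection.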


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2841
Author(s):  
Khizer Mehmood ◽  
Abdul Jalil ◽  
Ahmad Ali ◽  
Baber Khan ◽  
Maria Murad ◽  
...  

Despite notable progress in recent years, various challenges associated with object tracking algorithms, such as scale variations, partial or full occlusions, background clutter and illumination variations, still need to be resolved with improved estimation for real-time applications. This paper proposes a robust and fast algorithm for object tracking based on spatio-temporal context (STC). A pyramid representation-based scale correlation filter is incorporated to overcome the STC’s inability to handle rapid changes in target scale. It learns the appearance induced by variations in the target scale, sampled at a set of different scales. During occlusion, most correlation filter trackers start drifting because samples are updated incorrectly. To prevent the target model from drifting, an occlusion detection and handling mechanism is incorporated. Occlusion is detected from the peak correlation score of the response map. After occlusion is detected, an extended Kalman filter is used for occlusion handling: it continuously predicts the target location during occlusion and passes it to the STC tracking model. This decreases the chance of tracking failure, as the Kalman filter continuously updates itself and the tracking model. Further improvement to the model is provided by fusion with the average peak to correlation energy (APCE) criterion, which automatically updates the target model to deal with environmental changes. Extensive evaluations on benchmark datasets indicate the efficacy of the proposed tracking method compared with state-of-the-art methods.
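The occlusion handling described above can be illustrated with a small sketch: when the response peak falls below a threshold, the measurement is ignored and the filter's prediction carries the track. For brevity this uses a linear, one-dimensional constant-velocity Kalman filter rather than the extended Kalman filter named in the abstract, and the threshold and noise values are arbitrary assumptions.

```python
class ConstantVelocityKalman:
    """1-D constant-velocity Kalman filter (linear stand-in for the EKF)."""

    def __init__(self, q=1e-4, r=1.0, p0=1000.0):
        self.pos, self.vel = 0.0, 0.0
        # Covariance P = [[p00, p01], [p10, p11]], started nearly uninformative.
        self.p00, self.p01, self.p10, self.p11 = p0, 0.0, 0.0, p0
        self.q, self.r = q, r

    def predict(self, dt=1.0):
        self.pos += self.vel * dt
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I.
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11

    def update(self, z):
        # Only the position is measured: H = [1, 0].
        s = self.p00 + self.r
        k0, k1 = self.p00 / s, self.p10 / s
        residual = z - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        # P <- (I - K H) P
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


def track(measurements, peaks, peak_threshold=0.3):
    """Return filtered positions, skipping updates while occluded."""
    kf = ConstantVelocityKalman()
    positions = []
    for z, peak in zip(measurements, peaks):
        kf.predict()
        if peak >= peak_threshold:  # Low peak is treated as occlusion.
            kf.update(z)
        positions.append(kf.pos)
    return positions
```

During the occluded frames the estimate keeps moving at the last estimated velocity instead of following the unreliable measurement.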

