Soft labeling with quasi-Gaussian structure for training samples of deep classification trackers

Deep classification tracking classifies candidate samples as target or background using a classifier that is generally trained with binary labels. However, a binary label merely distinguishes samples of different classes and ignores the distinctions among samples of the same class, which weakens both classification and localization. To address this problem, this article proposes soft labeling with a quasi-Gaussian structure in place of binary labeling, which distinguishes samples both across classes and within the same class. As with the binary label, the label signs for target and background samples are set to plus and minus, respectively, to distinguish samples of different classes. Further, to exploit the differences among samples in the same class, the label values of samples in the same class are designed as a monotonically decreasing quasi-Gaussian function of Intersection over Union (IoU). The corresponding response function is therefore a two-piece, monotonically increasing combination of quasi-Gaussian functions of IoU. Owing to this response function, a deep classification tracker trained with the proposed soft labeling achieves better classification and localization performance. To validate this, the proposed soft labeling is integrated into the pipeline of the deep classification tracker SiamFC. Experimental results on the OTB-2015 and VOT benchmarks show that our variant significantly improves on the baseline tracker while maintaining real-time tracking speed, and achieves accuracy comparable to recent state-of-the-art trackers.
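The shape of such a response function can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the function name `soft_response`, the IoU threshold of 0.5 separating target from background, the two widths `sigma_pos` and `sigma_neg`, and the specific Gaussian forms. The sketch only reproduces the stated properties: a positive sign for target samples, a negative sign for background samples, and a response that increases with IoU within each piece.

```python
import math


def soft_response(iou, threshold=0.5, sigma_pos=0.3, sigma_neg=0.3):
    """Illustrative two-piece, Gaussian-shaped response over IoU.

    All parameters are hypothetical; this is not the paper's formula.
    """
    if not 0.0 <= iou <= 1.0:
        raise ValueError("IoU must lie in [0, 1]")
    if iou >= threshold:
        # Target piece: positive, rising to +1 as IoU approaches 1.
        return math.exp(-((1.0 - iou) ** 2) / (2.0 * sigma_pos ** 2))
    # Background piece: negative, equal to -1 at IoU = 0 and rising toward 0.
    return -math.exp(-(iou ** 2) / (2.0 * sigma_neg ** 2))


if __name__ == "__main__":
    for value in (0.0, 0.2, 0.4, 0.5, 0.7, 1.0):
        print(f"IoU={value:.1f} -> response={soft_response(value):+.3f}")
```

Under these assumptions, a sample with IoU 0.9 receives a larger positive value than one with IoU 0.6, so samples of the same class are no longer treated as identical.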

Random Forest with Adaptive Local Template for Pedestrian Detection

Mathematical Problems in Engineering ◽

10.1155/2015/767423 ◽

2015 ◽

Vol 2015 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Tao Xiang ◽

Tao Li ◽

Mao Ye ◽

Zijian Liu

Keyword(s):

Computer Vision ◽

Random Forest ◽

Classification Accuracy ◽

Template Matching ◽

Detection Method ◽

State Of The Art ◽

Pedestrian Detection ◽

Sliding Window ◽

Experimental Results ◽

Training Samples

Pedestrian detection with large intraclass variations is still a challenging task in computer vision. In this paper, we propose a novel pedestrian detection method based on Random Forest. First, we generate a few local templates with different sizes and locations in positive exemplars. Then, a Random Forest is built whose splitting functions are optimized by maximizing the class purity obtained when matching the local templates to the training samples. To improve the classification accuracy, we adopt a boosting-like algorithm to update the weights of the training samples in a layer-wise fashion. During detection, the trained Random Forest votes on the category of each input sliding window. Our contributions are the splitting functions based on local template matching with adaptive size and location, and the iterative weight-updating method. We evaluate the proposed method on two well-known challenging datasets: TUD pedestrians and INRIA pedestrians. The experimental results demonstrate that our method achieves state-of-the-art or competitive performance.
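The abstract does not specify the boosting-like weight update, so the sketch below substitutes the standard AdaBoost reweighting rule purely as an illustration of what one such update step could look like. The function name `reweight` and the idea of applying it once per tree layer are assumptions, not details from the paper.

```python
import math


def reweight(weights, correct):
    """One AdaBoost-style update: raise weights of misclassified samples.

    weights: current sample weights (positive numbers).
    correct: booleans, True where the current layer classified the sample correctly.
    """
    total = sum(weights)
    err = sum(w for w, ok in zip(weights, correct) if not ok) / total
    # Clamp the error so the logarithm below stays finite.
    err = min(max(err, 1e-10), 1.0 - 1e-10)
    alpha = 0.5 * math.log((1.0 - err) / err)
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    norm = sum(updated)
    return [w / norm for w in updated]


if __name__ == "__main__":
    print(reweight([0.25, 0.25, 0.25, 0.25], [True, True, True, False]))
```

With this rule, the misclassified samples carry half of the total weight after the update, so the next layer concentrates on them.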

A Tour of Lattice-Based Skyline Algorithms

Handbook of Research on Investigations in Artificial Life Research and Development - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-5396-0.ch006 ◽

2018 ◽

pp. 96-122

Author(s):

Markus Endres ◽

Lena Rudenko

Keyword(s):

Real Time ◽

Data Streams ◽

High Performance ◽

State Of The Art ◽

Experimental Results ◽

Lattice Structures ◽

Skyline Query ◽

Basic Concepts ◽

Generic Index

A skyline query retrieves all objects in a dataset that are not dominated by other objects according to some given criteria. There exist many skyline algorithms which can be classified into generic, index-based, and lattice-based algorithms. This chapter takes a tour through lattice-based skyline algorithms. It summarizes the basic concepts and properties, presents high-performance parallel approaches, shows how one overcomes the low-cardinality restriction of lattice structures, and finally presents an application on data streams for real-time skyline computation. Experimental results on synthetic and real datasets show that lattice-based algorithms outperform state-of-the-art skyline techniques, and additionally have a linear runtime complexity.
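The definition of a skyline query can be made concrete with a naive quadratic baseline. This is a generic nested-loop sketch, not one of the lattice-based algorithms the chapter covers. It assumes that smaller values are preferred in every dimension; the function names are illustrative.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that no other point dominates (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (price, distance) pairs; lower is better in both dimensions.
    hotels = [(1, 5), (2, 2), (3, 3), (5, 1)]
    print(skyline(hotels))
```

In the example, (3, 3) is dominated by (2, 2) and drops out, while the remaining three points are mutually incomparable and form the skyline.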

An Efficient Tongue Segmentation Model Based on U-Net Framework

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421540355 ◽

2021 ◽

Author(s):

Qunsheng Ruan ◽

Qingfeng Wu ◽

Junfeng Yao ◽

Yingdong Wang ◽

Hsien-Wei Tseng ◽

...

Keyword(s):

Loss Function ◽

Loss Rate ◽

State Of The Art ◽

Experimental Results ◽

Cross Entropy ◽

Model Based ◽

Training Samples ◽

Net Framework ◽

Tongue Segmentation

In the intelligent processing of tongue images, one of the most important tasks is to accurately segment the tongue body from the whole tongue image, and high-quality processing of the tongue body edge is of great significance for the subsequent tongue feature extraction. To improve the performance of the segmentation model for tongue images, we propose an efficient tongue segmentation model based on U-Net. The work comprises three parts: optimizing the model’s main network, designing a new network that specifically handles tongue edge cutting, and proposing a weighted binary cross-entropy loss function. The purpose of optimizing the main segmentation network is to make the model recognize the foreground and background features of the tongue image as well as possible. A novel tongue edge segmentation network focuses on the tongue edge, because the edge of the tongue contains a great deal of important information. Furthermore, the proposed loss function is adopted to enhance the pixel-level supervision for tongue images. Moreover, owing to the lack of tongue image resources in Traditional Chinese Medicine (TCM), some special measures are adopted to augment the training samples. Various comparative experiments on two datasets were conducted to verify the performance of the segmentation model. The experimental results indicate that the loss rate of our model converges faster than that of the others, and that our model segments tongue images captured in poor environments with better stability and robustness. The experimental results also indicate that our model outperforms the state-of-the-art models on the two most important tongue image segmentation indexes: IoU and Dice. Moreover, experimental results on augmented samples demonstrate that our model performs better.
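For readers unfamiliar with the quantities named above, the following sketch shows one common form of a weighted binary cross-entropy loss, together with the IoU and Dice scores for binary masks. The abstract does not state the paper's weighting scheme, so the single `pos_weight` factor on foreground pixels is an assumption, as are the function names and the flattened-list representation of masks.

```python
import math


def weighted_bce(probs, targets, pos_weight=2.0, eps=1e-7):
    """Mean binary cross-entropy with foreground pixels scaled by pos_weight."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def iou_and_dice(pred, target):
    """IoU and Dice for flattened binary masks (sequences of 0/1)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - inter
    iou = inter / union if union else 1.0
    dice = 2.0 * inter / (pred_sum + target_sum) if (pred_sum + target_sum) else 1.0
    return iou, dice


if __name__ == "__main__":
    print(weighted_bce([0.9, 0.2, 0.6, 0.1], [1, 0, 1, 0]))
    print(iou_and_dice([1, 1, 0, 0], [1, 0, 1, 0]))
```

A `pos_weight` above 1 penalizes missed foreground pixels more heavily than false alarms, which is one way to strengthen supervision on a small or thin foreground region.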

Training-Time-Friendly Network for Real-Time Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6838 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11685-11692

Author(s):

Zili Liu ◽

Tu Zheng ◽

Guodong Xu ◽

Zheng Yang ◽

Haifeng Liu ◽

...

Keyword(s):

Real Time ◽

State Of The Art ◽

Batch Size ◽

Training Process ◽

Training Time ◽

Sample Weights ◽

Gaussian Kernels ◽

Novel Approach ◽

Training Samples ◽

Speed And Accuracy

Modern object detectors can rarely achieve short training time, fast inference speed, and high accuracy at the same time. To strike a balance among them, we propose the Training-Time-Friendly Network (TTFNet). In this work, we start with light-head, single-stage, and anchor-free designs, which enable fast inference speed. Then, we focus on shortening training time. We notice that encoding more training samples from annotated boxes plays a role similar to increasing the batch size, which helps enlarge the learning rate and accelerate the training process. To this end, we introduce a novel approach using Gaussian kernels to encode training samples. Besides, we design the initiative sample weights for better information utilization. Experiments on MS COCO show that our TTFNet has great advantages in balancing training time, inference speed, and accuracy. It reduces training time by more than seven times compared to previous real-time detectors while maintaining state-of-the-art performance. In addition, our super-fast versions of TTFNet-18 and TTFNet-53 can outperform SSD300 and YOLOv3, respectively, using less than one-tenth of their training time. The code has been made available at https://github.com/ZJULearning/ttfnet.
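The idea of encoding training samples from an annotated box with a Gaussian kernel can be sketched as follows. This is a simplified illustration, not the released TTFNet code: the function name, the choice to restrict the kernel to pixels inside the box, and the constants `alpha` and the divisor 6 that set the kernel widths are all assumptions made for this example.

```python
import math


def gaussian_box_heatmap(height, width, box, alpha=0.54):
    """Fill a height x width grid with a 2D Gaussian centered on the box.

    box: (x1, y1, x2, y2) in grid coordinates, with x2 > x1 and y2 > y1.
    Pixels outside the box keep the value 0.
    """
    x1, y1, x2, y2 = box
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box must have positive width and height")
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    sigma_x = alpha * (x2 - x1) / 6.0  # hypothetical width scaling
    sigma_y = alpha * (y2 - y1) / 6.0
    heat = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if x1 <= x <= x2 and y1 <= y <= y2:
                dx, dy = x - cx, y - cy
                heat[y][x] = math.exp(
                    -(dx * dx / (2.0 * sigma_x ** 2) + dy * dy / (2.0 * sigma_y ** 2))
                )
    return heat


if __name__ == "__main__":
    for row in gaussian_box_heatmap(8, 8, (2, 2, 6, 6)):
        print(" ".join(f"{v:.2f}" for v in row))
```

Every pixel inside the box gets a nonzero target value that decays with distance from the center, so a single annotated box yields many graded training samples rather than a single positive location.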

Deep Learning-Based Real-Time Multiple-Object Detection and Tracking from Aerial Imagery via a Flying Robot with GPU-Based Embedded Devices

Sensors ◽

10.3390/s19153371 ◽

2019 ◽

Vol 19 (15) ◽

pp. 3371 ◽

Cited By ~ 16

Author(s):

Hossain ◽

Lee

Keyword(s):

Deep Learning ◽

Object Detection ◽

Real Time ◽

Moving Objects ◽

State Of The Art ◽

Target Position ◽

Guidance System ◽

Aerial Imagery ◽

Detection And Tracking ◽

Real Time Tracking

In recent years, demand has been increasing for target detection and tracking from aerial imagery via drones using onboard powered sensors and devices. We propose a very effective method for this application based on a deep learning framework. A state-of-the-art embedded hardware system empowers small flying robots to carry out the real-time onboard computation necessary for object tracking. Two types of embedded modules were developed: one was designed using a Jetson TX or AGX Xavier, and the other was based on an Intel Neural Compute Stick. These are suitable for real-time onboard computing power on small flying drones with limited space. A comparative analysis of current state-of-the-art deep learning-based multi-object detection algorithms was carried out utilizing the designated GPU-based embedded computing modules to obtain detailed metric data about frame rates, as well as the computation power. We also introduce an effective target tracking approach for moving objects. The algorithm for tracking moving objects is based on the extension of simple online and real-time tracking. It was developed by integrating a deep learning-based association metric approach with simple online and real-time tracking (Deep SORT), which uses a hypothesis tracking methodology with Kalman filtering and a deep learning-based association metric. In addition, a guidance system that tracks the target position using a GPU-based algorithm is introduced. Finally, we demonstrate the effectiveness of the proposed algorithms by real-time experiments with a small multi-rotor drone.

Modeling Multi-Purpose Sessions for Next-Item Recommendations via Mixture-Channel Purpose Routing Networks

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/523 ◽

2019 ◽

Cited By ~ 8

Author(s):

Shoujin Wang ◽

Liang Hu ◽

Yan Wang ◽

Quan Z. Sheng ◽

Mehmet Orgun ◽

...

Keyword(s):

Recommender System ◽

Channel Model ◽

State Of The Art ◽

Recurrent Network ◽

The State ◽

Experimental Results ◽

Art Methods ◽

Recommendation Accuracy ◽

The Difference

A session-based recommender system (SBRS) suggests the next item by modeling the dependencies between items in a session. Most existing SBRSs assume that the items inside a session are associated with one (implicit) purpose. However, this may not always be true in reality, and a session may often consist of multiple subsets of items for different purposes (e.g., breakfast and decoration). Specifically, items (e.g., bread and milk) in a subset have strong purpose-specific dependencies, whereas items (e.g., bread and vase) from different subsets have much weaker or even no dependencies because their purposes differ. Therefore, we propose a mixture-channel model to accommodate the multi-purpose item subsets and represent a session more precisely. Filling gaps in existing SBRSs, this model recommends more diverse items to satisfy different purposes. Accordingly, we design effective mixture-channel purpose routing networks (MCPRN) with a purpose routing network that detects the purposes of each item and assigns it to the corresponding channels. Moreover, a purpose-specific recurrent network is devised to model the dependencies between items within each channel for a specific purpose. The experimental results show the superiority of MCPRN over the state-of-the-art methods in terms of both recommendation accuracy and diversity.

Real-Time Thermal Infrared Tracking Based on Collaborative Online and Offline Method

Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University ◽

10.1051/jnwpu/20183661052 ◽

2018 ◽

Vol 36 (6) ◽

pp. 1052-1058 ◽

Cited By ~ 1

Author(s):

Ximing Zhang ◽

Mingang Wang ◽

Lin Cao

Keyword(s):

Real Time ◽

Visual Cues ◽

Detection Method ◽

State Of The Art ◽

Thermal Infrared ◽

Superior Performance ◽

Model Update ◽

Real Time Tracking ◽

Siamese Networks ◽

Infrared Tracking

Most tracking-by-detection trackers employ an online model update scheme based on the spatiotemporal consistency of visual cues. In the presence of self-deformation, abrupt motion, and heavy occlusion, these trackers are prone to drifting. Models based on offline training, namely Siamese networks, are invariant to these attributes. However, the tracking speed of the offline method can be too slow for real-time tracking. In this paper, a novel collaborative tracker that decomposes the tracking task into online and offline modes is proposed. Our tracker switches between the online and offline modes automatically based on the tracker status inferred by the proposed tracking-failure detection method, which is based on the dispersal measure of the response map. The proposed Real-Time Thermal Infrared Collaborative Online and Offline Tracker (TCOOT) achieves state-of-the-art tracking performance while maintaining real-time speed. Experiments are carried out on the VOT-TIR-2015 benchmark dataset, and our tracker outperforms the Staple and SiamFC trackers by 3.3% and 3.6% on the precision criterion and by 3.8% and 5% on the success criterion, respectively.

Action Recognition of Sleeping at the Desk Using Regression on a Dependency Graph

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.742.318 ◽

2015 ◽

Vol 742 ◽

pp. 318-321

Author(s):

Wang Luo ◽

Lei Yu ◽

Min Feng ◽

Gong Yi Hong ◽

Qi Wei Peng ◽

...

Keyword(s):

Activity Recognition ◽

Action Recognition ◽

State Of The Art ◽

Dependency Graph ◽

Reference Points ◽

Experimental Results ◽

Hierarchical Method ◽

Comparable Accuracy ◽

Body Joints ◽

Reference Body

In this paper, we present a hierarchical method of activity recognition for detecting sleeping at the desk in a business hall. The method consists of three steps. First, reference points such as body joints are obtained from workers in the business hall. Second, we build a dependency graph to represent the relationships between the reference points. Third, multidimensional output regressions along the dependency paths are used to estimate the positions of these reference body points. Experimental results demonstrate that our method achieves accuracy comparable to state-of-the-art results.

The Improvement of Mean - Shift Algorithm in the Video of Global Visual Robotic Fish in Tracking Moving Targets

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.475-476.947 ◽

2013 ◽

Vol 475-476 ◽

pp. 947-951

Author(s):

Zhi Yuan Mai ◽

Kun Yu Tan ◽

An Ting Xu ◽

Wei Xiang

Keyword(s):

Real Time ◽

Mean Shift ◽

Tracking Algorithm ◽

Robotic Fish ◽

Experimental Results ◽

Moving Targets ◽

Mean Shift Algorithm ◽

Good For ◽

The Difference

With the Mean Shift tracking algorithm, tracking performs poorly for fast-moving targets when the difference between the target and background pixels is not obvious in the video of global visual robotic fish. To address the difficulty of tracking drastically moving targets, this paper determines the position of the moving target in the next frame by comparing two preset bc coefficients, with the Epanechnikov function selected as the kernel for estimation. The experimental results show that the proposed algorithm can track moving targets efficiently and precisely in video and, with little computation, can also meet strict real-time requirements.
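The two ingredients mentioned here can be sketched under one explicit assumption: that "bc" refers to the Bhattacharyya coefficient, the similarity measure conventionally used in Mean Shift tracking. The abstract does not confirm this, and it does not describe how the two coefficients are set or compared. The function names and the pixel representation below are illustrative only.

```python
import math


def epanechnikov_profile(r):
    """Unnormalized Epanechnikov profile: 1 - r^2 inside the unit ball, 0 outside."""
    return 1.0 - r * r if abs(r) <= 1.0 else 0.0


def weighted_histogram(pixels, center, bandwidth, bins):
    """Kernel-weighted color histogram.

    pixels: iterable of (x, y, bin_index) tuples.
    Pixels nearer the window center contribute more.
    """
    hist = [0.0] * bins
    for x, y, b in pixels:
        r = math.hypot(x - center[0], y - center[1]) / bandwidth
        hist[b] += epanechnikov_profile(r)
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1.0 = identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


if __name__ == "__main__":
    target = weighted_histogram([(0, 0, 0), (1, 0, 1)], (0, 0), 2.0, 2)
    candidate = weighted_histogram([(0, 0, 0), (1, 1, 1)], (0, 0), 2.0, 2)
    print(bhattacharyya(target, candidate))
```

In a tracker built this way, the coefficient would be evaluated at candidate positions in the next frame and the position with the higher similarity to the target model would be preferred.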

Reliable Memory Model for Visual Tracking

Electronics ◽

10.3390/electronics10202488 ◽

2021 ◽

Vol 10 (20) ◽

pp. 2488

Author(s):

Daohui Ge ◽

Ruyi Liu ◽

Yunan Li ◽

Qiguang Miao

Keyword(s):

Visual Tracking ◽

State Of The Art ◽

Experimental Results ◽

Memory Model ◽

Background Information ◽

Evaluation Strategy ◽

Active Memory ◽

Training Samples ◽

Art Performance ◽

Similarity Distance

Effectively learning the appearance change of a target is the key point of an online tracker. When occlusion and misalignment occur, the tracking results usually contain a great amount of background information, which heavily affects the ability of a tracker to distinguish between targets and backgrounds, eventually leading to tracking failure. To solve this problem, we propose a simple and robust reliable memory model. In particular, an adaptive evaluation strategy (AES) is proposed to assess the reliability of tracking results. AES combines the confidence of the tracker predictions and the similarity distance, which is between the current predicted result and the existing tracking results. Based on the reliable results of AES selection, we designed an active–frozen memory model to store reliable results. Training samples stored in active memory are used to update the tracker, while frozen memory temporarily stores inactive samples. The active–frozen memory model maintains the diversity of samples while satisfying the limitation of storage. We performed comprehensive experiments on five benchmarks: OTB-2013, OTB-2015, UAV123, Temple-color-128, and VOT2016. The experimental results show that our tracker achieves state-of-the-art performance.
