An Enhanced Visual Attention Siamese Network That Updates Template Features Online
Recently, Siamese trackers have attracted extensive attention because of their simplicity and low computational cost. However, most Siamese trackers use only a single frame of the video sequence as the template and do not update the template during inference, which makes their tracking success rate inferior to that of trackers that can update the template online. In this study, we introduce an enhanced visual attention Siamese network (ESA-Siam). The method is based on a deep convolutional neural network that integrates channel attention and spatial self-attention to improve the tracker's ability to discriminate between positive and negative samples. Channel attention weights different targets according to the response values of different channels to achieve better target representation. Spatial self-attention captures the correlation between any two spatial positions to help locate the target. In addition, a template search attention module is designed to implicitly update the template features online, which effectively improves the success rate of the tracker when the target suffers from background interference. The proposed ESA-Siam tracker shows superior performance compared with 18 existing state-of-the-art trackers on five benchmark datasets: OTB50, OTB100, VOT2016, VOT2018, and LaSOT.
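To make the two attention mechanisms concrete, the sketch below shows a minimal, framework-free interpretation of channel attention (squeeze-and-excitation style: pool each channel, gate it, rescale its responses) and spatial self-attention (non-local style: each position attends to every other position via a softmax over dot-product similarities). This is an illustrative simplification, not the paper's actual architecture; the gating function, feature shapes, and the absence of learned projection weights are all assumptions made for brevity.

```python
import math

def channel_attention(feat):
    """Simplified channel attention (assumption: a plain sigmoid gate on the
    global-average-pooled response stands in for the learned excitation MLP).
    `feat` is a [C][H][W] nested list of floats."""
    gates = []
    for ch in feat:
        pooled = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gates.append(1.0 / (1.0 + math.exp(-pooled)))  # sigmoid gate per channel
    # rescale each channel's responses by its gate
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]

def spatial_self_attention(feat):
    """Simplified spatial self-attention (assumption: queries, keys, and values
    are the raw C-dim feature vectors, with no learned projections).
    Each of the H*W positions attends to all positions."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # flatten to H*W position vectors of length C
    pos = [[feat[c][i][j] for c in range(C)] for i in range(H) for j in range(W)]
    out = []
    for q in pos:
        scores = [sum(a * b for a, b in zip(q, k)) for k in pos]
        m = max(scores)                       # numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # weighted sum of all position vectors
        out.append([sum(w * k[c] for w, k in zip(weights, pos)) for c in range(C)])
    # reshape back to [C][H][W]
    return [[[out[i * W + j][c] for j in range(W)] for i in range(H)] for c in range(C)]

# toy 2-channel, 2x2 feature map
feat = [[[1.0, 2.0], [3.0, 4.0]],   # channel 0
        [[0.5, 0.5], [0.5, 0.5]]]   # channel 1
ca = channel_attention(feat)
sa = spatial_self_attention(feat)
```

In a real tracker both modules would carry learned parameters and operate on deep backbone features; the point here is only the data flow: channel attention reweights whole channels, while spatial self-attention mixes information across positions so that each location's feature reflects globally correlated context.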