A Visual Object Tracking Algorithm Based on Dynamics Pattern and Convolutional Feature

Deep visual feature-based methods have demonstrated impressive performance in visual tracking owing to their powerful feature representation capability. However, in complex conditions such as dramatic appearance change, illumination variation, and rotation, the extracted deep visual features are insufficient to characterize the target accurately. To solve this problem, we present an integrated tracking framework that combines a Long Short-Term Memory (LSTM) network and a Convolutional Neural Network (CNN). First, the LSTM extracts the dynamics features of the target over the time sequence, yielding the state of the target at the current time step. From that state, an accurate preprocessed bounding box is obtained. Then, deep convolutional features of the target are extracted by a CNN from the preprocessed bounding box. Finally, the position of the target is determined from the score of the features. During tracking, to improve the adaptability of the network, its parameters are updated using samples of the target captured during successful tracking. Experiments show that the proposed method achieves outstanding tracking performance and robustness under partial occlusion, out-of-view, motion blur, and fast motion.

The Concept of Using LSTM to Detect Moisture in Brick Walls by Means of Electrical Impedance Tomography

Energies, DOI 10.3390/en14227617, 2021, Vol 14 (22), pp. 7617

Author(s): Grzegorz Kłosowski, Anna Hoła, Tomasz Rymarczyk, Łukasz Skowron, Tomasz Wołowiec, ...

Keyword(s): Electrical Impedance, Short Term Memory, Brick Wall, Time Step, Impedance Tomography, Single Time, Measurement Vector, LSTM Network, Tomographic Measurement, Selection Of

This paper presents an original concept for tomographic measurement of brick wall humidity using an algorithm based on long short-term memory (LSTM) neural networks. In the presented study, the measurement vector was treated as a data sequence with a single time step. This approach made it possible to use a recurrent deep neural network of the LSTM type as a system for converting the measurement vector into output images. A prototype electrical impedance tomograph was used in the research. The LSTM network, which is often employed for time series classification, was used to tackle the inverse problem. The task of the LSTM network was to convert 448 voltage measurements into spatial images of a selected section of a historical building’s brick wall. The 3D tomographic image mesh consisted of 11,297 finite elements. The novelty lies in treating the measurement vector as a single-time-step sequence consisting of 448 features (channels). Through appropriate selection of the network parameters and the training algorithm, it was possible to obtain an LSTM network that reconstructs images of damp brick walls with high accuracy. Additionally, the reconstruction times are very short.
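The single-time-step framing can be illustrated with a small shape sketch. This is only an illustration under stated assumptions: the helper name `to_single_step_sequence` is hypothetical, the voltage values are placeholders, and no network is built. It shows how a flat vector of 448 measurements becomes a sequence of length 1 with 448 features.

```python
def to_single_step_sequence(measurements):
    """Wrap a flat measurement vector as a sequence with one time step.

    The result is a nested list shaped (time_steps=1, features=len(measurements)).
    Hypothetical helper for illustration; not taken from the paper.
    """
    return [list(measurements)]


# 448 voltage measurements, as stated in the abstract (placeholder values).
voltages = [0.0] * 448
sequence = to_single_step_sequence(voltages)
time_steps, features = len(sequence), len(sequence[0])
```

A recurrent layer fed this input would see one step carrying 448 channels; the abstract states that the network maps such an input to 11,297 finite-element values.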

Spatial-Temporal Feature Representation Learning for Facial Fatigue Detection

International Journal of Pattern Recognition and Artificial Intelligence, DOI 10.1142/s0218001418560189, 2018, Vol 32 (12), pp. 1856018

Author(s): Changyuan Wang, Ting Yan, Hongbo Jia

Keyword(s): Network Model, Network Structure, Short Term Memory, Representation Learning, Detection Algorithm, Feature Representation, Video Sequences, LSTM Network, Fatigue Detection, Inter Frame

To reduce the serious problems caused by operator fatigue, we propose a novel network model, the Convolutional Neural Network and Long Short-Term Memory Network (CNN-LSTM), for fatigue detection in the inter-frame images of video sequences. The model mainly consists of a CNN and an LSTM network. First, to improve the accuracy of the deep network structure, the Viola–Jones detection algorithm and the Kernelized Correlation Filter (KCF) tracking algorithm are used in face detection to normalize the size of the inter-frame images of video sequences. Second, we use the CNN and the LSTM network to detect the fatigue state efficiently and in real time: the CNN extracts the fatigue-related facial features, and the LSTM network takes the resulting facial feature vectors as input and extracts the temporal symptoms of the whole fatigue process. Third, we train and test the network in a step-by-step approach. Finally, we experiment with the proposed network model. The experimental results demonstrate that the network structure can effectively detect the fatigue state, with an overall accuracy of up to 82.8%.

A Deep Multimodal Model for Predicting Affective Responses Evoked by Movies Based on Shot Segmentation

Security and Communication Networks, DOI 10.1155/2021/7650483, 2021, Vol 2021, pp. 1-12

Author(s): Chunxiao Wang, Jingjing Zhang, Wei Jiang, Shuang Wang

Keyword(s): Short Term Memory, Feature Fusion, Pearson Correlation, Feature Representation, Visual Features, Visual Feature, Temporal Attention, Video Content Analysis, Wide Range, Experienced Emotion

Predicting the emotions evoked in a viewer watching movies is an important research element in affective video content analysis over a wide range of applications. Generally, the emotion of the audience is evoked by the combined effect of the audio-visual messages of the movies. Current research has mainly used rough middle- and high-level audio and visual features to predict experienced emotions, but combining semantic information to refine features to improve emotion prediction results is still not well studied. Therefore, on the premise of considering the time structure and semantic units of a movie, this paper proposes a shot-based audio-visual feature representation method and a long short-term memory (LSTM) model incorporating a temporal attention mechanism for experienced emotion prediction. First, the shot-based audio-visual feature representation defines a method for extracting and combining audio and visual features of each shot clip, and the advanced pretraining models in the related audio-visual tasks are used to extract the audio and visual features with different semantic levels. Then, four components are included in the prediction model: a nonlinear multimodal feature fusion layer, a temporal feature capture layer, a temporal attention layer, and a sentiment prediction layer. This paper focuses on experienced emotion prediction and evaluates the proposed method on the extended COGNIMUSE dataset. The method performs significantly better than the state-of-the-art while significantly reducing the number of calculations, with increases in the Pearson correlation coefficient (PCC) from 0.46 to 0.62 for arousal and from 0.18 to 0.34 for valence in experienced emotion.
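The Pearson correlation coefficient (PCC) used as the evaluation metric above can be computed as follows. This is a minimal standard-library sketch of the textbook formula; the function name `pearson_correlation` and the sample values are illustrative assumptions, not data or code from the paper.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: predicted vs. annotated arousal per shot.
predicted = [1.0, 2.0, 3.0]
annotated = [2.0, 4.0, 6.0]
pcc = pearson_correlation(predicted, annotated)
```

A PCC of 1 means the predictions follow the annotations in a perfectly linear, increasing way; values near 0 indicate no linear relationship.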

Weighted statistical binary patterns for facial feature representation

Applied Intelligence, DOI 10.1007/s10489-021-02477-1, 2021

Author(s): Hung Phuoc Truong, Thanh Phuong Nguyen, Yong-Guk Kim

Keyword(s): Comprehensive Evaluation, Facial Feature, Input Image, Feature Representation, Illumination Variation, Straight Line, Mean And Variance, The Mean, Face Datasets, Degraded Images

We present a novel framework for efficient and robust facial feature representation based upon Local Binary Pattern (LBP), called Weighted Statistical Binary Pattern, wherein the descriptors utilize the straight-line topology along different directions. The input image is initially divided into mean and variance moments. A new variance moment, which contains distinctive facial features, is prepared by taking the k-th root. Then, once the Sign and Magnitude components along four different directions have been constructed from the mean moment, a weighting approach based on the new variance is applied to each component. Finally, the weighted histograms of the Sign and Magnitude components are concatenated to build a novel histogram of Complementary LBP along different directions. A comprehensive evaluation using six public face datasets suggests that the present framework outperforms the state-of-the-art methods, achieving accuracies of 98.51% for ORL, 98.72% for YALE, 98.83% for Caltech, 99.52% for AR, 94.78% for FERET, and 99.07% for KDEF. The influence of color spaces and the issue of degraded images are also analyzed with our descriptors. Such a result with theoretical underpinning confirms that our descriptors are robust against noise, illumination variation, diverse facial expressions, and head poses.
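For readers unfamiliar with the Local Binary Pattern that this descriptor builds on, the basic operator can be sketched as below. This is the plain 3×3 LBP, not the Weighted Statistical Binary Pattern of the paper. The function name `lbp_code`, the clockwise bit ordering starting at the top-left neighbor, and the `>=` thresholding convention are assumptions made for illustration.

```python
def lbp_code(patch):
    """Compute the basic 8-bit LBP code of the center pixel of a 3x3 patch.

    patch is a 3x3 nested list of gray values. Each neighbor that is
    greater than or equal to the center contributes one bit.
    """
    center = patch[1][1]
    # Neighbor positions, clockwise from the top-left corner (assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


flat_patch = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
flat_code = lbp_code(flat_patch)
```

A face descriptor is then typically a histogram of such codes over image regions; the paper's contribution lies in how the Sign and Magnitude components are built and weighted, which this sketch does not attempt.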

Power System Transient Stability Assessment Based on Snapshot Ensemble LSTM Network

Sustainability, DOI 10.3390/su13126953, 2021, Vol 13 (12), pp. 6953

Author(s): Yixing Du, Zhijian Hu

Keyword(s): Power Systems, Power System, Transient Stability, Short Term Memory, Risk Function, Stability Margin, Risk Level, Stability Assessment, LSTM Network, Hierarchical Assessment

Data-driven methods using synchrophasor measurements have broad application prospects in Transient Stability Assessment (TSA). Most previous studies focused only on predicting whether the power system is stable after a disturbance and lacked a quantitative analysis of the transient stability risk. Therefore, this paper proposes a two-stage power system TSA method based on a snapshot ensemble long short-term memory (LSTM) network. The method efficiently builds an ensemble model through a single training process and takes the disturbed trajectory measurements as inputs, which enables rapid end-to-end TSA. In the first stage, dynamic hierarchical assessment is carried out by the classifier to screen out credible samples step by step. In the second stage, the regressor predicts the transient stability margin of the credible stable samples and the undetermined samples, and this prediction is combined with the constructed risk function to quantify the risk of transient angle stability. Furthermore, by modifying the loss function of the model, the method effectively overcomes sample imbalance and overlapping. The simulation results show that the proposed method not only accurately predicts the binary transient stability status of samples, but also reasonably reflects the transient safety risk level of power systems, providing a reliable reference for subsequent control.

A Combined Method for MEMS Gyroscope Error Compensation Using a Long Short-Term Memory Network and Kalman Filter in Random Vibration Environments

Sensors, DOI 10.3390/s21041181, 2021, Vol 21 (4), pp. 1181

Author(s): Chenhao Zhu, Sheng Cai, Yifan Yang, Wei Xu, Honghai Shen, ...

Keyword(s): Kalman Filter, Standard Deviation, Error Compensation, Random Vibration, Short Term Memory, Combined Method, Short Term, MEMS Gyroscope, Long Short Term Memory, LSTM Network

In applications such as carrier attitude control and mobile device navigation, a micro-electro-mechanical-system (MEMS) gyroscope is inevitably affected by random vibration, which significantly degrades its performance. To counter this degradation in random vibration environments, this paper proposes a combined method of a long short-term memory (LSTM) network and a Kalman filter (KF) for error compensation, in which the Kalman filter parameters are iteratively optimized using the Kalman smoother and the expectation-maximization (EM) algorithm. To verify the effectiveness of the proposed method, we performed a linear random vibration test to acquire MEMS gyroscope data. Subsequently, an analysis of the effects of input data step size and network topology on gyroscope error compensation performance is presented. Furthermore, the autoregressive moving average-Kalman filter (ARMA-KF) model, which is commonly used in gyroscope error compensation, was also combined with the LSTM network as a comparison method. The results show that, for the x-axis data, the proposed combined method reduces the standard deviation (STD) by 51.58% and 31.92% compared to the bidirectional LSTM (BiLSTM) network and the EM-KF method, respectively. For the z-axis data, the proposed combined method reduces the standard deviation by 29.19% and 12.75% compared to the BiLSTM network and the EM-KF method, respectively. Furthermore, for the x-axis and z-axis data, the proposed combined method reduces the standard deviation by 46.54% and 22.30%, respectively, compared to the BiLSTM-ARMA-KF method, and the output is smoother, demonstrating the effectiveness of the proposed method.
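The Kalman filtering step and the standard-deviation comparison reported above can be illustrated with a generic scalar sketch. It assumes a random-walk state model with fixed, hand-chosen noise variances; it does not include the EM parameter optimization, the Kalman smoother, or the LSTM stage described in the abstract. The function names and the synthetic signal are hypothetical.

```python
import statistics


def kalman_filter_1d(measurements, process_var, measurement_var,
                     initial_estimate=0.0, initial_var=1.0):
    """Filter a scalar signal with a random-walk Kalman filter (fixed variances)."""
    estimate, var = initial_estimate, initial_var
    filtered = []
    for z in measurements:
        # Predict: the state is assumed constant, so only the uncertainty grows.
        var += process_var
        # Update: blend prediction and measurement according to the Kalman gain.
        gain = var / (var + measurement_var)
        estimate += gain * (z - estimate)
        var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


def std_reduction_percent(baseline, compensated):
    """Percentage by which the population standard deviation is reduced."""
    base_std = statistics.pstdev(baseline)
    return 100.0 * (base_std - statistics.pstdev(compensated)) / base_std


# Synthetic alternating noise around zero (placeholder for gyroscope error).
noisy = [1.0, -1.0] * 50
smoothed = kalman_filter_1d(noisy, process_var=1e-4, measurement_var=1.0)
reduction = std_reduction_percent(noisy, smoothed)
```

The percentage figures in the abstract are of this form: a relative drop in standard deviation between a reference method's output and the proposed method's output.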

Real-Time Detection of Dictionary DGA Network Traffic Using Deep Learning

SN Computer Science, DOI 10.1007/s42979-021-00507-w, 2021, Vol 2 (2)

Author(s): Kate Highnam, Domenic Puzio, Song Luo, Nicholas R. Jennings

Keyword(s): Neural Network, Deep Learning, Real Time, Network Traffic, Short Term Memory, Domain Names, Control Networks, Detection Techniques, LSTM Network, And Control

Botnets and malware continue to avoid detection by static rule engines when using domain generation algorithms (DGAs) for callouts to unique, dynamically generated web addresses. Common DGA detection techniques fail to reliably detect DGA variants that combine random dictionary words to create domain names that closely mirror legitimate domains. To combat this, we created a novel hybrid neural network, Bilbo the “bagging” model, that analyses domains and scores the likelihood they are generated by such algorithms and therefore are potentially malicious. Bilbo is the first parallel usage of a convolutional neural network (CNN) and a long short-term memory (LSTM) network for DGA detection. Our unique architecture is found to be the most consistent in performance in terms of AUC, F1 score, and accuracy when generalising across different dictionary DGA classification tasks compared to current state-of-the-art deep learning architectures. We validate using reverse-engineered dictionary DGA domains and detail our real-time implementation strategy for scoring real-world network logs within a large enterprise. In 4 h of actual network traffic, the model discovered at least five potential command-and-control networks that commercial vendor tools did not flag.

Extraction of local and global features by a convolutional neural network–long short-term memory network for diagnosing bearing faults

Proceedings of the Institution of Mechanical Engineers Part C Journal of Mechanical Engineering Science, DOI 10.1177/09544062211016505, 2021, pp. 095440622110165

Author(s): Zhang Chao, Wang Wei-zhi, Zhang Chen, Fan Bin, Wang Jian-guo, ...

Keyword(s): Neural Network, Fault Diagnosis, Condition Monitoring, Short Term Memory, Vibration Signal, Short Term, Global Features, Term Memory, Long Short Term Memory, LSTM Network

Accurate and reliable fault diagnosis is one of the key and difficult issues in mechanical condition monitoring. In recent years, the Convolutional Neural Network (CNN) has been widely used in mechanical condition monitoring, which is also a great breakthrough in the field of bearing fault diagnosis. However, a CNN can only extract local features of signals. When raw vibration signals are processed by a CNN alone, model accuracy and generalization are very low. To address these problems, this paper improves the traditional convolution layer of the CNN and builds a learning module for local characteristics (local feature learning block, LFLB). At the same time, a Long Short-Term Memory (LSTM) network is introduced to extract the global features. The result is a new neural network, the improved CNN-LSTM network. The extracted deep features are used for fault classification. The improved CNN-LSTM network is applied to the vibration signals of faulty bearings collected by the bearing failure laboratory of Inner Mongolia University of Science and Technology. The results show that the accuracy of the improved CNN-LSTM network on the same batch test set is 98.75%, which is about 24% higher than that of the traditional CNN. The proposed network is then applied, with its network parameters unchanged, to the bearing data collection of Western Reserve University. The experiment shows that the improved CNN-LSTM network generalizes better than the traditional CNN.

Near Real-Time Global Solar Radiation Forecasting at Multiple Time-Step Horizons Using the Long Short-Term Memory Network

Energies, DOI 10.3390/en13143517, 2020, Vol 13 (14), pp. 3517, Cited By ~ 1

Author(s): Anh Ngoc-Lan Huynh, Ravinesh C. Deo, Duc-Anh An-Vo, Mumtaz Ali, Nawin Raj, ...

Keyword(s): Deep Learning, Solar Radiation, Short Term Memory, Global Solar Radiation, Multiple Time, Short Term, Time Step, Term Memory, Multiple Time Step, Long Short Term Memory

This paper aims to develop the long short-term memory (LSTM) network modelling strategy based on deep learning principles, tailored for the very short-term, near-real-time global solar radiation (GSR) forecasting. To build the prescribed LSTM model, the partial autocorrelation function is applied to the high resolution, 1 min scaled solar radiation dataset that generates statistically significant lagged predictor variables describing the antecedent behaviour of GSR. The LSTM algorithm is adopted to capture the short- and the long-term dependencies within the GSR data series patterns to accurately predict the future GSR at 1, 5, 10, 15, and 30 min forecasting horizons. This objective model is benchmarked at a solar energy resource rich study site (Bac-Ninh, Vietnam) against the competing counterpart methods employing other deep learning, a statistical model, a single hidden layer and a machine learning-based model. The LSTM model generates satisfactory predictions at multiple-time step horizons, achieving a correlation coefficient exceeding 0.90, outperforming all of the counterparts. In accordance with robust statistical metrics and visual analysis of all tested data, the study ascertains the practicality of the proposed LSTM approach to generate reliable GSR forecasts. The Diebold–Mariano statistic test also shows LSTM outperforms the counterparts in most cases. The study confirms the practical utility of LSTM in renewable energy studies, and broadly in energy-monitoring devices tailored for other energy variables (e.g., hydro and wind energy).
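The idea of lagged predictor variables can be sketched as follows. In the study the significant lags are selected with the partial autocorrelation function; in this minimal sketch the lags are simply supplied by hand, which is an assumption, and the function name `make_lagged_samples` and the sample series are hypothetical.

```python
def make_lagged_samples(series, lags):
    """Build (inputs, targets) pairs from a series and a list of positive lags.

    Each input row holds series[t - lag] for every lag, and the matching
    target is series[t]. Rows start at the largest lag so every index exists.
    """
    if not lags or min(lags) < 1:
        raise ValueError("lags must be positive integers")
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets


# Placeholder 1 min radiation readings; lags 1 and 2 chosen by hand.
readings = [10, 20, 30, 40, 50]
inputs, targets = make_lagged_samples(readings, lags=[1, 2])
```

Each input row would form one training example for the forecasting model; for longer horizons (5, 10, 15, or 30 min) the target index would be shifted further ahead.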

Predicting remaining useful life of rolling bearings based on deep feature representation and long short-term memory neural network

Advances in Mechanical Engineering, DOI 10.1177/1687814018817184, 2018, Vol 10 (12), pp. 168781401881718, Cited By ~ 16

Author(s): Wentao Mao, Jianliang He, Jiamei Tang, Yuan Li

Keyword(s): Neural Network, Life Prediction, Short Term Memory, Degradation Process, Remaining Useful Life, Feature Representation, Short Term, Term Memory, Useful Life, Long Short Term Memory

For the problem of predicting the remaining useful life of bearings, traditional machine-learning-based methods generally lack feature representation ability and are incapable of adaptive feature extraction. Although deep-learning-based remaining useful life prediction methods proposed in recent years can effectively extract discriminative features of bearing faults, they tend to give little consideration to the temporal information of the fault degradation process. To solve this problem, a new remaining useful life prediction approach based on deep feature representation and a long short-term memory neural network is proposed in this article. First, a new criterion, named support vector data normalized correlation coefficient, is proposed to automatically divide the whole bearing life into a normal state and a fast degradation state. Second, deep features of bearing faults with good representation ability are obtained from a convolutional neural network by means of the marginal spectrum in the Hilbert–Huang transform of raw vibration signals and the health state label. Finally, to account for the temporal information of the degradation process, these features are fed into a long short-term memory neural network to construct a remaining useful life prediction model. Experiments are conducted on the bearing data sets of the IEEE PHM Challenge 2012. The results show a significant performance improvement of the proposed method in terms of predictive accuracy and numerical stability.
