Gaze Tracking Based on Concatenating Spatial-Temporal Features

Based on experimental observations, there is a correlation between time and consecutive gaze positions in visual behaviors. Previous studies on gaze point estimation usually use images as the input for model trainings without taking into account the sequence relationship between image data. In addition to the spatial features, the temporal features are considered to improve the accuracy in this paper by using videos instead of images as the input data. To be able to capture spatial and temporal features at the same time, the convolutional neural network (CNN) and long short-term memory (LSTM) network are introduced to build a training model. In this way, CNN is used to extract the spatial features, and LSTM correlates temporal features. This paper presents a CNN Concatenating LSTM network (CCLN) that concatenates spatial and temporal features to improve the performance of gaze estimation in the case of time-series videos as the input training data. In addition, the proposed model can be optimized by exploring the numbers of LSTM layers, the influence of batch normalization (BN) and global average pooling layer (GAP) on CCLN. It is generally believed that larger amounts of training data will lead to better models. To provide data for training and prediction, we propose a method for constructing datasets of video for gaze point estimation. The issues are studied, including the effectiveness of different commonly used general models and the impact of transfer learning. Through exhaustive evaluation, it has been proved that the proposed method achieves a better prediction accuracy than the existing CNN-based methods. Finally, 93.1% of the best model and 92.6% of the general model MobileNet are obtained.

Download Full-text

Composite load modeling by spatial-temporal deep attention network based on wide-area monitoring systems

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210296 ◽

2021 ◽

pp. 1-12

Author(s):

Omid Izadi Ghafarokhi ◽

Mazda Moattari ◽

Ahmad Forouzantabar

Keyword(s):

Neural Network ◽

Short Term Memory ◽

Wide Area ◽

Load Modeling ◽

Wide Area Monitoring ◽

Training Performance ◽

Spatial Features ◽

Temporal Features ◽

Frequency Domains ◽

Area Monitoring

With the development of the wide-area monitoring system (WAMS), power system operators are capable of providing an accurate and fast estimation of time-varying load parameters. This study proposes a spatial-temporal deep network-based new attention concept to capture the dynamic and static patterns of electrical load consumption through modeling complicated and non-stationary interdependencies between time sequences. The designed deep attention-based network benefits from long short-term memory (LSTM) based component to learning temporal features in time and frequency-domains as encoder-decoder based recurrent neural network. Furthermore, to inherently learn spatial features, a convolutional neural network (CNN) based attention mechanism is developed. Besides, this paper develops a loss function based on a pseudo-Huber concept to enhance the robustness of the proposed network in noisy conditions as well as improve the training performance. The simulation results on IEEE 68-bus demonstrates the effectiveness and superiority of the proposed network through comparison with several previously presented and state-of-the-art methods.

Download Full-text

Deep learning approaches for improving prediction of daily stream temperature in data-scarce, unmonitored, and dammed basins

10.22541/au.162184348.87839543/v1 ◽

2021 ◽

Author(s):

Farshid Rahmani ◽

Chaopeng Shen ◽

Samantha Oliver ◽

Kathryn Lawson ◽

Alison Appling

Keyword(s):

Short Term Memory ◽

Network Models ◽

Stream Temperature ◽

Data Availability ◽

Training Data ◽

Training Dataset ◽

Learning Approaches ◽

Input Selection ◽

Daily Sampling ◽

Lstm Network

Basin-centric long short-term memory (LSTM) network models have recently been shown to be an exceptionally powerful tool for simulating stream temperature (Ts, temperature measured in rivers), among other hydrological variables. However, spatial extrapolation is a well-known challenge to modeling Ts and it is uncertain how an LSTM-based daily Ts model will perform in unmonitored or dammed basins. Here we compiled a new benchmark dataset consisting of >400 basins for across the contiguous United States in different data availability groups (DAG, meaning the daily sampling frequency) with or without major dams and study how to assemble suitable training datasets for predictions in monitored or unmonitored situations. For temporal generalization, CONUS-median best root-mean-square error (RMSE) values for sites with extensive (99%), intermediate (60%), scarce (10%) and absent (0%, unmonitored) data for training were 0.75, 0.83, 0.88, and 1.59°C, representing the state of the art. For prediction in unmonitored basins (PUB), LSTM’s results surpassed those reported in the literature. Even for unmonitored basins with major reservoirs, we obtained a median RMSE of 1.492°C and an R2 of 0.966. The most suitable training set was the matching DAG that the basin could be grouped into, e.g., the 60% DAG for a basin with 61% data availability. However, for PUB, a training dataset including all basins with data is preferred. An input-selection ensemble moderately mitigated attribute overfitting. Our results suggest there are influential latent processes not sufficiently described by the inputs (e.g., geology, wetland covers), but temporal fluctuations are well predictable, and LSTM appears to be the more accurate Ts modeling tool when sufficient training data are available.

Download Full-text

Modeling Anticipation and Relaxation of Lane Changing Behavior Using Deep Learning

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211028624 ◽

2021 ◽

pp. 036119812110286

Author(s):

Kequan Chen ◽

Pan Liu ◽

Zhibin Li ◽

Yuxuan Wang ◽

Yunxue Lu

Keyword(s):

Short Term Memory ◽

Trajectory Data ◽

Lane Changing ◽

Factors Affecting ◽

Video Footage ◽

Input Variables ◽

Vehicle Trajectory ◽

Lstm Network ◽

Two Stages ◽

The Impact

Modeling lane changing driving behavior has attracted significant attention recently. Most of the existing models are homogeneous and do not recognize the anticipation and relaxation phenomena occurring during the maneuver. To fill this gap, we adopted long short-term memory (LSTM) network and used large quantities of trajectory data extracted from video footage collected by an unmanned automated vehicle in Nanjing, China. Then, we divided complete lane changing behavior into two stages, that is, anticipation and relaxation. Description analysis of lane changing behavior revealed that the factors affecting the two stages are significantly different. In this context, two LSTM models with different input variables were proposed to predict the anticipation and the relaxation during the lane changing activity, respectively. The vehicle trajectory data were further divided into an anticipation dataset and a relaxation dataset to train the two LSTM models. Then we applied numerical tests to compare our models with two baseline models using real trajectory data of lane changing behavior. The results suggest that our models achieved the best performance for trajectory prediction in both lateral and longitudinal positions. Moreover, the simulation results show that the proposed models can precisely replicate the impact of the anticipation phenomenon on the target lane, and the relationship between the speed and spacing of the lane changing vehicle during the relaxation process can be reproduced with reasonable accuracy.

Download Full-text

Noise-Robust Voice Conversion Using High-Quefrency Boosting via Sub-Band Cepstrum Conversion and Fusion

Applied Sciences ◽

10.3390/app10010151 ◽

2019 ◽

Vol 10 (1) ◽

pp. 151

Author(s):

Xiaokong Miao ◽

Meng Sun ◽

Xiongwei Zhang ◽

Yimin Wang

Keyword(s):

Short Term Memory ◽

Training Data ◽

Voice Conversion ◽

Noisy Input ◽

Mean And Variance ◽

Natural Target ◽

Noise Robust ◽

The Impact ◽

Target Speaker

This paper presents a noise-robust voice conversion method with high-quefrency boosting via sub-band cepstrum conversion and fusion based on the bidirectional long short-term memory (BLSTM) neural networks that can convert parameters of vocal tracks of a source speaker into those of a target speaker. With the implementation of state-of-the-art machine learning methods, voice conversion has achieved good performance given abundant clean training data. However, the quality and similarity of the converted voice are significantly degraded compared to that of a natural target voice due to various factors, such as limited training data and noisy input speech from the source speaker. To address the problem of noisy input speech, an architecture of voice conversion with statistical filtering and sub-band cepstrum conversion and fusion is introduced. The impact of noises on the converted voice is reduced by the accurate reconstruction of the sub-band cepstrum and the subsequent statistical filtering. By normalizing the mean and variance of the converted cepstrum to those of the target cepstrum in the training phase, a cepstrum filter was constructed to further improve the quality of the converted voice. The experimental results showed that the proposed method significantly improved the naturalness and similarity of the converted voice compared to the baselines, even with the noisy inputs of source speakers.

Download Full-text

Deepfake video detection: YOLO-Face convolution recurrent approach

PeerJ Computer Science ◽

10.7717/peerj-cs.730 ◽

2021 ◽

Vol 7 ◽

pp. e730

Author(s):

Aya Ismail ◽

Marwa Elpeltagy ◽

Mervat Zaki ◽

Kamal A. ElDahshan

Keyword(s):

Large Scale ◽

Operating Characteristic ◽

Short Term Memory ◽

Negative Impact ◽

Characteristic Curve ◽

Spatial Features ◽

Temporal Features ◽

Large Scale Dataset ◽

Long Short Term Memory ◽

Face Detector

Recently, the deepfake techniques for swapping faces have been spreading, allowing easy creation of hyper-realistic fake videos. Detecting the authenticity of a video has become increasingly critical because of the potential negative impact on the world. Here, a new project is introduced; You Only Look Once Convolution Recurrent Neural Networks (YOLO-CRNNs), to detect deepfake videos. The YOLO-Face detector detects face regions from each frame in the video, whereas a fine-tuned EfficientNet-B5 is used to extract the spatial features of these faces. These features are fed as a batch of input sequences into a Bidirectional Long Short-Term Memory (Bi-LSTM), to extract the temporal features. The new scheme is then evaluated on a new large-scale dataset; CelebDF-FaceForencics++ (c23), based on a combination of two popular datasets; FaceForencies++ (c23) and Celeb-DF. It achieves an Area Under the Receiver Operating Characteristic Curve (AUROC) 89.35% score, 89.38% accuracy, 83.15% recall, 85.55% precision, and 84.33% F1-measure for pasting data approach. The experimental analysis approves the superiority of the proposed method compared to the state-of-the-art methods.

Download Full-text

An Approach of Linear Regression-Based UAV GPS Spoofing Detection

Wireless Communications and Mobile Computing ◽

10.1155/2021/5517500 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Lianxiao Meng ◽

Lin Yang ◽

Shuangyin Ren ◽

Gaigai Tang ◽

Long Zhang ◽

...

Keyword(s):

Linear Regression ◽

Short Term Memory ◽

Stackelberg Game ◽

Response Strategy ◽

Security Threat ◽

Detection Mechanism ◽

Detection Scheme ◽

Aerial Vehicle ◽

Lstm Network ◽

The Impact

A prominent security threat to unmanned aerial vehicle (UAV) is to capture it by GPS spoofing, in which the attacker manipulates the GPS signal of the UAV to capture it. This paper introduces an anti-spoofing model to mitigate the impact of GPS spoofing attack on UAV mission security. In this model, linear regression (LR) is used to predict and model the optimal route of UAV to its destination. On this basis, a countermeasure mechanism is proposed to reduce the impact of GPS spoofing attack. Confrontation is based on the progressive detection mechanism of the model. In order to better ensure the flight security of UAV, the model provides more than one detection scheme for spoofing signal to improve the sensitivity of UAV to deception signal detection. For better proving the proposed LR anti-spoofing model, a dynamic Stackelberg game is formulated to simulate the interaction between GPS spoofer and UAV. In particular, for GPS spoofer, it is worth mentioning that for the scenario that the UAV is cheated by GPS spoofing signal in the mission environment of the designated route is simulated in the experiment. In particular, UAV with the LR anti-spoofing model, as the leader in this game, dynamically adjusts its response strategy according to the deception’s attack strategy when upon detection of GPS spoofer’s attack. The simulation results show that the method can effectively enhance the ability of UAV to resist GPS spoofing without increasing the hardware cost of the UAV and is easy to implement. Furthermore, we also try to use long short-term memory (LSTM) network in the trajectory prediction module of the model. The experimental results show that the LR anti-spoofing model proposed is far better than that of LSTM in terms of prediction accuracy.

Download Full-text

An Efficient Anomaly Recognition Framework Using an Attention Residual LSTM in Surveillance Videos

Sensors ◽

10.3390/s21082811 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2811

Author(s):

Waseem Ullah ◽

Amin Ullah ◽

Tanveer Hussain ◽

Zulfiqar Ahmad Khan ◽

Sung Wook Baik

Keyword(s):

Time Complexity ◽

Short Term Memory ◽

Smart Cities ◽

Vital Role ◽

Training Data ◽

Surveillance Videos ◽

Video Frames ◽

Proposed Model ◽

Lstm Network ◽

Increase In Accuracy

Video anomaly recognition in smart cities is an important computer vision task that plays a vital role in smart surveillance and public safety but is challenging due to its diverse, complex, and infrequent occurrence in real-time surveillance environments. Various deep learning models use significant amounts of training data without generalization abilities and with huge time complexity. To overcome these problems, in the current work, we present an efficient light-weight convolutional neural network (CNN)-based anomaly recognition framework that is functional in a surveillance environment with reduced time complexity. We extract spatial CNN features from a series of video frames and feed them to the proposed residual attention-based long short-term memory (LSTM) network, which can precisely recognize anomalous activity in surveillance videos. The representative CNN features with the residual blocks concept in LSTM for sequence learning prove to be effective for anomaly detection and recognition, validating our model’s effective usage in smart cities video surveillance. Extensive experiments on the real-world benchmark UCF-Crime dataset validate the effectiveness of the proposed model within complex surveillance environments and demonstrate that our proposed model outperforms state-of-the-art models with a 1.77%, 0.76%, and 8.62% increase in accuracy on the UCF-Crime, UMN and Avenue datasets, respectively.

Download Full-text

Robust optical autofocus system utilizing neural networks trained for extended range and time-course and automated multiwell plate imaging including single molecule localization microscopy

10.1101/2021.03.05.431171 ◽

2021 ◽

Author(s):

Jonathan Lightley ◽

Frederik Görlitz ◽

Sunil Kumar ◽

Ranjan Kalita ◽

Arinbjorn Kolbeinsson ◽

...

Keyword(s):

Data Acquisition ◽

Single Molecule ◽

Time Course ◽

Numerical Aperture ◽

Image Data ◽

Training Data ◽

Depth Of Field ◽

Objective Lens ◽

Localisation Microscopy ◽

The Impact

We present a robust, long-range optical autofocus system for microscopy utilizing machine learning. This can be useful for experiments with long image data acquisition times that may be impacted by defocusing resulting from drift of components, e.g. due to changes in temperature or mechanical drift. It is also useful for automated slide scanning or multiwell plate imaging where the sample(s) to be imaged may not be in the same horizontal plane throughout the image data acquisition. To address the impact of (thermal or mechanical) fluctuations over time in the optical autofocus system itself, we utilise a convolutional neural network (CNN) that is trained over multiple days to account for such fluctuations. To address the trade-off between axial precision and range of the autofocus, we implement orthogonal optical readouts with separate CNN training data, thereby achieving an accuracy well within the 600 nm depth of field of our 1.3 numerical aperture objective lens over a defocus range of up to approximately +/-100 μm. We characterise the performance of this autofocus system and demonstrate its application to automated multiwell plate single molecule localisation microscopy.

Download Full-text

An Auditory Saliency Pooling-Based LSTM Model for Speech Intelligibility Classification

Symmetry ◽

10.3390/sym13091728 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1728

Author(s):

Ascensión Gallardo-Antolín ◽

Juan M. Montero

Keyword(s):

Speech Intelligibility ◽

Short Term Memory ◽

Oral Communication ◽

External Source ◽

Speech Disorders ◽

Training Data ◽

Support Vector ◽

Learning Networks ◽

Starting Point ◽

Lstm Network

Speech intelligibility is a crucial element in oral communication that can be influenced by multiple elements, such as noise, channel characteristics, or speech disorders. In this paper, we address the task of speech intelligibility classification (SIC) in this last circumstance. Taking our previous works, a SIC system based on an attentional long short-term memory (LSTM) network, as a starting point, we deal with the problem of the inadequate learning of the attention weights due to training data scarcity. For overcoming this issue, the main contribution of this paper is a novel type of weighted pooling (WP) mechanism, called saliency pooling where the WP weights are not automatically learned during the training process of the network, but are obtained from an external source of information, the Kalinli’s auditory saliency model. In this way, it is intended to take advantage of the apparent symmetry between the human auditory attention mechanism and the attentional models integrated into deep learning networks. The developed systems are assessed on the UA-speech dataset that comprises speech uttered by subjects with several dysarthria levels. Results show that all the systems with saliency pooling significantly outperform a reference support vector machine (SVM)-based system and LSTM-based systems with mean pooling and attention pooling, suggesting that Kalinli’s saliency can be successfully incorporated into the LSTM architecture as an external cue for the estimation of the speech intelligibility level.

Download Full-text

Multidimensional CNN-LSTM Network for Automatic Modulation Classification

Electronics ◽

10.3390/electronics10141649 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1649

Author(s):

Na Wang ◽

Yunxia Liu ◽

Liang Ma ◽

Yang Yang ◽

Hongjun Wang

Keyword(s):

Short Term Memory ◽

Modulation Classification ◽

Short Term ◽

One Dimensional ◽

Softmax Classifier ◽

Automatic Modulation Classification ◽

Temporal Features ◽

Long Short Term Memory ◽

Lstm Network ◽

Public Datasets

Automatic modulation classification (AMC) is the premise for signal detection and demodulation applications, especially in non-cooperative communication scenarios. It has been a popular topic for decades and has gained significant progress with the development of deep learning methods. To further improve classification accuracy, a hierarchical multifeature fusion (HMF) based on a multidimensional convolutional neural network (CNN)-long short-term memory (LSTM) network is proposed in this paper. First, a multidimensional CNN module (MD-CNN) is proposed for feature compensation between interactive features extracted by two-dimensional convolutional filters and respective features extracted by one-dimensional filters. Second, learnt features of the MD-CNN module are fed into an LSTM layer for further exploitation of temporal features. Finally, classification results are obtained by the Softmax classifier. The effectiveness of the proposed method is verified by abundant experimental results on two public datasets, RadioML.2016.10a and RadioML.2016.10b. Satisfying results are obtained as compared with state-of-the-art methods.

Download Full-text