spatiotemporal feature Latest Research Papers

Efficient video quality assessment with deeper spatiotemporal feature extraction and integration

Journal of Electronic Imaging ◽

10.1117/1.jei.30.6.063034 ◽

2021 ◽

Vol 30 (06) ◽

Author(s):

Yinhao Liu ◽

Xiaofei Zhou ◽

Haibing Yin ◽

Hongkui Wang ◽

Chenggang Yan

Keyword(s):

Feature Extraction ◽

Quality Assessment ◽

Video Quality ◽

Video Quality Assessment ◽

Spatiotemporal Feature ◽

Efficient Video

Encrypted Traffic Identification Method Based on Multi-scale Spatiotemporal Feature Fusion Model with Attention Mechanism

10.1007/978-981-16-6554-7_92 ◽

2021 ◽

pp. 857-866

Author(s):

Yonghua Huo ◽

Hongwu Ge ◽

Libin Jiao ◽

Bowen Gao ◽

Yang Yang

Keyword(s):

Feature Fusion ◽

Attention Mechanism ◽

Fusion Model ◽

Traffic Identification ◽

Multi Scale ◽

Identification Method ◽

Spatiotemporal Feature ◽

Encrypted Traffic

A non‐intrusive load state identification method considering non‐local spatiotemporal feature

IET Generation Transmission & Distribution ◽

10.1049/gtd2.12330 ◽

2021 ◽

Author(s):

Zhenyu Zhang ◽

Yong Li ◽

Jing Duan ◽

Yilong Duan ◽

Yixiu Guo ◽

...

Keyword(s):

Identification Method ◽

State Identification ◽

Spatiotemporal Feature ◽

Non Local ◽

Load State

Semantic Extraction of Basketball Game Video Combining Domain Knowledge and In-Depth Features

Scientific Programming ◽

10.1155/2021/9080120 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Yufeng Du ◽

Quan Zhao ◽

Xiaochun Lu

Keyword(s):

Domain Knowledge ◽

Team Sports ◽

Time Range ◽

Target Movement ◽

Stream Network ◽

Video Footage ◽

Basketball Game ◽

Spatiotemporal Feature ◽

High Level ◽

Semantic Event

The team sports game video features complex background, fast target movement, and mutual occlusion between targets, which poses great challenges to multiperson collaborative video analysis. This paper proposes a video semantic extraction method that integrates domain knowledge and in-depth features, which can be applied to the analysis of a multiperson collaborative basketball game video, where the semantic event is modeled as an adversarial relationship between two teams of players. We first designed a scheme that combines a dual-stream network and learnable spatiotemporal feature aggregation, which can be used for end-to-end training of video semantic extraction to bridge the gap between low-level features and high-level semantic events. Then, an algorithm based on the knowledge from different video sources is proposed to extract the action semantics. The algorithm gathers local convolutional features in the entire space-time range, which can be used to track the ball/shooter/hoop to realize automatic semantic extraction of basketball game videos. Experiments show that the scheme proposed in this paper can effectively identify the four categories of short, medium, long, free throw, and scoring events and the semantics of athletes’ actions based on the video footage of the basketball game.

Channel-Wise Spatiotemporal Aggregation Technology for Face Video Forensics

Security and Communication Networks ◽

10.1155/2021/5524930 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Yujiang Lu ◽

Yaju Liu ◽

Jianwei Fei ◽

Zhihua Xia

Keyword(s):

Feature Fusion ◽

Generative Models ◽

Personal Privacy ◽

Frame Sequence ◽

Face Region ◽

Video Frames ◽

Spatiotemporal Feature ◽

The Face ◽

Spatiotemporal Aggregation ◽

The Difference

Recent progress in deep learning, in particular the generative models, makes it easier to synthesize sophisticated forged faces in videos, leading to severe threats on social media about personal privacy and reputation. It is therefore highly necessary to develop forensics approaches to distinguish those forged videos from the authentic. Existing works are absorbed in exploring frame-level cues but insufficient in leveraging affluent temporal information. Although some approaches identify forgeries from the perspective of motion inconsistency, there is so far not a promising spatiotemporal feature fusion strategy. Towards this end, we propose the Channel-Wise Spatiotemporal Aggregation (CWSA) module to fuse deep features of continuous video frames without any recurrent units. Our approach starts by cropping the face region with some background remained, which transforms the learning objective from manipulations to the difference between pristine and manipulated pixels. A deep convolutional neural network (CNN) with skip connections that are conducive to the preservation of detection-helpful low-level features is then utilized to extract frame-level features. The CWSA module finally makes the real or fake decision by aggregating deep features of the frame sequence. Evaluation against a list of large facial video manipulation benchmarks has illustrated its effectiveness. On all three datasets, FaceForensics++, Celeb-DF, and DeepFake Detection Challenge Preview, the proposed approach outperforms the state-of-the-art methods with significant advantages.

Determining Wave Height from Nearshore Videos Based on Multi-level Spatiotemporal Feature Fusion

10.1109/ijcnn52387.2021.9534048 ◽

2021 ◽

Author(s):

Wei Song ◽

Qi-chao Li ◽

Qi He ◽

Xu Zhou ◽

Yuan-yuan Chen

Keyword(s):

Wave Height ◽

Feature Fusion ◽

Spatiotemporal Feature ◽

Multi Level

Identify autism spectrum disorder via dynamic filter and deep spatiotemporal feature extraction

Signal Processing Image Communication ◽

10.1016/j.image.2021.116195 ◽

2021 ◽

Vol 94 ◽

pp. 116195

Author(s):

Weijie Wei ◽

Zhi Liu ◽

Lijin Huang ◽

Ziqiang Wang ◽

Weiyu Chen ◽

...

Keyword(s):

Autism Spectrum Disorder ◽

Feature Extraction ◽

Autism Spectrum ◽

Spectrum Disorder ◽

Spatiotemporal Feature ◽

Dynamic Filter

Hi-EADN: Hierarchical Excitation Aggregation and Disentanglement Frameworks for Action Recognition Based on Videos

Symmetry ◽

10.3390/sym13040662 ◽

2021 ◽

Vol 13 (4) ◽

pp. 662

Author(s):

Zeyuan Hu ◽

Eung-Joo Lee

Keyword(s):

Action Recognition ◽

Short Range ◽

Semantic Information ◽

Long Distance ◽

Spatiotemporal Feature ◽

Benchmark Datasets ◽

Feature Information ◽

Multiple Frame ◽

High Level ◽

Information Streams

Most existing video action recognition methods mainly rely on high-level semantic information from convolutional neural networks (CNNs) but ignore the discrepancies of different information streams. However, it does not normally consider both long-distance aggregations and short-range motions. Thus, to solve these problems, we propose hierarchical excitation aggregation and disentanglement networks (Hi-EADNs), which include multiple frame excitation aggregation (MFEA) and a feature squeeze-and-excitation hierarchical disentanglement (SEHD) module. MFEA specifically uses long-short range motion modelling and calculates the feature-level temporal difference. The SEHD module utilizes these differences to optimize the weights of each spatiotemporal feature and excite motion-sensitive channels. Moreover, without introducing additional parameters, this feature information is processed with a series of squeezes and excitations, and multiple temporal aggregations with neighbourhoods can enhance the interaction of different motion frames. Extensive experimental results confirm our proposed Hi-EADN method effectiveness on the UCF101 and HMDB51 benchmark datasets, where the top-5 accuracy is 93.5% and 76.96%.

Accumulating confidence for deep neural network object detections and semantic segmentations in sequential UAS imagery through spatiotemporal feature correspondences generated from SfM techniques

Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III ◽

10.1117/12.2585905 ◽

2021 ◽

Author(s):

Trevor M. Bajkowski ◽

James Alex Hurt ◽

David Huangal ◽

Jeffrey Dale ◽

James Keller ◽

...

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Spatiotemporal Feature ◽

Feature Correspondences ◽

Network Object

UAV-Assisted Wide Area Multi-Camera Space Alignment Based on Spatiotemporal Feature Map

Remote Sensing ◽

10.3390/rs13061117 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1117

Author(s):

Jing Li ◽

Yuguang Xie ◽

Congcong Li ◽

Yanran Dai ◽

Jiaxin Ma ◽

...

Keyword(s):

Large Scale ◽

Superior Performance ◽

Wide Area ◽

Feature Map ◽

Map Construction ◽

Wide Range ◽

Spatiotemporal Feature ◽

Ground Point ◽

Matching Strategy ◽

Ground Monitoring

In this paper, we investigate the problem of aligning multiple deployed camera into one united coordinate system for cross-camera information sharing and intercommunication. However, the difficulty is greatly increased when faced with large-scale scene under chaotic camera deployment. To address this problem, we propose a UAV-assisted wide area multi-camera space alignment approach based on spatiotemporal feature map. It employs the great global perception of Unmanned Aerial Vehicles (UAVs) to meet the challenge from wide-range environment. Concretely, we first present a novel spatiotemporal feature map construction approach to represent the input aerial and ground monitoring data. In this way, the motion consistency across view is well mined to overcome the great perspective gap between the UAV and ground cameras. To obtain the corresponding relationship between their pixels, we propose a cross-view spatiotemporal matching strategy. Through solving relative relationship with the above air-to-ground point correspondences, all ground cameras can be aligned into one surveillance space. The proposed approach was evaluated in both simulation and real environments qualitatively and quantitatively. Extensive experimental results demonstrate that our system can successfully align all ground cameras with very small pixel error. Additionally, the comparisons with other works on different test situations also verify its superior performance.

spatiotemporal feature
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Efficient video quality assessment with deeper spatiotemporal feature extraction and integration

Encrypted Traffic Identification Method Based on Multi-scale Spatiotemporal Feature Fusion Model with Attention Mechanism

A non‐intrusive load state identification method considering non‐local spatiotemporal feature

Semantic Extraction of Basketball Game Video Combining Domain Knowledge and In-Depth Features

Channel-Wise Spatiotemporal Aggregation Technology for Face Video Forensics

Determining Wave Height from Nearshore Videos Based on Multi-level Spatiotemporal Feature Fusion

Identify autism spectrum disorder via dynamic filter and deep spatiotemporal feature extraction

Hi-EADN: Hierarchical Excitation Aggregation and Disentanglement Frameworks for Action Recognition Based on Videos

Accumulating confidence for deep neural network object detections and semantic segmentations in sequential UAS imagery through spatiotemporal feature correspondences generated from SfM techniques

UAV-Assisted Wide Area Multi-Camera Space Alignment Based on Spatiotemporal Feature Map

Export Citation Format

spatiotemporal featureRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Efficient video quality assessment with deeper spatiotemporal feature extraction and integration

Encrypted Traffic Identification Method Based on Multi-scale Spatiotemporal Feature Fusion Model with Attention Mechanism

A non‐intrusive load state identification method considering non‐local spatiotemporal feature

Semantic Extraction of Basketball Game Video Combining Domain Knowledge and In-Depth Features

Channel-Wise Spatiotemporal Aggregation Technology for Face Video Forensics

Determining Wave Height from Nearshore Videos Based on Multi-level Spatiotemporal Feature Fusion

Identify autism spectrum disorder via dynamic filter and deep spatiotemporal feature extraction

Hi-EADN: Hierarchical Excitation Aggregation and Disentanglement Frameworks for Action Recognition Based on Videos

Accumulating confidence for deep neural network object detections and semantic segmentations in sequential UAS imagery through spatiotemporal feature correspondences generated from SfM techniques

UAV-Assisted Wide Area Multi-Camera Space Alignment Based on Spatiotemporal Feature Map

spatiotemporal feature
Recently Published Documents