FUTUREGAN: ANTICIPATING THE FUTURE FRAMES OF VIDEO SEQUENCES USING SPATIO-TEMPORAL 3D CONVOLUTIONS IN PROGRESSIVELY GROWING GANS

Author(s):  
S. Aigner ◽  
M. Körner

Abstract. We introduce a new encoder-decoder GAN model, FutureGAN, that predicts future frames of a video sequence conditioned on a sequence of past frames. During training, the networks receive only the raw pixel values as input, without relying on additional constraints or dataset-specific conditions. To capture both the spatial and temporal components of a video sequence, spatio-temporal 3D convolutions are used in all encoder and decoder modules. Further, we utilize concepts of the existing progressively growing GAN (PGGAN), which achieves high-quality results in generating high-resolution single images. The FutureGAN model extends this concept to the complex task of video prediction. We conducted experiments on three different datasets: MovingMNIST, KTH Action, and Cityscapes. Our results show that, for all three datasets, the model effectively learned representations that transform the information of an input sequence into a plausible future sequence. The main advantage of the FutureGAN framework is that it is applicable to various datasets without additional changes, while achieving stable results that are competitive with the state of the art in video prediction. The code to reproduce the results of this paper is publicly available at https://github.com/TUM-LMF/FutureGAN.
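As a minimal sketch of the spatio-temporal 3D-convolution idea described above (hedged: the channel widths, layer counts, and progressive-growing schedule of the actual FutureGAN are not reproduced here), a toy PyTorch encoder-decoder over frame sequences might look like this:

```python
# Toy spatio-temporal 3D-conv encoder-decoder; shapes and channel counts are
# illustrative assumptions, not the published FutureGAN configuration.
import torch
import torch.nn as nn

class Encoder3D(nn.Module):
    """Encodes a past frame sequence (B, C, T, H, W) into a latent volume."""
    def __init__(self, in_ch=1, base_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, base_ch, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv3d(base_ch, base_ch * 2, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.net(x)

class Decoder3D(nn.Module):
    """Decodes the latent volume into a predicted future frame sequence."""
    def __init__(self, out_ch=1, base_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(base_ch * 2, base_ch, kernel_size=3, stride=(1, 2, 2),
                               padding=1, output_padding=(0, 1, 1)),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose3d(base_ch, out_ch, kernel_size=3, stride=(1, 2, 2),
                               padding=1, output_padding=(0, 1, 1)),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

# Six past 64x64 grayscale frames in, six predicted future frames out.
past = torch.randn(2, 1, 6, 64, 64)
future = Decoder3D()(Encoder3D()(past))
print(future.shape)  # torch.Size([2, 1, 6, 64, 64])
```

In the published model, such encoder and decoder stages are additionally grown progressively from low to high resolution, mirroring the PGGAN training schedule.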

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 630
Author(s):  
Wenjia Niu ◽  
Kewen Xia ◽  
Yongke Pan

In general dynamic scenes, blurring results from the motion of multiple objects, camera shake, or scene depth variations. As an inverse process, deblurring extracts a sharp video sequence from the information contained in a single blurry image, which is itself an ill-posed computer vision problem. To reconstruct these sharp frames, traditional methods build several convolutional neural networks (CNNs) to generate the different frames, resulting in expensive computation. To address this problem, an innovative framework is proposed that generates several sharp frames from one CNN model. The motion-blurred image is fed into the framework, its spatio-temporal information is encoded via several convolutional and pooling layers, and the output of the model is several sharp frames. Moreover, a blurry image does not have a one-to-one correspondence with any particular sharp video sequence, since different video sequences can create similar blurry images, so neither the traditional pixel-to-pixel loss nor the perceptual loss is suitable for such non-aligned data. To alleviate this problem and model the blurring process, a novel contiguous blurry loss function is proposed that measures the loss on non-aligned data. Experimental results show that the proposed model, combined with the contiguous blurry loss, can generate sharp video sequences efficiently and performs better than state-of-the-art methods.
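A hedged, toy sketch of the "one CNN, several sharp output frames" idea follows; the layer sizes, frame count, and the omission of the proposed contiguous blurry loss are all simplifications relative to the described model:

```python
# Toy network mapping one blurry RGB image to n_frames sharp frames via a single
# shared encoder-decoder; sizes are illustrative assumptions only.
import torch
import torch.nn as nn

class MultiFrameDeblurNet(nn.Module):
    def __init__(self, n_frames=5):
        super().__init__()
        self.n_frames = n_frames
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3 * n_frames, 4, stride=2, padding=1),
        )

    def forward(self, blurry):
        b, _, h, w = blurry.shape
        frames = self.decoder(self.encoder(blurry))    # (B, 3*n_frames, H, W)
        return frames.view(b, self.n_frames, 3, h, w)  # (B, n_frames, 3, H, W)

blurry = torch.randn(1, 3, 128, 128)
print(MultiFrameDeblurNet()(blurry).shape)  # torch.Size([1, 5, 3, 128, 128])
```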


2013 ◽  
Vol 347-350 ◽  
pp. 3500-3504
Author(s):  
Xiao Ran Guo ◽  
Shao Hui Cui ◽  
Fang Dan

This article presents a novel approach for extracting robust local feature points from video sequences in a digital image stabilization system. A robust Harris-SIFT detector is proposed to select the most stable SIFT key points in video sequences where image motion occurs due to vehicle or platform vibration. Experimental results show that the proposed scheme is robust to various transformations of video sequences, such as translation, rotation and scaling, as well as blurring. Compared with current state-of-the-art schemes, the proposed scheme yields better performance.
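A rough sketch of such a Harris-filtered SIFT selection is shown below; the Harris parameters, the quantile threshold, and the file name are illustrative assumptions rather than the paper's settings:

```python
# Keep only SIFT keypoints that also lie on strong Harris corners, as a proxy
# for "stable" features; thresholds are assumptions, not the paper's values.
import cv2
import numpy as np

def harris_sift_keypoints(gray, harris_quantile=0.90):
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:
        return []
    harris = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    threshold = np.quantile(harris, harris_quantile)
    return [(kp, des) for kp, des in zip(keypoints, descriptors)
            if harris[int(kp.pt[1]), int(kp.pt[0])] > threshold]

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input frame
print(len(harris_sift_keypoints(gray)), "stable keypoints")
```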


Author(s):  
Ruijing Yang ◽  
Ziyu Guan ◽  
Zitong Yu ◽  
Xiaoyi Feng ◽  
Jinye Peng ◽  
...  

Automatic pain recognition is paramount for medical diagnosis and treatment. Existing works fall into three categories: assessing facial appearance changes, exploiting physiological cues, or fusing the two in a multi-modal manner. However, (1) appearance changes are easily affected by subjective factors, which impedes objective pain recognition; besides, appearance-based approaches ignore the long-range spatial-temporal dependencies that are important for modeling expressions over time; (2) physiological cues are obtained by attaching sensors to the human body, which is inconvenient and uncomfortable. In this paper, we present a novel multi-task learning framework that encodes both appearance changes and physiological cues in a non-contact manner for pain recognition. The framework captures both local and long-range dependencies via the proposed attention mechanism for the learned appearance representations, which are further enriched by temporally attended physiological cues (remote photoplethysmography, rPPG) recovered from videos in the auxiliary task. This framework, dubbed the rPPG-enriched Spatio-Temporal Attention Network (rSTAN), establishes state-of-the-art performance for non-contact pain recognition on publicly available pain databases. It demonstrates that rPPG prediction can be used as an auxiliary task to facilitate non-contact automatic pain recognition.
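A very rough sketch of the multi-task idea, a main pain-classification head trained jointly with an auxiliary rPPG-regression head on a shared representation, is given below; the backbone, the absence of the attention module, and the loss weighting are placeholders, not the rSTAN architecture:

```python
# Placeholder multi-task model: pain classification plus auxiliary rPPG regression;
# every module and hyperparameter here is an illustrative assumption.
import torch
import torch.nn as nn

class PainRppgNet(nn.Module):
    def __init__(self, feat_dim=256, n_pain_classes=2, rppg_len=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.pain_head = nn.Linear(feat_dim, n_pain_classes)  # main task
        self.rppg_head = nn.Linear(feat_dim, rppg_len)        # auxiliary task

    def forward(self, clip):
        feats = self.backbone(clip)
        return self.pain_head(feats), self.rppg_head(feats)

model = PainRppgNet()
clip = torch.randn(4, 3, 8, 32, 32)                 # toy (B, C, T, H, W) video clips
pain_logits, rppg_pred = model(clip)
loss = nn.CrossEntropyLoss()(pain_logits, torch.randint(0, 2, (4,))) \
     + 0.5 * nn.MSELoss()(rppg_pred, torch.randn(4, 64))  # weighted auxiliary loss
```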


2017 ◽  
Vol 5 (4RACSIT) ◽  
pp. 97-104
Author(s):  
Satish Kumar

This paper proposes and develops a hybrid approach for extracting key frames from video sequences captured by a stationary camera. The method first uses histogram differences to extract candidate key frames from the video sequence; a background subtraction algorithm (Mixture of Gaussians) is then used to fine-tune the final set of key frames. The developed approach shows considerable improvement over state-of-the-art techniques, as reported in this paper.
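An illustrative sketch of the described two-stage pipeline, histogram differences to propose candidate key frames followed by a Mixture-of-Gaussians background subtractor to refine them, is given below; the thresholds and the file name are assumptions, not the paper's values:

```python
# Two-stage key-frame extraction sketch: histogram change proposes candidates,
# MOG2 foreground ratio confirms them; thresholds are illustrative assumptions.
import cv2

def extract_key_frames(video_path, hist_thresh=0.3, fg_ratio_thresh=0.05):
    cap = cv2.VideoCapture(video_path)
    mog = cv2.createBackgroundSubtractorMOG2()
    key_frames, prev_hist = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        fg_mask = mog.apply(frame)
        # Stage 1: candidate if the intensity histogram changed noticeably.
        if prev_hist is None or cv2.compareHist(prev_hist, hist,
                                                cv2.HISTCMP_BHATTACHARYYA) > hist_thresh:
            # Stage 2: keep only candidates with enough moving foreground.
            if (fg_mask > 0).mean() > fg_ratio_thresh:
                key_frames.append(frame)
        prev_hist = hist
    cap.release()
    return key_frames

frames = extract_key_frames("surveillance.avi")  # hypothetical input video
print(len(frames), "key frames extracted")
```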


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Christopher Plata ◽  
Martin Nellessen ◽  
Rebecca Roth ◽  
Hannes Ecker ◽  
Bernd W. Böttiger ◽  
...  

Abstract Background Although not routinely established during cardiopulmonary resuscitation (CPR), video-assisted CPR has been described as beneficial for communication with emergency medical service (EMS) authorities in out-of-hospital cardiac arrest scenarios. Since the influence of video quality has not been investigated systematically, and because the quality of a live-stream video varies during video-assisted CPR, we investigated the influence of different video quality levels on the evaluation of CPR performance in video sequences. Methods Seven video sequences of CPR performance were recorded in high quality and artificially reduced to medium and low quality afterwards. Video sequences showed either correct CPR performance or one of six typical errors: too low or too high compression rate, superficial or increased compression depth, wrong hand position, and incomplete release. Video sequences were randomly assigned to the different quality levels. During the randomised and double-blinded evaluation process, 46 paramedics and 47 emergency physicians evaluated seven video sequences of CPR performance at different quality levels (high, medium and low resolution). Results Of 650 video sequences, CPR performance was evaluable in 98.2%. CPR performance was correctly evaluated in 71.5% at the low quality level, in 76.8% at the medium quality level, and in 77.3% at the high quality level, showing no significant differences depending on video quality (p = 0.306). In the subgroup analysis, correct classification of increased compression depth showed significant differences depending on video quality (p = 0.006). Further, there were significant differences in correct CPR classification depending on the presented error (p < 0.001). Errors that were not actually shown in the video sequence were reported in 28.3% of cases, with no significant dependence on video quality. Correct evaluation did not show significant interprofessional differences (p = 0.468). Conclusion Video quality has no significant impact on the evaluation of CPR in a video sequence. Even low video quality leads to an acceptable rate of correct evaluation of CPR performance. There is a significant difference in the evaluation of CPR performance depending on the error presented in a video sequence. Trial registration German Clinical Trial Register (registration number DRKS00015297), registered on 2018-08-21.


2020 ◽  
Vol 34 (07) ◽  
pp. 13098-13105 ◽  
Author(s):  
Linchao Zhu ◽  
Du Tran ◽  
Laura Sevilla-Lara ◽  
Yi Yang ◽  
Matt Feiszli ◽  
...  

Typical video classification methods often divide a video into short clips, do inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independently ignores the temporal structure of the video sequence, and increases the computational cost at inference time. In this paper, we propose a novel framework named FASTER, i.e., Feature Aggregation for Spatio-TEmporal Redundancy. FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities. The FASTER framework can integrate high quality representations from expensive models to capture subtle motion information and lightweight representations from cheap models to cover scene changes in the video. A new recurrent network (i.e., FAST-GRU) is designed to aggregate the mixture of different representations. Compared with existing approaches, FASTER can reduce the FLOPs by over 10× while maintaining the state-of-the-art accuracy across popular datasets, such as Kinetics, UCF-101 and HMDB-51.
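A toy sketch of the aggregation idea, mixing clip features from an expensive model and a cheap model and fusing them with a recurrent aggregator, is shown below; a plain GRU stands in for FAST-GRU, and the feature dimension and class count are illustrative assumptions:

```python
# Aggregate per-clip features from models of different cost with a recurrent network;
# a vanilla GRU is used here in place of FAST-GRU, purely for illustration.
import torch
import torch.nn as nn

class ClipAggregator(nn.Module):
    def __init__(self, feat_dim=512, hidden=512, n_classes=400):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, clip_feats):
        # clip_feats: (batch, n_clips, feat_dim), mixing expensive and cheap features.
        _, h_n = self.gru(clip_feats)
        return self.classifier(h_n[-1])

expensive = torch.randn(2, 2, 512)   # features from the costly model on a few clips
cheap = torch.randn(2, 6, 512)       # features from the lightweight model on the rest
video_feats = torch.cat([expensive, cheap], dim=1)
print(ClipAggregator()(video_feats).shape)  # torch.Size([2, 400])
```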


Author(s):  
Kalpesh R. Jadav ◽  
Arvind R. Yadav

Shadows lead to failures in moving-target positioning, segmentation, tracking, and classification in video surveillance systems; shadow detection and removal are therefore essential for further computer vision processing. Existing state-of-the-art methods for dynamic shadow detection produce a high discrimination rate but a poor detection rate (foreground pixels are classified as shadow pixels). This paper proposes an effective method for dynamic shadow detection and removal based on an intensity ratio combined with frame differencing, gamma correction, and morphology operations. The performance of the proposed method has been tested on two outdoor ATON datasets, namely highway-I and highway-III, for vehicle tracking systems. The proposed method produced a discrimination rate of 89.07% and a detection rate of 80.79% for the highway-I video sequences. Similarly, for the highway-III video sequence, a discrimination rate of 85.60% and a detection rate of 84.05% were obtained. Experimental outcomes show that the proposed method is simple, stable, and robust for dynamic shadow detection on the datasets used in this work.
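A rough sketch of the described pipeline, frame differencing for motion, an intensity-ratio test for likely shadow pixels, gamma correction, and morphological clean-up, follows; all thresholds, the gamma value, and the file names are assumptions rather than the paper's settings:

```python
# Shadow-mask sketch: gamma-correct, find moving pixels by frame difference, flag pixels
# whose intensity ratio to the background suggests a cast shadow, then clean up with
# morphology; every constant below is an illustrative assumption.
import cv2
import numpy as np

def detect_shadow_mask(frame, background, gamma=1.5, ratio_low=0.4, ratio_high=0.9):
    lut = np.array([((i / 255.0) ** (1.0 / gamma)) * 255 for i in range(256)],
                   dtype=np.uint8)
    gray_f = cv2.cvtColor(cv2.LUT(frame, lut), cv2.COLOR_BGR2GRAY).astype(np.float32)
    gray_b = cv2.cvtColor(cv2.LUT(background, lut), cv2.COLOR_BGR2GRAY).astype(np.float32)
    moving = cv2.absdiff(gray_f, gray_b) > 25   # frame difference: moving pixels
    ratio = gray_f / (gray_b + 1e-6)            # shadows darken the background
    shadow = moving & (ratio > ratio_low) & (ratio < ratio_high)
    mask = (shadow * 255).astype(np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle noise

frame = cv2.imread("highway_frame.png")            # hypothetical current frame
background = cv2.imread("highway_background.png")  # hypothetical background model
cv2.imwrite("shadow_mask.png", detect_shadow_mask(frame, background))
```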


2020 ◽  
Vol 2020 (4) ◽  
pp. 116-1-116-7
Author(s):  
Raphael Antonius Frick ◽  
Sascha Zmudzinski ◽  
Martin Steinebach

In recent years, the number of forged videos circulating on the Internet has increased immensely. Software and services to create such forgeries have become more and more accessible to the public. In this regard, the risk of malicious use of forged videos has risen. This work proposes an approach based on the Ghost effect known from image forensics for detecting forgeries in videos that replace faces in video sequences or alter facial expressions. The experimental results show that the proposed approach is able to identify forgery in high-quality encoded video content.
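A small sketch of the JPEG "ghost" idea from image forensics, re-encoding a frame at several JPEG qualities and looking for regions whose difference to the original dips at a quality unlike the rest of the frame (as a swapped-in face might), is given below; the quality range, window size, and file name are assumptions, and the paper's full detection pipeline is not reproduced:

```python
# JPEG-ghost style difference maps: a region previously compressed at quality q tends to
# show a local minimum in the difference map for that q; constants here are assumptions.
import cv2
import numpy as np

def jpeg_ghost_maps(frame, qualities=range(50, 100, 5)):
    maps = {}
    for q in qualities:
        ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, q])
        recompressed = cv2.imdecode(buf, cv2.IMREAD_COLOR)
        diff = (frame.astype(np.float32) - recompressed.astype(np.float32)) ** 2
        # Average the squared difference over colour channels and a local window.
        maps[q] = cv2.blur(diff.mean(axis=2), (16, 16))
    return maps

frame = cv2.imread("suspect_frame.png")  # hypothetical frame extracted from a video
ghosts = jpeg_ghost_maps(frame)
print({q: float(m.mean()) for q, m in ghosts.items()})
```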


2018 ◽  
Vol 14 (12) ◽  
pp. 1915-1960 ◽  
Author(s):  
Rudolf Brázdil ◽  
Andrea Kiss ◽  
Jürg Luterbacher ◽  
David J. Nash ◽  
Ladislava Řezníčková

Abstract. The use of documentary evidence to investigate past climatic trends and events has become a recognised approach in recent decades. This contribution presents the state of the art in its application to droughts. The range of documentary evidence is very wide, including general annals, chronicles, memoirs and diaries kept by missionaries, travellers and those specifically interested in the weather; records kept by administrators tasked with keeping accounts and other financial and economic records; legal-administrative evidence; religious sources; letters; songs; newspapers and journals; pictographic evidence; chronograms; epigraphic evidence; early instrumental observations; society commentaries; and compilations and books. These are available from many parts of the world. This variety of documentary information is evaluated with respect to the reconstruction of hydroclimatic conditions (precipitation, drought frequency and drought indices). Documentary-based drought reconstructions are then addressed in terms of long-term spatio-temporal fluctuations, major drought events, relationships with external forcing and large-scale climate drivers, socio-economic impacts and human responses. Documentary-based drought series are also considered from the viewpoint of spatio-temporal variability for certain continents, and their employment together with hydroclimate reconstructions from other proxies (in particular tree rings) is discussed. Finally, conclusions are drawn, and challenges for the future use of documentary evidence in the study of droughts are presented.

