FUTUREGAN: ANTICIPATING THE FUTURE FRAMES OF VIDEO SEQUENCES USING SPATIO-TEMPORAL 3D CONVOLUTIONS IN PROGRESSIVELY GROWING GANS

Author(s):  
S. Aigner ◽  
M. Körner

Abstract. We introduce a new encoder-decoder GAN model, FutureGAN, that predicts future frames of a video sequence conditioned on a sequence of past frames. During training, the networks receive only the raw pixel values as input, without relying on additional constraints or dataset-specific conditions. To capture both the spatial and temporal components of a video sequence, spatio-temporal 3D convolutions are used in all encoder and decoder modules. Further, we utilize concepts of the existing progressively growing GAN (PGGAN), which achieves high-quality results in generating high-resolution single images. The FutureGAN model extends this concept to the complex task of video prediction. We conducted experiments on three different datasets: MovingMNIST, KTH Action, and Cityscapes. Our results show that, for all three datasets, the model effectively learned representations that transform the information of an input sequence into a plausible future sequence. The main advantage of the FutureGAN framework is that it is applicable to various datasets without additional changes, while achieving stable results that are competitive with the state of the art in video prediction. The code to reproduce the results of this paper is publicly available at https://github.com/TUM-LMF/FutureGAN.
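As a minimal sketch of the spatio-temporal 3D-convolution idea described above (hedged: the channel widths, layer counts, and progressive-growing schedule of the actual FutureGAN are not reproduced here), a toy PyTorch encoder-decoder over frame sequences might look like this:

```python
# Toy spatio-temporal 3D-conv encoder-decoder; shapes and channel counts are
# illustrative assumptions, not the published FutureGAN configuration.
import torch
import torch.nn as nn

class Encoder3D(nn.Module):
    """Encodes a past frame sequence (B, C, T, H, W) into a latent volume."""
    def __init__(self, in_ch=1, base_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, base_ch, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv3d(base_ch, base_ch * 2, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.net(x)

class Decoder3D(nn.Module):
    """Decodes the latent volume into a predicted future frame sequence."""
    def __init__(self, out_ch=1, base_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(base_ch * 2, base_ch, kernel_size=3, stride=(1, 2, 2),
                               padding=1, output_padding=(0, 1, 1)),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose3d(base_ch, out_ch, kernel_size=3, stride=(1, 2, 2),
                               padding=1, output_padding=(0, 1, 1)),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

# Six past 64x64 grayscale frames in, six predicted future frames out.
past = torch.randn(2, 1, 6, 64, 64)
future = Decoder3D()(Encoder3D()(past))
print(future.shape)  # torch.Size([2, 1, 6, 64, 64])
```

In the published model, such encoder and decoder stages are additionally grown progressively from low to high resolution, mirroring the PGGAN training schedule.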

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 630
Author(s):  
Wenjia Niu ◽  
Kewen Xia ◽  
Yongke Pan

In general dynamic scenes, blurring results from the motion of multiple objects, camera shake, or scene depth variations. As an inverse process, deblurring extracts a sharp video sequence from the information contained in a single blurry image, which is itself an ill-posed computer vision problem. To reconstruct these sharp frames, traditional methods build several convolutional neural networks (CNNs) to generate the different frames, resulting in expensive computation. To address this problem, an innovative framework is proposed that generates several sharp frames from one CNN model. The motion-blurred image is fed into the framework, its spatio-temporal information is encoded via several convolutional and pooling layers, and the output of the model is several sharp frames. Moreover, a blurry image does not have a one-to-one correspondence with any particular sharp video sequence, since different video sequences can create similar blurry images, so neither the traditional pixel-to-pixel loss nor the perceptual loss is suitable for such non-aligned data. To alleviate this problem and model the blurring process, a novel contiguous blurry loss function is proposed that measures the loss on non-aligned data. Experimental results show that the proposed model, combined with the contiguous blurry loss, can generate sharp video sequences efficiently and performs better than state-of-the-art methods.
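A hedged, toy sketch of the "one CNN, several sharp output frames" idea follows; the layer sizes, frame count, and the omission of the proposed contiguous blurry loss are all simplifications relative to the described model:

```python
# Toy network mapping one blurry RGB image to n_frames sharp frames via a single
# shared encoder-decoder; sizes are illustrative assumptions only.
import torch
import torch.nn as nn

class MultiFrameDeblurNet(nn.Module):
    def __init__(self, n_frames=5):
        super().__init__()
        self.n_frames = n_frames
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3 * n_frames, 4, stride=2, padding=1),
        )

    def forward(self, blurry):
        b, _, h, w = blurry.shape
        frames = self.decoder(self.encoder(blurry))    # (B, 3*n_frames, H, W)
        return frames.view(b, self.n_frames, 3, h, w)  # (B, n_frames, 3, H, W)

blurry = torch.randn(1, 3, 128, 128)
print(MultiFrameDeblurNet()(blurry).shape)  # torch.Size([1, 5, 3, 128, 128])
```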


2013 ◽  
Vol 347-350 ◽  
pp. 3500-3504
Author(s):  
Xiao Ran Guo ◽  
Shao Hui Cui ◽  
Fang Dan

This article presents a novel approach for extracting robust local feature points from video sequences in a digital image stabilization system. A robust Harris-SIFT detector is proposed to select the most stable SIFT key points in video sequences where image motion occurs due to vehicle or platform vibration. Experimental results show that the proposed scheme is robust to various transformations of video sequences, such as translation, rotation and scaling, as well as blurring. Compared with current state-of-the-art schemes, the proposed scheme yields better performance.
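A rough sketch of such a Harris-filtered SIFT selection is shown below; the Harris parameters, the quantile threshold, and the file name are illustrative assumptions rather than the paper's settings:

```python
# Keep only SIFT keypoints that also lie on strong Harris corners, as a proxy
# for "stable" features; thresholds are assumptions, not the paper's values.
import cv2
import numpy as np

def harris_sift_keypoints(gray, harris_quantile=0.90):
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:
        return []
    harris = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    threshold = np.quantile(harris, harris_quantile)
    return [(kp, des) for kp, des in zip(keypoints, descriptors)
            if harris[int(kp.pt[1]), int(kp.pt[0])] > threshold]

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input frame
print(len(harris_sift_keypoints(gray)), "stable keypoints")
```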


Author(s):  
Ruijing Yang ◽  
Ziyu Guan ◽  
Zitong Yu ◽  
Xiaoyi Feng ◽  
Jinye Peng ◽  
...  

Automatic pain recognition is paramount for medical diagnosis and treatment. Existing works fall into three categories: assessing facial appearance changes, exploiting physiological cues, or fusing the two in a multi-modal manner. However, (1) appearance changes are easily affected by subjective factors, which impedes objective pain recognition; besides, appearance-based approaches ignore the long-range spatial-temporal dependencies that are important for modeling expressions over time; (2) physiological cues are obtained by attaching sensors to the human body, which is inconvenient and uncomfortable. In this paper, we present a novel multi-task learning framework that encodes both appearance changes and physiological cues in a non-contact manner for pain recognition. The framework captures both local and long-range dependencies via the proposed attention mechanism for the learned appearance representations, which are further enriched by temporally attended physiological cues (remote photoplethysmography, rPPG) recovered from videos in the auxiliary task. This framework, dubbed the rPPG-enriched Spatio-Temporal Attention Network (rSTAN), establishes state-of-the-art performance for non-contact pain recognition on publicly available pain databases. It demonstrates that rPPG prediction can be used as an auxiliary task to facilitate non-contact automatic pain recognition.
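A very rough sketch of the multi-task idea, a main pain-classification head trained jointly with an auxiliary rPPG-regression head on a shared representation, is given below; the backbone, the absence of the attention module, and the loss weighting are placeholders, not the rSTAN architecture:

```python
# Placeholder multi-task model: pain classification plus auxiliary rPPG regression;
# every module and hyperparameter here is an illustrative assumption.
import torch
import torch.nn as nn

class PainRppgNet(nn.Module):
    def __init__(self, feat_dim=256, n_pain_classes=2, rppg_len=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.pain_head = nn.Linear(feat_dim, n_pain_classes)  # main task
        self.rppg_head = nn.Linear(feat_dim, rppg_len)        # auxiliary task

    def forward(self, clip):
        feats = self.backbone(clip)
        return self.pain_head(feats), self.rppg_head(feats)

model = PainRppgNet()
clip = torch.randn(4, 3, 8, 32, 32)                 # toy (B, C, T, H, W) video clips
pain_logits, rppg_pred = model(clip)
loss = nn.CrossEntropyLoss()(pain_logits, torch.randint(0, 2, (4,))) \
     + 0.5 * nn.MSELoss()(rppg_pred, torch.randn(4, 64))  # weighted auxiliary loss
```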


2017 ◽  
Vol 5 (4RACSIT) ◽  
pp. 97-104
Author(s):  
Satish Kumar

This paper proposes and develops a hybrid approach for extracting key frames from video sequences captured by a stationary camera. The method first uses histogram differences to extract candidate key frames from the video sequence; a background subtraction algorithm (Mixture of Gaussians) is then used to fine-tune the final set of key frames. The developed approach shows considerable improvement over state-of-the-art techniques, as reported in this paper.
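An illustrative sketch of the described two-stage pipeline, histogram differences to propose candidate key frames followed by a Mixture-of-Gaussians background subtractor to refine them, is given below; the thresholds and the file name are assumptions, not the paper's values:

```python
# Two-stage key-frame extraction sketch: histogram change proposes candidates,
# MOG2 foreground ratio confirms them; thresholds are illustrative assumptions.
import cv2

def extract_key_frames(video_path, hist_thresh=0.3, fg_ratio_thresh=0.05):
    cap = cv2.VideoCapture(video_path)
    mog = cv2.createBackgroundSubtractorMOG2()
    key_frames, prev_hist = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        fg_mask = mog.apply(frame)
        # Stage 1: candidate if the intensity histogram changed noticeably.
        if prev_hist is None or cv2.compareHist(prev_hist, hist,
                                                cv2.HISTCMP_BHATTACHARYYA) > hist_thresh:
            # Stage 2: keep only candidates with enough moving foreground.
            if (fg_mask > 0).mean() > fg_ratio_thresh:
                key_frames.append(frame)
        prev_hist = hist
    cap.release()
    return key_frames

frames = extract_key_frames("surveillance.avi")  # hypothetical input video
print(len(frames), "key frames extracted")
```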


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Christopher Plata ◽  
Martin Nellessen ◽  
Rebecca Roth ◽  
Hannes Ecker ◽  
Bernd W. Böttiger ◽  
...  

Abstract Background Although not routinely established during cardiopulmonary resuscitation (CPR), video-assisted CPR has been described as beneficial for communication with emergency medical service (EMS) authorities in out-of-hospital cardiac arrest scenarios. Since the influence of video quality has not been investigated systematically, and because the quality of a live-stream video varies during video-assisted CPR, we investigated the influence of different video quality levels on the evaluation of CPR performance in video sequences. Methods Seven video sequences of CPR performance were recorded in high quality and artificially reduced to medium and low quality afterwards. Video sequences showed either correct CPR performance or one of six typical errors: too low or too high compression rate, superficial or increased compression depth, wrong hand position, and incomplete release. Video sequences were randomly assigned to the different quality levels. During the randomised and double-blinded evaluation process, 46 paramedics and 47 emergency physicians evaluated seven video sequences of CPR performance at different quality levels (high, medium and low resolution). Results Of 650 video sequences, CPR performance was evaluable in 98.2%. CPR performance was correctly evaluated in 71.5% at the low quality level, in 76.8% at the medium quality level, and in 77.3% at the high quality level, showing no significant differences depending on video quality (p = 0.306). In the subgroup analysis, correct classification of increased compression depth showed significant differences depending on video quality (p = 0.006). Further, there were significant differences in correct CPR classification depending on the presented error (p < 0.001). Errors that were not actually shown in the video sequence were reported in 28.3% of cases, with no significant dependence on video quality. Correct evaluation did not show significant interprofessional differences (p = 0.468). Conclusion Video quality has no significant impact on the evaluation of CPR in a video sequence. Even low video quality leads to an acceptable rate of correct evaluation of CPR performance. There is a significant difference in the evaluation of CPR performance depending on the error presented in a video sequence. Trial registration German Clinical Trial Register (registration number DRKS00015297), registered on 2018-08-21.


2020 ◽  
Vol 34 (07) ◽  
pp. 13098-13105 ◽  
Author(s):  
Linchao Zhu ◽  
Du Tran ◽  
Laura Sevilla-Lara ◽  
Yi Yang ◽  
Matt Feiszli ◽  
...  

Typical video classification methods often divide a video into short clips, do inference on each clip independently, then aggregate the clip-level predictions to generate the video-level results. However, processing visually similar clips independently ignores the temporal structure of the video sequence, and increases the computational cost at inference time. In this paper, we propose a novel framework named FASTER, i.e., Feature Aggregation for Spatio-TEmporal Redundancy. FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities. The FASTER framework can integrate high quality representations from expensive models to capture subtle motion information and lightweight representations from cheap models to cover scene changes in the video. A new recurrent network (i.e., FAST-GRU) is designed to aggregate the mixture of different representations. Compared with existing approaches, FASTER can reduce the FLOPs by over 10× while maintaining the state-of-the-art accuracy across popular datasets, such as Kinetics, UCF-101 and HMDB-51.
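A toy sketch of the aggregation idea, mixing clip features from an expensive model and a cheap model and fusing them with a recurrent aggregator, is shown below; a plain GRU stands in for FAST-GRU, and the feature dimension and class count are illustrative assumptions:

```python
# Aggregate per-clip features from models of different cost with a recurrent network;
# a vanilla GRU is used here in place of FAST-GRU, purely for illustration.
import torch
import torch.nn as nn

class ClipAggregator(nn.Module):
    def __init__(self, feat_dim=512, hidden=512, n_classes=400):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, clip_feats):
        # clip_feats: (batch, n_clips, feat_dim), mixing expensive and cheap features.
        _, h_n = self.gru(clip_feats)
        return self.classifier(h_n[-1])

expensive = torch.randn(2, 2, 512)   # features from the costly model on a few clips
cheap = torch.randn(2, 6, 512)       # features from the lightweight model on the rest
video_feats = torch.cat([expensive, cheap], dim=1)
print(ClipAggregator()(video_feats).shape)  # torch.Size([2, 400])
```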


Author(s):  
Kalpesh R. Jadav ◽  
Arvind R. Yadav

Shadows lead to failures in moving-target positioning, segmentation, tracking, and classification in video surveillance systems; shadow detection and removal are therefore essential for further computer vision processing. Existing state-of-the-art methods for dynamic shadow detection produce a high discrimination rate but a poor detection rate (foreground pixels are classified as shadow pixels). This paper proposes an effective method for dynamic shadow detection and removal based on an intensity ratio combined with frame differencing, gamma correction, and morphology operations. The performance of the proposed method has been tested on two outdoor ATON datasets, namely highway-I and highway-III, for vehicle tracking systems. The proposed method produced a discrimination rate of 89.07% and a detection rate of 80.79% for the highway-I video sequences. Similarly, for the highway-III video sequence, a discrimination rate of 85.60% and a detection rate of 84.05% were obtained. Experimental outcomes show that the proposed method is simple, stable, and robust for dynamic shadow detection on the datasets used in this work.
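A rough sketch of the described pipeline, frame differencing for motion, an intensity-ratio test for likely shadow pixels, gamma correction, and morphological clean-up, follows; all thresholds, the gamma value, and the file names are assumptions rather than the paper's settings:

```python
# Shadow-mask sketch: gamma-correct, find moving pixels by frame difference, flag pixels
# whose intensity ratio to the background suggests a cast shadow, then clean up with
# morphology; every constant below is an illustrative assumption.
import cv2
import numpy as np

def detect_shadow_mask(frame, background, gamma=1.5, ratio_low=0.4, ratio_high=0.9):
    lut = np.array([((i / 255.0) ** (1.0 / gamma)) * 255 for i in range(256)],
                   dtype=np.uint8)
    gray_f = cv2.cvtColor(cv2.LUT(frame, lut), cv2.COLOR_BGR2GRAY).astype(np.float32)
    gray_b = cv2.cvtColor(cv2.LUT(background, lut), cv2.COLOR_BGR2GRAY).astype(np.float32)
    moving = cv2.absdiff(gray_f, gray_b) > 25   # frame difference: moving pixels
    ratio = gray_f / (gray_b + 1e-6)            # shadows darken the background
    shadow = moving & (ratio > ratio_low) & (ratio < ratio_high)
    mask = (shadow * 255).astype(np.uint8)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle noise

frame = cv2.imread("highway_frame.png")            # hypothetical current frame
background = cv2.imread("highway_background.png")  # hypothetical background model
cv2.imwrite("shadow_mask.png", detect_shadow_mask(frame, background))
```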


2020 ◽  
Vol 2020 (4) ◽  
pp. 116-1-116-7
Author(s):  
Raphael Antonius Frick ◽  
Sascha Zmudzinski ◽  
Martin Steinebach

In recent years, the number of forged videos circulating on the Internet has increased immensely. Software and services to create such forgeries have become more and more accessible to the public. In this regard, the risk of malicious use of forged videos has risen. This work proposes an approach based on the Ghost effect known from image forensics for detecting forgeries in videos that replace faces in video sequences or alter facial expressions. The experimental results show that the proposed approach is able to identify forgery in high-quality encoded video content.
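A small sketch of the JPEG "ghost" idea from image forensics, re-encoding a frame at several JPEG qualities and looking for regions whose difference to the original dips at a quality unlike the rest of the frame (as a swapped-in face might), is given below; the quality range, window size, and file name are assumptions, and the paper's full detection pipeline is not reproduced:

```python
# JPEG-ghost style difference maps: a region previously compressed at quality q tends to
# show a local minimum in the difference map for that q; constants here are assumptions.
import cv2
import numpy as np

def jpeg_ghost_maps(frame, qualities=range(50, 100, 5)):
    maps = {}
    for q in qualities:
        ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, q])
        recompressed = cv2.imdecode(buf, cv2.IMREAD_COLOR)
        diff = (frame.astype(np.float32) - recompressed.astype(np.float32)) ** 2
        # Average the squared difference over colour channels and a local window.
        maps[q] = cv2.blur(diff.mean(axis=2), (16, 16))
    return maps

frame = cv2.imread("suspect_frame.png")  # hypothetical frame extracted from a video
ghosts = jpeg_ghost_maps(frame)
print({q: float(m.mean()) for q, m in ghosts.items()})
```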


2018 ◽  
Vol 14 (12) ◽  
pp. 1915-1960 ◽  
Author(s):  
Rudolf Brázdil ◽  
Andrea Kiss ◽  
Jürg Luterbacher ◽  
David J. Nash ◽  
Ladislava Řezníčková

Abstract. The use of documentary evidence to investigate past climatic trends and events has become a recognised approach in recent decades. This contribution presents the state of the art in its application to droughts. The range of documentary evidence is very wide, including general annals, chronicles, memoirs and diaries kept by missionaries, travellers and those specifically interested in the weather; records kept by administrators tasked with keeping accounts and other financial and economic records; legal-administrative evidence; religious sources; letters; songs; newspapers and journals; pictographic evidence; chronograms; epigraphic evidence; early instrumental observations; society commentaries; and compilations and books. These are available from many parts of the world. This variety of documentary information is evaluated with respect to the reconstruction of hydroclimatic conditions (precipitation, drought frequency and drought indices). Documentary-based drought reconstructions are then addressed in terms of long-term spatio-temporal fluctuations, major drought events, relationships with external forcing and large-scale climate drivers, socio-economic impacts and human responses. Documentary-based drought series are also considered from the viewpoint of spatio-temporal variability for certain continents, and their employment together with hydroclimate reconstructions from other proxies (in particular tree rings) is discussed. Finally, conclusions are drawn, and challenges for the future use of documentary evidence in the study of droughts are presented.

