Temporal Adaptive Alignment Network for Deep Video Inpainting

Video inpainting aims to synthesize visually pleasant and temporally consistent content in missing regions of video. Due to a variety of motions across different frames, it is highly challenging to utilize effective temporal information to recover videos. Existing deep learning based methods usually estimate optical flow to align frames and thereby exploit useful information between frames. However, these methods tend to generate artifacts once the estimated optical flow is inaccurate. To alleviate above problem, we propose a novel end-to-end Temporal Adaptive Alignment Network(TAAN) for video inpainting. The TAAN aligns reference frames with target frame via implicit motion estimation at a feature level and then reconstruct target frame by taking the aggregated aligned reference frame features as input. In the proposed network, a Temporal Adaptive Alignment (TAA) module based on deformable convolutions is designed to perform temporal alignment in a local, dense and adaptive manner. Both quantitative and qualitative evaluation results show that our method significantly outperforms existing deep learning based methods.

Download Full-text

Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6697 ◽

2020 ◽

Vol 34 (07) ◽

pp. 10696-10703 ◽

Cited By ~ 1

Author(s):

Jianing Deng ◽

Li Wang ◽

Shiliang Pu ◽

Cheng Zhuo

Keyword(s):

Optical Flow ◽

Reference Frames ◽

Video Quality ◽

Temporal Information ◽

Quality Enhancement ◽

Compressed Video ◽

Flow Estimation ◽

Temporal Sampling ◽

Target Frame ◽

Spatio Temporal

Recent years have witnessed remarkable success of deep learning methods in quality enhancement for compressed video. To better explore temporal information, existing methods usually estimate optical flow for temporal motion compensation. However, since compressed video could be seriously distorted by various compression artifacts, the estimated optical flow tends to be inaccurate and unreliable, thereby resulting in ineffective quality enhancement. In addition, optical flow estimation for consecutive frames is generally conducted in a pairwise manner, which is computational expensive and inefficient. In this paper, we propose a fast yet effective method for compressed video quality enhancement by incorporating a novel Spatio-Temporal Deformable Fusion (STDF) scheme to aggregate temporal information. Specifically, the proposed STDF takes a target frame along with its neighboring reference frames as input to jointly predict an offset field to deform the spatio-temporal sampling positions of convolution. As a result, complementary information from both target and reference frames can be fused within a single Spatio-Temporal Deformable Convolution (STDC) operation. Extensive experiments show that our method achieves the state-of-the-art performance of compressed video quality enhancement in terms of both accuracy and efficiency.

Download Full-text

Spatio-Temporal Inference Transformer Network for Video Inpainting

International Journal of Image and Graphics ◽

10.1142/s0219467823500079 ◽

2021 ◽

Author(s):

Gajanan Tudavekar ◽

Santosh S. Saraf ◽

Sanjay R. Patil

Keyword(s):

Deep Learning ◽

Qualitative Evaluation ◽

Learning Approaches ◽

Video Inpainting ◽

Video Frames ◽

Temporal Inference ◽

Spatio Temporal ◽

Temporal Dimensions ◽

Better Than ◽

Attention Weight

Video inpainting aims to complete in a visually pleasing way the missing regions in video frames. Video inpainting is an exciting task due to the variety of motions across different frames. The existing methods usually use attention models to inpaint videos by seeking the damaged content from other frames. Nevertheless, these methods suffer due to irregular attention weight from spatio-temporal dimensions, thus giving rise to artifacts in the inpainted video. To overcome the above problem, Spatio-Temporal Inference Transformer Network (STITN) has been proposed. The STITN aligns the frames to be inpainted and concurrently inpaints all the frames, and a spatio-temporal adversarial loss function improves the STITN. Our method performs considerably better than the existing deep learning approaches in quantitative and qualitative evaluation.

Download Full-text

Allocentric to Egocentric Spatial Switching: Impairment in aMCI and Alzheimer's Disease Patients?

Current Alzheimer Research ◽

10.2174/1567205014666171030114821 ◽

2018 ◽

Vol 15 (3) ◽

pp. 229-236 ◽

Cited By ~ 8

Author(s):

Gennaro Ruggiero ◽

Alessandro Iavarone ◽

Tina Iachini

Keyword(s):

Alzheimer’S Disease ◽

Alzheimer's Disease ◽

Reference Frame ◽

Reference Frames ◽

Memory Task ◽

Spatial Representations ◽

The Novel ◽

Spatial Memory Task ◽

Switching Condition ◽

Normal Controls

Objective: Deficits in egocentric (subject-to-object) and allocentric (object-to-object) spatial representations, with a mainly allocentric impairment, characterize the first stages of the Alzheimer's disease (AD). Methods: To identify early cognitive signs of AD conversion, some studies focused on amnestic-Mild Cognitive Impairment (aMCI) by reporting alterations in both reference frames, especially the allocentric ones. However, spatial environments in which we move need the cooperation of both reference frames. Such cooperating processes imply that we constantly switch from allocentric to egocentric frames and vice versa. This raises the question of whether alterations of switching abilities might also characterize an early cognitive marker of AD, potentially suitable to detect the conversion from aMCI to dementia. Here, we compared AD and aMCI patients with Normal Controls (NC) on the Ego-Allo- Switching spatial memory task. The task assessed the capacity to use switching (Ego-Allo, Allo-Ego) and non-switching (Ego-Ego, Allo-Allo) verbal judgments about relative distances between memorized stimuli. Results: The novel finding of this study is the neat impairment shown by aMCI and AD in switching from allocentric to egocentric reference frames. Interestingly, in aMCI when the first reference frame was egocentric, the allocentric deficit appeared attenuated. Conclusion: This led us to conclude that allocentric deficits are not always clinically detectable in aMCI since the impairments could be masked when the first reference frame was body-centred. Alongside, AD and aMCI also revealed allocentric deficits in the non-switching condition. These findings suggest that switching alterations would emerge from impairments in hippocampal and posteromedial areas and from concurrent dysregulations in the locus coeruleus-noradrenaline system or pre-frontal cortex.

Download Full-text

Deep learning based keypoint rejection system for underwater visual ego-motion estimation

IFAC-PapersOnLine ◽

10.1016/j.ifacol.2020.12.2420 ◽

2020 ◽

Vol 53 (2) ◽

pp. 9471-9477

Author(s):

Marco Leonardi ◽

Luca Fiori ◽

Annette Stahl

Keyword(s):

Deep Learning ◽

Motion Estimation

Download Full-text

Deep Learning-Based Congestion Detection at Urban Intersections

Sensors ◽

10.3390/s21062052 ◽

2021 ◽

Vol 21 (6) ◽

pp. 2052

Author(s):

Xinghai Yang ◽

Fengjiao Wang ◽

Zhiquan Bai ◽

Feifei Xun ◽

Yulin Zhang ◽

...

Keyword(s):

Deep Learning ◽

Optical Flow ◽

Traffic Congestion ◽

Detection Algorithm ◽

Input Image ◽

Vehicle Speed ◽

Position Information ◽

Traffic State ◽

State Discrimination ◽

Discrimination Method

In this paper, a deep learning-based traffic state discrimination method is proposed to detect traffic congestion at urban intersections. The detection algorithm includes two parts, global speed detection and a traffic state discrimination algorithm. Firstly, the region of interest (ROI) is selected as the road intersection from the input image of the You Only Look Once (YOLO) v3 object detection algorithm for vehicle target detection. The Lucas-Kanade (LK) optical flow method is employed to calculate the vehicle speed. Then, the corresponding intersection state can be obtained based on the vehicle speed and the discrimination algorithm. The detection of the vehicle takes the position information obtained by YOLOv3 as the input of the LK optical flow algorithm and forms an optical flow vector to complete the vehicle speed detection. Experimental results show that the detection algorithm can detect the vehicle speed and traffic state discrimination method can judge the traffic state accurately, which has a strong anti-interference ability and meets the practical application requirements.

Download Full-text

Motion Estimation via Scale-Space in Unsupervised Deep Learning

2021 International Conference on Information Networking (ICOIN) ◽

10.1109/icoin50884.2021.9334004 ◽

2021 ◽

Author(s):

Jaehwan Kim ◽

Bilel Derbel ◽

Byung-Woo Hong

Keyword(s):

Deep Learning ◽

Motion Estimation ◽

Scale Space ◽

Unsupervised Deep Learning

Download Full-text

Reference frames in spatial communication for navigation and sports: an empirical study in ultimate frisbee players

Cognitive Research Principles and Implications ◽

10.1186/s41235-020-00254-1 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Steven M. Weisberg ◽

Anjan Chatterjee

Keyword(s):

Reference Frame ◽

Reference Frames ◽

Spatial Information ◽

Spatial Reference ◽

Spatial Reference Frames ◽

The North ◽

Communication Task ◽

Spatial Communication ◽

Spatial Domains ◽

Ultimate Frisbee

Abstract Background Reference frames ground spatial communication by mapping ambiguous language (for example, navigation: “to the left”) to properties of the speaker (using a Relative reference frame: “to my left”) or the world (Absolute reference frame: “to the north”). People’s preferences for reference frame vary depending on factors like their culture, the specific task in which they are engaged, and differences among individuals. Although most people are proficient with both reference frames, it is unknown whether preference for reference frames is stable within people or varies based on the specific spatial domain. These alternatives are difficult to adjudicate because navigation is one of few spatial domains that can be naturally solved using multiple reference frames. That is, while spatial navigation directions can be specified using Absolute or Relative reference frames (“go north” vs “go left”), other spatial domains predominantly use Relative reference frames. Here, we used two domains to test the stability of reference frame preference: one based on navigating a four-way intersection; and the other based on the sport of ultimate frisbee. We recruited 58 ultimate frisbee players to complete an online experiment. We measured reaction time and accuracy while participants solved spatial problems in each domain using verbal prompts containing either Relative or Absolute reference frames. Details of the task in both domains were kept as similar as possible while remaining ecologically plausible so that reference frame preference could emerge. Results We pre-registered a prediction that participants would be faster using their preferred reference frame type and that this advantage would correlate across domains; we did not find such a correlation. Instead, the data reveal that people use distinct reference frames in each domain. Conclusion This experiment reveals that spatial reference frame types are not stable and may be differentially suited to specific domains. This finding has broad implications for communicating spatial information by offering an important consideration for how spatial reference frames are used in communication: task constraints may affect reference frame choice as much as individual factors or culture.

Download Full-text

Ego-Motion Estimation Using Recurrent Convolutional Neural Networks through Optical Flow Learning

Electronics ◽

10.3390/electronics10030222 ◽

2021 ◽

Vol 10 (3) ◽

pp. 222

Author(s):

Baigan Zhao ◽

Yingping Huang ◽

Hongjian Wei ◽

Xing Hu

Keyword(s):

Neural Network ◽

Neural Networks ◽

Motion Estimation ◽

Optical Flow ◽

Feature Representation ◽

Sequential Learning ◽

Navigation Systems ◽

Motion State ◽

Training Schemes ◽

Sequential Images

Visual odometry (VO) refers to incremental estimation of the motion state of an agent (e.g., vehicle and robot) by using image information, and is a key component of modern localization and navigation systems. Addressing the monocular VO problem, this paper presents a novel end-to-end network for estimation of camera ego-motion. The network learns the latent subspace of optical flow (OF) and models sequential dynamics so that the motion estimation is constrained by the relations between sequential images. We compute the OF field of consecutive images and extract the latent OF representation in a self-encoding manner. A Recurrent Neural Network is then followed to examine the OF changes, i.e., to conduct sequential learning. The extracted sequential OF subspace is used to compute the regression of the 6-dimensional pose vector. We derive three models with different network structures and different training schemes: LS-CNN-VO, LS-AE-VO, and LS-RCNN-VO. Particularly, we separately train the encoder in an unsupervised manner. By this means, we avoid non-convergence during the training of the whole network and allow more generalized and effective feature representation. Substantial experiments have been conducted on KITTI and Malaga datasets, and the results demonstrate that our LS-RCNN-VO outperforms the existing learning-based VO approaches.

Download Full-text

Distributed video coding with Multiple Reference Frame motion estimation

Proceedings of 2011 International Conference on Computer Science and Network Technology ◽

10.1109/iccsnt.2011.6182497 ◽

2011 ◽

Author(s):

Rui Min ◽

Le-nan Wu ◽

Bao-min Qiao ◽

Jian-min He

Keyword(s):

Motion Estimation ◽

Video Coding ◽

Reference Frame ◽

Distributed Video Coding ◽

Frame Motion ◽

Multiple Reference

Download Full-text

Fusion of rain radar images and wind forecasts in adeep learning model applied to rain nowcasting

10.5194/egusphere-egu21-11990 ◽

2021 ◽

Author(s):

Anastase Charantonis ◽

Vincent Bouget ◽

Dominique Béréziat ◽

Julien Brajard ◽

Arthur Filoche

Keyword(s):

Deep Learning ◽

Optical Flow ◽

Weather Forecast ◽

Forecast Model ◽

Radar Data ◽

Learning Model ◽

Rainfall Forecasting ◽

Radar Images ◽

Deep Learning Model ◽

Rain Radar

<p>Short or mid-term rainfall forecasting is a major task with several environmental applications such as agricultural management or flood risks monitoring. Existing data-driven approaches, especially deep learning models, have shown significant skill at this task, using only rainfall radar images as inputs. In order to determine whether using other meteorological parameters such as wind would improve forecasts, we trained a deep learning model on a fusion of rainfall radar images and wind velocity produced by a weather forecast model. The network was compared to a similar architecture trained only on radar data, to a basic persistence model and to an approach based on optical flow. Our network outperforms by 8% the F1-score calculated for the optical flow on moderate and higher rain events for forecasts at a horizon time of 30 minutes. Furthermore, it outperforms by 7% the same architecture trained using only rainfall radar images. Merging rain and wind data has also proven to stabilize the training process and enabled significant improvement especially on the difficult-to-predict high precipitation rainfalls. These results can also be found in Bouget, V., B&#233;r&#233;ziat, D., Brajard, J., Charantonis, A., & Filoche, A. (2020). Fusion of rain radar images and wind forecasts in a deep learning model applied to rain nowcasting. arXiv preprint arXiv:2012.05015</p>

Download Full-text