2004 ◽  
Vol 13 (6) ◽  
pp. 638-655 ◽  
Author(s):  
Hyung W. Kang ◽  
Sung Yong Shin

Tour into the picture (TIP), proposed by Horry et al. (Horry, Anjyo, & Arai, 1997, ACM SIGGRAPH '97 Conference Proceedings, 225–232), is a method for generating a sequence of walk-through images from a single reference image. By navigating a 3D scene model constructed from the image, TIP provides convincing 3D effects. This paper presents a comprehensive scheme for creating walk-through images from a video sequence by generalizing the idea of TIP. To address the various problems of dealing with a video sequence rather than a single image, the proposed scheme has the following features. First, it incorporates a new modeling scheme based on a vanishing circle identified in the video, assuming that the input video contains a negligible amount of motion parallax and that dynamic objects move on a flat terrain. Second, we propose a novel scheme for automatic background detection from the video, based on a 4-parameter motion model and statistical background color estimation. Third, to assist in extracting static or dynamic foreground objects from the video, we devise a semiautomatic boundary-segmentation scheme based on enhanced lane (Kang & Shin, 2002, Graphical Models, 64(5), 282–303). The purpose of this work is to let users experience the feel of navigating into a video sequence with their own interpretation and imagination of a given scene. The proposed scheme covers various types of video footage of dynamic scenes, such as sports coverage, cartoon animation, and movies, in which objects continuously change their shapes and locations. It can also be used to produce a variety of synthetic video sequences by importing dynamic foreign objects and merging them with the original video.
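The combination of a 4-parameter motion model with statistical background color estimation suggests a simple pipeline: register the frames under a similarity (rotation, uniform scale, translation) motion model, then vote per pixel over time. Below is a minimal Python/OpenCV sketch of that idea; the function names and the choice of ORB features plus a temporal median are illustrative assumptions, not the authors' implementation.

```python
# Sketch: align frames with a 4-DOF similarity model, then take a
# per-pixel temporal median as the statistical background estimate.
import cv2
import numpy as np

def estimate_background(frames):
    """Align frames to the first one and take a per-pixel temporal median."""
    ref = frames[0]
    ref_gray = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
    detector = cv2.ORB_create()
    kp_ref, des_ref = detector.detectAndCompute(ref_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    aligned = [ref.astype(np.float32)]
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        kp, des = detector.detectAndCompute(gray, None)
        matches = sorted(matcher.match(des, des_ref), key=lambda m: m.distance)
        src = np.float32([kp[m.queryIdx].pt for m in matches[:100]])
        dst = np.float32([kp_ref[m.trainIdx].pt for m in matches[:100]])
        # 4-parameter model: rotation, uniform scale, and 2D translation.
        M, _ = cv2.estimateAffinePartial2D(src, dst)
        warped = cv2.warpAffine(frame, M, (ref.shape[1], ref.shape[0]))
        aligned.append(warped.astype(np.float32))

    # Pixels occluded by moving foreground objects in only a minority of
    # frames are voted out by the median.
    return np.median(np.stack(aligned), axis=0).astype(np.uint8)
```

The median is one common choice for the "statistical" color estimate; a per-pixel mode or Gaussian model would fit the same framework.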


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 630
Author(s):  
Wenjia Niu ◽  
Kewen Xia ◽  
Yongke Pan

In general dynamic scenes, blurring results from the motion of multiple objects, camera shake, or scene depth variations. As an inverse process, deblurring extracts a sharp video sequence from the information contained in a single blurry image, which is an ill-posed computer vision problem. To reconstruct these sharp frames, traditional methods build several convolutional neural networks (CNNs) to generate the different frames, resulting in expensive computation. To overcome this problem, an innovative framework that generates several sharp frames from a single CNN model is proposed. The motion-blurred image is fed into the framework, its spatio-temporal information is encoded via several convolutional and pooling layers, and the model outputs several sharp frames. Moreover, a blurry image has no one-to-one correspondence with a sharp video sequence, since different video sequences can create similar blurry images, so neither the traditional pixel-to-pixel loss nor the perceptual loss is suitable for such non-aligned data. To alleviate this problem and model the blurring process, a novel contiguous blurry loss function is proposed that measures the loss on non-aligned data. Experimental results show that the proposed model combined with the contiguous blurry loss can generate sharp video sequences efficiently and performs better than state-of-the-art methods.
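One plausible reading of a loss that "models the blurring process" on non-aligned data is to re-blur the generated sequence and compare it with the observed input: a blurry frame is approximately the temporal average of the sharp frames that produced it. The PyTorch sketch below illustrates that reading only; the function name and the averaging-plus-L1 formulation are assumptions, not the paper's exact definition.

```python
# Sketch: compare the input blurry image against a blurry image
# re-synthesized from the predicted sharp frames, so no pixel-aligned
# sharp ground truth is required.
import torch
import torch.nn.functional as F

def contiguous_blurry_loss(generated_frames, blurry_input):
    """generated_frames: (B, T, C, H, W); blurry_input: (B, C, H, W)."""
    # Approximate the blur formation model by averaging over time.
    reblurred = generated_frames.mean(dim=1)
    # Any sharp sequence whose temporal average matches the observation
    # is acceptable, which sidesteps the one-to-many ambiguity.
    return F.l1_loss(reblurred, blurry_input)
```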


2019 ◽  
Vol 63 (5) ◽  
pp. 50401-1-50401-7 ◽  
Author(s):  
Jing Chen ◽  
Jie Liao ◽  
Huanqiang Zeng ◽  
Canhui Cai ◽  
Kai-Kuang Ma

For robust three-dimensional video transmission through error-prone channels, an efficient multiple description coding scheme for multi-view video based on the correlation of spatial polyphase transformed subsequences (CSPT_MDC_MVC) is proposed in this article. The input multi-view video sequence is first separated into four subsequences by a spatial polyphase transform and then grouped into two descriptions. Because macroblocks at corresponding positions in the subsequences are correlated, the subsequences need not be coded in exactly the same way. In each description, one subsequence is directly coded by the Joint Multi-view Video Coding (JMVC) encoder and the other is classified into four sets. According to this classification, the indirectly coded subsequence selectively reuses the prediction mode and prediction vector of the corresponding directly coded subsequence, which reduces the bitrate consumption and coding complexity of multiple description coding for multi-view video. On the decoder side, gradient-based directional interpolation is employed to improve the side-reconstruction quality. The effectiveness and robustness of the proposed algorithm are verified by experiments on the JMVC coding platform.
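A spatial polyphase transform of the kind described splits each frame into four subsequences by row and column parity. The Python sketch below shows that split and one common way of pairing the subsequences into two descriptions; the diagonal grouping is an assumed convention, not taken from the paper.

```python
# Sketch: polyphase split of a frame into four subsequences, grouped
# into two descriptions of two subsequences each.
import numpy as np

def polyphase_descriptions(frame):
    """Split an (H, W, C) frame into two descriptions via pixel parity."""
    s00 = frame[0::2, 0::2]  # even rows, even columns
    s01 = frame[0::2, 1::2]  # even rows, odd columns
    s10 = frame[1::2, 0::2]  # odd rows, even columns
    s11 = frame[1::2, 1::2]  # odd rows, odd columns
    # Pairing diagonal neighbors leaves every missing pixel surrounded by
    # received ones, which suits gradient-based directional interpolation
    # when only one description arrives.
    description_a = (s00, s11)
    description_b = (s01, s10)
    return description_a, description_b
```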


2009 ◽  
Vol 32 (6) ◽  
pp. 1172-1182 ◽  
Author(s):  
Jing LI ◽  
Wen-Cheng WANG ◽  
En-Hua WU