Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning

Author(s):  
Hao Zhu ◽  
Huaibo Huang ◽  
Yi Li ◽  
Aihua Zheng ◽  
Ran He

Talking face generation aims to synthesize a face video with precise lip synchronization and a smooth transition of facial motion over the entire video, given a speech clip and a facial image. Most existing methods focus on either disentangling the information in a single image or learning temporal information between frames. However, cross-modality coherence between audio and video has not been well addressed during synthesis. In this paper, we propose a novel arbitrary talking face generation framework that discovers audio-visual coherence via the proposed Asymmetric Mutual Information Estimator (AMIE). In addition, we propose a Dynamic Attention (DA) block that selectively focuses on the lip area of the input image during training to further enhance lip synchronization. Experimental results on the benchmark LRW and GRID datasets surpass state-of-the-art methods on prevalent metrics, with robust high-resolution synthesis across gender and pose variations.
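As a rough illustration of the coherence-learning idea, the sketch below implements a generic MINE-style (Donsker-Varadhan) lower bound on the mutual information between audio and visual embeddings. The network sizes, module names, and the symmetric form are assumptions for illustration only; they do not reproduce the paper's asymmetric estimator.

```python
# Minimal sketch of a MINE-style mutual information lower bound between
# audio and visual embeddings. Dimensions and architecture are assumed,
# not taken from the paper's AMIE.
import math
import torch
import torch.nn as nn

class MIEstimator(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=128, hidden=256):
        super().__init__()
        # Statistics network T(a, v) scores audio-visual feature pairs.
        self.net = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, audio, visual):
        # Joint samples: aligned (audio, visual) pairs from the same clip.
        joint = self.net(torch.cat([audio, visual], dim=1))
        # Marginal samples: shuffle visual features across the batch.
        perm = torch.randperm(visual.size(0))
        marginal = self.net(torch.cat([audio, visual[perm]], dim=1))
        # Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)].
        return joint.mean() - (torch.logsumexp(marginal, dim=0).squeeze()
                               - math.log(marginal.size(0)))
```

Maximizing this bound during training would push paired audio and lip motion features to share information, which is the intuition behind enforcing audio-visual coherence.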

Author(s):  
Yang Song ◽  
Jingwen Zhu ◽  
Dawei Li ◽  
Andy Wang ◽  
Hairong Qi

Given an arbitrary face image and an arbitrary speech clip, the proposed work generates a talking face video with accurate lip synchronization. Existing works either do not consider temporal dependency across video frames, yielding abrupt facial and lip movement, or are limited to generating talking face videos of a specific person, lacking generalization capacity. We propose a novel conditional recurrent generation network that incorporates both image and audio features in the recurrent unit to capture temporal dependency. To achieve both image- and video-realism, a pair of spatial-temporal discriminators is included in the network for better image and video quality. Since accurate lip synchronization is essential to the success of talking face video generation, we also construct a lip-reading discriminator to boost the accuracy of lip synchronization. We further extend the network to model the natural pose and expression of a talking face on the Obama Dataset. Extensive experimental results demonstrate the superiority of our framework over the state of the art in terms of visual quality, lip sync accuracy, and smooth transitions of both lip and facial movement.
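As a rough illustration of the recurrent conditioning described above, the sketch below feeds a fixed identity feature together with per-frame audio features through a GRU to produce temporally dependent per-frame codes. All dimensions and module choices are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of a conditional recurrent generator: identity (image)
# and per-frame audio features pass through a GRU so each output frame
# depends on the previous ones. Sizes are assumed for illustration.
import torch
import torch.nn as nn

class ConditionalRecurrentGenerator(nn.Module):
    def __init__(self, img_dim=256, aud_dim=128, hidden=512, out_dim=256):
        super().__init__()
        self.gru = nn.GRU(img_dim + aud_dim, hidden, batch_first=True)
        self.to_frame = nn.Linear(hidden, out_dim)  # feeds an image decoder

    def forward(self, identity_feat, audio_seq):
        # identity_feat: (B, img_dim) from the input face image
        # audio_seq:     (B, T, aud_dim), one feature vector per video frame
        T = audio_seq.size(1)
        identity = identity_feat.unsqueeze(1).expand(-1, T, -1)
        h, _ = self.gru(torch.cat([identity, audio_seq], dim=2))
        return self.to_frame(h)  # (B, T, out_dim): per-frame codes
```

Carrying the hidden state across frames is what smooths lip and facial motion, in contrast to frame-by-frame generation.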


Atmosphere ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 1266
Author(s):  
Jing Qin ◽  
Liang Chen ◽  
Jian Xu ◽  
Wenqi Ren

In this paper, we propose a novel method to remove haze from a single hazy input image based on sparse representation. In our method, the sparse representation serves as a contextual regularization tool: because the transmission is not always constant within a local patch, using the dark channel prior alone without soft matting produces block artifacts and halos, which our regularization reduces. A novel way of using the dictionary is proposed to smooth the image and generate a sharp dehazed result. Experimental results demonstrate that our proposed method performs favorably against state-of-the-art dehazing methods and produces high-quality, vividly colored dehazed results.
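For context, the dark channel prior this method builds on admits a compact implementation. The sketch below computes the standard dark channel and the corresponding raw transmission estimate (He et al.'s formulation, with common default patch size and omega); the paper's sparse-representation regularization itself is not reproduced here.

```python
# Minimal sketch of the dark channel prior transmission estimate that
# the contextual regularization refines. Defaults are common choices,
# not the paper's exact settings.
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    # Per-pixel minimum over color channels, then a local minimum filter.
    return minimum_filter(img.min(axis=2), size=patch)

def estimate_transmission(img, atmosphere, patch=15, omega=0.95):
    # t(x) = 1 - omega * dark_channel(I / A); omega < 1 keeps a trace of
    # haze so distant objects still look natural.
    normalized = img / np.maximum(atmosphere, 1e-6)
    return 1.0 - omega * dark_channel(normalized, patch)

# Usage: img is an HxWx3 float array in [0, 1]; atmosphere is a length-3
# vector of estimated atmospheric light per channel.
```

Because the patch-wise minimum assumes constant transmission inside each patch, the raw estimate is blocky at depth edges, which is exactly the artifact the contextual regularization targets.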


Author(s):  
Jacek Szklarski ◽  
Łukasz Białek ◽  
Andrzej Szałas

We apply a non-classical four-valued logic in the process of reasoning about strategies for cops in a modified game of “Cops and Robber” played on a graph. We extend the game by introducing uncertainty in the form of random failures of detecting devices. This is realized by allowing a robber to be detected at a node only with a given probability P_A. Additionally, with probability P_F, cops can be given a false positive, i.e., they are informed that the robber is located at some node whereas it is actually located somewhere else. Consequently, a non-zero P_F introduces measurement noise into the system. All the cops have access to information provided by the detectors and can communicate with each other, so they can coordinate the search. By adjusting the number of detectors, P_A, and P_F, we can achieve a smooth transition between the two well-known variants of the game: “with fully visible robber” and “with invisible robber”. We compare a simple probabilistic strategy for cops with a non-parametric strategy based on reasoning with a four-valued paraconsistent logic. It is shown that this novel approach leads to good performance, as measured by the required mean catch time. We conclude that this type of reasoning can be applied in real-world applications where there is no knowledge about the underlying source of errors, which is particularly useful in robotics.
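As a concrete illustration of the noise model, the sketch below performs a Bayesian belief update over graph nodes given noisy detector readings with detection probability P_A and false-positive probability P_F. It stands in for the simple probabilistic baseline strategy only; the four-valued paraconsistent reasoning is not reproduced here.

```python
# Minimal sketch of a belief update for the noisy-detector game: each
# node's detector fires with probability p_a if the robber is there and
# with probability p_f otherwise. An illustrative baseline, not the
# paper's paraconsistent strategy.
import numpy as np

def update_belief(belief, readings, p_a, p_f):
    # belief:   length-N probability vector over graph nodes
    # readings: length-N boolean array, True where a detector fired
    n = len(belief)
    log_post = np.log(np.maximum(belief, 1e-12))
    for j in range(n):
        # Likelihood of all readings given the robber sits at node j.
        p_fire = np.full(n, p_f)
        p_fire[j] = p_a
        log_post[j] += np.sum(np.where(readings,
                                       np.log(p_fire),
                                       np.log(1.0 - p_fire)))
    post = np.exp(log_post - log_post.max())  # normalize stably
    return post / post.sum()
```

With p_a = 1 and p_f = 0 the posterior collapses onto the robber's true node (fully visible variant); as p_a drops and p_f grows, the belief flattens toward the invisible-robber variant.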


Author(s):  
V. Prasath

A well-posed multiscale regularization scheme for digital image denoising

We propose an edge-adaptive digital image denoising and restoration scheme based on space-dependent regularization. Traditional gradient-based schemes use an edge map computed from gradients alone to drive the regularization. This may lead to oversmoothing of the input image, and noise along edges can be amplified. To avoid these drawbacks, we make use of a multiscale descriptor given by a contextual edge detector obtained from local variances. Using a smooth transition from the computed edges, the proposed scheme removes noise in flat regions and preserves edges without oscillations. By incorporating a space-dependent adaptive regularization parameter, image smoothing is driven along probable edges and not across them. The well-posedness of the corresponding minimization problem is proved in the space of functions of bounded variation. The corresponding gradient descent scheme is implemented, and numerical results illustrate the advantages of using the adaptive parameter in the regularization scheme. Compared with similar edge-preserving regularization schemes, the proposed adaptive-weight scheme provides a better multiscale edge map, which in turn produces better restoration.
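As a rough illustration of space-dependent regularization, the sketch below damps smoothing where local variance suggests an edge. The diffusion form and the variance-based weight are simplifying assumptions and do not reproduce the paper's bounded-variation functional or its contextual edge detector.

```python
# Minimal sketch of edge-adaptive denoising by gradient descent: a local
# variance map yields a spatial weight that weakens smoothing near likely
# edges. A simplified diffusion analog of the scheme described above.
import numpy as np
from scipy.ndimage import uniform_filter, laplace

def local_variance(img, size=5):
    mean = uniform_filter(img, size)
    return uniform_filter(img * img, size) - mean * mean

def denoise(noisy, lam=0.1, steps=100, dt=0.2):
    u = noisy.copy()
    # High local variance -> probable edge -> weaker regularization there.
    w = 1.0 / (1.0 + local_variance(noisy) / (noisy.var() + 1e-12))
    for _ in range(steps):
        # Descend: smooth where w is large, stay close to the data term.
        u += dt * (w * laplace(u) - lam * (u - noisy))
    return u
```

The key design point mirrors the abstract: the weight is computed from local variances rather than raw gradients, so isolated noise spikes do not masquerade as edges and get preserved.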


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Elise Laruelle ◽  
Nathalie Spassky ◽  
Auguste Genovesio

Cell biology relies largely on reproducible visual observations. Unlike cell cultures, tissues are heterogeneous, making it difficult to collect biological replicates that spotlight a precise location. Consequently, there is no standard approach for estimating the statistical significance of an observed pattern in a tissue sample. Here, we introduce SET (Synthesis of Epithelial Tissue), a method that can accurately reconstruct the cell tessellation formed by an epithelium in a microscopy image, as well as thousands of alternative synthetic tessellations made of the exact same cells. SET can build an accurate null distribution to statistically test whether any local pattern is necessarily the result of a process or could be explained by chance in the given context. We provide examples in various tissues where visible, and invisible, cell and subcellular patterns are unraveled in a statistically significant manner using a single image and without any parameter settings.
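The statistical logic can be illustrated independently of the tessellation synthesis. In the sketch below, the synthesis step (the hard part of SET) is abstracted into a placeholder callable, an assumption of this illustration; only the null-distribution and p-value computation are shown.

```python
# Minimal sketch of the testing scheme: score the observed pattern, score
# many synthetic re-tessellations of the same cells, and read a p-value
# off the empirical null. `synthesize` is a placeholder for SET's
# generator, not its actual API.
import numpy as np

def empirical_p_value(observed_cells, statistic, synthesize, n_null=1000):
    # statistic:  callable mapping a tessellation to a scalar pattern score
    # synthesize: callable returning one synthetic tessellation built from
    #             the exact same cells as the observation
    observed = statistic(observed_cells)
    null = np.array([statistic(synthesize(observed_cells))
                     for _ in range(n_null)])
    # One-sided p-value with the +1 correction for a finite null sample.
    return (1 + np.sum(null >= observed)) / (1 + n_null)
```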


2016 ◽  
Vol 13 (6) ◽  
pp. 172988141666337 ◽  
Author(s):  
Lei He ◽  
Qiulei Dong ◽  
Guanghui Wang

Predicting depth from a single image is an important problem for understanding the 3-D geometry of a scene. Recently, nonparametric depth sampling (DepthTransfer) has shown great potential in solving this problem; its two key components are a Scale Invariant Feature Transform (SIFT) flow–based depth warping between the input image and its retrieved similar images, and a pixel-wise depth fusion from all warped depth maps. In addition to the inherently heavy computational load of the SIFT flow computation, even under a coarse-to-fine scheme, the fusion reliability is low owing to the weak discriminative power of pixel-wise descriptions. This article aims at solving these two problems. First, a novel sparse SIFT flow algorithm is proposed to reduce the complexity from subquadratic to sublinear. Then, a reweighting technique is introduced in which the variance of the SIFT flow descriptor is computed at every pixel and used to reweight the data term in the conditional Markov random field. Our proposed depth transfer method is tested on the Make3D Range Image Data and the NYU Depth Dataset V2. It is shown that, with comparable depth estimation accuracy, our method is 2–3 times faster than DepthTransfer.
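As a rough illustration of the reweighting idea, the sketch below fuses warped candidate depth maps with per-pixel weights that decay with the SIFT flow descriptor variance. The paper applies this weight to the data term of a Markov random field rather than to a direct weighted average, so treat this as a simplified analog with assumed array shapes.

```python
# Minimal sketch of variance-based reweighting in depth fusion: pixels
# whose flow descriptors vary strongly are less reliable and contribute
# less. A simplified stand-in for the MRF data-term reweighting.
import numpy as np

def fuse_depths(warped_depths, descriptor_vars, eps=1e-6):
    # warped_depths:   (K, H, W) candidate depth maps warped to the input
    # descriptor_vars: (K, H, W) per-pixel SIFT flow descriptor variance
    weights = 1.0 / (descriptor_vars + eps)   # high variance -> low trust
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * warped_depths).sum(axis=0)   # (H, W) fused depth
```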


2017 ◽  
Vol 31 (19-21) ◽  
pp. 1740037 ◽  
Author(s):  
Xifang Zhu ◽  
Ruxi Xiang ◽  
Feng Wu ◽  
Xiaoyan Jiang

To improve image quality and compensate for the deficiencies of existing haze removal methods, we present a novel fusion method. By analyzing the darkness channel of each method, we construct an effective darkness channel model that takes the correlation information among the individual darkness channels into account. This model is used to estimate the transmission map of the input image, which is then refined by a modified guided filter to further improve image quality. Finally, the radiance image is restored by combining the estimates with the monochromatic atmospheric scattering model. Experimental results show that the proposed method not only effectively removes haze from the image but also outperforms other haze removal methods.
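For context, the guided filter used in the refinement step has a compact standard form. The sketch below is the conventional grayscale guided filter (a local linear model fit in box windows), not the paper's modified variant.

```python
# Minimal sketch of the standard guided filter used to refine a raw
# transmission map: within each window, src is modeled as an affine
# function of the guide image. Not the paper's modified variant.
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=30, eps=1e-3):
    size = 2 * radius + 1
    mean_g = uniform_filter(guide, size)
    mean_s = uniform_filter(src, size)
    cov_gs = uniform_filter(guide * src, size) - mean_g * mean_s
    var_g = uniform_filter(guide * guide, size) - mean_g * mean_g
    a = cov_gs / (var_g + eps)     # local linear model: src ~ a * g + b
    b = mean_s - a * mean_g
    # Average the per-window coefficients before applying them.
    return uniform_filter(a, size) * guide + uniform_filter(b, size)

# Usage: refined = guided_filter(gray_input, raw_transmission)
```

The edge-preserving property comes from `a` approaching 1 in high-variance (edgy) windows and 0 in flat ones, so the refined transmission snaps to image edges instead of patch boundaries.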


1994 ◽  
Vol 116 (2) ◽  
pp. 581-586 ◽  
Author(s):  
D. C. H. Yang ◽  
Jui-Jen Chou

This paper presents a general theory on generating a smooth motion profile for the coordinated motion of five-axis CNC/CMM machines. Motion at constant speed is important and required in many manufacturing processes, such as milling, welding, finishing, and painting. In this paper, a piecewise-constant speed profile is constructed from a sequence of Hermite curves, forming a composite Hermite curve in the parametric domain. Because the proposed speed profile has continuous acceleration, it yields better product quality than traditional techniques. We also provide a method for studying the feasibility of manufacturing in terms of the given machine, the desired path, and the assigned speed. We consider machine dynamics, actuator limitations, path geometry, jerk constraints, and motion kinematics.
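As a small worked example, the sketch below builds one blend segment of such a profile from a cubic Hermite curve with zero end slopes, so acceleration is continuous where the blend meets the adjacent constant-speed segments. The normalized parameterization and the speed values are illustrative assumptions, not the paper's exact composite-curve construction.

```python
# Minimal sketch of a Hermite blend between two constant speeds. Zero end
# derivatives make the speed's slope (acceleration) vanish at both ends,
# matching the zero acceleration of the constant-speed segments.
import numpy as np

def hermite_blend(v0, v1, t):
    # Cubic Hermite with zero end derivatives: h(t) = 3t^2 - 2t^3,
    # so h(0)=0, h(1)=1 and h'(0)=h'(1)=0.
    h = 3 * t**2 - 2 * t**3
    return (1 - h) * v0 + h * v1

# Usage: ramp from 10 mm/s to 25 mm/s over one normalized segment.
t = np.linspace(0.0, 1.0, 50)
speeds = hermite_blend(10.0, 25.0, t)
```

Chaining such blends between the constant-speed plateaus yields the composite profile: speed is C1, hence acceleration is continuous across every knot.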

