Instance-Aware Coherent Video Style Transfer for Chinese Ink Wash Painting

Author(s):  
Hao Liang ◽  
Shuai Yang ◽  
Wenjing Wang ◽  
Jiaying Liu

Recent research has made remarkable progress in fast video style transfer based on Western paintings. However, because Chinese ink wash painting differs inherently in drawing techniques and aesthetic expression, existing methods either achieve poor temporal consistency or fail to transfer its key freehand brushstroke characteristics. In this paper, we present a novel video style transfer framework for Chinese ink wash paintings. The two key ideas are multi-frame fusion for temporal coherence and instance-aware style transfer. Frame reordering and stylization based on reference frame fusion are proposed to improve temporal consistency. Meanwhile, by leveraging instance segmentation, the proposed method adaptively leaves white space in the background and selects proper scales to extract features and depict the foreground subject. Experimental results demonstrate the superiority of the proposed method over state-of-the-art style transfer methods in terms of both temporal coherence and visual quality. Our project website is available at https://oblivioussy.github.io/InkVideo/.

2020 ◽  
Vol 34 (07) ◽  
pp. 12233-12240
Author(s):  
Wenjing Wang ◽  
Jizheng Xu ◽  
Li Zhang ◽  
Yue Wang ◽  
Jiaying Liu

Recently, neural style transfer has drawn much attention, and significant progress has been made, especially for image style transfer. However, flexible and consistent style transfer for videos remains a challenging problem. Existing training strategies, which either use a significant amount of video data with optical flows or introduce single-frame regularizers, have limited performance on real videos. In this paper, we propose a novel interpretation of temporal consistency, based on which we analyze the drawbacks of existing training strategies and then derive a new compound regularization. Experimental results show that the proposed regularization can better balance spatial and temporal performance, which supports our modeling. Combining this regularization with the new cost formula, we design a zero-shot video style transfer framework. Moreover, for better feature migration, we introduce a new module to dynamically adjust inter-channel distributions. Quantitative and qualitative results demonstrate the superiority of our method over other state-of-the-art style transfer methods. Our project is publicly available at: https://daooshee.github.io/CompoundVST/.


2011 ◽  
Vol 383-390 ◽  
pp. 1605-1610
Author(s):  
Jing Chen ◽  
Can Hui Cai

In this paper, an error concealment algorithm for lost macroblocks (MBs), named the motion consistence and textural coherence based error concealment algorithm (MCTC), is proposed to meet the requirements of video transmission over error-prone channels. A directional predicted motion vector (MV) set is set up by using the motion consistence between the MV co-located in the reference frame and the neighboring MVs of the lost MB. To find the optimal MV in this candidate MV set, a textural coherence based boundary matching (TCBM) criterion is proposed. Experimental results show that MCTC outperforms state-of-the-art video error concealment methods in both objective and subjective visual quality.
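
The idea of choosing a motion vector from a candidate set by boundary matching can be illustrated with a minimal sketch. This is not the TCBM criterion from the abstract, whose details are not given here; it is a plain outer-boundary matching cost, and the function names, the frame layout (2D lists of pixel values), and the toy frames are all assumptions made for illustration.

```python
# Illustrative outer-boundary matching; NOT the paper's TCBM criterion.
# Frames are 2D lists of pixel intensities; all names here are hypothetical.

def ring_pixels(top, left, size):
    """Coordinates of the one-pixel border just outside a size x size block."""
    coords = []
    for k in range(size):
        coords.append((top - 1, left + k))      # row above the block
        coords.append((top + size, left + k))   # row below the block
        coords.append((top + k, left - 1))      # column left of the block
        coords.append((top + k, left + size))   # column right of the block
    return coords

def boundary_cost(ref, cur, top, left, size, mv):
    """Sum of absolute differences between the pixels surrounding the lost
    block in the current frame and the pixels surrounding the candidate
    block (displaced by mv) in the reference frame."""
    dy, dx = mv
    height, width = len(ref), len(ref[0])
    cost = 0
    for y, x in ring_pixels(top, left, size):
        if not (0 <= y < height and 0 <= x < width):
            continue  # neighbour lies outside the current frame
        ry, rx = y + dy, x + dx
        if not (0 <= ry < height and 0 <= rx < width):
            return float("inf")  # candidate points outside the reference frame
        cost += abs(ref[ry][rx] - cur[y][x])
    return cost

def select_mv(ref, cur, top, left, size, candidates):
    """Pick the candidate motion vector with the lowest boundary cost."""
    return min(candidates,
               key=lambda mv: boundary_cost(ref, cur, top, left, size, mv))

# Toy frames: the current frame is the reference displaced by (dy, dx) = (0, 1).
ref = [[y * y + 3 * x * x for x in range(8)] for y in range(8)]
cur = [[y * y + 3 * (x + 1) ** 2 for x in range(8)] for y in range(8)]
candidates = [(0, 0), (1, 1), (0, 1), (-1, 0)]
best = select_mv(ref, cur, top=3, left=3, size=2, candidates=candidates)
```

In this toy setup the candidate that reproduces the true displacement has zero cost, so it is the one selected; a real decoder would build the candidate set from neighboring and co-located MVs as the abstract describes.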


2020 ◽  
Vol 34 (05) ◽  
pp. 8042-8049
Author(s):  
Tomoyuki Kajiwara ◽  
Biwa Miura ◽  
Yuki Arase

We tackle the low-resource problem in style transfer by employing transfer learning that utilizes abundantly available raw corpora. Our method consists of two steps: pre-training learns to generate a grammatical sentence that is semantically equivalent to the input, and fine-tuning learns to add a desired style. Pre-training has two options: auto-encoding and machine translation based methods. Pre-training based on an autoencoder is a simple way to learn these from a raw corpus. If machine translators are available, the model can learn more diverse paraphrasing via roundtrip translation. After pre-training, fine-tuning achieves high-quality paraphrase generation even when only 1k sentence pairs of a parallel corpus for style transfer are available. Experimental results on formality style transfer indicate the effectiveness of both pre-training methods, and the method based on roundtrip translation achieves state-of-the-art performance.


Author(s):  
Kangle Deng ◽  
Tianyi Fei ◽  
Xin Huang ◽  
Yuxin Peng

Automatically generating videos according to a given text is a highly challenging task, where visual quality and semantic consistency with captions are two critical issues. In existing methods, when generating a specific frame, the information in previously generated frames is not fully exploited. In addition, an effective way to measure the semantic accordance between videos and captions remains to be established. To address these issues, we present a novel Introspective Recurrent Convolutional GAN (IRC-GAN) approach. First, we propose a recurrent transconvolutional generator, where LSTM cells are integrated with 2D transconvolutional layers. As 2D transconvolutional layers put more emphasis on the details of each frame than 3D ones, our generator takes both the definition of each video frame and temporal coherence across the whole video into consideration, and thus can generate videos with better visual quality. Second, we propose mutual information introspection to semantically align the generated videos to text. Unlike other methods that simply judge whether the video and the text match or not, we further use mutual information to concretely measure the semantic consistency. In this way, our model is able to introspect the semantic distance between the generated video and the corresponding text, and tries to minimize it to boost the semantic consistency. We conduct experiments on 3 datasets and compare with state-of-the-art methods. Experimental results demonstrate the effectiveness of our IRC-GAN in generating plausible videos from given text.
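
As background for the mutual information term mentioned above, the following is a minimal sketch of the textbook definition for two discrete variables, I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))]. It only illustrates what the quantity measures; how IRC-GAN actually estimates mutual information between videos and text is not stated in the abstract, and the function name and toy distributions are assumptions.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution.

    `joint` is a 2D list where joint[i][j] = p(X = i, Y = j); the entries
    are assumed to be non-negative and to sum to 1.
    """
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                           # 0 * log(0) is treated as 0
                mi += p * math.log(p / (px[i] * py[j]))
    return mi

# Independent variables share no information.
independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
# Perfectly coupled binary variables share log(2) nats.
coupled = mutual_information([[0.5, 0.0], [0.0, 0.5]])
```

The two toy cases bracket the behavior: the value is zero when the variables are independent and grows as one variable becomes more predictable from the other.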


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1010
Author(s):  
Yiqing Zhang ◽  
Jun Chu ◽  
Lu Leng ◽  
Jun Miao

With the rapid development of flexible vision sensors and visual sensor networks, computer vision tasks, such as object detection and tracking, are entering a new phase. Accordingly, more challenging comprehensive tasks, including instance segmentation, can develop rapidly. Most state-of-the-art network frameworks for instance segmentation are based on Mask R-CNN (mask region-convolutional neural network). However, the experimental results confirm that Mask R-CNN does not always successfully predict instance details. The scale-invariant fully convolutional network structure of Mask R-CNN ignores the difference in spatial information between receptive fields of different sizes. A large-scale receptive field focuses more on detailed information, whereas a small-scale receptive field focuses more on semantic information. As a result, the network cannot consider the relationship between the pixels at the object edge, and these pixels will be misclassified. To overcome this problem, Mask-Refined R-CNN (MR R-CNN) is proposed, in which the stride of ROIAlign (region of interest align) is adjusted. In addition, the original fully convolutional layer is replaced with a new semantic segmentation layer that realizes feature fusion by constructing a feature pyramid network and summing the forward and backward transmissions of feature maps of the same resolution. The segmentation accuracy is substantially improved by combining the feature layers that focus on the global and detailed information. The experimental results on the COCO (Common Objects in Context) and Cityscapes datasets demonstrate that the segmentation accuracy of MR R-CNN is about 2% higher than that of Mask R-CNN using the same backbone. The average precision of large instances reaches 56.6%, which is higher than those of all state-of-the-art methods. In addition, the proposed method has a low time cost and is easily implemented. The experiments on the Cityscapes dataset also show that the proposed method has great generalization ability.


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 615
Author(s):  
Yuanbin Fu ◽  
Jiayi Ma ◽  
Xiaojie Guo

In the context of social media, large numbers of headshot photos are taken every day. Unfortunately, in addition to laborious editing and modification, creating a visually compelling photographic masterpiece for sharing requires advanced professional skills, which are difficult for ordinary Internet users to acquire. Though there are many algorithms that automatically and globally transfer the style from one image to another, they fail to respect the semantics of the scene and do not allow users to transfer only the attributes of one or two face organs in the foreground region while leaving the background region unchanged. To overcome this problem, we developed a novel framework for semantically meaningful local face attribute transfer, which can flexibly transfer the local attribute of a face organ from the reference image to a semantically equivalent organ in the input image, while preserving the background. Our method involves warping the reference photo to match the shape, pose, location, and expression of the input image. The fusion of the warped reference image and input image is then taken as the initialized image for a neural style transfer algorithm. Our method achieves better performance in terms of inception score (3.81) and Fréchet inception distance (80.31), which are about 10% higher than those of competitors, indicating that our framework is capable of producing high-quality and photorealistic attribute transfer results. Both theoretical findings and experimental results are provided to demonstrate the efficacy of the proposed framework and reveal its superiority over other state-of-the-art alternatives.


2020 ◽  
Vol 2020 (4) ◽  
pp. 76-1-76-7
Author(s):  
Swaroop Shankar Prasad ◽  
Ofer Hadar ◽  
Ilia Polian

Image steganography can have legitimate uses, for example, augmenting an image with a watermark for copyright reasons, but can also be utilized for malicious purposes. We investigate the detection of malicious steganography using neural network-based classification when images are transmitted through a noisy channel. Noise makes detection harder because the classifier must not only detect perturbations in the image but also decide whether they are due to malicious steganographic modifications or to natural noise. Our results show that reliable detection is possible even for state-of-the-art steganographic algorithms that insert stego bits without affecting an image’s visual quality. The detection accuracy is high (above 85%) if the payload, or the amount of steganographic content in an image, exceeds a certain threshold. At the same time, noise critically affects the steganographic information being transmitted, both through desynchronization (destruction of the information about which bits of the image contain steganographic information) and by flipping these bits themselves. This will force the adversary to use a redundant encoding with a substantial number of error-correction bits for reliable transmission, making detection feasible even for small payloads.
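
To make the last point concrete, here is a minimal sketch of why redundancy inflates the embedded payload. It uses a simple repetition code with majority-vote decoding purely as an illustration; the abstract does not say which error-correcting code an adversary would use, the helper names are hypothetical, and desynchronization is not modeled.

```python
# Illustrative repetition code; the actual redundant encoding is an assumption.

def encode_repetition(bits, n=3):
    """Repeat every payload bit n times."""
    return [b for b in bits for _ in range(n)]

def flip_bits(bits, positions):
    """Simulate channel noise by flipping the bits at the given positions."""
    noisy = list(bits)
    for p in positions:
        noisy[p] ^= 1
    return noisy

def decode_repetition(coded, n=3):
    """Recover each payload bit by majority vote over its n copies."""
    return [int(2 * sum(coded[i:i + n]) > n) for i in range(0, len(coded), n)]

payload = [1, 0, 1, 1]
coded = encode_repetition(payload)             # three times as many stego bits
noisy = flip_bits(coded, positions=[0, 7])     # one flipped bit in two groups
recovered = decode_repetition(noisy)
```

With n = 3 the code survives one flipped bit per group, but the number of embedded bits triples, which is the kind of payload growth that the abstract argues makes detection feasible.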


2020 ◽  
Vol 8 (1) ◽  
pp. 33-41
Author(s):  
Dr. S. Sarika

Phishing is a malicious and deliberate act of sending counterfeit messages or mimicking a webpage. The goal is either to steal sensitive credentials, such as login information and credit card details, or to install malware on a victim’s machine. Browser-based cyber threats have become one of the biggest concerns in networked architectures. The most prolific form of browser attack is tabnabbing, which happens in inactive browser tabs. In a tabnabbing attack, a fake page disguises itself as a genuine page to steal data. This paper presents a multi-agent-based tabnabbing detection technique. The method detects heuristic changes in a webpage when a tabnabbing attack happens and gives a warning to the user. Experimental results show that the method performs better than state-of-the-art tabnabbing detection techniques.


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 325
Author(s):  
Zhihao Wu ◽  
Baopeng Zhang ◽  
Tianchen Zhou ◽  
Yan Li ◽  
Jianping Fan

In this paper, we developed a practical approach for automatic detection of discrimination actions from social images. Firstly, an image set is established, in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Secondly, a practical approach is developed to achieve automatic detection and identification of discrimination actions and relationships from social images. Thirdly, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into one single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and our experimental results demonstrated that our proposed methods can significantly outperform state-of-the-art approaches.


Author(s):  
Wenchao Du ◽  
Hu Chen ◽  
Hongyu Yang ◽  
Yi Zhang

Generative adversarial networks (GANs) have been applied to low-dose CT images to predict normal-dose CT images. However, undesired artifacts and details bring uncertainty to the clinical diagnosis. In order to improve the visual quality while suppressing the noise, in this paper, we mainly studied the two key components of deep learning based low-dose CT (LDCT) restoration models, namely network architecture and adversarial loss, and proposed a disentangled noise suppression method based on GAN (DNSGAN) for LDCT. Specifically, a generator network, which contains the noise suppression and structure recovery modules, is proposed. Furthermore, a multi-scaled relativistic adversarial loss is introduced to preserve the finer structures of generated images. Experiments on simulated and real LDCT datasets show that the proposed method can effectively remove noise while recovering finer details and provide better visual perception than other state-of-the-art methods.

