Consistent Video Style Transfer via Compound Regularization

2020 ◽  
Vol 34 (07) ◽  
pp. 12233-12240
Author(s):  
Wenjing Wang ◽  
Jizheng Xu ◽  
Li Zhang ◽  
Yue Wang ◽  
Jiaying Liu

Recently, neural style transfer has drawn much attention and significant progress has been made, especially for image style transfer. However, flexible and consistent style transfer for videos remains a challenging problem. Existing training strategies, which either use a significant amount of video data with optical flow or introduce single-frame regularizers, have limited performance on real videos. In this paper, we propose a novel interpretation of temporal consistency, based on which we analyze the drawbacks of existing training strategies and then derive a new compound regularization. Experimental results show that the proposed regularization better balances spatial and temporal performance, which supports our modeling. Combining it with a new cost formula, we design a zero-shot video style transfer framework. Moreover, for better feature migration, we introduce a new module that dynamically adjusts inter-channel distributions. Quantitative and qualitative results demonstrate the superiority of our method over other state-of-the-art style transfer methods. Our project is publicly available at: https://daooshee.github.io/CompoundVST/.
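
The paper's exact compound regularization is defined in the article itself; as a loose illustration of how a zero-shot, single-frame temporal regularizer can be expressed in practice, the PyTorch sketch below penalizes a stylizer whose output changes under small appearance and motion perturbations of the same frame. The `stylizer` network, the perturbations, and the weights are assumptions for illustration only, not the authors' formulation.

```python
# Minimal sketch of a single-frame temporal regularizer (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def consistency_regularizer(stylizer, frame, noise_std=0.02, shift=4, w_noise=1.0, w_shift=1.0):
    """frame: (B, 3, H, W) tensor in [0, 1]; stylizer: any image-to-image network."""
    out = stylizer(frame)

    # Term 1: robustness to small appearance changes (sensor noise, compression).
    noisy = (frame + noise_std * torch.randn_like(frame)).clamp(0, 1)
    loss_noise = F.mse_loss(stylizer(noisy), out)

    # Term 2: equivariance to small motion, approximated here by a rigid shift.
    shifted_in = torch.roll(frame, shifts=(shift, shift), dims=(2, 3))
    shifted_out = torch.roll(out, shifts=(shift, shift), dims=(2, 3))
    loss_shift = F.mse_loss(stylizer(shifted_in), shifted_out)

    return w_noise * loss_noise + w_shift * loss_shift
```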

Author(s):  
Hao Liang ◽  
Shuai Yang ◽  
Wenjing Wang ◽  
Jiaying Liu

Recent research has made remarkable achievements in fast video style transfer based on Western paintings. However, due to the inherently different drawing techniques and aesthetic expressions of Chinese ink wash painting, existing methods either achieve poor temporal consistency or fail to transfer the key freehand brushstroke characteristics of Chinese ink wash painting. In this paper, we present a novel video style transfer framework for Chinese ink wash paintings. The two key ideas are multi-frame fusion for temporal coherence and instance-aware style transfer. Frame reordering and stylization based on reference-frame fusion are proposed to improve temporal consistency. Meanwhile, by leveraging instance segmentation, the proposed method adaptively leaves white spaces in the background and selects proper scales to extract features and depict the foreground subject. Experimental results demonstrate the superiority of the proposed method over state-of-the-art style transfer methods in terms of both temporal coherence and visual quality. Our project website is available at https://oblivioussy.github.io/InkVideo/.
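
As a toy illustration (not the authors' implementation) of using instance segmentation to leave white space in the background, the sketch below composites a stylized foreground onto a blank "rice paper" canvas; the mask is assumed to come from an off-the-shelf instance segmentation model.

```python
import numpy as np

def compose_ink_frame(stylized_fg, instance_mask, paper_value=0.95):
    """Keep the stylized foreground where the mask is on; leave 'white space' elsewhere.
    stylized_fg: (H, W, 3) float array in [0, 1]; instance_mask: (H, W) bool array."""
    canvas = np.full_like(stylized_fg, paper_value)           # blank rice-paper background
    mask = instance_mask[..., None].astype(stylized_fg.dtype)
    return mask * stylized_fg + (1.0 - mask) * canvas
```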


BioResources ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. 5390-5406
Author(s):  
Yiming Fang ◽  
Xianxin Guo ◽  
Kun Chen ◽  
Zhu Zhou ◽  
Qing Ye

Knot detection is a challenging problem for the wood industry. Traditional methodologies depend heavily on manually selected features and are therefore not always accurate, given the variety of knot appearances. This paper proposes an automated framework that addresses this problem with the state-of-the-art YOLO-v5 (the fifth version of You Only Look Once) detector. The features of surface knots are learned and extracted adaptively, so knot defects are identified accurately even when the knots vary in color and texture. The proposed method was compared with YOLO-v3 SPP and Faster R-CNN on two datasets. Experimental results demonstrated that the YOLO-v5 model achieved the best performance for detecting surface knot defects: the F-score on Dataset 1 was 91.7%, and that on Dataset 2 reached 97.7%. Moreover, YOLO-v5 has clear advantages in training speed and the size of the weight file. These advantages make YOLO-v5 more suitable for detecting surface knots on sawn timber and promising for timber grading.
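
For readers who want a starting point, the snippet below loads a generic pretrained YOLO-v5 model through the public Ultralytics torch.hub interface and runs it on a board image. The authors' detector was instead fine-tuned on their knot datasets (weights not reproduced here), and the image path is hypothetical.

```python
import torch

# Generic pretrained YOLO-v5 model from the Ultralytics hub (not the knot-trained weights).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

results = model('sawn_timber_surface.jpg')    # hypothetical board image
detections = results.pandas().xyxy[0]         # columns: xmin, ymin, xmax, ymax, confidence, class, name
print(detections[detections['confidence'] > 0.5])
```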


2021 ◽  
Vol 15 (5) ◽  
pp. 1-21
Author(s):  
Seyed-Vahid Sanei-Mehri ◽  
Apurba Das ◽  
Hooman Hashemi ◽  
Srikanta Tirthapura

Quasi-cliques are dense incomplete subgraphs of a graph that generalize the notion of cliques. Enumerating quasi-cliques from a graph is a robust way to detect densely connected structures with applications in bioinformatics and social network analysis. However, enumerating quasi-cliques in a graph is a challenging problem, even harder than the problem of enumerating cliques. We consider the enumeration of top-k degree-based quasi-cliques and make the following contributions: (1) we show that even the problem of detecting whether a given quasi-clique is maximal (i.e., not contained within another quasi-clique) is NP-hard; (2) we present a novel heuristic algorithm, KernelQC, to enumerate the k largest quasi-cliques in a graph. Our method is based on identifying kernels of extremely dense subgraphs within a graph, followed by growing subgraphs around these kernels, to arrive at quasi-cliques with the required densities; (3) experimental results show that our algorithm accurately enumerates quasi-cliques from a graph, is much faster than current state-of-the-art methods for quasi-clique enumeration (often more than three orders of magnitude faster), and can scale to larger graphs than current methods.
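
A degree-based γ-quasi-clique is commonly defined as a vertex set S in which every vertex is adjacent to at least γ(|S| − 1) of the other vertices of S. The short check below (using networkx, independent of the paper's KernelQC implementation) makes that definition concrete.

```python
import networkx as nx

def is_degree_based_quasi_clique(G, nodes, gamma=0.9):
    """Check whether the induced subgraph is a gamma-quasi-clique:
    every vertex must be adjacent to at least gamma * (|S| - 1) others in S."""
    S = G.subgraph(nodes)
    threshold = gamma * (len(S) - 1)
    return all(deg >= threshold for _, deg in S.degree())

# Toy usage: a 5-clique with one edge removed is still a 0.7-quasi-clique.
G = nx.complete_graph(5)
G.remove_edge(0, 1)
print(is_degree_based_quasi_clique(G, range(5), gamma=0.7))  # True
```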


2020 ◽  
Vol 34 (05) ◽  
pp. 8042-8049
Author(s):  
Tomoyuki Kajiwara ◽  
Biwa Miura ◽  
Yuki Arase

We tackle the low-resource problem in style transfer by employing transfer learning that utilizes abundantly available raw corpora. Our method consists of two steps: pre-training learns to generate a sentence that is semantically equivalent to the input with assured grammaticality, and fine-tuning learns to add the desired style. Pre-training has two options: auto-encoding and machine-translation-based methods. Pre-training based on an autoencoder is a simple way to learn from a raw corpus. If machine translators are available, the model can learn more diverse paraphrasing via roundtrip translation. After these steps, fine-tuning achieves high-quality paraphrase generation even when only 1k sentence pairs of the parallel corpus for style transfer are available. Experimental results on formality style transfer indicate the effectiveness of both pre-training methods, and the method based on roundtrip translation achieves state-of-the-art performance.
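
As a rough sketch of the two pre-training options (not the authors' code), the helpers below show the kind of corrupted input a denoising autoencoder would learn to reconstruct, and how a paraphrase pair can be obtained by roundtrip translation. The translator functions are assumed to be off-the-shelf MT systems supplied by the caller.

```python
import random

def corrupt(tokens, drop_prob=0.1, shuffle_window=3):
    """Noising for denoising-autoencoder pre-training: the model is trained to
    reconstruct the original sentence from this corrupted input."""
    kept = [t for t in tokens if random.random() > drop_prob]
    # Light local shuffling: each token may drift a few positions.
    order = sorted(range(len(kept)), key=lambda i: i + random.uniform(0, shuffle_window))
    return [kept[i] for i in order]

def roundtrip_paraphrase(sentence, translate_en_x, translate_x_en):
    """Roundtrip-translation pre-training data: translate out and back to obtain a
    (sentence, paraphrase) pair. The translators are assumed external MT systems."""
    return translate_x_en(translate_en_x(sentence))
```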


Author(s):  
Jing Song ◽  
Hong Shen ◽  
Zijing Ou ◽  
Junyi Zhang ◽  
Teng Xiao ◽  
...  

Session-based recommendation is a challenging problem due to the inherent uncertainty of user behavior and the limited historical click information. Latent factors and the complex dependencies within the user's current session have an important impact on the user's main intention, but existing methods do not explicitly consider this point. In this paper, we propose a novel model, the Interest Shift and Latent Factors Combination Model (ISLF), which captures the user's main intention by taking into account the user's interest shift (i.e., long-term and short-term interest) and latent factors simultaneously. In addition, we give an explicit experimental explanation of this combination in ISLF. Our experimental results on three benchmark datasets show that our model achieves state-of-the-art performance on all test datasets.
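
The abstract does not detail the architecture, but the gating module below illustrates, at a sketch level, one standard way to combine a long-term and a short-term interest representation; it stands in for the combination idea rather than reproducing ISLF.

```python
import torch
import torch.nn as nn

class InterestCombiner(nn.Module):
    """Illustrative gate mixing long-term and short-term session representations."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, long_term, short_term):
        # alpha decides, per dimension, how much of each interest signal to keep.
        alpha = torch.sigmoid(self.gate(torch.cat([long_term, short_term], dim=-1)))
        return alpha * long_term + (1 - alpha) * short_term
```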


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 615
Author(s):  
Yuanbin Fu ◽  
Jiayi Ma ◽  
Xiaojie Guo

In the context of social media, large numbers of headshot photos are taken every day. Unfortunately, in addition to laborious editing and modification, creating a visually compelling photographic masterpiece for sharing requires advanced professional skills, which are difficult for ordinary Internet users. Although many algorithms automatically and globally transfer the style from one image to another, they fail to respect the semantics of the scene and do not allow users to transfer the attributes of only one or two facial organs in the foreground region while leaving the background region unchanged. To overcome this problem, we developed a novel framework for semantically meaningful local face attribute transfer, which can flexibly transfer the local attribute of a facial organ from the reference image to a semantically equivalent organ in the input image while preserving the background. Our method warps the reference photo to match the shape, pose, location, and expression of the input image. The fusion of the warped reference image and the input image is then taken as the initialization for a neural style transfer algorithm. Our method achieves better performance in terms of inception score (3.81) and Fréchet inception distance (80.31), about 10% better than those of competitors, indicating that our framework is capable of producing high-quality and photorealistic attribute transfer results. Both theoretical findings and experimental results are provided to demonstrate the efficacy of the proposed framework and reveal its superiority over other state-of-the-art alternatives.
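
A minimal sketch of the fusion step described above, assuming the reference has already been warped to the input and that a soft mask of the target organ is available (e.g., from a face parser); this is illustrative only and not the authors' code.

```python
import numpy as np

def fuse_for_initialization(input_img, warped_ref, organ_mask):
    """Paste the warped reference organ onto the input to initialize style transfer.
    input_img, warped_ref: (H, W, 3) float arrays in [0, 1];
    organ_mask: (H, W) float array in [0, 1] (soft if feathered)."""
    m = organ_mask[..., None]
    return m * warped_ref + (1.0 - m) * input_img
```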


2020 ◽  
Vol 2020 (4) ◽  
pp. 116-1-116-7
Author(s):  
Raphael Antonius Frick ◽  
Sascha Zmudzinski ◽  
Martin Steinebach

In recent years, the number of forged videos circulating on the Internet has increased immensely. Software and services to create such forgeries have become more and more accessible to the public, and with them the risk of malicious use of forged videos has risen. This work proposes an approach based on the ghost effect known from image forensics for detecting forgeries in videos, such as forgeries that replace faces in video sequences or change a face's expression. The experimental results show that the proposed approach is able to identify forgery in high-quality encoded video content.
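
In image forensics, the ghost effect usually refers to the JPEG ghost: recompressing an image at a range of qualities and looking for regions whose difference dips at a quality different from the rest of the frame. The sketch below computes such a difference map for one extracted frame; the frame path is hypothetical, and the paper's pipeline for compressed video goes beyond this single-image step.

```python
import io
import numpy as np
from PIL import Image

def jpeg_ghost_map(img, quality):
    """Squared difference between a frame and its recompression at the given JPEG
    quality; regions already compressed near that quality show a dip (the 'ghost')."""
    buf = io.BytesIO()
    img.save(buf, format='JPEG', quality=quality)
    buf.seek(0)
    recompressed = Image.open(buf).convert('RGB')
    a = np.asarray(img, dtype=np.float32)
    b = np.asarray(recompressed, dtype=np.float32)
    return ((a - b) ** 2).mean(axis=2)

frame = Image.open('suspect_frame.png').convert('RGB')   # hypothetical extracted video frame
ghost_maps = {q: jpeg_ghost_map(frame, q) for q in range(50, 100, 5)}
```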


2020 ◽  
Vol 8 (1) ◽  
pp. 33-41
Author(s):  
Dr. S. Sarika

Phishing is a malicious and deliberate act of sending counterfeit messages or mimicking a webpage. The goal is either to steal sensitive credentials such as login information and credit card details or to install malware on a victim's machine. Browser-based cyber threats have become one of the biggest concerns in networked architectures. The most prolific form of browser attack is tabnabbing, which happens in inactive browser tabs. In a tabnabbing attack, a fake page disguises itself as a genuine page to steal data. This paper presents a multi-agent-based tabnabbing detection technique. The method detects heuristic changes in a webpage when a tabnabbing attack happens and gives a warning to the user. Experimental results show that the method performs better than state-of-the-art tabnabbing detection techniques.
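
As a toy illustration of the heuristic-change idea (not the paper's multi-agent system), the helper below fingerprints the cues a tabnabbing page typically rewrites while the tab is inactive, so that a change between the moment a tab loses focus and the moment it regains focus can trigger a warning.

```python
import hashlib

def page_fingerprint(title, favicon_url, visible_text):
    """Fingerprint of the cues a tabnabbing attack typically rewrites in an inactive tab."""
    blob = '\x1f'.join([title, favicon_url, visible_text]).encode('utf-8')
    return hashlib.sha256(blob).hexdigest()

def looks_like_tabnabbing(fingerprint_on_blur, fingerprint_on_focus):
    """Warn if the page changed while the user was away from the tab."""
    return fingerprint_on_blur != fingerprint_on_focus
```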


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 325
Author(s):  
Zhihao Wu ◽  
Baopeng Zhang ◽  
Tianchen Zhou ◽  
Yan Li ◽  
Jianping Fan

In this paper, we developed a practical approach for the automatic detection of discrimination actions from social images. Firstly, an image set is established in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Secondly, a practical approach is developed to automatically detect and identify discrimination actions and relationships in social images. Thirdly, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into a single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and the experimental results demonstrate that it significantly outperforms them.
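
CVTransE++ builds on visual translation embedding; for readers unfamiliar with the idea, the scoring function below shows the classic TransE-style formulation, in which a relation is modeled as a translation between subject and object embeddings. It is illustrative background only, not the paper's network.

```python
import torch

def translation_embedding_score(subject, predicate, obj):
    """TransE-style score: a (subject, predicate, object) triple is plausible when
    subject + predicate lies close to object in the embedding space."""
    return -torch.norm(subject + predicate - obj, p=2, dim=-1)
```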


Author(s):  
My Kieu ◽  
Andrew D. Bagdanov ◽  
Marco Bertini

Pedestrian detection is a canonical problem for safety and security applications, and it remains challenging due to the highly variable lighting conditions in which pedestrians must be detected. This article investigates several domain adaptation approaches for adapting RGB-trained detectors to the thermal domain. Building on our earlier work on domain adaptation for privacy-preserving pedestrian detection, we conduct an extensive experimental evaluation comparing top-down and bottom-up domain adaptation, and we propose two new bottom-up domain adaptation strategies. For top-down domain adaptation, we leverage a detector pre-trained on RGB imagery and efficiently adapt it to perform pedestrian detection in the thermal domain. Our bottom-up domain adaptation approaches involve two steps: first, an adapter segment corresponding to the initial layers of the RGB-trained detector is trained to adapt to the new input distribution; then, the adapter segment is reconnected to the original RGB-trained detector for final adaptation with a top-down loss. To the best of our knowledge, our bottom-up domain adaptation approaches outperform the best-performing single-modality pedestrian detection results on KAIST and outperform the state of the art on FLIR.
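
A minimal sketch of the bottom-up idea, assuming a detector whose backbone is a sequential stack of blocks (an assumption made only for illustration): freeze the whole RGB-trained detector, then leave the first few blocks trainable as the adapter segment before the later top-down fine-tuning pass.

```python
def prepare_bottom_up_adaptation(detector, adapter_depth=2):
    """Freeze an RGB-trained detector and leave only its first few backbone blocks
    (the 'adapter segment') trainable, so they can adapt to thermal input statistics.
    Assumes detector.backbone is an iterable of blocks (illustrative structure)."""
    for p in detector.parameters():
        p.requires_grad = False
    for block in list(detector.backbone)[:adapter_depth]:
        for p in block.parameters():
            p.requires_grad = True
    return detector
```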

