Learning to Hallucinate Face Images via Component Generation and Enhancement

Author(s):  
Yibing Song ◽  
Jiawei Zhang ◽  
Shengfeng He ◽  
Linchao Bao ◽  
Qingxiong Yang

We propose a two-stage method for face hallucination. First, we generate facial components of the input image using CNNs. These components represent the basic facial structures. Second, we synthesize fine-grained facial structures from high-resolution training images, and the details of these structures are transferred into the facial components for enhancement. Therefore, we generate facial components to approximate the ground-truth global appearance in the first stage and enhance them by recovering details in the second stage. The experiments demonstrate that our method performs favorably against state-of-the-art methods.
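A minimal sketch of the two-stage idea in PyTorch, as we read the abstract (not the authors' implementation; the network layout, the pixel-space patch matching, and the 50/50 blend are all assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ComponentGenerator(nn.Module):
    """Stage 1: regress coarse facial components from the LR input."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, lr_face):
        # Upsample first, then refine: a common face-SR baseline layout.
        up = F.interpolate(lr_face, scale_factor=4, mode="bilinear",
                           align_corners=False)
        return self.net(up)

def enhance_with_hr_details(components, hr_patch_bank, k=1):
    """Stage 2 (sketch): blend in the nearest high-resolution training
    patches from a hypothetical patch bank to recover fine details."""
    flat = components.reshape(components.shape[0], -1)
    bank = hr_patch_bank.reshape(hr_patch_bank.shape[0], -1)
    # Nearest neighbours in pixel space; a real system would match features.
    idx = torch.cdist(flat, bank).topk(k, largest=False).indices
    details = hr_patch_bank[idx].mean(dim=1)
    return 0.5 * components + 0.5 * details

lr = torch.randn(1, 3, 32, 32)
hr_bank = torch.randn(100, 3, 128, 128)     # toy HR training patches
out = enhance_with_hr_details(ComponentGenerator()(lr), hr_bank)
print(out.shape)                            # torch.Size([1, 3, 128, 128])
```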

2020 ◽  
Vol 34 (05) ◽  
pp. 8600-8607
Author(s):  
Haiyun Peng ◽  
Lu Xu ◽  
Lidong Bing ◽  
Fei Huang ◽  
Wei Lu ◽  
...  

Target-based sentiment analysis or aspect-based sentiment analysis (ABSA) refers to addressing various sentiment analysis tasks at a fine-grained level, including but not limited to aspect extraction, aspect sentiment classification, and opinion extraction. There exist many solvers for the above individual subtasks or for combinations of two subtasks, and they can work together to tell a complete story, i.e., the discussed aspect, the sentiment on it, and the cause of the sentiment. However, no previous ABSA research has tried to provide a complete solution in one shot. In this paper, we introduce a new subtask under ABSA, named aspect sentiment triplet extraction (ASTE). In particular, a solver of this task needs to extract triplets (What, How, Why) from the inputs, which show WHAT the targeted aspects are, HOW the sentiment toward each is polarized, and WHY they have such polarities (i.e., opinion reasons). For instance, one triplet from “Waiters are very friendly and the pasta is simply average” could be (‘Waiters’, positive, ‘friendly’). We propose a two-stage framework to address this task. The first stage predicts what, how and why in a unified model, and the second stage pairs up the predicted what (how) and why from the first stage to output triplets. In the experiments, our framework sets a benchmark performance for this novel triplet extraction task. Meanwhile, it outperforms a few strong baselines adapted from state-of-the-art related methods.
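A toy sketch of the second-stage pairing (not the paper's model: stage 1 is assumed to have already produced aspect spans with polarities and opinion spans, and the proximity scorer below is a stand-in for the learned pairing):

```python
def pair_triplets(aspects, opinions, score_fn):
    """aspects: [(span, polarity)], opinions: [span] -> [(what, how, why)]."""
    triplets = []
    for aspect, polarity in aspects:
        # Pair each aspect with its best-scoring opinion span.
        why = max(opinions, key=lambda o: score_fn(aspect, o))
        triplets.append((aspect, polarity, why))
    return triplets

# Stand-in scorer: closeness in the sentence; the paper learns this pairing.
TEXT = "Waiters are very friendly and the pasta is simply average"

def proximity_score(aspect, opinion):
    return -abs(TEXT.find(aspect) - TEXT.find(opinion))

aspects = [("Waiters", "positive"), ("pasta", "neutral")]  # toy stage-1 output
opinions = ["friendly", "average"]
print(pair_triplets(aspects, opinions, proximity_score))
# [('Waiters', 'positive', 'friendly'), ('pasta', 'neutral', 'average')]
```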


2021 ◽  
Author(s):  
Samir Zamarialai ◽  
Thijs Perenboom ◽  
Amanda Kruijver ◽  
Zenglin Shi ◽  
Bernard Foing

Remote sensing (RS) imagery, generated by e.g. cameras on satellites, airplanes, and drones, has been used for a variety of applications such as environmental monitoring, crater detection, and monitoring temporal changes on planetary surfaces.

In recent years, researchers have started applying computer vision methods to RS data. This has led to steady development in remote sensing classification, with good results on classification and segmentation tasks. However, current approaches still have problems. Firstly, the main focus is on high-resolution RS imagery. Apart from the fact that these data are not accessible to everyone, the models fail to generalize to lower-resolution data. Secondly, the models fail to generalize to more fine-grained classes. For example, models tend to perform very well at detecting buildings in general, but they fail to distinguish whether a building belongs to a fine-grained subclass such as residential or commercial buildings. Fine-grained classes often appear very similar to each other, so models have difficulty distinguishing between them. This problem occurs in both high-resolution and low-resolution RS imagery, but the drop in accuracy is much more significant with lower-resolution data.

For these reasons, we propose a multi-task convolutional neural network (CNN) with three objective functions for segmentation of RS imagery. This model should generalize across different resolutions and achieve better accuracy than state-of-the-art approaches, especially on fine-grained classes.

The model consists of two main components. The first component is a CNN that transforms the input image into a segmentation map. This module is optimized with a pixel-wise cross-entropy loss between the segmentation map and the ground-truth annotations. If the input image is of lower resolution, this segmentation map will miss part of the fine structure of the input. The second component is another CNN that builds a high-resolution image from the low-resolution input in order to reconstruct fine-grained structure information; it essentially guides the model to learn more fine-grained feature representations. The transformed image from this module will have much more detail, such as sharper edges and better color. This second module is optimized with a mean-squared-error loss between the original high-resolution image and the transformed image. Finally, the two outputs of the model are evaluated by a third objective function that measures the similarity between the segmentation of the input image and the segmentation of the super-resolved image. The final objective is the sum of the three objectives above. After training, the second module can be detached, meaning high-resolution imagery is only needed during the training phase.

At the moment we are implementing the model. Afterwards, we will benchmark it against current state-of-the-art approaches. The status will be presented at EGU 2021.
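A hedged sketch of the three-term objective described above (our reading of the abstract, not the authors' code; seg_net and sr_net stand for the two CNN modules, and using KL divergence for the similarity term is our choice):

```python
import torch.nn.functional as F

def multitask_loss(seg_net, sr_net, lr_img, hr_img, labels,
                   w_seg=1.0, w_sr=1.0, w_sim=0.1):
    # Term 1: pixel-wise cross-entropy between the segmentation of the
    # LR input and the ground-truth annotations.
    seg_lr = seg_net(lr_img)               # (B, n_classes, H, W) logits
    loss_seg = F.cross_entropy(seg_lr, labels)

    # Term 2: MSE between the super-resolved image and the original HR image.
    sr = sr_net(lr_img)
    loss_sr = F.mse_loss(sr, hr_img)

    # Term 3: agreement between the segmentation of the LR input and the
    # segmentation of the (downsampled) super-resolved image.
    sr_small = F.interpolate(sr, size=lr_img.shape[-2:], mode="bilinear",
                             align_corners=False)
    seg_sr = seg_net(sr_small)
    loss_sim = F.kl_div(F.log_softmax(seg_sr, dim=1),
                        F.softmax(seg_lr, dim=1), reduction="batchmean")

    # Final objective: sum of the three terms.
    return w_seg * loss_seg + w_sr * loss_sr + w_sim * loss_sim
```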


Author(s):  
Guoqing Zhang ◽  
Yuhao Chen ◽  
Weisi Lin ◽  
Arun Chandran ◽  
Xuan Jing

As a prevailing task in the video surveillance and forensics field, person re-identification (re-ID) aims to match person images captured by non-overlapping cameras. In unconstrained scenarios, person images often suffer from the resolution mismatch problem, i.e., Cross-Resolution Person Re-ID. To overcome this problem, most existing methods restore low-resolution (LR) images to high resolution (HR) by super-resolution (SR). However, they focus only on HR feature extraction and ignore the valid information in the original LR images. In this work, we explore the influence of resolution on feature extraction and develop a novel method for cross-resolution person re-ID called Multi-Resolution Representations Joint Learning (MRJL). Our method consists of a Resolution Reconstruction Network (RRN) and a Dual Feature Fusion Network (DFFN). The RRN uses an input image to construct an HR version and an LR version with an encoder and two decoders, while the DFFN adopts a dual-branch structure to generate person representations from multi-resolution images. Comprehensive experiments on five benchmarks verify the superiority of the proposed MRJL over the relevant state-of-the-art methods.
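A structural sketch of the MRJL layout as we read the abstract (layer sizes, fusion by concatenation, and the normalized descriptor are assumptions, not the authors' design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RRN(nn.Module):
    """One encoder, two decoders: reconstruct an HR and an LR version."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
        self.dec_hr = nn.Sequential(nn.Upsample(scale_factor=2, mode="bilinear"),
                                    nn.Conv2d(64, 3, 3, padding=1))
        self.dec_lr = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, x):
        z = self.encoder(x)
        return self.dec_hr(z), self.dec_lr(z)

class DFFN(nn.Module):
    """Dual branches over the two resolutions, fused into one representation."""
    def __init__(self, dim=128):
        super().__init__()
        self.branch_hr = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1),
                                       nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.branch_lr = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1),
                                       nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(64, dim)

    def forward(self, hr, lr):
        f = torch.cat([self.branch_hr(hr).flatten(1),
                       self.branch_lr(lr).flatten(1)], dim=1)
        return F.normalize(self.fc(f), dim=1)   # person descriptor

rrn, dffn = RRN(), DFFN()
hr, lr = rrn(torch.randn(2, 3, 64, 32))
print(dffn(hr, lr).shape)                       # torch.Size([2, 128])
```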


2020 ◽  
Vol 10 (2) ◽  
pp. 718 ◽  
Author(s):  
K. Lakshminarayanan ◽  
R. Santhana Krishnan ◽  
E. Golden Julie ◽  
Y. Harold Robinson ◽  
Raghvendra Kumar ◽  
...  

This paper proposes and verifies a new integrated approach to face hallucination, the process of converting a low-resolution face image to a high-resolution one, based on an iterative super-resolution algorithm and expectation maximization. The current sparse representation for super-resolving generic image patches is not suitable for global face images due to its lower accuracy and high time consumption. To solve this, the new method trains a global face sparse representation to reconstruct images with misalignment variations after applying the local geometric co-occurrence matrix. In the testing phase, we propose a hybrid method that combines the global sparse representation and local linear regression using the Expectation-Maximization (EM) algorithm, thereby recovering the high-resolution image corresponding to a low-resolution input. Experimental validation suggests improvement in the overall accuracy of the proposed method, with fast identification of high-resolution face images without misalignment.
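A very loose sketch of the hybrid idea (our abstraction, not the paper's algorithm): alternate, EM-style, between weighting two candidate HR reconstructions (a global sparse estimate and a local linear-regression estimate) under a Gaussian observation model, then blend:

```python
import numpy as np

def em_blend(global_est, local_est, lr_obs, downsample, n_iters=10, sigma=1.0):
    """global_est, local_est: candidate HR images; lr_obs: observed LR image;
    downsample: function mapping an HR image into LR space."""
    pi = 0.5  # prior weight on the global estimate
    for _ in range(n_iters):
        # E-step: responsibility of each candidate given the LR observation.
        e_g = np.exp(-np.sum((downsample(global_est) - lr_obs) ** 2) / (2 * sigma**2))
        e_l = np.exp(-np.sum((downsample(local_est) - lr_obs) ** 2) / (2 * sigma**2))
        r = pi * e_g / (pi * e_g + (1 - pi) * e_l + 1e-12)
        # M-step: update the mixing weight.
        pi = r
    return pi * global_est + (1 - pi) * local_est

# Toy usage with 2x block averaging as the degradation model (an assumption).
down = lambda x: x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))
hr_g, hr_l = np.random.rand(8, 8), np.random.rand(8, 8)
print(em_blend(hr_g, hr_l, down(hr_g), down).shape)  # (8, 8)
```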


Sensors ◽  
2020 ◽  
Vol 20 (14) ◽  
pp. 3908
Author(s):  
Kuo-Liang Chung ◽  
Tzu-Hsien Chan ◽  
Szu-Ni Chen

As color filter array (CFA) 2.0, the RGBW CFA pattern, in which each CFA pixel contains only one R, G, B, or W color value, provides more luminance information than the Bayer CFA pattern. Demosaicking RGBW CFA images I_RGBW is necessary in order to produce high-quality RGB full-color images as target images for human perception. In this letter, we propose a three-stage demosaicking method for I_RGBW. In the first stage, a cross-shape-based color difference approach interpolates the missing W color pixels in the W color plane of I_RGBW. In the second stage, an iterative error-compensation-based demosaicking process improves the quality of the demosaiced RGB full-color image. In the third stage, taking the input image I_RGBW as the ground-truth RGBW CFA image, an I_RGBW-based refinement process refines the quality of the demosaiced image obtained in the second stage. On testing RGBW images collected from the Kodak and IMAX datasets, comprehensive experimental results illustrate that the proposed three-stage demosaicking method achieves substantial quality and perceptual improvement relative to the previous method by Hamilton and Compton and two state-of-the-art methods: Kwan et al.’s pansharpening-based method and Kwan and Chou’s deep-learning-based method.
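A simplified sketch of the stage-1 idea, cross-shaped interpolation of the W plane (we assume a checkerboard W layout for the toy example; the real CFA layout and the color-difference correction in the paper are more involved):

```python
import numpy as np

def interpolate_w_plane(w_plane, w_mask):
    """w_plane: raw W samples (0 where missing); w_mask: True where W is known.
    Fills each missing W with the mean of its known cross (N/S/E/W) neighbours."""
    padded = np.pad(w_plane, 1)
    pmask = np.pad(w_mask.astype(float), 1)
    # Sum of cross-neighbour values and counts at every pixel.
    nsum = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
            padded[1:-1, :-2] + padded[1:-1, 2:])
    ncnt = (pmask[:-2, 1:-1] + pmask[2:, 1:-1] +
            pmask[1:-1, :-2] + pmask[1:-1, 2:])
    est = nsum / np.maximum(ncnt, 1)
    return np.where(w_mask, w_plane, est)

# Toy checkerboard W layout.
w_mask = np.indices((6, 6)).sum(axis=0) % 2 == 0
w_plane = np.random.rand(6, 6) * w_mask
full_w = interpolate_w_plane(w_plane, w_mask)
```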


2020 ◽  
Vol 861 ◽  
pp. 327-333
Author(s):  
Teow Hsien Loong ◽  
Se Yong Eh Noum ◽  
Wong Wai Mun

It is estimated that 130 million people will suffer from osteoarthritis by 2050, requiring patients to undergo a surgical procedure known as total hip replacement, which has a lifespan of 20 years and a failure rate of ~1%. This research highlights the effects of doping niobium oxide (Nb2O5), from 0 vol% to 0.8 vol%, into zirconia-toughened alumina (ZTA) composites, the main biomaterial used to manufacture total hip arthroplasty implants. The samples were sintered using two-stage sintering (TSS) with first-stage sintering temperatures between 1400°C and 1550°C at a heating rate of 20°C/min. In the second stage, the samples were sintered at 1350°C and held for 12 hours. It was found that TSS combined with Nb2O5 doping was beneficial in producing fine-grained ZTA composites with improved mechanical properties compared to undoped ZTA composites produced via TSS. Compared to undoped ZTA composites, samples doped with Nb2O5 and sintered at T1 ≥ 1400°C were fully densified (>98%), achieved a Vickers hardness of more than 20 GPa and a Young’s modulus higher than 410 GPa, and at the same time a fracture toughness of more than 8 MPa·m^1/2. Based on these findings, the production of ZTA composites with enhanced mechanical properties and longer lifespans is possible, which is beneficial in ensuring the well-being of osteoarthritis patients.


Author(s):  
Xiangteng He ◽  
Yuxin Peng ◽  
Junjie Zhao

Fine-grained visual categorization (FGVC) is the discrimination of similar subcategories, whose main challenge is to localize the quite subtle visual distinctions between similar subcategories. There are two pivotal problems: discovering which region is discriminative and representative, and determining how many discriminative regions are necessary to achieve the best performance. Existing methods generally solve these two problems relying on prior knowledge or experimental validation, which severely restricts the usability and scalability of FGVC. To address the "which" and "how many" problems adaptively and intelligently, this paper proposes a stacked deep reinforcement learning approach (StackDRL). It adopts a two-stage learning architecture, which is driven by a semantic reward function. Two-stage learning localizes the object and its parts in sequence ("which"), and determines the number of discriminative regions adaptively ("how many"), which is quite appealing in FGVC. The semantic reward function drives StackDRL to fully learn the discriminative and conceptual visual information by jointly combining an attention-based reward and a category-based reward. Furthermore, unsupervised discriminative localization avoids the heavy labor cost of labeling, and greatly strengthens the usability and scalability of our StackDRL approach. Compared with ten state-of-the-art methods on the CUB-200-2011 dataset, our StackDRL approach achieves the best categorization accuracy.
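A sketch of a combined semantic reward in the spirit described above (the exact reward terms in the paper may differ; both components here are stand-ins):

```python
import torch
import torch.nn.functional as F

def semantic_reward(region_logits, label, region_attention, alpha=0.5):
    """region_logits: classifier logits for the selected region (B, n_classes);
    region_attention: mean attention mass captured by the region (B,)."""
    # Category-based reward: log-probability of the true class for the region.
    cat_r = F.log_softmax(region_logits, dim=1).gather(1, label[:, None]).squeeze(1)
    # Attention-based reward: how much saliency the region captures.
    att_r = region_attention
    return alpha * cat_r + (1 - alpha) * att_r

logits = torch.randn(4, 200)                    # toy region classifier output
labels = torch.randint(0, 200, (4,))
att = torch.rand(4)
print(semantic_reward(logits, labels, att).shape)  # torch.Size([4])
```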


Author(s):  
Hui Ying ◽  
Zhaojin Huang ◽  
Shu Liu ◽  
Tianjia Shao ◽  
Kun Zhou

Current instance segmentation methods can be categorized into segmentation-based methods and proposal-based methods. The former performs segmentation first and then does clustering, while the latter detects objects first and then predicts the mask for each object proposal. In this work, we propose a single-stage method, named EmbedMask, that unifies both methods by combining their advantages, so it can achieve good performance in instance segmentation and produce high-resolution masks at high speed. EmbedMask introduces two newly defined embeddings for mask prediction: pixel embedding and proposal embedding. During training, we enforce the pixel embedding to be close to its coupled proposal embedding if they belong to the same instance. During inference, pixels are assigned to the mask of a proposal if their embeddings are similar. This mechanism brings several benefits. First, the pixel-level clustering enables EmbedMask to generate high-resolution masks and avoids the complicated two-stage mask prediction. Second, the existence of proposal embedding simplifies and strengthens the clustering procedure, so our method can achieve high speed and better performance than segmentation-based methods. Without bells and whistles, EmbedMask outperforms the state-of-the-art instance segmentation method Mask R-CNN on the challenging COCO dataset, obtaining more detailed masks at a higher speed.
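A minimal sketch of EmbedMask-style inference as described above: assign each pixel to a proposal when their embeddings are close (the distance metric and margin here are assumptions):

```python
import torch

def assign_masks(pixel_emb, proposal_emb, margin=0.5):
    """pixel_emb: (D, H, W); proposal_emb: (N, D) -> masks: (N, H, W) bool."""
    d, h, w = pixel_emb.shape
    pixels = pixel_emb.reshape(d, -1).t()       # (H*W, D)
    dist = torch.cdist(proposal_emb, pixels)    # (N, H*W)
    # A pixel joins a proposal's mask if its embedding is within the margin.
    return (dist < margin).reshape(-1, h, w)

pixel_emb = torch.randn(8, 32, 32)
proposal_emb = torch.randn(3, 8)                # 3 toy object proposals
masks = assign_masks(pixel_emb, proposal_emb)
print(masks.shape)                              # torch.Size([3, 32, 32])
```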


Author(s):  
HUANXI LIU ◽  
TIANHONG ZHU

Face hallucination synthesizes a high-resolution face image from a low-resolution input. Although many two-step learning-based face hallucination approaches have been developed, they suffer from expensive computational cost due to the separate calculation of the global and local models. To overcome this problem, we propose a correlative two-step learning-based face hallucination approach that bridges the gap between the global model and the local model. In the global phase, we build a global face hallucination framework by combining steerable pyramid decomposition and reconstruction. In the residue compensation phase, based on the combination weights and constituent samples obtained in the global phase, a residue face image is synthesized by the neighbor reconstruction algorithm to compensate the hallucinated global face image with subtle facial features. The ultimate hallucinated result is synthesized by adding the residue face image to the global face image. Compared with existing methods, our global face image is more similar to the original high-resolution face image. Furthermore, in the residue compensation phase, we reuse the combination weights and constituent samples obtained in the global phase to compute the residue face image, which greatly improves computational efficiency without compromising the quality of facial details. The experimental results and comparisons demonstrate that our approach not only generates convincing high-resolution face images but also does so with high computational efficiency. Furthermore, our proposed approach can be used to restore damaged face images in image inpainting; its efficacy is validated by recovering damaged face images with visually good results.
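A schematic sketch of the correlative two-step idea (our simplification, not the paper's pipeline; the ridge-style weight solve stands in for the global reconstruction): the combination weights solved in the global phase are reused to synthesize the residue image, so the local model needs no second solve:

```python
import numpy as np

def hallucinate(lr_face, lr_train, hr_train, hr_residues, lam=1e-3):
    """lr_train: (N, d_lr) LR training samples; hr_train: (N, d_hr) HR samples;
    hr_residues: (N, d_hr) residues of the HR training set."""
    # Global phase: least-squares combination weights over training samples.
    A = lr_train @ lr_train.T + lam * np.eye(lr_train.shape[0])
    w = np.linalg.solve(A, lr_train @ lr_face)
    global_face = hr_train.T @ w
    # Residue phase: reuse the same weights and constituent samples.
    residue = hr_residues.T @ w
    return global_face + residue

lr_train = np.random.rand(50, 16 * 16)
hr_train = np.random.rand(50, 64 * 64)
hr_res = np.random.rand(50, 64 * 64) * 0.1
out = hallucinate(np.random.rand(16 * 16), lr_train, hr_train, hr_res)
print(out.shape)  # (4096,)
```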


Author(s):  
Minghuan Tan ◽  
Jing Jiang ◽  
Bing Tian Dai

In Chinese, Chengyu are fixed phrases consisting of four characters. As a type of idioms, their meanings usually cannot be derived from their component characters. In this article, we study the task of recommending a Chengyu given a textual context. Observing some of the limitations with existing work, we propose a two-stage model, where during the first stage we re-train a Chinese BERT model by masking out Chengyu from a large Chinese corpus with a wide coverage of Chengyu. During the second stage, we fine-tune the re-trained, Chengyu-oriented BERT on a specific Chengyu recommendation dataset. We evaluate this method on ChID and CCT datasets and find that it can achieve the state of the art on both datasets. Ablation studies show that both stages of training are critical for the performance gain.
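A hedged sketch of stage 1 as we read it: mask whole four-character Chengyu spans in the corpus before masked-language-model re-training. The tokenizer name below is the standard Hugging Face Chinese BERT checkpoint, not the authors' model:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

def mask_chengyu(text, chengyu_list):
    """Replace every known Chengyu occurrence with four [MASK] tokens
    (one per character, since bert-base-chinese tokenizes per character)."""
    span = " ".join([tokenizer.mask_token] * 4)
    for idiom in chengyu_list:
        text = text.replace(idiom, span)
    return tokenizer(text, return_tensors="pt")

batch = mask_chengyu("他做事一丝不苟，大家都很信任他。", ["一丝不苟"])
print(tokenizer.decode(batch["input_ids"][0]))
# ... [MASK] [MASK] [MASK] [MASK] ... (the Chengyu span is masked out)
```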

