GCM-Net: Towards Effective Global Context Modeling for Image Inpainting

2021 ◽  
Author(s):  
Huan Zheng ◽  
Zhao Zhang ◽  
Yang Wang ◽  
Zheng Zhang ◽  
Mingliang Xu ◽  
...  


2015 ◽  
Vol 12 (3) ◽  
pp. 961-977 ◽  
Author(s):  
Sinisa Neskovic ◽  
Rade Matic

This paper presents an approach to context modeling in complex self-adaptive systems consisting of many independent context-aware applications. The contextual information used for the adaptation of all system applications is described by an ontology treated as a global context model. A local context model, tailored to the specific needs of a particular application, is defined as a view over the global context in the form of a feature model. Feature models and their configurations, derived from the global context state, are then used by a dedicated dynamic software product line to adapt applications at runtime. The main focus of the paper is the realization of mappings between global and local contexts. The paper describes the overall model architecture and provides the corresponding metamodels as well as rules for mapping between feature models and ontologies.
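
To make the mapping idea concrete, the following is a minimal, self-contained Python sketch of how a local feature-model configuration could be derived from a global context state via condition-based mapping rules. The context attributes, feature names, and rule format are illustrative assumptions, not the paper's actual metamodels or mapping rules.

```python
# Hypothetical global context state, e.g. property values queried from the ontology.
global_context = {
    "user.location": "outdoor",
    "device.battery": 0.15,
    "network.bandwidth_mbps": 2.0,
}

# Mapping rules (assumed format): a feature of the local feature model is
# selected whenever its condition over the global context holds.
mapping_rules = {
    "LowPowerMode":  lambda ctx: ctx["device.battery"] < 0.2,
    "OfflineMaps":   lambda ctx: ctx["network.bandwidth_mbps"] < 5.0,
    "OutdoorLayout": lambda ctx: ctx["user.location"] == "outdoor",
    "HdStreaming":   lambda ctx: ctx["network.bandwidth_mbps"] >= 5.0,
}

def derive_configuration(ctx, rules):
    """Return the feature-model configuration (set of selected features)."""
    return {feature for feature, condition in rules.items() if condition(ctx)}

if __name__ == "__main__":
    config = derive_configuration(global_context, mapping_rules)
    print(sorted(config))  # features the dynamic SPL would bind at runtime
```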


Author(s):  
Wendong Zhang ◽  
Junwei Zhu ◽  
Ying Tai ◽  
Yunbo Wang ◽  
Wenqing Chu ◽  
...  

Recent advances in image inpainting have shown impressive results for generating plausible visual details on rather simple backgrounds. However, for complex scenes it is still challenging to restore reasonable contents, as the contextual information within the missing regions tends to be ambiguous. To tackle this problem, we introduce pretext tasks that are semantically meaningful for estimating the missing contents. In particular, we perform knowledge distillation on pretext models and adapt the features to image inpainting. The learned semantic priors ought to be partially invariant between the high-level pretext task and low-level image inpainting; they not only help to understand the global context but also provide structural guidance for the restoration of local textures. Based on the semantic priors, we further propose a context-aware image inpainting model, which adaptively integrates global semantics and local features in a unified image generator. The semantic learner and the image generator are trained in an end-to-end manner. We name the model SPL to highlight its ability to learn and leverage semantic priors. It achieves state-of-the-art results on the Places2, CelebA, and Paris StreetView datasets.
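
As a rough illustration of the distillation step, the PyTorch sketch below trains a small "semantic learner" to match the features of a frozen pretext encoder on the complete image while seeing only the masked input. All module names and sizes are assumptions; this is not the authors' SPL code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU())

class SmallEncoder(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.net = nn.Sequential(conv_block(cin, 32), conv_block(32, cout))
    def forward(self, x):
        return self.net(x)

# Frozen pretext model (stand-in for a high-level backbone, e.g. classification).
pretext = SmallEncoder(3, 64).eval()
for p in pretext.parameters():
    p.requires_grad_(False)

# Semantic learner: sees the masked image plus the mask channel.
semantic_learner = SmallEncoder(4, 64)

image = torch.rand(2, 3, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()   # 1 = known pixel
masked = torch.cat([image * mask, mask], dim=1)

with torch.no_grad():
    target_feat = pretext(image)                  # semantic prior from the pretext task
pred_feat = semantic_learner(masked)

# Feature-matching distillation loss; in practice this would be combined with
# the reconstruction/adversarial losses of the image generator.
distill_loss = F.mse_loss(pred_feat, target_feat)
distill_loss.backward()
```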


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6939
Author(s):  
Sheng Yuan ◽  
Yuting Chen ◽  
Huihui Huo ◽  
Li Zhu

Traffic scene construction and simulation have been a hot topic in the intelligent transportation systems community. In this paper, we propose a novel framework for the analysis and synthesis of traffic elements from road image sequences. The proposed framework is composed of three stages: traffic element detection, road scene inpainting, and road scene reconstruction. First, a new bidirectional single shot multi-box detector (BiSSD) with a global context attention mechanism is designed for traffic element detection. After the traffic elements are detected, an unsupervised CycleGAN is applied, with the aid of optical flow, to inpaint the occluded regions. High-quality inpainted images are then obtained by the proposed image inpainting algorithm. Finally, a traffic scene simulation method is developed by integrating the foreground and background elements of traffic scenes. Extensive experiments and comparisons demonstrate the effectiveness of the proposed framework.
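
The skeleton below illustrates the three-stage flow (detection, background inpainting, scene recomposition) with stand-in stubs; the detector and inpainter are placeholders, not BiSSD or the paper's CycleGAN-based inpainting.

```python
import numpy as np

def detect_traffic_elements(frame):
    """Stand-in detector: return a list of (x0, y0, x1, y1) boxes."""
    h, w = frame.shape[:2]
    return [(w // 4, h // 4, w // 2, h // 2)]      # one dummy detection

def inpaint_background(frame, mask):
    """Stand-in inpainter: fill masked pixels with the mean of known pixels."""
    filled = frame.copy()
    filled[mask] = frame[~mask].mean(axis=0)
    return filled

def simulate_scene(background, foreground_patches):
    """Recomposition: paste (possibly edited) foreground elements back."""
    scene = background.copy()
    for (x0, y0, x1, y1), patch in foreground_patches:
        scene[y0:y1, x0:x1] = patch
    return scene

frame = np.random.rand(240, 320, 3).astype(np.float32)
boxes = detect_traffic_elements(frame)             # stage 1: traffic elements

mask = np.zeros(frame.shape[:2], dtype=bool)
patches = []
for (x0, y0, x1, y1) in boxes:
    mask[y0:y1, x0:x1] = True
    patches.append(((x0, y0, x1, y1), frame[y0:y1, x0:x1].copy()))

background = inpaint_background(frame, mask)       # stage 2: occlusion-free road
scene = simulate_scene(background, patches)        # stage 3: rebuilt traffic scene
```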


Author(s):  
L. Mou ◽  
Y. Hua ◽  
P. Jin ◽  
X. X. Zhu

Abstract. The capability of globally modeling and reasoning about relations between image regions is crucial for complex scene understanding tasks such as semantic segmentation. Most current semantic segmentation methods fall back on deep convolutional neural networks (CNNs), whose convolutions with local receptive fields are typically inefficient at capturing long-range dependencies. Recent works on self-attention mechanisms and relational reasoning networks seek to address this issue by learning pairwise relations between every pair of entities and have shown promising results. However, such approaches incur heavy computational and memory overheads, which makes them infeasible for dense prediction tasks, particularly on large images such as aerial imagery. In this work, we propose an efficient method for global context modeling in which, at each position, a sparse set of features over the spatial domain, instead of all features, is adaptively sampled and aggregated. We further devise a highly efficient instantiation of the proposed method, namely learning RANdom walK samplIng aNd feature aGgregation (RANKING). The proposed module is lightweight and general and can be used in a plug-and-play fashion with existing fully convolutional network (FCN) frameworks. To evaluate RANKING-equipped networks, we conduct experiments on two aerial scene parsing datasets; the networks achieve competitive results at significantly lower computational and memory cost.
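
A minimal PyTorch sketch of the underlying idea, sparse adaptive context aggregation, is given below: each position predicts a few sampling offsets and weights, gathers features at only those locations, and aggregates them. This is an illustrative analogue under assumed names and sizes, not the paper's random-walk sampling (RANKING) module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseContextAggregation(nn.Module):
    def __init__(self, channels, num_samples=8):
        super().__init__()
        self.k = num_samples
        self.offset_pred = nn.Conv2d(channels, 2 * num_samples, 1)  # (dx, dy) per sample
        self.weight_pred = nn.Conv2d(channels, num_samples, 1)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Base sampling grid in normalized [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        base = torch.stack((gx, gy), dim=-1)                   # (h, w, 2)

        offsets = torch.tanh(self.offset_pred(x))              # (n, 2k, h, w)
        weights = torch.softmax(self.weight_pred(x), dim=1)    # (n, k, h, w)

        out = torch.zeros_like(x)
        for i in range(self.k):
            off = offsets[:, 2 * i:2 * i + 2].permute(0, 2, 3, 1)  # (n, h, w, 2)
            grid = (base.unsqueeze(0) + off).clamp(-1, 1)
            sampled = F.grid_sample(x, grid, align_corners=True)   # (n, c, h, w)
            out = out + weights[:, i:i + 1] * sampled
        return x + self.proj(out)                               # residual connection

# Plug-and-play usage on an FCN feature map.
feat = torch.rand(1, 64, 32, 32)
module = SparseContextAggregation(64, num_samples=8)
print(module(feat).shape)  # torch.Size([1, 64, 32, 32])
```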


Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2780
Author(s):  
Yue Tao ◽  
Zhiwei Jia ◽  
Runze Ma ◽  
Shugong Xu

Scene text recognition (STR) is an important bridge between images and text and has attracted abundant research attention. While convolutional neural networks (CNNs) have achieved remarkable progress in this task, most existing works need an extra context modeling module to help the CNN capture global dependencies, compensating for its inductive bias and strengthening the relationships between text features. Recently, the transformer has been proposed as a promising network for global context modeling via its self-attention mechanism, but one of its main shortcomings when applied to recognition is efficiency. We propose a 1-D split to address this complexity challenge and replace the CNN with a transformer encoder, reducing the need for a separate context modeling module. Furthermore, recent methods use a frozen initial embedding to guide the decoder in decoding the features to text, which leads to a loss of accuracy. We instead propose a learnable initial embedding, learned from the transformer encoder, that adapts to different input images. Building on these ideas, we introduce a novel architecture for text recognition, named TRansformer-based text recognizer with Initial embedding Guidance (TRIG), composed of three stages (transformation, feature extraction, and prediction). Extensive experiments show that our approach achieves state-of-the-art results on text recognition benchmarks.
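
The sketch below illustrates the learnable-initial-embedding idea: the first decoder query is computed from the encoder output rather than taken from a frozen start token, so it adapts to each input image. Names and dimensions are assumptions; this is not the TRIG implementation.

```python
import torch
import torch.nn as nn

class AdaptiveInitEmbedding(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.to_init = nn.Sequential(nn.Linear(d_model, d_model), nn.Tanh())

    def forward(self, enc_out):
        # enc_out: (batch, seq_len, d_model) transformer-encoder features.
        pooled = enc_out.mean(dim=1)              # global summary of the image
        return self.to_init(pooled).unsqueeze(1)  # (batch, 1, d_model) initial query

d_model, batch = 256, 2
enc_out = torch.rand(batch, 48, d_model)          # e.g. a 1-D split of image features
init_query = AdaptiveInitEmbedding(d_model)(enc_out)

# The adaptive query then guides a standard transformer decoder over character slots.
decoder_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)
out = decoder(tgt=init_query, memory=enc_out)
print(out.shape)  # torch.Size([2, 1, 256])
```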

