Structural Similarity Loss for Learning to Fuse Multi-Focus Images

Convolutional neural networks have recently been used for multi-focus image fusion. However, some existing methods have resorted to adding Gaussian blur to focused images, to simulate defocus, thereby generating data (with ground-truth) for supervised learning. Moreover, they classify pixels as ‘focused’ or ‘defocused’, and use the classified results to construct the fusion weight maps. This then necessitates a series of post-processing steps. In this paper, we present an end-to-end learning approach for directly predicting the fully focused output image from multi-focus input image pairs. The suggested approach uses a CNN architecture trained to perform fusion, without the need for ground truth fused images. The CNN exploits the image structural similarity (SSIM) to calculate the loss, a metric that is widely accepted for fused image quality evaluation. What is more, we also use the standard deviation of a local window of the image to automatically estimate the importance of the source images in the final fused image when designing the loss function. Our network can accept images of variable sizes and hence, we are able to utilize real benchmark datasets, instead of simulated ones, to train our network. The model is a feed-forward, fully convolutional neural network that can process images of variable sizes during test time. Extensive evaluation on benchmark datasets show that our method outperforms, or is comparable with, existing state-of-the-art techniques on both objective and subjective benchmarks.

Download Full-text

NSCT and focus measure optimization based multi-focus image fusion

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202803 ◽

2021 ◽

pp. 1-13

Author(s):

N. Aishwarya ◽

C. BennilaThangammal ◽

N.G. Praveena

Keyword(s):

Simulation Analysis ◽

Structural Information ◽

Research Area ◽

Processing Technique ◽

The Novel ◽

Phase Congruency ◽

Focus Measure ◽

Fused Image ◽

Focus Image ◽

Image Pairs

Getting a complete description of scene with all the relevant objects in focus is a hot research area in surveillance, medicine and machine vision applications. In this work, transform based fusion method called as NSCT-FMO, is introduced to integrate the image pairs having different focus features. The NSCT-FMO approach basically contains four steps. Initially, the NSCT is applied on the input images to acquire the approximation and detailed structural information. Then, the approximation sub band coefficients are merged by employing the novel Focus Measure Optimization (FMO) approach. Next, the detailed sub-images are combined using Phase Congruency (PC). Finally, an inverse NSCT operation is conducted on synthesized sub images to obtain the initial synthesized image. To optimize the initial fused image, an initial decision map is first constructed and morphological post-processing technique is applied to get the final map. With the help of resultant map, the final synthesized output is produced by the selection of focused pixels from input images. Simulation analysis show that the NSCT-FMO approach achieves fair results as compared to traditional MST based methods both in qualitative and quantitative assessments.

Download Full-text

Pre-Processing Filter Reflecting Human Visual Perception to Improve Saliency Detection Performance

Electronics ◽

10.3390/electronics10232892 ◽

2021 ◽

Vol 10 (23) ◽

pp. 2892

Author(s):

Kyungjun Lee ◽

Seungwoo Wee ◽

Jechang Jeong

Keyword(s):

Saliency Detection ◽

Visual Saliency ◽

Ground Truth ◽

Bilateral Filter ◽

Input Image ◽

Human Visual Perception ◽

Difference Of Gaussians ◽

Surrounding Environment ◽

Benchmark Datasets ◽

Previous State

Salient object detection is a method of finding an object within an image that a person determines to be important and is expected to focus on. Various features are used to compute the visual saliency, and in general, the color and luminance of the scene are widely used among the spatial features. However, humans perceive the same color and luminance differently depending on the influence of the surrounding environment. As the human visual system (HVS) operates through a very complex mechanism, both neurobiological and psychological aspects must be considered for the accurate detection of salient objects. To reflect this characteristic in the saliency detection process, we have proposed two pre-processing methods to apply to the input image. First, we applied a bilateral filter to improve the segmentation results by smoothing the image so that only the overall context of the image remains while preserving the important borders of the image. Second, although the amount of light is the same, it can be perceived with a difference in the brightness owing to the influence of the surrounding environment. Therefore, we applied oriented difference-of-Gaussians (ODOG) and locally normalized ODOG (LODOG) filters that adjust the input image by predicting the brightness as perceived by humans. Experiments on five public benchmark datasets for which ground truth exists show that our proposed method further improves the performance of previous state-of-the-art methods.

Download Full-text

R²MRF: Defocus Blur Detection via Recurrently Refining Multi-Scale Residual Features

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6884 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12063-12070

Author(s):

Chang Tang ◽

Xinwang Liu ◽

Xinzhong Zhu ◽

En Zhu ◽

Kun Sun ◽

...

Keyword(s):

Semantic Information ◽

Ground Truth ◽

Input Image ◽

Convolutional Network ◽

Multi Scale ◽

Blur Detection ◽

Defocus Blur ◽

Potential Applications ◽

Benchmark Datasets ◽

Background Clutter

Defocus blur detection aims to separate the in-focus and out-of-focus regions in an image. Although attracting more and more attention due to its remarkable potential applications, there are still several challenges for accurate defocus blur detection, such as the interference of background clutter, sensitivity to scales and missing boundary details of defocus blur regions. In order to address these issues, we propose a deep neural network which Recurrently Refines Multi-scale Residual Features (R2MRF) for defocus blur detection. We firstly extract multi-scale deep features by utilizing a fully convolutional network. For each layer, we design a novel recurrent residual refinement branch embedded with multiple residual refinement modules (RRMs) to more accurately detect blur regions from the input image. Considering that the features from bottom layers are able to capture rich low-level features for details preservation while the features from top layers are capable of characterizing the semantic information for locating blur regions, we aggregate the deep features from different layers to learn the residual between the intermediate prediction and the ground truth for each recurrent step in each residual refinement branch. Since the defocus degree is sensitive to image scales, we finally fuse the side output of each branch to obtain the final blur detection map. We evaluate the proposed network on two commonly used defocus blur detection benchmark datasets by comparing it with other 11 state-of-the-art methods. Extensive experimental results with ablation studies demonstrate that R2MRF consistently and significantly outperforms the competitors in terms of both efficiency and accuracy.

Download Full-text

Interactive Removal of Microphone Object in Facial Images

Electronics ◽

10.3390/electronics8101115 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1115 ◽

Cited By ~ 4

Author(s):

Muhammad Kamran Javed Khan ◽

Nizam Ud Din ◽

Seho Bae ◽

Juneho Yi

Keyword(s):

Loss Function ◽

Ground Truth ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Object Removal ◽

Extensive Evaluation ◽

Specific Object ◽

Image Pairs ◽

Two Stages ◽

Facial Images

Removing a specific object from an image and replacing the hole left behind with visually plausible backgrounds is a very intriguing task. While recent deep learning based object removal methods have shown promising results on this task for some structured scenes, none of them have addressed the problem of object removal in facial images. The objective of this work is to remove microphone object in facial images and fill hole with correct facial semantics and fine details. To make our solution practically useful, we present an interactive method called MRGAN, where the user roughly provides the microphone region. For filling the hole, we employ a Generative Adversarial Network based image-to-image translation approach. We break the problem into two stages: inpainter and refiner. The inpainter estimates coarse prediction by roughly filling in the microphone region followed by the refiner which produces fine details under the microphone region. We unite perceptual loss, reconstruction loss and adversarial loss as joint loss function for generating a realistic face and similar structure to the ground truth. Because facial image pairs with or without microphone do not exist, we have trained our method on a synthetically generated microphone dataset from CelebA face images and evaluated on real world microphone images. Our extensive evaluation shows that MRGAN performs better than state-of-the-art image manipulation methods on real microphone images although we only train our method using the synthetic dataset created. Additionally, we provide ablation studies for the integrated loss function and for different network arrangements.

Download Full-text

Multi-focus Image Fusion based on Genetic Algorithm using Local Features

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/1241022021 ◽

2021 ◽

Vol 10 (2) ◽

pp. 1362-1372

Keyword(s):

Genetic Algorithm ◽

Image Segmentation ◽

Image Fusion ◽

Local Features ◽

Input Image ◽

Laplacian Operator ◽

Optimal Weights ◽

Fused Image ◽

Fusion Methods ◽

Focus Image

Recent developments in the domain of information technology have made it possible to extract a knowledge of ocean from input images. The knowledge extraction can be performed using a number of operations such as image segmentation. The major objective of image segmentation is to segment focused and non-focused regions from an input image. The field depth of optical lenses is limited. A camera focuses only on those objects which lie in its field depth, rest of the objects are appeared as non-focused or blurry. For image processing, it is a general requirement that an input image must be all in focus image. In almost each domain such as medical imaging, weapon and aircraft detection, digital photography, and agriculture imaging, it is required to have an all-in focused input image. Image fusion is a process which combines two or more input images to create an all in focused complimentary fused image. Image fusion is considered as a challenging task due to irregular boundaries of focused and non-focused regions. In literature, multiple studies have addressed this issue, however they have reported promising results in creating a fully focused fused image. Moreover, they have considered different features to identify focused and non-focused regions from an input image. For better estimation of focused and non-focused regions,an ensemble of multiple features such as shape and texture-based features can be employed. Furthermore, it is required to obtain optimal weights which are to be assigned to each feature for creating a fused image. The focus of this study is to perform a multi-focus image fusion using an ensemble of multiple local features by weight optimization using a genetic algorithm. To perform this experimentation, nine multi-focus image datasets are collected where each dataset indicates an image pair of multi-focused images. The reason of this selection is two-fold, as they are publicly available, and it contain different types of multi-focus images. For reconstruction of a fully focused fused image, an ensemble of different shape and texture-based features such as Sobel Operator, Laplacian Operator and Local Binary Pattern is employed along with optimal weights obtained using a Genetic Algorithm. The experimental results have indicated improvement over previous fusion methods

Download Full-text

R³Net: Recurrent Residual Refinement Network for Saliency Detection

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/95 ◽

2018 ◽

Cited By ~ 72

Author(s):

Zijun Deng ◽

Xiaowei Hu ◽

Lei Zhu ◽

Xuemiao Xu ◽

Jing Qin ◽

...

Keyword(s):

Saliency Detection ◽

Ground Truth ◽

Input Image ◽

Convolutional Network ◽

Low Level ◽

Saliency Maps ◽

Saliency Prediction ◽

Benchmark Datasets ◽

Salient Regions ◽

High Level

Saliency detection is a fundamental yet challenging task in computer vision, aiming at highlighting the most visually distinctive objects in an image. We propose a novel recurrent residual refinement network (R^3Net) equipped with residual refinement blocks (RRBs) to more accurately detect salient regions of an input image. Our RRBs learn the residual between the intermediate saliency prediction and the ground truth by alternatively leveraging the low-level integrated features and the high-level integrated features of a fully convolutional network (FCN). While the low-level integrated features are capable of capturing more saliency details, the high-level integrated features can reduce non-salient regions in the intermediate prediction. Furthermore, the RRBs can obtain complementary saliency information of the intermediate prediction, and add the residual into the intermediate prediction to refine the saliency maps. We evaluate the proposed R^3Net on five widely-used saliency detection benchmarks by comparing it with 16 state-of-the-art saliency detectors. Experimental results show that our network outperforms our competitors in all the benchmark datasets.

Download Full-text

RobotP: A Benchmark Dataset for 6D Object Pose Estimation

Sensors ◽

10.3390/s21041299 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1299

Author(s):

Honglin Yuan ◽

Tim Hoogenkamp ◽

Remco C. Veltkamp

Keyword(s):

Pose Estimation ◽

Ground Truth ◽

3D Models ◽

Depth Image ◽

Great Success ◽

Estimation Algorithms ◽

Depth Images ◽

Object Pose Estimation ◽

Image Pairs ◽

Bounding Boxes

Deep learning has achieved great success on robotic vision tasks. However, when compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset consisting of commonly used objects for benchmarking in 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups by the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested by the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and photo-realistic images are helpful in increasing the performance of pose estimation algorithms.

Download Full-text

Fused Image Quality Measure Based on Structural Similarity

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.255-260.2072 ◽

2011 ◽

Vol 255-260 ◽

pp. 2072-2076

Author(s):

Yi Yong Han ◽

Jun Ju Zhang ◽

Ben Kang Chang ◽

Yi Hui Yuan ◽

Hui Xu

Keyword(s):

Image Fusion ◽

Structural Information ◽

Similarity Index ◽

Structural Similarity ◽

Reference Image ◽

Human Visual Perception ◽

Subjective Evaluations ◽

Fused Image ◽

Fusion Methods ◽

Image Quality Measure

Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we present a new approach using structural similarity index for assessing quality in image fusion. The advantages of our measures are that they do not require a reference image and can be easily computed. Numerous simulations demonstrate that our measures are conform to subjective evaluations and can be able to assess different image fusion methods.

Download Full-text

Multi-focus Image Fusion Using an Effective Discrete Wavelet Transform Based Algorithm

Measurement Science Review ◽

10.2478/msr-2014-0014 ◽

2014 ◽

Vol 14 (2) ◽

pp. 102-108 ◽

Cited By ~ 36

Author(s):

Yong Yang ◽

Shuying Huang ◽

Junfeng Gao ◽

Zhongsheng Qian

Keyword(s):

Wavelet Transform ◽

Image Fusion ◽

Discrete Wavelet Transform ◽

High Frequency ◽

Low Frequency ◽

Objective Evaluation ◽

Discrete Wavelet ◽

Evaluation Indexes ◽

Fused Image ◽

Focus Image

Abstract In this paper, by considering the main objective of multi-focus image fusion and the physical meaning of wavelet coefficients, a discrete wavelet transform (DWT) based fusion technique with a novel coefficients selection algorithm is presented. After the source images are decomposed by DWT, two different window-based fusion rules are separately employed to combine the low frequency and high frequency coefficients. In the method, the coefficients in the low frequency domain with maximum sharpness focus measure are selected as coefficients of the fused image, and a maximum neighboring energy based fusion scheme is proposed to select high frequency sub-bands coefficients. In order to guarantee the homogeneity of the resultant fused image, a consistency verification procedure is applied to the combined coefficients. The performance assessment of the proposed method was conducted in both synthetic and real multi-focus images. Experimental results demonstrate that the proposed method can achieve better visual quality and objective evaluation indexes than several existing fusion methods, thus being an effective multi-focus image fusion method.

Download Full-text

No-Reference Image Quality Assessment with Multi-Scale Orderless Pooling of Deep Features

Journal of Imaging ◽

10.3390/jimaging7070112 ◽

2021 ◽

Vol 7 (7) ◽

pp. 112

Author(s):

Domonkos Varga

Keyword(s):

Image Quality ◽

Quality Assessment ◽

Image Quality Assessment ◽

Multiple Scales ◽

Digital Images ◽

Input Image ◽

Perceptual Quality ◽

Reference Image ◽

Benchmark Datasets ◽

In The Wild

The goal of no-reference image quality assessment (NR-IQA) is to evaluate their perceptual quality of digital images without using the distortion-free, pristine counterparts. NR-IQA is an important part of multimedia signal processing since digital images can undergo a wide variety of distortions during storage, compression, and transmission. In this paper, we propose a novel architecture that extracts deep features from the input image at multiple scales to improve the effectiveness of feature extraction for NR-IQA using convolutional neural networks. Specifically, the proposed method extracts deep activations for local patches at multiple scales and maps them onto perceptual quality scores with the help of trained Gaussian process regressors. Extensive experiments demonstrate that the introduced algorithm performs favorably against the state-of-the-art methods on three large benchmark datasets with authentic distortions (LIVE In the Wild, KonIQ-10k, and SPAQ).

Download Full-text