An Enhanced pix2pix Dehazing Network with Guided Filter Layer

2020 ◽  
Vol 10 (17) ◽  
pp. 5898
Author(s):  
Qirong Bu ◽  
Jie Luo ◽  
Kuan Ma ◽  
Hongwei Feng ◽  
Jun Feng

In this paper, we propose an enhanced pix2pix dehazing network, which generates clear images without relying on a physical scattering model. This network is a generative adversarial network (GAN) that incorporates multiple guided filter layers. First, the hazy input image is smoothed with the different smoothing kernels of the guided filter layer, and high-frequency features are obtained from the results. Then, these features are embedded in higher dimensions of the network and connected with the output of the generator's encoder. Finally, Visual Geometry Group (VGG) features are introduced as a loss function to improve the restoration of texture information and generate better haze-free images. We conduct experiments on the NYU-Depth, I-HAZE and O-HAZE datasets. On the indoor test dataset, the proposed network improves the Peak Signal-to-Noise Ratio (PSNR) by 1.22 dB and the Structural Similarity Index Metric (SSIM) by 0.01 over the second-best comparison method. Extensive experiments demonstrate that the proposed method performs well for image dehazing.
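As a rough illustration of the high-frequency feature extraction step, the following Python sketch applies the classic guided filter at several smoothing radii and subtracts each smoothed result from the input; the radii and regularization value are assumptions, not the paper's settings.

import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, r, eps):
    # Classic guided filter: I is the guidance image, p the input,
    # r the window radius, eps the edge-preservation regularizer.
    win = 2 * r + 1
    mean_I = uniform_filter(I, win)
    mean_p = uniform_filter(p, win)
    cov_Ip = uniform_filter(I * p, win) - mean_I * mean_p
    var_I = uniform_filter(I * I, win) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return uniform_filter(a, win) * I + uniform_filter(b, win)

hazy = np.random.rand(256, 256)  # stand-in for a grayscale hazy image in [0, 1]
# High-frequency features at several (assumed) smoothing scales.
features = [hazy - guided_filter(hazy, hazy, r, eps=1e-3) for r in (2, 4, 8)]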

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5540
Author(s):  
Nayeem Hasan ◽  
Md Saiful Islam ◽  
Wenyu Chen ◽  
Muhammad Ashad Kabir ◽  
Saad Al-Ahmadi

This paper proposes an encryption-based image watermarking scheme using a combination of second-level discrete wavelet transform (2DWT) and discrete cosine transform (DCT) with an auto-extraction feature. The 2DWT was selected after analyzing the trade-off between watermark imperceptibility and embedding capacity at various levels of decomposition. A DCT is applied to the selected area, and the resulting coefficients are gathered into a single vector using a zig-zag scan. The same random bit sequence serves both as the watermark and as the seed for selecting the embedding-zone coefficients. The quality of the reconstructed image was measured in terms of bit correction rate, peak signal-to-noise ratio (PSNR), and similarity index. Experimental results demonstrated that the proposed scheme is highly robust under different types of image-processing attacks. Several image attacks, e.g., JPEG compression, filtering, noise addition, cropping, sharpening, and bit-plane removal, were applied to watermarked images, and our proposed method outperformed existing methods, especially in terms of the bit correction ratio (100%), which is a measure of bit restoration. The results were also highly satisfactory in terms of the quality of the reconstructed image, which demonstrated high imperceptibility in terms of peak signal-to-noise ratio (PSNR ≥ 40 dB) and structural similarity (SSIM ≥ 0.9) under different image attacks.
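A minimal Python sketch of the transform pipeline described above, assuming the Haar wavelet, the level-2 approximation subband as the selected area, and a standard JPEG-style zig-zag scan; all three choices are illustrative assumptions rather than the paper's exact configuration.

import numpy as np
import pywt
from scipy.fftpack import dct

def zigzag(block):
    # Flatten a 2-D block along anti-diagonals in zig-zag order.
    h, w = block.shape
    idx = sorted(((i, j) for i in range(h) for j in range(w)),
                 key=lambda ij: (ij[0] + ij[1],
                                 ij[1] if (ij[0] + ij[1]) % 2 == 0 else ij[0]))
    return np.array([block[i, j] for i, j in idx])

cover = np.random.rand(256, 256)                 # stand-in for a cover image
coeffs = pywt.wavedec2(cover, 'haar', level=2)   # second-level DWT
ll2 = coeffs[0]                                  # level-2 approximation subband
dct2 = dct(dct(ll2.T, norm='ortho').T, norm='ortho')  # 2-D DCT
vector = zigzag(dct2)                            # coefficients as a single vector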


2020 ◽  
Vol 10 (1) ◽  
pp. 375 ◽  
Author(s):  
Zetao Jiang ◽  
Yongsong Huang ◽  
Lirui Hu

The super-resolution generative adversarial network (SRGAN) is a seminal work capable of generating realistic textures during single-image super-resolution. However, the hallucinated details are often accompanied by unpleasant artifacts. To further enhance the visual quality, we propose a deep learning method for single-image super-resolution (SR) that directly learns an end-to-end mapping between low- and high-resolution images. The method is based on a depthwise separable convolution super-resolution generative adversarial network (DSCSRGAN). A new depthwise separable convolution dense block (DSC Dense Block) was designed for the generator network, which improved its ability to represent and extract image features while greatly reducing the total number of parameters. In the discriminator network, the batch normalization (BN) layer was discarded, which reduced artifacts. A frequency-energy similarity loss function was designed to constrain the generator network to produce better super-resolution images. Experiments on several datasets showed that, compared with the original model, the peak signal-to-noise ratio (PSNR) improved by more than 3 dB, the structural similarity index (SSIM) increased by 16%, and the total number of parameters was reduced to 42.8%. Combining various objective indicators with subjective visual evaluation, the algorithm was shown to generate richer image details and clearer textures at lower complexity.
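For illustration, here is a minimal PyTorch sketch of a depthwise separable convolution, the primitive a DSC Dense Block would be built from; the kernel size, activation, and channel counts are assumptions. A k×k standard convolution needs k·k·Cin·Cout weights, whereas the separable form needs only k·k·Cin + Cin·Cout, which is the source of the parameter reduction.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # Depthwise: one k x k filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        # Pointwise: 1 x 1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

x = torch.randn(1, 64, 32, 32)
print(DepthwiseSeparableConv(64, 64)(x).shape)  # torch.Size([1, 64, 32, 32])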


Author(s):  
Jelena Vlaović ◽  
Drago Žagar ◽  
Snježana Rimac-Drlje ◽  
Mario Vranješ

With the development of Video on Demand applications enabled by the availability of high-speed internet access, adaptive streaming algorithms have been developing and improving. The focus is on improving the user's Quality of Experience (QoE) and taking it into account as one of the parameters of the adaptation algorithm. Users often experience changing network conditions, so the goal is to ensure stable video playback with a satisfying QoE level. Although subjective Video Quality Assessment (VQA) methods provide more accurate results regarding the user's QoE, objective VQA methods cost less and are less time-consuming. In this article, nine objective VQA methods are compared on a large set of video sequences with various spatial and temporal activities: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), MultiScale Structural Similarity Index (MS-SSIM), Video Quality Metric (VQM), Mean Sum of Differences (DELTA), Mean Sum of Absolute Differences (MSAD), Mean Squared Error (MSE), Netflix Video Multimethod Assessment Fusion (Netflix VMAF) and Visual Signal-to-Noise Ratio (VSNR). The video sequences used for testing were encoded according to H.264/AVC at twelve target coding bitrates and three spatial resolutions, resulting in a total of 190 sequences. In addition to objective quality assessment, subjective quality assessment was performed for these sequences. All results acquired by objective VQA methods were compared with subjective Mean Opinion Score (MOS) results using the Pearson Linear Correlation Coefficient (PLCC). Measurement results obtained on this large set of video sequences with different spatial resolutions show that SSIM and VQM correlate better with MOS results than PSNR, MS-SSIM, VSNR, DELTA, MSE, VMAF and MSAD. However, the PLCC results for SSIM and VQM (0.7799 and 0.7734, respectively) are too low for these methods to replace subjective testing in streaming services. These results suggest that more efficient VQA methods should be developed for streaming testing procedures as well as to support the video segmentation process. Furthermore, comparing results across spatial resolutions shows that, at the same lower target coding bitrates, video sequences encoded at lower spatial resolutions achieve higher quality than those encoded at higher spatial resolutions, particularly for video sequences with higher spatial and temporal information.
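A small Python sketch of the correlation step, assuming one objective score per sequence and placeholder data in place of real measurements; note that for metrics where lower is better (e.g., MSE, VQM, DELTA, MSAD), it is the magnitude of the correlation that matters.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
mos = rng.uniform(1, 5, 190)               # placeholder MOS, one per sequence
vqa = {                                    # placeholder objective scores
    'SSIM': rng.uniform(0.6, 1.0, 190),
    'VQM': rng.uniform(0.0, 6.0, 190),
}
# PLCC between each objective method and the subjective MOS.
plcc = {name: pearsonr(scores, mos)[0] for name, scores in vqa.items()}
print(plcc)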


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Jianfang Cao ◽  
Zibang Zhang ◽  
Aidi Zhao

Considering the low resolution and rough details of existing mural images, this paper proposes a superresolution reconstruction algorithm for enhancing artistic mural images. The algorithm takes a generative adversarial network (GAN) as its framework. First, a convolutional neural network (CNN) extracts image feature information; then, the features are mapped to a high-resolution image space of the same size as the original image; finally, the reconstructed high-resolution image is output, completing the design of the generative network. A deep CNN with residual modules is then used for image feature extraction to determine whether the output of the generative network is an authentic high-resolution mural image. Specifically, the network depth is increased, residual modules are introduced, batch normalization is removed from the convolutional layers, and subpixel convolution is used for upsampling. Additionally, a combination of multiple loss functions and staged construction of the network model is adopted to further optimize the mural images. A mural dataset was compiled by our team. Compared with several existing image superresolution algorithms, the peak signal-to-noise ratio (PSNR) of the proposed algorithm increases by an average of 1.2–3.3 dB and the structural similarity (SSIM) increases by 0.04–0.13; it is also superior to other algorithms in terms of subjective scoring. The proposed method is effective for the superresolution reconstruction of mural images and contributes to the further optimization of ancient mural images.
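The subpixel-convolution upsampling mentioned above can be sketched in PyTorch as follows: a convolution expands the channel count by r², and PixelShuffle rearranges those channels into an r-times larger feature map. The channel count and scale factor here are assumptions for illustration.

import torch
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    def __init__(self, ch, r=2):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch * r * r, 3, padding=1)  # expand channels by r^2
        self.shuffle = nn.PixelShuffle(r)                    # rearrange to r x spatial size

    def forward(self, x):
        return self.shuffle(self.conv(x))

x = torch.randn(1, 64, 32, 32)
print(SubPixelUpsample(64)(x).shape)  # torch.Size([1, 64, 64, 64])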


Author(s):  
Diptasree Debnath ◽  
Emlon Ghosh ◽  
Barnali Gupta Banik

Steganography is a widely used technique for digital data hiding, and image steganography is the most popular of its forms. This article describes a novel key-based blind method for RGB image steganography in which multiple images can be hidden simultaneously. The proposed method is based on the Discrete Cosine Transformation (DCT) and Discrete Wavelet Transformation (DWT), which provide enhanced security as well as improved stego-image quality. Here, the cover image is RGB, although the method can be implemented on grayscale images as well. The fundamental concept of visual cryptography is utilized to increase the capacity to a great extent. To make the method more robust and imperceptible, a pseudo-random number sequence and a correlation coefficient are used for embedding and extracting the secrets, respectively. The robustness of the method is tested against steganalysis attacks such as cropping, rotation, resizing, noise addition, and histogram equalization. The method has been applied to multiple sets of images, and the quality of the resultant images has been analyzed through various metrics, namely 'Peak Signal to Noise Ratio,' 'Structural Similarity Index,' 'Structural Content,' and 'Maximum Difference.' The results obtained are very promising and have been compared with existing methods to demonstrate the method's efficiency.
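A hedged Python sketch of the pseudo-random-sequence embedding and correlation-based blind extraction idea; the gain k, the coefficient block, and the use of a Gaussian sequence are assumptions, not details from the article.

import numpy as np

def embed_bit(coeffs, bit, seed, k=2.0):
    # Add (bit=1) or subtract (bit=0) a seeded pseudo-random pattern.
    prn = np.random.default_rng(seed).standard_normal(coeffs.shape)
    return coeffs + (k if bit else -k) * prn

def extract_bit(coeffs, seed):
    # Blind extraction: the sign of the correlation with the regenerated
    # pattern recovers the embedded bit without the original image.
    prn = np.random.default_rng(seed).standard_normal(coeffs.shape)
    corr = np.corrcoef(coeffs.ravel(), prn.ravel())[0, 1]
    return int(corr > 0)

band = np.random.randn(32, 32)      # stand-in for transform-domain coefficients
stego = embed_bit(band, 1, seed=42)
print(extract_bit(stego, seed=42))  # 1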


2013 ◽  
Vol 13 (01) ◽  
pp. 1350006 ◽  
Author(s):  
Rajani Gupta ◽  
Prashant Bansod ◽  
R. S. Gamad

This paper analyzes the compression quality of true-color medical images of echocardiogram (ECHO), X-radiation (X-ray) and computed tomography (CT), and compares compressed biomedical images of various sizes, produced by two lossy compression techniques, set partitioning in hierarchical trees (SPIHT) and the discrete cosine transform (DCT), with the original images. The study also evaluates various objective parameters associated with the images. The objective of this analysis is to exhibit the effect of the compression ratio on the absolute average difference (AAD), cross correlation (CC), image fidelity (IF), mean square error (MSE), peak signal to noise ratio (PSNR) and structural similarity index measurement (SSIM) of images compressed by the SPIHT and DCT techniques. The results signify that the quality of the compressed image depends on the resolution of the underlying structure, with CT found to be better than the other image modalities. The X-ray compression results are equivalent for both techniques. For large biomedical images compressed with SPIHT, ECHO results are comparable to CT and X-ray, whereas the corresponding DCT results are substandard. For comparatively smaller ECHO images, the results are not as good as those for X-ray and CT under either compression technique. The quality measurement of the compressed images was implemented in MATLAB.
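A Python sketch of the objective parameters listed above, using their standard textbook definitions (SSIM is omitted here; libraries such as scikit-image provide it); images are assumed to be float arrays on a [0, 255] scale.

import numpy as np

def compression_metrics(orig, comp, peak=255.0):
    orig = orig.astype(np.float64)
    comp = comp.astype(np.float64)
    err = orig - comp
    mse = np.mean(err ** 2)
    return {
        'AAD': np.mean(np.abs(err)),                          # absolute average difference
        'CC': np.corrcoef(orig.ravel(), comp.ravel())[0, 1],  # cross correlation
        'IF': 1.0 - np.sum(err ** 2) / np.sum(orig ** 2),     # image fidelity
        'MSE': mse,
        'PSNR': 10.0 * np.log10(peak ** 2 / mse),             # in dB
    }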


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1969
Author(s):  
Hongrui Liu ◽  
Shuoshi Li ◽  
Hongquan Wang ◽  
Xinshan Zhu

Existing face image completion approaches cannot rationally complete damaged face images whose identity information is completely lost because it is obscured by a center mask. Hence, in this paper, a reference-guided double-pipeline face image completion network (RG-DP-FICN) is designed within the framework of the generative adversarial network (GAN), completing the identity information of a damaged image by utilizing a reference image with the same identity. To reasonably integrate the identity information of the reference image into the completed image, the reference image is decoupled into identity features (e.g., the contours of the eyes, eyebrows and nose) and pose features (e.g., the orientation of the face and the positions of the facial features), and the resulting identity features are fused with the pose features of the damaged image. Specifically, a lightweight identity predictor extracts the pose features; an identity extraction module compresses and globally extracts the identity features of the reference image; and an identity transfer module effectively fuses identity and pose features by performing identity rendering on different receptive fields. Furthermore, quantitative and qualitative evaluations are conducted on the public dataset CelebA-HQ. Compared to state-of-the-art methods, the evaluation metrics peak signal-to-noise ratio (PSNR), structural similarity index (SSIM) and L1 loss are improved by 2.22 dB, 0.033 and 0.79%, respectively. The results indicate that RG-DP-FICN generates completed images with reasonable identity and a superior completion effect compared to existing approaches.
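A loose PyTorch illustration of the fusion idea only: a globally pooled identity descriptor is broadcast over the pose feature map and mixed by a 1×1 convolution. The module shapes and operations are assumptions and do not reproduce the paper's identity transfer module.

import torch
import torch.nn as nn

class IdentityFusion(nn.Module):
    def __init__(self, id_ch, pose_ch, out_ch):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)               # compress identity globally
        self.mix = nn.Conv2d(id_ch + pose_ch, out_ch, 1)  # fuse the two feature sets

    def forward(self, id_feat, pose_feat):
        g = self.pool(id_feat).expand(-1, -1, *pose_feat.shape[2:])
        return self.mix(torch.cat([g, pose_feat], dim=1))

id_feat = torch.randn(1, 128, 8, 8)     # identity features from the reference image
pose_feat = torch.randn(1, 64, 32, 32)  # pose features from the damaged image
print(IdentityFusion(128, 64, 64)(id_feat, pose_feat).shape)  # [1, 64, 32, 32]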


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tae-Hoon Yong ◽  
Su Yang ◽  
Sang-Jeong Lee ◽  
Chansoo Park ◽  
Jo-Eun Kim ◽  
...  

The purpose of this study was to directly and quantitatively measure bone mineral density (BMD) from cone-beam CT (CBCT) images by enhancing the linearity and uniformity of bone intensities with a hybrid deep-learning model (QCBCT-NET) combining a generative adversarial network (Cycle-GAN) and a U-Net, and to compare the bone images enhanced by QCBCT-NET with those produced by Cycle-GAN and U-Net alone. We used two phantoms of human skulls encased in acrylic, one for the training and validation datasets and the other for the test dataset. The proposed QCBCT-NET consists of a Cycle-GAN with residual blocks and a multi-channel U-Net, trained on paired quantitative CT (QCT) and CBCT images. The BMD images produced by QCBCT-NET significantly outperformed those produced by the Cycle-GAN or the U-Net in mean absolute difference (MAD), peak signal to noise ratio (PSNR), normalized cross-correlation (NCC), structural similarity (SSIM), and linearity when compared to the original QCT images. The QCBCT-NET improved the contrast of the bone images by locally reflecting the original BMD distribution of the QCT image through the Cycle-GAN, and improved their spatial uniformity by globally suppressing image artifacts and noise through the two-channel U-Net. The QCBCT-NET substantially enhanced the linearity, uniformity, and contrast as well as the anatomical and quantitative accuracy of the bone images, and proved more accurate than the Cycle-GAN and the U-Net for quantitatively measuring BMD from CBCT.
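For reference, a minimal PyTorch residual block of the kind conventionally used inside Cycle-GAN generators; the channel count and instance normalization are common defaults assumed here, not details taken from the paper.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.InstanceNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)  # identity skip preserves low-frequency content

x = torch.randn(1, 64, 64, 64)
print(ResidualBlock()(x).shape)  # torch.Size([1, 64, 64, 64])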


2020 ◽  
Vol 13 (6) ◽  
pp. 219-228
Author(s):  
Avin Maulana ◽  
Chastine Fatichah ◽  
Nanik Suciati ◽  
...

Facial inpainting is a process of reconstructing missing or damaged pixels in a facial image. The reconstructed pixels should remain realistic, so that an observer cannot differentiate between the reconstructed pixels and the original ones. However, a few problems may arise once the inpainting algorithm has run: on unaligned face images, inconsistency between adjacent pixels can cause the reconstruction to fail. We propose an improved facial inpainting method using a Generative Adversarial Network (GAN) with additional losses based on the pre-trained VGG-Net and face landmarks. The feature reconstruction loss helps preserve the deep features of an image, while the landmark loss increases the result's perceptual quality. The training process was carried out using a curriculum learning scenario. Qualitative results show that our inpainting method can reconstruct missing areas on unaligned face images. From the quantitative results, our proposed method achieves average scores of 21.528 and 0.665, and maximum scores of 29.922 and 0.908, on the PSNR (Peak Signal to Noise Ratio) and SSIM (Structure Similarity Index Measure) metrics, respectively.
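A hedged PyTorch sketch of a VGG feature reconstruction loss of the kind described above, assuming VGG-16 features truncated at an arbitrary layer and an L1 distance; the layer index, VGG variant, and distance function are assumptions.

import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class VGGFeatureLoss(nn.Module):
    def __init__(self, layer=16):
        super().__init__()
        # Frozen VGG-16 trunk used only as a fixed feature extractor.
        self.features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:layer].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, pred, target):
        # L1 distance between deep features of output and ground truth.
        return nn.functional.l1_loss(self.features(pred), self.features(target))

loss_fn = VGGFeatureLoss()
pred = torch.rand(1, 3, 128, 128)    # inpainted image (placeholder)
target = torch.rand(1, 3, 128, 128)  # ground-truth image (placeholder)
print(loss_fn(pred, target))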

