Image Saliency Prediction in Transformed Domain: A Deep Complex Neural Network Method

Author(s):  
Lai Jiang ◽  
Zhe Wang ◽  
Mai Xu ◽  
Zulin Wang

Features of images in the transformed domain are effective in distinguishing salient and non-salient regions. In this paper, we propose a novel deep complex neural network, named Sal-DCNN, to predict image saliency by learning features in both the pixel and transformed domains. Before proposing Sal-DCNN, we analyze the saliency cues encoded in the discrete Fourier transform (DFT) domain. We find that: 1) the phase spectrum encodes most saliency cues; 2) a certain pattern of the amplitude spectrum is important for saliency prediction; and 3) the transformed-domain spectrum is robust to noise and down-sampling for saliency prediction. According to these findings, we develop the structure of Sal-DCNN, which includes two main stages: a complex dense encoder and a three-stream multi-domain decoder. Given the new Sal-DCNN structure, saliency maps can be predicted under the supervision of ground-truth fixation maps in both the pixel and transformed domains. Finally, the experimental results show that our Sal-DCNN method outperforms eight other state-of-the-art methods for image saliency prediction on three databases.
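The first finding, that the phase spectrum carries most of the saliency cues, can be illustrated with a small sketch that discards the amplitude spectrum entirely and reconstructs a saliency map from the DFT phase alone. This is a generic phase-only baseline, not the Sal-DCNN network itself; the smoothing parameter is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def phase_only_saliency(gray, sigma=3.0):
    """Sketch: saliency from the DFT phase spectrum alone (amplitude set to 1)."""
    f = np.fft.fft2(gray.astype(np.float64))
    phase = np.angle(f)                        # keep the phase, drop the amplitude
    recon = np.fft.ifft2(np.exp(1j * phase))   # reconstruct with unit amplitude
    sal = gaussian_filter(np.abs(recon) ** 2, sigma)
    return sal / (sal.max() + 1e-12)
```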

2011 ◽  
Vol 225-226 ◽  
pp. 1016-1019
Author(s):  
Sheng He ◽  
Jun Wei Han ◽  
Ming Xu ◽  
Gong Cheng ◽  
Tian Yun Zhao ◽  
...  

The computer vision community has long attempted to automatically detect locations in an image that capture users' attention. In recent years, many researchers have addressed this problem from the perspective of simulating human visual attention mechanisms. In this paper, we study modeling visual attention in the frequency domain. Our major contributions are twofold: 1) a new method, called the band-divided method (BDM), is developed to generate the saliency map by integrating the amplitude spectrum with the phase spectrum; 2) a quantitative measure based on min-distance dissimilarity (MDD) is presented to evaluate the saliency map, which is more appropriate for non-binary ground-truth data. Experiments on a benchmark dataset and comparisons with traditional approaches demonstrate the promise of the proposed work.
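The abstract does not spell out how the spectrum is divided into bands, so the following is only an illustrative interpretation, not the authors' BDM: split the amplitude spectrum into radial frequency bands, reconstruct each band with the original phase, and sum the per-band responses. The number of bands and the smoothing width are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def band_combined_saliency(gray, n_bands=4, sigma=3.0):
    """Illustrative sketch only: per-band reconstruction with original phase."""
    f = np.fft.fft2(gray.astype(np.float64))
    amp, phase = np.abs(f), np.angle(f)
    h, w = gray.shape
    yy, xx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    radius = np.hypot(yy, xx)                        # radial frequency of each bin
    edges = np.linspace(0.0, radius.max() + 1e-9, n_bands + 1)
    sal = np.zeros_like(gray, dtype=np.float64)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius >= lo) & (radius < hi)        # one frequency band
        recon = np.fft.ifft2(amp * mask * np.exp(1j * phase))
        sal += np.abs(recon) ** 2
    return gaussian_filter(sal, sigma)
```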


Author(s):  
Zijun Deng ◽  
Xiaowei Hu ◽  
Lei Zhu ◽  
Xuemiao Xu ◽  
Jing Qin ◽  
...  

Saliency detection is a fundamental yet challenging task in computer vision, aiming at highlighting the most visually distinctive objects in an image. We propose a novel recurrent residual refinement network (R^3Net) equipped with residual refinement blocks (RRBs) to more accurately detect salient regions of an input image. Our RRBs learn the residual between the intermediate saliency prediction and the ground truth by alternately leveraging the low-level integrated features and the high-level integrated features of a fully convolutional network (FCN). While the low-level integrated features are capable of capturing more saliency details, the high-level integrated features can reduce non-salient regions in the intermediate prediction. Furthermore, the RRBs can obtain complementary saliency information of the intermediate prediction, and add the residual into the intermediate prediction to refine the saliency maps. We evaluate the proposed R^3Net on five widely-used saliency detection benchmarks by comparing it with 16 state-of-the-art saliency detectors. Experimental results show that our network outperforms these competitors on all the benchmark datasets.
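A minimal sketch of the residual-refinement idea is given below, assuming the integrated FCN features and the current saliency map are simply concatenated before two convolutions; the actual R^3Net block may be organized differently.

```python
import torch
import torch.nn as nn

class ResidualRefinementBlock(nn.Module):
    """Sketch of a residual refinement block (simplified, not the exact RRB)."""
    def __init__(self, feat_channels, mid_channels=64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(feat_channels + 1, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, features, saliency):
        # Predict a residual from the features concatenated with the current
        # prediction, then add it back to refine the saliency map.
        residual = self.refine(torch.cat([features, saliency], dim=1))
        return saliency + residual
```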


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 970
Author(s):  
Miguel Ángel Martínez-Domingo ◽  
Juan Luis Nieves ◽  
Eva M. Valero

Saliency prediction is a very important and challenging task within the computer vision community. Many models exist that try to predict the salient regions of a scene from its RGB image values. As new models are developed, spectral imaging techniques may potentially overcome the limitations found when using RGB images. However, the experimental study of such models based on spectral images is difficult because of the lack of available data. This article presents the first eight-channel multispectral image database of outdoor urban scenes, together with gaze data recorded with an eye tracker from several observers performing different visualization tasks. In addition, the database is used to study whether the complexity of the images has an impact on the saliency maps retrieved from the observers. Results show that more complex images are not associated with larger differences in the obtained saliency maps.
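The article does not name the measure used to compare the observers' saliency maps; as a hedged illustration, one common choice is Pearson's linear correlation coefficient (CC) between two maps, sketched below.

```python
import numpy as np

def saliency_map_correlation(map_a, map_b):
    """Sketch: Pearson correlation (CC) between two saliency maps of equal size."""
    a = (map_a - map_a.mean()) / (map_a.std() + 1e-12)
    b = (map_b - map_b.mean()) / (map_b.std() + 1e-12)
    return float((a * b).mean())
```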


Sensors ◽  
2020 ◽  
Vol 20 (2) ◽  
pp. 459
Author(s):  
Shaosheng Dai ◽  
Dongyang Li

Aiming at the problems of incomplete saliency detection and unclear boundaries in infrared multi-target images with different target sizes and low signal-to-noise ratios under sky-background conditions, this paper proposes a saliency detection method for multiple targets based on multi-saliency detection. The target areas of the infrared image are mainly bright and the background areas are dark. Using a multi-scale top-hat (Top-hat) transformation, the image is first eroded and dilated to extract the difference between bright and dark parts and reconstruct the image, reducing the interference of blurred sky-background noise. The image obtained by the multi-scale Top-hat transformation is then transformed from the spatial domain to the frequency domain, and the spectral residual and the phase spectrum are extracted to obtain two image saliency maps by multi-scale Gaussian-filtering reconstruction. In parallel, quaternion features are extracted and the quaternion phase spectrum is reconstructed with Gaussian filtering to obtain a third saliency map. Finally, the three saliency maps are fused to complete the saliency detection of the infrared image. Experiments on infrared video frames, evaluated with Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) index, show that the saliency maps generated by this method have clear target details and good background suppression, with AUC values above 99%. The method effectively improves multi-target saliency detection in infrared images under sky backgrounds and benefits subsequent detection and tracking of image targets.
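Two of the pipeline's building blocks, the bright-target top-hat preprocessing and the spectral-residual saliency map, can be sketched as follows. The structuring-element size, the log-amplitude smoothing, and the filter widths are assumptions, and the multi-scale and quaternion branches are omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, grey_opening, uniform_filter

def white_tophat_preprocess(gray, size=5):
    """Sketch: single-scale white top-hat (image minus its morphological opening),
    which keeps small bright targets and suppresses the sky background."""
    g = gray.astype(np.float64)
    return g - grey_opening(g, size=(size, size))

def spectral_residual_saliency(gray, sigma=3.0):
    """Sketch of the classic spectral-residual step: subtract the locally
    averaged log-amplitude, keep the phase, and reconstruct."""
    f = np.fft.fft2(gray.astype(np.float64))
    log_amp = np.log(np.abs(f) + 1e-12)
    phase = np.angle(f)
    residual = log_amp - uniform_filter(log_amp, size=3)   # spectral residual
    recon = np.fft.ifft2(np.exp(residual + 1j * phase))
    return gaussian_filter(np.abs(recon) ** 2, sigma)
```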


Author(s):  
C. Supunyachotsakul ◽  
N. Suksangpanya

Classifying features from satellite images has been a time-consuming manual process that requires a lot of manpower. This work exploits a deep convolutional encoder-decoder neural network to develop an algorithm that can automatically classify the extents of Pararubber tree-growing areas from LANDSAT-8 images. The ground truth of the Pararubber tree areas was manually prepared and separated into training and validation datasets. The classification model obtained with the training datasets was verified, achieving a classification accuracy of 70.90%, a precision of 67.66%, a recall of 80.80%, and an F1 score of 73.59%.
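The reported scores are standard pixel-wise classification metrics. A short sketch of how they are computed from a binary prediction map against the ground truth is given below (labeling 1 = Pararubber area is an assumption).

```python
import numpy as np

def binary_classification_metrics(pred, truth):
    """Sketch: accuracy, precision, recall, and F1 from pixel counts."""
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```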


Author(s):  
Venicio Silva Araujo ◽  
Guilherme Silva Prado ◽  
Heinsten Frederich Leal dos Santos

Methods for evaluating the manufacturability of a vehicle in production and operation, based on an energy indicator, expert estimates, and a neural network, are stated. Using the neural network method, the manufacturability of a vehicle is considered both as a whole and for its individual units. The preparation of the initial data for using a neural network to predict vehicle manufacturability is shown; the training algorithm and the network architecture for calculating the manufacturability of the main units are given. Based on the calculation results, comparative data on the manufacturability of vehicles of various brands are given.
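The abstract does not give the network architecture, so the following is only a generic sketch of a small regression network mapping unit-level indicators (hypothetical inputs) to a single manufacturability score.

```python
import torch.nn as nn

# Generic sketch only: the number of input indicators and the hidden width
# are assumptions, not the architecture described in the paper.
manufacturability_net = nn.Sequential(
    nn.Linear(10, 32),   # 10 assumed unit-level indicators
    nn.ReLU(),
    nn.Linear(32, 1),    # predicted manufacturability score
)
```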


Author(s):  
Liang Kim Meng ◽  
Azira Khalil ◽  
Muhamad Hanif Ahmad Nizar ◽  
Maryam Kamarun Nisham ◽  
Belinda Pingguan-Murphy ◽  
...  

Background: Bone Age Assessment (BAA) refers to a clinical procedure that aims to identify a discrepancy between the biological and chronological age of an individual by assessing bone age growth. Currently, there are two main methods of executing BAA, known as the Greulich-Pyle and Tanner-Whitehouse techniques. Both techniques involve a manual and qualitative assessment of hand and wrist radiographs, resulting in intra- and inter-operator variability in accuracy and a time-consuming process. An automatic segmentation can be applied to the radiographs, providing the physician with more accurate delineation of the carpal bones and accurate quantitative analysis. Methods: In this study, we proposed an image feature extraction technique based on image segmentation with a fully convolutional neural network with an eight-pixel stride (FCN-8). A total of 290 radiographic images, including both female and male subjects aged 0 to 18, were manually segmented and trained using FCN-8. Results and Conclusion: The results exhibit a high training accuracy of 99.68% and a loss of 0.008619 over 50 epochs of training. The experiments compared 58 images against the gold-standard ground-truth images. The accuracy of our fully automated segmentation technique is 0.78 ± 0.06, 1.56 ± 0.30 mm, and 98.02% in terms of Dice coefficient, Hausdorff distance, and overall qualitative carpal recognition accuracy, respectively.
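The reported segmentation scores, the Dice coefficient and the Hausdorff distance, can be sketched for a pair of binary carpal-bone masks as follows; the millimetre conversion via a pixel-spacing factor is an assumption.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_coefficient(pred_mask, true_mask):
    """Sketch: Dice overlap between two binary masks."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return 2.0 * intersection / (pred.sum() + true.sum() + 1e-12)

def hausdorff_distance(pred_mask, true_mask, pixel_spacing_mm=1.0):
    """Sketch: symmetric Hausdorff distance over the foreground pixels,
    scaled by an assumed pixel spacing to report millimetres."""
    p = np.argwhere(pred_mask).astype(float)
    t = np.argwhere(true_mask).astype(float)
    d = max(directed_hausdorff(p, t)[0], directed_hausdorff(t, p)[0])
    return d * pixel_spacing_mm
```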


2021 ◽  
Vol 1715 ◽  
pp. 012045
Author(s):  
M I Shimelevich ◽  
E A Obornev ◽  
I E Obornev ◽  
E A Rodionov

2021 ◽  
Vol 18 (1) ◽  
pp. 172988142199332
Author(s):  
Xintao Ding ◽  
Boquan Li ◽  
Jinbao Wang

Indoor object detection is a very demanding and important task for robot applications. Object knowledge, such as two-dimensional (2D) shape and depth information, may be helpful for detection. In this article, we focus on region-based convolutional neural network (CNN) detectors and propose a geometric property-based Faster R-CNN method (GP-Faster) for indoor object detection. GP-Faster incorporates geometric properties into Faster R-CNN to improve detection performance. In detail, we first use mesh grids that are the intersections of direct and inverse proportion functions to generate appropriate anchors for indoor objects. After the anchors are regressed to the regions of interest produced by a region proposal network (RPN-RoIs), we then use 2D geometric constraints to refine the RPN-RoIs, in which the 2D constraint of every class is a convex hull region enclosing the width and height coordinates of the ground-truth boxes on the training set. Comparison experiments are implemented on two indoor datasets, SUN2012 and NYUv2. Since depth information is available in NYUv2, we add depth constraints to GP-Faster and propose a 3D geometric property-based Faster R-CNN (DGP-Faster) for NYUv2. The experimental results show that both GP-Faster and DGP-Faster improve the mean average precision.
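The per-class 2D constraint described above, a convex hull over the (width, height) pairs of the training-set ground-truth boxes, can be sketched as a point-in-hull test used to filter RPN RoIs; the exact refinement rule in GP-Faster may differ.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def build_class_constraint(gt_widths, gt_heights):
    """Sketch: per-class constraint as the convex hull of (width, height)
    pairs of the ground-truth boxes in the training set."""
    points = np.column_stack([gt_widths, gt_heights]).astype(float)
    return Delaunay(points[ConvexHull(points).vertices])

def roi_satisfies_constraint(roi_w, roi_h, constraint):
    """Keep an RPN RoI only if its (width, height) falls inside the hull."""
    return constraint.find_simplex([roi_w, roi_h]) >= 0
```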

