Visual Saliency Prediction Based on Deep Learning

Information ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 257 ◽  
Author(s):  
Bashir Ghariba ◽  
Mohamed S. Shehata ◽  
Peter McGuire

Human eye movement is one of the most important functions for understanding our surroundings. When a human eye processes a scene, it quickly focuses on the dominant parts of the scene, a process commonly known as visual saliency detection or visual attention prediction. Recently, neural networks have been used to predict visual saliency. This paper proposes a deep learning encoder-decoder architecture, based on a transfer learning technique, to predict visual saliency. In the proposed model, visual features are extracted through convolutional layers from raw images to predict visual saliency. In addition, the proposed model uses the VGG-16 network for semantic segmentation, which uses a pixel classification layer to predict the categorical label for every pixel in an input image. The proposed model is applied to several datasets, including TORONTO, MIT300, MIT1003, and DUT-OMRON, to illustrate its efficiency. The results of the proposed model are quantitatively and qualitatively compared to classic and state-of-the-art deep learning models. Using the proposed deep learning model, a global accuracy of up to 96.22% is achieved for the prediction of visual saliency.

2021 ◽  
pp. 147592172098543
Author(s):  
Chaobo Zhang ◽  
Chih-chen Chang ◽  
Maziar Jamshidi

Deep learning techniques have recently attracted significant attention in the field of visual inspection of civil infrastructure systems. Currently, most deep learning-based visual inspection techniques utilize a convolutional neural network to recognize surface defects either by detecting a bounding box of each defect or by classifying all pixels in an image without distinguishing between different defect instances. These outputs cannot be directly used for acquiring the geometric properties of each individual defect in an image, thus hindering the development of fully automated structural assessment techniques. In this study, a novel fully convolutional model is proposed for simultaneously detecting and grouping the image pixels for each individual defect in an image. The proposed model integrates an optimized mask subnet with a box-level detection network, where the former outputs a set of position-sensitive score maps for pixel-level defect detection and the latter predicts a bounding box for each defect to group the detected pixels. An image dataset containing three common types of concrete defects (crack, spalling, and exposed rebar) is used for training and testing the model. Results demonstrate that the proposed model is robust to various defect sizes and shapes and can achieve a mask-level mean average precision (mAP) of 82.4% and a mean intersection over union (mIoU) of 75.5%, with a processing speed of about 10 FPS at an input image size of 576 × 576 when tested on an NVIDIA GeForce GTX 1060 GPU. Its performance is compared with the state-of-the-art instance segmentation network Mask R-CNN and the semantic segmentation network U-Net. The comparative studies show that the proposed model has a distinct defect boundary delineation capability and outperforms the Mask R-CNN and the U-Net in both accuracy and speed.
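As an illustration of the mIoU metric quoted above, the following is a minimal sketch in plain Python. It assumes label maps stored as nested lists of integer class IDs and averages IoU over the classes that appear in either map; the function name `mean_iou` and the toy data are hypothetical, and the abstract does not state the exact evaluation protocol the authors used.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union, averaged over classes that occur
    in the prediction or the ground truth (assumed convention)."""
    ious = []
    for c in range(num_classes):
        inter = 0  # pixels labeled c in both maps
        union = 0  # pixels labeled c in at least one map
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_p, in_t = p == c, t == c
                inter += in_p and in_t
                union += in_p or in_t
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2 x 3 label maps: 0 = background, 1 = defect (illustrative only).
pred = [[0, 0, 1],
        [0, 1, 1]]
target = [[0, 0, 1],
          [0, 0, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case the background class has IoU 3/4 and the defect class 2/3, so `score` is their average.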


2013 ◽  
Vol 2013 ◽  
pp. 1-9
Author(s):  
Yuantao Chen ◽  
Weihong Xu ◽  
Fangjun Kuang ◽  
Shangbing Gao

Image segmentation based on a high-quality visual saliency map depends heavily on the existing visual saliency metrics. Most of these metrics produce only a sketchy saliency map, and a rough saliency map degrades the image segmentation results. This paper presents a randomized visual saliency detection algorithm. The method quickly generates a detailed saliency map of the same size as the original input image, and it can be applied to content-based image scaling under real-time requirements. For fast detection of salient areas in video, the algorithm requires only a small amount of memory to produce a detailed visual saliency map. The presented results show that using the method's saliency map in the subsequent image segmentation process yields ideal segmentation results.


2020 ◽  
Vol 6 ◽  
pp. e280
Author(s):  
Bashir Muftah Ghariba ◽  
Mohamed S. Shehata ◽  
Peter McGuire

The Human Visual System (HVS) has the ability to pay visual attention, which is one of its many functions. Despite the many advancements being made in visual saliency prediction, there continues to be room for improvement. Deep learning has recently been used to deal with this task. This study proposes a novel deep learning model based on a Fully Convolutional Network (FCN) architecture. The proposed model is trained in an end-to-end style and designed to predict visual saliency. The entire model is trained from scratch to extract distinguishing features. The proposed model is evaluated using several benchmark datasets, such as MIT300, MIT1003, TORONTO, and DUT-OMRON. The quantitative and qualitative experimental analyses demonstrate that the proposed model achieves superior performance for predicting visual saliency.


Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6346
Author(s):  
Ankita Anand ◽  
Shalli Rani ◽  
Divya Anand ◽  
Hani Moaiteq Aljahdali ◽  
Dermot Kerr

The role of 5G-IoT has become indispensable in smart applications, and it plays a crucial part in e-health applications. E-health applications require intelligent schemes and architectures to overcome the security threats against the sensitive data of patients. The information in e-healthcare applications is stored in the cloud, which is vulnerable to security attacks. However, these attacks can be detected with deep learning techniques, which requires hybrid models. In this article, a new deep learning model (CNN-DMA) is proposed to detect malware attacks based on a Convolutional Neural Network (CNN) classifier. The model uses three layers, i.e., Dense, Dropout, and Flatten. A batch size of 64, 20 epochs, and 25 classes are used to train the network. An input image of 32 × 32 × 1 is used for the initial convolutional layer. Results are obtained on the Malimg dataset, where 25 families of malware are fed as input, and the model detected the Alueron.gen!J malware. The proposed model CNN-DMA is 99% accurate and is validated against state-of-the-art techniques.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Rajat Garg ◽  
Anil Kumar ◽  
Nikunj Bansal ◽  
Manish Prateek ◽  
Shashi Kumar

Urban area mapping is an important application of remote sensing which aims at both estimation of and change in land cover under the urban area. A major challenge in analyzing Synthetic Aperture Radar (SAR) based remote sensing data is that highly vegetated urban areas and oriented urban targets closely resemble actual vegetation. This similarity between some urban areas and vegetation leads to misclassification of the urban area as forest cover. The present work is a precursor study for the dual-frequency L- and S-band NASA-ISRO Synthetic Aperture Radar (NISAR) mission and aims at minimizing the misclassification of such highly vegetated and oriented urban targets into the vegetation class with the help of deep learning. In this study, three machine learning algorithms, Random Forest (RF), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), have been implemented along with a deep learning model, DeepLabv3+, for semantic segmentation of Polarimetric SAR (PolSAR) data. It is a general perception that a large dataset is required for the successful implementation of any deep learning model, but in the field of SAR-based remote sensing, a major issue is the unavailability of a large benchmark labeled dataset for the implementation of deep learning algorithms from scratch. In the current work, it has been shown that a pre-trained deep learning model, DeepLabv3+, outperforms the machine learning algorithms for the land use and land cover (LULC) classification task even with a small dataset using transfer learning. The highest pixel accuracy of 87.78% and an overall pixel accuracy of 85.65% have been achieved with DeepLabv3+, and Random Forest performs best among the machine learning algorithms with an overall pixel accuracy of 77.91%, while SVM and KNN trail with overall accuracies of 77.01% and 76.47%, respectively. The highest precision of 0.9228 is recorded for the urban class for the semantic segmentation task with DeepLabv3+, while the machine learning algorithms SVM and RF gave comparable results with precisions of 0.8977 and 0.8958, respectively.
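The overall pixel accuracy and per-class precision figures reported above are standard confusion-matrix quantities. The sketch below shows one common way to compute them in plain Python; the helper names, the class IDs, and the toy labels are hypothetical, and the abstract does not specify how the authors computed their numbers.

```python
def confusion_matrix(pred, target, num_classes):
    """cm[t][p] counts pixels with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of all pixels whose predicted class matches the truth."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def precision(cm, c):
    """Of the pixels predicted as class c, the fraction that truly are c."""
    predicted = sum(row[c] for row in cm)
    return cm[c][c] / predicted if predicted else 0.0


# Flattened toy labels; the class IDs are illustrative
# (0 = urban, 1 = vegetation, 2 = water).
target = [0, 0, 0, 1, 1, 2]
pred = [0, 0, 1, 1, 1, 2]
cm = confusion_matrix(pred, target, num_classes=3)
acc = overall_accuracy(cm)
urban_precision = precision(cm, 0)
vegetation_precision = precision(cm, 1)
```

Here one urban pixel is mislabeled as vegetation, which lowers the vegetation precision (2 of 3 predicted vegetation pixels are correct) while leaving the urban precision at 1.0, the same kind of urban/vegetation confusion the abstract describes.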


2012 ◽  
Vol 48 (25) ◽  
pp. 1591-1593 ◽  
Author(s):  
Di Wu ◽  
Xiudong Sun ◽  
Yongyuan Jiang ◽  
Chunfeng Hou

IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 71422-71434 ◽  
Author(s):  
Zhenguo Gao ◽  
Naeem Ayoub ◽  
Danjie Chen ◽  
Bingcai Chen ◽  
Zhimao Lu
