Image Cropping with Composition and Saliency Aware Aesthetic Score Map

2020 · Vol 34 (07) · pp. 12104-12111
Author(s): Yi Tu, Li Niu, Weijie Zhao, Dawei Cheng, Liqing Zhang

Aesthetic image cropping is a practical but challenging task that aims at finding the crops with the highest aesthetic quality in an image. Recently, many deep learning methods have been proposed to address this problem, but they do not reveal the intrinsic mechanism of aesthetic evaluation. In this paper, we propose an interpretable image cropping model to unveil this mechanism. For each image, we use a fully convolutional network to produce an aesthetic score map, which is shared among all candidate crops during crop-level aesthetic evaluation. We then require the aesthetic score map to be both composition-aware and saliency-aware. In particular, the same region is assigned different aesthetic scores based on its relative position in different crops. Moreover, a visually salient region is expected to have more sensitive aesthetic scores, so that our network can learn to place salient objects at more proper positions. Such an aesthetic score map can be used to localize aesthetically important regions in an image, which sheds light on the composition rules learned by our model. We show the competitive performance of our model on the image cropping task on several benchmark datasets, and also demonstrate its generality in real-world applications.
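To make the shared score-map idea concrete, the following Python sketch scores candidate crops by pooling a single aesthetic score map produced once per image. The fixed 9x9 grid and the uniform weight are placeholders for the paper's learned composition-aware weighting; everything here is illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def score_crops(score_map, crops):
    """Score candidate crops by pooling a shared aesthetic score map.

    score_map: (H, W) float tensor produced once per image by the FCN.
    crops:     list of (x0, y0, x1, y1) pixel boxes.
    Returns one scalar aesthetic score per crop.
    """
    scores = []
    for x0, y0, x1, y1 in crops:
        region = score_map[y0:y1, x0:x1]
        # Resample the crop onto a fixed grid so a position-dependent weight
        # can model composition: the same pixel contributes differently
        # depending on where it falls inside the crop (hypothetical 9x9 grid).
        grid = F.interpolate(region[None, None], size=(9, 9),
                             mode="bilinear", align_corners=False)[0, 0]
        weight = torch.ones(9, 9)  # learned in the real model; uniform here
        scores.append((grid * weight).sum() / weight.sum())
    return torch.stack(scores)

# Usage: one forward pass yields the map; every crop is then scored cheaply.
score_map = torch.rand(240, 320)
print(score_crops(score_map, [(10, 10, 200, 150), (50, 40, 300, 220)]))
```

Because the map is computed once and shared, evaluating thousands of candidate crops costs only the cheap pooling step, which is the efficiency argument the abstract makes.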

2020 · Vol 6 · pp. e280
Author(s): Bashir Muftah Ghariba, Mohamed S. Shehata, Peter McGuire

Visual attention is one of the many functions of the Human Visual System (HVS). Despite the many advances in visual saliency prediction, there remains room for improvement. Deep learning has recently been used to address this task. This study proposes a novel deep learning model based on a Fully Convolutional Network (FCN) architecture. The proposed model is trained end-to-end to predict visual saliency, and it is trained entirely from scratch so that it learns its own distinguishing features. The model is evaluated on several benchmark datasets, namely MIT300, MIT1003, TORONTO, and DUT-OMRON. Quantitative and qualitative analyses demonstrate that the proposed model achieves superior performance for predicting visual saliency.
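As an illustration of the FCN style described above, here is a minimal encoder-decoder saliency network in PyTorch. The layer sizes and depths are assumptions for the sketch and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

class MiniSaliencyFCN(nn.Module):
    """Minimal encoder-decoder FCN that maps an RGB image to a
    per-pixel saliency map in [0, 1] (sketch only)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 1/2 resolution
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # 1/4 resolution
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 2, stride=2),  # back to full size
            nn.Sigmoid(),
        )
    def forward(self, x):
        return self.decoder(self.encoder(x))

model = MiniSaliencyFCN()
pred = model(torch.rand(1, 3, 224, 224))  # -> (1, 1, 224, 224) saliency map
```

Because every layer is convolutional, the same weights apply to any input size, which is what makes end-to-end dense saliency prediction straightforward.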


2018 · Vol 10 (11) · pp. 1827
Author(s): Ahram Song, Jaewan Choi, Youkyung Han, Yongil Kim

Hyperspectral change detection (CD) can be performed effectively using deep-learning networks. However, these approaches require qualified training samples, which are difficult to obtain as ground-truth data in the real world, and preserving spatial information during training is difficult due to structural limitations. To solve these problems, this study proposes a novel CD method for hyperspectral images (HSIs), comprising sample generation and a deep-learning network called the recurrent three-dimensional (3D) fully convolutional network (Re3FCN), which merges the advantages of a 3D fully convolutional network (FCN) and a convolutional long short-term memory (ConvLSTM) network. Principal component analysis (PCA) and the spectral correlation angle (SCA) are used to generate training samples with high probabilities of being changed or unchanged; this strategy allows the network to be trained with fewer but more representative samples. The Re3FCN mainly comprises spectral–spatial and temporal modules. In particular, the spectral–spatial module, with a 3D convolutional layer, extracts spectral and spatial features from the HSIs simultaneously, while the temporal module, with ConvLSTM, records and analyzes the multi-temporal HSI change information. The contributions of this study are as follows. First, it proposes a simple and effective method to generate samples for network training, which can be applied to cases with no ground-truth training samples. Second, Re3FCN performs end-to-end detection of binary and multiple changes, and can receive multi-temporal HSIs directly as input without separately learning the characteristics of multiple changes. Finally, through its fully convolutional structure, the network extracts joint spectral–spatial–temporal features and preserves the spatial structure during learning. To our knowledge, this is the first study to use a 3D FCN and ConvLSTM for remote-sensing CD. To demonstrate the effectiveness of the proposed CD method, we performed binary and multi-class CD experiments. The results reveal that Re3FCN outperforms conventional methods such as change vector analysis, iteratively reweighted multivariate alteration detection, PCA-SCA, FCN, and the combination of 2D convolutional layers with fully connected LSTM.
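The following PyTorch sketch shows one way a 3D convolutional feature extractor can feed a ConvLSTM cell for two-date change detection, in the spirit of Re3FCN. The gate layout follows the standard ConvLSTM formulation, and the channel counts, flattening step, and module names are assumptions for the sketch, not the authors' code.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: a single convolution produces the four gates."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)   # update cell memory
        h = o * torch.tanh(c)           # emit hidden state
        return h, c

class Re3FCNSketch(nn.Module):
    """3D conv extracts spectral-spatial features per date; a ConvLSTM
    fuses the two dates; a 1x1 conv yields per-pixel change classes."""
    def __init__(self, bands, n_classes=2):
        super().__init__()
        self.spectral_spatial = nn.Sequential(
            nn.Conv3d(1, 8, (7, 3, 3), padding=(3, 1, 1)), nn.ReLU(),
        )
        self.temporal = ConvLSTMCell(8 * bands, 16)
        self.classify = nn.Conv2d(16, n_classes, 1)

    def forward(self, hsi_t1, hsi_t2):  # each: (B, bands, H, W)
        B, _, H, W = hsi_t1.shape
        h = hsi_t1.new_zeros(B, 16, H, W)
        c = hsi_t1.new_zeros(B, 16, H, W)
        for hsi in (hsi_t1, hsi_t2):
            feat = self.spectral_spatial(hsi.unsqueeze(1))  # (B,8,bands,H,W)
            feat = feat.flatten(1, 2)                       # (B,8*bands,H,W)
            h, c = self.temporal(feat, (h, c))
        return self.classify(h)  # (B, n_classes, H, W) change logits

net = Re3FCNSketch(bands=30, n_classes=3)
out = net(torch.rand(1, 30, 64, 64), torch.rand(1, 30, 64, 64))
```

Because every module is convolutional, the spatial structure is preserved end to end, which is the property the abstract emphasizes.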


2021 · Vol 13 (16) · pp. 3211
Author(s): Tian Tian, Zhengquan Chu, Qian Hu, Li Ma

Semantic segmentation is a fundamental task in remote sensing image interpretation, which aims to assign a semantic label to every pixel in a given image. Accurate semantic segmentation remains challenging due to the complex distributions of various ground objects. With the development of deep learning, a series of segmentation networks represented by the fully convolutional network (FCN) has made remarkable progress on this problem, but segmentation accuracy still falls short of expectations. This paper focuses on the importance of class-specific features of different land cover objects and presents a novel end-to-end class-wise processing framework for segmentation. The proposed class-wise FCN (C-FCN) takes the form of an encoder-decoder structure with skip connections, in which the encoder is shared to produce general features for all categories and the decoder is class-wise to process class-specific features. Specifically, class-wise transition (CT), class-wise up-sampling (CU), class-wise supervision (CS), and class-wise classification (CC) modules are designed to achieve the class-wise transfer, recover the resolution of class-wise feature maps, bridge the encoder and modified decoder, and implement class-wise classifications, respectively. Class-wise and group convolutions are adopted in the architecture to control the number of parameters. The method is tested on the public ISPRS 2D semantic labeling benchmark datasets. Experimental results show that the proposed C-FCN significantly improves segmentation performance compared with many state-of-the-art FCN-based networks, revealing its potential for accurate segmentation of complex remote sensing images.
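Grouped convolutions make the class-wise decoding idea easy to express: with the number of groups equal to the number of classes, each class keeps a private slice of channels and private filters, so features are never mixed across classes. The block below is a hypothetical class-wise up-sampling stage for illustration, not the paper's CU module.

```python
import torch
import torch.nn as nn

class ClassWiseUpsample(nn.Module):
    """Upsample and refine per-class feature slices independently by
    setting groups = num_classes in every convolution."""
    def __init__(self, num_classes, ch_per_class):
        super().__init__()
        ch = num_classes * ch_per_class
        self.up = nn.ConvTranspose2d(ch, ch, 2, stride=2, groups=num_classes)
        self.conv = nn.Conv2d(ch, ch, 3, padding=1, groups=num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(self.up(x)))

num_classes, ch_per_class = 6, 8
cu = ClassWiseUpsample(num_classes, ch_per_class)
x = torch.rand(1, num_classes * ch_per_class, 32, 32)
print(cu(x).shape)  # (1, 48, 64, 64): each class upsampled on its own
```

Grouping also divides the parameter count of each convolution by the number of groups, which matches the abstract's point about controlling parameter numbers.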


2021 · Vol 12 (2) · pp. 138
Author(s): Hashfi Fadhillah, Suryo Adhi Wibowo, Rita Purnamasari

Abstract: Combining the real world with the virtual world and modeling the result in 3D is the effort carried out in Augmented Reality (AR) technology. Using fingers for computer operations across multiple devices makes the system more interactive. Marker-based AR is one type of AR that uses markers for detection. This study designs an AR system that detects fingertips as markers. The system is built with the Region-based Fully Convolutional Network (R-FCN) deep learning method, which refines the detection results obtained from a Fully Convolutional Network (FCN). The detection results are integrated with a computer pointer for basic operations. To find the best IoU, precision, and accuracy, a predetermined training-step scheme is used, with 25K, 50K, and 75K steps. High precision keeps changes of the centroid point small, while high accuracy improves AR performance under rapid movement and imperfect finger conditions. The system is trained on a dataset of index-finger images, with 10,800 training images and 3,600 test images, and each scheme is tested on videos recorded at different distances, locations, and times. The best results are obtained with the 25K-step scheme: an IoU of 69%, a precision of 5.56, and an accuracy of 96%.

Keywords: Augmented Reality, Region-based Convolutional Network, Fully Convolutional Network, Pointer, Step training
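Two quantities used in this evaluation are easy to make concrete: the IoU between a detected fingertip box and the ground truth, and the mapping from a detection centroid to a screen pointer position. The helpers below are illustrative sketches; the names and the (x0, y0, x1, y1) box convention are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over Union between two (x0, y0, x1, y1) boxes, as used
    to score fingertip detections against ground truth."""
    x0 = max(box_a[0], box_b[0]); y0 = max(box_a[1], box_b[1])
    x1 = min(box_a[2], box_b[2]); y1 = min(box_a[3], box_b[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def box_to_pointer(box, frame_wh, screen_wh):
    """Map a detected fingertip box centroid to screen pointer coordinates
    by normalizing against the camera frame and scaling to the screen."""
    cx = (box[0] + box[2]) / 2 / frame_wh[0]
    cy = (box[1] + box[3]) / 2 / frame_wh[1]
    return int(cx * screen_wh[0]), int(cy * screen_wh[1])

print(iou((10, 10, 50, 50), (20, 20, 60, 60)))      # overlap quality
print(box_to_pointer((100, 80, 140, 120), (640, 480), (1920, 1080)))
```

A stable centroid is exactly what the abstract's precision criterion rewards: small frame-to-frame changes in the detected box translate into small pointer jitter.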


2020 · pp. 147592172094295
Author(s): Homin Song, Yongchao Yang

Subwavelength defect imaging using guided waves has been known to be a difficult task, mainly due to the diffraction limit and the dispersion of guided waves. In this article, we present a noncontact super-resolution guided wave array imaging approach based on deep learning to visualize subwavelength defects in plate-like structures. The proposed approach is a novel hierarchical multiscale imaging approach that combines two distinct fully convolutional networks. The first fully convolutional network, the global detection network, globally detects subwavelength defects in a raw low-resolution guided wave beamforming image. The subsequent second fully convolutional network, the local super-resolution network, then locally resolves subwavelength-scale fine structural details of the detected defects. We conduct a series of numerical simulations and laboratory-scale experiments, using a noncontact guided wave array enabled by a scanning laser Doppler vibrometer, on aluminum plates with various subwavelength defects. The results demonstrate that the proposed super-resolution guided wave array imaging approach not only locates subwavelength defects but also visualizes super-resolution fine structural details of these defects, thus enabling further estimation of the size and shape of the detected subwavelength defects. We discuss several key aspects of the performance of our approach, compare it with an existing super-resolution algorithm, and make recommendations for its successful implementation.
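The hierarchical two-network pipeline can be sketched as follows: a global detection network produces a defect-likelihood map over the low-resolution beamforming image, the flagged region is cropped with some padding, and a local super-resolution network is applied to the crop. All names, thresholds, and the single-region simplification are placeholders, not the authors' implementation.

```python
import torch

def hierarchical_imaging(beamform_img, global_net, local_net,
                         thresh=0.5, pad=8):
    """Two-stage sketch: detect globally, then super-resolve locally.

    beamform_img: (1, 1, H, W) low-resolution guided wave beamforming image.
    global_net:   callable returning a (1, 1, H, W) defect-likelihood map.
    local_net:    callable returning a super-resolved version of a patch.
    """
    heat = global_net(beamform_img)
    mask = heat[0, 0] > thresh                  # flag likely defect pixels
    ys, xs = torch.nonzero(mask, as_tuple=True)
    if len(ys) == 0:
        return []                               # no defect detected
    # Crop the flagged region with padding (clamped at the image border).
    y0 = max(int(ys.min()) - pad, 0); y1 = int(ys.max()) + pad
    x0 = max(int(xs.min()) - pad, 0); x1 = int(xs.max()) + pad
    patch = beamform_img[:, :, y0:y1, x0:x1]
    # Resolve fine structural detail only where a defect was found.
    return [(y0, x0, local_net(patch))]
```

Running the expensive super-resolution stage only on detected patches, rather than the whole image, is the efficiency rationale for the hierarchical design.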


Author(s): V. R. S. Mani

In this chapter, the author paints a comprehensive picture of the different deep learning models used in multi-modal image segmentation tasks. The chapter serves as an introduction for those new to the field, an overview for those working in it, and a reference for those searching for literature on a specific application. Methods are classified according to the different types of multi-modal images and the corresponding types of convolutional neural networks used in the segmentation task. The chapter starts with an introduction to CNN topology and then describes various models such as Hyper Dense Net, Organ Attention Net, UNet, VNet, Dilated Fully Convolutional Network, Transfer Learning, etc.


2018 · Vol 30 (7) · pp. 1775-1800
Author(s): Xiaopeng Guo, Rencan Nie, Jinde Cao, Dongming Zhou, Wenhua Qian

As the optical lenses of cameras always have a limited depth of field, captured images of the same scene are not all in focus. Multifocus image fusion is an efficient technology that can synthesize an all-in-focus image from several partially focused images. Previous methods have accomplished the fusion task in the spatial or transform domains; however, the fusion rules remain a problem in most of them. In this letter, from the perspective of focus region detection, we propose a novel multifocus image fusion method based on a fully convolutional network (FCN) learned from synthesized multifocus images. The primary novelty of this method is that the pixel-wise focus regions are detected by a trained FCN, and the entire image, not just image patches, is exploited to train the FCN. First, we synthesize 4500 pairs of multifocus images, by repeatedly applying a Gaussian filter to each image from PASCAL VOC 2012, to train the FCN. After that, a pair of source images is fed into the trained FCN, and two score maps indicating the focus property are generated. Next, an inverted score map is averaged with the other score map to produce an aggregative score map, which takes full advantage of the focus probabilities in the two score maps. We apply a fully connected conditional random field (CRF) to the aggregative score map to obtain and refine a binary decision map for the fusion task. Finally, we exploit a weighted strategy based on the refined decision map to produce the fused image. To demonstrate the performance of the proposed method, we compare its fused results with several state-of-the-art methods on both a gray data set and a color data set. Experimental results show that the proposed method achieves superior fusion performance in terms of both human visual quality and objective assessment.
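The score-map aggregation and decision steps can be illustrated with a few lines of NumPy. Note the deliberate simplifications: a plain 0.5 threshold stands in for the paper's fully connected CRF refinement, and the final weighting is a hard per-pixel selection rather than the paper's weighted strategy.

```python
import numpy as np

def fuse_multifocus(img_a, img_b, score_a, score_b):
    """Fuse two partially focused images from FCN focus score maps.

    Following the abstract: one score map is inverted and averaged with
    the other to form an aggregative score map; a simple threshold then
    yields the binary decision map (CRF refinement omitted).
    """
    aggregative = (score_a + (1.0 - score_b)) / 2.0     # high -> img_a sharp
    decision = (aggregative > 0.5).astype(np.float32)   # binary decision map
    if img_a.ndim == 3:                                 # broadcast over color
        decision = decision[..., None]
    return decision * img_a + (1.0 - decision) * img_b

h, w = 128, 128
img_a, img_b = np.random.rand(h, w, 3), np.random.rand(h, w, 3)
score_a, score_b = np.random.rand(h, w), np.random.rand(h, w)
fused = fuse_multifocus(img_a, img_b, score_a, score_b)
```

Averaging one map with the inversion of the other uses both networks' focus estimates at every pixel, so disagreements between the two score maps are smoothed rather than decided by a single map alone.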


Energies · 2021 · Vol 14 (2) · pp. 413
Author(s): Suryeom Jo, Changhyup Park, Dong-Woo Ryu, Seongin Ahn

This paper develops a reliable deep-learning framework to extract latent features from spatial properties and investigates adaptive surrogate estimation for sequestering CO2 in heterogeneous deep saline aquifers. Our deep-learning architecture includes a deep convolutional autoencoder (DCAE) and a fully convolutional network, not only to reduce computational costs but also to extract dimensionality-reduced features that conserve spatial characteristics. The workflow integrates two different spatial properties within a single convolutional system and achieves accurate reconstruction performance. This approach significantly reduces the number of parameters to 4.3% of the number originally required; e.g., the number of values needed for the three-dimensional spatial properties decreases from 44,460 to 1920. This successful dimensionality reduction is accomplished by the DCAE treating all inputs as image channels from the initial stage of learning and by using a fully convolutional network instead of fully connected layers. The DCAE reconstructs spatial parameters such as permeability and porosity while conserving their statistical values, i.e., their means and standard deviations, achieving R-squared values over 0.972 with a mean absolute percentage error of the mean values below 1.79%. The adaptive surrogate model, which uses the latent features extracted by the DCAE together with well operations and modeling parameters, accurately estimates CO2 sequestration performance, showing R-squared values over 0.892 for testing data not used in training and validation. The DCAE-based surrogate estimation exploits the reliable integration of various spatial data within the fully convolutional network and allows us to evaluate the flow behavior occurring in a subsurface domain.
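A minimal version of the channel-stacking idea is sketched below: permeability and porosity enter as two channels of one 3D volume, and a small fully convolutional autoencoder compresses them into a latent feature volume. The layer sizes, strides, and resulting compression ratio are illustrative assumptions and do not reproduce the paper's DCAE.

```python
import torch
import torch.nn as nn

class DCAESketch(nn.Module):
    """Fully convolutional 3D autoencoder that treats two spatial
    properties as channels of one volume (sketch only)."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv3d(2, 16, 3, stride=2, padding=1), nn.ReLU(),  # 1/2 size
            nn.Conv3d(16, 8, 3, stride=2, padding=1), nn.ReLU(),  # 1/4 size
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose3d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 2, 4, stride=2, padding=1),    # full size
        )

    def forward(self, x):
        z = self.encode(x)          # dimensionality-reduced latent features
        return self.decode(z), z

model = DCAESketch()
grid = torch.rand(1, 2, 16, 32, 32)   # (batch, properties, z, y, x)
recon, latent = model(grid)
print(grid.numel(), latent.numel())   # latent holds far fewer values
```

Keeping the bottleneck convolutional, rather than flattening into fully connected layers, is what lets the latent features retain the spatial arrangement that the surrogate model later consumes.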

