Revisiting Image Aesthetic Assessment via Self-Supervised Feature Learning

2020 ◽  
Vol 34 (04) ◽  
pp. 5709-5716 ◽  
Author(s):  
Kekai Sheng ◽  
Weiming Dong ◽  
Menglei Chai ◽  
Guohui Wang ◽  
Peng Zhou ◽  
...  

Visual aesthetic assessment has been an active research field for decades. Although the latest methods have achieved promising performance on benchmark datasets, they typically rely on a large number of manual annotations, including both aesthetic labels and related image attributes. In this paper, we revisit the problem of image aesthetic assessment from the perspective of self-supervised feature learning. Our motivation is that a suitable feature representation for image aesthetic assessment should be able to distinguish different expert-designed image manipulations, which are closely related to negative aesthetic effects. To this end, we design two novel pretext tasks that identify the types and parameters of editing operations applied to synthetic instances. The features learned from our pretext tasks are then fed to a one-layer linear classifier to evaluate performance in terms of binary aesthetic classification. We conduct extensive quantitative experiments on three benchmark datasets and demonstrate that our approach can faithfully extract aesthetics-aware features and outperform alternative pretext schemes. Moreover, we achieve results comparable to state-of-the-art supervised methods that use 10 million labels from ImageNet.
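
As a rough illustration of the pretext-task idea described above, the sketch below applies a randomly chosen editing operation with a randomly chosen strength to an image and trains a small network to recover the operation type and its (binned) parameter. The operation set, bin count, and network are placeholders, not the authors' expert-designed manipulations.

```python
# Hypothetical sketch of the pretext task: edit an image, then predict
# which operation was applied and how strong it was.
import torch
import torch.nn as nn
import torch.nn.functional as F

OPS = ["brightness", "contrast", "noise"]          # assumed operation set
BINS = 4                                           # strength bins per op

def apply_random_edit(img):
    """img: (3, H, W) tensor in [0, 1]; returns edited image and labels."""
    op = torch.randint(len(OPS), (1,)).item()
    b = torch.randint(BINS, (1,)).item()
    strength = (b + 1) / BINS                      # 0.25 .. 1.0
    if OPS[op] == "brightness":
        out = (img * (1.0 - 0.5 * strength)).clamp(0, 1)
    elif OPS[op] == "contrast":
        mean = img.mean()
        out = (mean + (img - mean) * (1.0 - 0.7 * strength)).clamp(0, 1)
    else:                                          # additive Gaussian noise
        out = (img + 0.2 * strength * torch.randn_like(img)).clamp(0, 1)
    return out, op, b

class PretextNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.op_head = nn.Linear(64, len(OPS))     # which operation
        self.param_head = nn.Linear(64, BINS)      # how strong

    def forward(self, x):
        f = self.backbone(x)
        return self.op_head(f), self.param_head(f)

# one illustrative training step
net = PretextNet()
img = torch.rand(3, 64, 64)
edited, op, b = apply_random_edit(img)
op_logits, param_logits = net(edited.unsqueeze(0))
loss = F.cross_entropy(op_logits, torch.tensor([op])) + \
       F.cross_entropy(param_logits, torch.tensor([b]))
loss.backward()
```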

Author(s):  
Xuanlu Xiang ◽  
Zhipeng Wang ◽  
Zhicheng Zhao ◽  
Fei Su

In this paper, aiming at two key problems of instance-level image retrieval, i.e., the distinctiveness of the image representation and the generalization ability of the model, we propose a novel deep architecture, the Multiple Saliency and Channel Sensitivity Network (MSCNet). Specifically, to obtain distinctive global descriptors, an attention-based multiple saliency learning module is first presented to highlight important details of the image, and then a simple but effective channel sensitivity module based on the Gram matrix is designed to boost channel discrimination and suppress redundant information. Additionally, in contrast to most existing feature aggregation methods, which employ pre-trained deep networks, MSCNet can be trained in two modes: the first is an unsupervised manner with an instance loss, and the other is a supervised manner that combines classification and ranking losses and relies on only very limited training data. Experimental results on several public benchmark datasets, i.e., Oxford buildings, Paris buildings and Holidays, indicate that the proposed MSCNet outperforms state-of-the-art unsupervised and supervised methods.
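
A minimal sketch of what a Gram-matrix-based channel sensitivity module could look like is given below; the exact formulation in MSCNet may differ, and the re-weighting rule here is an assumption chosen only to illustrate boosting channel discrimination while suppressing redundant channels.

```python
# Hypothetical channel sensitivity module: channel-wise correlations
# (a Gram matrix over channels) are used to re-weight the feature map.
import torch
import torch.nn as nn

class ChannelSensitivity(nn.Module):
    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        feat = x.view(b, c, h * w)
        gram = torch.bmm(feat, feat.transpose(1, 2)) / (h * w)  # (B, C, C)
        # a channel correlated with many others carries redundant energy,
        # so it is down-weighted; a lightly correlated channel is kept.
        redundancy = gram.mean(dim=2)           # (B, C)
        weights = torch.softmax(-redundancy, dim=1) * c
        return x * weights.view(b, c, 1, 1)

x = torch.randn(2, 256, 14, 14)
print(ChannelSensitivity()(x).shape)            # torch.Size([2, 256, 14, 14])
```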


Author(s):  
Yan Bai ◽  
Yihang Lou ◽  
Yongxing Dai ◽  
Jun Liu ◽  
Ziqian Chen ◽  
...  

Vehicle Re-Identification (ReID) has attracted many research efforts due to its great significance to public security. In vehicle ReID, we aim to learn features that are powerful in discriminating the subtle differences between visually similar vehicles and that are also robust against different orientations of the same vehicle. However, these two characteristics are hard to encapsulate into a single feature representation simultaneously with unified supervision. Here we propose a Disentangled Feature Learning Network (DFLNet) to learn orientation-specific and common features concurrently, which are discriminative at the detail level and invariant to orientation, respectively. Moreover, to effectively use these two types of features for ReID, we further design a feature metric alignment scheme to ensure the consistency of the metric scales. The experiments show the effectiveness of our method, which achieves state-of-the-art performance on three challenging datasets.
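
The sketch below illustrates one plausible way to realize orientation-specific and common branches over a shared backbone feature, with L2 normalization standing in for the paper's feature metric alignment scheme; the branch layout, orientation binning, and embedding sizes are assumptions, not DFLNet's actual architecture.

```python
# Hypothetical two-branch head: an orientation-specific projection selected
# by the orientation label, plus a shared "common" projection.
import torch
import torch.nn as nn

class TwoBranchHead(nn.Module):
    def __init__(self, feat_dim=512, num_orients=8, emb_dim=256):
        super().__init__()
        self.orient_heads = nn.ModuleList(
            [nn.Linear(feat_dim, emb_dim) for _ in range(num_orients)])
        self.common_head = nn.Linear(feat_dim, emb_dim)

    def forward(self, feat, orient):            # feat: (B, D), orient: (B,)
        specific = torch.stack(
            [self.orient_heads[o](f) for f, o in zip(feat, orient.tolist())])
        common = self.common_head(feat)
        # crude stand-in for metric alignment: L2-normalize both embeddings
        # so they live on the same scale before concatenation.
        specific = nn.functional.normalize(specific, dim=1)
        common = nn.functional.normalize(common, dim=1)
        return torch.cat([specific, common], dim=1)

head = TwoBranchHead()
out = head(torch.randn(4, 512), torch.tensor([0, 3, 3, 7]))
print(out.shape)                                # torch.Size([4, 512])
```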


Sensors ◽  
2020 ◽  
Vol 20 (14) ◽  
pp. 4021 ◽  
Author(s):  
Mustansar Fiaz ◽  
Arif Mahmood ◽  
Soon Ki Jung

We propose to improve visual object tracking by introducing a soft-mask-based low-level feature fusion technique, further strengthened by integrating channel and spatial attention mechanisms. The proposed approach is integrated within a Siamese framework to demonstrate its effectiveness for visual object tracking. The soft mask gives more importance to the target regions than to other regions, enabling an effective target feature representation and increasing discriminative power. The low-level feature fusion improves the tracker's robustness against distractors. Channel attention identifies the more discriminative channels for better target representation, while spatial attention complements the soft mask to better localize the target objects in challenging tracking scenarios. We evaluated our approach on five publicly available benchmark datasets and performed extensive comparisons with 39 state-of-the-art tracking algorithms. The proposed tracker demonstrates excellent performance compared to existing state-of-the-art trackers.
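
The following is a hedged sketch of combining a soft target mask with channel and spatial attention on a low-level feature map; the module structure (SE-style channel attention and a single spatial convolution) is assumed for illustration and is not claimed to match the tracker's implementation.

```python
# Illustrative mask + attention fusion over a feature map.
import torch
import torch.nn as nn

class MaskedAttentionFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.channel_fc = nn.Sequential(         # SE-style channel attention
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

    def forward(self, x, soft_mask):             # x: (B,C,H,W); mask: (B,1,H,W)
        x = x * (1.0 + soft_mask)                # emphasize target regions
        ca = self.channel_fc(x.mean(dim=(2, 3))) # (B, C) channel weights
        x = x * ca.unsqueeze(-1).unsqueeze(-1)
        sa = torch.sigmoid(self.spatial_conv(x)) # (B, 1, H, W) spatial weights
        return x * sa

m = MaskedAttentionFusion(64)
out = m(torch.randn(1, 64, 32, 32), torch.rand(1, 1, 32, 32))
print(out.shape)                                 # torch.Size([1, 64, 32, 32])
```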


2020 ◽  
Vol 34 (07) ◽  
pp. 11173-11180 ◽  
Author(s):  
Xin Jin ◽  
Cuiling Lan ◽  
Wenjun Zeng ◽  
Guoqiang Wei ◽  
Zhibo Chen

Person re-identification (reID) aims to match person images in order to retrieve those with the same identity. This is a challenging task, as the images to be matched are generally semantically misaligned due to the diversity of human poses and capture viewpoints, incompleteness of the visible bodies (due to occlusion), etc. In this paper, we propose a framework that drives the reID network to learn a semantics-aligned feature representation through delicate supervision designs. Specifically, we build a Semantics Aligning Network (SAN) that consists of a base network as encoder (SA-Enc) for reID and a decoder (SA-Dec) for reconstructing/regressing the densely semantics-aligned full texture image. We jointly train the SAN under the supervision of person re-identification and aligned texture generation. Moreover, at the decoder, besides the reconstruction loss, we add Triplet ReID constraints over the feature maps as perceptual losses. The decoder is discarded at inference, so our scheme is computationally efficient. Ablation studies demonstrate the effectiveness of our design. We achieve state-of-the-art performance on the benchmark datasets CUHK03, Market1501 and MSMT17, and on the partial person reID dataset Partial REID.
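
A minimal sketch of such a joint objective is shown below: an identity classification loss on the encoder output, an L1 reconstruction loss at the decoder, and a triplet constraint on decoder feature maps acting as a perceptual loss. The loss weights, margin, and tensor shapes are placeholders.

```python
# Illustrative joint loss combining identity, reconstruction, and triplet terms.
import torch
import torch.nn.functional as F

def san_loss(id_logits, labels, recon, target_texture,
             anc_feat, pos_feat, neg_feat,
             w_recon=1.0, w_triplet=1.0, margin=0.3):
    id_loss = F.cross_entropy(id_logits, labels)
    recon_loss = F.l1_loss(recon, target_texture)
    triplet = F.triplet_margin_loss(
        anc_feat.flatten(1), pos_feat.flatten(1), neg_feat.flatten(1),
        margin=margin)
    return id_loss + w_recon * recon_loss + w_triplet * triplet

# shapes are illustrative only
loss = san_loss(torch.randn(8, 751, requires_grad=True),
                torch.randint(751, (8,)),
                torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64),
                torch.randn(8, 256, 8, 8), torch.randn(8, 256, 8, 8),
                torch.randn(8, 256, 8, 8))
loss.backward()
```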


Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4129 ◽  
Author(s):  
Qing Lei ◽  
Ji-Xiang Du ◽  
Hong-Bo Zhang ◽  
Shuang Ye ◽  
Duan-Sheng Chen

The field of human activity analysis has recently begun to diversify. Many researchers have taken great interest in developing action recognition or action prediction methods. Research on human action evaluation differs in that it aims to design computational models and evaluation approaches for automatically assessing the quality of human actions. This line of study has become popular because of its rapidly emerging real-world applications, such as physical rehabilitation, assisted living for elderly people, skill training on self-learning platforms, and sports activity scoring. This paper presents a comprehensive survey of approaches and techniques in action evaluation research, including motion detection and preprocessing using skeleton data, handcrafted feature representation methods, and deep learning-based feature representation methods. The benchmark datasets from this research field and some evaluation criteria employed to validate algorithm performance are introduced. Finally, the authors present several promising directions for future studies.


2020 ◽  
Vol 10 (12) ◽  
pp. 4386 ◽  
Author(s):  
Sandra Rizkallah ◽  
Amir F. Atiya ◽  
Samir Shaheen

Embedding the words of a dictionary as vectors in a space has become an active research field due to its many uses in natural language processing applications. Distances between the vectors should reflect the relatedness of the corresponding words. The problem with existing word embedding methods is that they often fail to distinguish between synonymous, antonymous, and unrelated word pairs. Meanwhile, polarity detection is crucial for applications such as sentiment analysis. In this work we propose an embedding approach designed to capture this polarity issue. The approach is based on embedding the word vectors onto a sphere, whereby the dot product between any two vectors represents their similarity. Vectors corresponding to synonymous words are close to each other on the sphere, while a word and its antonym lie at opposite poles of the sphere. The vectors are designed by a simple relaxation algorithm. The proposed word embedding successfully distinguishes between synonyms, antonyms, and unrelated word pairs. It achieves results that are better than those of some state-of-the-art techniques and competes well with the others.
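
A toy relaxation along these lines is sketched below: synonym vectors are pulled toward each other, a word is pulled toward the negation of its antonym's vector, and all vectors are re-projected onto the unit sphere after every step. The update rule, step size, and word lists are illustrative and not the paper's exact algorithm.

```python
# Toy sphere-embedding relaxation for synonym/antonym pairs.
import numpy as np

rng = np.random.default_rng(0)
words = ["good", "great", "bad", "table"]
idx = {w: i for i, w in enumerate(words)}
synonyms = [("good", "great")]
antonyms = [("good", "bad")]

dim, steps, lr = 16, 200, 0.1
V = rng.normal(size=(len(words), dim))
V /= np.linalg.norm(V, axis=1, keepdims=True)

for _ in range(steps):
    for a, b in synonyms:                     # move toward each other
        i, j = idx[a], idx[b]
        V[i] += lr * (V[j] - V[i]); V[j] += lr * (V[i] - V[j])
    for a, b in antonyms:                     # move toward opposite poles
        i, j = idx[a], idx[b]
        V[i] += lr * (-V[j] - V[i]); V[j] += lr * (-V[i] - V[j])
    V /= np.linalg.norm(V, axis=1, keepdims=True)   # back onto the sphere

print(round(V[idx["good"]] @ V[idx["great"]], 2))   # close to +1
print(round(V[idx["good"]] @ V[idx["bad"]], 2))     # close to -1
```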


Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2129
Author(s):  
Zhiqiang Pan ◽  
Honghui Chen

Knowledge-enhanced recommendation (KER) aims to integrate the knowledge graph (KG) into collaborative filtering (CF) to alleviate the sparsity and cold-start problems. State-of-the-art graph neural network (GNN)-based methods mainly focus on exploiting the connectivity between entities in the knowledge graph, while neglecting the interaction relations between items reflected in the user-item interactions. Moreover, the widely adopted BPR loss used for model optimization fails to provide sufficient supervision for learning discriminative representations of users and items. To address these issues, we propose the collaborative knowledge-enhanced recommendation (CKER) method. Specifically, CKER introduces a collaborative graph convolution network (CGCN) to learn user and item representations from the connections between items in the constructed interaction graph and the connectivity between entities in the knowledge graph. Moreover, we introduce self-supervised learning to maximize the mutual information between the interaction-aware and knowledge-aware user preferences by deriving additional supervision signals. We conduct comprehensive experiments on two benchmark datasets, namely Amazon-Book and Last-FM, and the experimental results show that CKER outperforms state-of-the-art baselines in terms of recall and NDCG for knowledge-enhanced recommendation.
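
The self-supervised term can be sketched as an InfoNCE-style loss that treats a user's interaction-aware and knowledge-aware representations as two views of the same user; the temperature and in-batch negative sampling below are assumptions, not necessarily CKER's exact objective.

```python
# Illustrative InfoNCE-style mutual-information maximization between the
# interaction-aware and knowledge-aware user embeddings.
import torch
import torch.nn.functional as F

def info_nce(u_inter, u_know, temperature=0.2):
    """u_inter, u_know: (B, D) user embeddings from the two views."""
    u_inter = F.normalize(u_inter, dim=1)
    u_know = F.normalize(u_know, dim=1)
    logits = u_inter @ u_know.t() / temperature      # (B, B) similarities
    targets = torch.arange(u_inter.size(0))          # positives on diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(32, 64, requires_grad=True), torch.randn(32, 64))
loss.backward()
```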


2019 ◽  
Vol 11 (24) ◽  
pp. 2908 ◽  
Author(s):  
Yakoub Bazi ◽  
Mohamad M. Al Rahhal ◽  
Haikel Alhichri ◽  
Naif Alajlan

The current literature on remote sensing (RS) scene classification shows that state-of-the-art results are achieved with feature extraction methods, where convolutional neural networks (CNNs) (mostly VGG16, with 138.36 M parameters) are used as feature extractors and then simple to complex handcrafted modules are added for additional feature learning and classification, thus returning to feature engineering. In this paper, we revisit the fine-tuning approach for deeper networks (GoogLeNet and beyond) and show that it has not been well exploited due to the negative effect of the vanishing gradient problem encountered when transferring knowledge to small datasets. The aim of this work is twofold. First, we provide best practices for fine-tuning pre-trained CNNs using the root-mean-square propagation (RMSprop) method. Second, we propose a simple yet effective solution for tackling the vanishing gradient problem by injecting gradients at an earlier layer of the network using an auxiliary classification loss function. We then fine-tune the resulting regularized network by optimizing both the primary and auxiliary losses. As pre-trained CNNs, we consider inception-based networks and EfficientNets with small weights: GoogLeNet (7 M) and EfficientNet-B0 (5.3 M), and their deeper versions Inception-v3 (23.83 M) and EfficientNet-B3 (12 M), respectively. The former networks have been used previously in the context of RS and yielded low accuracies compared to VGG16, while the latter are new state-of-the-art models. Extensive experimental results on several benchmark datasets clearly reveal that if fine-tuning is done in an appropriate way, it can set new state-of-the-art results at low computational cost.
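
The auxiliary-loss idea can be sketched as follows: a second classifier is attached to an intermediate layer so that gradients are injected earlier in the network, and the weighted sum of the primary and auxiliary losses is optimized with RMSprop. The toy backbone and the 0.3 auxiliary weight are placeholders, not the networks or settings used in the paper.

```python
# Illustrative auxiliary classification head for earlier gradient injection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.aux_head = nn.Sequential(               # attached to stage1
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))
        self.main_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes))

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        return self.main_head(f2), self.aux_head(f1)

net = AuxNet()
opt = torch.optim.RMSprop(net.parameters(), lr=1e-4)
x, y = torch.rand(8, 3, 64, 64), torch.randint(10, (8,))
main_logits, aux_logits = net(x)
loss = F.cross_entropy(main_logits, y) + 0.3 * F.cross_entropy(aux_logits, y)
opt.zero_grad(); loss.backward(); opt.step()
```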


2012 ◽  
Vol 2012 ◽  
pp. 1-11 ◽  
Author(s):  
Christian de Schryver ◽  
Daniel Schmidt ◽  
Norbert Wehn ◽  
Elke Korn ◽  
Henning Marxen ◽  
...  

Nonuniform random numbers are key to many technical applications, and designing efficient hardware implementations of nonuniform random number generators is a very active research field. However, most state-of-the-art architectures are either tailored to specific distributions or use up a lot of hardware resources. At ReConFig 2010, we presented a new design that saves up to 48% of the area compared to state-of-the-art inversion-based implementations and is usable for arbitrary distributions and precisions. In this paper, we introduce a more flexible version together with a refined segmentation scheme that further reduces the approximation error significantly. We provide a free software tool that allows users to implement their own distributions easily, and we have tested our random number generator thoroughly through statistical analysis and two application tests.
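
A software analogue of the inversion method with a segmented lookup table is sketched below: the inverse CDF is tabulated on segments and evaluated by linear interpolation. The uniform segmentation here is deliberately simple; the refined hardware segmentation scheme in the paper is more elaborate.

```python
# Inversion method with a segmented, linearly interpolated inverse-CDF table.
import numpy as np

def make_segmented_icdf(icdf, num_segments=256, eps=1e-6):
    """Tabulate an inverse CDF on [eps, 1-eps] and return an evaluator."""
    grid = np.linspace(eps, 1.0 - eps, num_segments + 1)
    table = icdf(grid)
    def sample(u):
        u = np.clip(u, eps, 1.0 - eps)
        pos = (u - eps) / (1.0 - 2 * eps) * num_segments
        i = np.minimum(pos.astype(int), num_segments - 1)
        frac = pos - i
        return table[i] * (1 - frac) + table[i + 1] * frac
    return sample

# example target: exponential distribution, icdf(u) = -ln(1 - u)
sampler = make_segmented_icdf(lambda u: -np.log1p(-u))
u = np.random.default_rng(1).random(100_000)
x = sampler(u)
print(x.mean())        # close to 1.0, the exponential mean
```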


2019 ◽  
Author(s):  
Hansheng Xue ◽  
Jiajie Peng ◽  
Xuequn Shang

Motivation: The emergence of abundant biological networks, which benefit from the development of advanced high-throughput techniques, contributes to describing and modeling complex internal interactions among biological entities such as genes and proteins. Multiple networks provide rich information for inferring the function of genes or proteins. To extract functional patterns of genes from multiple heterogeneous networks, network embedding-based methods, which aim to capture non-linear and low-dimensional feature representations based on network biology, have recently achieved remarkable performance in gene function prediction. However, existing methods largely do not consider the information shared among different networks during the feature learning process. We therefore propose a novel multi-network embedding-based function prediction method, DeepMNE-CNN, built on a semi-supervised autoencoder and a feature convolutional neural network, which captures the complex topological structures of multiple networks and takes the correlation among them into account.

Results: We design a novel semi-supervised autoencoder method to integrate multiple networks and generate a low-dimensional feature representation. We then use a convolutional neural network on the integrated feature embedding to annotate unlabeled gene functions. We test our method on both yeast and human datasets and compare it with four state-of-the-art methods. The results demonstrate the superior performance of our method over the four state-of-the-art algorithms. From further explorations, we find that both the semi-supervised autoencoder-based multi-network integration and the CNN-based feature learning contribute to the task of function prediction.

Availability: DeepMNE-CNN is freely available at https://github.com/xuehansheng/DeepMNE-CNN
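
Purely as an illustration of the integrate-then-classify pipeline, the sketch below fuses rows of two network adjacency matrices with a plain autoencoder and feeds the resulting embedding to a small 1-D CNN classifier; the semi-supervised constraints and the actual DeepMNE-CNN architecture are omitted, and all dimensions are placeholders.

```python
# Illustrative multi-network fusion (plain autoencoder) + CNN annotation head.
import torch
import torch.nn as nn

n_genes, emb_dim, n_functions = 500, 64, 10

encoder = nn.Sequential(                       # encoder half of the AE
    nn.Linear(2 * n_genes, 256), nn.ReLU(),
    nn.Linear(256, emb_dim))
decoder = nn.Sequential(
    nn.Linear(emb_dim, 256), nn.ReLU(),
    nn.Linear(256, 2 * n_genes))
cnn_head = nn.Sequential(                      # 1-D CNN over the embedding
    nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * emb_dim, n_functions))

adj_a = torch.rand(32, n_genes)                # rows of two network matrices
adj_b = torch.rand(32, n_genes)
z = encoder(torch.cat([adj_a, adj_b], dim=1))  # integrated representation
recon_loss = nn.functional.mse_loss(decoder(z), torch.cat([adj_a, adj_b], 1))
logits = cnn_head(z.unsqueeze(1))              # (32, n_functions)
```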

