Multiple Saliency and Channel Sensitivity Network for Aggregated Convolutional Feature

Author(s):  
Xuanlu Xiang ◽  
Zhipeng Wang ◽  
Zhicheng Zhao ◽  
Fei Su

In this paper, aiming at two key problems in instance-level image retrieval, i.e., the distinctiveness of the image representation and the generalization ability of the model, we propose a novel deep architecture, the Multiple Saliency and Channel Sensitivity Network (MSCNet). Specifically, to obtain distinctive global descriptors, attention-based multiple saliency learning is first presented to highlight important details of the image, and then a simple but effective channel sensitivity module based on the Gram matrix is designed to boost channel discrimination and suppress redundant information. Additionally, in contrast to most existing feature aggregation methods, which employ pre-trained deep networks, MSCNet can be trained in two modes: the first is unsupervised, using an instance loss; the second is supervised, combining classification and ranking losses while relying on only very limited training data. Experimental results on several public benchmark datasets, i.e., Oxford Buildings, Paris Buildings and Holidays, indicate that the proposed MSCNet outperforms state-of-the-art unsupervised and supervised methods.
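
As a rough illustration of the channel sensitivity idea, the sketch below reweights feature channels by how strongly they correlate with the others via a Gram matrix; the module name, the gating function and all details are assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

class ChannelSensitivity(nn.Module):
    """Hypothetical Gram-matrix channel gate: channels that correlate
    strongly with many others are treated as redundant and down-weighted."""
    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        f = x.view(b, c, h * w)                 # flatten spatial dimensions
        f = nn.functional.normalize(f, dim=2)   # unit-norm channel vectors
        gram = torch.bmm(f, f.transpose(1, 2))  # (B, C, C) channel correlations
        # redundancy score: mean absolute correlation with the other channels
        redundancy = (gram.abs().sum(dim=2) - 1.0) / (c - 1)
        gate = torch.sigmoid(-redundancy)       # suppress redundant channels
        return x * gate.view(b, c, 1, 1)
```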

2021 ◽  
Vol 15 ◽  
pp. 174830262110449
Author(s):  
Kai-Jun Hu ◽  
He-Feng Yin ◽  
Jun Sun

During the past decade, representation-based classification methods have received considerable attention in the pattern recognition community. The recently proposed non-negative representation-based classifier (NRC) achieves excellent recognition results in diverse pattern classification tasks. Unfortunately, the NRC does not fully exploit the discriminative information of the training data, which undermines its classification performance in practical applications. To address this problem, we introduce a decorrelation regularizer into the formulation of the NRC and propose a discriminative non-negative representation-based classifier (DNRC) for pattern classification. The decorrelation regularizer reduces the correlation among the representation results of different classes, thus promoting competition among them. Experimental results on benchmark datasets validate the efficacy of the proposed DNRC, which can outperform some state-of-the-art deep-learning-based methods. The source code of the proposed DNRC is available at https://github.com/yinhefeng/DNRC.
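
To make the idea concrete, here is a minimal sketch of a non-negative representation with a decorrelation term, solved by projected gradient descent; the exact regularizer and solver in the paper (and in the linked repository) may differ:

```python
import numpy as np

def dnrc_solve(X, y, labels, lam=0.1, lr=0.01, iters=500):
    """Sketch: min_{c >= 0} ||y - X c||^2 + lam * sum_{i != j} <X_i c_i, X_j c_j>,
    where X_k collects the training columns of class k. Illustrative only."""
    c = np.zeros(X.shape[1])
    classes = np.unique(labels)
    for _ in range(iters):
        grad = -2 * X.T @ (y - X @ c)          # gradient of the residual term
        total = X @ c                          # equals the sum over classes of X_k c_k
        for k in classes:
            idx = labels == k
            # gradient of the cross terms w.r.t. c_k: 2*lam * X_k^T (sum_{j != k} X_j c_j)
            grad[idx] += 2 * lam * X[:, idx].T @ (total - X[:, idx] @ c[idx])
        c = np.maximum(c - lr * grad, 0.0)     # project onto the non-negative orthant
    return c
```

A query y would then be assigned to the class k with the smallest class-wise residual ||y - X_k c_k||, as is standard for representation-based classifiers.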


Author(s):  
Minh Pham ◽  
Craig A. Knoblock ◽  
Muhao Chen ◽  
Binh Vu ◽  
Jay Pujara

Error detection is one of the most important steps in data cleaning and usually requires extensive human interaction to ensure quality. Existing supervised methods for error detection require a significant amount of training data, while unsupervised methods rely on fixed inductive biases, which are usually hard to generalize, to solve the problem. In this paper, we present SPADE, a novel semi-supervised probabilistic approach for error detection. SPADE introduces a novel probabilistic active learning model, in which the system suggests examples to be labeled based on the agreement between user labels and indicative signals, which are designed to capture potential errors. SPADE uses a two-phase data augmentation process to enrich a dataset before training a deep learning classifier to detect unlabeled errors. In our evaluation, SPADE achieves an average F1-score of 0.91 over five datasets and yields a 10% improvement over state-of-the-art systems.
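
As a toy illustration of signal-driven query selection (not SPADE's actual probabilistic model), one might rank unlabeled examples by how much the indicative signals disagree about them:

```python
import numpy as np

def select_queries(signals, labeled_idx, k=10):
    """signals: (n_examples, n_signals) boolean matrix, each column an
    indicative signal flagging a potential error. Examples whose signals
    disagree most are the most informative to send to the user."""
    votes = signals.mean(axis=1)                 # fraction of signals firing
    uncertainty = 1.0 - np.abs(2 * votes - 1.0)  # maximal when votes are split
    uncertainty[list(labeled_idx)] = -1.0        # never re-query labeled rows
    return np.argsort(-uncertainty)[:k]          # top-k most ambiguous examples
```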


2020 ◽  
Vol 34 (04) ◽  
pp. 5709-5716 ◽  
Author(s):  
Kekai Sheng ◽  
Weiming Dong ◽  
Menglei Chai ◽  
Guohui Wang ◽  
Peng Zhou ◽  
...  

Visual aesthetic assessment has been an active research field for decades. Although the latest methods have achieved promising performance on benchmark datasets, they typically rely on a large number of manual annotations, including both aesthetic labels and related image attributes. In this paper, we revisit the problem of image aesthetic assessment from the self-supervised feature learning perspective. Our motivation is that a suitable feature representation for image aesthetic assessment should be able to distinguish different expert-designed image manipulations, which have close relationships with negative aesthetic effects. To this end, we design two novel pretext tasks to identify the types and parameters of editing operations applied to synthetic instances. The features from our pretext tasks are then fed to a one-layer linear classifier to evaluate performance in terms of binary aesthetic classification. We conduct extensive quantitative experiments on three benchmark datasets and demonstrate that our approach can faithfully extract aesthetics-aware features and outperform alternative pretext schemes. Moreover, we achieve results comparable to state-of-the-art supervised methods that use 10 million labels from ImageNet.
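
A minimal sketch of how such pretext samples could be synthesized, assuming a small pool of torchvision editing operations; the paper's actual operations and parameter grids are not reproduced here:

```python
import random
import torchvision.transforms.functional as TF

# Hypothetical pool of aesthetics-degrading edits, indexed by operation id.
OPS = {
    0: lambda img, p: TF.gaussian_blur(img, kernel_size=2 * p + 1),
    1: lambda img, p: TF.adjust_brightness(img, 0.5 + 0.25 * p),
    2: lambda img, p: TF.adjust_saturation(img, 0.5 + 0.25 * p),
}

def make_pretext_sample(img):
    """Apply a random editing operation; the pretext labels are the
    operation type and its discretized parameter level."""
    op_id = random.randrange(len(OPS))
    level = random.randrange(4)
    return OPS[op_id](img, level), op_id, level
```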


Author(s):  
Shaobo Min ◽  
Xuejin Chen ◽  
Zheng-Jun Zha ◽  
Feng Wu ◽  
Yongdong Zhang

Learning-based methods suffer from a scarcity of clean annotations, especially in biomedical segmentation. Although many semi-supervised methods have been proposed to provide extra training data, automatically generated labels are usually too noisy to retrain models effectively. In this paper, we propose a Two-Stream Mutual Attention Network (TSMAN) that weakens the influence of back-propagated gradients caused by incorrect labels, thereby rendering the network robust to unclean data. The proposed TSMAN consists of two sub-networks that are connected by three types of attention models in different layers. The goal of each attention model is to indicate potentially incorrect gradients in a certain layer for both sub-networks by analyzing their inferred features on the same input. To achieve this, the attention models are designed based on a propagation analysis of noisy gradients at different layers, which allows them to effectively discover incorrect labels and weaken their influence during parameter updating. By exchanging multi-level features within the two-stream architecture, the effect of noisy labels in each sub-network is reduced through the attenuation of noisy gradients. Furthermore, a hierarchical distillation scheme is developed to provide reliable pseudo labels for unlabeled data, which further boosts the performance of TSMAN. Experiments on the HVSMR 2016 and BRATS 2015 benchmarks demonstrate that our semi-supervised learning framework surpasses state-of-the-art fully supervised results.
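
One way to picture such an attention model is a gate that compares the two streams' features at a layer and attenuates positions where they disagree; this sketch is an illustrative stand-in, not TSMAN's actual design:

```python
import torch
import torch.nn as nn

class MutualAttentionGate(nn.Module):
    """Illustrative gate between two streams: spatial positions where the
    two sub-networks' features disagree are assumed to carry noisy
    gradients and are attenuated in both streams."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, fa, fb):                  # features from the two streams
        gate = torch.sigmoid(self.score(torch.cat([fa, fb], dim=1)))
        return fa * gate, fb * gate             # shared attenuation mask
```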


Author(s):  
Y. Cao ◽  
M. Scaioni

Abstract. In recent research, fully supervised Deep Learning (DL) techniques and large amounts of pointwise labels have been employed to train segmentation networks for buildings' point clouds. However, finely labelled buildings' point clouds are hard to find, and manually annotating pointwise labels is time-consuming and expensive. Consequently, the application of fully supervised DL to the semantic segmentation of buildings' point clouds at the LoD3 level is severely limited. To address this issue, we propose a novel label-efficient DL network that obtains per-point semantic labels of LoD3 buildings' point clouds with limited supervision. It consists of two steps. The first step (Autoencoder, AE) is composed of a Dynamic Graph Convolutional Neural Network-based encoder and a folding-based decoder, designed to extract discriminative global and local features from input point clouds by reconstructing them without any labels. The second step is semantic segmentation: supplying a small amount of task-specific supervision, a segmentation network is trained to semantically segment the encoded features acquired from the pre-trained AE. We evaluate our approach on the ArCH dataset. Compared to fully supervised DL methods, our model achieves state-of-the-art results on unseen scenes while using only 10% of the labelled training data required by fully supervised methods.
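
A skeleton of this two-step recipe might look as follows; the encoder, decoder and segmentation head are placeholder modules (a DGCNN-style encoder and folding-based decoder in the paper), and freezing the encoder in the second step is an assumption:

```python
import torch

def chamfer_distance(a, b):
    # symmetric nearest-neighbour distance between point sets of shape (B, N, 3)
    d = torch.cdist(a, b)                              # (B, N, N) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

def train_label_efficient(encoder, decoder, seg_head, unlabeled, labeled, epochs=10):
    # Step 1: self-supervised reconstruction on unlabeled point clouds.
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
    for _ in range(epochs):
        for pts in unlabeled:                          # pts: (B, N, 3)
            loss = chamfer_distance(decoder(encoder(pts)), pts)
            opt.zero_grad(); loss.backward(); opt.step()

    # Step 2: fit a segmentation head on encoded features with ~10% of the labels.
    encoder.requires_grad_(False)                      # assumed frozen; fine-tuning is another option
    opt = torch.optim.Adam(seg_head.parameters())
    for _ in range(epochs):
        for pts, y in labeled:                         # y: (B, N) per-point class ids
            logits = seg_head(encoder(pts))            # (B, N, num_classes)
            loss = torch.nn.functional.cross_entropy(
                logits.reshape(-1, logits.shape[-1]), y.reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()
```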


Author(s):  
Zilu Guo ◽  
Zhongqiang Huang ◽  
Kenny Q. Zhu ◽  
Guandan Chen ◽  
Kaibo Zhang ◽  
...  

Paraphrase generation plays a key role in NLP tasks such as question answering, machine translation, and information retrieval. In this paper, we propose a novel framework for paraphrase generation that simultaneously decodes the output sentence using a pretrained wordset-to-sequence model and a round-trip translation model. We evaluate this framework on Quora, WikiAnswers, MSCOCO and Twitter, and show its advantage over previous state-of-the-art unsupervised and distantly supervised methods by significant margins on all datasets. For Quora and WikiAnswers, our framework even performs better than some strongly supervised methods with domain adaptation. Further, we show that the generated paraphrases can be used to augment the training data for machine translation, yielding substantial improvements.
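
A simplified picture of such joint decoding is to fuse the two models' next-token distributions at each step, e.g. as a weighted product of experts; the paper's actual decoding scheme is more elaborate than this sketch:

```python
import torch

def joint_decode_step(logits_ws2s, logits_rtt, alpha=0.5):
    """Fuse the wordset-to-sequence and round-trip translation decoders at
    one time step over a shared vocabulary (greedy pick here; a beam
    search would be used in practice). alpha is an assumed mixing weight."""
    logp = alpha * torch.log_softmax(logits_ws2s, dim=-1) \
         + (1 - alpha) * torch.log_softmax(logits_rtt, dim=-1)
    return logp.argmax(dim=-1)
```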


Author(s):  
Kuang-Jui Hsu ◽  
Yen-Yu Lin ◽  
Yung-Yu Chuang

Object co-segmentation aims to segment the common objects in a set of images. This paper presents a CNN-based method that is unsupervised and end-to-end trainable to better solve this task. Our method is unsupervised in the sense that it does not require any training data in the form of object masks, but merely a set of images jointly covering objects of a specific class. Our method comprises two collaborative CNN modules, a feature extractor and a co-attention map generator. The former extracts the features of the estimated objects and backgrounds, and is trained with the proposed co-attention loss, which minimizes inter-image object discrepancy while maximizing intra-image figure-ground separation. The latter learns to generate co-attention maps by which the estimated figure-ground segmentation better fits the former module. Besides the co-attention loss, a mask loss is developed to retain whole objects and remove noise. Experiments show that our method achieves superior results, even outperforming state-of-the-art supervised methods.
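
A minimal sketch of a loss with these two terms, assuming features and co-attention maps are already computed (the exact form in the paper may differ):

```python
import torch

def co_attention_loss(feats, maps, eps=1e-6):
    """feats: (B, C, H, W) image features; maps: (B, 1, H, W) co-attention
    maps in [0, 1]. Masked average pooling yields object and background
    descriptors; objects should agree across the batch (B >= 2) while each
    image's object and background should separate."""
    w = maps / (maps.sum(dim=(2, 3), keepdim=True) + eps)
    obj = (feats * w).sum(dim=(2, 3))                    # (B, C) object vectors
    bw = (1 - maps) / ((1 - maps).sum(dim=(2, 3), keepdim=True) + eps)
    bg = (feats * bw).sum(dim=(2, 3))                    # (B, C) background vectors
    obj = torch.nn.functional.normalize(obj, dim=1)
    bg = torch.nn.functional.normalize(bg, dim=1)
    inter = torch.pdist(obj).mean()                      # inter-image object discrepancy
    intra = torch.cosine_similarity(obj, bg, dim=1).mean()
    return inter + intra   # minimize discrepancy, maximize figure-ground separation
```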


Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3780 ◽  
Author(s):  
Mustansar Fiaz ◽  
Arif Mahmood ◽  
Ki Yeol Baek ◽  
Sehar Shahzad Farooq ◽  
Soon Ki Jung

CNN-based trackers, especially those based on Siamese networks, have recently attracted considerable attention because of their relatively good performance and low computational cost. For many Siamese trackers, learning a generic object model from a large-scale dataset is still a challenging task. In the current study, we introduce input noise as regularization in the training data to improve the generalization of the learned model. We propose an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker, which exhibits improved generalization compared to current state-of-the-art trackers. In particular, we exploit offline learning by introducing additive noise for input data augmentation to mitigate overfitting. We propose feature fusion from the noisy and clean input channels, which improves target localization. Channel attention integrated into our framework helps find more useful target features, resulting in further performance improvement. The proposed IRCA-Siam enhances target/background discrimination and improves fault tolerance and generalization. An extensive experimental evaluation on six benchmark datasets, including OTB2013, OTB2015, TC128, UAV123, VOT2016 and VOT2017, demonstrates the superior performance of the proposed IRCA-Siam tracker compared to 30 existing state-of-the-art trackers.
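
The input-regularization step can be pictured as training on a clean/noisy pair of the same input; the noise level and the simple fusion below are illustrative assumptions:

```python
import torch

def noisy_clean_inputs(x, sigma=0.05):
    """Additive-noise augmentation: return the clean input together with a
    noise-regularized copy (sigma is an assumed value)."""
    noisy = (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
    return x, noisy

def fuse_features(feat_clean, feat_noisy):
    # simple average fusion of the clean and noisy feature channels;
    # the paper's fusion scheme is not reproduced here
    return 0.5 * (feat_clean + feat_noisy)
```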


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Malihe Javidi ◽  
Saeid Abbaasi ◽  
Sara Naybandi Atashi ◽  
Mahdi Jampour

With the emergence of the novel coronavirus disease at the end of 2019, several approaches were proposed to help physicians detect the disease, such as using deep learning to recognize lung involvement based on the pattern of pneumonia. These approaches rely on analyzing CT images and exploring the COVID-19 pathologies in the lung. Most of the successful methods are based on deep learning, which is the state of the art. Nevertheless, a major drawback of deep approaches is their need for many samples, which are not always available. This work proposes a combined deep architecture that benefits from both employed architectures, DenseNet and CapsNet. To improve the generalization of the deep model, we propose a regularization term with far fewer parameters. Network convergence improves significantly, especially when the amount of training data is small. We also propose a novel cost-sensitive loss function for imbalanced data, which makes our model feasible when the number of positive samples is limited. These novelties make our approach more robust in real-world situations with imbalanced data, as is common in hospitals. We analyze our approach on two publicly available datasets, HUST and COVID-CT, with different protocols. In the first HUST protocol, we follow the original paper's setup and outperform its results. With the second HUST protocol, we show the superiority of our approach on imbalanced data. Finally, with three different validations of COVID-CT, we provide evaluations in the presence of a small amount of data, along with a comparison with the state of the art.
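
A cost-sensitive loss of this kind can be as simple as class-weighted cross-entropy; the weighting below is an illustrative assumption, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def cost_sensitive_ce(logits, targets, pos_cost=5.0):
    """Illustrative cost-sensitive cross-entropy for imbalanced binary data:
    misclassifying the rare positive (COVID) class is charged more.
    pos_cost is a placeholder value."""
    weights = torch.tensor([1.0, pos_cost], device=logits.device)
    return F.cross_entropy(logits, targets, weight=weights)
```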


2018 ◽  
Author(s):  
Roman Zubatyuk ◽  
Justin S. Smith ◽  
Jerzy Leszczynski ◽  
Olexandr Isayev

Atomic and molecular properties can be evaluated from the fundamental Schrödinger equation and therefore represent different modalities of the same quantum phenomena. Here we present AIMNet, a modular and chemically inspired deep neural network potential. We used AIMNet with multitarget training to learn multiple modalities of the state of the atom in a molecular system. On several benchmark datasets, the resulting model shows state-of-the-art accuracy, comparable to the results of DFT methods that are orders of magnitude more expensive. It can simultaneously predict several atomic and molecular properties without an increase in computational cost. With AIMNet we show a new dimension of transferability: the ability to learn new targets utilizing multimodal information from previous training. The model can learn implicit solvation energy (such as SMD) utilizing only a fraction of the original training data, and achieves a MAD of 1.1 kcal/mol against experimental solvation free energies in the MNSol database.
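
Multitarget training of this kind typically reduces to a weighted sum of per-property losses; the property names and weights below are placeholders, not AIMNet's actual configuration:

```python
import torch

def multitarget_loss(preds, targets, weights):
    """Weighted sum of per-property regression losses, e.g.
    weights = {"energy": 1.0, "charges": 0.1, "volumes": 0.1}."""
    total = 0.0
    for key, w in weights.items():
        total = total + w * torch.nn.functional.mse_loss(preds[key], targets[key])
    return total
```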

