Label Embedding Based on Multi-Scale Locality Preservation

Author(s):  
Cheng-Lun Peng ◽  
An Tao ◽  
Xin Geng

Label Distribution Learning (LDL) is well suited to situations that focus on the overall distribution across the whole set of labels. The numerical labels in LDL satisfy the integrity probability constraint. Because of this special label domain, existing label embedding algorithms, which focus on embedding binary labels, are unfit for LDL. This paper proposes a specially designed approach, Multi-Scale Locality Preserving (MSLP), that achieves label embedding for LDL. Specifically, MSLP takes the locality information of the data in both the label space and the feature space into account, at different locality granularities. By assuming an explicit mapping from the features to the embedded labels, MSLP needs no additional learning process once the embedding is complete. In addition, MSLP is insensitive to the existence of data points that violate the smoothness assumption, which are usually caused by noise. Experimental results demonstrate the effectiveness of MSLP in preserving the locality structure of label distributions in the embedding space and show its superiority over state-of-the-art baseline methods.
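To make the label domain concrete, here is a minimal sketch of the constraint a label distribution has to satisfy. It assumes the usual LDL reading of the constraint (every description degree lies in [0, 1] and the degrees sum to one); the function names `to_label_distribution` and `is_label_distribution` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def to_label_distribution(scores: Sequence[float]) -> List[float]:
    """Rescale non-negative scores so that they sum to one (assumed constraint)."""
    if any(s < 0 for s in scores):
        raise ValueError("description degrees must be non-negative")
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return [s / total for s in scores]


def is_label_distribution(degrees: Sequence[float], tol: float = 1e-9) -> bool:
    """Check that every degree is in [0, 1] and that the degrees sum to one."""
    in_range = all(0.0 <= d <= 1.0 for d in degrees)
    return in_range and math.isclose(sum(degrees), 1.0, abs_tol=tol)


# A binary multi-label vector such as [1, 0, 1] is generally not a valid
# label distribution, which is one way to see why embeddings designed for
# binary labels do not carry over directly.
example = to_label_distribution([2.0, 1.0, 1.0])
```

Under this reading, any embedding and decoding pipeline for LDL has to return vectors that pass a check like `is_label_distribution`.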

Author(s):  
Ning Xu ◽  
An Tao ◽  
Xin Geng

Label distribution is more general than both single-label annotation and multi-label annotation. It covers a certain number of labels, representing the degree to which each label describes the instance. The learning process on instances labeled by label distributions is called label distribution learning (LDL). Unfortunately, many training sets contain only simple logical labels rather than label distributions, because label distributions are difficult to obtain directly. One way to solve the problem is to recover the label distributions from the logical labels in the training set by leveraging the topological information of the feature space and the correlation among the labels. This process of recovering label distributions from logical labels is defined as label enhancement (LE), which reinforces the supervision information in the training sets. This paper proposes a novel LE algorithm called Graph Laplacian Label Enhancement (GLLE). Experimental results on one artificial dataset and fourteen real-world datasets show clear advantages of GLLE over several existing LE algorithms.


Author(s):  
Zizhao Zhang ◽  
Haojie Lin ◽  
Yue Gao

In recent years, hypergraph modeling has shown its superiority in formulating correlations among samples and has found wide application in classification, retrieval, and other tasks. In all these works, the performance of hypergraph learning depends heavily on the generated hypergraph structure. A good hypergraph structure can represent the data correlation better, and vice versa. Although hypergraph learning has attracted much attention recently, most existing works still rely on a static hypergraph structure, and little effort has gone into optimizing the hypergraph structure during the learning process. To tackle this problem, we propose a dynamic hypergraph structure learning method in this paper. In this method, given the originally generated hypergraph structure, the objective is to simultaneously optimize the label projection matrix (the common task in hypergraph learning) and the hypergraph structure itself. More specifically, in this formulation, the label projection matrix is related to the hypergraph structure, and the hypergraph structure is associated with the data correlation from both the label space and the feature space. Here, we alternately learn the optimal label projection matrix and the hypergraph structure, leading to a dynamic hypergraph structure during the learning process. We have applied the proposed method to the tasks of 3D shape recognition and gesture recognition. Experimental results on 4 public datasets show better performance compared with the state-of-the-art methods. We note that the proposed method can be further applied to other tasks.


Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 567
Author(s):  
Donghun Yang ◽  
Kien Mai Mai Ngoc ◽  
Iksoo Shin ◽  
Kyong-Ha Lee ◽  
Myunggwon Hwang

To design an efficient deep learning model that can be used in the real world, it is important to detect out-of-distribution (OOD) data well. Various studies have been conducted to solve the OOD problem. The current state-of-the-art approach uses a confidence score based on the Mahalanobis distance in a feature space. Although it outperformed the previous approaches, its results were sensitive to the quality of the trained model and the complexity of the dataset. Herein, we propose a novel OOD detection method that can train a more efficient feature space for OOD detection. The proposed method uses an ensemble of the features trained using a softmax-based classifier and a network based on distance metric learning (DML). Through the complementary interaction of these two networks, the trained feature space has a more clumped distribution and fits a Gaussian distribution well for each class. Therefore, OOD data can be efficiently detected by setting a threshold in the trained feature space. To evaluate the proposed method, we applied it to various combinations of image datasets. The results show that the overall performance of the proposed approach is superior to that of other methods, including the state-of-the-art approach, on any combination of datasets.
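The Mahalanobis-based confidence score and the thresholding step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: features are two-dimensional so the covariance can be inverted in closed form, one covariance is shared by all classes, and the names `class_means`, `shared_covariance`, `confidence`, and `is_ood` as well as the toy data and the threshold value are hypothetical.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float]
Cov = Tuple[float, float, float]  # (var_x, cov_xy, var_y) of a symmetric 2x2 matrix


def class_means(features: Dict[str, List[Vec]]) -> Dict[str, Vec]:
    """Mean feature vector of each class."""
    means = {}
    for label, pts in features.items():
        n = len(pts)
        means[label] = (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
    return means


def shared_covariance(features: Dict[str, List[Vec]], means: Dict[str, Vec]) -> Cov:
    """Covariance pooled over all classes (an assumption made for this sketch)."""
    sxx = sxy = syy = 0.0
    total = 0
    for label, pts in features.items():
        mx, my = means[label]
        for x, y in pts:
            dx, dy = x - mx, y - my
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
            total += 1
    return (sxx / total, sxy / total, syy / total)


def mahalanobis_sq(p: Vec, mean: Vec, cov: Cov) -> float:
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def confidence(p: Vec, means: Dict[str, Vec], cov: Cov) -> float:
    """Negative squared distance to the closest class mean (higher = more in-distribution)."""
    return -min(mahalanobis_sq(p, m, cov) for m in means.values())


def is_ood(p: Vec, means: Dict[str, Vec], cov: Cov, threshold: float) -> bool:
    """Flag a sample as out-of-distribution when its confidence is below the threshold."""
    return confidence(p, means, cov) < threshold


# Toy training features for two classes (hypothetical data).
train_features = {
    "a": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "b": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)],
}
means = class_means(train_features)
cov = shared_covariance(train_features, means)
```

In practice the threshold would be chosen on held-out data; the tighter the per-class clusters are in the feature space, the easier that choice becomes, which is the property the abstract attributes to the trained feature space.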


2021 ◽  
Vol 13 (7) ◽  
pp. 1243
Author(s):  
Wenxin Yin ◽  
Wenhui Diao ◽  
Peijin Wang ◽  
Xin Gao ◽  
Ya Li ◽  
...  

The detection of Thermal Power Plants (TPPs) is a meaningful task for remote sensing image interpretation. It is a challenging task because, as facility objects, TPPs are composed of various distinctive and irregular components. In this paper, we propose a novel end-to-end detection framework for TPPs based on deep convolutional neural networks. Specifically, based on the RetinaNet one-stage detector, a context attention multi-scale feature extraction network is proposed that fuses global spatial attention to strengthen the ability to represent irregular objects. In addition, we design a part-based attention module to adapt to TPPs containing distinctive components. Experiments show that the proposed method outperforms the state-of-the-art methods and achieves 68.15% mean average precision.


Author(s):  
Rohit Mohan ◽  
Abhinav Valada

Understanding the scene in which an autonomous robot operates is critical for its competent functioning. Such scene comprehension necessitates recognizing instances of traffic participants along with general scene semantics, which can be effectively addressed by the panoptic segmentation task. In this paper, we introduce the Efficient Panoptic Segmentation (EfficientPS) architecture, which consists of a shared backbone that efficiently encodes and fuses semantically rich multi-scale features. We incorporate a new semantic head that aggregates fine and contextual features coherently and a new variant of Mask R-CNN as the instance head. We also propose a novel panoptic fusion module that congruously integrates the output logits from both heads of our EfficientPS architecture to yield the final panoptic segmentation output. Additionally, we introduce the KITTI panoptic segmentation dataset, which contains panoptic annotations for the popular and challenging KITTI benchmark. Extensive evaluations on Cityscapes, KITTI, Mapillary Vistas and Indian Driving Dataset demonstrate that our proposed architecture consistently sets the new state-of-the-art on all four benchmarks while being the most efficient and fastest panoptic segmentation architecture to date.


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 39
Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
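The idea of building a similarity graph from the intermediate representations of a batch can be sketched as below. The abstract does not specify the similarity measure or how the graph is sparsified, so cosine similarity and k-nearest-neighbor sparsification are assumptions made here, and the names `cosine` and `latent_geometry_graph` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def latent_geometry_graph(batch: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Weighted adjacency matrix over a batch of latent vectors.

    Each sample keeps edges to its k most similar samples, and the result
    is symmetrized so that an edge kept by either endpoint appears in both rows.
    """
    n = len(batch)
    sim = [
        [cosine(batch[i], batch[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sim[i][j],
            reverse=True,
        )[:k]
        for j in neighbors:
            adj[i][j] = sim[i][j]
            adj[j][i] = sim[i][j]
    return adj


# Toy batch of four 2-D latent vectors forming two tight pairs (hypothetical data).
batch = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
graph = latent_geometry_graph(batch, k=1)
```

A graph like this is computed per batch and per layer, so constraints such as matching a teacher's graph or limiting how much the graph changes between consecutive layers can be expressed as losses on adjacency matrices.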


2020 ◽  
Vol 34 (07) ◽  
pp. 11693-11700 ◽  
Author(s):  
Ao Luo ◽  
Fan Yang ◽  
Xin Li ◽  
Dong Nie ◽  
Zhicheng Jiao ◽  
...  

Crowd counting is an important yet challenging task due to large variations in scale and density. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture remains a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn), which aims to alleviate the problem by interweaving the multi-scale features for crowd density and for its auxiliary task (localization) and performing joint reasoning over a graph. Specifically, HyGnn builds a hybrid graph that jointly represents the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutually beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. HyGnn performs well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming the state-of-the-art algorithms by a large margin.


2015 ◽  
Vol 35-36 ◽  
pp. 206-214 ◽  
Author(s):  
Shengfa Wang ◽  
Nannan Li ◽  
Shuai Li ◽  
Zhongxuan Luo ◽  
Zhixun Su ◽  
...  

2018 ◽  
Vol 57 (4S) ◽  
pp. 04FF04
Author(s):  
Aiwen Luo ◽  
Fengwei An ◽  
Xiangyu Zhang ◽  
Lei Chen ◽  
Zunkai Huang ◽  
...  
