Discrete Binary Coding based Label Distribution Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/518 ◽

2019 ◽

Author(s):

Ke Wang ◽

Xin Geng

Keyword(s):

Large Scale ◽

Head Pose Estimation ◽

Superior Performance ◽

Binary Codes ◽

Training Time ◽

Binary Coding ◽

Special Cases ◽

Label Distribution Learning ◽

Real World Datasets ◽

Label Distribution

Label Distribution Learning (LDL) is a general learning paradigm in machine learning that includes both single-label learning (SLL) and multi-label learning (MLL) as special cases. Recently, many LDL algorithms have been proposed for application tasks such as facial age estimation, head pose estimation, and visual sentiment distribution prediction. However, the training time complexity of most existing LDL algorithms is too high, which makes them inapplicable to large-scale LDL. In this paper, we propose a novel LDL method to address this issue, termed Discrete Binary Coding based Label Distribution Learning (DBC-LDL). Specifically, we design an efficient discrete coding framework to learn binary codes for instances. Furthermore, both the pairwise semantic similarities and the original label distributions are integrated into this framework to learn highly discriminative binary codes. In addition, a fast approximate nearest neighbor (ANN) search strategy is used to predict label distributions for test instances. Experimental results on five real-world datasets demonstrate its superior performance over several state-of-the-art LDL methods at a lower time cost.
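
The abstract attributes DBC-LDL's speed to binary codes combined with approximate nearest neighbor search. The sketch below only illustrates why comparisons are cheap once codes exist: codes packed into integers are compared with one XOR and a bit count. It assumes the codes have already been learned (the discrete coding step is not shown), and the exhaustive scan is a stand-in for the paper's ANN strategy, which the abstract does not specify. `pack`, `hamming`, and `nearest` are hypothetical helper names.

```python
def pack(bits):
    """Pack a list of 0/1 bits (most significant first) into a single integer code."""
    code = 0
    for bit in bits:
        code = (code << 1) | bit
    return code


def hamming(a: int, b: int) -> int:
    """Hamming distance between two packed codes: XOR marks differing bits, then count them."""
    return bin(a ^ b).count("1")


def nearest(query: int, database, k: int):
    """Indices of the k codes closest to `query` (exhaustive scan; ties keep index order)."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))[:k]


# Toy database of three hypothetical 4-bit codes.
db = [pack([0, 0, 0, 0]), pack([1, 1, 1, 1]), pack([0, 0, 0, 1])]
q = pack([0, 0, 1, 1])
print(nearest(q, db, 2))  # [2, 0]
```

A real system would replace the linear scan with an index (for example, hash-table lookup on code buckets), but the per-pair cost stays a single XOR plus popcount.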

Binary Coding based Label Distribution Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/386 ◽

2018 ◽

Author(s):

Ke Wang ◽

Xin Geng

Keyword(s):

Large Scale ◽

State Of The Art ◽

Binary Codes ◽

Binary Coding ◽

Benchmark Datasets ◽

Hamming Space ◽

The Mean ◽

Label Distribution Learning ◽

Distribution Generation ◽

Label Distribution

Label Distribution Learning (LDL) is a novel learning paradigm in machine learning that assumes an instance is labeled by a distribution over all labels, rather than by a single logical label or a set of logical labels. Thus, LDL can model the degree to which each possible label describes an instance. Although many LDL methods have been put forward for different application tasks, most existing methods suffer from scalability issues. In this paper, a scalable LDL framework named Binary Coding based Label Distribution Learning (BC-LDL) is proposed for large-scale LDL. The proposed framework includes two parts: binary coding and label distribution generation. In the binary coding part, the learning objective is to generate optimal binary codes for the instances. We integrate the label distribution information of the instances into the binary coding procedure, leading to high-quality binary codes. In the label distribution generation part, given an instance, the k nearest training instances in Hamming space are retrieved, and the mean of their label distributions is taken as the predicted label distribution. Experiments on five benchmark datasets validate the superiority of BC-LDL over several state-of-the-art LDL methods.
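
The label distribution generation step described in this abstract (find the k nearest training instances in Hamming space, then average their label distributions) can be sketched as follows. This is a minimal illustration assuming the binary codes are already given as 0/1 lists; the codes, distributions, and function names are made up for the example and are not taken from BC-LDL.

```python
def hamming(a, b):
    """Number of positions at which two equal-length 0/1 codes differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_distribution(query_code, train_codes, train_dists, k):
    """Mean label distribution of the k training codes nearest to query_code."""
    # Rank training instances by Hamming distance; sorted() is stable, so ties keep index order.
    ranked = sorted(range(len(train_codes)),
                    key=lambda i: hamming(query_code, train_codes[i]))
    neighbors = ranked[:k]
    num_labels = len(train_dists[0])
    # Element-wise mean; an average of distributions still sums to 1.
    return [sum(train_dists[i][j] for i in neighbors) / len(neighbors)
            for j in range(num_labels)]


# Hypothetical 4-bit codes and 3-label distributions.
train_codes = [[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 0]]
train_dists = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
pred = predict_distribution([0, 0, 1, 0], train_codes, train_dists, k=2)
print(pred)  # approximately [0.6, 0.25, 0.15]
```

Averaging is what keeps the output a valid distribution without any extra normalization step.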

The information complexity of learning tasks, their structure and their distance

Information and Inference: A Journal of the IMA ◽

10.1093/imaiai/iaaa033 ◽

2021 ◽

Author(s):

Alessandro Achille ◽

Giovanni Paolini ◽

Glen Mbeng ◽

Stefano Soatto

Keyword(s):

Kolmogorov Complexity ◽

Large Scale ◽

Parametric Model ◽

Training Dataset ◽

Optimization Scheme ◽

Learning Tasks ◽

Asymmetric Distance ◽

Special Cases ◽

Scale Models ◽

Real World Datasets

We introduce an asymmetric distance in the space of learning tasks and a framework to compute their complexity. These concepts are foundational for the practice of transfer learning, whereby a parametric model is pre-trained for one task and then fine-tuned for another. The framework we develop is non-asymptotic, captures the finite nature of the training dataset, and allows distinguishing learning from memorization. It encompasses, as special cases, classical notions from Kolmogorov complexity and Shannon and Fisher information. However, unlike some of those frameworks, it can be applied to large-scale models and real-world datasets. Our framework is the first to measure complexity in a way that accounts for the effect of the optimization scheme, which is critical in deep learning.

Head pose estimation using improved label distribution learning with fewer annotations

Multimedia Tools and Applications ◽

10.1007/s11042-019-7284-2 ◽

2019 ◽

Vol 78 (14) ◽

pp. 19141-19162

Author(s):

Luhui Xu ◽

Jingying Chen ◽

Yanling Gan

Keyword(s):

Pose Estimation ◽

Head Pose Estimation ◽

Head Pose ◽

Label Distribution Learning ◽

Label Distribution

Label Enhancement for Label Distribution Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/406 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ning Xu ◽

An Tao ◽

Xin Geng

Keyword(s):

Learning Process ◽

Real World ◽

Feature Space ◽

Graph Laplacian ◽

Training Set ◽

Topological Information ◽

Label Distribution Learning ◽

Real World Datasets ◽

Label Distribution ◽

Training Sets

Label distribution is more general than both single-label annotation and multi-label annotation. It covers a certain number of labels, representing the degree to which each label describes the instance. The learning process on instances labeled by label distributions is called label distribution learning (LDL). Unfortunately, many training sets contain only simple logical labels rather than label distributions, because label distributions are difficult to obtain directly. One way to solve this problem is to recover the label distributions from the logical labels in the training set by leveraging the topological information of the feature space and the correlation among the labels. Such a process of recovering label distributions from logical labels is defined as label enhancement (LE), which reinforces the supervision information in the training sets. This paper proposes a novel LE algorithm called Graph Laplacian Label Enhancement (GLLE). Experimental results on one artificial dataset and fourteen real-world datasets show clear advantages of GLLE over several existing LE algorithms.
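
GLLE's own objective is not given in this abstract, so the sketch below uses a different, classic technique to illustrate the general idea of recovering label distributions from logical labels through feature-space topology: label propagation over a Gaussian-kernel graph, using the iteration F <- alpha * S F + (1 - alpha) * Y, whose fixed point minimizes a smoothness term built on the normalized graph Laplacian I - S plus a fit-to-labels term. Treat it as a hypothetical baseline, not GLLE. The values of `sigma`, `alpha`, and `iters` are illustrative defaults, and the code assumes every instance has at least one positive logical label.

```python
import math


def enhance_labels(features, logical, sigma=1.0, alpha=0.5, iters=50):
    """Label-propagation baseline (not GLLE): smooth 0/1 labels over a graph, then normalize."""
    n, c = len(features), len(logical[0])
    # Gaussian-kernel affinities between distinct instances (no self-loops).
    w = [[0.0 if i == j else
          math.exp(-math.dist(features[i], features[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}; I - S is the normalized graph Laplacian.
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f = [[float(v) for v in row] for row in logical]
    for _ in range(iters):
        # Borrow label mass from neighbours while staying anchored to the logical labels Y.
        f = [[alpha * sum(s[i][m] * f[m][j] for m in range(n))
              + (1 - alpha) * logical[i][j]
              for j in range(c)] for i in range(n)]
    # Assumes each instance has a positive label, so every row sum is > 0.
    result = []
    for row in f:
        total = sum(row)
        result.append([v / total for v in row])
    return result


# Two well-separated toy clusters with hypothetical logical labels over 3 classes.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
logical = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
dists = enhance_labels(features, logical)
print([round(v, 3) for v in dists[0]])
```

In this toy case the first instance keeps most of its mass on its own logical label but picks up some weight on the second label from its close neighbour, which is the kind of graded description degree that label enhancement aims to recover.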

Cox-nnet v2.0: improved neural-network-based survival prediction extended to large-scale EMR data

Bioinformatics ◽

10.1093/bioinformatics/btab046 ◽

2021 ◽

Author(s):

Di Wang ◽

Zheng Jing ◽

Kevin He ◽

Lana X Garmire

Keyword(s):

Neural Network ◽

Large Scale ◽

High Efficiency ◽

Prediction Method ◽

Population Data ◽

Superior Performance ◽

Supplementary Information ◽

Survival Prediction ◽

Training Time ◽

Scale Population

Summary: Cox-nnet is a neural-network-based prognosis prediction method originally applied to genomics data. Here, we propose version 2 of Cox-nnet, with significant improvements in efficiency and interpretability, making it suitable for predicting prognosis from large-scale population data, including electronic medical record (EMR) datasets. We also add permutation-based feature importance scores and the direction of feature coefficients. When applied to a kidney transplantation dataset, Cox-nnet v2.0 reduces the training time of Cox-nnet by up to 32-fold (n = 10 000) and achieves better prediction accuracy than Cox-PH (P < 0.05). It also achieves similarly superior performance on the publicly available SUPPORT dataset (n = 8000). Its high efficiency and accuracy make Cox-nnet v2.0 a desirable method for survival prediction in large-scale EMR data.

Availability and implementation: Cox-nnet v2.0 is freely available to the public at https://github.com/lanagarmire/Cox-nnet-v2.0.

Supplementary information: Supplementary data are available at Bioinformatics online.
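
This abstract mentions permutation-based feature importance scores. The sketch below shows the standard permutation-importance technique in generic form: shuffle one feature column, re-score the model, and record how much the score drops. It is not Cox-nnet v2.0's code; `predict` and `score` are hypothetical callables, and for survival data `score` would typically be a concordance index rather than the negative mean squared error used in this toy check.

```python
import random


def permutation_importance(predict, X, y, score, n_repeats=5, seed=0):
    """Mean drop in `score` when each feature column is shuffled (bigger drop = more important)."""
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with feature j replaced by its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def predict(rows):
    """Hypothetical model that only looks at feature 0."""
    return [row[0] for row in rows]


def neg_mse(truth, pred):
    """Higher is better: negative mean squared error."""
    return -sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


X = [[float(i), float(i % 3)] for i in range(10)]
y = [row[0] for row in X]
importances = permutation_importance(predict, X, y, neg_mse)
print(importances)
```

Because the toy model ignores feature 1, shuffling that column leaves predictions unchanged and its importance comes out as exactly zero.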

Top Position Sensitive Ordinal Relation Preserving Bitwise Weight for Image Retrieval

Algorithms ◽

10.3390/a13010018 ◽

2020 ◽

Vol 13 (1) ◽

pp. 18

Author(s):

Zhen Wang ◽

Fuzhen Sun ◽

Longbo Zhang ◽

Lei Wang ◽

Pingping Liu

Keyword(s):

Large Scale ◽

Hamming Distance ◽

Superior Performance ◽

Binary Codes ◽

Training Samples ◽

Benchmark Datasets ◽

Data Points ◽

Position Sensitive ◽

Core Idea ◽

Optimization Mechanism

In recent years, binary coding methods have become increasingly popular for approximate nearest neighbor (ANN) search. High-dimensional data can be quantized into binary codes that give an efficient similarity approximation via Hamming distance. However, most existing schemes treat every binary bit as equally important and treat training samples at different positions in the ranking list equally, which causes many data pairs to share the same Hamming distance and leads to a larger retrieval loss at the top positions. To handle these problems, we propose a novel method dubbed the top-position-sensitive ordinal-relation-preserving bitwise weight (TORBW) method. The core idea is to penalize data points whose ordinal relations are not preserved at the top of a ranking list more heavily than those at the bottom, and to assign different weights to their binary bits according to the distribution of query data. Specifically, we design an iterative optimization mechanism to learn binary codes and bitwise weights simultaneously, which couples their learning processes. When the iterative procedure converges, the binary codes and bitwise weights are effectively adapted to each other. To reduce the training complexity, we relax the discrete constraints of both the binary codes and the indicator function. Furthermore, we pretrain a tensor ordinal graph to decrease the time needed to compute relative similarity relationships among data points. Experimental results on three large-scale ANN search benchmark datasets, i.e., SIFT1M, GIST1M, and Cifar10, show that the proposed TORBW method achieves superior performance over state-of-the-art approaches.
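
To make the tie problem in this abstract concrete, the sketch below compares plain Hamming distance with a bitwise-weighted variant. The weights are fixed illustrative values chosen by hand; TORBW learns its weights jointly with the codes and, per the abstract, adapts them to the query distribution, which is not shown here.

```python
def hamming(a, b):
    """Plain Hamming distance: each differing bit costs 1."""
    return sum(x != y for x, y in zip(a, b))


def weighted_hamming(a, b, weights):
    """Bitwise-weighted Hamming distance: a differing bit i costs weights[i]."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


query = [0, 0, 0, 0]
x1, x2 = [1, 0, 0, 0], [0, 0, 0, 1]
weights = [0.4, 0.3, 0.2, 0.1]  # illustrative values only; TORBW learns its weights
print(hamming(query, x1), hamming(query, x2))  # 1 1 (a tie)
print(weighted_hamming(query, x1, weights), weighted_hamming(query, x2, weights))  # 0.4 0.1
```

With equal bit weights the two candidates are indistinguishable; once bits carry different weights, x2 ranks ahead of x1.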

A Spatiotemporal Dilated Convolutional Generative Network for Point-Of-Interest Recommendation

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9020113 ◽

2020 ◽

Vol 9 (2) ◽

pp. 113 ◽

Cited By ~ 4

Author(s):

Chunyang Liu ◽

Jiping Liu ◽

Shenghua Xu ◽

Jian Wang ◽

Chao Liu ◽

...

Keyword(s):

Large Scale ◽

Periodic Patterns ◽

Training Time ◽

Point Of Interest ◽

Poi Recommendation ◽

Range Check ◽

Geographical Distances ◽

Real World Datasets ◽

Residual Block ◽

Media Applications

With the growing popularity of location-based social media applications, point-of-interest (POI) recommendation has become important in recent years. Several techniques, especially collaborative filtering (CF), Markov chain (MC), and recurrent neural network (RNN) based methods, have recently been proposed for POI recommendation services. However, CF-based and MC-based methods are ineffective at representing the complicated interaction relations in historical check-in sequences. Although RNNs and their variants have been successfully employed in POI recommendation, they depend on a hidden state of the entire past and therefore cannot fully exploit parallel computation within a check-in sequence. To address these limitations, we propose a spatiotemporal dilated convolutional generative network (ST-DCGN) for POI recommendation. First, inspired by Google DeepMind's WaveNet model, we introduce a simple but effective dilated convolutional generative network for POI recommendation, which efficiently models a user's complicated short- and long-range check-in sequence using a stack of dilated causal convolution layers and a residual block structure. Then, we acquire the user's spatial preference by modeling continuous geographical distances, and capture the user's temporal preference by considering two types of periodic time patterns (i.e., hours in a day and days in a week). Moreover, we conducted an extensive performance evaluation on two large-scale real-world datasets, namely Foursquare and Instagram. Experimental results show that the proposed ST-DCGN model is well suited to POI recommendation problems and can effectively learn dependencies within and between check-in sequences. The proposed model attains state-of-the-art accuracy with less training time in the POI recommendation task.
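
This abstract credits ST-DCGN's efficiency to a WaveNet-style stack of dilated causal convolution layers. The single-channel sketch below shows the two properties that matter: each output depends only on current and past inputs, and doubling the dilation at each layer makes the receptive field grow exponentially with depth. The kernel size of 2 and dilations 1, 2, 4, 8 are assumptions borrowed from the usual WaveNet setup, not values reported for ST-DCGN, and residual blocks are omitted.

```python
def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_i kernel[i] * x[t - i * dilation], treating x[t] for t < 0 as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            src = t - i * dilation
            if src >= 0:  # causal: only the current and past steps contribute
                acc += w * x[src]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input steps (current step included) that one output step depends on."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Push an impulse at t = 0 through a stack with doubling dilations.
dilations = [1, 2, 4, 8]
signal = [1.0] + [0.0] * 19
for d in dilations:
    signal = dilated_causal_conv1d(signal, kernel=[1.0, 1.0], dilation=d)
reached = [t for t, v in enumerate(signal) if v != 0.0]
print(len(reached), receptive_field(2, dilations))  # 16 16
```

Four layers with only two taps each already cover 16 steps; unlike an RNN, every output position can be computed in parallel because no layer waits on a hidden state from the previous step.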

Five-Class Classification of Cervical Pap Smear Images: A Study of CNN-Error-Correcting SVM Models

Healthcare Informatics Research ◽

10.4258/hir.2021.27.4.298 ◽

2021 ◽

Vol 27 (4) ◽

pp. 298-306

Author(s):

Audrey K. C. Huong ◽

Kim Gaik Tay ◽

Xavier T. I. Ngu

Keyword(s):

Image Classification ◽

Pap Smear ◽

False Negative ◽

Area Under The Curve ◽

Superior Performance ◽

Support Vector ◽

Training Time ◽

Binary Coding ◽

Svm Model

Objectives: Different complex strategies of fusing handcrafted descriptors and features from convolutional neural network (CNN) models have been studied, mainly for two-class Papanicolaou (Pap) smear image classification. This paper explores a simplified system using combined binary coding for a five-class version of this problem.

Methods: This system extracted features from AlexNet, VGG19, and ResNet50 networks via transfer learning before reducing the problem to multiple binary sub-problems using error-correcting coding. The learners were trained using the support vector machine (SVM) method. The outputs of these classifiers were combined and compared to the true class codes for the final prediction.

Results: Despite the superior performance of VGG19-SVM, with mean ± standard deviation accuracy and sensitivity of 80.68% ± 2.00% and 80.86% ± 0.45%, respectively, this model required a long training time. Both the VGGNet-SVM and ResNet-SVM models also produced false-negative cases. AlexNet-SVM was more efficient in terms of running speed and prediction consistency. Our findings also showed good diagnostic ability, with an area under the curve of approximately 0.95. Further investigation also showed good agreement between our research outcomes and those of state-of-the-art methods, with specificity ranging from 93% to 100%.

Conclusions: We believe that the AlexNet-SVM model can be conveniently applied for clinical use. Further research could include implementing an optimization algorithm for hyperparameter tuning, as well as an appropriate choice of experimental design, to improve the efficiency of Pap smear image classification.
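
The Methods paragraph of this abstract describes error-correcting output coding: each class gets a binary codeword, one binary classifier is trained per bit, and a test sample is assigned to the class whose codeword best matches the classifiers' outputs. The sketch below shows only that Hamming-distance decoding step. The 7-bit codebook is a made-up example, not the coding used in the study, and the SVM training is not shown.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(bit_outputs, codebook):
    """Index of the class whose codeword is nearest (in Hamming distance) to the binary outputs."""
    return min(range(len(codebook)), key=lambda c: hamming(bit_outputs, codebook[c]))


# Hypothetical 5-class, 7-bit codebook with minimum pairwise distance 3, so any single
# wrong binary classifier is corrected. Each column defines one binary classifier's task.
CODEBOOK = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1, 0, 1],
]

outputs = [1, 0, 0, 1, 1, 0, 1]  # class 3's codeword with bit 4 flipped by one wrong classifier
print(ecoc_decode(outputs, CODEBOOK))  # 3
```

The redundancy is the point of the design: with five classes, three bits would suffice to tell them apart, but the extra bits let the decoder absorb mistakes from individual binary classifiers.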

NGDNet: Nonuniform Gaussian-label distribution learning for infrared head pose estimation

Neurocomputing ◽

10.1016/j.neucom.2020.12.090 ◽

2021 ◽

Author(s):

Tingting Liu ◽

Jixin Wang ◽

Bing Yang ◽

Xuan Wang

Keyword(s):

Pose Estimation ◽

Head Pose Estimation ◽

Head Pose ◽

Label Distribution Learning ◽

Label Distribution

Shape My Face: Registering 3D Face Scans by Surface-to-Surface Translation

International Journal of Computer Vision ◽

10.1007/s11263-021-01494-4 ◽

2021 ◽

Author(s):

Mehdi Bahri ◽

Eimear O’Sullivan ◽

Shunwang Gong ◽

Feng Liu ◽

Xiaoming Liu ◽

...

Keyword(s):

Large Scale ◽

Training Time ◽

3D Face ◽

In The Wild ◽

Previous State ◽

Surface Translation ◽

Visual Attention Mechanism ◽

Diverse Data ◽

Human Faces ◽

Robust To Noise

Standard registration algorithms need to be applied independently to each surface to be registered, following careful pre-processing and hand-tuning. Recently, learning-based approaches have emerged that reduce the registration of new scans to running inference with a previously trained model. The potential benefits are manifold: inference is typically orders of magnitude faster than solving a new instance of a difficult optimization problem, deep learning models can be made robust to noise and corruption, and the trained model may be reused for other tasks, e.g. through transfer learning. In this paper, we cast the registration task as a surface-to-surface translation problem and design a model to reliably capture the latent geometric information directly from raw 3D face scans. We introduce Shape-My-Face (SMF), a powerful encoder-decoder architecture based on an improved point cloud encoder, a novel visual attention mechanism, graph convolutional decoders with skip connections, and a specialized mouth model that we smoothly integrate with the mesh convolutions. Compared to the previous state-of-the-art learning algorithms for non-rigid registration of face scans, SMF only requires the raw data to be rigidly aligned (with scaling) with a pre-defined face template. Additionally, our model provides topologically sound meshes with minimal supervision, offers faster training time, has orders of magnitude fewer trainable parameters, is more robust to noise, and can generalize to previously unseen datasets. We extensively evaluate the quality of our registrations on diverse data. We demonstrate the robustness and generalizability of our model with in-the-wild face scans across different modalities, sensor types, and resolutions. Finally, we show that, by learning to register scans, SMF produces a hybrid linear and non-linear morphable model.
Manipulation of the latent space of SMF allows for shape generation and morphing applications such as expression transfer in the wild. We train SMF on a dataset of human faces comprising 9 large-scale databases on commodity hardware.