Binary Coding based Label Distribution Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/386 ◽

2018 ◽

Author(s):

Ke Wang ◽

Xin Geng

Keyword(s):

Large Scale ◽

State Of The Art ◽

Binary Codes ◽

Binary Coding ◽

Benchmark Datasets ◽

Hamming Space ◽

The Mean ◽

Label Distribution Learning ◽

Distribution Generation ◽

Label Distribution

Label Distribution Learning (LDL) is a novel learning paradigm in machine learning, which assumes that an instance is labeled by a distribution over all labels, rather than labeled by a logic label or some logic labels. Thus, LDL can model the description degree of all possible labels to an instance. Although many LDL methods have been put forward to deal with different application tasks, most existing methods suffer from the scalability issue. In this paper, a scalable LDL framework named Binary Coding based Label Distribution Learning (BC-LDL) is proposed for large-scale LDL. The proposed framework includes two parts, i.e., binary coding and label distribution generation. In the binary coding part, the learning objective is to generate the optimal binary codes for the instances. We integrate the label distribution information of the instances into a binary coding procedure, leading to high-quality binary codes. In the label distribution generation part, given an instance, the k nearest training instances in the Hamming space are searched and the mean of the label distributions of all the neighboring instances is calculated as the predicted label distribution. Experiments on five benchmark datasets validate the superiority of BC-LDL over several state-of-the-art LDL methods.

Download Full-text

Discrete Binary Coding based Label Distribution Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/518 ◽

2019 ◽

Author(s):

Ke Wang ◽

Xin Geng

Keyword(s):

Large Scale ◽

Head Pose Estimation ◽

Superior Performance ◽

Binary Codes ◽

Training Time ◽

Binary Coding ◽

Special Cases ◽

Label Distribution Learning ◽

Real World Datasets ◽

Label Distribution

Label Distribution Learning (LDL) is a general learning paradigm in machine learning, which includes both single-label learning (SLL) and multi-label learning (MLL) as its special cases. Recently, many LDL algorithms have been proposed to handle different application tasks such as facial age estimation, head pose estimation and visual sentiment distributions prediction. However, the training time complexity of most existing LDL algorithms is too high, which makes them unapplicable to large-scale LDL. In this paper, we propose a novel LDL method to address this issue, termed Discrete Binary Coding based Label Distribution Learning (DBC-LDL). Specifically, we design an efficiently discrete coding framework to learn binary codes for instances. Furthermore, both the pair-wise semantic similarities and the original label distributions are integrated into this framework to learn highly discriminative binary codes. In addition, a fast approximate nearest neighbor (ANN) search strategy is utilized to predict label distributions for testing instances. Experimental results on five real-world datasets demonstrate its superior performance over several state-of-the-art LDL methods with the lower time cost.

Download Full-text

Large-scale Semantic Parsing without Question-Answer Pairs

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00190 ◽

2014 ◽

Vol 2 ◽

pp. 377-392 ◽

Cited By ~ 40

Author(s):

Siva Reddy ◽

Mirella Lapata ◽

Mark Steedman

Keyword(s):

Natural Language ◽

Large Scale ◽

Graph Matching ◽

State Of The Art ◽

The State ◽

Semantic Parsing ◽

Matching Problem ◽

Weak Supervision ◽

Benchmark Datasets

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.

Download Full-text

Label Enhancement for Label Distribution Learning via Prior Knowledge

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/446 ◽

2020 ◽

Author(s):

Yongbiao Gao ◽

Yu Zhang ◽

Xin Geng

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Emotion Recognition ◽

Prior Knowledge ◽

Decision Process ◽

Age Estimation ◽

State Of The Art ◽

Learning Agent ◽

Label Distribution Learning ◽

Label Distribution

Label distribution learning (LDL) is a novel machine learning paradigm that gives a description degree of each label to an instance. However, most of training datasets only contain simple logical labels rather than label distributions due to the difficulty of obtaining the label distributions directly. We propose to use the prior knowledge to recover the label distributions. The process of recovering the label distributions from the logical labels is called label enhancement. In this paper, we formulate the label enhancement as a dynamic decision process. Thus, the label distribution is adjusted by a series of actions conducted by a reinforcement learning agent according to sequential state representations. The target state is defined by the prior knowledge. Experimental results show that the proposed approach outperforms the state-of-the-art methods in both age estimation and image emotion recognition.

Download Full-text

A Comprehensive Taxonomy of Dynamic Texture Representation

ACM Computing Surveys ◽

10.1145/3487892 ◽

2023 ◽

Vol 55 (1) ◽

pp. 1-39

Author(s):

Thanh Tuan Nguyen ◽

Thanh Phuong Nguyen

Keyword(s):

Large Scale ◽

Environmental Changes ◽

State Of The Art ◽

The State ◽

Future Research ◽

Research Activities ◽

Potential Applications ◽

Benchmark Datasets ◽

Negative Impacts ◽

Made In

Representing dynamic textures (DTs) plays an important role in many real implementations in the computer vision community. Due to the turbulent and non-directional motions of DTs along with the negative impacts of different factors (e.g., environmental changes, noise, illumination, etc.), efficiently analyzing DTs has raised considerable challenges for the state-of-the-art approaches. For 20 years, many different techniques have been introduced to handle the above well-known issues for enhancing the performance. Those methods have shown valuable contributions, but the problems have been incompletely dealt with, particularly recognizing DTs on large-scale datasets. In this article, we present a comprehensive taxonomy of DT representation in order to purposefully give a thorough overview of the existing methods along with overall evaluations of their obtained performances. Accordingly, we arrange the methods into six canonical categories. Each of them is then taken in a brief presentation of its principal methodology stream and various related variants. The effectiveness levels of the state-of-the-art methods are then investigated and thoroughly discussed with respect to quantitative and qualitative evaluations in classifying DTs on benchmark datasets. Finally, we point out several potential applications and the remaining challenges that should be addressed in further directions. In comparison with two existing shallow DT surveys (i.e., the first one is out of date as it was made in 2005, while the newer one (published in 2016) is an inadequate overview), we believe that our proposed comprehensive taxonomy not only provides a better view of DT representation for the target readers but also stimulates future research activities.

Download Full-text

Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301176 ◽

2019 ◽

Vol 33 ◽

pp. 176-183 ◽

Cited By ~ 11

Author(s):

Chao Li ◽

Cheng Deng ◽

Lei Wang ◽

De Xie ◽

Xianglong Liu

Keyword(s):

Large Scale ◽

State Of The Art ◽

The State ◽

Storage Cost ◽

Common Representation ◽

Benchmark Datasets ◽

Query Efficiency ◽

Hash Codes

In recent years, hashing has attracted more and more attention owing to its superior capacity of low storage cost and high query efficiency in large-scale cross-modal retrieval. Benefiting from deep leaning, continuously compelling results in cross-modal retrieval community have been achieved. However, existing deep cross-modal hashing methods either rely on amounts of labeled information or have no ability to learn an accuracy correlation between different modalities. In this paper, we proposed Unsupervised coupled Cycle generative adversarial Hashing networks (UCH), for cross-modal retrieval, where outer-cycle network is used to learn powerful common representation, and inner-cycle network is explained to generate reliable hash codes. Specifically, our proposed UCH seamlessly couples these two networks with generative adversarial mechanism, which can be optimized simultaneously to learn representation and hash codes. Extensive experiments on three popular benchmark datasets show that the proposed UCH outperforms the state-of-the-art unsupervised cross-modal hashing methods.

Download Full-text

Age Label Distribution Learning Based on Unsupervised Comparisons of Faces

Wireless Communications and Mobile Computing ◽

10.1155/2021/1996803 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Qiyuan Li ◽

Zongyong Deng ◽

Weichang Xu ◽

Zhendong Li ◽

Hao Liu

Keyword(s):

Age Estimation ◽

State Of The Art ◽

Processing Method ◽

Wireless Sensor ◽

Learning Method ◽

Significant Progress ◽

Meaningful Information ◽

Label Distribution Learning ◽

Label Distribution ◽

Facial Age

Although label distribution learning has made significant progress in the field of face age estimation, unsupervised learning has not been widely adopted and is still an important and challenging task. In this work, we propose an unsupervised contrastive label distribution learning method (UCLD) for facial age estimation. This method is helpful to extract semantic and meaningful information of raw faces with preserving high-order correlation between adjacent ages. Similar to the processing method of wireless sensor network, we designed the ConAge network with the contrast learning method. As a result, our model maximizes the similarity of positive samples by data enhancement and simultaneously pushes the clusters of negative samples apart. Compared to state-of-the-art methods, we achieve compelling results on the widely used benchmark, i.e., MORPH.

Download Full-text

Media Forensics Challenge Image Provenance Evaluation and State-of-the-Art Analysis on Large-Scale Benchmark Datasets

10.6028/nist.ir.8325 ◽

2020 ◽

Author(s):

Xiongnan Jin ◽

Yooyoung Lee ◽

Jonathan Fiscus ◽

Haiying Guan ◽

Amy N. Yates ◽

...

Keyword(s):

Large Scale ◽

State Of The Art ◽

Media Forensics ◽

Benchmark Datasets ◽

Art Analysis

Download Full-text

Deep Recurrent Quantization for Generating Sequential Binary Codes

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/128 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jingkuan Song ◽

Xiaosu Zhu ◽

Lianli Gao ◽

Xin-Shun Xu ◽

Wu Liu ◽

...

Keyword(s):

State Of The Art ◽

Binary Codes ◽

Time Cost ◽

Code Length ◽

Training Time ◽

Retrieval Accuracy ◽

Effective Technology ◽

Benchmark Datasets ◽

Search Speed ◽

And Training

Quantization has been an effective technology in ANN (approximate nearest neighbour) search due to its high accuracy and fast search speed. To meet the requirement of different applications, there is always a trade-off between retrieval accuracy and speed, reflected by variable code lengths. However, to encode the dataset into different code lengths, existing methods need to train several models, where each model can only produce a specific code length. This incurs a considerable training time cost, and largely reduces the flexibility of quantization methods to be deployed in real applications. To address this issue, we propose a Deep Recurrent Quantization (DRQ) architecture which can generate sequential binary codes. To the end, when the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations. A shared codebook and a scalar factor is designed to be the learnable weights in the deep recurrent quantization block, and the whole framework can be trained in an end-to-end manner. As far as we know, this is the first quantization method that can be trained once and generate sequential binary codes. Experimental results on the benchmark datasets show that our model achieves comparable or even better performance compared with the state-of-the-art for image retrieval. But it requires significantly less number of parameters and training times. Our code is published online: https://github.com/cfm-uestc/DRQ.

Download Full-text

Enhanced Bounding Box Estimation with Distribution Calibration for Visual Tracking

Sensors ◽

10.3390/s21238100 ◽

2021 ◽

Vol 21 (23) ◽

pp. 8100

Author(s):

Bin Yu ◽

Ming Tang ◽

Guibo Zhu ◽

Jinqiao Wang ◽

Hanqing Lu

Keyword(s):

Visual Tracking ◽

Large Scale ◽

State Of The Art ◽

Estimation Method ◽

Target Object ◽

Reference Information ◽

Bounding Box ◽

The Mean ◽

Modulation Vector ◽

Initial Target

Bounding box estimation by overlap maximization has improved the state of the art of visual tracking significantly, yet the improvement in robustness and accuracy is restricted by the limited reference information, i.e., the initial target. In this paper, we present DCOM, a novel bounding box estimation method for visual tracking, based on distribution calibration and overlap maximization. We assume every dimension in the modulation vector follows a Gaussian distribution, so that the mean and the variance can borrow from those of similar targets in large-scale training datasets. As such, sufficient and reliable reference information can be obtained from the calibrated distribution, leading to a more robust and accurate target estimation. Additionally, an updating strategy for the modulation vector is proposed to adapt the variation of the target object. Our method can be built on top of off-the-shelf networks without finetuning and extra parameters. It yields state-of-the-art performance on three popular benchmarks, including GOT-10k, LaSOT, and NfS while running at around 40 FPS, confirming its effectiveness and efficiency.

Download Full-text

Similarity Preserving Deep Asymmetric Quantization for Image Retrieval

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018183 ◽

2019 ◽

Vol 33 ◽

pp. 8183-8190 ◽

Cited By ~ 1

Author(s):

Junjie Chen ◽

William K. Cheung

Keyword(s):

Large Scale ◽

State Of The Art ◽

Multimedia Retrieval ◽

Retrieval Performance ◽

Retrieval Accuracy ◽

Compact Binary ◽

Label Information ◽

Benchmark Datasets ◽

Similarity Preserving ◽

Novel Model

Quantization has been widely adopted for large-scale multimedia retrieval due to its effectiveness of coding highdimensional data. Deep quantization models have been demonstrated to achieve the state-of-the-art retrieval accuracy. However, training the deep models given a large-scale database is highly time-consuming as a large amount of parameters are involved. Existing deep quantization methods often sample only a subset from the database for training, which may end up with unsatisfactory retrieval performance as a large portion of label information is discarded. To alleviate this problem, we propose a novel model called Similarity Preserving Deep Asymmetric Quantization (SPDAQ) which can directly learn the compact binary codes and quantization codebooks for all the items in the database efficiently. To do that, SPDAQ makes use of an image subset as well as the label information of all the database items so the image subset items and the database items are mapped to two different but correlated distributions, where the label similarity can be well preserved. An efficient optimization algorithm is proposed for the learning. Extensive experiments conducted on four widely-used benchmark datasets demonstrate the superiority of our proposed SPDAQ model.

Download Full-text