Self-Paced Robust Learning for Leveraging Clean Labels in Noisy Data

Xuchao Zhang; Xian Wu; Fanglan Chen; Liang Zhao; Chang-Tien Lu

doi:10.1609/aaai.v34i04.6166

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6166 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6853-6860

Author(s):

Xuchao Zhang ◽

Xian Wu ◽

Fanglan Chen ◽

Liang Zhao ◽

Chang-Tien Lu

Keyword(s):

Real World ◽

Large Scale ◽

Learning Algorithm ◽

Noisy Data ◽

Training Set ◽

Robust Learning ◽

Robust Model ◽

Small Set ◽

Real World Datasets ◽

Theoretical Analyses

The success of training accurate models strongly depends on the availability of a sufficient collection of precisely labeled data. However, real-world datasets contain erroneously labeled data samples that substantially hinder the performance of machine learning models. Meanwhile, well-labeled data is usually expensive to obtain and only a limited amount is available for training. In this paper, we consider the problem of training a robust model by using large-scale noisy data in conjunction with a small set of clean data. To leverage the information contained in the clean labels, we propose a novel self-paced robust learning algorithm (SPRL) that trains the model in a process from more reliable (clean) data instances to less reliable (noisy) ones under the supervision of well-labeled data. The self-paced learning process hedges the risk of selecting corrupted data into the training set. Moreover, theoretical analyses on the convergence of the proposed algorithm are provided under mild assumptions. Extensive experiments on synthetic and real-world datasets demonstrate that our proposed approach can achieve a considerable improvement in effectiveness and robustness over existing methods.
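
The clean-to-noisy training order described in this abstract can be illustrated with a generic self-paced loop. The sketch below is not the SPRL algorithm from the paper; it is a minimal illustration that assumes a one-parameter model y = w * x, a squared-error loss, and a hard loss threshold that grows each round. The function names, the threshold schedule, and the toy data are all hypothetical.

```python
def fit_slope(points):
    """Least-squares slope for the model y = w * x (line through the origin)."""
    num = sum(x * y for x, y in points)
    den = sum(x * x for x, _ in points)
    return num / den


def self_paced_fit(clean, noisy, threshold=0.5, growth=1.5, rounds=4):
    """Fit on the clean set first, then admit noisy samples with small loss."""
    w = fit_slope(clean)  # initial model uses the trusted labels only
    selected = []
    for _ in range(rounds):
        # Admit only the noisy samples the current model already explains well.
        selected = [(x, y) for x, y in noisy if (y - w * x) ** 2 < threshold]
        w = fit_slope(clean + selected)
        threshold *= growth  # relax the pace so harder samples may enter later
    return w, selected


# Toy data (made up): the true slope is about 2; the last two noisy labels are corrupted.
clean = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
noisy = [(1.5, 3.1), (2.5, 5.0), (4.0, 8.2), (2.0, -6.0), (3.0, 15.0)]
w, selected = self_paced_fit(clean, noisy)
print(round(w, 3), len(selected))
```

In this toy run the two corrupted samples never fall under the threshold, so the fitted slope stays close to 2 while the three consistent noisy samples are used for training.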

Attention Enhanced Serial Unet++ Network for Removing Unevenly Distributed Haze

Electronics ◽

10.3390/electronics10222868 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2868

Author(s):

Wenxuan Zhao ◽

Yaqin Zhao ◽

Liqi Feng ◽

Jiaxi Tang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Real World ◽

Large Scale ◽

Learning Strategy ◽

Contextual Information ◽

Small Scale ◽

Image Dehazing ◽

Atmospheric Scattering ◽

Real World Datasets

The purpose of image dehazing is to reduce the image degradation caused by suspended particles in order to support high-level visual tasks. Besides the atmospheric scattering model, convolutional neural networks (CNNs) have been used for image dehazing. However, existing image dehazing algorithms are limited in the face of unevenly distributed haze and dense haze in real-world scenes. In this paper, we propose a novel end-to-end convolutional neural network called attention enhanced serial Unet++ dehazing network (AESUnet) for single image dehazing. We attempt to build a serial Unet++ structure that adopts a serial strategy of two pruned Unet++ blocks based on residual connection. Compared with the simple Encoder–Decoder structure, the serial Unet++ module can better use the features extracted by encoders and promote contextual information fusion in different resolutions. In addition, we make several improvements to the Unet++ module, such as pruning, introducing the convolutional module with ResNet structure, and a residual learning strategy. Thus, the serial Unet++ module can generate more realistic images with less color distortion. Furthermore, following the serial Unet++ blocks, an attention mechanism is introduced to pay different attention to haze regions with different concentrations by learning weights in the spatial domain and channel domain. Experiments are conducted on two representative datasets: the large-scale synthetic dataset RESIDE and the small-scale real-world datasets I-HAZY and O-HAZY. The experimental results show that the proposed dehazing network is not only comparable to state-of-the-art methods on the RESIDE synthetic dataset, but also surpasses them by a very large margin on the I-HAZY and O-HAZY real-world datasets.

Muhkam Algorithmic Models of Real World Processes for Intelligent Technologies

International Journal of Robotics Applications and Technologies ◽

10.4018/ijrat.2013070105 ◽

2013 ◽

Vol 1 (2) ◽

pp. 56-82 ◽

Cited By ~ 1

Author(s):

Tom Adi ◽

O.K. Ewell ◽

Tim Vogel ◽

Kim Payton ◽

Jeannine L. Hippchen

Keyword(s):

Real World ◽

Large Scale ◽

Adaptive Method ◽

Emotional Learning ◽

Sound Symbolism ◽

Small Set ◽

Software Implementations

This paper significantly revises and expands a chapter in a handbook of research on synthetic emotions published in 2009. The authors extend the scope beyond emotions to all real world processes. The authors use a new adaptive method to create, improve, and correct algorithmic models (models in the form of algorithms) of emotional, learning, communication, memory, perception, biological, physiological, social, legal, and spiritual processes. The models are constructed from the sound symbolism that is indicated by the usage of the Arabic names of these processes in so-called muhkam text passages. These muhkam models have been validated by successful large-scale software implementations and by clinical emotions research. Naturally, models in the form of algorithms are easy to implement in intelligent technologies. These models also lend themselves to integration and interoperability because they share a small set of seven general concepts and their symmetrical combinations.

Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5329 ◽

2020 ◽

Vol 34 (01) ◽

pp. 19-26 ◽

Cited By ~ 5

Author(s):

Chong Chen ◽

Min Zhang ◽

Yongfeng Zhang ◽

Weizhi Ma ◽

Yiqun Liu ◽

...

Keyword(s):

Collaborative Filtering ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

Heterogeneous Data ◽

Model Parameters ◽

Online Systems ◽

Practical Applications ◽

Real World Datasets ◽

Primary Type

Recent studies on recommendation have largely focused on exploring state-of-the-art neural networks to improve the expressiveness of models, while typically applying the Negative Sampling (NS) strategy for efficient learning. Despite its effectiveness, two important issues have not been well considered in existing methods: 1) NS suffers from dramatic fluctuation, making it difficult for sampling-based methods to achieve the optimal ranking performance in practical applications; 2) although heterogeneous feedback (e.g., view, click, and purchase) is widespread in many online systems, most existing methods leverage only one primary type of user feedback such as purchase. In this work, we propose a novel non-sampling transfer learning solution, named Efficient Heterogeneous Collaborative Filtering (EHCF), for Top-N recommendation. It can not only model fine-grained user-item relations, but also efficiently learn model parameters from the whole heterogeneous data (including all unlabeled data) with a rather low time complexity. Extensive experiments on three real-world datasets show that EHCF significantly outperforms state-of-the-art recommendation methods in both traditional (single-behavior) and heterogeneous scenarios. Moreover, EHCF shows significant improvements in training efficiency, making it more applicable to real-world large-scale systems. Our implementation has been released to facilitate further developments on efficient whole-data based neural methods.

MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5467 ◽

2020 ◽

Vol 34 (01) ◽

pp. 1153-1160 ◽

Cited By ~ 1

Author(s):

Xinshi Zang ◽

Huaxiu Yao ◽

Guanjie Zheng ◽

Nan Xu ◽

Kai Xu ◽

...

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Learning Algorithm ◽

Traffic Signal ◽

Training Data ◽

Signal Control ◽

Traffic Signal Control ◽

Individual Level ◽

Real World Datasets ◽

Reinforcement Learning Models

Using reinforcement learning for traffic signal control has attracted increasing interest recently. Various value-based reinforcement learning methods have been proposed to deal with this classical transportation problem and have achieved better performance than traditional transportation methods. However, current reinforcement learning models rely on tremendous training data and computational resources, which may have bad consequences (e.g., traffic jams or accidents) in the real world. In traffic signal control, some algorithms have been proposed to empower quick learning from scratch, but little attention is paid to learning by transferring and reusing learned experience. In this paper, we propose a novel framework, named MetaLight, to speed up the learning process in new scenarios by leveraging the knowledge learned from existing scenarios. MetaLight is a value-based meta-reinforcement learning workflow based on the representative gradient-based meta-learning algorithm (MAML), which periodically alternates between individual-level adaptation and global-level adaptation. Moreover, MetaLight improves the state-of-the-art reinforcement learning model FRAP in traffic signal control by optimizing its model structure and updating paradigm. The experiments on four real-world datasets show that our proposed MetaLight not only adapts more quickly and stably in new traffic scenarios, but also achieves better performance.

Causal Discovery Combining K2 with Brain Storm Optimization Algorithm

Molecules ◽

10.3390/molecules23071729 ◽

2018 ◽

Vol 23 (7) ◽

pp. 1729

Author(s):

Yinghan Hong ◽

Zhifeng Hao ◽

Guizhen Mai ◽

Han Huang ◽

Arun Kumar Sangaiah

Keyword(s):

Real World ◽

Data Science ◽

Learning Algorithm ◽

Causal Structure ◽

Scientific Discovery ◽

Causal Discovery ◽

Causal Mechanism ◽

Topological Order ◽

Brain Storm Optimization ◽

Real World Datasets

Exploring and detecting the causal relations among variables has shown huge practical value in recent years, with numerous opportunities for scientific discovery, and has been commonly seen as the core of data science. Among all possible causal discovery methods, causal discovery based on a constraint approach can recover the causal structures from passive observational data in general cases, and has shown extensive prospects in numerous real-world applications. However, when the graph is sufficiently large, it does not work well. To alleviate this problem, an improved causal structure learning algorithm, K2-BSO, which combines K2 with brain storm optimization (BSO), is presented in this paper. Here BSO is used to search for the optimal topological order of nodes instead of searching the graph space. This paper assumes that the dataset is generated by conforming to a causal diagram in which each variable is generated from its parent based on a causal mechanism. We design an elaborate distance function for the clustering step in BSO according to the mechanism of K2. The graph space is therefore reduced to a smaller topological order space, and the order space can be further reduced by an efficient clustering method. The experimental results on various real-world datasets show that our method outperforms the traditional search-and-score methods and the state-of-the-art genetic algorithm-based methods.

Large-Scale Heterogeneous Feature Embedding

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013878 ◽

2019 ◽

Vol 33 ◽

pp. 3878-3885 ◽

Cited By ~ 5

Author(s):

Xiao Huang ◽

Qingquan Song ◽

Fan Yang ◽

Xia Hu

Keyword(s):

Real World ◽

Large Scale ◽

Single Type ◽

Heterogeneous Information ◽

Multiview Learning ◽

Efficiency And Effectiveness ◽

Joint Embedding ◽

Real World Datasets ◽

Low Dimensional ◽

Vector Representations

Feature embedding aims to learn a low-dimensional vector representation for each instance to preserve the information in its features. These representations can benefit various off-the-shelf learning algorithms. While embedding models for a single type of features have been well studied, real-world instances often contain multiple types of correlated features or even information within a different modality such as networks. Existing studies such as multiview learning show that it is promising to learn unified vector representations from all sources. However, the high computational costs of incorporating heterogeneous information limit the applications of existing algorithms. The number of instances and the dimensions of features in practice are often large. To bridge the gap, we propose a scalable framework, FeatWalk, which can model and incorporate instance similarities in terms of different types of features into a unified embedding representation. To enable scalability, FeatWalk does not directly calculate any similarity measure, but provides an alternative way to simulate the similarity-based random walks among instances to extract the local instance proximity and preserve it in a set of instance index sequences. These sequences are homogeneous with each other. A scalable word embedding algorithm is applied to them to learn a joint embedding representation of instances. Experiments on four real-world datasets demonstrate the efficiency and effectiveness of FeatWalk.

Online Multitask Relative Similarity Learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/253 ◽

2017 ◽

Cited By ~ 2

Author(s):

Shuji Hao ◽

Peilin Zhao ◽

Yong Liu ◽

Steven C. H. Hoi ◽

Chunyan Miao

Keyword(s):

Real World ◽

Learning Algorithm ◽

Learning Problems ◽

Similarity Function ◽

Learning Approaches ◽

Similarity Learning ◽

Real World Data ◽

Real World Datasets ◽

Online Learning Algorithm ◽

Relative Similarity

Relative similarity learning (RSL) aims to learn similarity functions from data with relative constraints. Most previous algorithms developed for RSL are batch-based learning approaches which suffer from poor scalability when dealing with real-world data arriving sequentially. These methods are often designed to learn a single similarity function for a specific task. Therefore, they may be sub-optimal for solving multi-task learning problems. To overcome these limitations, we propose a scalable RSL framework named OMTRSL (Online Multi-Task Relative Similarity Learning). Specifically, we first develop a simple yet effective online learning algorithm for multi-task relative similarity learning. Then, we also propose an active learning algorithm to save the labeling cost. The proposed algorithms not only enjoy theoretical guarantees, but also show high efficacy and efficiency in extensive experiments on real-world datasets.

Label Enhancement for Label Distribution Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/406 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ning Xu ◽

An Tao ◽

Xin Geng

Keyword(s):

Learning Process ◽

Real World ◽

Feature Space ◽

Graph Laplacian ◽

Training Set ◽

Topological Information ◽

Label Distribution Learning ◽

Real World Datasets ◽

Label Distribution ◽

Training Sets

Label distribution is more general than both single-label annotation and multi-label annotation. It covers a certain number of labels, representing the degree to which each label describes the instance. The learning process on the instances labeled by label distributions is called label distribution learning (LDL). Unfortunately, many training sets only contain simple logical labels rather than label distributions due to the difficulty of obtaining the label distributions directly. To solve the problem, one way is to recover the label distributions from the logical labels in the training set by leveraging the topological information of the feature space and the correlation among the labels. This process of recovering label distributions from logical labels is defined as label enhancement (LE), which reinforces the supervision information in the training sets. This paper proposes a novel LE algorithm called Graph Laplacian Label Enhancement (GLLE). Experimental results on one artificial dataset and fourteen real-world datasets show clear advantages of GLLE over several existing LE algorithms.

Quadruply Stochastic Gradient Method for Large Scale Nonlinear Semi-Supervised Ordinal Regression AUC Optimization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6029 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5734-5741

Author(s):

Wanli Shi ◽

Bin Gu ◽

Xiang Li ◽

Heng Huang

Keyword(s):

Real World ◽

Large Scale ◽

Optimal Solution ◽

Ordinal Regression ◽

Data Sampling ◽

Decomposition Approach ◽

Scalable Algorithm ◽

AUC Optimization ◽

Stochastic Data ◽

Real World Datasets

Semi-supervised ordinal regression (S2OR) problems are ubiquitous in real-world applications, where only a few ordered instances are labeled and massive instances remain unlabeled. Recent research has shown that directly optimizing the concordance index or AUC can impose a better ranking on the data than optimizing the traditional error rate in ordinal regression (OR) problems. In this paper, we propose an unbiased objective function for S2OR AUC optimization based on the ordinal binary decomposition approach. Besides, to handle large-scale kernelized learning problems, we propose a scalable algorithm called QS3ORAO using the doubly stochastic gradients (DSG) framework for functional optimization. Theoretically, we prove that our method can converge to the optimal solution at the rate of O(1/t), where t is the number of iterations for stochastic data sampling. Extensive experimental results on various benchmark and real-world datasets also demonstrate that our method is efficient and effective while retaining similar generalization performance.
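
For readers unfamiliar with the ranking view of AUC mentioned in this abstract, the quantity can be computed as the fraction of (positive, negative) pairs that a scorer orders correctly. The sketch below covers only the plain binary case; it is not the S2OR objective or the QS3ORAO algorithm, and the function name and the scores are made up for illustration.

```python
def pairwise_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Made-up scores: one positive (0.4) is ranked below one negative (0.8).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
auc = pairwise_auc(scores, labels)
print(auc)
```

Here five of the six positive-negative pairs are ordered correctly, so the result is 5/6. An ordinal problem can be reduced to several such binary problems, one per threshold between adjacent ranks, which is the general idea behind a binary decomposition.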

The sequence of disease-modifying anti-rheumatic drugs: pathways to and predictors of tocilizumab monotherapy

Arthritis Research & Therapy ◽

10.1186/s13075-020-02408-4 ◽

2021 ◽

Vol 23 (1) ◽

Author(s):

Daniel H. Solomon ◽

Chang Xu ◽

Jamie Collins ◽

Seoyoung C. Kim ◽

Elena Losina ◽

...

Keyword(s):

Real World ◽

Large Scale ◽

Disease Duration ◽

Learning Algorithm ◽

Supervised Machine Learning ◽

TNF Inhibitor ◽

Discrete State ◽

Prior Use ◽

Disease Modifying ◽

Tocilizumab Monotherapy

Background: There are numerous non-biologic and biologic disease-modifying anti-rheumatic drugs (bDMARDs) for rheumatoid arthritis (RA). Typical sequences of bDMARDs are not clear. Future treatment policies and trials should be informed by quantitative estimates of current treatment practice. Methods: We used data from Corrona, a large real-world RA registry, to develop a method for quantifying sequential patterns in treatment with bDMARDs. As a proof of concept, we studied patients who eventually used tocilizumab monotherapy (TCZm); tocilizumab is an IL-6 antagonist with similar benefits whether used as monotherapy or in combination. Patients starting a bDMARD were included and were followed using a discrete-state Markov model, observing changes in treatments every 6 months and determining whether they used TCZm. A supervised machine learning algorithm was then employed to determine longitudinal patient factors associated with TCZm use. Results: A total of 7300 patients starting a bDMARD were followed for up to 5 years. Their median age was 58 years, 78% were female, median disease duration was 5 years, and 57% were seropositive. During follow-up, 287 (3.9%) reported use of TCZm with median time until use of 25.6 (11.5, 56.0) months. Eighty-two percent of TCZm use began within 3 years of starting any bDMARD. Ninety-three percent of TCZm users switched from TCZ combination, a TNF inhibitor, or another bDMARD. Very few patients were given TCZm as their first DMARD (0.6%). Variables associated with the use of TCZm included prior use of TCZ combination therapy, older age, longer disease duration, seronegativity, higher disease activity, and no prior use of a TNF inhibitor. Conclusions: Improved understanding of treatment sequences in RA may help personalize care. These methods may help optimize treatment decisions using large-scale real-world data.
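
The discrete-state Markov model mentioned in the methods can be illustrated by estimating transition probabilities from treatment sequences observed at fixed intervals. The sketch below is a generic maximum-likelihood count estimate, not the study's actual model; the state names, the function name, and the three patient sequences are invented for illustration and are not registry data.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Each consecutive pair is one observed transition between visits.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


# Hypothetical treatment states recorded every 6 months (made-up data).
patients = [
    ["TNFi", "TNFi", "TCZ combo", "TCZm"],
    ["TNFi", "other bDMARD", "other bDMARD"],
    ["TNFi", "TNFi", "TNFi"],
]
probs = transition_probabilities(patients)
print(probs["TNFi"])
```

In this toy data, five transitions leave the "TNFi" state and three of them stay there, so the estimated probability of remaining on a TNF inhibitor at the next visit is 0.6.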
