Lifelong Spectral Clustering

In the past decades, spectral clustering (SC) has become one of the most effective clustering algorithms. However, most previous studies focus on spectral clustering tasks with a fixed task set, which cannot incorporate with a new spectral clustering task without accessing to previously learned tasks. In this paper, we aim to explore the problem of spectral clustering in a lifelong machine learning framework, i.e., Lifelong Spectral Clustering (L2SC). Its goal is to efficiently learn a model for a new spectral clustering task by selectively transferring previously accumulated experience from knowledge library. Specifically, the knowledge library of L2SC contains two components: 1) orthogonal basis library: capturing latent cluster centers among the clusters in each pair of tasks; 2) feature embedding library: embedding the feature manifold information shared among multiple related tasks. As a new spectral clustering task arrives, L2SC firstly transfers knowledge from both basis library and feature library to obtain encoding matrix, and further redefines the library base over time to maximize performance across all the clustering tasks. Meanwhile, a general online update formulation is derived to alternatively update the basis library and feature library. Finally, the empirical experiments on several real-world benchmark datasets demonstrate that our L2SC model can effectively improve the clustering performance when comparing with other state-of-the-art spectral clustering algorithms.

Download Full-text

Anchors Bring Ease: An Embarrassingly Simple Approach to Partial Multi-View Clustering

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301118 ◽

2019 ◽

Vol 33 ◽

pp. 118-125 ◽

Cited By ~ 2

Author(s):

Jun Guo ◽

Jiahui Ye

Keyword(s):

Spectral Clustering ◽

Time Complexity ◽

State Of The Art ◽

Research Problem ◽

Simple Approach ◽

Clustering Methods ◽

Real World Data ◽

The Past ◽

Benchmark Datasets ◽

In Virtue Of

Clustering on multi-view data has attracted much more attention in the past decades. Most previous studies assume that each instance appears in all views, or there is at least one view containing all instances. However, real world data often suffers from missing some instances in each view, leading to the research problem of partial multi-view clustering. To address this issue, this paper proposes a simple yet effective Anchorbased Partial Multi-view Clustering (APMC) method, which utilizes anchors to reconstruct instance-to-instance relationships for clustering. APMC is conceptually simple and easy to implement in practice, besides it has clear intuitions and non-trivial empirical guarantees. Specifically, APMC firstly integrates intra- and inter- view similarities through anchors. Then, spectral clustering is performed on the fused similarities to obtain a unified clustering result. Compared with existing partial multi-view clustering methods, APMC has three notable advantages: 1) it can capture more non-linear relations among instances with the help of kernel-based similarities; 2) it has a much lower time complexity in virtue of a noniterative scheme; 3) it can inherently handle data with negative entries as well as be extended to more than two views. Finally, we extensively evaluate the proposed method on five benchmark datasets. Experimental results demonstrate the superiority of APMC over state-of-the-art approaches.

Download Full-text

Consensus Kernel K-Means Clustering for Incomplete Multiview Data

Computational Intelligence and Neuroscience ◽

10.1155/2017/3961718 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 5

Author(s):

Yongkai Ye ◽

Xinwang Liu ◽

Qiang Liu ◽

Jianping Yin

Keyword(s):

Real World ◽

State Of The Art ◽

Clustering Algorithms ◽

Multiple Views ◽

Learning Method ◽

Learning Framework ◽

Optimal Integration ◽

Art Methods ◽

Multiview Clustering

Multiview clustering aims to improve clustering performance through optimal integration of information from multiple views. Though demonstrating promising performance in various applications, existing multiview clustering algorithms cannot effectively handle the view’s incompleteness. Recently, one pioneering work was proposed that handled this issue by integrating multiview clustering and imputation into a unified learning framework. While its framework is elegant, we observe that it overlooks the consistency between views, which leads to a reduction in the clustering performance. In order to address this issue, we propose a new unified learning method for incomplete multiview clustering, which simultaneously imputes the incomplete views and learns a consistent clustering result with explicit modeling of between-view consistency. More specifically, the similarity between each view’s clustering result and the consistent clustering result is measured. The consistency between views is then modeled using the sum of these similarities. Incomplete views are imputed to achieve an optimal clustering result in each view, while maintaining between-view consistency. Extensive comparisons with state-of-the-art methods on both synthetic and real-world incomplete multiview datasets validate the superiority of the proposed method.

Download Full-text

An Empirical Comparison of Latest Data Clustering Algorithms with State-of-the-Art

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v5.i2.pp410-415 ◽

2017 ◽

Vol 5 (2) ◽

pp. 410 ◽

Cited By ~ 4

Author(s):

Xianjin Shi ◽

Wanwan Wang ◽

Chongsheng Zhang

Keyword(s):

Data Clustering ◽

Spectral Clustering ◽

Clustering Algorithm ◽

State Of The Art ◽

Clustering Algorithms ◽

Density Peak ◽

The Past ◽

Overall Performance ◽

Public Datasets ◽

Clustering Validation

Over the past few decades, a great many data clustering algorithms have been developed, including K-Means, DBSCAN, Bi-Clustering and Spectral clustering, etc. In recent years, two new data clustering algorithms have been proposed, which are affinity propagation (AP, 2007) and density peak based clustering (DP, 2014). In this work, we empirically compare the performance of these two latest data clustering algorithms with state-of-the-art, using 6 external and 2 internal clustering validation metrics. Our experimental results on 16 public datasets show that, the two latest clustering algorithms, AP and DP, do not always outperform DBSCAN. Therefore, to find the best clustering algorithm for a specific dataset, all of AP, DP and DBSCAN should be considered. Moreover, we find that the comparison of different clustering algorithms is closely related to the clustering evaluation metrics adopted. For instance, when using the Silhouette clustering validation metric, the overall performance of K-Means is as good as AP and DP. This work has important reference values for researchers and engineers who need to select appropriate clustering algorithms for their specific applications.

Download Full-text

A Survey on Bias and Fairness in Machine Learning

ACM Computing Surveys ◽

10.1145/3457607 ◽

2021 ◽

Vol 54 (6) ◽

pp. 1-35

Author(s):

Ninareh Mehrabi ◽

Fred Morstatter ◽

Nripsuta Saxena ◽

Kristina Lerman ◽

Aram Galstyan

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Deep Learning ◽

Real World ◽

State Of The Art ◽

Future Directions ◽

Discriminatory Behavior ◽

Real World Applications ◽

Near Future ◽

Different Sources

With the widespread use of artificial intelligence (AI) systems and applications in our everyday lives, accounting for fairness has gained significant importance in designing and engineering of such systems. AI systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that these decisions do not reflect discriminatory behavior toward certain groups or populations. More recently some work has been developed in traditional machine learning and deep learning that address such challenges in different subdomains. With the commercialization of these systems, researchers are becoming more aware of the biases that these applications can contain and are attempting to address them. In this survey, we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and ways they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.

Download Full-text

Learning to Incorporate Structure Knowledge for Image Inpainting

10.20944/preprints202002.0125.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jie Yang ◽

Zhiquan Qi ◽

Yong Shi

Keyword(s):

Structure Learning ◽

State Of The Art ◽

Image Inpainting ◽

Image Completion ◽

Image Structure ◽

Learning Framework ◽

Task Learning ◽

Pyramid Structure ◽

Benchmark Datasets ◽

Structure Knowledge

This paper develops a multi-task learning framework that attempts to incorporate the image structure knowledge to assist image inpainting, which is not well explored in previous works. The primary idea is to train a shared generator to simultaneously complete the corrupted image and corresponding structures --- edge and gradient, thus implicitly encouraging the generator to exploit relevant structure knowledge while inpainting. In the meantime, we also introduce a structure embedding scheme to explicitly embed the learned structure features into the inpainting process, thus to provide possible preconditions for image completion. Specifically, a novel pyramid structure loss is proposed to supervise structure learning and embedding. Moreover, an attention mechanism is developed to further exploit the recurrent structures and patterns in the image to refine the generated structures and contents. Through multi-task learning, structure embedding besides with attention, our framework takes advantage of the structure knowledge and outperforms several state-of-the-art methods on benchmark datasets quantitatively and qualitatively.

Download Full-text

Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative Adversarial Nets

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/405 ◽

2020 ◽

Cited By ~ 1

Author(s):

Cong Fei ◽

Bin Wang ◽

Yuzheng Zhuang ◽

Zongzhang Zhang ◽

Jianye Hao ◽

...

Keyword(s):

Real World ◽

Autonomous Vehicles ◽

Data Augmentation ◽

State Of The Art ◽

Robot Learning ◽

Imitation Learning ◽

Proper Understanding ◽

Learning Framework ◽

Strategy Game ◽

Real Time Strategy Game

Generative adversarial imitation learning (GAIL) has shown promising results by taking advantage of generative adversarial nets, especially in the field of robot learning. However, the requirement of isolated single modal demonstrations limits the scalability of the approach to real world scenarios such as autonomous vehicles' demand for a proper understanding of human drivers' behavior. In this paper, we propose a novel multi-modal GAIL framework, named Triple-GAIL, that is able to learn skill selection and imitation jointly from both expert demonstrations and continuously generated experiences with data augmentation purpose by introducing an auxiliary selector. We provide theoretical guarantees on the convergence to optima for both of the generator and the selector respectively. Experiments on real driver trajectories and real-time strategy game datasets demonstrate that Triple-GAIL can better fit multi-modal behaviors close to the demonstrators and outperforms state-of-the-art methods.

Download Full-text

Sliding-Window Thompson Sampling for Non-Stationary Settings

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11407 ◽

2020 ◽

Vol 68 ◽

pp. 311-364

Author(s):

Francesco Trovo ◽

Stefano Paladino ◽

Marcello Restelli ◽

Nicola Gatti

Keyword(s):

Real World ◽

State Of The Art ◽

Sliding Window ◽

Upper Bounds ◽

Decision Problems ◽

Sequential Decision ◽

Thompson Sampling ◽

The Past ◽

Real World Applications ◽

Window Approach

Multi-Armed Bandit (MAB) techniques have been successfully applied to many classes of sequential decision problems in the past decades. However, non-stationary settings -- very common in real-world applications -- received little attention so far, and theoretical guarantees on the regret are known only for some frequentist algorithms. In this paper, we propose an algorithm, namely Sliding-Window Thompson Sampling (SW-TS), for nonstationary stochastic MAB settings. Our algorithm is based on Thompson Sampling and exploits a sliding-window approach to tackle, in a unified fashion, two different forms of non-stationarity studied separately so far: abruptly changing and smoothly changing. In the former, the reward distributions are constant during sequences of rounds, and their change may be arbitrary and happen at unknown rounds, while, in the latter, the reward distributions smoothly evolve over rounds according to unknown dynamics. Under mild assumptions, we provide regret upper bounds on the dynamic pseudo-regret of SW-TS for the abruptly changing environment, for the smoothly changing one, and for the setting in which both the non-stationarity forms are present. Furthermore, we empirically show that SW-TS dramatically outperforms state-of-the-art algorithms even when the forms of non-stationarity are taken separately, as previously studied in the literature.

Download Full-text

Learning to Incorporate Structure Knowledge for Image Inpainting

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6951 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12605-12612 ◽

Cited By ~ 1

Author(s):

Jie Yang ◽

Zhiquan Qi ◽

Yong Shi

Keyword(s):

Structure Learning ◽

State Of The Art ◽

Image Inpainting ◽

Image Completion ◽

Image Structure ◽

Learning Framework ◽

Task Learning ◽

Pyramid Structure ◽

Benchmark Datasets ◽

Structure Knowledge

This paper develops a multi-task learning framework that attempts to incorporate the image structure knowledge to assist image inpainting, which is not well explored in previous works. The primary idea is to train a shared generator to simultaneously complete the corrupted image and corresponding structures — edge and gradient, thus implicitly encouraging the generator to exploit relevant structure knowledge while inpainting. In the meantime, we also introduce a structure embedding scheme to explicitly embed the learned structure features into the inpainting process, thus to provide possible preconditions for image completion. Specifically, a novel pyramid structure loss is proposed to supervise structure learning and embedding. Moreover, an attention mechanism is developed to further exploit the recurrent structures and patterns in the image to refine the generated structures and contents. Through multi-task learning, structure embedding besides with attention, our framework takes advantage of the structure knowledge and outperforms several state-of-the-art methods on benchmark datasets quantitatively and qualitatively.

Download Full-text

Accurate prediction of DNA N4-methylcytosine sites via boost-learning various types of sequence features

BMC Genomics ◽

10.1186/s12864-020-07033-8 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Zhixun Zhao ◽

Xiaocai Zhang ◽

Fang Chen ◽

Liang Fang ◽

Jinyan Li

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Case Studies ◽

Learning Algorithm ◽

State Of The Art ◽

Learning Algorithms ◽

Feature Space ◽

Sequence Features ◽

Independent Test ◽

Benchmark Datasets

Abstract Background DNA N4-methylcytosine (4mC) is a critical epigenetic modification and has various roles in the restriction-modification system. Due to the high cost of experimental laboratory detection, computational methods using sequence characteristics and machine learning algorithms have been explored to identify 4mC sites from DNA sequences. However, state-of-the-art methods have limited performance because of the lack of effective sequence features and the ad hoc choice of learning algorithms to cope with this problem. This paper is aimed to propose new sequence feature space and a machine learning algorithm with feature selection scheme to address the problem. Results The feature importance score distributions in datasets of six species are firstly reported and analyzed. Then the impact of the feature selection on model performance is evaluated by independent testing on benchmark datasets, where ACC and MCC measurements on the performance after feature selection increase by 2.3% to 9.7% and 0.05 to 0.19, respectively. The proposed method is compared with three state-of-the-art predictors using independent test and 10-fold cross-validations, and our method outperforms in all datasets, especially improving the ACC by 3.02% to 7.89% and MCC by 0.06 to 0.15 in the independent test. Two detailed case studies by the proposed method have confirmed the excellent overall performance and correctly identified 24 of 26 4mC sites from the C.elegans gene, and 126 out of 137 4mC sites from the D.melanogaster gene. Conclusions The results show that the proposed feature space and learning algorithm with feature selection can improve the performance of DNA 4mC prediction on the benchmark datasets. The two case studies prove the effectiveness of our method in practical situations.

Download Full-text

An Efficient Perpetual Learning Algorithm

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213019500222 ◽

2019 ◽

Vol 28 (07) ◽

pp. 1950022 ◽

Cited By ~ 1

Author(s):

Haiou Qin ◽

Du Zhang ◽

Xibin Sun ◽

Jiahua Tang ◽

Jun Peng

Keyword(s):

Machine Learning ◽

Real World ◽

Efficient Algorithm ◽

Learning Algorithm ◽

Small Data ◽

Computing Systems ◽

Agent Systems ◽

Multiple Tasks ◽

Improved Performance ◽

Over Time

One of the emerging research opportunities in machine learning is to develop computing systems that learn many tasks continuously and improve the performance of learned tasks incrementally over time. In real world, learners have to adapt to labeled and unlabeled samples from various tasks which arrive randomly. In this paper, we propose an efficient algorithm called Efficient Perpetual Learning Algorithm (EPLA) which is suitable for learning multiple tasks in both offline and online settings. The algorithm, which is an extension of ELLA,4 is part of what we call perpetual learning that can learn new tasks or refine knowledge of learned tasks for improved performance with newly arrived labeled samples in an incremental fashion. Several salient features exist for EPLA. The learning episodes are triggered via either extrinsic or intrinsic stimuli. Agent systems based on the proposed algorithm can be engaged in an open-ended and alternating sequence of learning episodes and working episodes. Unlabeled samples can be used to self-train the learner in small data setting. Compared with ELLA, EPLA shows almost equivalent performance without memorizing any labeled samples learned previously.

Download Full-text