Uncertainty Sampling for Constrained Cluster Ensemble

Constrained clustering is a framework for improving clustering performance by using pairwise constraints on the data. Since the performance of constrained clustering depends on the set of constraints used, a method is needed to select good constraints that improve clustering performance. In this paper, we propose an active sampling method that works with a constrained cluster ensemble algorithm; the algorithm aggregates the clustering results that a modified COP-Kmeans produces iteratively as it changes the priorities of the constraints. Our method follows the uncertainty sampling approach and measures uncertainty by the variation in clustering results, that is, by data pairs that are clustered together in some results but not in others. It selects for labeling the data pair whose result varies most during the cluster ensemble process. Experimental results show that our method outperforms random sampling. We further investigate the effect of important parameters.
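A minimal sketch of the pair-selection step described above, in Python. It assumes the ensemble's clustering results are already available as a list of label vectors (the modified COP-Kmeans runs themselves are not reproduced), and it scores "variation" as p(1 - p), where p is the fraction of results that place the pair in the same cluster. This scoring function and the names `co_cluster_fraction` and `select_most_uncertain_pair` are illustrative assumptions, not the authors' definitions.

```python
from itertools import combinations
from typing import FrozenSet, Optional, Sequence, Tuple


def co_cluster_fraction(labelings: Sequence[Sequence[int]], i: int, j: int) -> float:
    """Fraction of ensemble results that put points i and j in the same cluster."""
    together = sum(1 for labels in labelings if labels[i] == labels[j])
    return together / len(labelings)


def select_most_uncertain_pair(
    labelings: Sequence[Sequence[int]],
    queried: FrozenSet[Tuple[int, int]] = frozenset(),
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the not-yet-queried pair whose co-clustering varies most.

    Assumption of this sketch: variation is the variance p * (1 - p) of the
    Bernoulli indicator "i and j are clustered together"; it peaks at p = 0.5.
    Ties are broken in favour of the first pair in lexicographic order.
    """
    n = len(labelings[0])
    best_pair, best_score = None, -1.0
    for i, j in combinations(range(n), 2):
        if (i, j) in queried:  # skip pairs whose constraint is already known
            continue
        p = co_cluster_fraction(labelings, i, j)
        score = p * (1.0 - p)
        if score > best_score:
            best_pair, best_score = (i, j), score
    return best_pair, best_score
```

For example, with the four hypothetical results `[[0, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 0]]`, points 0 and 1 are together in exactly half of the results, so the pair (0, 1) gets the maximal score 0.25 and would be queried first; once it is labeled and passed in `queried`, the next candidate is (0, 2).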

How to measure uncertainty in uncertainty sampling for active learning

Machine Learning ◽

10.1007/s10994-021-06003-9 ◽

2021 ◽

Author(s):

Vu-Linh Nguyen ◽

Mohammad Hossein Shaker ◽

Eyke Hüllermeier

Keyword(s):

Machine Learning ◽

Active Learning ◽

Sampling Strategies ◽

Total Uncertainty ◽

Uncertainty Sampling ◽

Different Types ◽

Alternative Approaches ◽

Active Learner ◽

Probabilistic Nature ◽

Different Sources

Various strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, one of the most popular approaches, the active learner sequentially queries the labels of those instances for which its current prediction is maximally uncertain. The predictions, as well as the measures used to quantify the degree of uncertainty, such as entropy, are traditionally probabilistic in nature. In recent years, however, alternative approaches to capturing uncertainty in machine learning have been proposed, along with corresponding uncertainty measures. In particular, some of these measures seek to distinguish different sources of uncertainty and to separate different types, such as the reducible (epistemic) and the irreducible (aleatoric) part of the total uncertainty in a prediction. The goal of this paper is to elaborate on the usefulness of such measures for uncertainty sampling and to compare their performance in active learning. To this end, we instantiate uncertainty sampling with different measures, analyze the properties of the resulting sampling strategies, and compare them in an experimental study.
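To make the distinction between total, aleatoric, and epistemic uncertainty concrete, here is a hedged Python sketch of one widely used ensemble-based entropy decomposition. Total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average entropy of the ensemble members, and the epistemic part is their difference (the mutual information). It illustrates the idea under these assumptions and is not necessarily one of the specific measures compared in the paper; `entropy`, `decompose`, and `query` are hypothetical names.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy, in bits, of a discrete class distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def decompose(member_probs: List[Sequence[float]]):
    """Split the uncertainty about one instance, given the class distributions
    predicted by each member of an ensemble (assumed representation).

    total     = H(mean prediction)
    aleatoric = mean of the members' entropies
    epistemic = total - aleatoric (mutual information, non-negative)
    """
    k = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / k for c in range(n_classes)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in member_probs) / k
    return total, aleatoric, total - aleatoric


def query(pool: List[List[Sequence[float]]], measure: str = "total") -> int:
    """Uncertainty sampling: index of the pool instance with the highest score."""
    idx = {"total": 0, "aleatoric": 1, "epistemic": 2}[measure]
    scores = [decompose(member_probs)[idx] for member_probs in pool]
    return max(range(len(pool)), key=scores.__getitem__)


# Toy pool: two ensemble members' predictions for three binary instances.
pool = [
    [[0.5, 0.5], [0.5, 0.5]],  # members agree the instance is ambiguous
    [[1.0, 0.0], [0.0, 1.0]],  # members confidently disagree
    [[0.9, 0.1], [0.9, 0.1]],  # members agree and are fairly confident
]
print(query(pool, "epistemic"))  # prints 1
```

Instances 0 and 1 have the same total entropy of 1 bit, but only instance 1 carries epistemic uncertainty. Querying by the epistemic part therefore picks instance 1, on the reasoning that additional labels can reduce disagreement between models but not inherent class overlap.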

Geometric Analysis of Uncertainty Sampling for Dense Neural Network Layer

IEEE Signal Processing Letters ◽

10.1109/lsp.2021.3072292 ◽

2021 ◽

pp. 1-1

Author(s):

Aziz Kocanaogullari ◽

Niklas Smedemark-Margulies ◽

Murat Akcakaya ◽

Deniz Erdogmus

Keyword(s):

Neural Network ◽

Geometric Analysis ◽

Network Layer ◽

Uncertainty Sampling

Active learning with uncertainty sampling for large scale activity recognition in smart homes

Journal of Ambient Intelligence and Smart Environments ◽

10.3233/ais-170427 ◽

2017 ◽

Vol 9 (2) ◽

pp. 209-223 ◽

Cited By ~ 6

Author(s):

Hande Alemdar ◽

T.L.M. van Kasteren ◽

Cem Ersoy

Keyword(s):

Active Learning ◽

Activity Recognition ◽

Large Scale ◽

Smart Homes ◽

Uncertainty Sampling

Cluster Ensemble Selection

Statistical Analysis and Data Mining: The ASA Data Science Journal ◽

10.1002/sam.10008 ◽

2008 ◽

Vol 1 (3) ◽

pp. 128-141 ◽

Cited By ~ 95

Author(s):

Xiaoli Z. Fern ◽

Wei Lin

Keyword(s):

Cluster Ensemble ◽

Ensemble Selection

ChemInform Abstract: Size-Induced Metal to Semiconductor Transition in a Stabilized Gold Cluster Ensemble.

ChemInform ◽

10.1002/chin.199827013 ◽

2010 ◽

Vol 29 (27)

Author(s):

A. W. Snow ◽

H. Wohltjen

Keyword(s):

Gold Cluster ◽

Cluster Ensemble

A Self-training Approach to Cost Sensitive Uncertainty Sampling

Machine Learning and Knowledge Discovery in Databases - Lecture Notes in Computer Science ◽

10.1007/978-3-642-04180-8_10 ◽

2009 ◽

pp. 10-10 ◽

Cited By ~ 2

Author(s):

Alexander Liu ◽

Goo Jun ◽

Joydeep Ghosh

Keyword(s):

Uncertainty Sampling ◽

Training Approach

Cluster Ensemble-Based Image Segmentation

International Journal of Advanced Robotic Systems ◽

10.5772/56769 ◽

2013 ◽

Vol 10 (7) ◽

pp. 297 ◽

Cited By ~ 5

Author(s):

Xiaoru Wang ◽

Junping Du ◽

Shuzhe Wu ◽

Xu Li ◽

Fu Li

Keyword(s):

Image Segmentation ◽

Cluster Ensemble

Adaptive fuzzy exponent cluster ensemble system based feature selection and spectral clustering

2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) ◽

10.1109/fuzz-ieee.2017.8015721 ◽

2017 ◽

Cited By ~ 2

Author(s):

Abdelkarim Ben Ayed ◽

Mohamed Ben Halima ◽

Adel M. Alimi

Keyword(s):

Feature Selection ◽

Spectral Clustering ◽

Cluster Ensemble ◽

Adaptive Fuzzy

Application of Cluster Ensemble Analysis for Grouping Provinces in Indonesia Based on Environmental Health Indicators

Jurnal Matematika UNAND ◽

10.25077/jmu.8.1.323-330.2019 ◽

2019 ◽

Vol 8 (1) ◽

pp. 323

Author(s):

Yuliza Diana Putri ◽

Izzati Rahmi HG ◽

Hazmira Yozza

Keyword(s):

Cluster Ensemble ◽

Ensemble Cluster ◽

Cluster 2

Environmental health is part of public health in general. Each region has different environmental health conditions with respect to environmental health indicators, so the priorities of environmental sanitation programs also differ from region to region. It is therefore of interest to know how similar the regions are to one another in terms of these indicators. This similarity can then serve as the basis for grouping the regions, so that regions with nearly the same environmental health conditions fall into one group and, conversely, regions with dissimilar conditions fall into different groups. Such a grouping makes it easier for the government to set priorities for environmental health development in these regions. In this study, the cluster ensemble method is applied to group the provinces of Indonesia based on 8 environmental health indicators. The study yields a best clustering solution with 2 clusters, in which the members of cluster 1 are provinces with better environmental health than the members of cluster 2.

Keywords: cluster ensemble, hierarchical clustering, k-means clustering
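The abstract does not specify which consensus function combines the base clusterings. As a hedged illustration only, the Python sketch below uses a common co-association approach: it counts how often each pair of items (here, provinces) is grouped together across base clusterings, which could come, for example, from hierarchical and k-means runs on the indicators, and then links items that a majority of the clusterings group together. The 0.5 threshold and the names `co_association` and `consensus` are assumptions, and computing the base clusterings is omitted.

```python
from typing import List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """M[i][j] = fraction of base clusterings that put items i and j together."""
    n, m = len(labelings[0]), len(labelings)
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / m for j in range(n)]
        for i in range(n)
    ]


def consensus(labelings: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Consensus labels: connected components of the graph that links items
    whose co-association exceeds `threshold` (majority vote when 0.5)."""
    M = co_association(labelings)
    n = len(M)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over items linked by a majority of clusterings.
        stack = [start]
        labels[start] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and M[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

With the hypothetical base clusterings `[[0, 0, 0, 1, 1], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]` of five items, the consensus is `[0, 0, 0, 1, 1]`: items 0 to 2 are together in at least two of the three clusterings, as are items 3 and 4, while no cross pair reaches a majority.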