Uncertainty Sampling for Constrained Cluster Ensemble

Author(s):  
Masayuki Okabe ◽  
Seiji Yamada

Constrained clustering is a framework for improving clustering performance by using constraints on data pairs. Because the performance of constrained clustering depends on the set of constraints used, a method is needed to select constraints that improve clustering performance. In this paper, we propose an active sampling method that works with a constrained cluster ensemble algorithm. The algorithm aggregates the clustering results that a modified COP-Kmeans produces iteratively by changing the priorities of constraints. Our method follows the approach of uncertainty sampling and measures uncertainty by the variation among clustering results, that is, by data pairs that are clustered together in some results but not in others. It selects for labeling the data pair whose result varies most during the cluster ensemble process. Experimental results show that our method outperforms random sampling. We further investigate the effect of important parameters.
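The pair-selection idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact uncertainty measure, so the sketch assumes a pair is most uncertain when it is clustered together in about half of the ensemble runs. The function name `most_uncertain_pair` and its parameters are hypothetical, and the label arrays are assumed to come from any ensemble of clustering runs rather than from the modified COP-Kmeans specifically.

```python
import numpy as np

def most_uncertain_pair(labelings, labeled_pairs=()):
    """Pick the unlabeled pair whose co-clustering frequency is closest to 0.5.

    labelings: array-like of shape (n_runs, n_points), one cluster label per
        point per ensemble run (hypothetical input format).
    labeled_pairs: pairs (i, j) that already carry a constraint and are skipped.
    Returns (i, j, frequency) with i < j.
    """
    labelings = np.asarray(labelings)
    n_runs, n_points = labelings.shape
    # co[i, j] = fraction of runs in which points i and j share a cluster
    co = (labelings[:, :, None] == labelings[:, None, :]).mean(axis=0)
    # Assumed measure: 1.0 when a pair is together in exactly half of the runs,
    # 0.0 when the runs agree unanimously
    uncertainty = 1.0 - 2.0 * np.abs(co - 0.5)
    # Consider each pair once (i < j) and skip pairs that are already labeled
    mask = np.triu(np.ones((n_points, n_points), dtype=bool), k=1)
    for i, j in labeled_pairs:
        mask[min(i, j), max(i, j)] = False
    uncertainty = np.where(mask, uncertainty, -np.inf)
    i, j = np.unravel_index(np.argmax(uncertainty), uncertainty.shape)
    return int(i), int(j), float(co[i, j])

# Four ensemble runs over four points
runs = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 1, 1, 1],
        [0, 0, 0, 1]]
print(most_uncertain_pair(runs))
```

In this toy ensemble, points 1 and 2 share a cluster in two of the four runs, so that pair would be the one sent to the oracle for a must-link or cannot-link label.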


2021 ◽  
Author(s):  
Vu-Linh Nguyen ◽  
Mohammad Hossein Shaker ◽  
Eyke Hüllermeier

Abstract: Various strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the label of those instances for which its current prediction is maximally uncertain. The predictions, as well as the measures used to quantify the degree of uncertainty, such as entropy, are traditionally of a probabilistic nature. Yet alternative approaches to capturing uncertainty in machine learning, along with corresponding uncertainty measures, have been proposed in recent years. In particular, some of these measures seek to distinguish different sources and to separate different types of uncertainty, such as the reducible (epistemic) and the irreducible (aleatoric) part of the total uncertainty in a prediction. The goal of this paper is to elaborate on the usefulness of such measures for uncertainty sampling, and to compare their performance in active learning. To this end, we instantiate uncertainty sampling with different measures, analyze the properties of the sampling strategies thus obtained, and compare them in an experimental study.
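The traditional probabilistic variant mentioned in the abstract above, uncertainty sampling with entropy, can be sketched as follows. This is a minimal illustration and not code from the paper: the function name `entropy_query` is hypothetical, the class probabilities are assumed to come from any probabilistic classifier, and the epistemic/aleatoric measures the paper compares are not reproduced here.

```python
import numpy as np

def entropy_query(probs):
    """Return the index of the instance with maximal predictive entropy.

    probs: array-like of shape (n_instances, n_classes); each row is assumed
        to be a predicted class distribution that sums to 1.
    """
    probs = np.asarray(probs, dtype=float)
    # Clip inside the log only, so that zero probabilities contribute 0
    clipped = np.clip(probs, 1e-12, 1.0)
    entropy = -(probs * np.log(clipped)).sum(axis=1)
    return int(np.argmax(entropy)), entropy

# Three unlabeled instances, two classes
pool = [[0.9, 0.1],
        [0.5, 0.5],
        [0.2, 0.8]]
index, scores = entropy_query(pool)
print(index, scores)
```

The learner would query the label of the returned instance, retrain, and repeat. Here the second instance is selected because its prediction is evenly split between the two classes.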


2021 ◽  
pp. 1-1
Author(s):  
Aziz Kocanaogullari ◽  
Niklas Smedemark-Margulies ◽  
Murat Akcakaya ◽  
Deniz Erdogmus

2008 ◽  
Vol 1 (3) ◽  
pp. 128-141 ◽  
Author(s):  
Xiaoli Z. Fern ◽  
Wei Lin

10.5772/56769 ◽  
2013 ◽  
Vol 10 (7) ◽  
pp. 297 ◽  
Author(s):  
Xiaoru Wang ◽  
Junping Du ◽  
Shuzhe Wu ◽  
Xu Li ◽  
Fu Li

2019 ◽  
Vol 8 (1) ◽  
pp. 323
Author(s):  
Yuliza Diana Putri ◽  
Izzati Rahmi HG ◽  
Hazmira Yozza

Environmental health is part of public health in general. Each region has a different state of environmental health with respect to the environmental health indicators, so the priorities of environmental sanitation programs also differ from region to region. It is therefore of interest to know how similar the regions are according to these indicators. This similarity can then serve as the basis for grouping the regions, so that regions with nearly the same environmental health conditions fall into one group and regions with dissimilar conditions fall into different groups. Such a grouping makes it easier for the government to set priorities for environmental health development in these regions. In this study, the cluster ensemble method is applied to group the provinces of Indonesia based on 8 environmental health indicators. The study yields a best clustering solution with 2 clusters, in which the members of cluster 1 are provinces with better environmental health than the members of cluster 2.

Keywords: cluster ensemble, hierarchical clustering, k-means clustering

