Cluster ensemble selection using balanced normalized mutual information

2020 ◽  
Vol 39 (3) ◽  
pp. 3033-3055
Author(s):  
Zecong Wang ◽  
Hamid Parvin ◽  
Sultan Noman Qasem ◽  
Bui Anh Tuan ◽  
Kim-Hung Pho

The main idea of cluster ensemble selection is to remove bad partitions from the final ensemble. However, a discarded partition may still contain some reliable clusters, so it can be more reasonable to apply the selection phase at the cluster level. Doing so requires a cluster evaluation metric. Several such metrics have recently been introduced, each with its own limitations; this paper addresses the weak points of each and then introduces a new cluster assessment measure, the Balanced Normalized Mutual Information (BNMI) criterion, which compensates for the deficiencies of traditional NMI-based criteria. In addition, a new cluster ensemble approach is proposed. To create the consensus partition from the selected clusters, several families of aggregation functions (also called consensus functions) are used: those based on the co-association matrix (CAM), those based on hypergraph partitioning algorithms, and those based on an intermediate feature space. The experimental study indicates that the proposed approach outperforms state-of-the-art cluster ensemble methods.
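
The BNMI criterion itself is defined in the paper and is not reproduced here; the sketch below only illustrates the CAM-based family of consensus functions mentioned above: average the co-cluster indicator over all base partitions, then cluster 1 − CAM as a distance matrix. The function names and the average-linkage choice are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def co_association_matrix(partitions):
    """Fraction of base partitions in which each pair of points co-clusters."""
    n = len(partitions[0])
    cam = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        cam += (labels[:, None] == labels[None, :]).astype(float)
    return cam / len(partitions)

def cam_consensus(partitions, n_clusters):
    """Consensus partition: average-linkage clustering of 1 - CAM distances."""
    dist = 1.0 - co_association_matrix(partitions)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Example: three base partitions of six points
parts = [[0, 0, 0, 1, 1, 1],
         [0, 0, 1, 1, 2, 2],
         [0, 0, 0, 0, 1, 1]]
print(cam_consensus(parts, n_clusters=2))
```

A cluster-level selection step, as the paper advocates, would filter individual clusters before accumulating them into the CAM rather than keeping or dropping whole partitions.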

2020 ◽  
Vol 176 (1) ◽  
pp. 79-102
Author(s):  
Chenyue Zhao ◽  
Hosein Alizadeh ◽  
Behrouz Minaei ◽  
Majid Mohamadpoor ◽  
Hamid Parvin ◽  
...  

This paper studies the cluster ensemble selection problem for unsupervised learning. Given a large ensemble of clustering solutions, our goal is to select a subset of solutions that forms a smaller yet better-performing cluster ensemble than using all available solutions. The common way of aggregating the chosen solutions is to accumulate the information of the selected results into a similarity matrix. This paper suggests transforming the similarity matrix into a modularity matrix and then applying a new consensus function that optimizes the modularity measure over it. We represent the modularity maximization problem as a 0-1 quadratic program, which can be solved exactly for small datasets. We also establish a new greedy algorithm, sum linkage, to optimize the objective function quickly on large-scale datasets. We show that the proposed consensus partition gets much closer to the actual cluster structure than the partitions obtained from the direct application of common cluster ensemble methods. Promising results compared with the most widely cited consensus functions demonstrate the efficiency of the proposed method.
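
As a rough illustration of the similarity-to-modularity idea, the sketch below applies the standard Newman transform B = S - k k^T / (2m) and a generic greedy agglomeration over the modularity gain. This is an assumption-laden stand-in for intuition only, not the paper's 0-1 quadratic program or its sum-linkage algorithm.

```python
import numpy as np

def modularity_matrix(S):
    """Newman's modularity matrix B = S - k k^T / (2m), where k is the
    weighted degree vector and 2m the total weight of S."""
    S = np.asarray(S, dtype=float)
    k = S.sum(axis=1)
    return S - np.outer(k, k) / k.sum()

def greedy_modularity_partition(S):
    """Agglomerative heuristic: repeatedly merge the pair of clusters whose
    union yields the largest modularity gain; stop when no merge helps.
    (Illustrative only; the paper's sum-linkage algorithm may differ.)"""
    S = np.asarray(S, dtype=float)
    B, two_m = modularity_matrix(S), S.sum()
    labels = np.arange(len(S))
    while True:
        best_gain, best_pair = 0.0, None
        ids = np.unique(labels)
        for ai, a in enumerate(ids):
            for b in ids[ai + 1:]:
                # Merging a and b adds the cross-cluster B mass to Q
                gain = 2.0 * B[np.ix_(labels == a, labels == b)].sum() / two_m
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return labels
        labels[labels == best_pair[1]] = best_pair[0]

# Example: a block-structured similarity matrix with two natural groups
S = np.array([[0, .9, .8, .1, .0, .1],
              [.9, 0, .7, .0, .1, .0],
              [.8, .7, 0, .1, .0, .1],
              [.1, .0, .1, 0, .9, .8],
              [.0, .1, .0, .9, 0, .7],
              [.1, .0, .1, .8, .7, 0]])
print(greedy_modularity_partition(S))
```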


2008 ◽  
Vol 1 (3) ◽  
pp. 128-141 ◽  
Author(s):  
Xiaoli Z. Fern ◽  
Wei Lin

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Mostafa El Habib Daho ◽  
Nesma Settouti ◽  
Mohammed El Amine Bechar ◽  
Amina Boublenza ◽  
Mohammed Amine Chikh

Purpose
Ensemble methods have been widely used in the field of pattern recognition because of the difficulty of finding a single classifier that performs well on a wide variety of problems. Despite the effectiveness of these techniques, studies have shown that ensemble methods generate a large number of hypotheses and, in most cases, contain redundant classifiers. Several works in the state of the art attempt to reduce the set of hypotheses without affecting performance.

Design/methodology/approach
In this work, the authors propose a pruning method that takes into consideration the correlation between classifiers and classes, and of each classifier with the rest of the set. The authors use the random forest algorithm as the tree-based ensemble classifier, and pruning is performed by a technique inspired by the CFS (correlation-based feature selection) algorithm.

Findings
The proposed method, CES (Correlation-based Ensemble Selection), was evaluated on ten datasets from the UCI machine learning repository, and its performance was compared to six ensemble pruning techniques. The results show that the proposed pruning method selects a small ensemble in a small amount of time while improving classification rates compared to the state-of-the-art methods.

Originality/value
CES is a new ordering-based method that uses the CFS algorithm. CES selects, in a short time, a small sub-ensemble that outperforms results obtained from the whole forest and the other state-of-the-art techniques used in this study.
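
The exact CES procedure is not given in the abstract, so the following is only a hedged sketch of the general recipe it describes: score subsets of a random forest's trees with a CFS-style merit (high tree-class agreement, low tree-tree correlation) and grow the sub-ensemble greedily. The merit formula, the use of prediction agreement as the "correlation", and the validation-split protocol are all illustrative assumptions, not the authors' method.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def cfs_merit(subset, tree_preds, y, corr):
    """CFS-style merit: rewards high mean tree-class agreement and
    penalizes high mean tree-tree correlation within the subset."""
    k = len(subset)
    r_cf = np.mean([np.mean(tree_preds[i] == y) for i in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([corr[i, j] for i in subset for j in subset if i < j])
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
preds = np.array([t.predict(X_val) for t in forest.estimators_])
# Pairwise prediction agreement between trees, standing in for correlation
corr = np.mean(preds[:, None, :] == preds[None, :, :], axis=2)

# Greedy forward selection maximizing the merit; stop when it declines
selected, remaining = [], list(range(len(preds)))
while remaining:
    best_merit, best_i = max(
        (cfs_merit(selected + [i], preds, y_val, corr), i) for i in remaining)
    if selected and best_merit <= cfs_merit(selected, preds, y_val, corr):
        break
    selected.append(best_i)
    remaining.remove(best_i)
print(f"kept {len(selected)} of {len(preds)} trees")
```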

