A Tabu Search Based Algorithm for Clustering Categorical Data Sets

Author(s): Joyce C. Wong, Michael K. Ng

2002, Vol 35 (12), pp. 2783-2790
Author(s): Michael K. Ng, Joyce C. Wong


2018, Vol 11 (2), pp. 53-67
Author(s): Ajay Kumar, Shishir Kumar

Several initial center selection algorithms have been proposed in the literature for numerical data, but because the values of categorical data are unordered, these methods are not applicable to categorical data sets. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of each unique value of an attribute using its support and then aggregates these weights along each row to obtain the support of every row. The data object with the largest support is chosen as the first initial center, and the other centers are the objects at the greatest distance from this initially selected center. The quality of the proposed algorithm is compared with that of the random initial center selection method, Cao's method, Wu's method and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
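The support-based selection described above can be sketched in Python. This is a minimal sketch under our own assumptions, not the authors' implementation: the support of a value is taken to be its relative frequency within its attribute, a row's support is the plain sum of its values' supports, distance is the simple matching (Hamming) dissimilarity, and "greatest distance from the initially selected center" is read literally as the k − 1 objects farthest from the first center, skipping exact duplicates. The helpers `row_supports` and `support_based_centers` are hypothetical names.

```python
from collections import Counter
from typing import List, Sequence


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def row_supports(data: List[Sequence[str]]) -> List[float]:
    """Assumed definition: row support = sum of the relative frequencies of its values."""
    n, m = len(data), len(data[0])
    # Frequency of every distinct value, counted separately for each attribute.
    counts = [Counter(row[j] for row in data) for j in range(m)]
    return [sum(counts[j][row[j]] / n for j in range(m)) for row in data]


def support_based_centers(data: List[Sequence[str]], k: int) -> List[int]:
    """Hypothetical helper returning the indices of k initial centers."""
    supports = row_supports(data)
    # First center: the object with the largest support (the first one wins ties).
    first = max(range(len(data)), key=lambda i: supports[i])
    centers = [first]
    # Remaining objects ordered by distance from the first center, farthest first;
    # sorted() is stable, so equally distant objects keep their original order.
    ranked = sorted(
        (i for i in range(len(data)) if i != first),
        key=lambda i: hamming(data[i], data[first]),
        reverse=True,
    )
    for i in ranked:
        if len(centers) == k:
            break
        # Skip objects identical to a center that has already been chosen.
        if all(tuple(data[i]) != tuple(data[c]) for c in centers):
            centers.append(i)
    return centers


data = [
    ("a", "x", "p"),
    ("a", "x", "q"),
    ("a", "y", "p"),
    ("b", "y", "q"),
    ("c", "z", "r"),
]
print(support_based_centers(data, 3))  # [0, 3, 4]
```

The paper may instead pick each further center by a max-min rule (farthest from all centers chosen so far); swapping the single `ranked` ordering for that rule is a small change.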



2021, Vol 8 (10), pp. 43-50
Author(s): Truong et al.

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome this shortcoming, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets from the UCI repository show that IMMR outperforms MMR in clustering categorical data.
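IMMR builds on MMR, whose choice of splitting attribute comes from rough set theory. The sketch below illustrates the MMR criterion as it is usually stated: the roughness of a set is the size of its lower approximation divided by the size of its upper approximation, this is averaged over the values of one attribute with respect to another attribute, minimized over the other attributes to give MR, and the attribute with the smallest MR is split. It does not implement the IMMR modification, which the abstract does not describe. Ties are simply broken by the lowest attribute index, which may differ from the original MMR tie handling, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence, Set


def partition(data: List[Sequence[str]], attr: int) -> List[Set[int]]:
    """Equivalence classes (sets of object indices) induced by one attribute."""
    groups = defaultdict(set)
    for i, row in enumerate(data):
        groups[row[attr]].add(i)
    return list(groups.values())


def roughness(target: Set[int], classes: List[Set[int]]) -> float:
    """|lower approximation| / |upper approximation| of target w.r.t. classes."""
    lower = sum(len(c) for c in classes if c <= target)  # classes inside target
    upper = sum(len(c) for c in classes if c & target)   # classes touching target
    return lower / upper


def mean_roughness(data: List[Sequence[str]], ai: int, aj: int) -> float:
    """Average roughness of the value classes of attribute ai w.r.t. attribute aj."""
    targets = partition(data, ai)  # one set X(a_i = alpha) per value alpha
    classes = partition(data, aj)
    return sum(roughness(x, classes) for x in targets) / len(targets)


def min_roughness(data: List[Sequence[str]], ai: int) -> float:
    """MR(a_i): smallest mean roughness of ai over all other attributes."""
    m = len(data[0])
    return min(mean_roughness(data, ai, aj) for aj in range(m) if aj != ai)


def mmr_split_attribute(data: List[Sequence[str]]) -> int:
    """Index of the attribute with minimum MR (lowest index wins ties)."""
    return min(range(len(data[0])), key=lambda a: min_roughness(data, a))


data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "q"), ("b", "z", "q")]
print(mmr_split_attribute(data))  # 1
```

In the full top-down procedure, the chosen attribute's values split the current node, and the process repeats on a selected leaf until the desired number of clusters is reached.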



2016, Vol 78 (6-13)
Author(s): Azlin Ahmad, Rubiyah Yusof

The Kohonen Self-Organizing Map (KSOM) is an unsupervised neural network learning algorithm. It is used to solve problems in various areas, especially clustering complex data sets. Despite its advantages, the KSOM algorithm has a few drawbacks, such as overlapping clusters and non-linearly separable problems. Therefore, this paper proposes a modified KSOM inspired by the pheromone approach in Ant Colony Optimization. The modification focuses on the distance calculation between objects. The proposed algorithm has been tested on four real data sets obtained from the UCI Machine Learning Repository: Iris, Seeds, Glass and the Wisconsin Breast Cancer Database. The results show that the modified KSOM produces accurate clustering results and that all clusters can be clearly identified.
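For reference, the standard (unmodified) KSOM update that the proposed variant builds on can be sketched as follows: find the best-matching unit by Euclidean distance, then pull every unit toward the input, weighted by a Gaussian neighborhood on the map that shrinks over time. This is a minimal one-dimensional map with assumed schedules (learning rate and neighborhood width both decay linearly). It does not include the pheromone-inspired distance calculation, whose details are not given here, and `train_ksom` and `best_matching_unit` are hypothetical names.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def best_matching_unit(x: Vector, weights: List[List[float]]) -> int:
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda u: sum((a - w) ** 2 for a, w in zip(x, weights[u])),
    )


def train_ksom(data: List[Vector], n_units: int, epochs: int = 50,
               lr0: float = 0.5, seed: int = 0) -> List[List[float]]:
    """Train a 1-D Kohonen map whose units sit on a line 0..n_units-1."""
    rng = random.Random(seed)
    # Initialize each unit's weights from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    sigma0 = max(n_units / 2, 1.0)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1 - frac)                   # learning rate decays toward 0
            sigma = max(sigma0 * (1 - frac), 1e-3)  # neighborhood width shrinks
            bmu = best_matching_unit(x, weights)
            for u in range(n_units):
                # Gaussian neighborhood on the map grid, centered on the BMU.
                h = math.exp(-((u - bmu) ** 2) / (2 * sigma ** 2))
                weights[u] = [w + lr * h * (a - w) for w, a in zip(weights[u], x)]
            step += 1
    return weights


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
som = train_ksom(points, n_units=2)
labels = [best_matching_unit(p, som) for p in points]
```

After training, each object is assigned to the cluster of its best-matching unit; the modification described above would change how that distance is computed.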



2019, Vol 22 (09), pp. 1533-1544
Author(s): Andrew van Horn, Charles A Weitz, Kathryn M Olszowy, Kelsey N Dancause, Cheng Sun, ...

Objective: The present study evaluates the use of multiple correspondence analysis (MCA), a type of exploratory factor analysis designed to reduce the dimensionality of large categorical data sets, in identifying behaviours associated with measures of overweight/obesity in Vanuatu, a rapidly modernizing Pacific Island country.

Design: Starting with seventy-three true/false questions about a variety of behaviours, MCA identified the twelve most significantly associated with modernization status and transformed participants' aggregate binary responses to these twelve questions into a linear scale. Using this scale, individuals were separated into three modernization groups (tertiles), among which measures of body fat were compared and odds ratios (OR) for overweight/obesity were computed.

Setting: Vanuatu.

Participants: Ni-Vanuatu adults (n = 810) aged 20–85 years.

Results: Among individuals in the tertile characterized by positive responses to most or all of the twelve modernization questions, weight, measures of body fat, and the likelihood that measures of body fat were above the US 75th percentile were significantly greater than among individuals in the tertiles characterized by mostly or partly negative responses.

Conclusions: The study indicates that MCA can be used to identify individuals or groups at risk of overweight/obesity based on answers to simply put questions. MCA may therefore be useful in areas where obtaining detailed information about modernization status is constrained by time, money or manpower.
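The grouping step in the Design section (splitting a one-dimensional score into tertiles and comparing the highest with the lowest) can be illustrated with a crude odds ratio from a 2 × 2 table. This sketch assumes the MCA-derived scale scores are already available and breaks ties in scores by input order; it does not perform MCA itself, and the study may have reported adjusted (for example, regression-based) odds ratios rather than this unadjusted one. The function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def tertiles(scores: Sequence[float]) -> List[int]:
    """Assign each individual to tertile 0 (lowest), 1 or 2 (highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # stable: ties keep input order
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = 3 * rank // n  # 0, 1 or 2, with roughly equal group sizes
    return groups


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Crude OR for a 2x2 table: (a/b) / (c/d) = ad / bc.

    a, b: cases and non-cases in the exposed group;
    c, d: cases and non-cases in the reference group.
    """
    return (a * d) / (b * c)


def top_vs_bottom_or(scores: Sequence[float], outcome: Sequence[bool]) -> float:
    """OR of the outcome in the highest tertile relative to the lowest tertile."""
    groups = tertiles(scores)

    def counts(g: int) -> Tuple[int, int]:
        cases = sum(1 for grp, y in zip(groups, outcome) if grp == g and y)
        return cases, groups.count(g) - cases

    a, b = counts(2)
    c, d = counts(0)
    return odds_ratio(a, b, c, d)


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
overweight = [True, False, False, True, False, True, True, True, False]
print(top_vs_bottom_or(scores, overweight))  # 4.0
```

Here the highest tertile has 2 cases and 1 non-case and the lowest has 1 case and 2 non-cases, so OR = (2 × 2) / (1 × 1) = 4.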



Author(s): Hongwei Du, Qiang Ye, Zhipeng Sun, Chuang Liu, Wen Xu


2011, Vol 181-182, pp. 760-764
Author(s): Yun Yao Li, Chang Shi Liu

This paper considers the vehicle routing problem with delivery and pickup service. A tabu search algorithm is proposed to search for an optimal set of routes that fully satisfies both delivery and pickup demand. Its performance is compared with that of other recently published heuristics on benchmark data sets. The computational results show that the proposed approach produces high-quality results within a reasonable computing time.
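A generic tabu search for a single vehicle route can be sketched as follows; it is not the authors' algorithm. Several details are our assumptions: pickups and deliveries happen simultaneously at each customer (so the vehicle leaves the depot carrying all deliveries and its load must never exceed capacity), the neighborhood is pairwise swaps of customers within one route, the tabu list holds recently swapped customer pairs for a fixed tenure, and a tabu move is accepted only if it beats the best cost found so far (aspiration). The starting route is assumed to be feasible, and a full multi-vehicle method would also need moves between routes.

```python
import math
from collections import deque
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def route_length(route: Sequence[int], coords: Dict[int, Point]) -> float:
    """Euclidean length of depot (node 0) -> customers in order -> depot."""
    path = [0, *route, 0]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def load_feasible(route: Sequence[int], delivery: Dict[int, int],
                  pickup: Dict[int, int], capacity: int) -> bool:
    """Assumed simultaneous delivery/pickup: load must never exceed capacity."""
    load = sum(delivery[c] for c in route)  # all deliveries loaded at the depot
    if load > capacity:
        return False
    for c in route:
        load += pickup[c] - delivery[c]
        if load > capacity:
            return False
    return True


def tabu_search(route, coords, delivery, pickup, capacity, iters=200, tenure=7):
    current = list(route)
    best, best_cost = list(current), route_length(current, coords)
    tabu = deque(maxlen=tenure)  # recently swapped customer pairs
    for _ in range(iters):
        chosen, chosen_cost, chosen_move = None, math.inf, None
        for i in range(len(current)):
            for j in range(i + 1, len(current)):
                neighbor = current[:]
                neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
                if not load_feasible(neighbor, delivery, pickup, capacity):
                    continue
                cost = route_length(neighbor, coords)
                move = frozenset((current[i], current[j]))
                # Aspiration: a tabu move is allowed only if it beats the best cost.
                if move in tabu and cost >= best_cost:
                    continue
                if cost < chosen_cost:
                    chosen, chosen_cost, chosen_move = neighbor, cost, move
        if chosen is None:  # every move is tabu or infeasible
            break
        # Take the best admissible neighbor even if it is worse (escapes local optima).
        current = chosen
        tabu.append(chosen_move)
        if chosen_cost < best_cost:
            best, best_cost = list(chosen), chosen_cost
    return best, best_cost


coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
demand = {1: 1, 2: 1, 3: 1}
best, cost = tabu_search([3, 1, 2], coords, demand, demand, capacity=10)
```

Accepting the best non-tabu neighbor even when it is worse than the current route is what distinguishes tabu search from plain local descent; the tabu list then stops the search from immediately undoing that move.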



2017, Vol 2017, pp. 1-7
Author(s): Hongfang Zhou, Yihui Zhang, Yibin Liu

The k-modes clustering algorithm has been widely used to cluster categorical data. In this paper, we first analyze the k-modes algorithm and its dissimilarity measure. Based on this analysis, we propose a novel dissimilarity measure named GRD. GRD considers not only the relationships between an object and all cluster modes but also the differences between attributes. Finally, experiments on four real data sets from UCI show that GRD achieves better performance than two existing dissimilarity measures used in the k-modes and Cao's algorithms.
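GRD itself is not defined in this abstract, so the sketch below shows only the baseline it is compared against: the standard k-modes loop with the simple matching dissimilarity (the number of mismatched attributes) and a mode update that takes the most frequent value of each attribute. Initial modes are passed in explicitly, and keeping the old mode for an empty cluster is our assumption. A different measure such as GRD would replace `matching_dissimilarity` in the assignment step, possibly with access to all modes.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Mode = Tuple[str, ...]


def matching_dissimilarity(x: Sequence[str], mode: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(x, mode))


def assign(data: List[Sequence[str]], modes: List[Mode]) -> List[int]:
    """Assign each object to its nearest mode (lowest index wins ties)."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(x, modes[k]))
        for x in data
    ]


def update_modes(data: List[Sequence[str]], labels: List[int],
                 modes: List[Mode]) -> List[Mode]:
    """New mode of each cluster: the most frequent value of every attribute."""
    new_modes = []
    for k, old in enumerate(modes):
        members = [x for x, lab in zip(data, labels) if lab == k]
        if not members:
            new_modes.append(old)  # assumption: an empty cluster keeps its old mode
            continue
        new_modes.append(tuple(
            Counter(x[j] for x in members).most_common(1)[0][0]
            for j in range(len(members[0]))
        ))
    return new_modes


def k_modes(data: List[Sequence[str]], initial_modes: List[Sequence[str]],
            max_iter: int = 100) -> Tuple[List[int], List[Mode]]:
    modes = [tuple(m) for m in initial_modes]
    labels = assign(data, modes)
    for _ in range(max_iter):
        modes = update_modes(data, labels, modes)
        new_labels = assign(data, modes)
        if new_labels == labels:  # converged: no object changed cluster
            break
        labels = new_labels
    return labels, modes


data = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "z"), ("b", "z"), ("c", "z")]
labels, modes = k_modes(data, [data[0], data[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```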


