A Complementary Optimization Procedure for Final Cluster Analysis of Clustering Categorical Data

Author(s):  
Ali Seman ◽  
Azizian Mohd Sapawi
Author(s):  
Maria M. Suarez-Alvarez ◽  
Duc-Truong Pham ◽  
Mikhail Y. Prostov ◽  
Yuriy I. Prostov

Normalization of feature vectors of datasets is widely used in a number of fields of data mining, in particular in cluster analysis, where it is used to prevent features with large numerical values from dominating in distance-based objective functions. In this study, a unified statistical approach to normalization of all attributes of mixed databases, when different metrics are used for numerical and categorical data, is proposed. After the proposed normalization, the contributions of both numerical and categorical attributes to a specified objective function are statistically the same. Formulae for the statistically normalized Minkowski mixed p-metrics are given in an explicit way. It is shown that the classic z-score standardization and the min–max normalization are particular cases of the statistical normalization, when the objective function is, respectively, based on the Euclidean or the Tchebycheff (Chebyshev) metrics. Finally, clustering of several benchmark datasets is performed with non-normalized and introduced normalized mixed metrics using either the k-prototypes (for p = 2) or another algorithm (for p ≠ 2).
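As a rough illustration of the two classic normalizations named in the abstract, the following Python sketch applies z-score standardization and min–max normalization to one numerical feature. It does not reproduce the paper's unified statistical normalization for mixed data; the function names and sample values are assumptions chosen for illustration only.

```python
from statistics import mean, pstdev


def z_score(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical features on very different scales.
incomes = [20000.0, 40000.0, 60000.0]
ages = [20.0, 40.0, 60.0]

incomes_z = z_score(incomes)
ages_z = z_score(ages)
incomes_mm = min_max(incomes)
```

After either transformation the two features occupy the same numerical range, so neither dominates a distance-based objective function simply because of its units.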


2021 ◽  
Author(s):  
Marija Eric

The purpose of this thesis is to develop a methodology for hydrologically modelling the performance of Low Impact Development technologies using an Urban Hydrological Response Unit approach. The K-Means Cluster Analysis procedure was carried out to create clusters of lot parcels, which represented the Urban Hydrological Response Units. Different sampling methods were used to select lots from each of the clusters to model before and after Low Impact Development implementation. The runoff response (m³) of an approximate final cluster centre was used to calculate the total runoff (m³) of each cluster. After adding the total runoff (m³) for a group of 15 clusters, the benchmark runoff value (m³) from modelling all lots was closely approached with and without Low Impact Development. For a group of three clusters, random samples of 7% and 90% of the lots from each cluster closely approached the benchmark runoff value (m³) without and with Low Impact Development, respectively.
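The abstract does not state how a cluster centre's runoff response is scaled up to a cluster total. One plausible reading, given here only as an assumption, is that the runoff of the representative lot is multiplied by the number of lots in the cluster and the cluster totals are then summed. The field names and all numbers below are hypothetical.

```python
def total_runoff_m3(clusters):
    """Sum the per-cluster totals, assuming each cluster total equals the
    representative lot's runoff times the number of lots in that cluster."""
    return sum(c["lots"] * c["representative_runoff_m3"] for c in clusters)


# Hypothetical clusters: lot counts and runoff of the lot nearest the centre.
clusters = [
    {"lots": 120, "representative_runoff_m3": 2.5},
    {"lots": 80, "representative_runoff_m3": 4.0},
]

estimate = total_runoff_m3(clusters)
```

Comparing such an estimate with the value obtained by modelling every lot is what the abstract describes as approaching the benchmark runoff value.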


2012 ◽  
Vol 18 (2) ◽  
pp. 98-111 ◽  
Author(s):  
Shelly Campo ◽  
Natoshia M. Askelson ◽  
Knute D. Carter ◽  
Mary Losch

Half of all pregnancies in young adult women are unintended, but few interventions have been successful in encouraging contraceptive use. Group heterogeneity likely contributes to the lack of success. Segmenting based on theories that provide meaningful information may improve tailoring and targeting of behavioral interventions. Previous research has indicated that threat, efficacy, and fear were important factors in influencing intentions to use contraceptives; therefore, the extended parallel process model (EPPM) was used for this cluster analysis. A telephone survey of randomly selected 18- to 30-year-old women in Iowa was conducted (N = 401). The constructs of EPPM and age were used for conducting a K-means cluster analysis with four clusters. The cluster analysis pointed to the importance of fear, perceived susceptibility, and age. All of the clusters had varying degrees of ambivalence about the severity of a pregnancy. Cluster 1 (27.8%) had high susceptibility, with little fear. Cluster 2 (23.8%) had high efficacy and higher fear. The third cluster (34.7%) was not fearful and had low susceptibility. The final cluster (13.8%) was younger than the other groups and had the lowest efficacy. Additional analyses were conducted to explore how the clusters varied on other variables. The clusters help campaign developers prioritize audiences and tailor messages.


Genetika ◽  
2016 ◽  
Vol 48 (1) ◽  
pp. 219-232 ◽  
Author(s):  
Srbislav Dencic ◽  
Ron Depauw ◽  
Vojislava Momcilovic ◽  
Vladimir Acin

The objective of this study was to compare fourteen different similarity coefficients and their influence on the clustering of sister line wheat cultivars. Seventeen sister cultivars developed from two crosses were used and fingerprinted with 19 wheat microsatellite markers. Comparisons among the similarity coefficients were made using Spearman correlation analysis, dendrogram evaluation (visual inspection and consensus fork index, CIc), projection efficiency in a two-dimensional space, and groups formed by the Tocher optimization procedure. The Spearman correlation coefficients among the fourteen similarity coefficients were all high, showing a strong association between them. The correlation coefficient between Dice and Kulczinski and Ochiai I, as well as between Hamann and Simple matching and between Kulczinski and Ochiai I, was equal to 1. Although visual inspection of the dendrograms shows almost identical clustering structures, the CIc indices indicate that the coefficients are not all identical.
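To make the comparison concrete, the sketch below computes a few of the named coefficients for two binary marker profiles (1 = allele present, 0 = absent), using their commonly cited definitions. The study's exact formulas and data are not given in the abstract, so the definitions, function names, and sample profiles here are assumptions for illustration.

```python
from math import sqrt


def contingency(x, y):
    """Count a (1,1), b (1,0), c (0,1) and d (0,0) matches between profiles."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def dice(x, y):
    a, b, c, _ = contingency(x, y)
    return 2 * a / (2 * a + b + c)


def simple_matching(x, y):
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)


def ochiai(x, y):
    a, b, c, _ = contingency(x, y)
    return a / sqrt((a + b) * (a + c))


# Hypothetical profiles; each must contain at least one 1 so that the
# denominators above are non-zero.
line_1 = [1, 1, 0, 1, 0, 0]
line_2 = [1, 0, 0, 1, 1, 0]

scores = {
    "dice": dice(line_1, line_2),
    "simple_matching": simple_matching(line_1, line_2),
    "ochiai": ochiai(line_1, line_2),
}
```

Coefficients that ignore the shared-absence count d (Dice, Ochiai) and those that include it (Simple matching) can rank pairs of cultivars differently, which is why the resulting clusterings need not coincide.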

