Universality of Logarithmic Loss in Fixed-Length Lossy Compression

We establish the universality of logarithmic loss over a finite alphabet as a distortion criterion in fixed-length lossy compression. For any fixed-length lossy-compression problem under an arbitrary distortion criterion, we show that there is an equivalent lossy-compression problem under logarithmic loss. The equivalence holds in a strong sense: finding good schemes for the corresponding problem under logarithmic loss is essentially equivalent to finding good schemes for the original problem. This equivalence also induces an algebraic structure on the reconstruction alphabet, which allows known techniques from the clustering literature to be applied. Furthermore, our result naturally suggests a new algorithm for the categorical data-clustering problem.
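Under logarithmic loss the reconstruction is a probability distribution q over the source alphabet, and the distortion of symbol x is log(1/q(x)). The single-letter sketch below is only an illustration of why this loss fits naturally with clustering, not the paper's construction: by Gibbs' inequality, the distribution that minimizes the average log loss over a group of symbols is the group's empirical distribution, so it acts as a "centroid". The helper names are hypothetical, and the sketch works per symbol rather than on the fixed-length blocks the abstract discusses.

```python
import math
from collections import Counter


def log_loss(x, q):
    """Logarithmic loss d(x, q) = log2(1 / q[x]) in bits; q maps symbols to probabilities."""
    p = q.get(x, 0.0)
    return math.inf if p == 0.0 else -math.log2(p)


def best_reconstruction(symbols):
    """Distribution minimizing the average log loss over a group of symbols.

    By Gibbs' inequality this is the empirical distribution of the group,
    which plays the role of a cluster centroid under log loss.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return {s: c / n for s, c in counts.items()}


def average_log_loss(symbols, q):
    """Mean log loss of reconstructing every symbol in the group by q."""
    return sum(log_loss(x, q) for x in symbols) / len(symbols)


group = ["a", "a", "b", "c"]
q_star = best_reconstruction(group)     # {'a': 0.5, 'b': 0.25, 'c': 0.25}
print(average_log_loss(group, q_star))  # 1.5 (bits): the empirical entropy of the group
```

Any other distribution, for example the uniform one over {a, b, c}, gives a strictly larger average loss (log2 3 ≈ 1.585 bits here).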


Improving categorical data clustering algorithm by weighting uncommon attribute value matches

Computer Science and Information Systems, Vol 3 (1), pp. 23-32, 2006
DOI: 10.2298/csis0601023h
Cited By ~ 2

Author(s): Zengyou He, Xiaofei Xu, Shenchun Deng

Keyword(s): Categorical Data, Data Clustering, Clustering Algorithm, Real Life, Experimental Results, Categorical Data Clustering, Modified Algorithm, Attribute Value

This paper presents an improved Squeezer algorithm for categorical data clustering that gives greater weight to matches on uncommon attribute values when computing similarity. Experimental results on real-life datasets show that the modified algorithm is superior to the original Squeezer algorithm and to other clustering algorithms with respect to clustering accuracy.
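A minimal sketch of a Squeezer-style single-pass clustering loop helps show where such a weight enters. It assumes the usual Squeezer similarity, sim(C, r) = sum over attributes i of count_C(i, r[i]) / |C|, with a record joining its most similar cluster when that similarity reaches a threshold and otherwise starting a new cluster. The weighting used here, inverse value frequency n / freq(i, v), is a hypothetical stand-in chosen for illustration; the abstract does not give the paper's actual weighting scheme, and all function names are made up.

```python
from collections import Counter, defaultdict


class Cluster:
    def __init__(self, n_attrs):
        self.size = 0
        self.members = []
        # counts[i][v]: number of members whose attribute i has value v
        self.counts = [Counter() for _ in range(n_attrs)]

    def add(self, idx, record):
        self.size += 1
        self.members.append(idx)
        for i, v in enumerate(record):
            self.counts[i][v] += 1


def squeezer(records, threshold, weight=None):
    """Single-pass Squeezer-style clustering of categorical records.

    sim(C, r) = sum_i w(i, r[i]) * count_C(i, r[i]) / |C|; w is 1 when weight is None.
    """
    clusters = []
    for idx, rec in enumerate(records):
        best, best_sim = None, -1.0
        for c in clusters:
            s = sum((weight(i, v) if weight else 1.0) * c.counts[i][v] / c.size
                    for i, v in enumerate(rec))
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            best.add(idx, rec)
        else:  # no cluster is similar enough: start a new one
            c = Cluster(len(rec))
            c.add(idx, rec)
            clusters.append(c)
    return [c.members for c in clusters]


def inverse_frequency_weight(records):
    """Hypothetical weight n / freq(i, v): rarer attribute values weigh more.

    Only valid for values that occur in `records`.
    """
    n = len(records)
    freq = defaultdict(Counter)
    for rec in records:
        for i, v in enumerate(rec):
            freq[i][v] += 1
    return lambda i, v: n / freq[i][v]


records = [("a", "rare"), ("b", "rare"), ("a", "c"), ("a", "c"), ("a", "c")]
print(squeezer(records, 1.2))                                     # [[0], [1], [2, 3, 4]]
print(squeezer(records, 1.2, inverse_frequency_weight(records)))  # [[0, 1], [2, 3, 4]]
```

In this toy run, records 0 and 1 share only the uncommon value "rare". Unweighted, that single match does not reach the threshold. With the hypothetical weight, it counts for more and the two records are grouped together.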


A categorical data clustering algorithm and its efficient parallel implementation

2016 5th International Conference on Computer Science and Network Technology (ICCSNT), 2016
DOI: 10.1109/iccsnt.2016.8070153

Author(s): Xiangwu Ding, Jia Tan, Mei Wang

Keyword(s): Categorical Data, Data Clustering, Clustering Algorithm, Parallel Implementation, Categorical Data Clustering


Generalized similarity measure for categorical data clustering

2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2016
DOI: 10.1109/icacci.2016.7732138
Cited By ~ 1

Author(s): Shruti Sharma, Manoj Singh

Keyword(s): Similarity Measure, Categorical Data, Data Clustering, Categorical Data Clustering


A Novel Cosine Similarity Like Data Clustering Method for Effective Data Classification in Data Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 9 (8), pp. 340-346, 2020
DOI: 10.35940/ijitee.h6417.069820

Keyword(s): Data Mining, Similarity Measure, Categorical Data, Data Clustering, Similarity Measures, Numerical Data, Data Classification, Fundamental Goal, Learning Technique, Categorical Data Clustering

In data mining, many techniques use distance-based measures for data clustering, and improving clustering performance is the fundamental goal of clustering-related tasks. Many techniques are available for clustering both numerical and categorical data. Clustering is an unsupervised learning technique in which objects are grouped according to their similarity to one another. This paper proposes a new cluster similarity measure, the cosine-like cluster similarity measure (CLCSM), and uses it for data classification. Extensive experiments on UCI machine learning datasets show that the proposed CLCSM is superior to many existing cluster similarity measures for data classification.
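The abstract does not define CLCSM, so the sketch below shows only the ordinary cosine similarity it is named after. It is applied to categorical records by one-hot encoding each (attribute, value) pair and assigning a record to the cluster whose centroid is most cosine-similar. This is a plain nearest-centroid classifier under that assumption, not the paper's measure. The function names and the toy data are hypothetical.

```python
import math


def one_hot(record, vocab):
    """Encode a categorical record as a 0/1 vector over (attribute, value) pairs in vocab."""
    present = set(enumerate(record))
    return [1.0 if pair in present else 0.0 for pair in vocab]


def cosine(u, w):
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, w))
    nu = math.sqrt(sum(a * a for a in u))
    nw = math.sqrt(sum(b * b for b in w))
    return dot / (nu * nw) if nu and nw else 0.0


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def classify(record, clusters, vocab):
    """Assign record to the cluster label whose centroid has the highest cosine similarity."""
    v = one_hot(record, vocab)
    scores = {label: cosine(v, centroid([one_hot(r, vocab) for r in members]))
              for label, members in clusters.items()}
    return max(scores, key=scores.get)


clusters = {
    "A": [("red", "small"), ("red", "large")],
    "B": [("blue", "small"), ("blue", "small")],
}
all_records = [r for members in clusters.values() for r in members]
vocab = sorted({(i, v) for r in all_records for i, v in enumerate(r)})
print(classify(("red", "small"), clusters, vocab))  # A
```

For two one-hot encoded records with m attributes each, the cosine reduces to (number of matching attributes) / m. This is why cosine-style measures are a natural fit for categorical data.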
