In privacy-preserving data publishing, anonymization methods (such as
generalization, anatomy, slicing, and randomization) weaken the correlation
between the sensitive attribute (SA) and the non-sensitive attributes (NSAs).
To reduce this correlation loss, records with the same NSA values should be
placed in the same block when meeting the requirements of ℓ-diversity. However,
the initial partition often contains many blocks that hold more than ℓ records
with different SA values, and in which the frequencies of the different SA
values are uneven. Anonymizing the initial partition directly therefore
causes additional correlation loss. To reduce the correlation loss as far
as possible, we first propose an optimization model. Based on this model, we
generate a refining partition of the initial partition and apply
anonymization to the refining partition. Although anonymization on the
refining partition can be used on top of
any existing partitioning method to reduce the correlation loss, we
demonstrate that a new partitioning method tailored to the refining partition
can further improve data utility. An experimental evaluation shows that our
approach reduces correlation loss efficiently.
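
As a rough illustration of the starting point described above, the following
Python sketch builds an initial partition by grouping records with identical
NSA values and reports, per block, the number of distinct SA values and their
frequencies. It is not the optimization model or the refining procedure of
this paper. The function names (initial_partition, sa_profile), the attribute
names, and the toy records are all hypothetical, and counting distinct SA
values is only one simple reading of ℓ-diversity, assumed here for brevity.

```python
from collections import Counter, defaultdict


def initial_partition(records, nsa_keys, sa_key):
    """Group SA values by the tuple of NSA values (hypothetical helper)."""
    blocks = defaultdict(list)
    for rec in records:
        key = tuple(rec[k] for k in nsa_keys)
        blocks[key].append(rec[sa_key])
    return dict(blocks)


def sa_profile(sa_values):
    """Return (number of distinct SA values, frequency of each SA value)."""
    counts = Counter(sa_values)
    return len(counts), dict(counts)


# Toy records, invented purely for illustration.
records = [
    {"age": 30, "zip": "100", "disease": "flu"},
    {"age": 30, "zip": "100", "disease": "flu"},
    {"age": 30, "zip": "100", "disease": "flu"},
    {"age": 30, "zip": "100", "disease": "cancer"},
    {"age": 30, "zip": "100", "disease": "asthma"},
    {"age": 45, "zip": "200", "disease": "flu"},
]

blocks = initial_partition(records, ["age", "zip"], "disease")
for key, sa_values in blocks.items():
    distinct, freqs = sa_profile(sa_values)
    print(key, distinct, freqs)
```

With ℓ = 2, the first block in this toy data holds three distinct SA values
with uneven frequencies, which is the kind of block the abstract identifies as
a candidate for further refinement.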