An Efficient and Dynamic Concept Hierarchy Generation for Data Anonymization

Author(s):  
Sri Krishna Adusumalli ◽  
V. Valli Kumari
2014 ◽  
Vol 13 (04) ◽  
pp. 1450035 ◽  
Author(s):  
Valli Kumari Vatsavayi ◽  
Sri Krishna Adusumalli

The explosive growth of information on the Internet has raised threats to individual privacy. k-Anonymity and l-diversity are two well-known techniques proposed to address these threats. Both use concept hierarchy tree (CHT)-based generalization/suppression. For a given attribute, several CHTs can be constructed, and an appropriate CHT must be chosen for attribute anonymization to be effective. This paper discusses an on-the-fly approach for constructing a CHT that can be used for generalization/suppression. Furthermore, to improve anonymization, the CHT can be dynamically adjusted for a given k value. The proposed approach is evaluated with the k-member clustering anonymization and Mondrian multi-dimensional algorithms and compared against the following known methods: (1) improved on-the-fly hierarchy (IOTF) (Campan et al., 2011), (2) on-the-fly hierarchy (OTF) (Campan and Cooper, 2010), (3) hierarchy free (HF) (LeFevre et al., 2006), (4) predefined hierarchy (PH) (Iyengar, 2002), (5) CHU (Chu and Chiang, 1994) and (6) HAN (Han and Fu, 1994). The metrics used for evaluation are (a) information loss, (b) the discernibility metric and (c) the normalized average equivalence class size metric. Experimental results indicate that our approach is more effective and flexible: when applied with the Mondrian multi-dimensional algorithm, its utility is 12% better than IOTF, 16% better than OTF and CHU, 17% better than PH and 21% better than HAN. With the k-member clustering technique, our approach improved utility by 1% over IOTF, 2% over OTF, 3% over CHU, 5% over PH and 14% over HAN.
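
As an illustration of two of the metrics named in this abstract, the following minimal Python sketch computes the discernibility metric and the normalized average equivalence class size for a table whose quasi-identifiers are already generalized. It assumes the commonly used definitions (the sum of squared equivalence class sizes, and the average class size divided by k) and ignores suppressed records; the abstract does not give the exact formulas the authors used. The function names and the sample records are hypothetical.

```python
from collections import Counter


def equivalence_classes(records, quasi_identifiers):
    """Count records per combination of (already generalized) quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """A table is k-anonymous if every equivalence class holds at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(size >= k for size in classes.values())


def discernibility_metric(records, quasi_identifiers):
    """Assumed definition: sum of squared equivalence class sizes (no suppression)."""
    classes = equivalence_classes(records, quasi_identifiers)
    return sum(size * size for size in classes.values())


def normalized_avg_class_size(records, quasi_identifiers, k):
    """Assumed definition: (number of records / number of classes) / k."""
    classes = equivalence_classes(records, quasi_identifiers)
    return (len(records) / len(classes)) / k


# Hypothetical sample: age and zip code generalized through a concept hierarchy.
records = [
    {"age": "20-29", "zip": "530**"},
    {"age": "20-29", "zip": "530**"},
    {"age": "20-29", "zip": "530**"},
    {"age": "30-39", "zip": "530**"},
    {"age": "30-39", "zip": "530**"},
    {"age": "30-39", "zip": "530**"},
]
qi = ["age", "zip"]

anonymous = is_k_anonymous(records, qi, k=2)
dm = discernibility_metric(records, qi)
c_avg = normalized_avg_class_size(records, qi, k=2)
```

Under these assumed definitions, lower values of both metrics indicate that less generalization was needed to reach the requested k.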


Author(s):  
Shalin Eliabeth S. ◽  
Sarju S.

Big data privacy preservation is one of the most pressing issues in industry today. Data privacy problems sometimes go unidentified when input data is published in a cloud environment. Data privacy preservation in Hadoop deals with masking the input dataset and publishing it to the distributed environment. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability and execution time. At present, many cloud applications that rely on big data anonymization face the same kinds of problems. To address them, a data anonymization algorithm called Two-Phase Top-Down Specialization (TPTDS) is introduced and implemented in Hadoop. For the anonymization, 45,222 records of adult information with 15 attribute values were taken as the input big data. Using multidimensional anonymization in the MapReduce framework, the proposed TPTDS algorithm is implemented in Hadoop, which increases the efficiency of the big data processing system. Experiments with TPTDS on Hadoop were conducted in both one-dimensional and multidimensional MapReduce frameworks, and multidimensional anonymization gave the better result on the input adult dataset. Data sets are generalized in a top-down manner, and the better result of the multidimensional MapReduce framework is shown by the better IGPL values generated by the algorithm. The anonymization is performed with specialization operations on a taxonomy tree. The experiments show that the solution improves the IGPL values and the anonymity parameter and decreases the execution time of big data privacy preservation compared with the existing algorithm. These experimental results point to useful applications in distributed environments.
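
The abstract does not define IGPL. The sketch below assumes the definition commonly used in the top-down specialization literature: information gain per privacy loss, IGPL = IG / (PL + 1), where IG is the entropy reduction of a class label caused by a specialization and PL is the drop in the anonymity level. The function names and sample labels are hypothetical, and this is a single-machine illustration, not the MapReduce implementation described in the abstract.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = len(parent_labels)
    weighted = sum(len(g) / total * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted


def igpl(info_gain, anonymity_before, anonymity_after):
    """Assumed score: information gain divided by (privacy loss + 1)."""
    privacy_loss = anonymity_before - anonymity_after
    return info_gain / (privacy_loss + 1)


# Hypothetical specialization: one taxonomy node with 8 records is split
# into two child values, each holding 4 records with a single class label.
parent = ["<=50K"] * 4 + [">50K"] * 4
children = [["<=50K"] * 4, [">50K"] * 4]

ig = information_gain(parent, children)
score = igpl(ig, anonymity_before=8, anonymity_after=4)
```

Under this assumption, a top-down specialization step would pick the candidate specialization with the highest score among those that keep the table k-anonymous.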


Author(s):  
Shuting Wang ◽  
Chen Liang ◽  
Zhaohui Wu ◽  
Kyle Williams ◽  
Bart Pursel ◽  
...  

Author(s):  
Tim Joda ◽  
Tuomas Waltimo ◽  
Christiane Pauli-Magnus ◽  
Nicole Probst-Hensch ◽  
Nicola Zitzmann

Population-based linkage of patient-level information opens new strategies for dental research to identify unknown correlations of diseases, prognostic factors and novel treatment concepts, and to evaluate healthcare systems. As clinical trials have become more complex and inefficient, register-based controlled (clinical) trials (RC(C)T) are a promising approach in dental research. RC(C)Ts provide comprehensive information on hard-to-reach populations and allow observations with minimal loss to follow-up, but require large sample sizes while generating a high level of external validity. Collecting data is only valuable if it is done systematically according to harmonized and inter-linkable standards involving a universally accepted general patient consent. Secure data anonymization is crucial, but potential re-identification of individuals poses several challenges. Population-based linkage of big data is a game changer for epidemiological surveys in Public Health and will play a predominant role in future dental research by influencing healthcare services, research, education, biotechnology, insurance, social policy and governmental affairs.

