Two-phase entropy based approach to big data anonymization

Big data privacy preservation is one of the most pressing issues in industry today. Data privacy problems are sometimes never identified when input data is published in a cloud environment. Data privacy preservation in Hadoop deals with hiding sensitive information in the input dataset before publishing it to the distributed environment. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability and execution time. At present, many cloud applications that anonymize big data face the same kind of problems. To address them, this paper introduces a data anonymization algorithm called Two-Phase Top-Down Specialization (TPTDS), implemented in Hadoop. The input big data consists of 45,222 records of adult information with 15 attributes. Using multidimensional anonymization in the MapReduce framework, the proposed TPTDS algorithm was implemented in Hadoop, where it increases the efficiency of the big data processing system. Experiments with TPTDS in both one-dimensional and multidimensional MapReduce frameworks on Hadoop showed better results for multidimensional anonymization on the input adult dataset. The dataset is generalized in a top-down manner, and the multidimensional MapReduce framework produced better IGPL (information gain per privacy loss) values. Anonymization is performed through specialization operations on a taxonomy tree. The experiments show that the solution improves the IGPL values and the anonymity parameter and decreases the execution time of big data privacy preservation compared with the existing algorithm. These results point to broad applications in distributed environments.
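
The abstract refers to IGPL values and to specialization on a taxonomy tree without spelling them out. The following is a minimal Python sketch, not the authors' TPTDS implementation, that scores a single specialization with the IGPL heuristic as it is commonly defined in the top-down specialization literature: IGPL = IG / (PL + 1), where IG is the entropy-based information gain on the class attribute and PL is the drop in anonymity (the size of the smallest quasi-identifier group). The taxonomy (`TAXONOMY`), the toy records, and the helper names are hypothetical, and only one quasi-identifier is modelled.

```python
import math
from collections import Counter, defaultdict

# Hypothetical taxonomy tree for one quasi-identifier (education).
TAXONOMY = {
    "ANY_EDU": ["Secondary", "University"],
    "Secondary": ["HS-grad", "11th"],
    "University": ["Bachelors", "Masters"],
}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def child_containing(leaf, parent):
    """Return the child of `parent` whose subtree contains `leaf`, or None."""
    for child in TAXONOMY.get(parent, []):
        if child == leaf or child_containing(leaf, child) is not None:
            return child
    return None


def igpl(records, parent):
    """Score specializing `parent` into its children.

    Assumes every record is currently generalized to `parent`, so the
    anonymity before specialization is simply len(records).
    """
    labels = [label for _, label in records]
    groups = defaultdict(list)
    for leaf, label in records:
        groups[child_containing(leaf, parent)].append(label)
    n = len(records)
    # Information gain: parent entropy minus size-weighted child entropy.
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # Privacy loss: how much the smallest group shrinks after specializing.
    privacy_loss = n - min(len(g) for g in groups.values())
    return info_gain, privacy_loss, info_gain / (privacy_loss + 1)


# Toy (education, income-class) records, loosely modelled on the Adult data.
records = [
    ("HS-grad", "<=50K"), ("11th", "<=50K"), ("HS-grad", "<=50K"),
    ("Bachelors", ">50K"), ("Masters", ">50K"), ("Bachelors", "<=50K"),
]
ig, pl, score = igpl(records, "ANY_EDU")
print(f"IG={ig:.3f}  PL={pl}  IGPL={score:.3f}")
```

On these toy records the script prints `IG=0.459  PL=3  IGPL=0.115`. A top-down search would compute such a score for every candidate specialization in each round and apply the highest-scoring one that keeps the required anonymity.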

Scalable Two-Phase Top-Down Specification for Big Data Anonymization Using Apache Pig

Advances in Intelligent Systems and Computing - Advances in Artificial Intelligence and Data Engineering ◽

10.1007/978-981-15-3514-7_75 ◽

2020 ◽

pp. 1009-1021

Author(s):

Anushree Raj ◽

Rio D’Souza

Keyword(s):

Big Data ◽

Top Down ◽

Two Phase ◽

Data Anonymization ◽

Apache Pig

Population-Based Linkage of Big Data in Dental Research

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph15112357 ◽

2018 ◽

Vol 15 (11) ◽

pp. 2357 ◽

Cited By ~ 6

Author(s):

Tim Joda ◽

Tuomas Waltimo ◽

Christiane Pauli-Magnus ◽

Nicole Probst-Hensch ◽

Nicola Zitzmann

Keyword(s):

Clinical Trials ◽

Big Data ◽

Healthcare Services ◽

Population Based ◽

Services Research ◽

Research Education ◽

Data Anonymization ◽

Dental Research ◽

Epidemiological Surveys ◽

Patient Consent

Population-based linkage of patient-level information opens new strategies for dental research to identify unknown correlations of diseases, prognostic factors and novel treatment concepts, and to evaluate healthcare systems. As clinical trials have become more complex and inefficient, register-based controlled (clinical) trials (RC(C)T) are a promising approach in dental research. RC(C)Ts provide comprehensive information on hard-to-reach populations and allow observations with minimal loss to follow-up, but they require large sample sizes, while generating a high level of external validity. Collecting data is only valuable if it is done systematically according to harmonized and inter-linkable standards involving a universally accepted general patient consent. Secure data anonymization is crucial, but potential re-identification of individuals poses several challenges. Population-based linkage of big data is a game changer for epidemiological surveys in Public Health and will play a predominant role in future dental research by influencing healthcare services, research, education, biotechnology, insurance, social policy and governmental affairs.

A Survey on Data Anonymization Using Mapreduce on Cloud with Scalable Two-Phase Top-Down Approach

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.20.14773 ◽

2018 ◽

Vol 7 (2.20) ◽

pp. 254

Author(s):

M Dhasaratham ◽

R P. Singh

Keyword(s):

Large Scale ◽

Public Information ◽

Top Down ◽

Two Phase ◽

Data Anonymization ◽

Security Issues ◽

Cloud Applications ◽

Massive Information ◽

Broad Scale

A large number of cloud services require users to share private data, such as electronic health records, for data analysis or mining, which raises privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements, such as k-anonymity, is a widely used category of privacy-preserving techniques. At present, the scale of data in many cloud applications is increasing tremendously in line with the Big Data trend, which makes it a challenge for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. As a result, existing anonymization approaches struggle to achieve privacy preservation on privacy-sensitive large-scale data sets because they lack scalability. In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud. In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to accomplish the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that, with our approach, the scalability and efficiency of TDS can be significantly improved over existing approaches.
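
Both phases of this approach rely on MapReduce jobs that aggregate counts over quasi-identifier groups. As a hedged illustration, not the authors' jobs and not the full two-phase TDS, the plain-Python sketch below simulates one map/combine/reduce pass: each data partition counts its generalized quasi-identifier tuples locally, the partial counts are merged, and the merged counts are checked against k-anonymity. The generalization rule, the (age, sex, ZIP) record layout, and all function names are assumptions; a real deployment would run these steps as Hadoop jobs rather than in-process functions.

```python
from collections import Counter
from functools import reduce


def generalize(record):
    """Hypothetical one-level generalization of (age, sex, zipcode)."""
    age, sex, zipcode = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", sex, zipcode[:3] + "**")


def map_partition(partition):
    """Map + combine: count generalized QI tuples within one partition."""
    return Counter(generalize(r) for r in partition)


def reduce_counts(a, b):
    """Reduce: merge partial counts coming from two partitions."""
    return a + b


def is_k_anonymous(records, k, num_partitions=2):
    """Split records into partitions, count per partition, merge, then check k."""
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    totals = reduce(reduce_counts, (map_partition(p) for p in partitions), Counter())
    return min(totals.values()) >= k, totals


# Toy input records: (age, sex, zipcode).
records = [
    (34, "F", "13053"), (37, "F", "13068"), (31, "F", "13021"),
    (45, "M", "14853"), (48, "M", "14850"), (41, "M", "14801"),
]
ok, totals = is_k_anonymous(records, k=3)
print(ok, dict(totals))
```

Counting in the map step and summing in the reduce step keeps each worker's output proportional to the number of distinct generalized groups rather than the number of records, which is why this kind of aggregation scales well.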

Operations strategy of cloud-based firms: achieving firm growth in the Big Data era

International Journal of Operations & Production Management ◽

10.1108/ijopm-01-2019-0089 ◽

2019 ◽

Vol 40 (6) ◽

pp. 873-896 ◽

Cited By ~ 1

Author(s):

Yongyi Shou ◽

Xinyu Zhao ◽

Lujie Chen

Keyword(s):

Big Data ◽

Content Analysis ◽

Operations Management ◽

Firm Growth ◽

Seemingly Unrelated Regression ◽

Second Phase ◽

Operations Strategy ◽

Two Phase ◽

Content Type ◽

Operations Capabilities

Purpose: Cloud computing is a major enabling technology for Industry 4.0 and the Big Data era. However, cloud-based firms, which establish their businesses on cloud platforms, have received scant attention in the extant operations management (OM) literature. To narrow this gap, the purpose of this paper is to investigate cloud-based firms from an operations strategy perspective.

Design/methodology/approach: A two-phase multi-method approach was adopted. In the first phase, content analysis of 27 reports from cloud-based firms was conducted, aided by text mining keyword extraction. Two data-related operations capabilities were identified, and hypotheses were posited regarding the relationships between data resources (DR), operations capabilities and firm growth (FG). In the second phase, a sample of 190 cloud-based firms was collected. Seemingly unrelated regression and bootstrapping methods were employed to test the proposed hypotheses using the survey data.

Findings: The content analysis indicates data as a key resource, and both data processing capability and data transformational capability as critical operations capabilities of cloud-based firms. FG is regarded as a top priority in the cloud context. The regression results indicate that DR and the two capabilities contribute to the growth of cloud-based firms. Moreover, a follow-up bootstrapping analysis reveals that the mediating effects of the two capabilities vary between different types of FG.

Originality/value: To the authors' best knowledge, this is one of the first OM studies on cloud-based firms. This study extends the operations strategy literature by identifying and testing the key operations capabilities and priorities of cloud-based firms. It also provides insightful implications for industrial practitioners.

Big Data Anonymization Requirements vs Privacy Models

Proceedings of the 15th International Joint Conference on e-Business and Telecommunications ◽

10.5220/0006830003050312 ◽

2018 ◽

Author(s):

Josep Domingo-Ferrer

Keyword(s):

Big Data ◽

Data Anonymization ◽

Privacy Models

A Scalable Two Phase Top Down Specialization Approach For Data Anonymization Using Mapreduce On Cloud

International Journal of Computer Trends and Technology ◽

10.14445/22312803/ijctt-v45p110 ◽

2017 ◽

Vol 45 (1) ◽

pp. 50-53

Author(s):

Sameesha Vs ◽

Keyword(s):

Top Down ◽

Two Phase ◽

Data Anonymization

A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using Map Reduce on Cloud

International Journal of Computer Applications Technology and Research ◽

10.7753/ijcatr0405.1015 ◽

2015 ◽

Vol 4 (5) ◽

pp. 409-413

Author(s):

R. Thaayumaanavan ◽

N. Priya ◽

J. Balaguru

Keyword(s):

Map Reduce ◽

Top Down ◽

Two Phase ◽

Data Anonymization

A Two-Phase Dynamic Throughput Optimization Model for Big Data Transfers

IEEE Transactions on Parallel and Distributed Systems ◽

10.1109/tpds.2020.3012929 ◽

2021 ◽

Vol 32 (2) ◽

pp. 269-280

Author(s):

MD S Q Zulkar Nine ◽

Tevfik Kosar

Keyword(s):

Big Data ◽

Optimization Model ◽

Throughput Optimization ◽

Two Phase ◽

Phase Dynamic ◽

Data Transfers

A Secure Protocol for High-Dimensional Big Data Providing Data Privacy

Research Anthology on Privatizing and Securing Data ◽

10.4018/978-1-7998-8954-0.ch015 ◽

2021 ◽

pp. 327-343

Author(s):

Anitha J. ◽

Prasad S. P.

Keyword(s):

Big Data ◽

Data Storage ◽

Data Privacy ◽

Personal Information ◽

Technological Development ◽

High Dimensional ◽

Sensitive Information ◽

Data Anonymization ◽

Secure Protocol ◽

Data Owner

Due to recent technological developments, the huge amount of data generated by social networks, sensor networks, the internet, etc., adds challenges to data storage and processing tasks. During privacy-preserving data publishing (PPDP), the collected data may contain sensitive information about the data owner. Releasing this data directly for further processing may violate the data owner's privacy, so the data must be modified so that it does not disclose any personal information. Existing data anonymization techniques use a fixed scheme with a small number of dimensions. Attacks on data privacy include linkage attacks, homogeneity attacks, and background knowledge attacks. To provide an effective technique for maintaining data privacy in big data and preventing linkage attacks, this paper proposes a privacy-preserving protocol, UNION, for multi-party data providers. Experiments show that, compared with existing anonymization techniques, this technique provides better data utility on high-dimensional data and better scalability with respect to data size.
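
The linkage attack mentioned above can be illustrated with a minimal sketch. This is not the UNION protocol; it only shows how joining a released table with a public table on quasi-identifiers re-identifies people whose quasi-identifier combination is unique, and how coarsening those attributes removes the unique matches. All tables, names, and the generalization rule are hypothetical.

```python
# Hypothetical released table: explicit identifiers removed, quasi-identifiers kept.
released = [
    {"zip": "47677", "age": 29, "sex": "F", "disease": "Flu"},
    {"zip": "47602", "age": 22, "sex": "F", "disease": "HIV"},
    {"zip": "47678", "age": 27, "sex": "M", "disease": "Cancer"},
]
# Hypothetical public table (e.g. a voter list) with names and the same quasi-identifiers.
public = [
    {"name": "Alice", "zip": "47677", "age": 29, "sex": "F"},
    {"name": "Bob", "zip": "47678", "age": 27, "sex": "M"},
]
QI = ("zip", "age", "sex")


def linkage_attack(released, public, qi=QI):
    """Return {name: sensitive value} for people matching exactly one released row."""
    hits = {}
    for person in public:
        key = tuple(person[a] for a in qi)
        matches = [r for r in released if tuple(r[a] for a in qi) == key]
        if len(matches) == 1:
            hits[person["name"]] = matches[0]["disease"]
    return hits


def generalize(row):
    """Coarsen quasi-identifiers: 3-digit ZIP prefix, 10-year age band, suppress sex."""
    g = dict(row)
    g["zip"] = row["zip"][:3] + "**"
    g["age"] = (row["age"] // 10) * 10
    g["sex"] = "*"
    return g


print(linkage_attack(released, public))
print(linkage_attack([generalize(r) for r in released],
                     [generalize(p) for p in public]))
```

On the raw tables both people are re-identified with their diagnoses; after generalization every released row shares the same quasi-identifier values, so no public record matches a single row and the attack returns an empty result.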
