Privacy Preserving Data Mining on Unstructured Data

Privacy and Security Policies in Big Data - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-2486-1.ch008 ◽

2017 ◽

pp. 167-190

Author(s):

Trupti Vishwambhar Kenekar ◽

Ajay R. Dani

Keyword(s):

Data Mining ◽

Big Data ◽

Structure Data ◽

Data Privacy ◽

Differential Privacy ◽

Unstructured Data ◽

Map Reduce ◽

Individual Data ◽

Data Set ◽

Privacy Preserving Data Mining

Big Data comprises structured, unstructured, and semi-structured data collected from various sources, so it is important to mine it while protecting the privacy of individual data. Differential privacy is one of the best measures for providing a strong privacy guarantee. The chapter proposes differentially private frequent itemset mining using MapReduce, which requires less time to privately mine a large dataset. The chapter discusses the problem of preserving data privacy, the challenges of preserving data privacy in a big data environment, and data privacy techniques and their applications to unstructured data. An analysis of experimental results on structured and unstructured data sets is also presented.
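
As an illustration of the idea in this abstract, the sketch below counts itemsets in a map step and a reduce step, then perturbs every candidate count with Laplace noise before thresholding. It is not the chapter's algorithm: the function names, the public item universe, the truncation of each transaction to `max_len` items (which bounds the sensitivity at C(max_len, size)), and the threshold are all assumptions made for this example.

```python
import math
import random
from collections import Counter
from itertools import combinations


def map_phase(transactions, size):
    """Map step: emit (itemset, 1) for every itemset of `size` in each transaction."""
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            yield itemset, 1


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts per itemset."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_frequent_itemsets(transactions, universe, size, epsilon,
                         threshold, max_len, seed=None):
    """Return candidate itemsets whose noisy support reaches `threshold`.

    Assumptions (hypothetical, for illustration): `universe` is public,
    and each transaction is truncated to `max_len` items so that one
    transaction changes at most C(max_len, size) counts by 1 each.
    """
    if max_len < size:
        raise ValueError("max_len must be at least size")
    rng = random.Random(seed)
    truncated = [sorted(set(t))[:max_len] for t in transactions]
    counts = reduce_phase(map_phase(truncated, size))
    scale = math.comb(max_len, size) / epsilon  # sensitivity / epsilon
    result = {}
    # Noise is added to every candidate, including those with zero support.
    for candidate in combinations(sorted(universe), size):
        noisy = counts.get(candidate, 0) + laplace_noise(scale, rng)
        if noisy >= threshold:
            result[candidate] = noisy
    return result
```

Smaller values of `epsilon` add more noise and give stronger privacy; enumerating every candidate over the universe is only practical for small item sets and small `size`.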

Map Reduce clustering in Incremental Big Data processing

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b6606.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 4205-4211

Keyword(s):

Data Mining ◽

Big Data ◽

Social Network ◽

Data Processing ◽

Online Shopping ◽

Processing Technique ◽

Map Reduce ◽

Data Set ◽

Computation Procedure ◽

Incremental Processing

An advanced incremental processing technique is proposed for data analysis so that clustering results stay up to date. Data arrives continuously from sources such as social networks, online shopping, sensors, and e-commerce [1]. Because of this Big Data, the results of data mining applications become stale and obsolete over time. Cloud intelligence applications often perform iterative computations (e.g., PageRank) on constantly changing datasets. Although previous studies extend MapReduce for efficient iterative computations, it is too expensive to run an entirely new large-scale MapReduce iterative job to accommodate new changes to the underlying data sets in a timely manner. Our implementation of MapReduce runs [4] on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes several terabytes of data on thousands of machines. Programmers find the system easy to use. Across many MapReduce applications, we observe that in many cases the changes affect only a very small fraction of the data set, and the newly converged state is very close to the previously converged state. I2MapReduce clustering exploits this observation to save re-computation by starting from the previously converged state [2] and by performing incremental updates on the changing data. The approach improves job running time and reduces the time needed to refresh the results of big data mining.

Retrieving Information and Discovering Knowledge from Unstructured Data Using Big Data Mining Technique: Heavy Oil Fields Example

10.2523/17805-ms ◽

2014 ◽

Cited By ~ 1

Author(s):

Wenkuang Wu ◽

Xiaoguang Lu ◽

Ben Cox ◽

Guoqiang Li ◽

Lihua Lin ◽

...

Keyword(s):

Data Mining ◽

Big Data ◽

Heavy Oil ◽

Oil Fields ◽

Unstructured Data ◽

Data Mining Technique ◽

Big Data Mining ◽

Mining Technique

Big Data Privacy Preservation Using Two Phase Top-Down Specialization Algorithm with Multidimensional Map Reduce Framework on Hadoop

International Journal of Distributed and Cloud Computing ◽

10.21863/ijdcc/2015.3.2.009 ◽

2015 ◽

Vol 3 (2) ◽

Author(s):

Shalin Eliabeth S. ◽

Sarju S.

Keyword(s):

Big Data ◽

Data Privacy ◽

Privacy Preservation ◽

Experimental Result ◽

Map Reduce ◽

Distributed Environment ◽

Top Down ◽

Two Phase ◽

Data Anonymization ◽

Big Data Privacy

Big data privacy preservation is one of the most troubling issues in industry today. Data privacy problems often go unidentified when input data is published in a cloud environment. Data privacy preservation in Hadoop deals with hiding and publishing the input dataset to the distributed environment. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability and execution time. At present, many cloud applications that anonymize big data face the same kinds of problems. To address them, the paper introduces a data anonymization algorithm called Two-Phase Top-Down Specialization (TPTDS), implemented in Hadoop. For the anonymization, 45,222 records of adult information with 15 attribute values were taken as the input big data. The proposed Two-Phase Top-Down Specialization algorithm is implemented in Hadoop with multidimensional anonymization in the MapReduce framework, which increases the efficiency of the big data processing system. Experiments with the algorithm in both one-dimensional and multidimensional MapReduce frameworks on Hadoop show better results for multidimensional anonymization on the input adult dataset. Data sets are generalized in a top-down manner, and the better result in the multidimensional MapReduce framework is reflected in the better IGPL values generated by the algorithm. The anonymization is performed with specialization operations on a taxonomy tree. The experiments show that the solution improves the IGPL values and the anonymity parameter and decreases the execution time of big data privacy preservation compared to the existing algorithm. These experimental results are expected to be widely applicable in distributed environments.

A Comprehensive Survey on Local Differential Privacy

Security and Communication Networks ◽

10.1155/2020/8829523 ◽

2020 ◽

Vol 2020 ◽

pp. 1-29 ◽

Cited By ~ 1

Author(s):

Xingxing Xiong ◽

Shubo Liu ◽

Dan Li ◽

Zhaohui Cai ◽

Xiaoguang Niu

Keyword(s):

Big Data ◽

Data Analysis ◽

Statistical Learning ◽

Data Privacy ◽

Statistical Estimation ◽

Differential Privacy ◽

Future Research ◽

Reference Source ◽

Complex Data ◽

Comprehensive Survey

With the advent of the era of big data, privacy issues have become a hot public topic. Local differential privacy (LDP) is a state-of-the-art privacy preservation technique that allows big data analysis (e.g., statistical estimation, statistical learning, and data mining) to be performed while guaranteeing each individual participant’s privacy. In this paper, we present a comprehensive survey of LDP. We first give an overview of the fundamental knowledge of LDP and its frameworks. We then introduce the mainstream privatization mechanisms and methods in detail from the perspective of frequency oracles and give insights into recent studies on private basic statistical estimation (e.g., frequency estimation and mean estimation) and complex statistical estimation (e.g., multivariate distribution estimation and private estimation over complex data) under LDP. Furthermore, we present the current state of research on LDP, including private statistical learning and inference, private statistical data analysis, privacy amplification techniques for LDP, and some application fields under LDP. Finally, we identify future research directions and open challenges for LDP. This survey can serve as a good reference source for research on LDP in dealing with the various privacy-related scenarios encountered in practice.
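
To make frequency estimation under LDP concrete, the sketch below uses binary randomized response, the simplest frequency oracle: each participant reports their true bit with probability e^ε / (1 + e^ε) and the flipped bit otherwise, and the collector debiases the observed frequency. The function names are assumptions for this example, and the survey covers many more elaborate mechanisms than this one.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(bit, epsilon, rng):
    """Run on the participant's side: keep the bit with probability p, else flip it."""
    p = keep_probability(epsilon)
    return bit if rng.random() < p else 1 - bit


def estimate_frequency(reports, epsilon):
    """Run on the collector's side: unbiased estimate of the fraction of true 1s.

    If f is the true frequency, the expected observed frequency is
    f * p + (1 - f) * (1 - p); solving for f gives the formula below.
    Requires epsilon > 0 so that 2p - 1 is non-zero.
    """
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    true_bits = [1] * 6000 + [0] * 14000  # true frequency of 1s is 0.3
    reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
    print(estimate_frequency(reports, 1.0))
```

The estimate is noisy for any single run; its variance shrinks as the number of participants grows and as ε increases.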

Big Data in Science and Healthcare: A Review of Recent Literature and Perspectives

Yearbook of Medical Informatics ◽

10.15265/iy-2014-0004 ◽

2014 ◽

Vol 23 (01) ◽

pp. 21-26 ◽

Cited By ~ 51

Author(s):

T. Miron-Shatz ◽

A. Y. S. Lau ◽

C. Paton ◽

M. M. Hansen

Keyword(s):

Health Care ◽

Big Data ◽

Health Care Providers ◽

Data Privacy ◽

Recent Literature ◽

Unstructured Data ◽

Small Scale ◽

Small Data ◽

Quantified Self ◽

Care Providers

Summary. Objectives: As technology continues to evolve and rise in various industries, such as healthcare, science, education, and gaming, a sophisticated concept known as Big Data is surfacing. The concept of analytics aims to understand data. We set out to portray and discuss perspectives on the evolving use of Big Data in science and healthcare and to examine some of the opportunities and challenges. Methods: A literature review was conducted to highlight the implications associated with the use of Big Data in scientific research and healthcare innovations, on both a large and a small scale. Results: Scientists and health-care providers may learn from one another when it comes to understanding the value of Big Data and analytics. Small data, derived from patients and consumers, also requires analytics to become actionable. Connectivism provides a framework for the use of Big Data and analytics in the areas of science and healthcare. This theory helps individuals recognize and synthesize how human connections are driving the increase in data. Despite the volume and velocity of Big Data, it is truly about technology connecting humans and assisting them to construct knowledge in new ways. Concluding Thoughts: The concept of Big Data and the associated analytics are to be taken seriously when approaching the use of vast volumes of both structured and unstructured data in science and health-care. Future exploration of issues surrounding data privacy, confidentiality, and education is needed. A greater focus on data from social media, the quantified-self movement, and the application of analytics to “small data” would also be useful.

Privacy Preserving Data Mining on Big Data Computing Platform: Trends and Future

Advances in Intelligent Networking and Collaborative Systems - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-3-319-65636-6_44 ◽

2017 ◽

pp. 491-502 ◽

Cited By ~ 2

Author(s):

Gao Zhiqiang ◽

Zhang Longjun

Keyword(s):

Data Mining ◽

Big Data ◽

Privacy Preserving ◽

Privacy Preserving Data Mining ◽

Computing Platform ◽

Big Data Computing

Towards Privacy-Preserving Data Mining in Online Social Networks: Distance-Grained and Item-Grained Differential Privacy

Information Security and Privacy - Lecture Notes in Computer Science ◽

10.1007/978-3-319-40253-6_9 ◽

2016 ◽

pp. 141-157 ◽

Cited By ~ 1

Author(s):

Shen Yan ◽

Shiran Pan ◽

Yuhang Zhao ◽

Wen-Tao Zhu

Keyword(s):

Data Mining ◽

Social Networks ◽

Online Social Networks ◽

Differential Privacy ◽

Privacy Preserving ◽

Privacy Preserving Data Mining

Issues of K Means Clustering While Migrating to Map Reduce Paradigm with Big Data: A Survey

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i6.11207 ◽

2016 ◽

Vol 6 (6) ◽

pp. 3047 ◽

Cited By ~ 1

Author(s):

Khyati R Nirmal ◽

K.V.V. Satyanarayana

Keyword(s):

Data Mining ◽

Big Data ◽

Distinct Group ◽

Map Reduce ◽

Data Mining Algorithm ◽

Distributed Environment ◽

Significant Information ◽

User Influence ◽

Initial Cluster ◽

Machine Learning Approach

In recent times, Big Data analysis has emerged as an essential area of computer science. Extracting significant information from Big Data by separating the data into distinct groups is a crucial task, and it is beyond the capacity of a commonly used personal machine. It is necessary to adopt a distributed environment such as the MapReduce paradigm and migrate data mining algorithms to it. In data mining, partition-based K-means clustering is one of the most widely used algorithms for grouping data according to the degree of similarity between data points. It requires the number of clusters K and the initial cluster centroids as input. The survey shows that these parameters, whether chosen by the algorithm or by the user, influence the functionality of the algorithm. It is therefore necessary to migrate K-means clustering to MapReduce and to predict the value of K using a machine learning approach. An efficient method for selecting the initial clusters must also be devised and combined with it. This paper surveys several methods for predicting the value of K in K-means clustering, as well as different methodologies for finding the initial cluster centers. Along with the initial value of K and initial centroid selection, the objective of the proposed work is to deal with the analysis of categorical data.
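
The migration of K-means to the MapReduce paradigm described here can be sketched as follows: the map step assigns each point to its nearest centroid, and the reduce step averages the points of each cluster. This is a single-process illustration with assumed function names, not code from the surveyed papers; K and the initial centroids are supplied by the caller, as the abstract notes.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(points, centroids):
    """Map step: emit (cluster index, point) for every point."""
    for p in points:
        yield nearest(p, centroids), p


def kmeans_reduce(assignments, centroids):
    """Reduce step: the new centroid of each cluster is the mean of its points.

    A cluster that received no points keeps its previous centroid.
    """
    groups = defaultdict(list)
    for idx, p in assignments:
        groups[idx].append(p)
    updated = []
    for i, old in enumerate(centroids):
        members = groups.get(i)
        if not members:
            updated.append(tuple(old))
        else:
            updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return updated


def kmeans(points, centroids, iterations=10):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(iterations):
        updated = kmeans_reduce(kmeans_map(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In a real MapReduce job each iteration would be a separate job with the current centroids broadcast to all mappers, which is one reason the choice of K and of the initial centroids matters for running time.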

Information Security in Big Data: Privacy and Data Mining

IEEE Access ◽

10.1109/access.2014.2362522 ◽

2014 ◽

Vol 2 ◽

pp. 1149-1176 ◽

Cited By ~ 198

Author(s):

Lei Xu ◽

Chunxiao Jiang ◽

Jian Wang ◽

Jian Yuan ◽

Yong Ren

Keyword(s):

Data Mining ◽

Big Data ◽

Information Security ◽

Data Privacy ◽

Big Data Privacy

An Improved Classification Analysis on Utility Aware K-Anonymized Dataset

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.7748 ◽

2019 ◽

Vol 16 (2) ◽

pp. 445-452

Author(s):

Kishore S. Verma ◽

A. Rajesh ◽

Adeline J. S. Johnsana

Keyword(s):

Data Mining ◽

Analytical Approach ◽

Value Added ◽

Data Sets ◽

Data Set ◽

Privacy Preserving Data Mining ◽

Privacy Leakage ◽

Anonymized Data ◽

Null Values ◽

The Individual

K-anonymization is one of the most widely used approaches for protecting individual records from privacy leakage attacks in the Privacy Preserving Data Mining (PPDM) arena. Typically, an anonymized dataset reduces the effectiveness of data mining results. Researchers in PPDM are therefore directing their efforts toward finding the optimum trade-off between privacy and utility. This work aims to identify the optimum classifier from a set of the best data mining classifiers that are capable of generating value-added classification results on utility-aware k-anonymized data sets. We applied the analytical approach to data sets anonymized with attention to anonymity utility factors such as null value count and transformation pattern loss. The experiments use three widely used classifiers, HNB, PART, and J48, and these classifiers are analysed with accuracy, F-measure, and ROC-AUC, which are shown in the literature to be suitable measures of classification. Our experimental analysis reveals the best classifiers on the utility-aware anonymized data sets of Cell-oriented Anonymization (CoA), Attribute-oriented Anonymization (AoA), and Record-oriented Anonymization (RoA).
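
For readers unfamiliar with the terms, the sketch below shows what it means for a table to be k-anonymous and how a null value count can be computed as a simple utility factor. The record layout, the column names, and the use of `None` for suppressed cells are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return bool(sizes) and min(sizes.values()) >= k


def null_value_count(records, null_marker=None):
    """Number of cells equal to `null_marker`, i.e. suppressed during anonymization."""
    return sum(1 for r in records for v in r.values() if v == null_marker)


if __name__ == "__main__":
    # Hypothetical generalized table: ages are bucketed, some ZIP codes suppressed.
    table = [
        {"age": "30-39", "zip": "021**", "disease": "flu"},
        {"age": "30-39", "zip": "021**", "disease": "cold"},
        {"age": "40-49", "zip": None, "disease": "flu"},
        {"age": "40-49", "zip": None, "disease": "asthma"},
    ]
    print(is_k_anonymous(table, ["age", "zip"], 2))
    print(null_value_count(table))
```

A higher null value count means more information was suppressed to reach the same k, which is the kind of utility loss a downstream classifier has to cope with.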
