Data mining in distributed environment: a survey

Multi-dimensional geospatial data mining in a distributed environment using MapReduce

Journal of Big Data, 2019, Vol 6 (1). DOI: 10.1186/s40537-019-0245-9. Cited by ~3.

Author(s): Mazin Alkathiri, Abdul Jhummarwala, M. B. Potdar

Keyword(s): Data Mining, Geospatial Data, Distributed Environment

A sanitization approach for privacy preserving data mining on social distributed environment

Journal of Ambient Intelligence and Humanized Computing, 2019, Vol 11 (7), pp. 2761-2777. DOI: 10.1007/s12652-019-01335-w. Cited by ~2.

Author(s): P. L. Lekshmy, M. Abdul Rahiman

Keyword(s): Data Mining, Privacy Preserving, Distributed Environment, Privacy Preserving Data Mining

Issues of K Means Clustering While Migrating to Map Reduce Paradigm with Big Data: A Survey

International Journal of Electrical and Computer Engineering (IJECE), 2016, Vol 6 (6), pp. 3047-3051. DOI: 10.11591/ijece.v6i6.pp3047-3051.

Author(s): Khyati R Nirmal, K.V.V. Satyanarayana

Keyword(s): Data Mining, Big Data, Distinct Group, Map Reduce, Data Mining Algorithm, Distributed Environment, Significant Information, User Influence, Initial Cluster, Machine Learning Approach

In recent years, Big Data analysis has emerged as an essential area of computer science. Extracting significant information from Big Data by partitioning it into distinct groups is a crucial task, and one that is beyond the capacity of a commonly used personal machine. It is therefore necessary to adopt a distributed environment such as the MapReduce paradigm and to migrate data mining algorithms to it. In data mining, partition-based K-means clustering is one of the most widely used algorithms for grouping data according to their degree of similarity. It requires the number of clusters K and the initial cluster centroids as input. The survey shows that these parameters, whether preferred by the algorithm or chosen by the user, influence how the algorithm performs. It is therefore necessary to migrate K-means clustering to MapReduce and to predict the value of K using a machine learning approach. An efficient method for selecting the initial centroids also needs to be devised and combined with it. This paper surveys several methods for predicting the value of K in K-means clustering, as well as different methodologies for finding the initial cluster centers. Along with the choice of K and initial centroid selection, the proposed work also aims to deal with the analysis of categorical data.
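
The abstract above pairs two concerns: running K-means as map and reduce steps, and choosing the initial centroids well. The following is a minimal, single-process Python sketch of both, assuming Euclidean distance on numeric tuples. `map_phase` plays the mapper (assign each point to its nearest centroid), `reduce_phase` plays the reducer (average the points per cluster), and k-means++ seeding stands in for initial-centroid selection. All function names are hypothetical, k-means++ is one well-known seeding method rather than the one the survey proposes, and K is supplied by the caller instead of being predicted.

```python
import math
import random
from collections import defaultdict

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        if sum(d2) == 0:  # every point already coincides with a centroid
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

def map_phase(points, centroids):
    """Mapper: emit (index of nearest centroid, (point, 1)) for every point."""
    for p in points:
        j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        yield j, (p, 1)

def reduce_phase(pairs, dim):
    """Reducer: sum coordinates and counts per cluster key, then divide to get new means."""
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for j, (p, n) in pairs:
        for d in range(dim):
            sums[j][d] += p[d]
        counts[j] += n
    return {j: tuple(s / counts[j] for s in sums[j]) for j in sums}

def kmeans_mapreduce(points, k, max_iter=20, seed=0):
    """Iterate map + reduce until the centroids stop changing (or max_iter is hit)."""
    rng = random.Random(seed)
    centroids = [tuple(c) for c in kmeans_pp_init(points, k, rng)]
    dim = len(points[0])
    for _ in range(max_iter):
        new = reduce_phase(map_phase(points, centroids), dim)
        # a cluster that received no points keeps its previous centroid
        updated = [new.get(j, centroids[j]) for j in range(k)]
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

For example, `kmeans_mapreduce([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], 2)` converges to centroids at (1/3, 1/3) and (31/3, 31/3), in either order. In an actual Hadoop or Spark deployment the mapper would run once per data split and a combiner would typically pre-sum the (point, count) pairs locally before the shuffle; here both phases run in one process only to show the data flow.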

Effective Fuzzy Ontology Based Distributed Document Using Non-Dominated Ranked Genetic Algorithm

Organizational Efficiency through Intelligent Information Technologies, 2012, pp. 243-264. DOI: 10.4018/978-1-4666-2047-6.ch015.

Author(s): M. Thangamani, P. Thangaraj

Keyword(s): Artificial Intelligence, Data Mining, Genetic Algorithm, Document Clustering, Distributed Environment, Data Set, Clustering Technique, Machine Readable, Readable Format, Machine Readable Format

The growing number of documents has made it harder to classify them according to specific needs. Clustering analysis in a distributed environment is a key research area in artificial intelligence and data mining. Its fundamental task is to use object characteristics to compute the degree of relatedness between objects and to classify them automatically without prior knowledge. Document clustering applies clustering techniques to group highly similar documents together by computing the similarity between documents. Recent studies have shown that ontologies are useful for improving the performance of document clustering. An ontology conceptualizes a domain in an individually identifiable, machine-readable format containing entities, attributes, relationships, and axioms. After analyzing various document clustering techniques, a technique based on a Genetic Algorithm (GA) was identified as the better approach. This paper uses the Non-Dominated Ranked Genetic Algorithm (NRGA) for clustering, which can provide better classification results. The proposed technique is evaluated on the 20 Newsgroups data set. The results show that the proposed approach is very effective for clustering documents in a distributed environment.
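
NRGA ranks candidate solutions by Pareto dominance before selecting parents. A minimal Python sketch of that ranking step follows, assuming every objective is to be minimised (for instance, hypothetically, intra-cluster distance and a cluster-count penalty for a candidate document clustering). It also shows a linear ranked roulette weighting over fronts, in which better fronts are more likely to be picked. The function names and the exact selection weights are illustrative assumptions, not the chapter's formulation; crossover, mutation, and the ontology-based fitness are omitted.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_ranks(objectives: List[Sequence[float]]) -> List[int]:
    """Assign each solution the index of its Pareto front (0 = non-dominated)."""
    ranks = [-1] * len(objectives)
    remaining = set(range(len(objectives)))
    front = 0
    while remaining:
        # solutions not dominated by any other solution still in the pool
        current = [i for i in remaining
                   if not any(dominates(objectives[j], objectives[i])
                              for j in remaining if j != i)]
        for i in current:
            ranks[i] = front
        remaining -= set(current)
        front += 1
    return ranks

def front_selection_weights(num_fronts: int) -> List[float]:
    """Hypothetical linear ranking: front f gets weight 2(F - f) / (F(F + 1)); weights sum to 1."""
    total = num_fronts * (num_fronts + 1)
    return [2 * (num_fronts - f) / total for f in range(num_fronts)]
```

With objective vectors `[(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]`, `non_dominated_ranks` returns `[0, 0, 1, 0, 2]`: three mutually non-dominated solutions form the first front, and `(5, 5)` lands behind `(3, 4)`.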

A Scheme of Interactive Data Mining Support System in Parallel and Distributed Environment

Parallel and Distributed Processing and Applications, Lecture Notes in Computer Science, 2003, pp. 263-272. DOI: 10.1007/3-540-37619-4_27. Cited by ~2.

Author(s): Zhen Liu, Shinichi Kamohara, Minyi Guo

Keyword(s): Data Mining, Support System, Distributed Environment, Interactive Data Mining, Interactive Data

Study on distributed privacy preserving data mining

World Journal of Engineering, 2014, Vol 11 (2), pp. 163-170. DOI: 10.1260/1708-5284.11.2.163.

Author(s): Binli Wang, Yanguang Shen

Keyword(s): Data Mining, Data Privacy, Rapid Development, Privacy Preserving, Future Research, Distributed Data, Distributed Environment, Privacy Preserving Data Mining, Advantages and Disadvantages, Future Research Directions

Recently, with the rapid development of networks, communications, and computer technology, privacy preserving data mining (PPDM) has become an increasingly important research topic in data mining. In a distributed environment, protecting data privacy while mining large amounts of distributed data is an especially far-reaching research problem. This paper describes current PPDM research both domestically and internationally. It then focuses on classifying the typical applications and algorithms of PPDM in distributed environments and summarizing their advantages and disadvantages. Finally, it points out future research directions in the field.

Designing a Model to Study Data Mining in Distributed Environment

Journal of Data Analysis and Information Processing, 2021, Vol 09 (01), pp. 23-29. DOI: 10.4236/jdaip.2021.91002.

Author(s): Md. Abadur Rahman, Masud Karim

Keyword(s): Data Mining, Study Data, Distributed Environment

Collusion-Free Privacy Preserving Data Mining

International Journal of Intelligent Information Technologies, 2010, Vol 6 (4), pp. 30-45. DOI: 10.4018/jiit.2010100103. Cited by ~7.

Author(s): M. Rajalakshmi, T. Purusothaman, S. Pratheeba

Keyword(s): Data Mining, Association Rule, Privacy Preserving, Frequent Itemsets, Data Sources, Sensitive Information, Distributed Data, Distributed Environment, Rule Mining, Privacy Preserving Data Mining

Distributed association rule mining is an integral part of data mining that extracts useful information hidden in distributed data sources. As local frequent itemsets are globalized from the data sources, sensitive information about individual data sources requires strong protection. Various privacy preserving data mining approaches have been proposed for distributed environments, but in the existing approaches, collusion among the participating sites reveals sensitive information about the other sites. In this paper, the authors propose a collusion-free algorithm for mining global frequent itemsets in a distributed environment with minimal communication among sites. The algorithm splits and sanitizes the itemsets and communicates with random sites in two different phases, making it difficult for colluders to retrieve sensitive information. Results show that the consequences of collusion are greatly reduced without affecting mining performance, and confirm optimal communication among sites.
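
To illustrate how splitting can hide local counts while frequent itemsets are globalized, here is a hedged Python sketch of plain additive secret sharing for a single itemset's support count. This is a generic textbook technique swapped in for illustration, not the authors' split-and-sanitize protocol: each site splits its local count into random shares modulo a hypothetical `modulus`, sends one share to every site, and only the per-site partial sums are combined. Any single share is uniformly random, but once the global total is known, all other sites colluding can still deduce one site's count, so this sketch is not collusion-free in the paper's sense.

```python
import random
from typing import List

def split_count(count: int, n_shares: int, modulus: int, rng: random.Random) -> List[int]:
    """Split a local support count into n_shares random values that sum to it mod `modulus`."""
    shares = [rng.randrange(modulus) for _ in range(n_shares - 1)]
    shares.append((count - sum(shares)) % modulus)
    return shares

def global_support(local_counts: List[int], modulus: int = 2**31 - 1, seed: int = 0) -> int:
    """Sum the sites' local counts without any site seeing another site's raw count.
    Assumes the true total is smaller than `modulus`."""
    rng = random.Random(seed)
    n = len(local_counts)
    received: List[List[int]] = [[] for _ in range(n)]
    for count in local_counts:
        # each site sends one share to every site, itself included
        for dest, share in enumerate(split_count(count, n, modulus, rng)):
            received[dest].append(share)
    partial_sums = [sum(shares) % modulus for shares in received]
    return sum(partial_sums) % modulus

def is_globally_frequent(local_counts: List[int], min_count: int) -> bool:
    """Hypothetical helper: compare the privately aggregated support with a global threshold."""
    return global_support(local_counts) >= min_count
```

For three sites with local counts 30, 12, and 58, `global_support([30, 12, 58])` returns 100, so the itemset is globally frequent for any absolute threshold up to 100.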

Collusion-Free Privacy Preserving Data Mining

Insights into Advancements in Intelligent Information Technologies, 2012, pp. 269-284. DOI: 10.4018/978-1-4666-0158-1.ch015.

Author(s): T. Purusothaman, M. Rajalakshmi, S. Pratheeba

Keyword(s): Data Mining, Privacy Preserving, Frequent Itemsets, Data Sources, Sensitive Information, Distributed Data, Distributed Environment, Rule Mining, Privacy Preserving Data Mining, Distributed Association

Distributed association rule mining is an integral part of data mining that extracts useful information hidden in distributed data sources. As local frequent itemsets are globalized from the data sources, sensitive information about individual data sources requires strong protection. Various privacy preserving data mining approaches have been proposed for distributed environments, but in the existing approaches, collusion among the participating sites reveals sensitive information about the other sites. In this paper, the authors propose a collusion-free algorithm for mining global frequent itemsets in a distributed environment with minimal communication among sites. The algorithm splits and sanitizes the itemsets and communicates with random sites in two different phases, making it difficult for colluders to retrieve sensitive information. Results show that the consequences of collusion are greatly reduced without affecting mining performance, and confirm optimal communication among sites.
