Analysis of shared memory in Distributed and non Distributed environment

Author(s):  
Dharmendra Dangi ◽  
Sachin Bhandari ◽  
Amit Bhagat
2020 ◽  
Vol 14 (3) ◽  
pp. 351-363
Author(s):  
Yue Wang ◽  
Ruiqi Xu ◽  
Zonghao Feng ◽  
Yulin Che ◽  
Lei Chen ◽  
...  

Measuring similarities among different nodes is important in graph analysis. SimRank is one of the most popular similarity measures. Given a graph G ( V , E ) and a source node u , a single-source Sim-Rank query returns the similarities between u and each node v ∈ V. This type of query is often used in link prediction, personalized recommendation and spam detection. While dealing with a large graph is beyond the ability of a single machine due to its limited memory and computational power, it is necessary to process single-source SimRank queries in a distributed environment, where the graph is partitioned and distributed across multiple machines. However, most current solutions are based on shared-memory model, where the whole graph is loaded into a shared memory and all processors can access the graph randomly. It is difficult to deploy such algorithms on shared-nothing model. In this paper, we present DISK, a distributed framework for processing single-source SimRank queries. DISK follows the linearized formulation of SimRank, and consists of offline and online phases. In the offline phase, a tree-based method is used to estimate the diagonal correction matrix of SimRank accurately, and in the online phase, single-source similarities are computed iteratively. Under this framework, we propose different optimization techniques to boost the indexing and queries. DISK guarantees both accuracy and parallel scalability, which distinguishes itself from existing solutions. Its accuracy, efficiency, parallel scalability and scalability are also verified by extensive experimental studies. The experiments show that DISK scales up to graphs of billions of nodes and edges, and answers online queries within seconds, while ensuring the accuracy bounds.


2005 ◽  
Vol 1 (03) ◽  
pp. 285-290 ◽  
Author(s):  
F. González-Longatt ◽  
◽  
A. Hernandez ◽  
F. Guillen ◽  
C. Fortoul

2009 ◽  
Vol 28 (9) ◽  
pp. 2303-2305
Author(s):  
Xiao-gang WANG ◽  
Xiao-juan WU ◽  
Xin ZHOU ◽  
Xiao-yan ZHANG

1990 ◽  
Author(s):  
Yehunda Afek ◽  
Hagit Attiya ◽  
Danny Dolev ◽  
Eli Gafni ◽  
Michael Merritt
Keyword(s):  

Author(s):  
Shalin Eliabeth S. ◽  
Sarju S.

Big data privacy preservation is one of the most disturbed issues in current industry. Sometimes the data privacy problems never identified when input data is published on cloud environment. Data privacy preservation in hadoop deals in hiding and publishing input dataset to the distributed environment. In this paper investigate the problem of big data anonymization for privacy preservation from the perspectives of scalability and time factor etc. At present, many cloud applications with big data anonymization faces the same kind of problems. For recovering this kind of problems, here introduced a data anonymization algorithm called Two Phase Top-Down Specialization (TPTDS) algorithm that is implemented in hadoop. For the data anonymization-45,222 records of adults information with 15 attribute values was taken as the input big data. With the help of multidimensional anonymization in map reduce framework, here implemented proposed Two-Phase Top-Down Specialization anonymization algorithm in hadoop and it will increases the efficiency on the big data processing system. By conducting experiment in both one dimensional and multidimensional map reduce framework with Two Phase Top-Down Specialization algorithm on hadoop, the better result shown in multidimensional anonymization on input adult dataset. Data sets is generalized in a top-down manner and the better result was shown in multidimensional map reduce framework by the better IGPL values generated by the algorithm. The anonymization was performed with specialization operation on taxonomy tree. The experiment shows that the solutions improves the IGPL values, anonymity parameter and decreases the execution time of big data privacy preservation by compared to the existing algorithm. This experimental result will leads to great application to the distributed environment.


Sign in / Sign up

Export Citation Format

Share Document