Index Clustering: A Map-reduce Clustering Approach using Numba

Author(s):  
Xinyu Chen ◽  
Trilce Estrada
2020 ◽  
Vol 16 (10) ◽  
pp. 1627
Author(s):  
Pei Shujun ◽  
Zhang Yu ◽  
Liang Chao

Author(s):  
Hussain A. Jaber ◽  
Ilyas Çankaya ◽  
Hadeel K. Aljobouri ◽  
Orhan M. Koçak ◽  
Oktay Algin

Background: Cluster analysis is a robust tool for exploring the underlying structure of data and grouping similar objects together. In Functional Magnetic Resonance Imaging (fMRI) research, clustering approaches attempt to group voxels by their time-course signals, so that each cluster shares a similar hemodynamic response over time. Objective: This work proposes a novel unsupervised learning approach that applies the Enhanced Neural Gas (ENG) algorithm to fMRI data and compares it with the Neural Gas (NG) method; ENG has not previously been used for this purpose. The ENG algorithm builds on the network structure of NG and focuses on efficient prototype-based clustering. Methods: Comparisons on real auditory fMRI data show that ENG outperforms both NG and statistical parametric mapping (SPM), owing to its insensitivity to the order of the input data, to the initialization used to select the set of neurons, and to the presence of extreme values (outliers). The findings also demonstrate that it can effectively determine the true number of clusters. Results: Four validation indices are applied to evaluate the performance of the proposed ENG method on fMRI data and to compare it with a clustering approach (the NG algorithm) and a model-based data analysis (SPM). These validation indices are the Jaccard Coefficient (JC), the Receiver Operating Characteristic (ROC), the Minimum Description Length (MDL) value, and the Minimum Square Error (MSE). Conclusion: The ENG technique addresses the shortcomings of applying NG to fMRI data, identifies the active areas of the human brain effectively, and determines the locations of the cluster centers based on the MDL value during network learning.
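As a rough illustration of the prototype-based clustering that NG performs, the sketch below implements the classic Neural Gas rank-based update in plain Python. It is not the ENG algorithm described in the abstract, and the function names, the exponential decay schedules, and the default parameter values are assumptions chosen for the example.

```python
import math
import random


def neural_gas_step(prototypes, x, eps, lam):
    """Apply one classic Neural Gas update for a single input x (in place)."""
    # Rank prototypes by squared Euclidean distance to x (rank 0 = closest).
    order = sorted(
        range(len(prototypes)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(prototypes[i], x)),
    )
    for rank, i in enumerate(order):
        h = math.exp(-rank / lam)  # neighbourhood weight decays with rank
        prototypes[i] = [w + eps * h * (v - w) for w, v in zip(prototypes[i], x)]
    return prototypes


def neural_gas(data, n_prototypes, n_epochs=30, eps_start=0.5, eps_end=0.01,
               lam_start=None, lam_end=0.1, seed=0):
    """Fit prototypes to data (a list of equal-length numeric lists)."""
    rng = random.Random(seed)
    # Assumption: prototypes are initialised from randomly chosen data points.
    prototypes = [list(p) for p in rng.sample(data, n_prototypes)]
    if lam_start is None:
        lam_start = n_prototypes / 2.0
    t_max = n_epochs * len(data)
    t = 0
    for _ in range(n_epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = t / t_max
            # Assumption: learning rate and neighbourhood range decay exponentially.
            eps = eps_start * (eps_end / eps_start) ** frac
            lam = lam_start * (lam_end / lam_start) ** frac
            neural_gas_step(prototypes, x, eps, lam)
            t += 1
    return prototypes
```

Each voxel time course would play the role of one input vector `x`. Because every prototype moves on every step, with a weight that depends on its distance rank, plain NG already depends less on initialization than k-means does; the abstract reports that ENG reduces that sensitivity further.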


Author(s):  
Shalin Eliabeth S. ◽  
Sarju S.

Big data privacy preservation is one of the most pressing concerns in industry today. Data privacy problems sometimes go unnoticed when input data is published to a cloud environment. Privacy preservation in Hadoop involves anonymizing the input dataset before it is published to the distributed environment. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability and execution time. At present, many cloud applications that anonymize big data face the same kinds of problems. To address them, a data anonymization algorithm called Two-Phase Top-Down Specialization (TPTDS) is introduced and implemented in Hadoop. The input consists of 45,222 records from the adult dataset, each with 15 attributes. The proposed TPTDS algorithm is implemented in Hadoop using multidimensional anonymization in the MapReduce framework, which increases the efficiency of the big data processing system. Experiments were conducted with TPTDS on Hadoop in both one-dimensional and multidimensional MapReduce frameworks, and multidimensional anonymization gave the better result on the adult dataset. Datasets are generalized in a top-down manner, and the multidimensional MapReduce framework produces better IGPL values. The anonymization is performed through specialization operations on a taxonomy tree. The experiments show that the solution improves the IGPL values and the anonymity parameter and decreases the execution time of big data privacy preservation compared with the existing algorithm. These results are expected to be widely applicable in distributed environments.
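To make the idea of specialization on a taxonomy tree concrete, the sketch below performs top-down specialization on a single attribute while maintaining k-anonymity. It is a simplified illustration, not the TPTDS algorithm from the abstract: the taxonomy, the function names, and the greedy "first valid specialization" rule are all assumptions, and the IGPL scoring and the two-phase MapReduce partitioning are left out.

```python
from collections import Counter

# Hypothetical taxonomy for one attribute: parent -> children.
TAXONOMY = {
    "Any": ["Secondary", "University"],
    "Secondary": ["9th", "12th"],
    "University": ["Bachelors", "Masters"],
}


def leaves_under(node):
    """Return the set of leaf values covered by a taxonomy node."""
    children = TAXONOMY.get(node)
    if not children:
        return {node}
    covered = set()
    for child in children:
        covered |= leaves_under(child)
    return covered


def generalize(value, cut):
    """Map a leaf value to the node of the current cut that covers it."""
    for node in cut:
        if value in leaves_under(node):
            return node
    raise ValueError(f"{value!r} is not covered by the cut {cut!r}")


def is_k_anonymous(values, cut, k):
    """Check that every generalized group holds at least k records."""
    counts = Counter(generalize(v, cut) for v in values)
    return all(c >= k for c in counts.values())


def top_down_specialize(values, k):
    """Start at the root and specialize while k-anonymity still holds."""
    cut = ["Any"]
    changed = True
    while changed:
        changed = False
        for node in list(cut):
            children = TAXONOMY.get(node)
            if not children:
                continue  # leaf nodes cannot be specialized further
            candidate = [n for n in cut if n != node] + children
            # Assumption: accept the first valid specialization; a full
            # implementation would rank candidates by a score such as IGPL.
            if is_k_anonymous(values, candidate, k):
                cut = candidate
                changed = True
                break
    return cut
```

For example, with the values `["9th", "9th", "12th", "12th", "Bachelors", "Masters"]` and `k = 2`, the sketch specializes `Secondary` into its leaves but keeps `University` generalized, because splitting it would leave groups of size one.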

