Similarity Measure Design on Big Data

Author(s): Sanghyuk Lee, Yan Sun

2021, pp. 016555152110137
Author(s): N.R. Gladiss Merlin, Vigilson Prem. M

Large and complex data have become a valuable resource in biomedical discovery, greatly expanding the scientific resources available for retrieving useful information. However, indexing and retrieving patient information from disparate big data sources remains challenging in biomedical research. In this research, patient information is indexed and retrieved using the proposed Jaya–Sine Cosine Algorithm (Jaya–SCA)-based MapReduce framework. First, the input big data are distributed randomly across the mappers. The average of each mapper's data is calculated and forwarded to the reducer, where these representative data are stored. Each user query is first matched against the representatives in the reducer and then passed to the corresponding mapper to retrieve the best-matching result. This bilevel matching retrieves data from the mapper based on the distance to the query. The similarity measure is computed using the parametric-enabled similarity measure (PESM), cosine similarity, and the proposed Jaya–SCA, which integrates the Jaya algorithm and the SCA. The proposed Jaya–SCA algorithm attained maximum F-measure, recall, and precision values of 0.5323, 0.4400, and 0.6867, respectively, on the StatLog Heart Disease dataset.
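The two-level lookup described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: plain Python lists stand in for mapper partitions, a list of per-mapper mean vectors stands in for the reducer, and cosine similarity is used at both levels in place of the PESM and Jaya–SCA components. The function names `build_index` and `query` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(records: List[List[float]], n_mappers: int,
                seed: int = 0) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Randomly split records across mappers and keep one mean vector per
    non-empty mapper as its representative (the hypothetical 'reducer' level)."""
    rng = random.Random(seed)
    mappers: List[List[List[float]]] = [[] for _ in range(n_mappers)]
    for record in records:
        mappers[rng.randrange(n_mappers)].append(record)
    mappers = [m for m in mappers if m]  # an empty mapper has no mean
    dim = len(records[0])
    reducer = [[sum(r[i] for r in m) / len(m) for i in range(dim)] for m in mappers]
    return mappers, reducer


def query(q: Sequence[float], mappers: List[List[List[float]]],
          reducer: List[List[float]]) -> List[float]:
    """Bilevel match: choose a mapper via its representative, then search it."""
    # Level 1: compare q only with the per-mapper means stored in the reducer.
    best = max(range(len(reducer)), key=lambda j: cosine_similarity(q, reducer[j]))
    # Level 2: scan the records of the chosen mapper only.
    return max(mappers[best], key=lambda r: cosine_similarity(q, r))


if __name__ == "__main__":
    data = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9]]
    parts, reps = build_index(data, n_mappers=2)
    print(query([1.0, 0.1], parts, reps))
```

Because records are assigned to mappers at random, the mapper with the most similar representative is not guaranteed to hold the globally best match; the second level trades exactness for scanning only one partition.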


2020, Vol 2 (95), pp. 77-81
Author(s): E.V. Bodyansky, A.Yu. Shafronenko, I. M. Klimova

A method of credibilistic fuzzy clustering is proposed for problems in which data arrive sequentially, in online mode, and form large arrays (Big Data). The introduced procedures are essentially gradient algorithms for optimizing an objective function of a special type, and they have several advantages over known probabilistic and possibilistic approaches, above all robustness to anomalous observations (outliers). The approach is based on a similarity measure whose parameters are determined automatically during self-learning. The proposed procedures generalize known methods, run fast, and are simple to implement numerically.
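The robustness claim can be illustrated with a generic online update in which each prototype moves toward a new observation in proportion to its similarity to that observation, so distant outliers barely move anything. This is a sketch under assumptions, not the authors' credibilistic procedure: the Cauchy-type similarity, the learning rate `eta`, the exponent `beta`, the fixed width `sigma2`, and the class name `OnlineSimilarityClustering` are all hypothetical, and unlike the method described above, this sketch does not tune its parameters automatically.

```python
from typing import List, Sequence


def cauchy_similarity(x: Sequence[float], c: Sequence[float], sigma2: float) -> float:
    """Similarity in (0, 1]: equals 1 when x == c and decays with squared distance.
    Assumed Cauchy-type form, chosen for illustration only."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 1.0 / (1.0 + d2 / sigma2)


class OnlineSimilarityClustering:
    """Processes one observation at a time; prototypes are never recomputed
    from the full history, so memory stays constant as the stream grows."""

    def __init__(self, prototypes: List[List[float]], eta: float = 0.1,
                 beta: float = 2.0, sigma2: float = 1.0) -> None:
        self.prototypes = [list(p) for p in prototypes]
        self.eta = eta        # hypothetical learning rate
        self.beta = beta      # hypothetical exponent sharpening the weights
        self.sigma2 = sigma2  # hypothetical fixed width (not self-tuned here)

    def update(self, x: Sequence[float]) -> List[float]:
        """One gradient-style step: move every prototype toward x, weighted by
        its similarity to x raised to beta. Returns the similarities."""
        sims = [cauchy_similarity(x, c, self.sigma2) for c in self.prototypes]
        for c, s in zip(self.prototypes, sims):
            step = self.eta * s ** self.beta
            for i in range(len(c)):
                c[i] += step * (x[i] - c[i])
        return sims


if __name__ == "__main__":
    model = OnlineSimilarityClustering([[0.0, 0.0], [5.0, 5.0]])
    model.update([1.0, 0.0])    # nearby point pulls the first prototype noticeably
    model.update([100.0, 0.0])  # outlier: tiny similarity, so prototypes barely move
    print(model.prototypes)
```

Raising the similarity to `beta` makes the weight of a far observation shrink even faster, which is one simple way a similarity-based update can stay stable when anomalous values appear in the stream.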


2011, Vol 18 (5), pp. 1602-1608
Author(s): Sang-Hyuk Lee, Wook-Je Park, Dong-yean Jung

2013, Vol 20 (9), pp. 2440-2446
Author(s): Sang-hyuk Lee, Seung-soo Shin
