Similarity Measure Design on Big Data

Large, complex data sets have become a valuable resource for biomedical discovery, greatly expanding the scientific resources available for retrieving useful information. However, indexing and retrieving patient information from disparate big data sources remains challenging in biomedical research. Such indexing and retrieval can be performed using the MapReduce framework. In this research, it is performed using the proposed Jaya-Sine Cosine Algorithm (Jaya–SCA)-based MapReduce framework. Initially, the input big data is distributed randomly among the mappers. The average of each mapper's data is calculated and forwarded to the reducer, where these representative data are stored. Each user query is first matched against the representatives in the reducer and then routed to the corresponding mapper to retrieve the best-matching result. This bilevel matching retrieves data from the mapper based on the distance between the query and the stored data. The similarity measure is computed based on the parametric-enabled similarity measure (PESM), cosine similarity and the proposed Jaya–SCA, which integrates the Jaya algorithm and the SCA. On the StatLog Heart Disease dataset, the proposed Jaya–SCA algorithm attained maximum F-measure, recall and precision values of 0.5323, 0.4400 and 0.6867, respectively.
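The bilevel matching described in this abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it uses plain cosine similarity only and leaves out PESM and the Jaya–SCA optimization, and the names `build_index`, `retrieve` and `num_mappers` are hypothetical, chosen here for illustration.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(records, num_mappers, seed=0):
    """Randomly split records among mappers and average each mapper's data.

    The list of per-mapper averages plays the role of the reducer's
    representative data (an assumption about the abstract's wording).
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    mappers = [shuffled[i::num_mappers] for i in range(num_mappers)]
    mappers = [m for m in mappers if m]  # drop empty partitions
    representatives = [
        [sum(column) / len(m) for column in zip(*m)] for m in mappers
    ]
    return mappers, representatives


def retrieve(query, mappers, representatives):
    """Bilevel matching: pick the closest representative, then the best record."""
    # Level 1: match the query against the reducer's representatives.
    best = max(
        range(len(representatives)),
        key=lambda i: cosine_similarity(query, representatives[i]),
    )
    # Level 2: search only the selected mapper for the best-matching record.
    return max(mappers[best], key=lambda r: cosine_similarity(query, r))


if __name__ == "__main__":
    data = [[1.0, 0.1], [0.0, 1.0], [1.0, 1.0], [0.9, 0.0]]
    mappers, reps = build_index(data, num_mappers=2)
    print(retrieve([1.0, 0.0], mappers, reps))
```

Because the second level searches only one mapper, the returned record is the best match within that partition and not necessarily the best match over the whole data set; that is the trade-off this kind of two-stage lookup makes for speed.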

Similarity Measure Design for Non-Overlapped Data

Lecture Notes in Electrical Engineering - Future Information Communication Technology and Applications ◽ 10.1007/978-94-007-6516-0_33 ◽ 2013 ◽ pp. 299-307

Author(s): Sanghyuk Lee

Keyword(s): Similarity Measure ◽ Measure Design

A New Term-Term Similarity Measure for Selecting Expansion Features in Big Data

10.1109/inds.2014.23 ◽ 2014

Author(s): Ilyes Khennak ◽ Habiba Drias

Keyword(s): Big Data ◽ Similarity Measure ◽ Term Similarity

Recurrent authenticity of the clustering of great tribute to the function of special type

Bionics of Intelligence ◽ 10.30837/bi.2020.2(95).10 ◽ 2020 ◽ Vol 2 (95) ◽ pp. 77-81

Author(s): E.V. Bodyansky ◽ A.Yu. Shafronenko ◽ I.M. Klimova

Keyword(s): Big Data ◽ Objective Function ◽ Similarity Measure ◽ Fuzzy Clustering ◽ High Speed ◽ Numerical Implementation ◽ Online Mode ◽ Gradient Algorithms ◽ Self Learning

A method of credibilistic fuzzy clustering is proposed for problems in which data are fed sequentially, in online mode, and form large arrays (Big Data). The introduced procedures are essentially gradient algorithms for optimizing an objective function of a special type, and have a number of advantages over known probabilistic and possibilistic approaches, above all robustness to anomalous observations. The approach is based on a similarity measure whose parameters are determined automatically in the process of self-learning. The proposed procedures generalize known methods and are characterized by high speed and simple numerical implementation.
