A Hybrid Similarity Measure Based on Binary and Decimal Data for Data Mining

A Novel Cosine Similarity Like Data Clustering Method for Effective Data Classification in Data Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.h6417.069820 ◽

2020 ◽

Vol 9 (8) ◽

pp. 340-346

Keyword(s):

Data Mining ◽

Similarity Measure ◽

Categorical Data ◽

Data Clustering ◽

Similarity Measures ◽

Numerical Data ◽

Data Classification ◽

Fundamental Goal ◽

Learning Technique ◽

Categorical Data Clustering

In data mining, many techniques use distance-based measures for data clustering. Improving clustering performance is the fundamental goal in clustering-related tasks. Many techniques are available for clustering numerical as well as categorical data. Clustering is an unsupervised learning technique in which objects are grouped based on the similarity among them. This paper proposes a new cluster similarity measure, the cosine-like cluster similarity measure (CLCSM), and uses it for data classification. Extensive experiments are conducted on UCI machine learning datasets. The experimental results show that the proposed cosine-like cluster similarity measure is superior to many existing cluster similarity measures for data classification.
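
The abstract does not give the CLCSM formula, so the following is only a minimal sketch of the general idea it builds on: ordinary cosine similarity used to assign a new object to the most similar cluster. The function names, the centroid-based assignment rule, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors (assumption)
    return dot / (norm_u * norm_v)


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def classify(x, clusters):
    """Return the label of the cluster whose centroid is most cosine-similar to x."""
    return max(
        clusters,
        key=lambda label: cosine_similarity(x, centroid(clusters[label])),
    )


# Hypothetical toy clusters, not data from the paper.
clusters = {
    "a": [[1.0, 0.1], [0.9, 0.2]],
    "b": [[0.1, 1.0], [0.2, 0.8]],
}
label = classify([0.8, 0.3], clusters)
```

Here the query point `[0.8, 0.3]` points in nearly the same direction as the centroid of cluster `"a"`, so it is assigned to that cluster regardless of vector length, which is the property that distinguishes cosine-style measures from plain distance.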

Handling Fuzzy Similarity for Data Classification

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch118 ◽

2011 ◽

pp. 796-802 ◽

Cited By ~ 2

Author(s):

Roy Gelbard ◽

Avichai Meged

Keyword(s):

Data Mining ◽

Similarity Measure ◽

Data Representation ◽

Classification Algorithms ◽

Binary Representation ◽

Fuzzy Data ◽

Problem Domain ◽

Data Types ◽

Fuzzy Similarity ◽

Set Up

Representing and consequently processing fuzzy data in standard and binary databases is problematic. The problem is further amplified in binary databases, where continuous data is represented by means of discrete ‘1’ and ‘0’ bits. As regards classification, the problem becomes even more acute. In these cases, we may want to group objects based on some fuzzy attributes, but an appropriate fuzzy similarity measure is not always easy to find. The current paper proposes a novel model and measure for representing fuzzy data, which lends itself to both classification and data mining. Classification algorithms and data mining attempt to set up hypotheses about assigning different objects to groups and classes on the basis of the similarity/distance between them (Estivill-Castro & Yang, 2004) (Lim, Loh & Shih, 2000) (Zhang & Srihari, 2004). Classification algorithms and data mining are widely used in numerous fields, including social sciences, where observations and questionnaires are used in learning mechanisms of social behavior; marketing, for segmentation and customer profiling; finance, for fraud detection; computer science, for image processing and expert systems applications; medicine, for diagnostics; and many other fields. Classification algorithms and data mining methodologies are based on a procedure that calculates a similarity matrix, using a similarity index between objects, and on a grouping technique. Research has shown that a similarity measure based upon binary data representation yields better results than regular similarity indexes (Erlich, Gelbard & Spiegler, 2002) (Gelbard, Goldman & Spiegler, 2007). However, binary representation is currently limited to nominal discrete attributes such as gender and marital status (Zhang & Srihari, 2003). This makes the binary approach to data representation unattractive for many common data types.
The current research describes a novel approach to binary representation, referred to as Fuzzy Binary Representation. This new approach is suitable for all data types: nominal, ordinal, and continuous. We propose that there is meaning not only in the actual explicit attribute value, but also in its implicit similarity to other possible attribute values. These similarities can be determined either by a problem domain expert or automatically, by analyzing fuzzy functions that represent the problem domain. The added fuzzy similarity yields improved classification and data mining results. More generally, Fuzzy Binary Representation and related similarity measures show that refined and carefully designed handling of data, including the elicitation of domain expertise regarding similarity, may add both value and knowledge to existing databases.
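
To make the contrast between a crisp and a fuzzy binary encoding concrete, here is a minimal sketch under stated assumptions. The attribute values, the similarity numbers in `SIMILARITY` (standing in for what a domain expert might supply), and the `overlap` measure (sum of minima over sum of maxima) are all hypothetical; the abstract does not specify the authors' actual encoding or similarity formula.

```python
VALUES = ["low", "medium", "high"]

# Hypothetical expert-supplied similarity of each value to every possible value.
SIMILARITY = {
    "low": [1.0, 0.5, 0.0],
    "medium": [0.5, 1.0, 0.5],
    "high": [0.0, 0.5, 1.0],
}


def crisp_encode(value):
    """One bit per possible value; only the explicit value is set."""
    return [1.0 if v == value else 0.0 for v in VALUES]


def fuzzy_encode(value):
    """One graded entry per possible value, taken from the similarity table."""
    return list(SIMILARITY[value])


def overlap(u, v):
    """Sum of minima over sum of maxima; reduces to Jaccard on 0/1 vectors."""
    numerator = sum(min(a, b) for a, b in zip(u, v))
    denominator = sum(max(a, b) for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


crisp_low_medium = overlap(crisp_encode("low"), crisp_encode("medium"))
fuzzy_low_medium = overlap(fuzzy_encode("low"), fuzzy_encode("medium"))
fuzzy_low_high = overlap(fuzzy_encode("low"), fuzzy_encode("high"))
```

With the crisp encoding, "low" and "medium" share no bits and look completely dissimilar. With the graded encoding, "low" is closer to "medium" than to "high", which is the kind of implicit similarity the abstract argues should be preserved.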

DLCSS: A new similarity measure for time series data mining

Engineering Applications of Artificial Intelligence ◽

10.1016/j.engappai.2020.103664 ◽

2020 ◽

Vol 92 ◽

pp. 103664 ◽

Cited By ~ 1

Author(s):

Gholamreza Soleimani ◽

Masoud Abessi

Keyword(s):

Data Mining ◽

Time Series ◽

Similarity Measure ◽

Time Series Data ◽

Series Data ◽

Time Series Data Mining

A New Representation and Similarity Measure of Time Series on Data Mining

2009 International Conference on Computational Intelligence and Software Engineering ◽

10.1109/cise.2009.5364532 ◽

2009 ◽

Cited By ~ 3

Author(s):

Yi Jiang ◽

Tuo Lan ◽

Dongzhan Zhang

Keyword(s):

Data Mining ◽

Time Series ◽

Similarity Measure

New Similarity Measures between Vague Sets and Performance Analysis

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.811.547 ◽

2013 ◽

Vol 811 ◽

pp. 547-551 ◽

Cited By ~ 1

Author(s):

Hong Xu Wang ◽

Hai Feng Wang ◽

Kun Zhang ◽

Hui Wang

Keyword(s):

Data Mining ◽

Performance Analysis ◽

Similarity Measure ◽

Similarity Measures ◽

Practical Case ◽

Vague Sets ◽

Diagnosis Method ◽

And Performance ◽

Definition Of ◽

New Formula

To remedy the defects of existing similarity measure formulas between vague sets, a new definition of the similarity measure between vague sets is proposed, and a new formula with higher resolution that highlights uncertainty is presented, based on the data mining vague value method. A general fault diagnosis method of vague sets (GFDMVS) is proposed. The same practical case is studied with three methods, and the results demonstrate the validity and reasonableness of the method proposed in this paper.
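
The abstract does not state the new formula, so it cannot be reproduced here. As a hedged illustration of the kind of defect such work targets, the sketch below uses a simple score-based similarity of the sort found in the vague-set literature. A vague value is modeled as a pair `(t, f)` of support for and against, with `t + f <= 1`; the function names and example values are assumptions chosen for illustration.

```python
def score(value):
    """Net support of a vague value (t, f): evidence for minus evidence against."""
    t, f = value
    return t - f


def score_similarity(x, y):
    """Score-based similarity in [0, 1]; it looks only at net support."""
    return 1.0 - abs(score(x) - score(y)) / 2.0


fully_unknown = (0.0, 0.0)  # no evidence either way
evenly_split = (0.5, 0.5)   # equal evidence for and against

# Both values have score 0, so this measure cannot tell them apart.
sim = score_similarity(fully_unknown, evenly_split)
```

The two values describe very different states of knowledge, yet the score-based measure rates them as identical. A measure with higher resolution would need to account for the amount of uncertainty, `1 - t - f`, as well as the net support.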

A New Similarity Metric for Sequential Data

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2010100102 ◽

2010 ◽

Vol 6 (4) ◽

pp. 16-32 ◽

Cited By ~ 11

Author(s):

Pradeep Kumar ◽

Bapi S. Raju ◽

P. Radha Krishna

Keyword(s):

Data Mining ◽

Similarity Measure ◽

Web Mining ◽

Clustering Algorithms ◽

Sequential Data ◽

Similarity Metric ◽

Benchmark Datasets ◽

Similarity Preserving ◽

Sequential Nature ◽

Classification And Clustering

In many data mining applications, both classification and clustering algorithms require a distance/similarity measure. The central problem in similarity-based clustering/classification involving sequential data is choosing an appropriate similarity metric. Existing metrics such as Euclidean, Jaccard, and cosine do not explicitly exploit the sequential nature of the data. In this paper, the authors propose a similarity-preserving function called Sequence and Set Similarity Measure (S3M) that captures both the order of occurrence of items in sequences and the constituent items of sequences. The authors demonstrate the usefulness of the proposed measure for classification and clustering tasks. Experiments were conducted on the benchmark datasets DARPA’98 and msnbc, for a classification task in intrusion detection and a clustering task in web mining. Results show the usefulness of the proposed measure.
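
One plausible way to combine order and content, sketched here under stated assumptions, is a weighted sum of an order-sensitive term (longest common subsequence length, normalized by the longer sequence) and an order-insensitive term (Jaccard similarity of the item sets). The abstract does not give the S3M formula, so the weighting parameter `p`, the normalization, and all function names below are assumptions rather than the authors' definition.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def jaccard(a, b):
    """Jaccard similarity of the item sets of two sequences."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def sequence_set_similarity(a, b, p=0.5):
    """Weighted mix of order similarity (weight p) and set similarity (weight 1 - p)."""
    if not a and not b:
        return 1.0
    order_part = lcs_length(a, b) / max(len(a), len(b))
    return p * order_part + (1 - p) * jaccard(a, b)


# Hypothetical page-visit sequences with the same items in opposite order.
visit_a = ["home", "news", "sport"]
visit_b = ["sport", "news", "home"]
sim = sequence_set_similarity(visit_a, visit_b)
```

The two sequences contain exactly the same items, so a pure set measure such as Jaccard rates them as identical. Because their order is reversed, the order-sensitive term drops to 1/3 and the combined score falls to about 0.67, which shows how such a measure can separate sessions that set-based metrics conflate.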

Discovering an Effective Measure in Data Mining

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch026 ◽

2008 ◽

pp. 371-380

Author(s):

Takao Ito

Keyword(s):

Data Mining ◽

Mutual Information ◽

Similarity Measure ◽

Distance Measures ◽

Time Saving ◽

Effective Measure ◽

The One ◽

Phi Coefficient ◽

The Relationship ◽

Large Corpus

One of the most important issues in data mining is discovering implicit relationships between words in a large corpus and labels in a large database. The relationship between words and labels is often expressed as a function of distance measures. An effective measure would not only yield high precision in data mining but also save computation time. In previous research, many measures for calculating the one-to-many relationship have been proposed, such as the complementary similarity measure, the mutual information, and the phi coefficient. Some research showed that the complementary similarity measure is the most effective. In this article, the author reviews previous research on measures for one-to-many relationships and proposes a new idea for obtaining an effective measure, based on a heuristic approach.
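
Two of the measures named above have standard textbook forms that can be computed from a 2x2 co-occurrence table. The sketch below shows the phi coefficient and the pointwise form of mutual information for one word and one label. The cell names (`both`, `word_only`, `label_only`, `neither`) and the counts are hypothetical, and the choice of the pointwise variant is an assumption, since the abstract does not say which form of mutual information was compared.

```python
import math


def phi_coefficient(both, word_only, label_only, neither):
    """Phi coefficient of a 2x2 table of word/label co-occurrence counts."""
    numerator = both * neither - word_only * label_only
    denominator = math.sqrt(
        (both + word_only)
        * (label_only + neither)
        * (both + label_only)
        * (word_only + neither)
    )
    return numerator / denominator if denominator else 0.0


def pointwise_mutual_information(both, word_only, label_only, neither):
    """log2 of P(word, label) / (P(word) * P(label)); requires both > 0."""
    total = both + word_only + label_only + neither
    return math.log2(both * total / ((both + word_only) * (both + label_only)))


# Hypothetical counts: the word and the label co-occur in 30 of 100 records.
counts = (30, 10, 20, 40)
phi = phi_coefficient(*counts)
pmi = pointwise_mutual_information(*counts)
```

Both measures are zero when the word and the label are statistically independent and grow as they co-occur more often than chance would predict. They differ in scale and in how strongly they react to rare events, which is one reason comparisons of such measures can reach different conclusions.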

Discovering an Effective Measure in Data Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch070 ◽

2011 ◽

pp. 364-371

Author(s):

Takao Ito

Keyword(s):

Data Mining ◽

Mutual Information ◽

Similarity Measure ◽

Distance Measures ◽

Time Saving ◽

Effective Measure ◽

The One ◽

Phi Coefficient ◽

The Relationship ◽

Large Corpus

One of the most important issues in data mining is discovering implicit relationships between words in a large corpus and labels in a large database. The relationship between words and labels is often expressed as a function of distance measures. An effective measure would not only yield high precision in data mining but also save computation time. In previous research, many measures for calculating the one-to-many relationship have been proposed, such as the complementary similarity measure, the mutual information, and the phi coefficient. Some research showed that the complementary similarity measure is the most effective. In this article, the author reviews previous research on measures for one-to-many relationships and proposes a new idea for obtaining an effective measure, based on a heuristic approach.
