Correlation and Probability Based Similarity Measure for Detecting Outliers in Categorical Data

In data mining, many techniques use distance-based measures for data clustering. Improving clustering performance is the fundamental goal of cluster-related tasks. Many techniques are available for clustering numerical as well as categorical data. Clustering is an unsupervised learning technique in which objects are grouped into clusters based on the similarity among them. This paper proposes a new cluster similarity measure, the cosine-like cluster similarity measure (CLCSM), and uses it for data classification. Extensive experiments are conducted on UCI machine learning datasets. The experimental results show that the proposed cosine-like cluster similarity measure is superior to many existing cluster similarity measures for data classification.
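The abstract does not give the CLCSM formula, so the following is only a minimal sketch of the general idea of classifying categorical records with a cosine-style similarity, assuming plain cosine similarity as a stand-in for the proposed measure. Each class is summarized by counts of (attribute, value) pairs, and a new record is assigned to the class whose profile is most cosine-similar. The functions `class_profiles`, `cosine`, and `classify` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_profiles(records, labels):
    """Count (attribute index, value) pairs for each class label."""
    profiles = defaultdict(Counter)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            profiles[label][(i, value)] += 1
    return dict(profiles)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(record, profiles):
    """Assign the record to the class whose profile is most cosine-similar."""
    query = Counter((i, value) for i, value in enumerate(record))
    return max(profiles, key=lambda label: cosine(query, profiles[label]))


# Hypothetical toy data: (outlook, windy) -> class label
train = [("sunny", "no"), ("sunny", "yes"), ("rain", "yes"), ("rain", "yes")]
labels = ["play", "play", "stay", "stay"]
profiles = class_profiles(train, labels)
print(classify(("rain", "yes"), profiles))  # prints stay
```

Using raw counts keeps the sketch short; a real system would likely weight or normalize the profiles, which is exactly where a modified measure such as CLCSM would differ.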

PRIVACY PRESERVING CLUSTERING BASED ON LINEAR APPROXIMATION OF FUNCTION

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, Vol 12 (5), 2013, pp. 3443-3451. DOI: 10.24297/ijct.v12i5.2914

Author(s): Rajesh Pasupuleti, Narsimha Gugulothu

Keyword(s): Linear Approximation; Clustering Algorithms; Similarity Measures; Privacy Preserving; Distance Measures; Clustering Methods; Sensitive Data; Processing Information; Data Objects; Approximation Of Function

Clustering analysis has opened a new direction in data mining, with major impact in various domains including machine learning, pattern recognition, image processing, information retrieval, and bioinformatics. Current clustering techniques do not adequately address some of the requirements and have failed to standardize clustering algorithms to support all real applications. Many clustering methods depend mostly on user-specified parameters, and the initial cluster seeds are selected randomly by the user. In this paper, we propose a new clustering method based on linear approximation of function: it first obtains an overall idea of the behavior of the clustering function, then picks the initial cluster seeds as points on the linear approximation line and performs the clustering operations. This differs from traditional clustering methods, which group data objects into clusters using distance measures, similarity measures, and statistical distributions. Experimental results on an example of business data show that clustering based on linear approximation yields good results in practice. The paper also explains privacy-preserving clustering of sensitive data objects.
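One hedged reading of the seeding idea is sketched below, assuming 2-D numeric data: fit an ordinary least-squares line, place k initial seeds at evenly spaced x positions on that line, then run standard k-means (Lloyd) iterations from those seeds. The paper's "behavior knowledge" step and its privacy-preserving part are not reproduced, and every function name here is hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx if sxx else 0.0
    return mean_y - b * mean_x, b


def line_seeds(points, k):
    """Place k seeds at evenly spaced x positions along the fitted line."""
    a, b = fit_line(points)
    xs = [x for x, _ in points]
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / (k - 1) if k > 1 else 0.0
    return [(lo + i * step, a + b * (lo + i * step)) for i in range(k)]


def assign(points, centers):
    """Group each point with its nearest center (squared Euclidean distance)."""
    groups = [[] for _ in centers]
    for px, py in points:
        j = min(range(len(centers)),
                key=lambda c: (px - centers[c][0]) ** 2 + (py - centers[c][1]) ** 2)
        groups[j].append((px, py))
    return groups


def kmeans(points, seeds, max_iter=50):
    """Plain Lloyd iterations starting from the supplied seeds."""
    centers = list(seeds)
    for _ in range(max_iter):
        groups = assign(points, centers)
        new_centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)  # an empty cluster keeps its seed
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centers, groups = kmeans(points, line_seeds(points, k=2))
print(centers)  # prints [(1.5, 1.5), (8.5, 8.5)]
```

Seeding from the fitted line makes the starting centers deterministic, which addresses the "randomly selected seeds" concern in the abstract, although it only suits data whose clusters lie roughly along one direction.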

Exploratory Time Series Data Mining by Genetic Clustering

Mathematical Methods for Knowledge Discovery and Data Mining, 2011, pp. 157-178. DOI: 10.4018/978-1-59904-528-3.ch010

Author(s): T. Warren Liao

Keyword(s): Data Mining; Time Series; Time Series Data; Distance Measures; Series Data; Synthetic Control; Data Set; Univariate Time Series; Genetic Clustering; Data Objects

In this chapter, we present genetic algorithm (GA) based methods for clustering univariate time series of equal or unequal length as an exploratory step of data mining. These methods essentially implement the k-medoids algorithm: each chromosome encodes, in binary, the data objects serving as the k medoids. To compare their performance, both fixed-parameter and adaptive GAs were used. We first employed the synthetic control chart data set to investigate the performance of three fitness functions, two distance measures, and other GA parameters such as population size, crossover rate, and mutation rate. Two more sets of time series, with or without a known number of clusters, were also used in the experiments: the cylinder-bell-funnel data and the novel battle simulation data. The clustering results are presented and discussed.
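The binary chromosome encoding of k-medoids can be illustrated with a small GA sketch. Assumptions not taken from the chapter: Euclidean distance (so series must have equal length), total distance to the nearest medoid as the only fitness function, binary tournament selection, uniform crossover, bit-flip mutation, a repair step that keeps exactly k bits set, and elitism. This is a fixed-parameter GA, not the adaptive variant, and the function names and synthetic data are hypothetical.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def medoids_of(chrom):
    """Decode a binary chromosome: positions set to 1 are the medoids."""
    return [i for i, bit in enumerate(chrom) if bit]


def cost(series, medoids):
    """Sum of distances from each series to its nearest medoid (lower is better)."""
    return sum(min(euclidean(s, series[m]) for m in medoids) for s in series)


def repair(chrom, k, rng):
    """Flip randomly chosen bits until exactly k positions are 1."""
    ones = [i for i, bit in enumerate(chrom) if bit]
    zeros = [i for i, bit in enumerate(chrom) if not bit]
    rng.shuffle(ones)
    rng.shuffle(zeros)
    while len(ones) > k:
        chrom[ones.pop()] = 0
    while len(ones) < k:
        i = zeros.pop()
        chrom[i] = 1
        ones.append(i)
    return chrom


def ga_kmedoids(series, k, pop_size=30, generations=60,
                crossover_rate=0.8, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    n = len(series)

    def fitness(chrom):
        return cost(series, medoids_of(chrom))

    def tournament(pop):
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    pop = [repair([0] * n, k, rng) for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best chromosome survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if rng.random() < crossover_rate:
                child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            else:
                child = p1[:]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(repair(child, k, rng))
        pop = children
        best = min(pop, key=fitness)
    return medoids_of(best), fitness(best)


# Hypothetical data: three well-separated groups of four short series each
data = [[10 * c + 0.1 * j + t for t in range(5)] for c in range(3) for j in range(4)]
medoids, total = ga_kmedoids(data, k=3)
print(medoids, round(total, 3))
```

Handling unequal-length series, as the chapter does, would require swapping `euclidean` for an elastic distance; the GA machinery itself would not change.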

New Similarity Measures between Vague Sets and Performance Analysis

Advanced Materials Research, Vol 811, 2013, pp. 547-551. DOI: 10.4028/www.scientific.net/amr.811.547

Cited By ~ 1

Author(s): Hong Xu Wang, Hai Feng Wang, Kun Zhang, Hui Wang

Keyword(s): Data Mining; Performance Analysis; Similarity Measure; Similarity Measures; Practical Case; Vague Sets; Diagnosis Method; And Performance; Definition Of; New Formula

To remedy the defects of existing formulas for the similarity measure between vague sets, a new definition of the similarity measure between vague sets is proposed, and a new formula with higher resolution and highlighted uncertainty is presented on the basis of the data mining vague value method. A general fault diagnosis method of vague sets (GFDMVS) is also proposed. The same practical case is studied with three methods, and the results demonstrate the validity and reasonableness of the proposed method.
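The abstract does not state the new formula, so the sketch below shows only two simple baseline similarity measures between vague values, written for illustration, together with the kind of counter-intuitive case that motivates new formulas. A vague value is written [t, 1 - f] with t + f <= 1; the function names are hypothetical. The score-based measure rates "no evidence" and "equal conflicting evidence" as identical, while the distance-based one separates them.

```python
def vague_value(t, f):
    """A vague value [t, 1 - f]: t = truth-membership, f = false-membership."""
    if not (0 <= t <= 1 and 0 <= f <= 1 and t + f <= 1):
        raise ValueError("need 0 <= t, f <= 1 and t + f <= 1")
    return (t, f)


def score_similarity(x, y):
    """Compare only the scores S = t - f; result lies in [0, 1]."""
    (tx, fx), (ty, fy) = x, y
    return 1 - abs((tx - fx) - (ty - fy)) / 2


def distance_similarity(x, y):
    """1 minus the mean absolute difference of t and f; result lies in [0, 1]."""
    (tx, fx), (ty, fy) = x, y
    return 1 - (abs(tx - ty) + abs(fx - fy)) / 2


unknown = vague_value(0.0, 0.0)      # no evidence either way: [0, 1]
conflicting = vague_value(0.5, 0.5)  # equal evidence for and against: [0.5, 0.5]
print(score_similarity(unknown, conflicting))     # prints 1.0
print(distance_similarity(unknown, conflicting))  # prints 0.5
```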

Discovering an Effective Measure in Data Mining

Data Warehousing and Mining, 2008, pp. 371-380. DOI: 10.4018/978-1-59904-951-9.ch026

Author(s): Takao Ito

Keyword(s): Data Mining; Mutual Information; Similarity Measure; Distance Measures; Time Saving; Effective Measure; The One; Phi Coefficient; The Relationship; Large Corpus

One of the most important issues in data mining is discovering implicit relationships between words in a large corpus and labels in a large database. The relationship between words and labels is often expressed as a function of distance measures. An effective measure would be useful not only for achieving high precision in data mining but also for saving time in data mining operations. Previous research has proposed many measures for calculating the one-to-many relationship, such as the complementary similarity measure, mutual information, and the phi coefficient, and some studies have shown that the complementary similarity measure is the most effective. In this article, the author reviews previous research on measures for one-to-many relationships and proposes a new, heuristic approach to obtaining an effective measure.
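Two of the measures named above, the phi coefficient and mutual information, can be computed from the 2x2 contingency table of word presence versus label presence across records. This is a minimal sketch assuming binary presence vectors; the complementary similarity measure and the article's heuristic measure are not implemented, and the function names are hypothetical.

```python
import math


def contingency(word, label):
    """Cells of the 2x2 table: a = both, b = word only, c = label only, d = neither."""
    a = sum(1 for w, l in zip(word, label) if w and l)
    b = sum(1 for w, l in zip(word, label) if w and not l)
    c = sum(1 for w, l in zip(word, label) if not w and l)
    d = sum(1 for w, l in zip(word, label) if not w and not l)
    return a, b, c, d


def phi(word, label):
    """Phi coefficient (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d))."""
    a, b, c, d = contingency(word, label)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def mutual_information(word, label):
    """Mutual information in bits between word presence and label presence."""
    a, b, c, d = contingency(word, label)
    n = a + b + c + d
    mi = 0.0
    # (joint count, word-side marginal, label-side marginal) for each cell
    for joint, w_marg, l_marg in ((a, a + b, a + c), (b, a + b, b + d),
                                  (c, c + d, a + c), (d, c + d, b + d)):
        if joint:
            mi += joint / n * math.log2(joint * n / (w_marg * l_marg))
    return mi


word = [1, 1, 0, 0, 1, 0]   # word appears in records 0, 1 and 4
label = [1, 1, 0, 0, 0, 0]  # label attached to records 0 and 1
print(round(phi(word, label), 3), round(mutual_information(word, label), 3))  # prints 0.707 0.459
```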

Discovering an Effective Measure in Data Mining

Encyclopedia of Data Warehousing and Mining, 2011, pp. 364-371. DOI: 10.4018/978-1-59140-557-3.ch070

Author(s): Takao Ito

Keyword(s): Data Mining; Mutual Information; Similarity Measure; Distance Measures; Time Saving; Effective Measure; The One; Phi Coefficient; The Relationship; Large Corpus

One of the most important issues in data mining is discovering implicit relationships between words in a large corpus and labels in a large database. The relationship between words and labels is often expressed as a function of distance measures. An effective measure would be useful not only for achieving high precision in data mining but also for saving time in data mining operations. Previous research has proposed many measures for calculating the one-to-many relationship, such as the complementary similarity measure, mutual information, and the phi coefficient, and some studies have shown that the complementary similarity measure is the most effective. In this article, the author reviews previous research on measures for one-to-many relationships and proposes a new, heuristic approach to obtaining an effective measure.

Using the Similarity Measure Between Intuitionistic Fuzzy Sets for the Application on Pattern Recognitions

Computer Vision, 2018, pp. 972-985. DOI: 10.4018/978-1-5225-5204-8.ch040

Author(s): Lixin Fan

Keyword(s): Pattern Recognition; Fuzzy Sets; Similarity Measure; Medical Diagnosis; Similarity Measures; Intuitionistic Fuzzy Sets; Distance Measures; Membership Degree; Intuitionistic Fuzzy; Definition Of

The measurement of uncertainty is an important topic in theories that deal with uncertainty. The definition of a similarity measure between two intuitionistic fuzzy sets (IFSs) is one of the most interesting topics in IFS theory. A similarity measure is defined to compare the information carried by IFSs. Many similarity measures have been proposed, a few of which are derived from well-known distance measures. In this work, a new similarity measure between IFSs was proposed that considers the information carried by the membership degree, the non-membership degree, and the hesitancy degree. To demonstrate its efficiency, the proposed similarity measure was compared with various existing similarity measures between IFSs using numerical examples. The comparison showed that the new similarity measure is reasonable and discriminates more strongly than the others. Finally, the similarity measure was applied to pattern recognition and medical diagnosis, and two illustrative examples show its effectiveness in these applications.
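The proposed measure itself is not given in the abstract. As an illustrative sketch of a distance-based similarity that uses all three degrees, the code below takes 1 minus a normalized Hamming distance over membership mu, non-membership nu, and hesitancy pi = 1 - mu - nu, then applies it to pattern recognition by choosing the most similar known pattern. The patterns, the sample, and the function names are hypothetical.

```python
def hesitancy(mu, nu):
    """Hesitancy degree pi = 1 - mu - nu of one IFS element."""
    if mu < 0 or nu < 0 or mu + nu > 1 + 1e-9:
        raise ValueError("need mu, nu >= 0 and mu + nu <= 1")
    return 1 - mu - nu


def ifs_similarity(a, b):
    """1 minus the normalized Hamming distance over (mu, nu, pi).

    a and b are equal-length lists of (mu, nu) pairs; the result lies in [0, 1].
    """
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        total += (abs(mu_a - mu_b) + abs(nu_a - nu_b)
                  + abs(hesitancy(mu_a, nu_a) - hesitancy(mu_b, nu_b)))
    return 1 - total / (2 * len(a))


def recognize(sample, patterns):
    """Return the name of the known pattern most similar to the sample."""
    return max(patterns, key=lambda name: ifs_similarity(patterns[name], sample))


# Hypothetical patterns over three features; each element is (mu, nu)
patterns = {
    "P1": [(1.0, 0.0), (0.8, 0.0), (0.7, 0.1)],
    "P2": [(0.8, 0.1), (1.0, 0.0), (0.9, 0.0)],
    "P3": [(0.6, 0.2), (0.8, 0.0), (1.0, 0.0)],
}
sample = [(0.5, 0.3), (0.6, 0.2), (0.8, 0.1)]
print(recognize(sample, patterns))  # prints P3
```

Note that P1 and P2 tie here (both about 0.733), which is the kind of weak discrimination a stronger measure is meant to reduce.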

Distance and Similarity Measures for Spherical Fuzzy Sets and Their Applications in Selecting Mega Projects

Mathematics, Vol 8 (4), 2020, pp. 519. DOI: 10.3390/math8040519

Cited By ~ 7

Author(s): Muhammad Jabir Khan, Poom Kumam, Wejdan Deebani, Wiyada Kumam, Zahir Shah

Keyword(s): Pattern Recognition; Fuzzy Sets; Similarity Measure; Fuzzy Set; Similarity Measures; Developed Countries; Distance Measures; Membership Functions; Pythagorean Fuzzy Set; Picture Fuzzy Set

A new condition on the positive, neutral, and negative membership functions yields a successful extension of picture fuzzy sets and Pythagorean fuzzy sets, called spherical fuzzy sets (SFSs). This condition extends the domain of the positive, neutral, and negative membership functions. Given the importance of similarity measures and their applications in data mining, medical diagnosis, decision making, and pattern recognition, several similarity measures have been proposed in the literature. Some of them, however, do not satisfy the axioms of similarity and give counter-intuitive results. In this paper, we propose set-theoretic similarity and distance measures. We provide counterexamples for similarity measures already proposed in the literature and show why our proposed method is important and how it applies to pattern recognition problems. Finally, we provide an application of the proposed similarity measure to selecting mega projects in underdeveloped countries.
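The abstract does not spell out the condition, but spherical fuzzy sets are usually defined by requiring positive, neutral, and negative grades mu, eta, nu in [0, 1] with mu^2 + eta^2 + nu^2 <= 1 (picture fuzzy sets instead require mu + eta + nu <= 1). Assuming that definition, the sketch below validates elements and computes a hypothetical Jaccard-style set-theoretic similarity (sum of minima over sum of maxima of squared grades); it is not the paper's measure. Alternatives are ranked by similarity to an ideal alternative, and the projects and their grades are invented.

```python
def check_sfs(mu, eta, nu):
    """Validate one spherical fuzzy element: non-negative, mu^2 + eta^2 + nu^2 <= 1."""
    if min(mu, eta, nu) < 0 or mu ** 2 + eta ** 2 + nu ** 2 > 1 + 1e-9:
        raise ValueError("need non-negative grades with mu^2 + eta^2 + nu^2 <= 1")
    return mu, eta, nu


def sfs_similarity(a, b):
    """Jaccard-style similarity: sum of minima over sum of maxima of squared grades."""
    num = den = 0.0
    for x, y in zip(a, b):
        for gx, gy in zip(check_sfs(*x), check_sfs(*y)):
            num += min(gx * gx, gy * gy)
            den += max(gx * gx, gy * gy)
    return num / den if den else 1.0  # two all-zero sets are treated as identical


# Hypothetical decision problem: two criteria, ideal = full positive membership
ideal = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
projects = {
    "X": [(0.9, 0.1, 0.2), (0.8, 0.2, 0.3)],
    "Y": [(0.5, 0.4, 0.6), (0.6, 0.3, 0.5)],
}
best = max(projects, key=lambda name: sfs_similarity(ideal, projects[name]))
print(best)  # prints X
```

The min/max ratio is symmetric, equals 1 for identical sets, and stays in [0, 1], which are the kinds of axioms the abstract says some earlier measures fail.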

Using the Similarity Measure between Intuitionistic Fuzzy Sets for the Application on Pattern Recognitions

International Journal of Cognitive Informatics and Natural Intelligence, Vol 9 (2), 2015, pp. 24-36. DOI: 10.4018/ijcini.2015040102

Cited By ~ 3

Author(s): Lixin Fan

Keyword(s): Pattern Recognition; Fuzzy Sets; Similarity Measure; Medical Diagnosis; Similarity Measures; Intuitionistic Fuzzy Sets; Distance Measures; Membership Degree; Intuitionistic Fuzzy; Definition Of

The measurement of uncertainty is an important topic in theories that deal with uncertainty. The definition of a similarity measure between two intuitionistic fuzzy sets (IFSs) is one of the most interesting topics in IFS theory. A similarity measure is defined to compare the information carried by IFSs. Many similarity measures have been proposed, a few of which are derived from well-known distance measures. In this work, a new similarity measure between IFSs was proposed that considers the information carried by the membership degree, the non-membership degree, and the hesitancy degree. To demonstrate its efficiency, the proposed similarity measure was compared with various existing similarity measures between IFSs using numerical examples. The comparison showed that the new similarity measure is reasonable and discriminates more strongly than the others. Finally, the similarity measure was applied to pattern recognition and medical diagnosis, and two illustrative examples show its effectiveness in these applications.

Modified Cosine Similarity Measure based Data Classification in Data Mining

International Journal of Engineering and Advanced Technology - Regular Issue, Vol 9 (5), 2020, pp. 649-654. DOI: 10.35940/ijeat.e9754.069520

Keyword(s): Machine Learning; Data Mining; Similarity Measure; Dominant Role; Similarity Measures; Data Classification; Cosine Similarity; Machine Learning Techniques; Text Data; Cosine Similarity Measure

Text data analytics has become an integral part of World Wide Web data management and of Internet-based applications, which are growing rapidly all over the world. E-commerce applications are growing exponentially in the business field, and competitors in e-commerce are increasingly adopting machine learning techniques to predict business-related operations, with the aim of increasing product sales as much as possible. The use of similarity measures is unavoidable in modern real-world applications. Cosine similarity plays a dominant role in text data mining applications such as text classification, clustering, querying, and searching. This paper proposes a modified clustering-based cosine similarity measure, called MCS, for data classification. The proposed method is verified experimentally on many UCI machine learning datasets with categorical attributes, and it produces more accurate classification results in the majority of the experiments.
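For the text setting mentioned above, cosine similarity between bag-of-words term-count vectors can be sketched as follows, here used to rank product descriptions against a search query. This is the standard cosine measure, not the paper's MCS; the tokenizer, the product strings, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-count vectors."""
    a, b = term_vector(text_a), term_vector(text_b)
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


products = ["cheap phone case", "leather phone case for sale", "garden hose"]
query = "phone case"
ranked = sorted(products, key=lambda p: cosine_similarity(query, p), reverse=True)
print(ranked)  # prints ['cheap phone case', 'leather phone case for sale', 'garden hose']
```

Because cosine divides by vector length, the shorter description that shares the same two query terms ranks higher; term weighting such as TF-IDF would be a common next refinement.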
