Analysis of Fuzzy Clustering for the Adoption in Data Mining

2014 ◽  
Vol 989-994 ◽  
pp. 2047-2050
Author(s):  
Ying Jie Wang

Data mining is the general methodology for retrieving useful information from big data. Clustering analysis is a mathematical method of classification for unsupervised machine learning, and it can be adopted for data classification in data mining. This paper applies a fuzzy approach to the clustering process and then derives a special clustering algorithm based on the fast fuzzy c-means (FFCM) method. In summary, the paper illustrates the adoption of a series of fuzzy clustering methods in data mining. These methods improve the computational efficiency of learning because they converge quickly. The methodology presented in this paper is meaningful for information retrieval from big data.
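For orientation, the following is a minimal sketch of standard fuzzy c-means, not the FFCM variant of the paper, whose details the abstract does not give. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the random initialisation, and the stopping tolerance are all assumptions made for illustration.

```python
import random

def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric tuples.

    Illustrative sketch only; parameter names and defaults are assumptions.
    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)))
        # Membership update from squared distances to every centre.
        change = 0.0
        for i in range(n):
            d2 = [max(sum((points[i][d] - centers[k][d]) ** 2
                          for d in range(dim)), 1e-12)
                  for k in range(c)]
            inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
            inv_sum = sum(inv)
            new_row = [v / inv_sum for v in inv]
            change = max(change,
                         max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if change < tol:  # memberships have stabilised
            break
    return centers, u
```

Calling `fuzzy_c_means([(0.0,), (0.2,), (10.0,), (10.2,)], c=2)` returns two centres and, for each point, a row of memberships that sums to 1. A hard partition can be read off by taking the largest membership in each row.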

Author(s):  
Ye. V. Bodyanskiy ◽  
A. Yu. Shafronenko ◽  
I. N. Klymova

Context. Big data clustering is currently a highly relevant area of artificial intelligence. The task arises in many applications related to data mining, deep learning, etc. To solve it, traditional approaches and methods require that the entire data sample be submitted in batch form. Objective. The aim of the work is to propose a method of fuzzy probabilistic data clustering using evolutionary cat swarm optimization that is free of the drawbacks of traditional data clustering approaches. Method. The proposed procedure performs fuzzy probabilistic data clustering using evolutionary algorithms for faster determination of sample extrema, cluster centroids and adaptive functions. It does not spend machine resources on storing intermediate calculations and requires no additional time to solve the data clustering problem, regardless of the dimension of the data and the way they are presented for processing. Results. The proposed data clustering algorithm based on evolutionary optimization is simple to implement numerically, is free of the drawbacks inherent in traditional fuzzy clustering methods, and can process large volumes of input information online in real time. Conclusions. The experimental results allow the developed method to be recommended for automatic clustering and classification of big data, since it finds the extrema of the sample as quickly as possible regardless of how the data are submitted for processing. The proposed method of online probabilistic fuzzy data clustering based on evolutionary cat swarm optimization is intended for use in hybrid computational intelligence systems, neuro-fuzzy systems, the training of artificial neural networks, and clustering and classification problems.


Author(s):  
Zhanqiu Yu

To explore applications of Internet of Things (IoT) logistics systems, a K-means-based clustering analysis algorithm for IoT big data is discussed. First, using complex event relations and processing technology, IoT big data processing is transformed into the extraction and analysis of complex relational schemas, which helps reduce the processing complexity of big data in the IoT. The traditional K-means algorithm is then optimized and improved to meet the demands of big-data RFID networks, and K-means cluster analysis is implemented on a Hadoop cloud cluster platform. In addition, building on the traditional clustering algorithm, a center point selection technique suitable for clustering RFID IoT data is chosen. The results show that clustering efficiency is improved to some extent. Finally, a prototype RFID IoT clustering analysis system is designed and implemented, which further tests the feasibility of the approach.
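As a rough illustration of how K-means fits a map/reduce platform such as Hadoop, one iteration can be split into a map step that assigns each point to its nearest centre and a reduce step that averages the points of each cluster. The sketch below runs in plain Python rather than on Hadoop, it shows textbook K-means rather than the paper's optimized variant or its centre selection technique, and all function names are hypothetical.

```python
from collections import defaultdict

def kmeans_map(point, centers):
    """Map step: emit (index of the nearest centre, point)."""
    nearest = min(
        range(len(centers)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))
    return nearest, point

def kmeans_reduce(pairs, centers):
    """Reduce step: average the points grouped under each centre index."""
    groups = defaultdict(list)
    for k, point in pairs:
        groups[k].append(point)
    new_centers = []
    for k, old in enumerate(centers):
        members = groups.get(k)
        if not members:
            # An empty cluster keeps its previous centre.
            new_centers.append(old)
        else:
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)))
    return new_centers

def kmeans_iteration(points, centers):
    """One full K-means iteration expressed as map followed by reduce."""
    return kmeans_reduce([kmeans_map(p, centers) for p in points], centers)
```

Repeating `kmeans_iteration` until the centres stop moving gives the usual algorithm. On a real cluster the map calls would run in parallel over data partitions, and only the per-cluster sums and counts would need to be exchanged.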


2020 ◽  
Vol 34 (04) ◽  
pp. 6869-6876
Author(s):  
Yiqun Zhang ◽  
Yiu-ming Cheung

Clustering ordinal data is a common task in the data mining and machine learning fields. As a major type of categorical data, ordinal data is composed of attributes with naturally ordered possible values (also called categories interchangeably in this paper). However, due to the lack of a dedicated distance metric, ordinal categories are usually treated as nominal ones, or coded as consecutive integers and treated as numerical ones. Both of these common approaches define the distances between ordinal categories only roughly: the former ignores the order relationship, and the latter simply assigns identical distances to different pairs of adjacent categories that may have intrinsically unequal distances. As a result, they may produce unsatisfactory ordinal data clustering results. This paper therefore proposes a novel ordinal data clustering algorithm, which iteratively learns: 1) the partition of the ordinal dataset, and 2) the inter-category distances. To the best of our knowledge, this is the first attempt to dynamically adjust inter-category distances during the clustering process to search for a better partition of ordinal data. The proposed algorithm features superior clustering accuracy, low time complexity, and fast convergence, and is parameter-free. Extensive experiments show its efficacy.
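The two conventional treatments criticised in this abstract can be made concrete with a small sketch. The attribute levels and function names below are hypothetical, and the code shows only the baseline distances, not the learned inter-category distances that the paper proposes.

```python
# Hypothetical ordinal attribute, listed from lowest to highest category.
LEVELS = ["low", "medium", "high", "very high"]

def nominal_distance(a, b):
    """Treat categories as nominal: every mismatch costs 1, order is ignored."""
    return 0 if a == b else 1

def integer_coded_distance(a, b, levels=LEVELS):
    """Code categories as consecutive integers: adjacent pairs always cost 1."""
    return abs(levels.index(a) - levels.index(b))
```

Under the nominal treatment, "low" is as far from "very high" as it is from "medium". Under integer coding, the step from "low" to "medium" is forced to equal the step from "high" to "very high", even if the underlying gaps differ.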


2019 ◽  
Vol 04 (01) ◽  
pp. 1850017
Author(s):  
Weiru Chen ◽  
Jared Oliverio ◽  
Jin Ho Kim ◽  
Jiayue Shen

Big Data is a popular cutting-edge technology nowadays. Techniques and algorithms are expanding in different areas, including engineering, biomedicine, and business. Due to the high volume and complexity of Big Data, it is necessary to apply data pre-processing methods when mining data. The pre-processing methods include data cleaning, data integration, data reduction, and data transformation. Data clustering is the most important step of data reduction. With data clustering, mining on the reduced data set should be more efficient yet still produce quality analytical results. This paper presents the different data clustering methods and related algorithms for data mining with Big Data. Data clustering can increase the efficiency and accuracy of data mining.


2013 ◽  
Vol 760-762 ◽  
pp. 1896-1901
Author(s):  
Chuan Qi Chen

Fuzzy clustering analysis is a clustering technique based on optimizing a cost function using calculus. In fuzzy clustering, a sample no longer belongs to exactly one class; instead, it belongs to every class with a certain degree of membership. In this paper, knowledge is obtained by mining sequential patterns in Web logs, and visitors who share the same browsing pattern are identified through the interaction of users with the Web information space. The paper presents an analysis of Web log data mining based on an improved fuzzy clustering algorithm. The experiment demonstrates that the improved algorithm has better scalability.


Author(s):  
Yuchi Kanzawa

In this study, two co-clustering algorithms based on Bezdek-type fuzzification of fuzzy clustering are proposed for categorical multivariate data. The two proposed algorithms are motivated by the fact that there are only two fuzzy co-clustering methods currently available – entropy regularization and quadratic regularization – whereas there are three fuzzy clustering methods for vectorial data: entropy regularization, quadratic regularization, and Bezdek-type fuzzification. The first proposed algorithm forms the basis of the second algorithm. The first algorithm is a variant of a spherical clustering method, with the kernelization of a maximizing model of Bezdek-type fuzzy clustering with multi-medoids. By interpreting the first algorithm in this way, the second algorithm, a spectral clustering approach, is obtained. Numerical examples demonstrate that the proposed algorithms can produce satisfactory results when suitable parameter values are selected.


2014 ◽  
Vol 23 (4) ◽  
pp. 379-389
Author(s):  
Jun Ye

Clustering plays an important role in data mining, pattern recognition, and machine learning. Single-valued neutrosophic sets (SVNSs) are useful means to describe and handle indeterminate and inconsistent information that fuzzy sets and intuitionistic fuzzy sets cannot describe and deal with. To cluster the data represented by single-valued neutrosophic information, this article proposes single-valued neutrosophic clustering methods based on similarity measures between SVNSs. First, we define a generalized distance measure between SVNSs and propose two distance-based similarity measures of SVNSs. Then, we present a clustering algorithm based on the similarity measures of SVNSs to cluster single-valued neutrosophic data. Finally, an illustrative example is given to demonstrate the application and effectiveness of the developed clustering methods.
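The abstract does not state the distance or similarity formulas, so the following is only a sketch under assumptions. It represents an SVNS as a list of (truth, indeterminacy, falsity) triples with values in [0, 1], uses a weighted Minkowski-type distance normalised to [0, 1], and takes similarity as one minus that distance. The function names, the exponent `p`, and the equal default weights are illustrative choices and may differ from the article's definitions.

```python
def svns_distance(a, b, p=2, weights=None):
    """Assumed normalised Minkowski-type distance between two SVNSs.

    a, b: lists of (truth, indeterminacy, falsity) triples in [0, 1],
    defined over the same elements in the same order.
    """
    n = len(a)
    if weights is None:
        weights = [1.0 / n] * n  # equal element weights that sum to 1
    total = sum(
        w * (abs(ta - tb) ** p + abs(ia - ib) ** p + abs(fa - fb) ** p)
        for w, (ta, ia, fa), (tb, ib, fb) in zip(weights, a, b))
    # Dividing by 3 keeps the result in [0, 1] when the weights sum to 1.
    return (total / 3.0) ** (1.0 / p)

def svns_similarity(a, b, p=2, weights=None):
    """One simple distance-based similarity: 1 minus the distance."""
    return 1.0 - svns_distance(a, b, p, weights)
```

With this choice, identical sets have similarity 1, and a set whose every element is (1, 0, 0) has similarity 0 to one whose every element is (0, 1, 1). A similarity matrix built this way could then feed a threshold-based or hierarchical clustering step.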


Author(s):  
Marwan B. Mohammed ◽  
Wafaa AL-Hameed

Clustering analysis techniques play an important role in data mining. Although several clustering techniques exist, efforts continue to improve the efficiency of the clustering process or to propose new techniques that allocate objects into clusters so that two objects in the same cluster are more similar than two objects in different clusters, taking care not to duplicate the same object in different groups while covering as much of the data as possible. This paper makes two contributions. The first is a new algorithm, named the MB algorithm, that collects unlabeled data and places them into appropriate groups. The second is the creation of a lexical sequence sentence (LCS) based on semantically similar sentences, which differs from the traditional lexical word chain (LCW) based on words. The results show that the MB algorithm generally outperforms both the hierarchical clustering algorithm and the K-means algorithm.


2014 ◽  
Vol 26 (2) ◽  
pp. 705-719
Author(s):  
S. Ramathilaga ◽  
James Jiunn-Yin Leu ◽  
Kuo-Kuang Huang ◽  
Yueh-Min Huang
