Semantic Similarity-Based Clustering of Web Documents Using Fuzzy C-Means

Given the massive growth and volume of the web, it is very difficult to retrieve results that match user preferences. The next-generation web architecture, the Semantic Web, reduces the user's burden by performing search based on semantics instead of keywords. Even with semantic technologies, however, an optimization problem arises that is rarely considered. In this paper, document clustering is applied to retrieve relevant documents. We propose an ontology-based clustering algorithm that uses a semantic similarity measure and Fuzzy C-Means (FCM), applied to annotated documents to optimize the result. The proposed method uses the Jena API and the GATE API, and documents can be retrieved based on their annotation features and relations. A preliminary experiment comparing the proposed method with K-Means, PSO, and the hybrid PSO-K-Means approach shows that the proposed method is feasible and performs better than these clustering methods.
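
The abstract does not name the semantic similarity measure. As one illustration of an ontology-based measure (an assumption, not necessarily the authors' choice), the Wu-Palmer similarity scores two concepts by the depth of their lowest common ancestor relative to their own depths. The toy taxonomy below, including every concept name, is hypothetical.

```python
# Hypothetical toy taxonomy, child -> parent (None marks the root).
PARENT = {
    "document": None,
    "article": "document",
    "report": "document",
    "journal_article": "article",
    "conference_paper": "article",
    "technical_report": "report",
}


def ancestors(concept):
    """Path from the concept up to the root, starting with the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of both a and b."""
    ancestors_b = set(ancestors(b))
    for c in ancestors(a):  # walks upward from a, so the first hit is the deepest
        if c in ancestors_b:
            return c
    raise ValueError("concepts share no common ancestor")


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("journal_article", "conference_paper"))  # siblings under "article"
print(wu_palmer("journal_article", "technical_report"))  # related only via the root
```

Scores lie in (0, 1] and reach 1 only for identical concepts. A document-level similarity would still need a rule for aggregating concept scores over each document's annotations, which this sketch leaves open.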

Semantic Clustering of Web Documents

International Journal of Information Technology and Web Engineering ◽

10.4018/jitwe.2012100102 ◽

2012 ◽

Vol 7 (4) ◽

pp. 20-33 ◽

Cited By ~ 1

Author(s):

J. Avanija ◽

K. Ramar

Keyword(s):

Optimization Problem ◽

Clustering Algorithm ◽

User Preferences ◽

Semantic Similarity Measure ◽

Web Documents ◽

Semantic Clustering ◽

Massive Growth ◽

Web Architecture ◽

Paper Document ◽

Better Than

Given the massive growth and volume of the web, it is very difficult to retrieve results that match user preferences. The next-generation web architecture, the Semantic Web, reduces the user's burden by performing search based on semantics instead of keywords. Even with semantic technologies, however, an optimization problem arises that is rarely considered. In this paper, document clustering is applied to retrieve relevant documents. The authors propose an ontology-based clustering algorithm that uses a semantic similarity measure and Particle Swarm Optimization (PSO), applied to annotated documents to optimize the result. The proposed method uses the Jena API and the GATE API, and documents can be retrieved based on their annotation features and relations. A preliminary experiment comparing the proposed method with K-Means shows that the proposed method is feasible and performs better than K-Means.
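
The abstract does not describe how PSO is applied. A common formulation of PSO clustering, assumed here, lets each particle encode k candidate centroids and minimizes the quantization error (the sum of squared distances from each point to its nearest centroid). The sketch uses plain Euclidean vectors instead of the ontology-based similarity, and the inertia and acceleration values (w, c1, c2) are illustrative defaults, not the authors' settings.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantization_error(centroids, points):
    """Fitness to minimize: squared distance from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def pso_cluster(points, k, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """PSO clustering where each particle is a candidate set of k centroids.

    Returns (best_centroids, best_fitness, initial_best_fitness).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialize every particle with k distinct data points as its centroids.
    particles = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in part] for part in particles]
    pbest_fit = [quantization_error(part, points) for part in particles]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    initial_fit = gbest_fit

    for _ in range(iters):
        for i, part in enumerate(particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia, pull toward this particle's best, pull toward the swarm's best.
                    velocities[i][j][d] = (w * velocities[i][j][d]
                                           + c1 * r1 * (pbest[i][j][d] - part[j][d])
                                           + c2 * r2 * (gbest[j][d] - part[j][d]))
                    part[j][d] += velocities[i][j][d]
            fit = quantization_error(part, points)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in part], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in part], fit
    return gbest, gbest_fit, initial_fit
```

Because personal and swarm bests are replaced only by strictly better candidates, the returned fitness never exceeds the best fitness of the initial swarm.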

A Hybrid Deep Clustering Approach for Robust Cell Type Profiling Using Single-cell RNA-seq Data

10.1101/511626 ◽

2019 ◽

Cited By ~ 2

Author(s):

Suhas Srinivasan ◽

Nathan T. Johnson ◽

Dmitry Korkin

Keyword(s):

Deep Learning ◽

Single Cell ◽

Clustering Algorithm ◽

Hybrid Approach ◽

Feature Learning ◽

Specific Cell ◽

Clustering Methods ◽

Model Based Clustering ◽

Clustering And Classification ◽

Living Organisms

Single-cell RNA sequencing (scRNA-seq) is a recent technology that enables fine-grained discovery of cellular subtypes and specific cell states. It routinely uses machine learning methods, such as feature learning, clustering, and classification, to help uncover novel information from scRNA-seq data. However, current methods are not well suited to handling the substantial noise introduced by the experiments or the variation among cells of the same type. Here, we develop a new hybrid approach, Deep Unsupervised Single-cell Clustering (DUSC), that integrates feature generation based on a deep learning architecture with a model-based clustering algorithm to find a compact and informative representation of single-cell transcriptomic data that yields robust clusters. We also include a technique to estimate an efficient number of latent features in the deep learning model. Our method outperforms both classical and state-of-the-art feature learning and clustering methods, approaching the accuracy of supervised learning. The method is freely available to the community and will hopefully facilitate our understanding of the cellular atlas of living organisms as well as provide the means to improve patient diagnostics and treatment.
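
DUSC's deep feature learner is beyond a short example, but the model-based clustering half of such a hybrid can be illustrated with expectation-maximization (EM) for a Gaussian mixture. The sketch below is a simplification, not DUSC's component: it fits a one-dimensional mixture, as if to a single learned latent feature, with an evenly spaced deterministic initialization and a small variance floor (var_floor, a hypothetical parameter) that keeps components from collapsing.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_em_1d(xs, k=2, iters=100, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    n = len(xs)
    mu = sum(xs) / n
    lo, hi = min(xs), max(xs)
    # Deterministic start: means spread evenly over the data range, shared overall variance.
    means = [lo + (hi - lo) * j / (k - 1) for j in range(k)] if k > 1 else [mu]
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            # If every density underflows to 0, fall back to uniform responsibilities.
            resp.append([pj / total for pj in p] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # component owns no points; keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                               var_floor)
    return weights, means, variances
```

For example, on the values 0, 0.1, 0.2, 10, 10.1, 10.2 with k = 2, the two means settle near 0.1 and 10.1 with equal weights.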

Improving Image Search through MKFCM Clustering Strategy-Based Re-ranking Measure

Journal of Intelligent Systems ◽

10.1515/jisys-2017-0227 ◽

2018 ◽

Vol 29 (1) ◽

pp. 497-514

Author(s):

A.K. Naveena ◽

N.K. Narayanan

Keyword(s):

Clustering Algorithm ◽

Data Retrieval ◽

Initial Step ◽

Discrete Wavelet ◽

Image Search ◽

Clustering Methods ◽

Fuzzy C Means ◽

Image Retrieval System ◽

Fuzzy C Means Clustering ◽

Search Approach

The main aim of this research is to develop a novel ranking measure for content-based image retrieval systems. Owing to the success of text-based retrieval, most commercial search engines still use a text-based approach to image search, relying on the surrounding textual information. Because this text is sometimes noisy or even unavailable, such a retrieval strategy cannot describe the contents of images precisely, which in turn hampers image search performance. To improve image search, we propose a novel algorithm based on multi-kernel fuzzy c-means (MKFCM) clustering. In the first step of our method, images are retrieved using features based on a four-level discrete wavelet transform and the MKFCM clustering algorithm. Next, the retrieved images are analyzed using fuzzy c-means clustering, and the rank of each result is adjusted according to the distance of its cluster from the query. To improve ranking performance, we combine the retrieval result and the ranking result. Finally, we obtain the ranked retrieved images. We also analyze the effects of different clustering methods. The effectiveness of the proposed methodology is evaluated using precision, recall, and F-measure.
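
The abstract states that ranks are adjusted by the distance of a result's cluster from the query and that the retrieval and ranking results are then combined, but it gives no formula. A minimal sketch, assuming a linear blend with a hypothetical weight alpha between each image's own distance to the query and its cluster centroid's distance to the query. Feature vectors (for example, DWT-based) and hard cluster assignments (for example, each image's highest FCM membership) are taken as precomputed inputs.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rerank(query, items, centroids, assignments, alpha=0.5):
    """Re-rank retrieved items by blending item-to-query and cluster-to-query distances.

    items: list of (item_id, feature_vector) from the initial retrieval
    centroids: cluster centroids in the same feature space
    assignments: cluster index for each item
    alpha: hypothetical blend weight; 1.0 ignores clusters, 0.0 ranks by cluster only
    """
    cluster_dist = [euclidean(query, c) for c in centroids]
    scored = []
    for (item_id, vec), cluster in zip(items, assignments):
        score = alpha * euclidean(query, vec) + (1 - alpha) * cluster_dist[cluster]
        scored.append((score, item_id))
    scored.sort()  # smaller blended distance ranks first
    return [item_id for _, item_id in scored]
```

With alpha = 1 the ordering reduces to plain nearest-neighbour ranking; lowering alpha promotes images whose cluster lies close to the query, even when the image itself is slightly farther away.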

Fuzzy C-Means Clustering Algorithm with Multiple Fuzzification Coefficients

Algorithms ◽

10.3390/a13070158 ◽

2020 ◽

Vol 13 (7) ◽

pp. 158

Author(s):

Tran Dinh Khang ◽

Nguyen Duc Vuong ◽

Manh-Kien Tran ◽

Michael Fowler

Keyword(s):

Fuzzy Clustering ◽

Clustering Algorithm ◽

Clustering Methods ◽

Clustering Method ◽

Machine Learning Technique ◽

Practical Applications ◽

Fuzzy C Means ◽

Fuzzy Clustering Method ◽

Learning Technique ◽

Fuzzy C Means Clustering

Clustering is an unsupervised machine learning technique with many practical applications and has attracted extensive research interest. Besides deterministic and probabilistic techniques, fuzzy C-means clustering (FCM) is a common clustering technique. Since the FCM method was introduced, many improvements have been proposed to increase clustering efficiency. These improvements focus on adjusting how element memberships in the clusters are represented, on fuzzification and defuzzification techniques, or on the distance function between elements. This study proposes a novel fuzzy clustering algorithm that uses multiple fuzzification coefficients, chosen according to the characteristics of each data sample. The proposed method follows calculation steps similar to those of FCM, with some modifications, and its formulas are derived to ensure convergence. The main contribution of this approach is the use of multiple fuzzification coefficients instead of the single coefficient of the original FCM algorithm. The new algorithm is evaluated experimentally on several common datasets, and the results show that it is more efficient than the original FCM and other clustering methods.
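
For reference, standard FCM minimizes a within-cluster objective with a single fuzzifier m. One natural way to give each sample x_i its own coefficient m_i, assumed here (the paper's exact objective and its rule for choosing each m_i may differ), is to raise each membership to its own sample's exponent. Because the constraint that memberships sum to 1 applies per sample, the Lagrangian separates by i, and the usual derivation gives the following objective and updates:

```latex
J(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m_i} \, \lVert x_i - v_j \rVert^2,
\qquad \text{s.t. } \sum_{j=1}^{C} u_{ij} = 1, \quad u_{ij} \in [0, 1]

u_{ij} = \left[ \sum_{k=1}^{C} \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{2/(m_i - 1)} \right]^{-1},
\qquad
v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m_i} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m_i}}
```

Setting every m_i = m recovers the standard FCM membership and centroid updates.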

Fuzzy Divisive Hierarchical Clustering of Solvents According to Their Experimentally and Theoretically Predicted Descriptors

Symmetry ◽

10.3390/sym12111763 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1763

Author(s):

Miroslava Nedyalkova ◽

Costel Sarbu ◽

Marek Tobiszewski ◽

Vasil Simeonov

Keyword(s):

Physicochemical Properties ◽

Hierarchical Clustering ◽

Clustering Algorithm ◽

Simple Procedure ◽

Hybrid Approach ◽

Molecular Symmetry ◽

Clustering Methods ◽

Fuzzy Partition ◽

Large Group ◽

Divisive Hierarchical Clustering

The present study describes a simple procedure for separating a large group of 259 solvents, described by 15 specific descriptors (experimentally found and theoretically predicted physicochemical parameters), into similarity patterns. Solvent data are usually characterized by high variability, different molecular symmetry, and spatial orientation. Chemometric methods can be used to extract and accurately explore the information contained in such data. To this end, advanced fuzzy divisive hierarchical-clustering methods were efficiently applied to a large group of solvents using specific descriptors. The fuzzy divisive hierarchical associative-clustering algorithm provides not only a fuzzy partition of the solvents investigated but also a fuzzy partition of the descriptors considered. In this way, it is possible to identify the most specific descriptors (in terms of the highest, lowest, or intermediate values) for each fuzzy partition (group) of solvents. Additionally, the partitioning can be interpreted with respect to molecular symmetry. The chemometric approach used for this goal is the fuzzy c-means method, a semi-supervised clustering procedure. The advantage of such a clustering process is that it separates the solvents into similarity patterns with a certain degree of membership of each solvent in a given pattern, while also allowing for possible membership of the same object (solvent) in another cluster. Partitioning based on a hybrid of theoretical and experimentally obtained molecular descriptors permits a more straightforward separation into similarity groups and an acceptable interpretation. The results show an important link between the similarity groups of objects and the similarity groups of variables.
Ten classes of solvents are interpreted according to their specific descriptors; one of these classes contains a single object and could be interpreted as an outlier. In a broader perspective, the results show that the fuzzy clustering approach is a useful tool for partitioning solvents by variables related to their main physicochemical properties. It also makes it possible to offer a simple guide for recognizing solvents based on theoretically calculated or experimentally found descriptors related to their physicochemical properties.

Textual-Based Clustering of Web Documents

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s021848850400317x ◽

2004 ◽

Vol 12 (06) ◽

pp. 715-743 ◽

Cited By ~ 1

Author(s):

Pawel Brzeminski ◽

Witold Pedrycz

Keyword(s):

Experimental Study ◽

Data Collection ◽

Internal Structure ◽

Clustering Algorithm ◽

Web Pages ◽

Clustering Method ◽

Web Documents ◽

Fuzzy C Means ◽

Input Parameters ◽

Fuzzy C Means Clustering

In this study, we presented an effective method for clustering Web pages. We extracted keywords from flat HTML files, formed feature vectors to represent the Web pages, and applied a clustering method to them, namely the Fuzzy C-Means (FCM) clustering algorithm. We demonstrated an organized, systematic approach to data collection: various categories of Web pages were retrieved from the ODP (Open Directory Project) to create our datasets. The clustering results showed that the method performs well on all datasets. Finally, we presented a comprehensive experimental study examining the behavior of the algorithm for different input parameters, the internal structure of the datasets, and classification experiments.
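
The abstract names FCM but not its update rules. A minimal sketch of standard FCM on numeric feature vectors (such as keyword-frequency vectors for Web pages), assuming Euclidean distance, fuzzifier m = 2, random initial memberships, and a simple stopping tolerance; these defaults are illustrative, not the settings used in the study.

```python
import math
import random


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centroids, u) where u[i][j] is point i's membership in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    exponent = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Centroid update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centroids = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centroids.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                              for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        new_u = []
        for p in points:
            dists = [math.dist(p, v) for v in centroids]
            if min(dists) == 0.0:
                # Point coincides with a centroid: share full membership among such centroids.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
            new_u.append(row)
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centroids, u
```

Each point's memberships sum to 1; a hard clustering, if needed, takes the cluster with the highest membership.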

A modified ant-based text clustering algorithm with semantic similarity measure

Journal of Systems Science and Systems Engineering ◽

10.1007/s11518-006-5029-z ◽

2006 ◽

Vol 15 (4) ◽

pp. 474-492 ◽

Cited By ~ 9

Author(s):

Haoxiang Xia ◽

Shuguang Wang ◽

Taketoshi Yoshida

Keyword(s):

Semantic Similarity ◽

Similarity Measure ◽

Clustering Algorithm ◽

Text Clustering ◽

Semantic Similarity Measure

An Interval Slope Approach to Fuzzy C-Means Clustering Algorithm for Interval Valued Data

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.989-994.1641 ◽

2014 ◽

Vol 989-994 ◽

pp. 1641-1645

Author(s):

Yan Jin ◽

Jiang Hong Ma

Keyword(s):

Clustering Algorithm ◽

Interval Data ◽

Experimental Results ◽

Clustering Methods ◽

Structure Information ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering ◽

Interval Valued

Interval data consist of ranges of continuous values that can describe uncertainty. Traditional clustering methods ignore the structural information of intervals, and some modified methods have been developed to address this. In previous work, we applied the Taylor technique to the fuzzy c-means clustering algorithm with good results. In this paper, we propose a new approach based on the mixed interval slopes technique and interval computing. Experimental results show the effectiveness of our approach.
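
The interval-slope method is not described here in enough detail to reproduce. For contrast, a simpler and widely used baseline (not the authors' approach) represents each interval by its bounds and measures squared Euclidean distance over the lower and upper bounds separately; the standard FCM membership and prototype updates then carry over unchanged. The function names below are illustrative.

```python
def interval_sq_dist(a, b):
    """Squared distance between interval vectors (lists of (lower, upper) pairs):
    squared differences of lower bounds plus squared differences of upper bounds."""
    return sum((al - bl) ** 2 + (au - bu) ** 2 for (al, au), (bl, bu) in zip(a, b))


def interval_memberships(x, prototypes, m=2.0):
    """Standard FCM membership of interval vector x in each prototype."""
    d2 = [interval_sq_dist(x, v) for v in prototypes]
    if min(d2) == 0.0:
        # x coincides with one or more prototypes: split membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    # u_j = 1 / sum_k (d_j^2 / d_k^2)^(1 / (m - 1)), i.e. (d_j / d_k)^(2 / (m - 1)).
    return [1.0 / sum((dj / dk) ** (1.0 / (m - 1.0)) for dk in d2) for dj in d2]


def interval_prototype(intervals, memberships, m=2.0):
    """Cluster prototype: lower and upper bounds averaged separately with weights u_i^m."""
    weights = [u ** m for u in memberships]
    total = sum(weights)
    return [
        (sum(w * x[d][0] for w, x in zip(weights, intervals)) / total,
         sum(w * x[d][1] for w, x in zip(weights, intervals)) / total)
        for d in range(len(intervals[0]))
    ]
```

This baseline treats the two bounds as independent coordinates, which is exactly the kind of simplification that structure-aware interval methods aim to improve on.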

A novel hybrid approach for real world data clustering algorithm based on fuzzy C-means and firefly algorithm

International Journal of Fuzzy Computation and Modelling ◽

10.1504/ijfcm.2015.076274 ◽

2015 ◽

Vol 1 (4) ◽

pp. 431 ◽

Cited By ~ 1

Author(s):

Himansu Sekhar Behera ◽

Janmenjoy Nayak ◽

M. Nanda ◽

K. Nayak

Keyword(s):

Real World ◽

Data Clustering ◽

Firefly Algorithm ◽

Clustering Algorithm ◽

Hybrid Approach ◽

Real World Data ◽

World Data ◽

Fuzzy C Means

Comparison of Clustering K-Means, Fuzzy C-Means, and Linkage for Nasa Active Fire Dataset

International Journal of Artificial Intelligence & Robotics (IJAIR) ◽

10.25139/ijair.v2i2.3030 ◽

2020 ◽

Vol 2 (2) ◽

pp. 34

Author(s):

Muchamad Kurniawan ◽

Rani Rotul Muhima ◽

Siti Agustini

Keyword(s):

Forest Fires ◽

Clustering Algorithm ◽

Hot Spot ◽

Clustering Methods ◽

Clustering Method ◽

Simple Method ◽

Fuzzy C Means ◽

Total Distance ◽

Active Fire ◽

Average Linkage

One of the problems with forest fires is slow response when a fire occurs. This can be anticipated by determining how many extinguishing units to station at the centers of the hot spots. To locate hot spots, NASA provides an active fire dataset. Clustering is used to find the optimal centroid points. The clustering methods used are K-Means, Fuzzy C-Means (FCM), and Average Linkage. K-Means was chosen because it is simple and has been applied in many areas. FCM is a partition-based clustering algorithm that extends the K-Means method. Hierarchical clustering is represented by the Average Linkage method. The evaluation measure is the sum of the within-cluster distances of each cluster, and the elbow method is used to determine the optimal number of clusters. K-Means gave the best result, with a total distance of 145.35 km at its best number of clusters, 4. By comparison, the total distances obtained with FCM and Average Linkage were 154.13 km and 266.61 km, respectively.
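
The evaluation described above can be sketched for K-Means: cluster for each candidate k, sum the distances from points to their nearest centroid, and pick the elbow. Assumptions: planar Euclidean coordinates (real latitude/longitude data would need a geographic distance such as haversine to report kilometres), deterministic farthest-point initialization, and the largest second difference of the cost curve as the elbow rule; the study may locate the elbow differently.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's K-Means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def total_distance(points, centroids):
    """Sum of distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points)


def elbow(points, k_max):
    """Pick the k (2 <= k < k_max) with the largest second difference of the cost curve."""
    costs = [total_distance(points, kmeans(points, k)) for k in range(1, k_max + 1)]
    # costs[k - 1] is the cost for k clusters.
    best_k = max(range(2, k_max), key=lambda k: costs[k - 2] - 2 * costs[k - 1] + costs[k])
    return best_k, costs
```

On three well-separated groups of three points, the cost falls steeply up to k = 3 and only slightly after, so this rule returns 3.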
