TEXTUAL-BASED CLUSTERING OF WEB DOCUMENTS

Author(s):  
PAWEL BRZEMINSKI ◽  
WITOLD PEDRYCZ

In this study we present an effective method for clustering Web pages. From flat HTML files we extract keywords, form feature vectors that represent the Web pages, and pass them to the Fuzzy C-Means (FCM) clustering algorithm. We demonstrate an organized, schematic approach to data collection: Web pages from various categories were retrieved from the ODP (Open Directory Project) to create our datasets. The clustering results show that the method performs well on all datasets. Finally, we present a comprehensive experimental study that examines the behavior of the algorithm for different input parameters, the internal structure of the datasets, and classification experiments.
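The keyword-extraction and feature-vector step described above might look roughly like the following sketch. It is not the authors' pipeline: the stop-word list, the `vocab_size` cut-off, the relative-frequency weighting, and the names `TextExtractor`, `keywords`, and `feature_vectors` are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


# Tiny illustrative stop-word list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "on"}


def keywords(html):
    """Return a Counter of lower-cased keywords found in one HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.parts).lower())
    return Counter(w for w in words if w not in STOPWORDS and len(w) > 2)


def feature_vectors(pages, vocab_size=50):
    """Build one relative-frequency vector per page over a shared vocabulary."""
    counts = [keywords(p) for p in pages]
    total = Counter()
    for c in counts:
        total.update(c)
    # The vocabulary is the most frequent keywords across all pages.
    vocab = [w for w, _ in total.most_common(vocab_size)]
    vectors = []
    for c in counts:
        n = sum(c.values()) or 1  # avoid division by zero on empty pages
        vectors.append([c[w] / n for w in vocab])
    return vocab, vectors
```

The resulting vectors can be passed to any clustering routine that accepts lists of numbers, including an FCM implementation.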

Algorithms ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 158
Author(s):  
Tran Dinh Khang ◽  
Nguyen Duc Vuong ◽  
Manh-Kien Tran ◽  
Michael Fowler

Clustering is an unsupervised machine learning technique with many practical applications that has gathered extensive research interest. Aside from deterministic or probabilistic techniques, fuzzy C-means clustering (FCM) is also a common clustering technique. Since the advent of the FCM method, many improvements have been made to increase clustering efficiency. These improvements focus on adjusting the membership representation of elements in the clusters, or on fuzzifying and defuzzifying techniques, as well as the distance function between elements. This study proposes a novel fuzzy clustering algorithm using multiple different fuzzification coefficients depending on the characteristics of each data sample. The proposed fuzzy clustering method has similar calculation steps to FCM with some modifications. The formulas are derived to ensure convergence. The main contribution of this approach is the utilization of multiple fuzzification coefficients as opposed to only one coefficient in the original FCM algorithm. The new algorithm is then evaluated with experiments on several common datasets and the results show that the proposed algorithm is more efficient compared to the original FCM as well as other clustering methods.
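For reference, the original FCM baseline that this abstract compares against can be sketched as below. This is the standard single-coefficient algorithm with one fuzzification coefficient `m` shared by all samples, not the proposed multi-coefficient variant, whose formulas are not given in the abstract. The function name `fcm`, the random initialisation, and the stopping rule are assumptions chosen for illustration.

```python
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means with one fuzzifier m > 1 (illustrative sketch).

    Returns (centers, u), where u[i][j] is the membership of point i in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )

        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        new_u = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
            if min(d2) == 0.0:
                # Point coincides with a centre: give it full membership there.
                zeros = [k for k, d in enumerate(d2) if d == 0.0]
                row = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                # d2 holds squared distances, so the exponent is -1 / (m - 1).
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                s = sum(inv)
                row = [v / s for v in inv]
            new_u.append(row)

        # Stop when no membership changed by more than tol.
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break

    return centers, u
```

A larger `m` gives softer memberships, while values close to 1 approach hard K-means-style assignments. A multi-coefficient variant would replace the single `m` with a per-sample value.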


2017 ◽  
Vol 25 (0) ◽  
pp. 30-33
Author(s):  
Subhasis Das ◽  
Anindya Ghosh

In this paper a new technique has been proposed for cotton bale management using fuzzy logic. The fuzzy c-means clustering algorithm has been applied for clustering cotton bales into 5 categories from 1200 randomly chosen bales of the J-34 variety. In order to cluster bales of different categories, eight fibre properties, viz., the strength, elongation, upper half mean length, length uniformity, short fibre content, micronaire, reflectance and yellowness of each bale have been considered. Unlike the K-means clustering method, the fuzzy c-means clustering method is able to handle the haziness that may be present in the boundaries between adjacent classes of cotton bales. This method may be used as a convenient tool for the consistent picking of different bale mixes from any number of bales in a warehouse.


2020 ◽  
Vol 15 ◽  
pp. 155892502097832
Author(s):  
Jiaqin Zhang ◽  
Jingan Wang ◽  
Le Xing ◽  
Hui’e Liang

As a precious cultural heritage of the Chinese nation, traditional costumes are in urgent need of scientific research and protection. In particular, studies on costume silhouettes are scarce, because the need to protect cultural relics and the strong subjectivity of manual measurement limit the accuracy of quantitative research. This paper presents an automatic measurement method for traditional Chinese costume dimensions based on fuzzy C-means clustering and silhouette feature point location. The method consists of six steps: (1) costume image acquisition; (2) costume image preprocessing; (3) color space transformation; (4) object clustering segmentation; (5) costume silhouette feature point location; and (6) costume measurement. First, the relative total variation model was used to obtain environmental robustness and adaptability to costume color. Second, the FCM clustering algorithm was used to segment the image and extract the outer silhouette of the costume. Finally, automatic measurement of the costume silhouette was achieved by locating its feature points. The experimental results demonstrated that the proposed method could effectively segment the outer silhouette of a costume image and locate the feature points of the silhouette. The measurement accuracy could meet the requirements of industrial application, making the method valuable for both costume culture research and industrial application.

