Fuzzy Set Based Clustering Algorithm of Web Text

Web text contains uncertain and unstructured content, which makes it difficult to cluster with conventional classification methods. We propose a web text clustering algorithm based on fuzzy sets to improve clustering accuracy on web text. After extracting the keywords of a text, we treat them as attributes and design a fuzzy algorithm to determine the membership of the words. The algorithm reduces time and space complexity and is more robust than the conventional algorithm. To test its accuracy and efficiency, we run a comparative experiment between pattern clustering and our algorithm. The experiment shows that our method gives better results.
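The abstract does not say which membership function the authors use. As a minimal sketch of what a fuzzy membership over keyword attributes can look like, the block below uses the standard fuzzy c-means membership formula as a stand-in. The keyword-weight vectors, the cluster centers, and the fuzzifier `m` are hypothetical values chosen for illustration, not taken from the paper.

```python
import math

def distance(a, b):
    """Euclidean distance between two keyword-weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster.

    This is a stand-in, not the paper's method. m > 1 is the fuzzifier.
    """
    dists = [distance(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        zero = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(zero)
        return [z / total for z in zero]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical data: two cluster centers and one document, each described
# by the weights of two keywords.
centers = [[1.0, 0.0], [0.0, 1.0]]
doc = [0.8, 0.2]
memberships = fcm_memberships(doc, centers)
```

The memberships sum to 1, so a document close to the first center gets a high degree of membership there and a small but non-zero degree in the other cluster. That graded assignment is what distinguishes a fuzzy assignment from a hard one.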


Public Opinion Hotspot Discovery Algorithm Based on Fuzzy Clustering LDA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.433-435.626 ◽

2013 ◽

Vol 433-435 ◽

pp. 626-629

Author(s):

Hong Xin Wan ◽

Yun Peng

Keyword(s):

Public Opinion ◽

Key Words ◽

Fuzzy Clustering ◽

Fuzzy Set ◽

High Probability ◽

Clustering Algorithm ◽

Precision Data ◽

Topic Extraction ◽

Noise Data ◽

Topic Clustering

The discovery of public opinion hotspots is an important aspect of public opinion research. Because hot topics share many similarities and are often related, we propose a hot topic clustering algorithm to find hotspots in public opinion. Since fuzzy sets handle imprecise data well, the fuzzy algorithm can reduce the influence of uncertainty in public opinion data. Building on LDA topic extraction, we cluster the topical words with a fuzzy method and take the topic probability as each word's membership in the cluster. This reduces noisy data and improves hotspot discovery by aggregating similar and related topics into one class. The topical keywords with high probability in a cluster are the hotspots, and a singular cluster with few words can be regarded as an outlier. The algorithm is demonstrated in detail through an example analysis.
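The membership idea in this abstract can be illustrated with a minimal sketch. It assumes the topic-word probabilities have already been produced by some LDA implementation. The toy matrix, the word list, and the thresholds `min_prob` and `min_size` are hypothetical values chosen for illustration and do not come from the paper.

```python
# Toy topic-word probabilities standing in for the output of an LDA model.
# Rows are topics (clusters), columns are words; all values are hypothetical.
words = ["flood", "rescue", "election", "vote", "rain", "concert"]
topic_word = [
    [0.40, 0.30, 0.02, 0.02, 0.25, 0.01],  # topic 0
    [0.03, 0.02, 0.45, 0.44, 0.03, 0.03],  # topic 1
    [0.05, 0.05, 0.05, 0.05, 0.05, 0.75],  # topic 2
]

def fuzzy_clusters(topic_word, words, min_prob=0.2):
    """Use each topic probability as the word's fuzzy membership in that
    cluster, keeping only words whose membership reaches min_prob."""
    clusters = []
    for row in topic_word:
        members = {w: p for w, p in zip(words, row) if p >= min_prob}
        clusters.append(members)
    return clusters

def split_hotspots(clusters, min_size=2):
    """Treat clusters with at least min_size words as hotspots and
    smaller (singular) clusters as outliers."""
    hotspots = [c for c in clusters if len(c) >= min_size]
    outliers = [c for c in clusters if len(c) < min_size]
    return hotspots, outliers

clusters = fuzzy_clusters(topic_word, words)
hotspots, outliers = split_hotspots(clusters)
```

With these toy values, the flood and election clusters count as hotspots, while the single-word concert cluster is set aside as an outlier.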


Fuzzy Set Based Web Opinion Text Clustering Algorithm

Proceedings of the 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering 2015 ◽

10.2991/icmmcce-15.2015.501 ◽

2015 ◽

Author(s):

Hongxin Wan ◽

Yun Peng

Keyword(s):

Fuzzy Set ◽

Clustering Algorithm ◽

Text Clustering


K-means text clustering algorithm based on density and nearest neighbor

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.01933 ◽

2010 ◽

Vol 30 (7) ◽

pp. 1933-1935 ◽

Cited By ~ 6

Author(s):

Wen-ming ZHANG ◽

Jiang WU ◽

Xiao-jiao YUAN

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbor ◽

Text Clustering


A Novel Unsupervised Classification Method for Sandy Land Using Fully Polarimetric SAR Data

Remote Sensing ◽

10.3390/rs13030355 ◽

2021 ◽

Vol 13 (3) ◽

pp. 355

Author(s):

Weixian Tan ◽

Borong Sun ◽

Chenyu Xiao ◽

Pingping Huang ◽

Wei Xu ◽

...

Keyword(s):

Spectral Clustering ◽

Large Scale ◽

Clustering Algorithm ◽

Feature Vector ◽

Unsupervised Classification ◽

Classification Method ◽

Sandy Land ◽

Classification Methods ◽

The Many ◽

Representative Points

Classification based on polarimetric synthetic aperture radar (PolSAR) images is an emerging technology, and recent years have seen the introduction of various classification methods that have proven effective at identifying typical features of many terrain types. Among the many regions studied, the Hunshandake Sandy Land in Inner Mongolia, China stands out for its vast area of sandy land, variety of ground objects, and intricate structure, with more irregular characteristics than conventional land cover. Accounting for the particular surface features of the Hunshandake Sandy Land, this study proposes an unsupervised classification method based on new decomposition and large-scale spectral clustering with superpixels (ND-LSC). Firstly, the polarization scattering parameters are extracted through a new decomposition, rather than other decomposition approaches, which gives a more accurate feature vector estimate. Secondly, large-scale spectral clustering is applied to cope with the vast area and complex terrain. More specifically, this involves an initial sub-step of superpixel generation via the Adaptive Simple Linear Iterative Clustering (ASLIC) algorithm, with the feature vector combined with spatial coordinate information as input, then a sub-step of representative point selection and bipartite graph formation, followed by the spectral clustering algorithm to complete the classification task. Finally, testing and analysis are conducted on the RADARSAT-2 fully PolSAR dataset acquired over the Hunshandake Sandy Land in 2016. Both qualitative and quantitative experiments, compared against several classification methods, show that the proposed method can significantly improve classification performance.


Word2Cluster: A New Multi-Label Text Clustering Algorithm with an Adaptive Clusters Number

2019 IEEE Global Communications Conference (GLOBECOM) ◽

10.1109/globecom38437.2019.9013266 ◽

2019 ◽

Author(s):

Kaili Mao ◽

Jianwei Niu ◽

Xuefeng Liu ◽

Shui Yu ◽

Longbo Zhao

Keyword(s):

Clustering Algorithm ◽

Text Clustering


A novel text clustering algorithm treated attributes differently

Control Engineering and Information Systems ◽

10.1201/b17732-147 ◽

2015 ◽

pp. 731-734

Keyword(s):

Clustering Algorithm ◽

Text Clustering


A Cluster-Based Browsing Model For QoS-Aware Web Service Selection

10.32920/ryerson.14655468.v1 ◽

2021 ◽

Author(s):

Kian Farsandaj

Keyword(s):

Web Services ◽

Web Service ◽

Clustering Algorithm ◽

Service Selection ◽

Functional Requirements ◽

Analysis Techniques ◽

Symbolic Data ◽

Web Service Selection ◽

The Web

In the last decade, selecting suitable web services based on users’ requirements has become one of the major subjects in the web service domain. Many research works have addressed it, based either on functional requirements or on Quality of Service (QoS). We believe that searching is not the only way to implement the selection. Selection could also be done by browsing, or by a combination of searching and browsing. In this thesis, we propose a browsing method based on the Scatter/Gather model, which helps users gain a better understanding of the QoS value distribution of the web services and locate their desired services. Because the Scatter/Gather model uses cluster analysis techniques and web service QoS data is best represented as a vector of intervals, or more generically a vector of symbolic data, we apply a symbolic clustering algorithm and implement different variations of the Scatter/Gather model. Through our experiments on both synthetic and real datasets, we identify the most efficient (based on processing time) and effective implementations.


A Roadmap to Integrate Document Clustering in Information Retrieval

Information Retrieval Methods for Multidisciplinary Applications ◽

10.4018/978-1-4666-3898-3.ch003 ◽

2013 ◽

pp. 31-45

Author(s):

R. Subhashini ◽

V. Jawahar Senthil Kumar

Keyword(s):

Information Retrieval ◽

Search Engines ◽

World Wide ◽

Clustering Algorithm ◽

Web Search ◽

Full Potential ◽

Digital Information ◽

Search Results ◽

The World ◽

The Web

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters helps the user browse search results quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach for clustering web search results based on a phrase-based clustering algorithm. Instead of the single ordered result list that search engines return, this approach presents a list of clusters to the user. Experimental results verify the method’s feasibility and effectiveness.
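The abstract does not describe the phrase-based algorithm itself. As a simplified stand-in, the sketch below groups search snippets under the two-word phrases they share, so that each cluster carries a readable name. The snippets, the choice of bigrams as phrases, and the `min_docs` threshold are all assumptions made for illustration.

```python
from collections import defaultdict

def bigrams(text):
    """Return the set of two-word phrases in a snippet (lowercased)."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)}

def phrase_clusters(snippets, min_docs=2):
    """Group snippet indices under each phrase shared by at least
    min_docs snippets; the phrase doubles as the cluster label."""
    index = defaultdict(set)
    for i, snippet in enumerate(snippets):
        for phrase in bigrams(snippet):
            index[phrase].add(i)
    return {ph: sorted(ids) for ph, ids in index.items() if len(ids) >= min_docs}

# Hypothetical search-result snippets for an ambiguous query.
snippets = [
    "jaguar car dealer in town",
    "used jaguar car prices",
    "jaguar cat habitat facts",
    "big cat habitat loss",
]
clusters = phrase_clusters(snippets)
```

Here the shared phrases "jaguar car" and "cat habitat" become the cluster names, which is the readability property the abstract says traditional clustering lacks.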


Authorship Detection and Encoding for eBay Images

International Journal of Multimedia Data Engineering and Management ◽

10.4018/jmdem.2011010102 ◽

2011 ◽

Vol 2 (1) ◽

pp. 22-37 ◽

Cited By ~ 2

Author(s):

Liping Zhou ◽

Wei-Bang Chen ◽

Chengcui Zhang

Keyword(s):

Clustering Algorithm ◽

Classification Methods ◽

Test Image ◽

Common Edge ◽

Probability Approach ◽

Color Models ◽

The Common ◽

Edge Based

This paper describes a framework to detect authorship of eBay images. It contains three modules: editing style summarization, classification, and multi-account linking detection. For editing style summarization, three approaches, namely the edge-based approach, the color-based approach, and the color probability approach, are proposed to encode the common patterns inside a group of images with similar editing styles into common edge or color models. Prior to the summarization step, an edge-based clustering algorithm is developed. Corresponding to the three summarization approaches, three classification methods are developed to predict the authorship of an unlabeled test image. For multi-account linking detection, which aims to detect the hidden owner behind multiple eBay seller accounts, two methods are presented to measure the similarity between seller accounts based on similar models.


SeqPAM

Successes and New Directions in Data Mining ◽

10.4018/978-1-59904-645-7.ch002 ◽

2008 ◽

pp. 17-38 ◽

Cited By ~ 1

Author(s):

Pradeep Kumar Kumar ◽

Raju S. Bapi ◽

P. Radha Krishna

Keyword(s):

Similarity Measure ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Similarity Measures ◽

Sequential Data ◽

Cluster Validation ◽

Web Personalization ◽

Similarity Preserving ◽

Validation Technique ◽

The Web

With the growth in the number of web users and the need to make information available on the web, the problem of web personalization has become critical and popular. Developers try to customize a web site to the needs of specific users with the help of knowledge acquired from user navigational behavior. Since user page visits are intrinsically sequential, efficient clustering algorithms for sequential data are needed. In this paper, we introduce a similarity-preserving function called the sequence and set similarity measure (S3M), which captures both the order of occurrence of page visits and the content of pages. We conducted pilot experiments comparing the results of PAM, a standard clustering algorithm, with two similarity measures: cosine and S3M. The goodness of the clusters resulting from both measures was computed using a cluster validation technique based on average Levenshtein distance. Results on the pilot dataset established the effectiveness of S3M for sequential data. Based on these results, we proposed a new clustering algorithm, SeqPAM, for clustering sequential data. We tested the new algorithm on two datasets, namely the cti and msnbc datasets. We provided recommendations for web personalization based on the clusters obtained from SeqPAM for the msnbc dataset.
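The abstract says S3M captures both the order of page visits and their content, but it does not give the formula. One plausible way to combine the two, sketched here as an assumption, is a weighted mix of a longest-common-subsequence ratio (order) and a Jaccard ratio (content). The function name `s3m_like`, the weight `p`, and the sample sessions are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def jaccard(a, b):
    """Set similarity: shared pages over all pages visited."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def s3m_like(a, b, p=0.5):
    """Weighted mix of order similarity (LCS ratio) and content
    similarity (Jaccard). An assumed form, not the paper's definition."""
    longest = max(len(a), len(b))
    seq_sim = lcs_length(a, b) / longest if longest else 1.0
    return p * seq_sim + (1.0 - p) * jaccard(a, b)

# Hypothetical browsing sessions: same pages, different visiting order.
session_a = ["home", "news", "sports"]
session_b = ["news", "home", "sports"]
score = s3m_like(session_a, session_b)
```

The two sessions visit identical pages, so the content term is 1, while the reordering lowers the sequence term. A purely set-based measure would not register that difference.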
