Clustering web search results using Wikipedia resource

2020 ◽  
Vol 0 (10/2019) ◽  
pp. 25-29
Author(s):  
Chung Tran ◽  
Andrzej Ameljańczyk

The paper presents a new method for clustering search results. The method uses an external knowledge resource such as Wikipedia, the largest encyclopedia, which is a free and popular resource used here to extract topics from short texts. Similarities between documents are calculated from the similarities between these topics, and the affinity propagation clustering algorithm is then employed to cluster the web search results. The proposed method is tested on the AMBIENT dataset and evaluated within the experimental framework provided by a SemEval-2013 task. The paper also suggests a new method for comparing the global performance of algorithms using multi-criteria analysis.
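The abstract says document similarities are derived from the similarities between the Wikipedia topics extracted from each snippet, but it does not give the aggregation rule. The sketch below assumes that each document has already been mapped to a set of topic identifiers and that some topic-to-topic similarity function exists. Here that function is a hypothetical toy lookup table. Document similarity is taken as a symmetric best-match average, which is an illustrative choice rather than the paper's formula. The resulting matrix is the kind of input affinity propagation consumes.

```python
from itertools import combinations


def best_match_average(topics_a, topics_b, topic_sim):
    """Average, over topics in A, of the best similarity to any topic in B."""
    if not topics_a or not topics_b:
        return 0.0
    return sum(max(topic_sim(a, b) for b in topics_b) for a in topics_a) / len(topics_a)


def document_similarity(topics_a, topics_b, topic_sim):
    """Symmetric document similarity: mean of the two directed best-match averages."""
    return 0.5 * (best_match_average(topics_a, topics_b, topic_sim)
                  + best_match_average(topics_b, topics_a, topic_sim))


def similarity_matrix(doc_topics, topic_sim):
    """Pairwise similarity matrix for a list of topic sets (diagonal set to 1.0)."""
    n = len(doc_topics)
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        s = document_similarity(doc_topics[i], doc_topics[j], topic_sim)
        sim[i][j] = sim[j][i] = s
    return sim


# Hypothetical topic-topic similarities (in practice these might come from
# Wikipedia link or category overlap; the values here are made up).
TOY_SIM = {
    frozenset({"Jaguar_(car)", "Land_Rover"}): 0.8,
    frozenset({"Jaguar_(animal)", "Leopard"}): 0.9,
}


def toy_topic_sim(a, b):
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


# Hypothetical topic sets for three search-result snippets.
docs = [
    {"Jaguar_(car)"},
    {"Land_Rover", "Jaguar_(car)"},
    {"Jaguar_(animal)", "Leopard"},
]
S = similarity_matrix(docs, toy_topic_sim)
```

With these toy values, the two car-related snippets get a high similarity (0.95) and neither is similar to the animal snippet. That separation is what a clustering step such as affinity propagation would then exploit.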

2014 ◽  
Vol 590 ◽  
pp. 688-692
Author(s):  
Bei Chen ◽  
Kun Song

Overlapping information usually exists in high-dimensional data, and applying affinity propagation clustering directly to such data may produce more misclassified points. To address this problem, a new method combining principal component analysis (PCA) and affinity propagation clustering is proposed. The dimensionality of the original data is first reduced while preserving most of the information in the variables, and affinity propagation clustering is then performed in the low-dimensional space. Because redundant information is removed, the classification is more accurate. Experimental results show that the method is effective.
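The abstract does not say how "most information" is quantified. A common choice, assumed here, is to keep the smallest number of principal components whose cumulative explained variance reaches a threshold. The 95% value is an arbitrary illustration. The sketch below computes PCA via NumPy's SVD, and the reduced coordinates it returns are what would then be passed to affinity propagation.

```python
import numpy as np


def pca_reduce(X, variance_to_keep=0.95):
    """Project the rows of X onto the fewest principal components whose
    cumulative explained variance reaches `variance_to_keep`.
    Returns (reduced data, explained-variance fractions of kept components)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable
    # SVD of the centred data: rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # variance fraction per component
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(s))                           # guard against rounding at 1.0
    return Xc @ vt[:k].T, explained[:k]


# Synthetic 3-D data lying almost on a line: one component should suffice.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))
Z, ev = pca_reduce(X, 0.95)
```

Here `Z` has shape `(200, 1)`. Because the three variables are nearly redundant, a single component carries almost all of the variance.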


Author(s):  
R. Subhashini ◽  
V.Jawahar Senthil Kumar

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters helps users browse the results quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach to clustering web search results based on a phrase-based clustering algorithm, as an alternative to the single ordered result list returned by search engines. The approach presents the user with a list of clusters. Experimental results verify the method’s feasibility and effectiveness.
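The abstract does not detail its phrase-based algorithm. The sketch below is a simplified illustration of the general idea, similar in spirit to the base-cluster step of Suffix Tree Clustering and not necessarily the authors' method. It groups result snippets that share a word n-gram and uses the shared phrase itself as a readable cluster label. The `max_n` and `min_docs` parameters are illustrative assumptions.

```python
from collections import defaultdict


def phrase_clusters(snippets, max_n=3, min_docs=2):
    """Group snippets that share a word n-gram (2 <= n <= max_n).
    The shared phrase doubles as a human-readable cluster label."""
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = text.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[" ".join(words[i:i + n])].add(doc_id)
    clusters = {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}
    # Visit longer phrases first; drop a phrase when a longer kept phrase
    # contains it (word-wise) and covers exactly the same documents,
    # so each cluster keeps its most specific label.
    kept = {}
    for p in sorted(clusters, key=lambda ph: -len(ph.split())):
        if not any(f" {p} " in f" {q} " and kept[q] == clusters[p] for q in kept):
            kept[p] = clusters[p]
    return kept


snippets = [
    "apple pie recipe with cinnamon",
    "easy apple pie recipe",
    "apple iphone price list",
    "new apple iphone release",
]
clusters = phrase_clusters(snippets)
```

On these four snippets the result is `{"apple pie recipe": {0, 1}, "apple iphone": {2, 3}}`. The labels are phrases taken directly from the results, which is the readability advantage the abstract attributes to phrase-based clustering.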


2012 ◽  
Vol 586 ◽  
pp. 241-246
Author(s):  
Li Min Li ◽  
Zhong Sheng Wang

To make classification results more accurate when diagnosing sudden mechanical failures, this article describes an affinity propagation clustering algorithm for feature selection in sudden machinery failure diagnosis. Conventional feature selection methods reduce the dimensionality of the features but, in doing so, change the data in the feature space, which can lead to incorrect diagnoses. The affinity propagation method, by contrast, measures the similarity between features, removes the redundancy among them, and selects an exemplar subset of features without changing the data in the feature space. In clustering tests, the outputs of PCA and of affinity propagation clustering were each used as input to the same SVM classifier, and the latter yielded a lower error than the former.
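Here affinity propagation is applied to features rather than samples. Similar features are grouped, and one exemplar per group is kept unchanged. The sketch below implements the standard responsibility/availability message passing of Frey and Dueck in NumPy. It assumes absolute Pearson correlation as the feature-to-feature similarity and the median similarity as the shared preference. Both are illustrative choices, not taken from the paper, and `select_features` is a hypothetical helper name.

```python
import numpy as np


def affinity_propagation(S, damping=0.7, max_iter=500, convergence_iter=30):
    """Affinity propagation on a similarity matrix S whose diagonal holds the
    preferences. Returns (exemplar indices, cluster label per item)."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities  a(i, k)
    rows = np.arange(n)
    last, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = np.max(AS, axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1 - damping) * A_new
        # Stop once a non-empty exemplar set has stayed unchanged for a while.
        exemplars = np.flatnonzero(np.diag(A) + np.diag(R) > 0)
        key = tuple(exemplars)
        stable = stable + 1 if key == last else 0
        last = key
        if stable >= convergence_iter and exemplars.size > 0:
            break
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


def select_features(X):
    """Cluster the columns of X by |Pearson correlation| and keep one
    exemplar feature per cluster (hypothetical similarity choice)."""
    S = np.abs(np.corrcoef(X, rowvar=False))
    off_diag = S[~np.eye(S.shape[0], dtype=bool)]
    np.fill_diagonal(S, np.median(off_diag))  # shared preference
    return affinity_propagation(S)


# Six synthetic features: two groups of three near-duplicates.
rng = np.random.default_rng(0)
t1, t2 = rng.normal(size=(2, 300))
X = np.column_stack([t1 + 0.05 * rng.normal(size=300) for _ in range(3)]
                    + [t2 + 0.05 * rng.normal(size=300) for _ in range(3)])
selected, groups = select_features(X)
```

The selected columns are original features rather than projections. This is the property the abstract contrasts with PCA, which replaces the features with linear combinations.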


2020 ◽  
Author(s):  
Sayed Moustafa ◽  
Farhan Khan ◽  
Mohamed Metwaly ◽  
Eslam A.Elawadi ◽  
Nassir Al-Arifi

Investigations to evaluate site-effect characteristics and to develop a reliable site classification scheme have become of paramount importance for urban planning and for reliable site-specific seismic hazard assessment. This paper presents a new approach to site classification based on affinity propagation (AP) and a selected set of representative horizontal-to-vertical spectral ratio (HVSR) curves within the King Saud University (KSU) campus. Ambient vibrations were measured at about 307 stations covering the entire campus, with a 20-minute recording length and a sampling rate of 128 Hz at each station, to satisfy the criteria for reliable and unambiguous HVSR results. Predominant period values were used to identify the site response and for the subsequent site classification. Empirical equations from the literature relating the HVSR peak frequency to the average shear-wave velocity in the upper 30 m, commonly used as a proxy for site classification, were found to be unreliable, which made site classification difficult. To overcome this problem, the affinity propagation clustering algorithm was used. The results show that microtremor spectral ratios can be a remarkably robust tool for determining site effects. The survey results led to a preliminary seismic site classification map of the study area, which should be useful for the future safe design of structures. Finally, the results presented in this study encourage extending this type of study to other parts of Saudi Arabia using microtremor data and site response functions.
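The classification relies on HVSR curves and their predominant period, but the abstract gives no processing details. The sketch below shows the basic computation under several assumptions. The input is a three-component record (north-south, east-west, vertical) sampled at 128 Hz, as in the survey, and the two horizontal spectra are combined by their quadratic mean. A plain moving-average smoothing stands in for the Konno-Ohmachi window often used in practice. The band limits and smoothing width are illustrative. Real processing also splits the record into windows and averages their curves, which is omitted here.

```python
import numpy as np


def hvsr(ns, ew, z, fs, smooth_bins=11, band=(0.5, 20.0)):
    """H/V spectral ratio of one three-component record.
    H is the quadratic mean of the two smoothed horizontal amplitude spectra.
    Returns (frequencies, H/V curve, peak frequency f0) within `band`."""
    def amp(x):
        x = np.asarray(x, dtype=float)
        spec = np.abs(np.fft.rfft(x - x.mean()))
        # Simple moving-average smoothing (stand-in for Konno-Ohmachi).
        kernel = np.ones(smooth_bins) / smooth_bins
        return np.convolve(spec, kernel, mode="same")

    freqs = np.fft.rfftfreq(len(z), d=1.0 / fs)
    h = np.sqrt((amp(ns) ** 2 + amp(ew) ** 2) / 2.0)
    v = amp(z)
    keep = (freqs >= band[0]) & (freqs <= band[1])
    ratio = h[keep] / np.maximum(v[keep], 1e-12)  # avoid division by zero
    f0 = freqs[keep][np.argmax(ratio)]
    return freqs[keep], ratio, f0


# Synthetic 20 s record at 128 Hz with a horizontal resonance near 2 Hz.
fs = 128.0
t = np.arange(0, 20.0, 1.0 / fs)
rng = np.random.default_rng(1)
ns = np.sin(2 * np.pi * 2.0 * t) + 0.1 * rng.normal(size=t.size)
ew = 0.8 * np.sin(2 * np.pi * 2.0 * t) + 0.1 * rng.normal(size=t.size)
z = 0.1 * rng.normal(size=t.size)
f, curve, f0 = hvsr(ns, ew, z, fs)
T0 = 1.0 / f0  # predominant period in seconds
```

The predominant period used for classification is the reciprocal of the H/V peak frequency, `T0 = 1 / f0`. Per-station values like these, or whole HVSR curves, would be the items grouped by affinity propagation.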


2011 ◽  
Vol 1 (1) ◽  
pp. 31-44 ◽  
Author(s):  
R. Subhashini ◽  
V.Jawahar Senthil Kumar

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today’s most advanced engines use the keyword-based (“bag of words”) paradigm, which has inherent disadvantages. Organizing web search results into clusters helps users browse the results quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach to clustering web search results based on a phrase-based clustering algorithm, as an alternative to the single ordered result list returned by search engines. The approach presents the user with a list of clusters. Experimental results verify the method’s feasibility and effectiveness.

