web clustering
Recently Published Documents


TOTAL DOCUMENTS: 33 (FIVE YEARS: 0)
H-INDEX: 7 (FIVE YEARS: 0)

Author(s): Hossein Shirgahi, Mehran Mohsenzadeh, Hamid Haj Seyyed Javadi

Author(s): Kanaka Durga, V Rama Krishna

Many websites contain content that is similar to content found on other sites indexed by search engines. Checking this similar content adds overhead to a search engine and can severely affect its performance and quality. The web crawling research community has therefore implemented several techniques to detect similar or identical content and web documents, because doing so is a major factor in a search engine's ability to give users useful results on the first page. To address these issues, this paper proposes a methodology called Automatic Detection of illegitimate websites with Mutual Clustering (ADIWMC), a distinctive and efficient approach to detecting similarities between web pages in web clustering. Same and similar web pages and web content are detected by storing the crawled web pages in a repository. First, adwords are extracted from the crawled pages; similarity is then checked between pairs of pages based on their usage of adwords. A threshold value is set: if the similarity percentage exceeds the threshold, the similar content is reduced, which improves both the repository and the quality of the search engine. The sections on the existing and the proposed analysis explain in detail how the method works.
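The threshold step described above can be illustrated with a minimal sketch. The abstract does not say how adwords are extracted or how the similarity percentage is computed, so `extract_adwords` (word tokens minus a tiny stopword list), the Jaccard-based `similarity_percent`, and the default threshold of 70 are all hypothetical stand-ins, not the ADIWMC method itself.

```python
import re

# Hypothetical stopword list; the source does not define how adwords are chosen.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is"}


def extract_adwords(text: str) -> set[str]:
    """Hypothetical adword extractor: lowercase word tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def similarity_percent(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two adword sets, expressed as a percentage (assumed metric)."""
    if not a and not b:
        return 100.0
    return 100.0 * len(a & b) / len(a | b)


def deduplicate(repository: dict[str, str], threshold: float = 70.0) -> dict[str, str]:
    """Keep a page only if its similarity to every already-kept page is <= threshold."""
    kept: dict[str, str] = {}
    kept_words: list[set[str]] = []
    for url, text in repository.items():
        words = extract_adwords(text)
        if all(similarity_percent(words, other) <= threshold for other in kept_words):
            kept[url] = text
            kept_words.append(words)
    return kept


pages = {
    "a.example/1": "cheap flights to paris and rome",
    "b.example/2": "cheap flights to paris and rome today",
    "c.example/3": "python web clustering tutorial",
}
print(list(deduplicate(pages)))
```

Here the second page shares 4 of 5 adwords with the first (80%), which exceeds the threshold, so only the first and third pages remain in the repository.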


2014, Vol 11 (1), pp. 111-131
Author(s): Tomas Grigalis, Antanas Cenys

Template-generated Web pages contain most of the structured data on the Web. Clustering these pages according to their template structure is an important problem in wrapper-based structured data extraction systems. These systems extract structured data using wrappers, each of which must be matched only to pages generated from a particular template. Selecting the pages of a single template type from all crawled Web pages is a time-consuming task. Although there are methods to cluster Web pages according to their structural similarity, in most cases they are too computationally expensive to be applicable at Web scale. We propose a novel, highly scalable approach to structurally clustering Web pages by employing the XPath addresses of inbound inner-site links. We demonstrate the effectiveness of our method by clustering more than one million Web pages from many real-world Websites in a few minutes while achieving >90% accuracy.
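The core idea of grouping pages by the XPath addresses of the links that point to them can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: links are assumed to be given as hypothetical `(source_page, anchor_xpath, target_page)` triples from one site, positional predicates such as `[3]` are assumed to be stripped so sibling links in a list share one address, and pages with an identical set of inbound XPaths are placed in the same cluster.

```python
import re
from collections import defaultdict

# Hypothetical input format: (source_page, xpath_of_anchor_in_source, target_page).
Link = tuple[str, str, str]


def normalize_xpath(xpath: str) -> str:
    """Drop positional predicates like [3] so links in the same list share one address."""
    return re.sub(r"\[\d+\]", "", xpath)


def cluster_by_inbound_xpaths(links: list[Link]) -> list[set[str]]:
    """Group target pages whose inbound links come from the same set of XPath addresses."""
    signatures: dict[str, set[str]] = defaultdict(set)
    for _source, xpath, target in links:
        signatures[target].add(normalize_xpath(xpath))
    clusters: dict[frozenset[str], set[str]] = defaultdict(set)
    for page, xpaths in signatures.items():
        clusters[frozenset(xpaths)].add(page)
    return list(clusters.values())


links = [
    ("/", "/html/body/ul/li[1]/a", "/product/1"),
    ("/", "/html/body/ul/li[2]/a", "/product/2"),
    ("/", "/html/body/div[1]/a", "/about"),
    ("/category", "/html/body/ul/li[1]/a", "/product/2"),
]
print(cluster_by_inbound_xpaths(links))
```

Both product pages are reached through the same list-item anchor address, so they fall into one cluster while `/about` forms its own. A single pass over the links keeps the cost roughly linear in the number of links, which is in the spirit of the scalability goal stated in the abstract.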


2013, Vol 39 (3), pp. 709-754
Author(s): Antonio Di Marco, Roberto Navigli

Web search result clustering aims to facilitate information search on the Web. Rather than the results of a query being presented as a flat list, they are grouped on the basis of their similarity and subsequently shown to the user as a list of clusters. Each cluster is intended to represent a different meaning of the input query, thus taking into account the lexical ambiguity (i.e., polysemy) issue. Existing Web clustering methods, however, typically rely on some shallow notion of textual similarity between search result snippets. As a result, text snippets with no word in common tend to be clustered separately even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query. In this article we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction. Key to our approach is to first acquire the various senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on data sets of ambiguous queries, show that our approach outperforms both Web clustering engines and search engines.
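The second stage described above (assigning results to induced senses) can be sketched with a deliberately crude stand-in. The article uses semantic similarity to senses produced by Word Sense Induction; here, as an assumption for illustration only, each induced sense is a hypothetical hand-written bag of words that excludes the query term itself, and each snippet goes to the sense with the largest word overlap, or to an `"unassigned"` bucket when it overlaps none.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic word tokens of a snippet."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cluster_by_senses(snippets: list[str], senses: dict[str, set[str]]) -> dict[str, list[str]]:
    """Assign each snippet to the sense whose (assumed) word bag it overlaps most."""
    clusters: dict[str, list[str]] = {name: [] for name in senses}
    clusters["unassigned"] = []
    for snippet in snippets:
        words = tokens(snippet)
        best = max(senses, key=lambda name: len(words & senses[name]))
        if words & senses[best]:
            clusters[best].append(snippet)
        else:
            clusters["unassigned"].append(snippet)
    return clusters


# Hypothetical senses for the ambiguous query "jaguar" (the query word is omitted).
senses = {
    "animal": {"cat", "wildlife", "species", "rainforest"},
    "car": {"car", "engine", "dealer", "sedan"},
}
snippets = [
    "The jaguar is a big cat of the rainforest",
    "New Jaguar sedan with V8 engine",
    "Jaguar stock price today",
]
print(cluster_by_senses(snippets, senses))
```

The word-overlap score is exactly the kind of shallow similarity the abstract criticizes; it is used only to show the shape of the "induce senses, then cluster results by sense" pipeline.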


2013, Vol 336-338, pp. 2152-2156
Author(s): Rong Wang, Bo Ping Zhang

This paper first analyzes Web clustering and then proposes describing objects with binary attributes for user clustering in Web clustering. The time attribute is binarized using Zipf's law, and the objects are then clustered with the ROCK algorithm. Experiments indicate that the clustering works well.
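To illustrate the clustering step, here is a minimal ROCK-style sketch for binary attributes. It assumes the objects are already binarized (the Zipf's-law binarization is not specified in the abstract, so it is omitted) and represents each object as a set of attributes that are "on". Following the usual ROCK formulation, two points are neighbours when their Jaccard similarity is at least `theta`, `link(i, j)` counts common neighbours, and clusters are merged greedily by the goodness measure with $f(\theta) = (1-\theta)/(1+\theta)$. This sketch does not count a point as its own neighbour and uses a naive cubic-time search, so it is illustrative rather than a faithful or efficient implementation.

```python
from itertools import combinations


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two binary attribute sets."""
    return len(a & b) / len(a | b) if a | b else 1.0


def rock(points: list[frozenset[str]], k: int, theta: float) -> list[list[int]]:
    """Greedy ROCK-style agglomerative clustering down to at most k clusters."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must be in [0, 1)")
    n = len(points)
    # Neighbour sets: j is a neighbour of i when sim >= theta (self excluded here).
    nbrs = [
        {j for j in range(n) if j != i and jaccard(points[i], points[j]) >= theta}
        for i in range(n)
    ]
    # link(i, j) = number of common neighbours of points i and j.
    link = [[len(nbrs[i] & nbrs[j]) for j in range(n)] for i in range(n)]
    exp = 1 + 2 * (1 - theta) / (1 + theta)  # exponent 1 + 2 f(theta)

    def goodness(c1: list[int], c2: list[int]) -> float:
        cross = sum(link[i][j] for i in c1 for j in c2)
        n1, n2 = len(c1), len(c2)
        return cross / ((n1 + n2) ** exp - n1 ** exp - n2 ** exp)

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: goodness(clusters[p[0]], clusters[p[1]]),
            default=None,
        )
        if best is None or goodness(clusters[best[0]], clusters[best[1]]) <= 0:
            break  # no remaining pair of clusters shares any links
        a, b = best  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical binarized user profiles: sets of attributes that are present.
users = [frozenset(s) for s in ("abc", "abd", "acd", "xyz", "xyw", "xzw")]
print(rock(users, k=2, theta=0.4))
```

Because ROCK merges on shared neighbours rather than on raw pairwise distance, it suits sparse binary data such as the binarized user attributes described in the abstract.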


Polibits, 2013, Vol 47, pp. 31-45
Author(s): Carlos Cobos, Martha Mendoza, Elizabeth León, Milos Manic, Enrique Herrera-Viedma
