FINDING REPRESENTATIVE WEB PAGES BASED ON A SOM AND A REVERSE CLUSTER ANALYSIS

2011 ◽  
Vol 20 (01) ◽  
pp. 93-118 ◽  
Author(s):  
SEBASTIÁN A. RÍOS ◽  
JUAN D. VELÁSQUEZ

Enhancing the content and structure of a web site is an important task that can help retain visitors and attract new visits (or customers). Web mining helps to enhance a web site's organization and contents using data mining algorithms. In particular, when web mining is performed with a Self-Organizing Feature Map (SOFM or SOM), an analysis phase by experts is always needed. To help analysts perform this phase after the SOFM's training, many post-processing techniques have been developed (component planes, labels, etc.); however, none of these techniques is useful when working in web mining for off-line enhancements of a web site. This paper presents an algorithm called Reverse Cluster Analysis (RCA). It aims to identify important web pages based on a SOFM when performing web text mining (WTM) and web usage mining (WUM). We successfully applied this technique to a real web site to show its effectiveness. We have extended previous work with a comparison against another unsupervised technique, an administrators' survey, and an extended survey.

Author(s):  
Sebastián Ríos ◽  
Juan D. Velázquez ◽  
Hiroshi Yasuda ◽  
Terumasa Aoki

Author(s):  
Jayanti Mehra ◽  
Ramjeevan Singh Thakur

Weblog analysis takes raw data from access logs and studies this data to extract statistical information. This information covers a variety of data on website activity, such as the average number of hits, total number of user visits, failed and successful cached hits, average viewing time, and average path length over a website; analytical information such as page-not-found errors and server errors; server information, which includes exit and entry pages, single-access pages, and top visited pages; and requester information, such as which search engines are used, keywords, and top referring sites. In general, the website administrator uses this kind of knowledge to make the system perform better, to help in the process of modifying the site, and to support marketing decisions. Most advanced web mining systems use this kind of information to derive more complex interpretations using data mining procedures such as association rules, clustering, and classification.
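As a rough illustration of the kind of statistics described above, the following Python sketch counts hits, distinct hosts, top pages, page-not-found errors, and server errors. It is a minimal sketch only: it assumes log lines in the Common Log Format, and the function name `summarize_log` and the sample lines are hypothetical, not taken from the paper.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize_log(lines):
    """Return basic statistics for an iterable of access-log lines."""
    hits = 0
    pages = Counter()
    hosts = set()
    not_found = 0
    server_errors = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        hits += 1
        pages[match.group("path")] += 1
        hosts.add(match.group("host"))
        status = int(match.group("status"))
        if status == 404:
            not_found += 1
        elif 500 <= status <= 599:
            server_errors += 1
    return {
        "hits": hits,
        "unique_hosts": len(hosts),
        "top_pages": pages.most_common(3),
        "not_found": not_found,
        "server_errors": server_errors,
    }


# Hypothetical sample data, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2020:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2020:13:57:12 +0000] "GET /missing.html HTTP/1.1" 404 512',
    '10.0.0.3 - - [10/Oct/2020:13:58:40 +0000] "GET /report.cgi HTTP/1.1" 500 0',
]

summary = summarize_log(SAMPLE_LOG)
print(summary)
```

Counting distinct hosts is only a crude stand-in for user visits; real session reconstruction would also need timestamps and, typically, user-agent information.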


2007 ◽  
Vol 16 (05) ◽  
pp. 793-828 ◽  
Author(s):  
JUAN D. VELÁSQUEZ ◽  
VASILE PALADE

Understanding the web user browsing behaviour in order to adapt a web site to the needs of a particular user represents a key issue for many commercial companies that do their business over the Internet. This paper presents the implementation of a Knowledge Base (KB) for building web-based computerized recommender systems. The Knowledge Base consists of a Pattern Repository that contains patterns extracted from web logs and web pages, by applying various web mining tools, and a Rule Repository containing rules that describe the use of discovered patterns for building navigation or web site modification recommendations. The paper also focuses on testing the effectiveness of the proposed online and offline recommendations. An ample real-world experiment is carried out on a web site of a bank.


2011 ◽  
Vol 76 (3) ◽  
pp. 255-261 ◽  
Author(s):  
Piotr Kosiba ◽  
Andrzej Stankiewicz

The study objects were 48 microhabitats of five Utricularia species in Lower and Upper Silesia (Poland). The aim of the paper was to apply the Self-Organizing Feature Map (SOFM) to the assessment of water trophicity in Utricularia microhabitats, and to describe how SOFM can be used for the study of ecological subjects. This method was compared with the hierarchical tree plot of cluster analysis to check whether these techniques give similar results. Both the topological map of SOFM and the dendrogram of cluster analysis show differences between Utricularia species microhabitats with respect to water quality, from eutrophic for U. vulgaris to dystrophic for U. minor and U. intermedia. The two methods give similar results, which validates the SOFM method for this type of study.


Author(s):  
Soner Kiziloluk ◽  
Ahmet Bedri Ozer

In recent years, data on the Internet has grown exponentially, attaining enormous dimensions. This situation makes it difficult to obtain useful information from such data. Web mining is the process of using data mining techniques such as association rules, classification, clustering, and statistics to discover and extract information from Web documents. Optimization algorithms play an important role in such techniques. In this work, the parliamentary optimization algorithm (POA), which is one of the latest social-based metaheuristic algorithms, has been adopted for Web page classification. Two different data sets (Course and Student) were selected for experimental evaluation, and HTML tags were used as features. The data sets were tested using different classification algorithms implemented in WEKA, and the results were compared with those of the POA. The POA was found to yield promising results compared to the other algorithms. This study is the first to propose the POA for effective Web page classification.


2009 ◽  
pp. 2924-2935
Author(s):  
Miao-Ling Wang ◽  
Hsiao-Fan Wang

With the ever-increasing and ever-changing flow of information available on the Web, information analysis has never been more important. Web text mining, which includes text categorization, text clustering, association analysis and prediction of trends, can assist us in discovering useful information in an effective and efficient manner. In this chapter, we have proposed a Web mining system that incorporates both online efficiency and off-line effectiveness to provide the “right” information based on users’ preferences. A Bi-Objective Fuzzy c-Means algorithm and an information retrieval technique, for text categorization, clustering and integration, were employed for analysis. The proposed system is illustrated via a case involving the Web site marketing of mobile phones. A variety of Web sites exist on the Internet and a common type involves the trading of goods. In this type of Web site, the question to ask is: If we want to establish a Web site that provides information about products, how can we respond quickly and accurately to queries? This is equivalent to asking: How can we design a flexible search engine according to users’ preferences? In this study, we have applied data mining techniques to cope with such problems, by proposing, as an example, a Web site providing information on mobile phones in Taiwan. In order to efficiently provide useful information, two tasks were considered during the Web design phase. One related to off-line analysis: this was done by first carrying out a survey of frequent Web users, students between 15 and 40 years of age, regarding their preferences, so that Web customers’ behavior could be characterized. Then the survey data, as well as the products offered, were classified into different demand and preference groups. The other task was related to online query: this was done through the application of an information retrieval technique that responded to users’ queries.
Based on the ideas above, the remainder of the chapter is organized as follows: first, we present a literature review, introduce some concepts, and review existing methods relevant to our study; then the proposed Web mining system is presented and a case study of a mobile-phone marketing Web site is illustrated; finally, a summary and conclusions are offered.
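For readers unfamiliar with fuzzy clustering, the sketch below shows the standard single-objective fuzzy c-means update rules in plain Python. It is not the chapter's Bi-Objective variant, whose details the abstract does not give. The function name `fuzzy_c_means`, the deterministic initialisation, and the two-dimensional sample points are assumptions made for illustration.

```python
import math


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50, eps=1e-9):
    """Standard fuzzy c-means on a list of equal-length numeric tuples.

    Assumes len(points) >= n_clusters and fuzzifier m > 1.
    """
    # Assumption: initialise centers from evenly spaced data points
    # so that the result is deterministic.
    step = max(1, len(points) // n_clusters)
    centers = [list(points[i * step]) for i in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    n_dims = len(points[0])
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)),
        # normalised so each point's memberships sum to 1.
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), eps) for c in centers]
            inv = [d ** -exponent for d in dists]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ik^m.
        for i in range(n_clusters):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(n_dims)
            ]
    return centers, memberships


# Hypothetical two-group data, e.g. two preference scores per respondent.
DATA = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
]

centers, memberships = fuzzy_c_means(DATA, n_clusters=2)
for point, row in zip(DATA, memberships):
    print(point, [round(u, 3) for u in row])
```

Unlike hard clustering, each point receives a degree of membership in every cluster, which is what allows a survey respondent or product to belong partly to several preference groups.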


Author(s):  
Quanzhi Li ◽  
Yi-fang Brook Wu

This chapter presents a new approach to mining the Web to identify people of similar background. To find similar people on the Web for a given person, two major research issues are person representation and person matching. In this chapter, a person representation method that uses a person’s personal Web site to represent that person’s background is proposed. Based on this representation, the main proposed algorithm integrates the textual content and hyperlink information of all the Web pages belonging to a personal Web site to represent a person and match persons. Other algorithms are also explored and compared to the main proposed algorithm. The evaluation methods and experimental results are presented.



