Web Usage Mining

Technological revolutions have opened up new channels of information and communication, and the Internet has become a vital source of information. The ever-increasing volume of information on the WWW complicates the design, development and deployment of web sites, so it has become important for organizations to analyze how their web sites are used. Web usage analysis can help an organization not only to monitor the load on its web site and cater for the needs of potential clients, but also to enhance its web services and restructure itself to serve clients better. Web mining has emerged as an important research area for discovering information that can be used to improve web sites. Allama Iqbal Open University (AIOU) is one of the largest open and distance-learning universities in the world. In keeping with its philosophy of open and distance learning, AIOU provides useful information online through its web site, an active site that carries a heavy flow of information. This paper presents a web usage analysis of the AIOU web site and a statistical analysis of its usage patterns. It shows how the results were used to enhance the web content and services, and discusses how they helped the university to allocate and reallocate its resources. The reallocation was used to improve the efficiency and processes of the university in order to better serve its clients.

Statistical Methods for User Profiling in Web Usage Mining

Handbook of Research on Text and Web Mining Technologies, 2010, pp. 359-368

DOI: 10.4018/978-1-59904-990-8.ch022

Cited By ~ 1

Author(s): Marcello Pecoraro

Keyword(s): Statistical Methods, Web Mining, Web Usage Mining, User Profiling, Web Usage, Data Object, Segmentation Methods, Binary Segmentation, Usage Analysis, The Web

This chapter gives an overview of the statistical methods that support Web Usage Mining. The first part describes the framework of Web Usage Mining as a branch of Web Mining devoted to studying how a Web site is used. The data (the object of the analysis) are then detailed, together with the problems linked to pre-processing. Once the origin of the data and the treatment they need for a correct Web usage analysis have been clarified, the focus shifts to the statistical techniques that can be applied, with particular reference to binary segmentation methods. These methods discriminate by means of a response variable that determines which group a user belongs to, on the basis of characteristics observed on those same users.
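As a rough illustration of what a single binary segmentation step can look like, the sketch below searches for the threshold on one user characteristic that best separates a binary response. The Gini impurity criterion, the variable names (`pages_viewed`, `purchased`) and the data are assumptions made for this example; the chapter itself may use a different splitting criterion.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_binary_split(values, labels):
    """Return (threshold, weighted_impurity) of the best split `value <= threshold`."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: pages viewed per user and whether the user purchased.
pages_viewed = [1, 2, 2, 3, 7, 8, 9, 10]
purchased = [0, 0, 0, 0, 1, 1, 1, 1]

split = best_binary_split(pages_viewed, purchased)
print(split)  # (5.0, 0.0): users with more than 5 page views form the buyer group
```

Applying the same search recursively to each resulting group would grow a segmentation tree.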

Mining Generalized Web Data for Discovering Usage Patterns

Encyclopedia of Data Warehousing and Mining, Second Edition, 2011, pp. 1275-1281

DOI: 10.4018/978-1-60566-010-3.ch198

Author(s): Doru Tanasa

Keyword(s): Web Portal, Bird Flu, Jones Index, Web Usage, Log Files, Efficiency And Effectiveness, Usage Patterns, Usage Analysis, E Mail, The Web

Web Usage Mining (WUM) includes all the Data Mining techniques used to analyze the behavior of a Web site's users (Cooley, Mobasher & Srivastava, 1999; Spiliopoulou, Faulstich & Winkler, 1999; Mobasher, Dai, Luo & Nakagawa, 2002). Based mainly on the data stored in access log files, these methods allow the discovery of frequent behaviors. In particular, the extraction of sequential patterns (Agrawal & Srikant, 1995) is well suited to Web log analysis, given the chronological nature of log records. On a Web portal, one could discover for example that “25% of the users navigated the site in a particular order, consulting first the homepage, then the page with an article about the bird flu, then the Dow Jones index evolution, and finally returning to the homepage before consulting their personal e-mail as a subscriber”. In theory, this analysis allows us to find frequent behaviors rather easily. In practice, however, the diversity of Web pages and behaviors makes this approach delicate. Indeed, it is often necessary to set the minimum frequency threshold (i.e. the minimum support) to about 1% or 2% before these behaviors are revealed. Such low supports, combined with the characteristics of access log files (e.g. a huge number of records), are generally the cause of failures or limitations of the existing techniques employed in Web usage analysis. One solution to this problem is to cluster the pages by topic, in the form of a taxonomy for example, in order to obtain a more general behavior. Considering again the previous example, one could have obtained: “70% of the users navigate the Web site in a particular order, consulting the home page, then a news page, then a page on financial indexes, and then returning to the homepage before consulting a communication service offered by the Web portal”.

A page on financial indexes can relate to the Dow Jones as well as the FTSE 100 or the NIKKEI (and in a similar way, e-mail and chat are communication services, the bird flu belongs to the news section, etc.). Moreover, grouping these pages under the “financial indexes” term directly increases the support of such behaviors and thus their readability, relevance and significance. The drawback of using a taxonomy is the time and energy needed to define and maintain it. In this chapter, we propose solutions to facilitate (or guide as much as possible) the automatic creation of this taxonomy, allowing a WUM process to return more effective and relevant results. These solutions include a prior clustering of the pages depending on the way they are reached by the users. We will show the relevance of our approach in terms of efficiency and effectiveness when extracting the results.
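The effect of a taxonomy on support can be sketched in a few lines. The page names, the `TAXONOMY` mapping and the four sessions below are invented for illustration and are not taken from the chapter; the point is only that mapping pages to their category before counting raises the support of the generalized sequence.

```python
def is_subsequence(pattern, session):
    """True if the pages of `pattern` occur in `session` in the same order."""
    remaining = iter(session)
    return all(page in remaining for page in pattern)


def support(pattern, sessions):
    """Fraction of sessions that contain `pattern` as an ordered subsequence."""
    return sum(is_subsequence(pattern, s) for s in sessions) / len(sessions)


# Hypothetical taxonomy: specific page -> topic category.
TAXONOMY = {
    "bird_flu": "news",
    "election": "news",
    "dow_jones": "financial_index",
    "ftse_100": "financial_index",
    "nikkei": "financial_index",
    "email": "communication",
    "chat": "communication",
}


def generalize(session):
    """Replace each page by its category; pages without a category are kept."""
    return [TAXONOMY.get(page, page) for page in session]


sessions = [
    ["home", "bird_flu", "dow_jones", "home", "email"],
    ["home", "election", "ftse_100", "home", "chat"],
    ["home", "bird_flu", "nikkei", "home", "email"],
    ["home", "election", "dow_jones", "home", "chat"],
]

specific = ["home", "bird_flu", "dow_jones", "home", "email"]
general = ["home", "news", "financial_index", "home", "communication"]

specific_support = support(specific, sessions)
general_support = support(general, [generalize(s) for s in sessions])
print(specific_support, general_support)  # 0.25 1.0
```

With a minimum support of, say, 50%, only the generalized pattern would be reported for this toy data.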

Creating and using Web corpora

International Journal of Corpus Linguistics, 2005, Vol 10 (4), pp. 517-541

DOI: 10.1075/ijcl.10.4.07the

Cited By ~ 4

Author(s): Mike Thelwall

Keyword(s): Search Engine, Web Sites, Web Crawler, Commercial Search Engine, British National Corpus, The Uk, The University, The Web, National Corpus

The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.

Web Usage Mining and the Challenge of Big Data

Big Data, 2016, pp. 899-928

DOI: 10.4018/978-1-4666-9840-6.ch042

Author(s): Abubakr Gafar Abdalla, Tarig Mohamed Ahmed, Mohamed Elhassan Seliaman

Keyword(s): Data Mining, Pattern Discovery, Web Usage Mining, Data Mining Techniques, Web Log, Web Usage, Web Logs, Usage Patterns, Rich Data, The Web

The web is a rich, dynamic and fast-growing source for data mining, and it offers great opportunities that are often not exploited. Web data pose a real challenge to traditional data mining techniques because of their huge volume and unstructured nature. Web logs contain information about the interactions between visitors and the website. Analyzing these logs provides insights into visitors' behavior, usage patterns, and trends. Web usage mining, also known as web log mining, is the process of applying data mining techniques to discover useful information hidden in web server logs. Web logs are primarily used by Web administrators to know how much traffic they get and to detect broken links and other types of errors. Web usage mining extracts useful information that can benefit a number of application areas, such as web personalization, website restructuring, system performance improvement, and business intelligence. The Web usage mining process involves three main phases: pre-processing, pattern discovery, and pattern analysis. Various pre-processing techniques have been proposed to extract information from log files and group primitive data items into meaningful, lighter level abstractions that are suitable for mining, usually in the form of visitors' sessions. The major data mining techniques used for pattern discovery in web usage mining are clustering, association analysis, classification, and sequential pattern discovery. This chapter discusses the process of web usage mining, its procedure, methods, and pattern discovery techniques. The chapter also presents a practical example using real web log data.
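The pre-processing phase described above can be sketched as follows. This is a minimal example under stated assumptions, not the chapter's procedure: the log lines are invented and assumed to follow the Common Log Format, visitors are identified by host address alone, only successful page requests are kept, and a 30-minute inactivity gap (a common heuristic) closes a session.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input format: Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold
STATIC_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")


def parse(lines):
    """Yield (host, timestamp, path) for successful page requests only."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines
        if match["status"] != "200" or match["path"].endswith(STATIC_SUFFIXES):
            continue  # data cleaning: drop errors and embedded resources
        timestamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        yield match["host"], timestamp, match["path"]


def sessionize(records):
    """Group each host's requests into sessions split by inactivity gaps."""
    by_host = defaultdict(list)
    for host, timestamp, path in records:
        by_host[host].append((timestamp, path))
    sessions = []
    for host, hits in by_host.items():
        hits.sort()
        current = [hits[0][1]]
        for (previous_time, _), (timestamp, path) in zip(hits, hits[1:]):
            if timestamp - previous_time > SESSION_TIMEOUT:
                sessions.append((host, current))
                current = []
            current.append(path)
        sessions.append((host, current))
    return sessions


log_lines = [
    '10.0.0.1 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2015:13:56:01 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2015:13:58:10 +0000] "GET /courses.html HTTP/1.1" 200 4096',
    '10.0.0.1 - - [10/Oct/2015:15:20:00 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2015:14:00:00 +0000] "GET /missing.html HTTP/1.1" 404 0',
    '10.0.0.2 - - [10/Oct/2015:14:01:00 +0000] "GET /contact.html HTTP/1.1" 200 1024',
]

sessions = sessionize(parse(log_lines))
for host, pages in sessions:
    print(host, pages)
```

The resulting sessions are the usual input for the pattern discovery phase (clustering, association analysis, classification, or sequential pattern discovery).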

Building User Communities of Interests by Using Latent Semantic Analysis

Advances in Social Networking and Online Communities - Collaborative Search and Communities of Interest, 2011, pp. 38-68

DOI: 10.4018/978-1-61520-841-8.ch004

Author(s): Guandong Xu

Keyword(s): Latent Semantic Analysis, Web Mining, Semantic Analysis, Information Overload, Critical Issue, Web Usage Mining, Semantic Level, Web Usage, User Communities, Usage Patterns

Web users today face information overload, caused by the rapid growth in both the amount of information and the number of users. As a result, providing Web users with precisely the information they need is becoming a critical issue in Web-based information retrieval and data management. To address these difficulties, Web mining was proposed as an efficient means of discovering the intrinsic relationships among Web data. In particular, Web usage mining aims to discover Web usage patterns and to use the discovered knowledge to construct interest-oriented user communities, which can in turn be used to present Web users with more personalized Web content, i.e. Web recommendation. Latent Semantic Analysis (LSA), for its part, is an approach used to reveal the inherent correlations residing in co-occurrence data, such as Web usage data. Moreover, LSA can capture hidden knowledge at the semantic level that traditional methods cannot reach. In this chapter, we address the building of user communities of interest by combining Web usage mining and latent semantic analysis. We also present the application of user communities to Web recommendation.
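The idea that LSA exposes correlations hidden in co-occurrence data can be illustrated on a tiny user-by-page visit matrix. The sketch below is not the chapter's algorithm: the matrix is invented, and the truncated SVD at the heart of LSA is approximated with plain power iteration and deflation so that the example needs only the standard library.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_right_singular_vectors(matrix, k, iterations=200):
    """Approximate the k leading right singular vectors of `matrix`."""
    n = len(matrix[0])
    # Gram matrix G = A^T A; its eigenvectors are the right singular vectors.
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]
    vectors = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iterations):
            w = [sum(g * x for g, x in zip(row, v)) for row in gram]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        gv = [sum(g * x for g, x in zip(row, v)) for row in gram]
        eigenvalue = sum(x * y for x, y in zip(v, gv))
        vectors.append(v)
        # Deflation: remove the component just found before the next pass.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return vectors


def project(matrix, vectors):
    """Coordinates of each row (user) in the latent space."""
    return [[sum(x * y for x, y in zip(row, v)) for v in vectors]
            for row in matrix]


# Hypothetical visit counts: rows are users, columns are pages.
# Users 0 and 1 browse pages 0-2, users 2 and 3 browse pages 3-4.
visits = [
    [2, 1, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 0, 1, 2],
]

latent = project(visits, top_right_singular_vectors(visits, k=2))
raw_similarity = cosine(visits[0], visits[1])        # 0.2 on raw counts
latent_similarity = cosine(latent[0], latent[1])     # close to 1.0
cross_similarity = cosine(latent[0], latent[2])      # close to 0.0
print(raw_similarity, latent_similarity, cross_similarity)
```

Users 0 and 1 share only one page, so their raw similarity is low, but in the two-dimensional latent space they fall into the same community, which is the kind of grouping a recommender could then exploit.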

Semantic Web Adaptation

Web Technologies, 2011, pp. 78-88

DOI: 10.4018/978-1-60566-982-3.ch006

Author(s): Alexander Mikroyannidis, Babis Theodoulidis

Keyword(s): Semantic Web, Web Sites, Web Usage Mining, New Era, Automatic Reasoning, Web Usage, User Friendly, Unstructured Information, The Web, The Way

The growth in the amount of information available on the World Wide Web has not been matched by similar advances in the way this information is organized and exploited. Web adaptation seeks to address this issue by transforming the topology of a Web site to help users in their browsing tasks. To this end, Web usage mining techniques have been employed for years to study how the Web is used in order to make Web sites more user-friendly. The Semantic Web is an ambitious initiative aiming to transform the Web into a well-organized source of information. In particular, apart from the unstructured information of today’s Web, the Semantic Web will contain machine-processable metadata organized in ontologies. This will enhance the way we search the Web and can even allow for automatic reasoning on Web data with the use of software agents. Semantic Web adaptation brings traditional Web adaptation techniques into the new era of the Semantic Web. The idea is to keep the Semantic Web constantly aligned with the users’ preferences. To achieve this, Web usage mining and text mining methodologies are employed for the semi-automatic construction and evolution of Web ontologies. This usage-driven evolution of Web ontologies, in parallel with the evolution of Web topologies, can bring the Semantic Web closer to the users’ expectations.

An Academic Student-Centered Portal

Encyclopedia of Portal Technologies and Applications, 2011, pp. 6-11

DOI: 10.4018/978-1-59140-989-2.ch002

Author(s): Carla Falsetti

Keyword(s): Web Sites, Discussion Forum, Active Citizenship, Student Centered, Social Growth, Principal Feature, A Site, The University, Student Association, The Web

The Università Politecnica delle Marche (UNIVPM) aims to give the university a recognizable “look” on the Web. One way to reach this goal is to create a site, “e-Univpm” (Ramazzotti, De Giovanni, L., Battistini, G., & Leo et al., 2005), linked to the institutional university Web portal. The principal feature of this new site is that it is a convivial site offering online the services that students usually look for in everyday life. It is a place where students can meet other students for their personal and social growth. It is intended as a rich environment with spaces for curricular and extracurricular activities such as concerts, lectures, exhibitions, meetings, sports, and student association activities, as well as lodging and trading opportunities. It is a place for expressing ideas and emotions, where students can open blogs and discussion forums and access Web sites of interest for active citizenship. For these reasons, the portal “e-Univpm” is called a “convivial site.”

Improving the Web Usage Analysis Process: A UML Model of the ETL Process

Advances in Web Mining and Web Usage Analysis - Lecture Notes in Computer Science, 2006, pp. 18-36

DOI: 10.1007/11899402_2

Author(s): Thilo Maier

Keyword(s): Web Usage, Analysis Process, Usage Analysis, The Web

Applying Web Mining Techniques for Constructing Webometrics Ranking Early Warning System

Advanced Materials Research, 2012, Vol 532-533, pp. 767-771

DOI: 10.4028/www.scientific.net/amr.532-533.767

Cited By ~ 1

Author(s): Shu Ming Hsieh, Ssu An Lo, Chiun Chieh Hsu, Da Ren Chen

Keyword(s): Early Warning, Web Sites, Web Site, Early Warning System, Web Mining, Information Source, Scientific Information, Warning System, Internet Technology, The Web

The management of university web sites is becoming more critical than before because of the rapid growth of the population that depends on the World Wide Web as its most important (if not only) information source. A university can disseminate its research outcomes and educational achievements through its web site, and consequently gain visibility and influence among web users. The Webometrics Ranking of World Universities (WR), proposed by the Centre for Scientific Information and Documentation (CINDOC-CSIC), ranks university web sites and has attracted much attention recently. The WR rankings are well recognized as an important index for universities wishing to promote themselves through internet technology. In this paper, we propose WRES, an early warning system for Webometrics Rankings. WRES gathers the WR indices from the WWW automatically at flexible intervals and provides useful information in real time to the managers of university web sites. If the WR ranking of an institution is below the position expected from its academic performance, the university authorities should reconsider their web policy and promote substantial increases in the volume and quality of their electronic publications. In addition, web site managers may adopt effective approaches to improve their WR rankings according to the hints given by WRES.

Design Methods for Experience Design

Human Computer Interaction, 2009, pp. 432-447

DOI: 10.4018/978-1-87828-991-9.ch031

Author(s): Marie Jefsioutine, John Knight

Keyword(s): Research Methods, Design Process, Web Sites, Historical Context, Web Design, Design Development, Experience Design, User Experiences, Site Use, The Web

The following chapter describes an approach to Web design and evaluation where the user experience is central. It outlines the historical context in which experience design has evolved and describes the authors’ experience design framework (EDF). This is based on the principles of user-centred design (UCD) and draws on a variety of research methods and tools to facilitate the design, development, and evaluation of user experiences. It proposes that to design usable, accessible, engaging, and beneficial Web sites, effort needs to focus on visceral, behavioural, reflective, and social factors, while considering contexts such as the who and why; what and how; when and where; and with what of Web site use. Research methods from a variety of disciplines are used to support exploration, communication, empathy, and speculation. Examples of the application of the EDF, to various stages of the Web design process, are described.
