Improved PageRank Algorithm for Web Structure Mining

This chapter presents an overview of web mining and identifies its three areas: Web content mining, Web usage mining, and Web structure mining. Specific attention is paid to Web structure mining, which is the study of the link topology. The link topology of the Web is analyzed in the context of a cyber-community in order to explore the connection between the link topology and conferral of authority. Millions, soon to be billions, of people are annotating Web documents, which results in an abundance of information. Herein lies the problem of topic distillation: searching through the sea of documents for relevant information. To address the problem of overabundance and relevancy, models are needed that can assist in creating order at the local level. The hub and spoke model identified in this chapter takes a proactive approach to creating an online community in a centralized or planned fashion and provides control over the architecture of the Web graph. In the end, users can be assured with a certain level of confidence that the Web content contained in a hyperlinked community is both accurate and relevant.

Web Mining Overview

Encyclopedia of Data Warehousing and Mining, Second Edition | 10.4018/978-1-60566-010-3.ch319 | 2011 | pp. 2085-2089 | Cited By ~ 2

Author(s): Bamshad Mobasher

Keyword(s): Web Mining, Relevant Information, Content Management, Heterogeneous Data, Web Content, Web Structure, Web Structure Mining, Content Mining, E Learning, Intelligent Tools

In the span of a decade, the World Wide Web has been transformed from a tool for information sharing among researchers into an indispensable part of everyday activities. This transformation has been characterized by an explosion of heterogeneous data and information available electronically, as well as increasingly complex applications driving a variety of systems for content management, e-commerce, e-learning, collaboration, and other Web services. This tremendous growth, in turn, has necessitated the development of more intelligent tools for end users as well as information providers in order to more effectively extract relevant information or to discover actionable knowledge. From its very beginning, the potential of extracting valuable knowledge from the Web has been quite evident. Web mining (i.e., the application of data mining techniques to extract knowledge from Web content, structure, and usage) is the collection of technologies to fulfill this potential. In this article, we briefly summarize each of the three primary areas of Web mining (Web usage mining, Web content mining, and Web structure mining) and discuss some of the primary applications in each area.

An Approach for Focused Crawler to Harvest Digital Academic Documents in Online Digital Libraries

International Journal of Information Retrieval Research | 10.4018/ijirr.2019070103 | 2019 | Vol 9 (3) | pp. 23-47

Author(s): Sumita Gupta, Neelam Duhan, Poonam Bansal

Keyword(s): Digital Libraries, Search Engines, Relevant Information, Vital Role, Digital Information, Meta Data, Domain Specific, User Query, Specific Search, The Web

With the rapid growth of digital information and user needs, it becomes imperative to quickly retrieve relevant domain- or topic-specific documents for a user query. A focused crawler plays a vital role in digital libraries: it crawls the web so that researchers can easily explore the domain-specific search results list and find the desired content for the query. In this article, a focused crawler is proposed for online digital library search engines. It considers the meta-data of the query in order to retrieve the corresponding document or other relevant but missing information (e.g., paid publications from ACM, IEEE, etc.) for the user query. Different query strategies are built from the meta-data and submitted to different search engines, with the aim of finding more of the relevant information that is missing. The results returned by these search engines are filtered and then used for further crawling of the Web.

Knowledge Extraction Through Page Rank Using Web-Mining Techniques for E-Business

Advances in Business Information Systems and Analytics - Maximizing Business Performance and Efficiency Through Intelligent Systems | 10.4018/978-1-5225-2234-8.ch001 | 2017 | pp. 1-36

Author(s): Mahesh Kumar Singh, Om Prakash Rishi, Anukrati Sharma, Zaved Akhtar

Keyword(s): Search Engines, Web Mining, Web Search, Relevant Information, Business Environment, Vital Role, Business Organizations, Web Logs, Internet Users, New Challenges

The Internet plays a vital role in doing business. It provides a platform for reaching a huge number of customers and easing business. E-business organizations are growing rapidly by the minute, and the World Wide Web (WWW) provides a huge amount of information to Internet users. Users' access behavior is recorded in web logs. This information is very helpful in an e-business environment for analysis and decision making. Mining web data faces many new challenges as the amount of information stored in web logs grows. Search engines play a key role in retrieving relevant information from this huge volume of information. Nowadays, well-known search engines such as Google, MSN, and Yahoo provide users with good search results based on special search strategies. In web search services such as Google, the web page ranker component plays the main role. This paper discusses the new challenges faced by web mining techniques, the ranking of web pages using page ranking algorithms, and its application in e-business analysis to improve business operations.
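
To make the idea of a page ranking algorithm concrete, the following is a minimal sketch of the standard PageRank power iteration. It is not the method of the paper summarized above, whose details the abstract does not give. The function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the four-page example graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its rank on in equal shares along its out-links.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)

        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page graph: page C is linked from A, B and D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
```

In this toy graph, page C collects links from three pages and ends up with the highest rank, while page D, which nothing links to, keeps only the baseline share (1 - damping) / n.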

Web Mining in Thematic Search Engines

Encyclopedia of Data Warehousing and Mining, Second Edition | 10.4018/978-1-60566-010-3.ch318 | 2011 | pp. 2080-2084

Author(s): Massimiliano Caramia, Giovanni Felici

Keyword(s): Data Mining, Genetic Algorithm, Search Engines, Web Mining, Research Work, Relevant Information, Additional Information, Internet Users, User Query, Search Tool

In the present chapter we report on some extensions of the work presented in the first edition of the Encyclopedia of Data Mining. In Caramia and Felici (2005) we described a method based on clustering and a heuristic search method, based on a genetic algorithm, to extract pages with relevant information for a specific user query in a thematic search engine. Starting from these results, we have extended the research work to address some issues related to the semantic aspects of the search, focusing on the keywords that are used to establish the similarity among the pages that result from the query. Complete details on this method, omitted here for brevity, can be found in Caramia and Felici (2006). Search engine technologies remain a strong research topic, as new problems and new demands from the market and the users arise. The process of switching from quantity (maintaining and indexing large databases of web pages and quickly selecting pages matching some criterion) to quality (identifying pages with a high quality for the user), already highlighted in Caramia and Felici (2005), has not been interrupted. It has gained further energy, motivated by the natural evolution of internet users, who are more selective in their choice of search tool and willing to pay the price of providing extra feedback to the system and waiting longer to have their queries better matched. In this framework, several authors have considered the use of data mining and optimization techniques, which are often referred to as web mining (for a recent bibliography on this topic see, e.g., Getoor, Senator, Domingos, and Faloutsos, 2003 and Zaïane, Srivastava, Spiliopoulou, and Masand, 2002).
The work described in this chapter is based on clustering techniques to identify, in the set of pages resulting from a simple query, subsets that are homogeneous with respect to a vectorization based on context or profile; then, a number of small and potentially good subsets of pages are constructed by extracting from each cluster the pages with higher scores. Operating on these subsets with a genetic algorithm, a subset with a good overall score and a high internal dissimilarity is identified. A related problem is then considered: the selection of a subset of pages that are compliant with the search keywords, but that are also characterized by the fact that they share a large subset of words different from the search keywords. This characteristic represents a sort of semantic connection among these pages that may be of use to spot some particular aspects of the information present in the pages. Such a task is accomplished by the construction of a special graph, whose maximum-weight clique and k-densest subgraph should represent the page subsets with the desired properties. In the following we summarize the main background topics and provide a synthetic description of the methods. Interested readers may find additional information in Caramia and Felici (2004), Caramia and Felici (2005), and Caramia and Felici (2006).
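
The goal described above, a subset of pages with a good overall score and a high internal dissimilarity, can be illustrated with a small sketch. This is not the authors' method: exhaustive enumeration stands in for the genetic algorithm, and the fitness function (mean page score plus a weighted mean pairwise Jaccard distance between term sets), the names `jaccard_distance`, `fitness`, and `best_subset`, and the sample pages are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Dissimilarity of two term sets: 1 - |a & b| / |a | b|."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def fitness(subset, pages, weight=1.0):
    """Assumed objective: mean score plus weighted mean pairwise dissimilarity."""
    mean_score = sum(pages[p]["score"] for p in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    if not pairs:
        return mean_score
    mean_dissim = sum(
        jaccard_distance(pages[a]["terms"], pages[b]["terms"]) for a, b in pairs
    ) / len(pairs)
    return mean_score + weight * mean_dissim


def best_subset(pages, k, weight=1.0):
    """Exhaustive search over all k-page subsets (a stand-in for the GA)."""
    return max(
        combinations(sorted(pages), k),
        key=lambda s: fitness(s, pages, weight),
    )


# Hypothetical pages returned for one query, with made-up scores and terms.
pages = {
    "p1": {"score": 0.9, "terms": {"jaguar", "car", "engine"}},
    "p2": {"score": 0.8, "terms": {"jaguar", "car", "dealer"}},
    "p3": {"score": 0.7, "terms": {"jaguar", "cat", "habitat"}},
    "p4": {"score": 0.4, "terms": {"jaguar", "os", "apple"}},
}
chosen = best_subset(pages, k=2)
```

With these sample values the two highest-scoring pages (p1 and p2) are not selected together, because they share most of their terms; the pair p1 and p3 trades a little score for much higher dissimilarity. Enumeration is only feasible for small candidate sets, which is one reason a heuristic such as a genetic algorithm is used in practice.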

Web Mining for Public E-Services Personalization

Electronic Services | 10.4018/978-1-61520-967-5.ch045 | 2010 | pp. 751-758

Author(s): P. Markellou

Keyword(s): Web Site, Web Mining, Web Usage Mining, Web Content, Web Personalization, Business Transactions, Web Structure, Web Structure Mining, Content Mining, The Web

Over the last decade, we have witnessed an explosive growth in the information available on the Web. Today, Web browsers provide easy access to myriad sources of text and multimedia data. Search engines index more than a billion pages, and finding the desired information is not an easy task. This profusion of resources has prompted the need for developing automatic mining techniques on the Web, thereby giving rise to the term “Web mining” (Pal, Talwar, & Mitra, 2002). Web mining is the application of data mining techniques on the Web for discovering useful patterns and can be divided into three basic categories: Web content mining, Web structure mining, and Web usage mining. Web content mining includes techniques for assisting users in locating Web documents (i.e., pages) that meet certain criteria, while Web structure mining relates to discovering information based on the Web site structure data (the data depicting the Web site map). Web usage mining focuses on analyzing Web access logs and other sources of information regarding user interactions within the Web site in order to capture, understand, and model their behavioral patterns and profiles and thereby improve their experience with the Web site. As citizens' requirements and needs change continuously, traditional information searching and fulfillment of various tasks result in the loss of valuable time spent identifying the responsible actor (public authority) and waiting in queues. At the same time, the percentage of users who are acquainted with the Internet has increased remarkably (Internet World Stats, 2005). These two facts motivate many governmental organizations to proceed with the provision of e-services via their Web sites. The ease and speed with which business transactions can be carried out over the Web has been a key driving force in the rapid growth and popularity of e-government, e-commerce, and e-business applications.
In this framework, the Web is emerging as the appropriate environment for business transactions and user-organization interactions. However, since it is a large collection of semi-structured and structured information sources, Web users often suffer from information overload. Personalization is considered a popular solution for alleviating this problem and customizing the Web environment to users (Eirinaki & Vazirgiannis, 2003). Web personalization can be described as any action that makes the Web experience of a user personalized to his or her needs and wishes. Principal elements of Web personalization include modeling of Web objects (pages) and subjects (users), categorization of objects and subjects, matching between and across objects and/or subjects, and determination of the set of actions to be recommended for personalization. In the remainder of this article, we present the way an e-government application can deploy Web mining techniques in order to support intelligent and personalized interactions with citizens. Specifically, we describe the tasks that typically comprise this process, illustrate the future trends, and discuss the open issues in the field.
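
As a rough illustration of the first step of Web usage mining mentioned above (analyzing Web access logs), the sketch below parses log lines and counts successful page requests per visitor. It assumes the server writes the Common Log Format; the function name `page_visits_by_host`, the sample log lines, and the e-service paths are invented for the example and do not come from the article.

```python
import re
from collections import Counter, defaultdict

# Common Log Format: host ident user [time] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+'
)


def page_visits_by_host(lines):
    """Count successful (2xx) page requests per visitor host."""
    visits = defaultdict(Counter)
    for line in lines:
        match = LOG_PATTERN.match(line)
        # Skip malformed lines and requests that did not succeed.
        if match and match.group("status").startswith("2"):
            visits[match.group("host")][match.group("path")] += 1
    return visits


# Invented sample lines for a hypothetical public e-services site.
sample = [
    '10.0.0.1 - - [10/Oct/2005:13:55:36 +0000] "GET /services/tax HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2005:13:56:01 +0000] "GET /services/tax HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2005:13:57:12 +0000] "GET /services/permits HTTP/1.0" 200 512',
    '10.0.0.2 - - [10/Oct/2005:13:58:40 +0000] "GET /missing HTTP/1.0" 404 0',
]
visits = page_visits_by_host(sample)
```

Per-visitor page counts like these are one possible input for the user modeling and categorization steps of personalization; a real pipeline would also need session identification and cleaning of robot traffic, which this sketch leaves out.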

Performance Comparison of Data Mining Classifiers on Web Log Data

Journal of Computational and Theoretical Nanoscience | 10.1166/jctn.2020.9349 | 2020 | Vol 17 (11) | pp. 5113-5116

Author(s): Varun Malik, Vikas Rattan, Jaiteg Singh, Ruchi Mittal, Urvashi Tandon

Keyword(s): Web Mining, Performance Comparison, Web Usage Mining, Classification Algorithms, Web Content, Web Usage, Web Structure, Web Structure Mining, Content Mining, The Web

Web usage mining is the branch of web mining that deals with mining data over the web. Web mining can be categorized into web content mining, web structure mining, and web usage mining. In this paper, we summarize the web usage mining results obtained with WMOT (web mining optimized tool), a user tool based on WEKA that was used to apply various classification algorithms such as Naïve Bayes, KNN, SVM, and tree-based algorithms. We summarize the results of the classification algorithms on the WMOT tool, compare them on the basis of classified instances, and identify the algorithms that give better accuracy.

Comparative Analysis of Image Retrieval Techniques in Cyberspace

International Journal of Students Research in Technology & Management | 10.18510/ijsrtm.2020.811 | 2020 | Vol 8 (1) | pp. 01-10

Author(s): Rajeev Gupta, Virender Singh

Keyword(s): Comparative Analysis, Image Retrieval, Search Engines, Web Search, Content Based Image Retrieval, Retrieval Time, User Query, Web Search Engines, Research Challenge, The Web

Purpose: With the popularity and remarkable usage of digital images in various domains, the existing image retrieval techniques need to be enhanced. Content-based image retrieval (CBIR) plays a vital role in retrieving requested data from databases available in cyberspace. CBIR from cyberspace is currently a popular and interesting research area. Accurately searching for and downloading requested images from cyberspace based on meta-data using CBIR techniques is a challenging task. The purpose of this study is to explore the various image retrieval techniques for retrieving the data available in cyberspace. Methodology: Whenever a user wishes to retrieve an image from the web using present search engines, a set of images is retrieved based on the user query, but most of the resulting images are unrelated to that query. Here, the user enters a text-based query in a web-based search engine, and the related images and the retrieval time are computed. Main Findings: This study compares the accuracy and retrieval time for the requested image. After detailed analysis, the main finding is that none of the web search engines used (Flickr, Pixabay, Shutterstock, Bing, and Everypixel) retrieved accurately related images for the entered query. Implications: This study discusses and performs a comparative analysis of various content-based image retrieval techniques from cyberspace. Novelty of Study: The research community has been making efforts towards efficient retrieval of useful images from the web, but this problem has not been solved and remains an open research challenge. This study makes some efforts to address this research challenge and performs a comparative analysis of the outcomes of various web search engines.

Web Mining for Public E-Services Personalization

Human Computer Interaction | 10.4018/978-1-87828-991-9.ch068 | 2009 | pp. 1079-1086 | Cited By ~ 1

Author(s): Penelope Markellou, Angeliki Panayiotaki, Athanasios Tsakalidis

Keyword(s): Web Site, Web Mining, Web Usage Mining, Web Content, Web Personalization, Business Transactions, Web Structure, Web Structure Mining, Content Mining, The Web


Web Mining for Public E-Services Personalization

Encyclopedia of Digital Government | 10.4018/978-1-59140-789-8.ch251 | 2011 | pp. 1629-1634

Author(s): P. Markellou

Keyword(s): Web Site, Web Mining, Web Usage Mining, Web Content, Web Personalization, Business Transactions, Web Structure, Web Structure Mining, Content Mining, The Web

