Improved PageRank Algorithm for Web Structure Mining

2013 ◽  
Vol 10 (9) ◽  
pp. 1969-1976
Author(s):  
Sathya Bama ◽  
M.S.Irfan Ahmed ◽  
A. Saravanan

The Internet continues to grow, and with it the need to improve the quality of services. Web mining is a research area that applies data mining techniques to address this need. With billions of pages on the web, it is a very difficult task for search engines to provide relevant information to users. Web structure mining plays a vital role by ranking web pages in response to a user query, which is the most essential task of web search engines. PageRank, Weighted PageRank, and HITS are the commonly used algorithms in web structure mining for ranking web pages. However, all these algorithms treat all links equally when distributing initial rank scores. In this paper, an improved PageRank algorithm is introduced. The results show that the algorithm performs better than the PageRank algorithm.
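To make the point about equal link treatment concrete, here is a minimal sketch of standard PageRank computed by power iteration. It is not the improved algorithm from this paper, whose details the abstract does not give. The function name `pagerank`, the damping factor of 0.85, the handling of pages without out-links, and the toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration on a dict {page: [out-linked pages]}."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # uniform initial rank scores
    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Every out-link receives the same share: rank[p] / len(outs).
                new[q] += damping * rank[p] / len(outs)
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical four-page web: D links to C, but nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

The line that adds `damping * rank[p] / len(outs)` is where every out-link of a page gets an identical share of its rank; a variant that weights links differently would change only that line.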

Author(s):  
Roderick L. Lee

This chapter presents an overview of web mining. The three areas of web mining (Web content mining, Web usage mining, and Web structure mining) are identified. In this chapter, specific attention is paid to Web structure mining, which is the study of the link topology. The link topology of the Web is analyzed in the context of a cyber-community in order to explore the connection between the link topology and the conferral of authority. Millions, soon to be billions, of people are annotating Web documents, which results in an abundance of information. Herein lies the problem of topic distillation: searching through the sea of documents for relevant information. To address the problem of overabundance and relevancy, models are needed that can assist in creating order at the local level. The hub and spoke model identified in this chapter takes a proactive approach to creating an online community in a centralized or planned fashion and provides control over the architecture of the Web graph. In the end, users can be assured with a certain level of confidence that the Web content contained in a hyperlinked community is both accurate and relevant.


Author(s):  
Bamshad Mobasher

In the span of a decade, the World Wide Web has been transformed from a tool for information sharing among researchers into an indispensable part of everyday activities. This transformation has been characterized by an explosion of heterogeneous data and information available electronically, as well as increasingly complex applications driving a variety of systems for content management, e-commerce, e-learning, collaboration, and other Web services. This tremendous growth, in turn, has necessitated the development of more intelligent tools for end users as well as information providers in order to more effectively extract relevant information or to discover actionable knowledge. From its very beginning, the potential of extracting valuable knowledge from the Web has been quite evident. Web mining (i.e., the application of data mining techniques to extract knowledge from Web content, structure, and usage) is the collection of technologies to fulfill this potential. In this article, we will summarize briefly each of the three primary areas of Web mining (Web usage mining, Web content mining, and Web structure mining) and discuss some of the primary applications in each area.


2019 ◽  
Vol 9 (3) ◽  
pp. 23-47
Author(s):  
Sumita Gupta ◽  
Neelam Duhan ◽  
Poonam Bansal

With the rapid growth of digital information and user needs, it has become imperative to quickly retrieve relevant domain- or topic-specific documents in response to a user query. A focused crawler plays a vital role in digital libraries by crawling the web so that researchers can easily explore the domain-specific search results and find the desired content for a query. In this article, a focused crawler is proposed for online digital library search engines; it considers the meta-data of the query in order to retrieve the corresponding document or other relevant but missing information (e.g., paid publications from ACM, IEEE, etc.) for the user query. Different query strategies are built from the meta-data and submitted to different search engines, with the aim of finding more of the relevant information that is missing. The results returned by these search engines are filtered and then used for further crawling of the Web.


Author(s):  
Mahesh Kumar Singh ◽  
Om Prakash Rishi ◽  
Anukrati Sharma ◽  
Zaved Akhtar

The Internet plays a vital role in doing business. It provides a platform for reaching a huge number of customers and makes business easier. E-business organizations are growing at a rapid, ever-increasing pace, and the World Wide Web (WWW) provides a huge amount of information to Internet users. Users' access behavior is recorded in web logs. This information is very helpful in an e-business environment for analysis and decision making. Mining web data faces many new challenges as the amount of information stored in web logs grows. Search engines play a key role in retrieving relevant information from this huge volume of information. Nowadays, well-known search engines such as Google, MSN, and Yahoo provide users with good search results based on special search strategies. In web search services such as Google, the web page ranker component plays the main role. This paper discusses the new challenges faced by web mining techniques, the ranking of web pages using page ranking algorithms, and their application in e-business analysis to improve business operations.


Author(s):  
Massimiliano Caramia ◽  
Giovanni Felici

In the present chapter we report on some extensions of the work presented in the first edition of the Encyclopedia of Data Mining. In Caramia and Felici (2005) we described a method based on clustering and a heuristic search method, based on a genetic algorithm, to extract pages with relevant information for a specific user query in a thematic search engine. Starting from these results, we have extended the research work to address some issues related to the semantic aspects of the search, focusing on the keywords that are used to establish the similarity among the pages that result from the query. Complete details on this method, here omitted for brevity, can be found in Caramia and Felici (2006). Search engine technologies remain a strong research topic, as new problems and new demands from the market and the users arise. The process of switching from quantity (maintaining and indexing large databases of web pages and quickly selecting pages matching some criterion) to quality (identifying pages with a high quality for the user), already highlighted in Caramia and Felici (2005), has not been interrupted, but has gained further energy, being motivated by the natural evolution of internet users, who are more selective in their choice of search tool and willing to pay the price of providing extra feedback to the system and waiting longer to have their queries better matched. In this framework, several authors have considered the use of data mining and optimization techniques, which are often referred to as web mining (for a recent bibliography on this topic see, e.g., Getoor, Senator, Domingos, and Faloutsos, 2003 and Zaïane, Srivastava, Spiliopoulou, and Masand, 2002).
The work described in this chapter is based on clustering techniques to identify, in the set of pages resulting from a simple query, subsets that are homogeneous with respect to a vectorization based on context or profile; then, a number of small and potentially good subsets of pages are constructed by extracting from each cluster the pages with higher scores. Operating on these subsets with a genetic algorithm, a subset with a good overall score and a high internal dissimilarity is identified. A related problem is then considered: the selection of a subset of pages that are compliant with the search keywords, but that are also characterized by the fact that they share a large subset of words different from the search keywords. This characteristic represents a sort of semantic connection among these pages that may be of use to spot some particular aspects of the information present in the pages. Such a task is accomplished by the construction of a special graph, whose maximum-weight clique and k-densest subgraph should represent the page subsets with the desired properties. In the following we summarize the main background topics and provide a synthetic description of the methods. Interested readers may find additional information in Caramia and Felici (2004), Caramia and Felici (2005), and Caramia and Felici (2006).
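The graph idea in the last paragraph can be illustrated with a small sketch. The abstract does not say how the special graph is weighted, so the following is an assumption made only for illustration: nodes are pages, an edge joins two pages that share at least one word other than the search keywords, and the edge weight is the number of such shared words. The function names `shared_word_graph` and `max_weight_clique`, the exhaustive search, and the sample pages are hypothetical and are not the authors' method.

```python
from itertools import combinations


def shared_word_graph(pages, keywords):
    """Edge weight = number of words two pages share, excluding the search keywords."""
    kw = {k.lower() for k in keywords}
    vocab = {name: {w.lower() for w in text.split()} - kw
             for name, text in pages.items()}
    graph = {}
    for a, b in combinations(sorted(vocab), 2):
        common = len(vocab[a] & vocab[b])
        if common:
            graph[(a, b)] = common  # keys are ordered pairs with a < b
    return graph


def max_weight_clique(graph):
    """Exhaustive search for the clique with the largest total edge weight.

    Only practical for a handful of pages, since the problem is NP-hard.
    """
    nodes = sorted({n for edge in graph for n in edge})
    best, best_weight = (), 0
    for size in range(2, len(nodes) + 1):
        for subset in combinations(nodes, size):
            pairs = list(combinations(subset, 2))
            if all(p in graph for p in pairs):  # every pair must be connected
                weight = sum(graph[p] for p in pairs)
                if weight > best_weight:
                    best, best_weight = subset, weight
    return best, best_weight


# Hypothetical result pages for the query "jaguar".
pages = {
    "p1": "jaguar speed habitat rainforest predator",
    "p2": "jaguar habitat rainforest conservation predator",
    "p3": "jaguar engine speed luxury sedan",
    "p4": "jaguar rainforest predator prey habitat",
}
graph = shared_word_graph(pages, ["jaguar"])
clique, weight = max_weight_clique(graph)
print(clique, weight)
```

In this toy input, the three pages about the animal form the heaviest clique, while the page about the car shares only one non-keyword word with a single other page and is left out.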


2010 ◽  
pp. 751-758
Author(s):  
P. Markellou

Over the last decade, we have witnessed an explosive growth in the information available on the Web. Today, Web browsers provide easy access to myriad sources of text and multimedia data. Search engines index more than a billion pages, and finding the desired information is not an easy task. This profusion of resources has prompted the need for developing automatic mining techniques on the Web, thereby giving rise to the term “Web mining” (Pal, Talwar, & Mitra, 2002). Web mining is the application of data mining techniques on the Web for discovering useful patterns and can be divided into three basic categories: Web content mining, Web structure mining, and Web usage mining. Web content mining includes techniques for assisting users in locating Web documents (i.e., pages) that meet certain criteria, while Web structure mining relates to discovering information based on the Web site structure data (the data depicting the Web site map). Web usage mining focuses on analyzing Web access logs and other sources of information regarding user interactions within the Web site in order to capture, understand, and model users' behavioral patterns and profiles and thereby improve their experience with the Web site. As citizens' requirements and needs change continuously, traditional information searching and fulfillment of various tasks result in the loss of valuable time spent identifying the responsible actor (public authority) and waiting in queues. At the same time, the percentage of users who are acquainted with the Internet has increased remarkably (Internet World Stats, 2005). These two facts motivate many governmental organizations to proceed with the provision of e-services via their Web sites. The ease and speed with which business transactions can be carried out over the Web has been a key driving force in the rapid growth and popularity of e-government, e-commerce, and e-business applications.
In this framework, the Web is emerging as the appropriate environment for business transactions and user-organization interactions. However, since it is a large collection of semi-structured and structured information sources, Web users often suffer from information overload. Personalization is considered a popular solution to alleviate this problem and to customize the Web environment to users (Eirinaki & Vazirgiannis, 2003). Web personalization can be described as any action that makes the Web experience of a user personalized to his or her needs and wishes. Principal elements of Web personalization include modeling of Web objects (pages) and subjects (users), categorization of objects and subjects, matching between and across objects and/or subjects, and determination of the set of actions to be recommended for personalization. In the remainder of this article, we present the way an e-government application can deploy Web mining techniques in order to support intelligent and personalized interactions with citizens. Specifically, we describe the tasks that typically comprise this process, illustrate the future trends, and discuss the open issues in the field.


2020 ◽  
Vol 17 (11) ◽  
pp. 5113-5116
Author(s):  
Varun Malik ◽  
Vikas Rattan ◽  
Jaiteg Singh ◽  
Ruchi Mittal ◽  
Urvashi Tandon

Web usage mining is the branch of web mining that deals with mining usage data collected over the web. Web mining can be categorized into web content mining, web structure mining, and web usage mining. In this paper, we summarize the web usage mining results obtained with the user tool WMOT (web mining optimized tool), which is based on the WEKA tool and has been used to apply various classification algorithms such as Naïve Bayes, KNN, SVM, and tree-based algorithms. The authors summarize the results of the classification algorithms on the WMOT tool, compare them on the basis of classified instances, and identify the algorithms that give better instance accuracy.


Author(s):  
Rajeev Gupta ◽  
Virender Singh

Purpose: With the growing popularity and use of digital images in various domains, existing image retrieval techniques need to be enhanced. Content-based image retrieval (CBIR) plays a vital role in retrieving requested data from databases available in cyberspace. CBIR from cyberspace is currently a popular and interesting research area. Accurately searching for and downloading requested images from cyberspace based on meta-data using CBIR techniques is a challenging task. The purpose of this study is to explore the various image retrieval techniques for retrieving data available in cyberspace. Methodology: Whenever a user wishes to retrieve an image from the web using present search engines, a set of images is retrieved based on the user query, but most of the resulting images are unrelated to that query. Here, the user enters a text-based query into a web-based search engine, and the related images and retrieval time are computed. Main Findings: This study compares the accuracy and retrieval time for the requested image. After detailed analysis, the main finding is that none of the web search engines used (Flickr, Pixabay, Shutterstock, Bing, and Everypixel) retrieved accurately related images for the entered query. Implications: This study discusses and performs a comparative analysis of various content-based image retrieval techniques from cyberspace. Novelty of Study: The research community has been making efforts towards efficient retrieval of useful images from the web, but this problem has not been solved and remains an open research challenge. This study makes some efforts to address this research challenge and performs a comparative analysis of the outcomes of various web search engines.


2009 ◽  
pp. 1079-1086
Author(s):  
Penelope Markellou ◽  
Angeliki Panayiotaki ◽  
Athanasios Tsakalidis

Over the last decade, we have witnessed an explosive growth in the information available on the Web. Today, Web browsers provide easy access to myriad sources of text and multimedia data. Search engines index more than a billion pages, and finding the desired information is not an easy task. This profusion of resources has prompted the need for developing automatic mining techniques on the Web, thereby giving rise to the term “Web mining” (Pal, Talwar, & Mitra, 2002). Web mining is the application of data mining techniques on the Web for discovering useful patterns and can be divided into three basic categories: Web content mining, Web structure mining, and Web usage mining. Web content mining includes techniques for assisting users in locating Web documents (i.e., pages) that meet certain criteria, while Web structure mining relates to discovering information based on the Web site structure data (the data depicting the Web site map). Web usage mining focuses on analyzing Web access logs and other sources of information regarding user interactions within the Web site in order to capture, understand, and model users' behavioral patterns and profiles and thereby improve their experience with the Web site. As citizens' requirements and needs change continuously, traditional information searching and fulfillment of various tasks result in the loss of valuable time spent identifying the responsible actor (public authority) and waiting in queues. At the same time, the percentage of users who are acquainted with the Internet has increased remarkably (Internet World Stats, 2005). These two facts motivate many governmental organizations to proceed with the provision of e-services via their Web sites. The ease and speed with which business transactions can be carried out over the Web has been a key driving force in the rapid growth and popularity of e-government, e-commerce, and e-business applications.
In this framework, the Web is emerging as the appropriate environment for business transactions and user-organization interactions. However, since it is a large collection of semi-structured and structured information sources, Web users often suffer from information overload. Personalization is considered a popular solution to alleviate this problem and to customize the Web environment to users (Eirinaki & Vazirgiannis, 2003). Web personalization can be described as any action that makes the Web experience of a user personalized to his or her needs and wishes. Principal elements of Web personalization include modeling of Web objects (pages) and subjects (users), categorization of objects and subjects, matching between and across objects and/or subjects, and determination of the set of actions to be recommended for personalization. In the remainder of this article, we present the way an e-government application can deploy Web mining techniques in order to support intelligent and personalized interactions with citizens. Specifically, we describe the tasks that typically comprise this process, illustrate the future trends, and discuss the open issues in the field.



