Ranking Documents Based on the Semantic Relations Using Analytical Hierarchy Process

2017 ◽  
Vol 7 (3) ◽  
pp. 22-37 ◽  
Author(s):  
Ali I. El-Dsouky ◽  
Hesham A. Ali ◽  
Rabab Samy Rashed

With the rapid growth of the World Wide Web comes the need for a fast and accurate way to reach the required information. Search engines play an important role in retrieving that information, and ranking algorithms are a key step in returning the pages most relevant to a user's query. In this work, the authors present a method that uses hierarchical (genealogical) information from an ontology to find suitable concepts for query expansion and ranks web pages based on the semantic relations between those concepts and the query terms. The hierarchical relations of the searched domain (siblings, synonyms, and hyponyms) are given different weights derived with the Analytic Hierarchy Process (AHP). Compared with three common methods, the approach yields more accurate document rankings.
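The abstract does not give the authors' actual pairwise judgments, so the sketch below is only illustrative: it derives AHP weights for the three relation types (synonyms, hyponyms, siblings) from an assumed comparison matrix and uses them to score a document by its weighted term matches. The matrix values, the consistency threshold, and the `score` representation are all assumptions, not the paper's formulation.

```python
import numpy as np

# Hypothetical pairwise-comparison matrix on Saaty's 1-9 scale.
# Rows/columns: [synonym, hyponym, sibling]; e.g. synonym relations are
# judged 3x as important as hyponyms and 5x as important as siblings.
A = np.array([
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 3.0],
    [1.0 / 5.0, 1.0 / 3.0, 1.0],
])

# The principal eigenvector of A gives the relative weights.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = eigvecs[:, k].real
weights = weights / weights.sum()          # roughly [0.64, 0.26, 0.10]

# Consistency check (random index RI = 0.58 for a 3x3 matrix).
ci = (eigvals.real[k] - 3) / (3 - 1)
assert ci / 0.58 < 0.1, "pairwise judgments are too inconsistent"

def score(relation_matches):
    """relation_matches: counts of expanded-query term matches per relation
    type, ordered [synonym, hyponym, sibling] -- an assumed representation."""
    return float(np.dot(weights, relation_matches))

# Documents with more synonym matches rank higher under these weights.
print(score([4, 2, 1]), score([1, 2, 4]))
```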


2011 ◽  
Vol 1 (4) ◽  
pp. 23-37 ◽  
Author(s):  
K.G. Srinivasa ◽  
Anil Kumar Muppalla ◽  
Varun A. Bharghava ◽  
M. Amulya

In this paper, the authors discuss MapReduce implementations of the crawler, indexer, and ranking algorithms used in search engines to retrieve results from the World Wide Web. Running the crawler and indexer in a MapReduce environment improves the speed of crawling and indexing. The proposed ranking algorithm is an iterative method that exploits the link structure of the Web and is developed on the MapReduce framework to speed up convergence when ranking web pages. Categorization is used to retrieve and order the results according to the user's preferences, personalizing the search. The paper also introduces a new score associated with each web page, computed from the user's query and the number of occurrences of the query terms in the document corpus. Experiments are conducted on Web graph datasets, and the results are compared with serial versions of the crawler, indexer, and ranking algorithms.
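The abstract does not spell out the ranking formula; a PageRank-style iteration is the standard instance of an iterative, link-structure-based ranking expressed in MapReduce. The toy sketch below (in-memory Python, hypothetical four-page graph) shows only the map/reduce decomposition of one such iteration, not the authors' implementation.

```python
from collections import defaultdict

DAMPING, N = 0.85, 4  # N = number of pages in the toy graph

def map_phase(page, rank, out_links):
    """Emit rank shares to each out-link and re-emit the link structure."""
    yield page, ("links", out_links)
    for target in out_links:
        yield target, ("share", rank / len(out_links))

def reduce_phase(page, values):
    """Sum incoming shares and apply the damping factor."""
    links, total = [], 0.0
    for kind, value in values:
        if kind == "links":
            links = value
        else:
            total += value
    return page, (1 - DAMPING) / N + DAMPING * total, links

# Toy web graph: page -> (current rank, out-links)
graph = {"A": (0.25, ["B", "C"]), "B": (0.25, ["C"]),
         "C": (0.25, ["A"]), "D": (0.25, ["C"])}

for _ in range(10):  # iterate until ranks converge
    grouped = defaultdict(list)
    for page, (rank, links) in graph.items():
        for key, value in map_phase(page, rank, links):
            grouped[key].append(value)
    graph = {page: (rank, links)
             for page, rank, links in
             (reduce_phase(p, vals) for p, vals in grouped.items())}

print(sorted(graph.items(), key=lambda kv: -kv[1][0]))
```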


Author(s):  
Bouchra Frikh ◽  
Brahim Ouhbi

The World Wide Web has emerged as the biggest and most popular medium for communication and information dissemination. The Web expands every day, and people generally rely on search engines to explore it. Because of its rapid and chaotic growth, the resulting network of information lacks organization and structure, and it is a challenge for service providers to deliver proper, relevant, high-quality information to Internet users using web page contents and the hyperlinks between pages. This paper analyzes and compares web page ranking algorithms along various parameters to identify their advantages and limitations and to outline further research directions in web page ranking. Six important algorithms are presented and their performance is discussed: PageRank, Query-Dependent PageRank, HITS, SALSA, Simultaneous Terms Query-Dependent PageRank (SQD-PageRank), and Onto-SQD-PageRank.
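To make the link-analysis family being compared concrete, here is a minimal sketch of the HITS hub/authority iteration on a hypothetical four-page graph; the graph, iteration count, and normalization are illustrative choices, not taken from the paper.

```python
import numpy as np

pages = ["A", "B", "C", "D"]
# adjacency[i][j] = 1 if page i links to page j (toy example)
adjacency = np.array([[0, 1, 1, 0],
                      [0, 0, 1, 0],
                      [1, 0, 0, 1],
                      [0, 0, 1, 0]], dtype=float)

hubs = np.ones(len(pages))
auths = np.ones(len(pages))
for _ in range(50):
    auths = adjacency.T @ hubs      # authority = sum of hub scores linking in
    hubs = adjacency @ auths        # hub = sum of authority scores linked to
    auths /= np.linalg.norm(auths)  # normalize to keep values bounded
    hubs /= np.linalg.norm(hubs)

for name, a, h in zip(pages, auths, hubs):
    print(f"{name}: authority={a:.3f} hub={h:.3f}")
```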


Author(s):  
Vijay Kasi ◽  
Radhika Jain

In the context of the Internet, a search engine can be defined as a software program designed to help one access information, documents, and other content on the World Wide Web. The adoption and growth of the Internet in the last decade has been unprecedented. The World Wide Web has always been applauded for its simplicity and ease of use, which is evident in how little knowledge one needs to build a Web page. The flexible nature of the Internet has enabled its rapid growth and adoption, but it has also made it hard to search for relevant information on the Web. The number of Web pages has been increasing at an astronomical pace, from around 2 million registered domains in 1995 to 233 million registered domains in 2004 (Consortium, 2004). The Internet, considered a distributed database of information, has the CRUD (create, retrieve, update, and delete) rule applied to it. While the Internet has been effective at creating, updating, and deleting content, it has considerably lagged in enabling the retrieval of relevant information. After all, there is no point in having a Web page that has little or no visibility on the Web. Since the 1990s, when the first search program was released, we have come a long way in searching for information. Although we are currently witnessing tremendous growth in search engine technology, the growth of the Internet has overtaken it, leaving existing search engine technology falling short. When we apply the metrics of relevance, rigor, efficiency, and effectiveness to the search domain, it becomes clear that we have progressed on the rigor and efficiency metrics by using abundant computing power to produce faster searches over large amounts of information. Rigor and efficiency are evident in the large number of pages indexed by the leading search engines (Barroso, Dean, & Holzle, 2003). However, more research needs to be done to address the relevance and effectiveness metrics. Users typically type in two to three keywords when searching, only to end up with a search result of thousands of Web pages, which makes it increasingly hard to find useful, relevant information. Search engines today must perform rigorous searches that return relevant results efficiently in order to be effective. These challenges include the following (“Search Engines,” 2004):

1. The Web is growing at a much faster rate than any present search engine technology can index.
2. Web pages are updated frequently, forcing search engines to revisit them periodically.
3. Dynamically generated Web sites may be slow or difficult to index, or may produce excessive results from a single Web site.
4. Many dynamically generated Web sites cannot be indexed by search engines.
5. The commercial interests of a search engine can interfere with the order of relevant results it shows.
6. Content that is behind a firewall or password protected is not accessible to search engines (such as content in several digital libraries).
7. Some Web sites use tricks such as spamdexing and cloaking to manipulate search engines into displaying them as top results for a set of keywords. This pollutes the search results, pushing more relevant links down the result list, and is a consequence of the popularity of Web searches and the business potential search engines can generate today.
8. Search engines index all the content of the Web without any bounds on the sensitivity of information, which has raised security and privacy flags.

With the above background and challenges in mind, we lay out the article as follows. In the next section, we begin with a discussion of search engine evolution, broken down into three generations of search engines to facilitate examination of the progress of search engine development. Figure 1 depicts this evolution pictorially and highlights the need for better search engine technologies. Next, we present a brief discussion of the contemporary state of search engine technology and the various types of content searches available today. With this background, the following section documents various concerns about existing search engines, setting the stage for better search engine technology; these concerns include information overload, relevance, representation, and categorization. Finally, we briefly address the research efforts under way to alleviate these concerns and present our conclusion.


2015 ◽  
Vol 11 (4) ◽  
pp. 16-36 ◽  
Author(s):  
G. Sumathi ◽  
S. Sendhilkumar ◽  
G.S. Mahalakshmi

The World Wide Web comprises billions of web pages and a tremendous amount of information accessible within them. To retrieve the required data from the World Wide Web, search engines perform a number of tasks depending on their respective architectures. When a user submits a query, the search engine typically returns a large number of pages related to it. To help users navigate the returned list, various ranking techniques are applied to the search results. Most ranking algorithms in the related work are either link based or content based, and existing approaches do not consider user access patterns. In this paper, a page ranking approach, the Weighted Page Rank Score Algorithm with user access, is proposed for search engines; it builds on the weighted PageRank method and takes the user access period of web pages into account. For this purpose, web users are clustered using the Particle Swarm Optimization (PSO) approach, and from those clusters the pages are ranked by extending the weighted PageRank approach with a usage-based parameter, the user access period. The algorithm is used to find pages more relevant to the user's query. This makes it possible to show the most relevant pages at the top of the search list based on user search behavior, which substantially shrinks the search space.
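The abstract does not state how the user access period enters the weighted PageRank formula, so the sketch below is an assumption: it scales each link's rank share by the relative access period of the target page. The toy graph, access times, and damping factor are all hypothetical.

```python
pages = ["A", "B", "C"]
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
access_period = {"A": 30.0, "B": 120.0, "C": 60.0}  # seconds, hypothetical

d, n = 0.85, len(pages)
rank = {p: 1.0 / n for p in pages}

def link_weight(src, dst):
    # Share of src's rank passed to dst, biased toward targets users
    # spend more time on (relative to src's other out-links).
    total = sum(access_period[t] for t in links[src])
    return access_period[dst] / total

for _ in range(30):
    new_rank = {}
    for p in pages:
        incoming = sum(rank[q] * link_weight(q, p)
                       for q in pages if p in links[q])
        new_rank[p] = (1 - d) / n + d * incoming
    rank = new_rank

print(sorted(rank.items(), key=lambda kv: -kv[1]))
```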


2003 ◽  
Vol 92 (3_suppl) ◽  
pp. 1091-1096 ◽  
Author(s):  
Nobuhiko Fujihara ◽  
Asako Miura

The influence of task type on searches of the World Wide Web using search engines, without limitation of the search domain, was investigated. Nine graduate and undergraduate students studying psychology (1 woman and 8 men, M age = 25.0 yr., SD = 2.1) participated. Their performance with the search engines on a closed task with only one answer was compared with their performance on an open task with several possible answers. Analysis showed that the number of actions was larger for the closed task (M = 91) than for the open task (M = 46.1). Behaviors such as selection of keywords (on average 7.9% of all actions for the closed task and 16.7% for the open task) and pressing the browser's back button (on average 40.3% of all actions for the closed task and 29.6% for the open task) also differed. On the other hand, behaviors such as selection of hyperlinks, pressing the home button, and the number of browsed pages were similar for both tasks. Search behaviors were influenced by task type when the students searched for information without limitations placed on the information sources.


2002 ◽  
Vol 7 (1) ◽  
pp. 9-25 ◽  
Author(s):  
Moses Boudourides ◽  
Gerasimos Antypas

In this paper we present a simple simulation of the World Wide Web, in which one observes the appearance of web pages belonging to different web sites, covering a number of different thematic topics and possessing links to other web pages. The goal of our simulation is to reproduce the form of the observed World Wide Web and of its growth using a small number of simple assumptions. In our simulation, existing web pages may generate new ones as follows. First, each web page is equipped with a topic concerning its contents. Second, links between web pages are established according to common topics. Next, new web pages may be randomly generated and subsequently equipped with a topic and assigned to web sites. By repeated iteration of these rules, our simulation appears to exhibit the observed structure of the World Wide Web and, in particular, a power-law type of growth. In order to visualize the network of web pages, we have followed N. Gilbert's (1997) methodology of scientometric simulation, assuming that web pages can be represented by points in the plane. Furthermore, the simulated graph is found to possess the small-world property, as is the case with a large number of other complex networks.
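As a rough illustration of the growth mechanism described above, the sketch below adds pages with random topics and links each new page to existing same-topic pages. The degree-biased choice of link targets is an assumed rule used here to produce the heavy-tailed (power-law-like) in-degree distribution the authors report; all parameters are illustrative, not taken from the paper.

```python
import random
from collections import Counter

random.seed(0)
TOPICS, STEPS, LINKS_PER_PAGE = 5, 2000, 3

# Start with a handful of seed pages, each carrying a random topic.
topics = [random.randrange(TOPICS) for _ in range(10)]
in_deg = [0] * 10                      # incoming-link counts per page

for _ in range(STEPS):
    topic = random.randrange(TOPICS)   # each new page gets a random topic
    candidates = [i for i, t in enumerate(topics) if t == topic]
    if candidates:
        # Link only to same-topic pages, favouring already well-linked ones.
        weights = [1 + in_deg[i] for i in candidates]
        chosen = set(random.choices(candidates, weights=weights,
                                    k=min(LINKS_PER_PAGE, len(candidates))))
        for i in chosen:
            in_deg[i] += 1
    topics.append(topic)
    in_deg.append(0)

# The in-degree histogram has a heavy tail: a few pages attract most links.
hist = Counter(in_deg)
print(sorted(hist.items(), key=lambda kv: -kv[0])[:10])
```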


Author(s):  
Punam Bedi ◽  
Neha Gupta ◽  
Vinita Jindal

The World Wide Web is a part of the Internet that provides data dissemination facilities to people. The contents of the Web are crawled and indexed by search engines so that they can be retrieved, ranked, and displayed in response to users' search queries. The contents that can be easily retrieved using Web browsers and search engines comprise the Surface Web, while all information that cannot be crawled by search engines' crawlers falls under the Deep Web. Deep Web content never appears in the results displayed by search engines; though this part of the Web remains hidden, it can be reached using targeted search over normal Web browsers. Unlike the Deep Web, there exists a portion of the World Wide Web that cannot be accessed without special software: the Dark Web. This chapter describes how the Dark Web differs from the Deep Web and elaborates on the software commonly used to enter the Dark Web. It highlights the illegitimate and legitimate sides of the Dark Web and specifies the role played by cryptocurrencies in the expansion of the Dark Web's user base.

