MapReduce Based Information Retrieval Algorithms for Efficient Ranking of Webpages

2011 ◽  
Vol 1 (4) ◽  
pp. 23-37 ◽  
Author(s):  
K.G. Srinivasa ◽  
Anil Kumar Muppalla ◽  
Varun A. Bharghava ◽  
M. Amulya

In this paper, the authors discuss the MapReduce implementation of crawler, indexer, and ranking algorithms in search engines. The proposed algorithms are used in search engines to retrieve results from the World Wide Web. A crawler and an indexer in a MapReduce environment are used to improve the speed of crawling and indexing. The proposed ranking algorithm is an iterative method that exploits the link structure of the Web and is developed using the MapReduce framework to speed up the convergence of Web page ranking. Categorization is used to retrieve and order the results according to the user's choice, personalizing the search. The paper also introduces a new score, associated with each Web page, that is calculated from the user's query and the number of occurrences of the query terms in the document corpus. The experiments are conducted on Web graph datasets, and the results are compared with serial versions of the crawler, indexer, and ranking algorithms.
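The abstract describes the ranking step only at a high level. As a rough illustration of the general technique, the sketch below shows one PageRank-style iteration expressed as map and reduce phases over an adjacency list; the function names, the damping factor, and the in-memory driver are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional PageRank damping factor (assumed, not from the paper)

def map_phase(graph):
    """Emit each page's rank share along its outlinks, plus the link structure."""
    for page, (rank, outlinks) in graph.items():
        yield page, ('links', outlinks)            # pass the graph through
        for target in outlinks:                    # dangling pages emit no shares here
            yield target, ('share', rank / len(outlinks))

def reduce_phase(pairs, n_pages):
    """Sum the incoming shares per page and apply the damping factor."""
    links, shares = {}, defaultdict(float)
    for page, (tag, value) in pairs:
        if tag == 'links':
            links[page] = value
        else:
            shares[page] += value
    return {p: ((1 - DAMPING) / n_pages + DAMPING * shares[p], links.get(p, []))
            for p in links}

# One iteration on a toy three-page graph; an iterative algorithm of this kind
# repeats the map/reduce cycle until the ranks converge.
graph = {'a': (1 / 3, ['b', 'c']), 'b': (1 / 3, ['c']), 'c': (1 / 3, ['a'])}
graph = reduce_phase(list(map_phase(graph)), n_pages=len(graph))
```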


Author(s):  
Punam Bedi ◽  
Neha Gupta ◽  
Vinita Jindal

The World Wide Web is the part of the Internet that provides a data dissemination facility to people. The contents of the Web are crawled and indexed by search engines so that they can be retrieved, ranked, and displayed in response to users' search queries. The contents that can be easily retrieved using Web browsers and search engines comprise the Surface Web. All information that cannot be crawled by search engines' crawlers falls under the Deep Web. Deep Web content never appears in the results displayed by search engines; though this part of the Web remains hidden, it can be reached using targeted search over normal Web browsers. Unlike the Deep Web, there exists a portion of the World Wide Web that cannot be accessed without special software. This is known as the Dark Web. This chapter describes how the Dark Web differs from the Deep Web and elaborates on the software commonly used to enter the Dark Web. It highlights the illegitimate and legitimate sides of the Dark Web and specifies the role played by cryptocurrencies in the expansion of the Dark Web's user base.


Author(s):  
R. Subhashini ◽  
V.Jawahar Senthil Kumar

The World Wide Web is a large, distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today's most advanced engines use the keyword-based ("bag of words") paradigm, which has inherent disadvantages. Organizing Web search results into clusters facilitates the user's quick browsing of the results, but traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach for clustering Web search results based on a phrase-based clustering algorithm, as an alternative to the single ordered result list of search engines: the approach presents a list of clusters to the user. Experimental results verify the method's feasibility and effectiveness.
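The abstract does not reproduce the algorithm itself. The toy sketch below illustrates the general idea behind phrase-based clustering: results that share a frequent phrase are grouped together, and the shared phrase doubles as a readable cluster label. The two-word phrase window, the helper names, and the sample snippets are assumptions for illustration, not the paper's method.

```python
from collections import defaultdict

def phrases(text, n=2):
    """Return the set of n-word phrases occurring in a snippet."""
    words = text.lower().split()
    return {' '.join(words[i:i + n]) for i in range(len(words) - n + 1)}

def cluster_by_phrase(snippets, min_size=2):
    """Group snippet indices by shared phrase; the phrase is the cluster label."""
    buckets = defaultdict(list)
    for idx, snippet in enumerate(snippets):
        for ph in phrases(snippet):
            buckets[ph].append(idx)
    # keep only phrases shared by enough results to form a cluster
    return {ph: ids for ph, ids in buckets.items() if len(ids) >= min_size}

snippets = ["apache hadoop tutorial", "apache hadoop cluster setup",
            "jaguar car review", "jaguar car price"]
print(cluster_by_phrase(snippets))
# e.g. {'apache hadoop': [0, 1], 'jaguar car': [2, 3]}
```

Note that, as in suffix-tree-style clustering, a result may fall into several overlapping clusters; the label comes for free from the shared phrase, which is exactly the readability advantage the paper targets.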


Author(s):  
Bouchra Frikh ◽  
Brahim Ouhbi

The World Wide Web has emerged as the biggest and most popular means of communication and information dissemination. The Web expands every day, and people generally rely on search engines to explore it. Because of its rapid and chaotic growth, the resulting network of information lacks organization and structure, and it is a challenge for service providers to deliver proper, relevant, quality information to Internet users using Web page contents and the hyperlinks between Web pages. This paper analyzes and compares Web page ranking algorithms on various parameters to find their advantages and limitations and to identify the further scope of research in Web page ranking. Six important algorithms are presented and their performances discussed: PageRank, Query-Dependent PageRank, HITS, SALSA, Simultaneous Terms Query-Dependent PageRank (SQD-PageRank), and Onto-SQD-PageRank.
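As a point of reference for one of the surveyed algorithms, the following is a minimal sketch of the HITS hub/authority update, assuming the Web graph is given as a dictionary of outlinks. It is illustrative only and not tied to the paper's experiments.

```python
def hits(graph, iterations=20):
    """Plain HITS power iteration over a {page: [outlinks]} graph."""
    pages = list(graph)
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority of p: sum of hub scores of pages that link to p
        auth = {p: sum(hub[q] for q in pages if p in graph[q]) for p in pages}
        # hub of p: sum of authority scores of pages that p links to
        hub = {p: sum(auth[q] for q in graph[p]) for p in pages}
        # normalize so the scores stay bounded across iterations
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return auth, hub

auth, hub = hits({'a': ['b', 'c'], 'b': ['c'], 'c': ['a']})
```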


2018 ◽  
pp. 742-748
Author(s):  
Viveka Vardhan Jumpala

The Internet, an information superhighway, has practically compressed the world into a cyber colony through various networks and other internets. The development of the Internet led to the emergence of the World Wide Web (WWW) as a common vehicle for communication and instantaneous access to search engines and databases. A search engine is designed to facilitate the search for information on the WWW; search engines are essentially the tools that help find the required information on the Web quickly and in an organized manner. Different search engines do the same job in different ways, thus giving different results for the same query. Search strategies are the new trend on the Web.


Author(s):  
Ali I. El-Dsouky ◽  
Hesham A. Ali ◽  
Rabab Samy Rashed

With the rapid growth of the World Wide Web comes the need for a fast and accurate way to reach the required information. Search engines play an important role in retrieving information for users, and ranking algorithms are an important step in search engines, allowing users to retrieve the pages most relevant to their queries. In this work, the authors present a method that utilizes genealogical information from an ontology to find suitable hierarchical concepts for query extension, and that ranks Web pages based on the semantic relations of the hierarchical concepts related to the query terms. The hierarchical relations of the searched domain (sibling, synonym, and hyponym) are given different weights derived using the AHP method. The approach thus provides a more accurate ranking of documents when compared to three common methods.
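The abstract does not spell out the weighting scheme. The sketch below illustrates the general idea of scoring a document against an ontology-expanded query, where each relation class (exact, synonym, hyponym, sibling) carries a weight that an AHP pairwise-comparison step would supply. The weights, the toy ontology, and the function names are hypothetical placeholders, not the paper's values.

```python
# Hypothetical relation weights; in the paper these would come from AHP
# pairwise comparisons rather than being fixed by hand.
RELATION_WEIGHTS = {'exact': 1.0, 'synonym': 0.7, 'hyponym': 0.5, 'sibling': 0.3}

def expand_query(term):
    """Placeholder ontology lookup: returns {related_term: relation_class}."""
    toy_ontology = {'car': {'car': 'exact', 'automobile': 'synonym',
                            'sedan': 'hyponym', 'truck': 'sibling'}}
    return toy_ontology.get(term, {term: 'exact'})

def semantic_score(document_terms, query_terms):
    """Weighted sum over ontology-expanded query terms found in the document."""
    score = 0.0
    for term in query_terms:
        for related, relation in expand_query(term).items():
            if related in document_terms:
                score += RELATION_WEIGHTS[relation]
    return score

print(semantic_score({'sedan', 'price', 'automobile'}, ['car']))
# 0.7 (synonym) + 0.5 (hyponym) = 1.2
```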


Author(s):  
Leo Tan Wee Hin

The World Wide Web represents one of the most profound developments that has accompanied the evolution of the Internet. It is truly a global library. Information on the Web is increasing exponentially, and mechanisms to extract information from it have become an engaging field of research. While search engines have been doing an admirable job in finding information, the emergence of Web portals has also been a useful development—their distinct advantage lies in their positioning as a one-stop destination for information and services of a particular nature.


Author(s):  
Vijay Kasi ◽  
Radhika Jain

In the context of the Internet, a search engine can be defined as a software program designed to help one access information, documents, and other content on the World Wide Web. The adoption and growth of the Internet in the last decade has been unprecedented. The World Wide Web has always been applauded for its simplicity and ease of use, which is evident from the limited knowledge one requires to build a Web page. The flexible nature of the Internet has enabled its rapid growth and adoption, but this has made it hard to search for relevant information on the Web. The number of Web pages has been increasing at an astronomical pace, from around 2 million registered domains in 1995 to 233 million registered domains in 2004 (Consortium, 2004). The Internet, considered a distributed database of information, has the CRUD (create, retrieve, update, and delete) rule applied to it. While the Internet has been effective at creating, updating, and deleting content, it has considerably lacked in enabling the retrieval of relevant information. After all, there is no point in having a Web page that has little or no visibility on the Web. Since the 1990s, when the first search program was released, we have come a long way in terms of searching for information.

Although we are currently witnessing tremendous growth in search engine technology, the growth of the Internet has overtaken it, leaving the existing search engine technology falling short. When we apply the metrics of relevance, rigor, efficiency, and effectiveness to the search domain, it becomes clear that we have progressed on the rigor and efficiency metrics by utilizing abundant computing power to produce faster searches over a lot of information. Rigor and efficiency are evident in the large number of pages indexed by the leading search engines (Barroso, Dean, & Holzle, 2003). However, more research needs to be done to address the relevance and effectiveness metrics. Users typically type in two to three keywords when searching, only to end up with a search result of thousands of Web pages! This has made it increasingly hard to effectively find any useful, relevant information. Search engines face a number of challenges today requiring them to perform rigorous searches with relevant results efficiently so that they are effective. These challenges include the following ("Search Engines," 2004):

1. The Web is growing at a much faster rate than any present search engine technology can index.
2. Web pages are updated frequently, forcing search engines to revisit them periodically.
3. Dynamically generated Web sites may be slow or difficult to index, or may produce excessive results from a single Web site.
4. Many dynamically generated Web sites cannot be indexed by search engines at all.
5. The commercial interests of a search engine can interfere with the order of relevant results it shows.
6. Content that is behind a firewall or password protected is not accessible to search engines (such as content found in several digital libraries).
7. Some Web sites have started using tricks such as spamdexing and cloaking to manipulate search engines into displaying them as the top results for a set of keywords. This pollutes the search results, with more relevant links being pushed down the result list, and is a consequence of the popularity of Web searches and the business potential search engines can generate today.
8. Search engines index all the content of the Web without any bounds on the sensitivity of the information, which has raised security and privacy flags.

With the above background and challenges in mind, we lay out the article as follows. In the next section, we begin with a discussion of search engine evolution. To facilitate the examination and discussion of the progress of search engine development, we break this discussion into the three generations of search engines. Figure 1 depicts this evolution pictorially and highlights the need for better search engine technologies. Next, we present a brief discussion of the contemporary state of search engine technology and the various types of content searches available today. With this background, the following section documents various concerns about existing search engines, setting the stage for better search engine technology. These concerns include information overload, relevance, representation, and categorization. Finally, we briefly address the research efforts under way to alleviate these concerns and then present our conclusion.


2011 ◽  
pp. 317-342 ◽  
Author(s):  
D. Beneventano

As the use of the World Wide Web has become increasingly widespread, the business of commercial search engines has become a vital and lucrative part of the Web. Search engines are commonplace tools for virtually every user of the Internet, and companies such as Google and Yahoo! have become household names. Semantic search engines try to augment and improve traditional Web search engines by using not just words but concepts and logical relationships. This chapter describes a relevant class of semantic search engines based on a peer-to-peer, mediator-based data integration architecture. The architectural and functional features are presented with respect to two projects involving the authors, SEWASIE and WISDOM. The methodology for creating a two-level ontology and the query processing in the SEWASIE project are fully described.
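To make the mediator pattern described above concrete, here is a minimal, hypothetical sketch: a mediator rewrites a concept-level query into each peer's local vocabulary, fans it out, and merges the answers. The Peer class, the mappings, and all data are invented for illustration and do not reflect the SEWASIE or WISDOM implementations.

```python
class Peer:
    """Toy peer exposing a local keyword search over its own documents."""
    def __init__(self, name, data):
        self.name, self.data = name, data

    def search(self, term):
        return [doc for doc in self.data if term in doc]

def mediate(query_concept, peers, mappings):
    """Translate a shared concept into each peer's local term and merge results."""
    results = []
    for peer in peers:
        local_term = mappings[peer.name].get(query_concept)
        if local_term is not None:          # this peer can answer the concept
            results.extend(peer.search(local_term))
    return sorted(set(results))

peers = [Peer('p1', ['hotel in rome', 'museum hours']),
         Peer('p2', ['albergo a roma'])]
mappings = {'p1': {'accommodation': 'hotel'},
            'p2': {'accommodation': 'albergo'}}
print(mediate('accommodation', peers, mappings))
# ['albergo a roma', 'hotel in rome']
```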

