Modern Search Engine Techniques

The World Wide Web has emerged to become the biggest and most popular way of communication and information dissemination. Every day, the Web is expending and people generally rely on search engine to explore the web. Because of its rapid and chaotic growth, the resulting network of information lacks of organization and structure. It is a challenge for service provider to provide proper, relevant and quality information to the internet users by using the web page contents and hyperlinks between web pages. This paper deals with analysis and comparison of web pages ranking algorithms based on various parameters to find out their advantages and limitations for ranking web pages and to give the further scope of research in web pages ranking algorithms. Six important algorithms: the Page Rank, Query Dependent-PageRank, HITS, SALSA, Simultaneous Terms Query Dependent-PageRank (SQD-PageRank) and Onto-SQD-PageRank are presented and their performances are discussed.

Download Full-text

Eccentric Methodology with Optimization to Unearth Hidden Facts of Search Engine Result Pages

Recent Patents on Computer Science ◽

10.2174/2213275911666181115093050 ◽

2019 ◽

Vol 12 (2) ◽

pp. 110-119 ◽

Cited By ~ 3

Author(s):

Jayaraman Sethuraman ◽

Jafar A. Alzubi ◽

Ramachandran Manikandan ◽

Mehdi Gheisari ◽

Ambeshwar Kumar

Keyword(s):

Search Engine ◽

World Wide ◽

Optimization Techniques ◽

Web Pages ◽

Web Page ◽

Search Engine Optimization ◽

The World ◽

Search Engine Result ◽

The Web ◽

New Framework

Background: The World Wide Web houses an abundance of information that is used every day by billions of users across the world to find relevant data. Website owners employ webmasters to ensure their pages are ranked top in search engine result pages. However, understanding how the search engine ranks a website, which comprises numerous web pages, as the top ten or twenty websites is a major challenge. Although systems have been developed to understand the ranking process, a specialized tool based approach has not been tried. Objective: This paper develops a new framework and system that process website contents to determine search engine optimization factors. Methods: To analyze the web page dynamically by assessing the web site content based on specific keywords, elimination method was used in an attempt to reveal various search engine optimization techniques. Conclusion: Our results lead to conclude that the developed system is able to perform a deeper analysis and find factors which play a role in bringing the site on the top of the list.

Download Full-text

Filtering Method for the Annotated and Non-Annotated Web Pages

International Journal of Knowledge Society Research ◽

10.4018/ijksr.2017010101 ◽

2017 ◽

Vol 8 (1) ◽

pp. 1-22 ◽

Cited By ~ 1

Author(s):

Sahar Maâlej Dammak ◽

Anis Jedidi ◽

Rafik Bouaziz

Keyword(s):

Search Engine ◽

Semantic Annotation ◽

Web Pages ◽

Filtering Method ◽

Annotation Process ◽

The World ◽

Multiple Domains ◽

The Web ◽

Great Mass

With the great mass of the pages managed through the world, and especially with the advent of the Web, it has become more difficult to find the relevant pages after an interrogation. Furthermore, the manual filtering of the indexed Web pages is a laborious task. A new filtering method of the annotated Web pages (by our semantic annotation process) and the non-annotated Web pages (retrieved from search engine “Google”) is then necessary to group the relevant Web pages for the user. In this paper, the authors will first synthesize their previous work of the semantic annotation of Web pages. Then, they will define a new filtering method based on three activities. The authors will also present their querying and filtering component of Web pages; their purpose is to demonstrate the feasibility of the filtering method. Finally, the authors will present an evaluation of this component, which has proved its performance for multiple domains.

Download Full-text

Filtering Method for the Annotated and Non-Annotated Web Pages

Information Retrieval and Management ◽

10.4018/978-1-5225-5191-1.ch057 ◽

2018 ◽

pp. 1300-1322

Author(s):

Sahar Maâlej Dammak ◽

Anis Jedidi ◽

Rafik Bouaziz

Keyword(s):

Search Engine ◽

Semantic Annotation ◽

Web Pages ◽

Filtering Method ◽

Annotation Process ◽

The World ◽

Multiple Domains ◽

The Web ◽

Great Mass

With the great mass of the pages managed through the world, and especially with the advent of the Web, it has become more difficult to find the relevant pages after an interrogation. Furthermore, the manual filtering of the indexed Web pages is a laborious task. A new filtering method of the annotated Web pages (by our semantic annotation process) and the non-annotated Web pages (retrieved from search engine “Google”) is then necessary to group the relevant Web pages for the user. In this paper, the authors will first synthesize their previous work of the semantic annotation of Web pages. Then, they will define a new filtering method based on three activities. The authors will also present their querying and filtering component of Web pages; their purpose is to demonstrate the feasibility of the filtering method. Finally, the authors will present an evaluation of this component, which has proved its performance for multiple domains.

Download Full-text

Internet Search Engines

Encyclopedia of E-Commerce, E-Government, and Mobile Commerce ◽

10.4018/978-1-59140-799-7.ch108 ◽

2011 ◽

pp. 672-677

Author(s):

Vijay Kasi ◽

Radhika Jain

Keyword(s):

Search Engine ◽

Web Sites ◽

Search Engines ◽

World Wide ◽

Relevant Information ◽

The Internet ◽

Web Pages ◽

Web Page ◽

The World ◽

The Web

In the context of the Internet, a search engine can be defined as a software program designed to help one access information, documents, and other content on the World Wide Web. The adoption and growth of the Internet in the last decade has been unprecedented. The World Wide Web has always been applauded for its simplicity and ease of use. This is evident looking at the extent of the knowledge one requires to build a Web page. The flexible nature of the Internet has enabled the rapid growth and adoption of it, making it hard to search for relevant information on the Web. The number of Web pages has been increasing at an astronomical pace, from around 2 million registered domains in 1995 to 233 million registered domains in 2004 (Consortium, 2004). The Internet, considered a distributed database of information, has the CRUD (create, retrieve, update, and delete) rule applied to it. While the Internet has been effective at creating, updating, and deleting content, it has considerably lacked in enabling the retrieval of relevant information. After all, there is no point in having a Web page that has little or no visibility on the Web. Since the 1990s when the first search program was released, we have come a long way in terms of searching for information. Although we are currently witnessing a tremendous growth in search engine technology, the growth of the Internet has overtaken it, leading to a state in which the existing search engine technology is falling short. When we apply the metrics of relevance, rigor, efficiency, and effectiveness to the search domain, it becomes very clear that we have progressed on the rigor and efficiency metrics by utilizing abundant computing power to produce faster searches with a lot of information. Rigor and efficiency are evident in the large number of indexed pages by the leading search engines (Barroso, Dean, & Holzle, 2003). However, more research needs to be done to address the relevance and effectiveness metrics. Users typically type in two to three keywords when searching, only to end up with a search result having thousands of Web pages! This has made it increasingly hard to effectively find any useful, relevant information. Search engines face a number of challenges today requiring them to perform rigorous searches with relevant results efficiently so that they are effective. These challenges include the following (“Search Engines,” 2004). 1. The Web is growing at a much faster rate than any present search engine technology can index. 2. Web pages are updated frequently, forcing search engines to revisit them periodically. 3. Dynamically generated Web sites may be slow or difficult to index, or may result in excessive results from a single Web site. 4. Many dynamically generated Web sites are not able to be indexed by search engines. 5. The commercial interests of a search engine can interfere with the order of relevant results the search engine shows. 6. Content that is behind a firewall or that is password protected is not accessible to search engines (such as those found in several digital libraries).1 7. Some Web sites have started using tricks such as spamdexing and cloaking to manipulate search engines to display them as the top results for a set of keywords. This can make the search results polluted, with more relevant links being pushed down in the result list. This is a result of the popularity of Web searches and the business potential search engines can generate today. 8. Search engines index all the content of the Web without any bounds on the sensitivity of information. This has raised a few security and privacy flags. With the above background and challenges in mind, we lay out the article as follows. In the next section, we begin with a discussion of search engine evolution. To facilitate the examination and discussion of the search engine development’s progress, we break down this discussion into the three generations of search engines. Figure 1 depicts this evolution pictorially and highlights the need for better search engine technologies. Next, we present a brief discussion on the contemporary state of search engine technology and various types of content searches available today. With this background, the next section documents various concerns about existing search engines setting the stage for better search engine technology. These concerns include information overload, relevance, representation, and categorization. Finally, we briefly address the research efforts under way to alleviate these concerns and then present our conclusion.

Download Full-text

Retrieval of Relevant Web Pages by a New Filtering Method

Knowledge-Intensive Economies and Opportunities for Social, Organizational, and Technological Growth - Advances in Knowledge Acquisition, Transfer, and Management ◽

10.4018/978-1-5225-7347-0.ch012 ◽

2019 ◽

pp. 222-247

Author(s):

Sahar Maâlej Dammak ◽

Anis Jedidi ◽

Rafik Bouaziz

Keyword(s):

Search Engine ◽

Semantic Annotation ◽

Web Pages ◽

Retrieval Method ◽

Filtering Method ◽

Annotation Process ◽

The World ◽

Multiple Domains ◽

The Web ◽

Great Mass

With the great mass of the pages managed through the world, and especially with the advent of the web, it has become more difficult to find the relevant pages after an interrogation. Furthermore, the manual filtering of the indexed web pages is a laborious task. A new filtering method of the annotated web pages (by a semantic annotation process) and the non-annotated web pages (retrieved from search engine Google) is then necessary to group the relevant web pages for the user. In this chapter, the authors first synthesize their previous work of the semantic annotation of web pages. Then, they define a new filtering method based on three activities. They also present their querying and filtering component of web pages; their purpose is to demonstrate the feasibility of our filtering method. Finally, the authors present an evaluation of this component, which has proved its performance for multiple domains, and they discuss the use of the extended Boolean retrieval method in the new filtering method.

Download Full-text

Wittgenstein and web facets

NASKO ◽

10.7152/nasko.v3i1.12788 ◽

2011 ◽

Vol 3 (1) ◽

pp. 33

Author(s):

Elizabeth Milonas

Keyword(s):

Search Engine ◽

Philosophy Of Language ◽

World Wide ◽

Web Search ◽

Language Usage ◽

Search Result ◽

Web Search Engine ◽

Daunting Task ◽

The World ◽

The Web

The World Wide Web has grown exponentially in the last few years. The popularity of Web search engines has also grown in a similar manner. The task of a Web search engine is to provide the Web searcher with accurate and targeted information from the plethora of information available on the Web. This is a daunting task that requires the careful usage of language to ensure accuracy. As a result, the importance of the usage and meaning of language in the Web domain has become the focus of recent research. In this paper, the author will explore Wittgenstein’s later philosophy of language as it applies to the language used in the search result pages of a Web search engine in an effort to broaden the understanding of language usage within this domain.

Download Full-text

Using the Google™ Search Engine for Health Information: Is There a Problem? Case Study: Supplements for Cancer

Current Developments in Nutrition ◽

10.1093/cdn/nzab002 ◽

2021 ◽

Vol 5 (2) ◽

Author(s):

Hannah C Cai ◽

Leanne E King ◽

Johanna T Dwyer

Keyword(s):

Health Information ◽

Search Engine ◽

Information Quality ◽

Nutrition Information ◽

High Quality ◽

Search Results ◽

Health And Nutrition ◽

Quality Rating

ABSTRACT We assessed the quality of online health and nutrition information using a Google™ search on “supplements for cancer”. Search results were scored using the Health Information Quality Index (HIQI), a quality-rating tool consisting of 12 objective criteria related to website domain, lack of commercial aspects, and authoritative nature of the health and nutrition information provided. Possible scores ranged from 0 (lowest) to 12 (“perfect” or highest quality). After eliminating irrelevant results, the remaining 160 search results had median and mean scores of 8. One-quarter of the results were of high quality (score of 10–12). There was no correlation between high-quality scores and early appearance in the sequence of search results, where results are presumably more visible. Also, 496 advertisements, over twice the number of search results, appeared. We conclude that the Google™ search engine may have shortcomings when used to obtain information on dietary supplements and cancer.

Download Full-text

A Low-Cost Library Database Solution

Information Technology and Libraries ◽

10.6017/ital.v19i1.10074 ◽

2017 ◽

Vol 19 (1) ◽

pp. 46-49 ◽

Cited By ~ 3

Author(s):

Mark England ◽

Lura Joseph ◽

Nem W. Schlect

Keyword(s):

Search Engine ◽

Programming Language ◽

Relational Databases ◽

Low Cost ◽

The World ◽

The Web ◽

Interface Designs

Two locally created databases are made available to the world via the Web using an inexpensive but highly functional search engine created in-house. The technology consists of a microcomputer running UNIX to serve relational databases. CGI forms created using the programming language Perl offer flexible interface designs for database users and database maintainers.

Download Full-text

Creating and using Web corpora

International Journal of Corpus Linguistics ◽

10.1075/ijcl.10.4.07the ◽

2005 ◽

Vol 10 (4) ◽

pp. 517-541 ◽

Cited By ~ 4

Author(s):

Mike Thelwall

Keyword(s):

Search Engine ◽

Web Sites ◽

Web Crawler ◽

Commercial Search Engine ◽

British National Corpus ◽

The Uk ◽

The University ◽

The Web ◽

National Corpus

The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engine and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with non-Web written English, but with some academic-related words amongst the top 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and academic Web English seems to be more future-oriented than British National Corpus written English.

Download Full-text