Retrieval of Relevant Web Pages by a New Filtering Method

Author(s):  
Sahar Maâlej Dammak ◽  
Anis Jedidi ◽  
Rafik Bouaziz

With the huge mass of pages managed throughout the world, and especially with the advent of the web, it has become more difficult to find the relevant pages after a query. Furthermore, manually filtering the indexed web pages is a laborious task. A new method for filtering annotated web pages (produced by a semantic annotation process) and non-annotated web pages (retrieved from the Google search engine) is therefore necessary to group the relevant web pages for the user. In this chapter, the authors first synthesize their previous work on the semantic annotation of web pages. Then, they define a new filtering method based on three activities. They also present their component for querying and filtering web pages; its purpose is to demonstrate the feasibility of their filtering method. Finally, the authors present an evaluation of this component, which has proved its performance across multiple domains, and they discuss the use of the extended Boolean retrieval method in the new filtering method.
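
As a rough illustration of the extended Boolean retrieval model mentioned in the abstract, the following Python sketch shows the standard p-norm scoring formulas; the term weights, the value of p, and the example page are illustrative assumptions, not details of the authors' method.

```python
# A sketch of the extended Boolean (p-norm) retrieval formulas; the weights,
# the value of p, and the example below are illustrative assumptions.

def pnorm_or(weights, p=2.0):
    """Similarity to an OR query, given normalised term weights in [0, 1]."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)

def pnorm_and(weights, p=2.0):
    """Similarity to an AND query, given normalised term weights in [0, 1]."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)

# Example: a page that matches "semantic" strongly and "annotation" weakly.
weights = [0.9, 0.3]
print(pnorm_and(weights))  # 0.5   -- penalised for the weak term
print(pnorm_or(weights))   # ~0.67 -- rewarded for the strong term
```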

2017 ◽  
Vol 8 (1) ◽  
pp. 1-22 ◽  
Author(s):  
Sahar Maâlej Dammak ◽  
Anis Jedidi ◽  
Rafik Bouaziz

With the huge mass of pages managed throughout the world, and especially with the advent of the Web, it has become more difficult to find the relevant pages after a query. Furthermore, manually filtering the indexed Web pages is a laborious task. A new method for filtering annotated Web pages (produced by the authors' semantic annotation process) and non-annotated Web pages (retrieved from the Google search engine) is therefore necessary to group the relevant Web pages for the user. In this paper, the authors first synthesize their previous work on the semantic annotation of Web pages. Then, they define a new filtering method based on three activities. The authors also present their component for querying and filtering Web pages; its purpose is to demonstrate the feasibility of the filtering method. Finally, the authors present an evaluation of this component, which has proved its performance across multiple domains.


2019 ◽  
Vol 12 (2) ◽  
pp. 110-119 ◽  
Author(s):  
Jayaraman Sethuraman ◽  
Jafar A. Alzubi ◽  
Ramachandran Manikandan ◽  
Mehdi Gheisari ◽  
Ambeshwar Kumar

Background: The World Wide Web houses an abundance of information that is used every day by billions of users across the world to find relevant data. Website owners employ webmasters to ensure their pages are ranked at the top of search engine result pages. However, understanding how a search engine ranks a website, which comprises numerous web pages, among the top ten or twenty websites is a major challenge. Although systems have been developed to understand the ranking process, a specialized, tool-based approach has not been tried. Objective: This paper develops a new framework and system that process website contents to determine search engine optimization factors. Methods: To analyze web pages dynamically, the website content was assessed against specific keywords, and an elimination method was used in an attempt to reveal the various search engine optimization techniques at work. Conclusion: Our results lead us to conclude that the developed system is able to perform a deeper analysis and find the factors that play a role in bringing a site to the top of the list.
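
As a hedged sketch of one on-page factor such an analysis might compute when assessing website content against specific keywords, the snippet below calculates keyword density after eliminating stop words; the stop-word list, the sample text, and the function name are assumptions for illustration, not the authors' implementation.

```python
# A sketch of one on-page factor: keyword density after removing stop words.
# The stop-word list and the sample text are illustrative assumptions.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are"}

def keyword_density(text, keywords):
    """Frequency of each keyword relative to the non-stop-word token count."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
              if t not in STOP_WORDS]
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {kw: counts[kw] / total for kw in keywords}

page_text = "Search engine optimization helps a search engine rank pages."
print(keyword_density(page_text, ["search", "engine", "rank"]))
# {'search': 0.25, 'engine': 0.25, 'rank': 0.125}
```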


Author(s):  
Vijay Kasi ◽  
Radhika Jain

In the context of the Internet, a search engine can be defined as a software program designed to help one access information, documents, and other content on the World Wide Web. The adoption and growth of the Internet in the last decade has been unprecedented. The World Wide Web has always been applauded for its simplicity and ease of use, which is evident from how little knowledge one needs to build a Web page. The flexible nature of the Internet has enabled its rapid growth and adoption, but it has also made it hard to search for relevant information on the Web. The number of Web pages has been increasing at an astronomical pace, from around 2 million registered domains in 1995 to 233 million registered domains in 2004 (Consortium, 2004). The Internet, considered a distributed database of information, has the CRUD (create, retrieve, update, and delete) rule applied to it. While the Internet has been effective at creating, updating, and deleting content, it has considerably lacked in enabling the retrieval of relevant information; after all, there is no point in having a Web page that has little or no visibility on the Web. Since the 1990s, when the first search program was released, we have come a long way in terms of searching for information. Although we are currently witnessing tremendous growth in search engine technology, the growth of the Internet has overtaken it, leaving the existing search engine technology falling short. When we apply the metrics of relevance, rigor, efficiency, and effectiveness to the search domain, it becomes very clear that we have progressed on the rigor and efficiency metrics by utilizing abundant computing power to produce faster searches over large amounts of information; this rigor and efficiency are evident in the large number of pages indexed by the leading search engines (Barroso, Dean, & Holzle, 2003). However, more research needs to be done to address the relevance and effectiveness metrics. Users typically type in two to three keywords when searching, only to end up with a search result containing thousands of Web pages, which makes it increasingly hard to find any useful, relevant information. Search engines today must perform rigorous searches that return relevant results efficiently so that they are effective. The challenges they face include the following (“Search Engines,” 2004):

1. The Web is growing at a much faster rate than any present search engine technology can index.
2. Web pages are updated frequently, forcing search engines to revisit them periodically.
3. Dynamically generated Web sites may be slow or difficult to index, or may produce excessive results from a single Web site.
4. Many dynamically generated Web sites cannot be indexed by search engines at all.
5. The commercial interests of a search engine can interfere with the order of relevant results it shows.
6. Content that is behind a firewall or that is password protected is not accessible to search engines (such as the content found in several digital libraries).
7. Some Web sites have started using tricks such as spamdexing and cloaking to manipulate search engines into displaying them as the top results for a set of keywords. This pollutes the search results, pushing more relevant links down the result list, and is a consequence of the popularity of Web searches and the business potential search engines can generate today.
8. Search engines index all the content of the Web without any bounds on the sensitivity of information, which has raised a few security and privacy flags.

With the above background and challenges in mind, we lay out the article as follows. In the next section, we begin with a discussion of search engine evolution; to facilitate this discussion, we break it down into the three generations of search engines. Figure 1 depicts this evolution pictorially and highlights the need for better search engine technologies. Next, we present a brief discussion of the contemporary state of search engine technology and the various types of content searches available today. With this background, the following section documents various concerns about existing search engines, setting the stage for better search engine technology; these concerns include information overload, relevance, representation, and categorization. Finally, we briefly address the research efforts under way to alleviate these concerns and then present our conclusion.


Author(s):  
Sukhmeet Singh Guruwada

To be at the top in the world of the WWW, every website needs to be standardized and well formatted according to the defined standards. SEO offers many new algorithms and indexing methods that help users get the best results for their searches. Search Engine Optimization (SEO) is important for websites to improve their rank in search results and gain more page views, resulting in larger user traffic. Search engine ranking provides better, optimized results for the user's query, helping users find exactly the content they are looking for from the list of popular web pages among the many pages available on the web. My case study focuses on some advanced techniques that help website owners achieve a better page rank in search results. It focuses on simple, modern SEO techniques that can be added to your website design and that will indirectly improve your SEO page rank.
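
A minimal sketch, assuming a checklist-style reading of such on-page techniques: the Python snippet below verifies that a page carries a title, a meta description, and exactly one <h1>. The sample HTML and the class name are invented for illustration and are not taken from the case study.

```python
# A checklist-style on-page check: does the page have a <title>, a meta
# description, and exactly one <h1>? The sample HTML below is invented.
from html.parser import HTMLParser

class SeoCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.has_title = False
        self.has_meta_description = False
        self.h1_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.has_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and dict(attrs).get("name") == "description":
            self.has_meta_description = True

html = """<html><head><title>Demo</title>
<meta name="description" content="A short demo page."></head>
<body><h1>Demo</h1></body></html>"""

checker = SeoCheck()
checker.feed(html)
print(checker.has_title, checker.has_meta_description, checker.h1_count)
# True True 1
```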


2017 ◽  
Vol 19 (1) ◽  
pp. 46-49 ◽  
Author(s):  
Mark England ◽  
Lura Joseph ◽  
Nem W. Schlect

Two locally created databases are made available to the world via the Web using an inexpensive but highly functional search engine created in-house. The technology consists of a microcomputer running UNIX to serve relational databases. CGI forms created using the programming language Perl offer flexible interface designs for database users and database maintainers.
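
A minimal sketch of the same idea under stated assumptions: the snippet below serves a small search form over a local relational database using Python's standard library rather than the Perl CGI scripts the article describes; the database file, the "records" table, and its "title" column are hypothetical.

```python
# A sketch of an in-house web search form over a local relational database.
# The database file and its records(title TEXT) table are assumptions.
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

DB = "local_catalog.db"  # assumed to exist with a table: records(title TEXT)

PAGE = """<html><body>
<form method="get" action="/">
  <input name="q" placeholder="search term">
  <input type="submit" value="Search">
</form>
{results}
</body></html>"""

def search(term):
    # Parameterised LIKE query against the locally maintained table.
    with sqlite3.connect(DB) as con:
        rows = con.execute("SELECT title FROM records WHERE title LIKE ?",
                           (f"%{term}%",)).fetchall()
    return [title for (title,) in rows]

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        q = parse_qs(urlparse(self.path).query).get("q", [""])[0]
        hits = search(q) if q else []
        body = PAGE.format(results="".join(f"<p>{h}</p>" for h in hits))
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), Handler).serve_forever()
```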


Author(s):  
Satinder Kaur ◽  
Sunil Gupta

Information plays a very important role in life, and nowadays the world largely depends on the World Wide Web to obtain it. The Web comprises many websites of every discipline, and websites consist of web pages that are interlinked with each other through hyperlinks. The success of a website largely depends on the design aspects of its web pages. Researchers have done a lot of work to appraise web pages quantitatively. Keeping in mind the importance of the design aspects of a web page, this paper aims at the design of an automated evaluation tool that evaluates these aspects for any web page. The tool takes the HTML code of the web page as input, then extracts and checks the HTML tags for uniformity. The tool comprises normalized modules that quantify the measures of the design aspects. To demonstrate it, the tool has been applied to four web pages from distinct sites, and their design aspects have been reported for comparison. The tool offers various advantages for web developers, who can predict the design quality of web pages and enhance it before and after implementation of a website, without user interaction.
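
A hedged sketch of how such a tool might extract tags from a page's HTML and turn them into normalized design scores in [0, 1]; the two example "modules" and their formulas are illustrative assumptions, not the measures used by the authors' tool.

```python
# A sketch of the tag-extraction step plus two made-up "normalized modules"
# that turn raw tag counts into design scores in [0, 1].
from collections import Counter
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1

def design_scores(html):
    collector = TagCollector()
    collector.feed(html)
    tags, total = collector.tags, sum(collector.tags.values()) or 1
    return {
        # Structure module: share of heading/paragraph tags among all tags.
        "structure": (tags["h1"] + tags["h2"] + tags["p"]) / total,
        # Media module: ratio of images to paragraphs, capped at 1.
        "media": min(1.0, tags["img"] / (tags["p"] or 1)),
    }

html = "<html><body><h1>Title</h1><p>Text</p><p>More</p><img src='x.png'></body></html>"
print(design_scores(html))  # {'structure': 0.5, 'media': 0.5}
```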


2020 ◽  
pp. 143-158
Author(s):  
Chris Bleakley

Chapter 8 explores the arrival of the World Wide Web, Amazon, and Google. The web allows users to display “pages” of information retrieved from remote computers by means of the Internet. Inventor Tim Berners-Lee released the first web software for free, setting in motion an explosion in Internet usage. Seeing the opportunity of a lifetime, Jeff Bezos set up Amazon as an online bookstore. Amazon’s success was accelerated by a product recommender algorithm that selectively targets advertising at users. By the mid-1990s there were so many web sites that users often couldn’t find what they were looking for. Stanford PhD student Larry Page invented an algorithm for ranking search results based on the importance and relevance of web pages. Page and fellow student Sergey Brin established a company to bring their search algorithm to the world. Page and Brin, the founders of Google, are now each worth US$35-40 billion.
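
For readers curious about the ranking idea mentioned here, the snippet below is a compact, textbook-style power-iteration sketch of PageRank on an invented three-page link graph; it is not Google's implementation, and it assumes every page has at least one outgoing link.

```python
# A power-iteration sketch of the PageRank idea: a page is important if
# important pages link to it. The three-page graph below is invented, and
# every page is assumed to have at least one outgoing link.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(links))  # C ends up with the highest rank in this toy graph
```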


Author(s):  
Alison Harcourt ◽  
George Christou ◽  
Seamus Simpson

This chapter explains one of the most important components of the web: the development and standardization of the Hypertext Markup Language (HTML) and the Document Object Model (DOM), which are used for creating web pages and applications. In 1994, Tim Berners-Lee established the World Wide Web Consortium (W3C) to work on HTML development. In 1995, the W3C decided to introduce a new standard, XHTML 2.0. However, it was incompatible with the older HTML/XHTML versions. This led to the establishment of the Web Hypertext Application Technology Working Group (WHATWG), which worked externally to the W3C. WHATWG developed HTML5, which was adopted by the major browser developers Google, Opera, Mozilla, IBM, Microsoft, and Apple. For this reason, the W3C decided to work on HTML5, leading to a joint WHATWG/W3C working group. This chapter explains the development of HTML and WHATWG’s Living Standard, with an explanation of the ongoing splits and agreements between the two fora. It explains how this division of labour led the W3C to focus on the main areas of web architecture, the semantic web, the web of devices, payment applications, and web and television (TV) standards. This has led to a spillover of work to the W3C from the national sphere, notably in the development of copyright protection for TV streaming.


Author(s):  
Ravi P. Kumar ◽  
Ashutosh K. Singh ◽  
Anand Mohan

In this era of Web computing, cyber security is very important, as more and more data is moving onto the Web. Some of these data are confidential and important, and there are many threats to data on the Web. Some of the basic threats can be addressed by designing Web sites properly using Search Engine Optimization techniques. One such threat is the hanging page, which gives room for link spamming. This chapter addresses the issues caused by hanging pages in Web computing and has four important objectives: 1) compare and review the different types of link-structure-based algorithms for ranking Web pages, with PageRank used as the base algorithm throughout the chapter; 2) study hanging pages, explore their effects on Web security, and compare the existing methods for handling them; 3) study link spam and explore the contribution of hanging pages to it; and 4) study Search Engine Optimization (SEO) / Web Site Optimization (WSO) and explore the effect of hanging pages on SEO.
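
A small sketch of the hanging-page notion the chapter studies, under the usual definition of a page with no outgoing links: the snippet detects hanging pages in a toy link graph and applies one standard remedy, treating a hanging page as if it linked to every page. The graph and function names are invented for illustration, not taken from the chapter.

```python
# Hanging (dangling) pages have no outgoing links and leak rank in the
# standard PageRank iteration. One common remedy, shown below, treats a
# hanging page as if it linked to every page. The toy graph is invented.
def find_hanging_pages(links):
    """Pages that appear in the graph but have no outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    return {p for p in pages if not links.get(p)}

def patch_hanging_pages(links):
    """Copy of the graph in which every hanging page links to all pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    return {p: list(links.get(p) or pages) for p in pages}

links = {"A": ["B"], "B": ["C"], "C": []}  # C is a hanging page
print(find_hanging_pages(links))           # {'C'}
print(patch_hanging_pages(links))          # {'A': ['B'], 'B': ['C'], 'C': ['A', 'B', 'C']}
```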

