Eccentric Methodology with Optimization to Unearth Hidden Facts of Search Engine Result Pages

Background: The World Wide Web houses an abundance of information that billions of users across the world draw on every day to find relevant data. Website owners employ webmasters to ensure that their pages rank at the top of search engine result pages. However, understanding how a search engine ranks a website, which comprises numerous web pages, among the top ten or twenty results is a major challenge. Although systems have been developed to understand the ranking process, a specialized tool-based approach has not been tried. Objective: This paper develops a new framework and system that processes website contents to determine search engine optimization factors. Methods: The web page is analyzed dynamically by assessing the website content against specific keywords, and an elimination method is used in an attempt to reveal various search engine optimization techniques. Conclusion: Our results lead us to conclude that the developed system is able to perform a deeper analysis and find the factors that play a role in bringing a site to the top of the list.

Internet Search Engines

Encyclopedia of E-Commerce, E-Government, and Mobile Commerce ◽

10.4018/978-1-59140-799-7.ch108 ◽

2011 ◽

pp. 672-677

Author(s):

Vijay Kasi ◽

Radhika Jain

Keyword(s):

Search Engine ◽

Web Sites ◽

Search Engines ◽

World Wide ◽

Relevant Information ◽

The Internet ◽

Web Pages ◽

Web Page ◽

The World ◽

The Web

In the context of the Internet, a search engine can be defined as a software program designed to help one access information, documents, and other content on the World Wide Web. The adoption and growth of the Internet in the last decade has been unprecedented. The World Wide Web has always been applauded for its simplicity and ease of use, which is evident in how little knowledge one needs to build a Web page. The flexible nature of the Internet has enabled its rapid growth and adoption, which in turn has made it hard to search for relevant information on the Web. The number of Web pages has been increasing at an astronomical pace, from around 2 million registered domains in 1995 to 233 million registered domains in 2004 (Consortium, 2004).
The Internet, considered a distributed database of information, has the CRUD (create, retrieve, update, and delete) rule applied to it. While the Internet has been effective at creating, updating, and deleting content, it has lagged considerably in enabling the retrieval of relevant information. After all, there is no point in having a Web page that has little or no visibility on the Web. Since the 1990s, when the first search program was released, we have come a long way in terms of searching for information. Although we are currently witnessing tremendous growth in search engine technology, the growth of the Internet has overtaken it, leading to a state in which the existing search engine technology is falling short.
When we apply the metrics of relevance, rigor, efficiency, and effectiveness to the search domain, it becomes very clear that we have progressed on the rigor and efficiency metrics by utilizing abundant computing power to produce faster searches over a large amount of information. Rigor and efficiency are evident in the large number of pages indexed by the leading search engines (Barroso, Dean, & Holzle, 2003). However, more research needs to be done to address the relevance and effectiveness metrics.
Users typically type in two to three keywords when searching, only to end up with a search result having thousands of Web pages! This has made it increasingly hard to effectively find any useful, relevant information. Search engines face a number of challenges today requiring them to perform rigorous searches with relevant results efficiently so that they are effective. These challenges include the following (“Search Engines,” 2004).
1. The Web is growing at a much faster rate than any present search engine technology can index.
2. Web pages are updated frequently, forcing search engines to revisit them periodically.
3. Dynamically generated Web sites may be slow or difficult to index, or may result in excessive results from a single Web site.
4. Many dynamically generated Web sites are not able to be indexed by search engines.
5. The commercial interests of a search engine can interfere with the order of relevant results the search engine shows.
6. Content that is behind a firewall or that is password protected is not accessible to search engines (such as those found in several digital libraries).
7. Some Web sites have started using tricks such as spamdexing and cloaking to manipulate search engines to display them as the top results for a set of keywords. This can make the search results polluted, with more relevant links being pushed down in the result list. This is a result of the popularity of Web searches and the business potential search engines can generate today.
8. Search engines index all the content of the Web without any bounds on the sensitivity of information. This has raised a few security and privacy flags.
With the above background and challenges in mind, we lay out the article as follows. In the next section, we begin with a discussion of search engine evolution. To facilitate the examination and discussion of the search engine development’s progress, we break down this discussion into the three generations of search engines.
Figure 1 depicts this evolution pictorially and highlights the need for better search engine technologies. Next, we present a brief discussion on the contemporary state of search engine technology and the various types of content searches available today. With this background, the next section documents various concerns about existing search engines, setting the stage for better search engine technology. These concerns include information overload, relevance, representation, and categorization. Finally, we briefly address the research efforts under way to alleviate these concerns and then present our conclusion.

Review of Link Structure Based Ranking Algorithms and Hanging Pages

Handbook of Research on Modern Cryptographic Solutions for Computer and Cyber Security - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-0105-3.ch018 ◽

2016 ◽

pp. 420-459 ◽

Cited By ~ 1

Author(s):

Ravi P. Kumar ◽

Ashutosh K. Singh ◽

Anand Mohan

Keyword(s):

Search Engine ◽

Web Sites ◽

Cyber Security ◽

Web Security ◽

Optimization Techniques ◽

Web Pages ◽

Link Structure ◽

Search Engine Optimization ◽

Ranking Algorithms ◽

The Web

In this era of Web computing, cyber security is very important as more and more data moves onto the Web. Some of this data is confidential and important, and it faces many threats on the Web. Some of the basic threats can be addressed by designing Web sites properly using Search Engine Optimization techniques. One such threat is the hanging page, which gives room for link spamming. This chapter addresses the issues caused by hanging pages in Web computing. It has four objectives: 1) compare and review the different types of link-structure-based ranking algorithms for ranking Web pages, with PageRank used as the base algorithm throughout the chapter; 2) study hanging pages, explore their effects on Web security, and compare the existing methods for handling them; 3) study link spam and explore the contribution of hanging pages to it; and 4) study Search Engine Optimization (SEO) / Web Site Optimization (WSO) and explore the effect of hanging pages on SEO.
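
To make the role of hanging pages in PageRank concrete, the following is a minimal sketch of power-iteration PageRank in Python. It assumes that a hanging page (one with no out-links) has its rank redistributed uniformly over all pages, which is one common treatment and not necessarily the method the chapter favors. The function name `pagerank`, the damping value 0.85, and the toy graph `toy_web` are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Assumption: rank held by hanging pages (no out-links) is spread
    uniformly over every page at each step.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Total rank currently sitting on hanging pages.
        hanging_mass = sum(rank[p] for p in pages if not links.get(p))
        # Teleportation term plus the uniformly redistributed hanging mass.
        new_rank = {p: (1.0 - damping) / n + damping * hanging_mass / n for p in pages}
        # Every non-hanging page splits its rank evenly among its out-links.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web; "D" is a hanging page.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": []}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items()):
    print(page, round(score, 4))
```

Without the `hanging_mass` term, the rank that flows into "D" would leak out of the system at every iteration and the scores would no longer sum to 1, which is the basic reason hanging pages need explicit handling.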

A Signal-Representation-Based Parser to Extract Text-Based Information from the Web

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2010.p0531 ◽

2010 ◽

Vol 14 (5) ◽

pp. 531-539

Author(s):

Mu-Chun Su ◽

Shao-Jui Wang ◽

Chen-Ko Huang ◽

Pa-Chun Wang ◽

...

Keyword(s):

Web Services ◽

World Wide ◽

Information Sources ◽

State Of The Art ◽

Value Added ◽

Web Pages ◽

Web Page ◽

Web Information ◽

The World ◽

The Web

Most of the dramatically increased amount of information available on the World Wide Web is provided via HTML and formatted for human browsing rather than for software programs. This situation calls for a tool that automatically extracts information from semistructured Web information sources, increasing the usefulness of value-added Web services. We present a signal-representation-based parser (SIRAP) that breaks Web pages up into logically coherent groups, for example groups of information related to an entity. Templates for records with different tag structures are generated incrementally by a Histogram-Based Correlation Coefficient (HBCC) algorithm; records on a Web page are then detected efficiently by matching them against the generated templates. Hundreds of Web pages from 17 state-of-the-art search engines were used to demonstrate the feasibility of our approach.

Web Algorithms for Information Retrieval

International Journal of Mobile Computing and Multimedia Communications ◽

10.4018/ijmcmc.2014010101 ◽

2014 ◽

Vol 6 (1) ◽

pp. 1-16

Author(s):

Bouchra Frikh ◽

Brahim Ouhbi

Keyword(s):

World Wide ◽

Information Dissemination ◽

Quality Information ◽

Web Pages ◽

Web Page ◽

Page Rank ◽

Ranking Algorithms ◽

Internet Users ◽

The World ◽

The Web

The World Wide Web has emerged as the biggest and most popular means of communication and information dissemination. Every day the Web is expanding, and people generally rely on search engines to explore it. Because of its rapid and chaotic growth, the resulting network of information lacks organization and structure. It is a challenge for service providers to deliver proper, relevant, and high-quality information to Internet users by using web page contents and the hyperlinks between web pages. This paper analyzes and compares web page ranking algorithms on various parameters to identify their advantages and limitations and to outline the further scope of research on web page ranking algorithms. Six important algorithms are presented and their performance is discussed: PageRank, Query Dependent-PageRank, HITS, SALSA, Simultaneous Terms Query Dependent-PageRank (SQD-PageRank), and Onto-SQD-PageRank.
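
Of the algorithms listed, HITS is compact enough to illustrate in a few lines. The sketch below, in Python, shows the standard hub and authority iteration on a small link graph: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to, with normalization after each step. The function name `hits` and the toy graph are assumptions made for illustration, and the query-dependent step that selects the graph is omitted.

```python
import math


def hits(links, iterations=50):
    """Hub and authority scores for a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all in-linking pages.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: sum the authority scores of all out-linked pages.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: "portal" links widely, "docs" is linked to most.
toy_graph = {"portal": ["news", "docs"], "blog": ["docs"], "news": [], "docs": []}
hub_scores, auth_scores = hits(toy_graph)
print("best hub:", max(hub_scores, key=hub_scores.get))
print("best authority:", max(auth_scores, key=auth_scores.get))
```

Unlike PageRank, which yields one score per page, this produces two scores per page, so a page can be a strong hub while being a weak authority.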

Interactive Proxy for URL Correction

Issues of Human Computer Interaction ◽

10.4018/978-1-59140-191-9.ch005 ◽

2011 ◽

pp. 72-84

Author(s):

Kai-Hsiang Yang

Keyword(s):

Web Sites ◽

World Wide ◽

Web Pages ◽

Web Content ◽

Web Page ◽

Proxy Servers ◽

Personal Favorite ◽

The World ◽

Cache Miss ◽

The Web

This chapter addresses Uniform Resource Locator (URL) correction techniques in proxy servers. Proxy servers are increasingly important in the World Wide Web (WWW): they provide Web page caches that speed up browsing and reduce unnecessary network traffic. Traditional proxy servers use the URL to identify cached content, and a cache miss occurs when the requested URL does not exist in the cache. However, general users tend to show some regularity and scope in their browsing. It would be very convenient if users did not need to enter a whole long URL, or could still see the Web content even when they had forgotten part of the URL, especially for their personal favorite Web sites. We introduce a URL correction mechanism into the personal proxy server to achieve this goal.
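
The idea of correcting a mistyped or partial URL against what the proxy already holds can be sketched with approximate string matching. The Python example below is only an illustration under stated assumptions: it uses the standard library's `difflib.get_close_matches` as the similarity measure, whereas the chapter's actual correction mechanism is not described in this abstract. The function name `correct_url`, the cutoff of 0.6, and the cached URLs are hypothetical.

```python
import difflib


def correct_url(requested, cached_urls, cutoff=0.6):
    """Return the cached URL closest to `requested`, or None on a true miss.

    Assumption: similarity is difflib's ratio; a real proxy would likely
    also weigh how often and how recently the user visited each URL.
    """
    if requested in cached_urls:
        return requested  # ordinary cache hit, no correction needed
    matches = difflib.get_close_matches(requested, cached_urls, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical cache contents for one user.
cache = [
    "http://example.org/research/papers/index.html",
    "http://example.org/news/today.html",
]
print(correct_url("http://example.org/reserch/papers", cache))
print(correct_url("ftp://zzz", cache, cutoff=0.9))
```

The first call contains a typo and omits the end of the path, yet still resolves to the cached research page; the second has nothing similar in the cache and is treated as a genuine miss.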

Template-Based Delta Compression of Large Scale Web Pages

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.347-350.2666 ◽

2013 ◽

Vol 347-350 ◽

pp. 2666-2672

Author(s):

Kai Lei ◽

Guang Yu Sun ◽

Lian En Huang

Keyword(s):

Control Systems ◽

Large Scale ◽

World Wide ◽

Web Pages ◽

Web Page ◽

Clustering Techniques ◽

The World ◽

Version Control Systems ◽

Delta Compression ◽

The Web

Delta compression techniques are commonly used in the context of version control systems and the World Wide Web. They are used to compactly encode the differences between two files or strings in order to reduce communication or storage costs. In this paper, we study the use of delta compression for compressing massive collections of web pages according to the similarity of their templates. We propose a framework for template-based delta compression which uses template-based clustering techniques to find the web pages that have similar templates and then encodes their differences with delta compression techniques to reduce the storage cost. We also propose a filter-based optimization of the Diff algorithm to improve the efficiency of the delta compression approach. To demonstrate the efficiency of our approach, we present experimental results on massive collections of web pages. Our experiments show that template-based delta compression achieves significant improvements in compression ratio compared with compressing each web page individually.
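
The core step, encoding a page as its differences from a shared template, can be sketched as follows in Python. This is a simplified illustration that uses `difflib.SequenceMatcher` from the standard library as the diff engine; it does not reproduce the paper's clustering step or its filter-based Diff optimization. The names `encode_delta` and `decode_delta`, the op format, and the sample template are assumptions.

```python
import difflib


def encode_delta(template, page):
    """Encode `page` as copy/add operations relative to `template`."""
    matcher = difflib.SequenceMatcher(None, template, page, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared text: store only the template offsets.
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" or "insert": store the page-specific text literally.
            delta.append(("add", page[j1:j2]))
        # "delete" needs no op: the template text is simply not copied.
    return delta


def decode_delta(template, delta):
    """Rebuild the original page from the template and its delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(template[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


# Hypothetical template shared by a cluster of product pages.
template = "<html><head><title>Shop</title></head><body><h1>ITEM</h1><p>PRICE</p></body></html>"
page = template.replace("ITEM", "Red kettle").replace("PRICE", "19.99")
delta = encode_delta(template, page)
literal_chars = sum(len(op[1]) for op in delta if op[0] == "add")
print(len(page), "characters in page,", literal_chars, "stored literally")
```

Only the page-specific text is stored literally, while everything inherited from the template is reduced to a pair of offsets, which is why pages clustered by template compress well against it.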

DeDuSERP: De-duplication in search engine result page

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.8.10475 ◽

2018 ◽

Vol 7 (2.8) ◽

pp. 427

Author(s):

Naresh Sharma ◽

Priti Dimri

Keyword(s):

Search Engine ◽

Service Provision ◽

Web Pages ◽

Similar Data ◽

Web Page ◽

Web Searches ◽

Result Page ◽

String Comparison ◽

Search Engine Result ◽

The Web

The Web offers a new way of service provision by arranging different resources over the web. The most critical and prominent of these services is web search. The purpose of this research is to identify a subtype of de-duplication. DeDuSERP is de-duplication in the search engine result page (SERP). It restricts the display of URLs with duplicate or similar data and hence enhances the search result experience of any client. By duplicate results we mean different links containing the same content or information. To solve this problem, we have designed a filter between the search engine result page and the indexed, ranked pages that the search engine returns in response to the searcher's query. This filter eliminates the duplicate links and displays only the unique results on the SERP for the searcher. We perform a string-to-string comparison of web pages; if the content is 90% similar, we judge the pages to be duplicates and then determine which of the duplicate links is the original on the basis of timestamp. By this we mean that the web page crawled earlier is the original. The comparison and timestamp matching are done using the open-source Apache Commons IO 2.4 API.
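
The filtering rule described above (treat pages as duplicates at 90% similarity and keep the one crawled earliest) can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it swaps in `difflib.SequenceMatcher` for the string comparison instead of Apache Commons IO, and the record fields `url`, `content`, and `crawled_at` are assumed names.

```python
import difflib


def deduplicate(results, threshold=0.9):
    """Filter ranked results, keeping one entry per group of near-duplicates.

    `results` is a list of dicts with 'url', 'content', and 'crawled_at'.
    Within a duplicate group the earliest-crawled page is treated as the
    original and takes the slot of the first duplicate seen.
    """
    kept = []
    for result in results:
        for index, existing in enumerate(kept):
            similarity = difflib.SequenceMatcher(
                None, existing["content"], result["content"]
            ).ratio()
            if similarity >= threshold:
                # Duplicate found: prefer whichever page was crawled earlier.
                if result["crawled_at"] < existing["crawled_at"]:
                    kept[index] = result
                break
        else:
            kept.append(result)  # no duplicate among the kept results
    return kept


# Hypothetical ranked results; timestamps are plain integers for brevity.
ranked = [
    {"url": "http://mirror.example/py", "crawled_at": 200,
     "content": "Python tutorial for beginners: variables, loops and functions."},
    {"url": "http://origin.example/py", "crawled_at": 100,
     "content": "Python tutorial for beginners: variables, loops, and functions."},
    {"url": "http://weather.example", "crawled_at": 150,
     "content": "Weather forecast for the weekend."},
]
unique = deduplicate(ranked)
print([r["url"] for r in unique])
```

Comparing every result against every kept result is quadratic in the number of results, which is tolerable for a single result page but would need a cheaper fingerprinting step for larger sets.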

A Simulation of the Structure of the World-Wide Web

Sociological Research Online ◽

10.5153/sro.684 ◽

2002 ◽

Vol 7 (1) ◽

pp. 9-25 ◽

Cited By ~ 2

Author(s):

Moses Boudourides ◽

Gerasimos Antypas

Keyword(s):

World Wide Web ◽

Power Law ◽

Web Sites ◽

World Wide ◽

The Internet ◽

Web Pages ◽

Small Worlds ◽

Web Page ◽

Simple Simulation ◽

The World

In this paper we present a simple simulation of the World-Wide Web, in which one observes the appearance of web pages belonging to different web sites, covering a number of different thematic topics and possessing links to other web pages. The goal of our simulation is to reproduce the form of the observed World-Wide Web and of its growth, using a small number of simple assumptions. In our simulation, existing web pages may generate new ones as follows: First, each web page is equipped with a topic concerning its contents. Second, links between web pages are established according to common topics. Next, new web pages may be randomly generated and subsequently equipped with a topic and assigned to web sites. By repeated iterations of these rules, our simulation appears to exhibit the observed structure of the World-Wide Web and, in particular, a power-law type of growth. In order to visualise the network of web pages, we have followed N. Gilbert's (1997) methodology of scientometric simulation, assuming that web pages can be represented by points in the plane. Furthermore, the simulated graph is found to possess the small-world property, as is the case with a large number of other complex networks.

PREDICTION OF DESIGN ASPECTS OF WEB PAGE BY HTML PARSER

International Journal of Engineering Technologies and Management Research ◽

10.29121/ijetmr.v5.i2.2018.157 ◽

2020 ◽

Vol 5 (2) ◽

pp. 143-158

Author(s):

Satinder Kaur ◽

Sunil Gupta

Keyword(s):

User Interaction ◽

Web Pages ◽

Evaluation Tool ◽

Web Page ◽

Design Quality ◽

Automated Evaluation ◽

The World ◽

Before And After ◽

The Web

Information plays a very important role in life, and nowadays the world largely depends on the World Wide Web to obtain it. The Web comprises a large number of websites from every discipline, and each website consists of web pages that are interlinked by hyperlinks. The success of a website largely depends on the design aspects of its web pages. Researchers have done a lot of work to appraise web pages quantitatively. Keeping in mind the importance of the design aspects of a web page, this paper aims at the design of an automated evaluation tool which evaluates these aspects for any web page. The tool takes the HTML code of the web page as input, and then it extracts the HTML tags and checks them for uniformity. The tool comprises normalized modules which quantify the measures of design aspects. As a demonstration, the tool has been applied to four web pages from distinct sites, and the design aspects have been reported for comparison. The tool offers various advantages for web developers, who can predict the design quality of web pages and enhance it before and after implementation of a website without user interaction.
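
A first step for such a tool, taking HTML as input, extracting the tags, and checking them for consistency, might look like the Python sketch below. It relies on the standard library's `html.parser.HTMLParser`; the class name `TagAudit`, the function `audit_html`, and the particular checks (tag counts, unclosed tags, mismatched closing tags) are illustrative assumptions and not the paper's actual modules or metrics.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagAudit(HTMLParser):
    """Collects tag counts and simple open/close mismatches."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.open_stack = []
        self.mismatches = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag not in VOID_TAGS:
            self.open_stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: count it, nothing to balance.
        self.tag_counts[tag] += 1

    def handle_endtag(self, tag):
        if self.open_stack and self.open_stack[-1] == tag:
            self.open_stack.pop()
        else:
            self.mismatches.append(tag)


def audit_html(html):
    """Return tag counts plus any unclosed or mismatched tags."""
    parser = TagAudit()
    parser.feed(html)
    parser.close()
    return {
        "tag_counts": dict(parser.tag_counts),
        "unclosed": list(parser.open_stack),
        "mismatched_closings": list(parser.mismatches),
    }


good = audit_html("<div><p>Hello</p><img src='x.png'><br/></div>")
bad = audit_html("<div><p>Hello</div>")
print(good)
print(bad)
```

The stack check here is deliberately strict, so markup that browsers silently repair (such as an omitted `</p>`) is reported as a mismatch; a fuller tool would normalize the counts into the quantitative measures the abstract mentions.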

Googling the Web

Poems That Solve Puzzles ◽

10.1093/oso/9780198853732.003.0008 ◽

2020 ◽

pp. 143-158

Author(s):

Chris Bleakley

Keyword(s):

Web Sites ◽

World Wide ◽

Search Algorithm ◽

Internet Usage ◽

Web Pages ◽

The World ◽

Set Up ◽

Recommender Algorithm ◽

A Company ◽

The Web

Chapter 8 explores the arrival of the World Wide Web, Amazon, and Google. The web allows users to display “pages” of information retrieved from remote computers by means of the Internet. Inventor Tim Berners-Lee released the first web software for free, setting in motion an explosion in Internet usage. Seeing the opportunity of a lifetime, Jeff Bezos set up Amazon as an online bookstore. Amazon’s success was accelerated by a product recommender algorithm that selectively targets advertising at users. By the mid-1990s there were so many web sites that users often couldn’t find what they were looking for. Stanford PhD student Larry Page invented an algorithm for ranking search results based on the importance and relevance of web pages. Page and fellow student Sergey Brin established a company to bring their search algorithm to the world. Page and Brin, the founders of Google, are now each worth US$35-40 billion.
