A Novel Approach on Focused Crawling With Anchor Text

2018 ◽  
Vol 7 (1) ◽  
pp. 7-15
Author(s):  
S. Subatra Devi

This paper discusses a novel approach to focused crawling based on anchor text. Most search engines use anchor text to retrieve relevant pages and answer user queries. A crawler searches web pages and filters out unnecessary ones, which focused crawling makes possible: a focused crawler restricts its crawl boundary to relevant pages reached through links and ignores irrelevant pages on the web. In this paper, an effective focused crawling method is implemented to improve the quality of search. Three learning phases, namely content-based, link-based and sibling-based learning, are applied to improve the navigation of the search. With this approach the crawler traverses relevant pages efficiently and retrieves more relevant pages. Experiments show that, for different anchor texts, the three learning phases combined with focused crawling retrieve a larger number of relevant pages.
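The paper's implementation is not reproduced here, but a minimal sketch of a best-first focused crawler driven by anchor-text relevance may help make the idea concrete. The topic keywords, scoring function and threshold below are illustrative assumptions, not the author's method, and the sketch covers only the content-based and link-based cues.

```python
import heapq
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

TOPIC_KEYWORDS = {"crawler", "focused", "anchor", "retrieval"}  # hypothetical topic

def anchor_score(anchor_text):
    """Link-based cue: fraction of topic keywords that appear in the anchor text."""
    words = set(anchor_text.lower().split())
    return len(words & TOPIC_KEYWORDS) / len(TOPIC_KEYWORDS)

def focused_crawl(seed_urls, max_pages=50, threshold=0.25):
    frontier = [(-1.0, url) for url in seed_urls]   # best-first frontier, highest score first
    heapq.heapify(frontier)
    visited, relevant = set(), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(html, "html.parser")
        # Content-based cue: keep the page if its text mentions the topic.
        if any(k in soup.get_text().lower() for k in TOPIC_KEYWORDS):
            relevant.append(url)
        # Link-based cue: only enqueue out-links whose anchor text looks on-topic.
        for a in soup.find_all("a", href=True):
            score = anchor_score(a.get_text())
            if score >= threshold:
                heapq.heappush(frontier, (-score, urljoin(url, a["href"])))
    return relevant
```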

Author(s):  
Satinder Kaur ◽  
Sunil Gupta

Information plays a very important role in life, and the world now largely depends on the World Wide Web to obtain it. The web comprises many websites from every discipline, and each website consists of web pages interlinked with one another through hyperlinks. The success of a website largely depends on the design of its pages, and researchers have done a great deal of work to appraise web pages quantitatively. Keeping in mind the importance of design aspects, this paper presents an automated evaluation tool that assesses these aspects for any web page. The tool takes the HTML code of the page as input, then extracts the HTML tags and checks them for uniformity. It comprises normalized modules that quantify measures of the design aspects. As a demonstration, the tool has been applied to four web pages from distinct sites and their design aspects reported for comparison. The tool benefits web developers, who can predict the design quality of web pages and improve it before and after a website is implemented, without user interaction.
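The paper's normalized modules are not listed in this abstract; as a rough illustration of the kind of tag-level checks such a tool performs, the sketch below computes a few common design measures from a page's HTML. The specific measures and their 0-to-1 normalization are assumptions for illustration.

```python
from bs4 import BeautifulSoup

def design_report(html):
    """Return a few normalized (0..1) design measures for a page's HTML.
    The measures are illustrative, not the paper's metric set."""
    soup = BeautifulSoup(html, "html.parser")
    imgs = soup.find_all("img")
    imgs_with_alt = [i for i in imgs if i.get("alt") is not None]
    return {
        "has_title": 1.0 if soup.title and soup.title.get_text().strip() else 0.0,
        "alt_text_ratio": len(imgs_with_alt) / len(imgs) if imgs else 1.0,
        "uses_headings": 1.0 if soup.find(["h1", "h2", "h3"]) else 0.0,
        "no_deprecated_tags": 0.0 if soup.find(["font", "center", "marquee"]) else 1.0,
    }

# Usage: design_report(open("page.html", encoding="utf-8").read())
```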


2015 ◽  
Vol 12 (1) ◽  
pp. 91-114 ◽  
Author(s):  
Víctor Prieto ◽  
Manuel Álvarez ◽  
Víctor Carneiro ◽  
Fidel Cacheda

Search engines use crawlers to traverse the Web in order to download web pages and build their indexes. Keeping these indexes up to date is essential to the quality of search results. However, changes in web pages are unpredictable, and identifying the moment a web page changes, as soon as possible and with minimal computational cost, is a major challenge. In this article we present the Web Change Detection system which, in the best case, is capable of detecting a change in a web page almost in real time. In the worst case it requires, on average, 12 minutes to detect a change on a site with low PageRank and about one minute on a site with high PageRank. Current search engines, by contrast, require more than a day on average to detect a modification to a web page (in both cases).
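The abstract does not describe the system's detection mechanism, so the following is only a naive baseline for comparison: poll a page and hash its body, treating a changed hash as a changed page. The polling interval and hash choice are assumptions.

```python
import hashlib
import time

import requests

def fingerprint(url):
    """Hash the raw page body; a different hash signals a changed page."""
    body = requests.get(url, timeout=10).content
    return hashlib.sha256(body).hexdigest()

def watch(url, interval_seconds=60):
    """Naive polling loop; real systems trade polling cost against detection delay."""
    last = fingerprint(url)
    while True:
        time.sleep(interval_seconds)
        current = fingerprint(url)
        if current != last:
            print(f"change detected at {url}")
            last = current
```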


2020 ◽  
Vol 17 (2) ◽  
pp. 1260-1265
Author(s):  
Mohd Sharul Hafiz Razak ◽  
Nor Azman Ismail ◽  
Alif Fikri Mohktar ◽  
Su Elya Namira ◽  
Nurina Izzati Ramzi

This paper investigates 18 web domains of computer science and information technology academic websites of Malaysian universities. We collected more than two million web pages. A webometric analysis was used to explore the number of web pages, inbound links, the web impact factor (WIF) and link relationships. The results show that Fakulti Teknologi dan Sains Maklumat (FTSM), Universiti Kebangsaan Malaysia (UKM) has the highest number of web pages, while Fakulti Teknologi Kreatif dan Warisan (FTKW), Universiti Malaysia Kelantan (UMK) has the largest WIF score. Pearson's correlation coefficient was used to test the relationship between institutional subdomain age and WIF. The correlation indicates only a scant relationship between subdomain age and WIF score across the 18 selected Malaysian schools [r = −.076, n = 18, p < .0005]. This is because the WIF depends heavily on the quality of the content for attracting backlinks and on the Google crawler algorithm, which changes from time to time, for the number of web pages. Subdomain age is also independent of the year a school was established. These findings can serve as a guide when implementing a university web content strategy.
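The web impact factor is conventionally computed as the number of inbound links to a site divided by the number of its pages; the short sketch below shows that calculation and the correlation test reported above, using made-up values rather than the study's data (Python 3.10+ for statistics.correlation).

```python
from statistics import correlation  # Pearson's r; available from Python 3.10

def web_impact_factor(inbound_links, page_count):
    """Conventional WIF: inbound links divided by the number of pages indexed for the site."""
    return inbound_links / page_count if page_count else 0.0

# Hypothetical subdomains: (label, subdomain age in years, inbound links, web pages).
sites = [("site_a", 22, 5400, 120000), ("site_b", 12, 900, 3000), ("site_c", 25, 2100, 60000)]
ages = [age for _, age, _, _ in sites]
wifs = [web_impact_factor(links, pages) for _, _, links, pages in sites]
print(round(correlation(ages, wifs), 3))  # r between subdomain age and WIF
```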


2019 ◽  
Author(s):  
Lucas van der Deijl ◽  
Antal van den Bosch ◽  
Roel Smeets

Literary history is no longer written in books alone. As literary reception thrives in blogs, Wikipedia entries, Amazon reviews, and Goodreads profiles, the Web has become a key platform for the exchange of information on literature. Although conventional printed media in the field (academic monographs, literary supplements, and magazines) may still claim the highest authority, online media presumably provide the first (and possibly the only) source for many readers casually interested in literary history. Wikipedia offers quick and free answers to readers' questions, and the range of topics described in its entries dramatically exceeds the volume any printed encyclopedia could possibly cover. While an important share of this expanding knowledge base about literature is produced bottom-up (user based and crowd-sourced), search engines such as Google have become brokers in this online economy of knowledge, organizing information on the Web for its users. Similar to the printed literary histories, search engines prioritize certain information sources over others when ranking and sorting Web pages; as such, their search algorithms create hierarchies of books, authors, and periods.


2001 ◽  
Vol 20 (4) ◽  
pp. 11-18 ◽  
Author(s):  
Cleborne D. Maddux

The Internet and the World Wide Web are growing at unprecedented rates, and more and more teachers are authoring school or classroom web pages. Such pages have particular potential for use in rural areas by special educators, children with special needs, and the parents of children with special needs. The quality of many of these pages, however, leaves much to be desired. All web pages, especially those authored by special educators, should be accessible to people with disabilities. Many other problems complicate use of the web for all users, whether or not they have disabilities. By taking some simple steps, beginning webmasters can avoid these problems. This article discusses practical solutions to common accessibility problems and to other problems commonly seen on the web.
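One common accessibility problem of the kind the article discusses, images without text alternatives, can be checked automatically. The check below is a generic illustration, not the article's own checklist.

```python
from bs4 import BeautifulSoup

def images_missing_alt(html):
    """List <img> tags that lack an alt attribute, a common accessibility problem."""
    soup = BeautifulSoup(html, "html.parser")
    return [str(img)[:80] for img in soup.find_all("img") if not img.has_attr("alt")]
```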


2004 ◽  
Vol 4 (1) ◽  
Author(s):  
David Carabantes Alarcón ◽  
Carmen García Carrión ◽  
Juan Vicente Beneit Montesinos

Quality on the Internet is of great value, all the more so when the web page concerns health, as does a resource on drug dependence. This article reviews the most prominent estimators and systems of web quality in order to develop a specific system for assessing the quality of web resources on drug dependence. A feasibility test was carried out by analysing the main web pages on this subject (n = 60), gathering user-perspective assessments of the quality of those resources. Areas for improvement were identified concerning the accuracy and reliability of the information, authorship, and the development of descriptions and assessments of external links.
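The paper's actual set of estimators is not reproduced here; as a hedged illustration of how such user-perspective assessments can be aggregated, the sketch below applies a weighted checklist over criteria of the kind named in the abstract. The weights and the 0..1 rating scale are assumptions.

```python
# Hypothetical weights over criteria loosely following the aspects named in the abstract.
CRITERIA_WEIGHTS = {
    "accuracy": 0.40,      # accuracy and reliability of the information
    "authorship": 0.30,    # identifiable, responsible authorship
    "link_quality": 0.30,  # descriptions and assessments of external links
}

def quality_score(ratings):
    """Weighted mean of per-criterion user ratings on a 0..1 scale."""
    return sum(w * ratings.get(criterion, 0.0) for criterion, w in CRITERIA_WEIGHTS.items())

print(quality_score({"accuracy": 0.8, "authorship": 0.5, "link_quality": 0.4}))
```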


2013 ◽  
Vol 303-306 ◽  
pp. 2311-2316
Author(s):  
Hong Shen Liu ◽  
Peng Fei Wang

The structure and contents of a research search engine are presented; its core technology is the analysis of web pages. The characteristics of analysing web pages within a single website are studied: relations between the web pages gathered by the crawler on two successive visits can be established, and the information that changed between them is found easily. A new method of analysing web pages within one website is introduced, which analyses pages based on their changed information. Applying the method shows that it is effective for web page analysis.
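The abstract leaves the method's details open; a minimal illustration of the underlying idea, comparing the pages gathered on two crawls of the same site so that only changed pages are re-analysed, might look like the following. The fingerprinting choice is an assumption.

```python
import hashlib

def fingerprint(html):
    """Cheap content fingerprint for comparing two versions of a page."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()

def changed_pages(crawl_old, crawl_new):
    """crawl_old and crawl_new map URL -> HTML from two successive crawls of a site.
    Returns the URLs whose content is new or differs, i.e. the pages worth re-analysing."""
    changed = []
    for url, html in crawl_new.items():
        old_html = crawl_old.get(url)
        if old_html is None or fingerprint(old_html) != fingerprint(html):
            changed.append(url)
    return changed
```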


2021 ◽  
Author(s):  
Xiangyi Chen

Text, link and usage information are the most commonly used sources in the ranking algorithm of a web search engine. In this thesis, we argue that the quality of the web pages, such as the performance of page delivery (e.g. reliability and response time), should also play an important role in ranking, especially for users with a slow Internet connection or mobile users. On this principle, if two pages are equally relevant to a query, the one with the higher delivery quality (e.g. faster response) should be ranked higher. We define several important attributes for Quality of Service (QoS) and explain how we rank web pages based on them. In addition, we have tested and compared different algorithms for aggregating those QoS attributes. The experimental results show that the proposed algorithms promote pages with higher delivery quality to higher positions in the result list, which improves users' overall experience of the search engine, and that the QoS-based re-ranking algorithm consistently achieves the best performance.
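The thesis's exact QoS attributes and aggregation functions are not listed in this abstract; the sketch below only illustrates the general idea of combining a relevance score with normalized delivery-quality attributes such as response time and reliability. The normalization and the weights are assumptions.

```python
def qos_score(response_time_s, reliability, max_time_s=5.0):
    """Combine two illustrative QoS attributes into a 0..1 score:
    faster responses and higher availability score higher."""
    time_score = max(0.0, 1.0 - min(response_time_s, max_time_s) / max_time_s)
    return 0.5 * time_score + 0.5 * reliability

def rerank(results, alpha=0.7):
    """results: list of (url, relevance, response_time_s, reliability).
    alpha weights relevance against delivery quality; both weights are assumed values."""
    scored = [(alpha * relevance + (1 - alpha) * qos_score(t, rel), url)
              for url, relevance, t, rel in results]
    return [url for _, url in sorted(scored, reverse=True)]

# Two equally relevant pages: the faster, more reliable one is promoted.
print(rerank([("a.example", 0.9, 4.0, 0.95), ("b.example", 0.9, 0.3, 0.99)]))
```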


A web crawler, also called a spider, automatically traverses the WWW for the purpose of web indexing. As the web grows day by day, the number of web pages worldwide has increased massively. Search engines are essential to make this volume searchable for users, and they are the means of discovering particular data on the WWW. Without a search engine it would be almost impossible for a person to find anything on the web unless they already knew a specific URL. Every search engine maintains a central repository of HTML documents in indexed form. Each time a user submits a query, the search is performed against this database of indexed web pages. The size of each search engine's database depends on the pages available on the internet, so to increase the efficiency of search engines it is preferable to store only the most relevant and significant pages in the database.
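To make the passage concrete, the toy sketch below fetches pages breadth-first and builds the kind of indexed repository it describes, an inverted index from words to the URLs that contain them. The page limit, parsing and lack of relevance filtering are simplifications.

```python
from collections import defaultdict, deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl_and_index(seeds, max_pages=20):
    """Breadth-first toy crawler: fetch pages, index their words, follow links."""
    index = defaultdict(set)            # word -> set of URLs (the inverted index)
    queue, seen = deque(seeds), set()
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        soup = BeautifulSoup(html, "html.parser")
        for word in soup.get_text().lower().split():
            index[word].add(url)
        queue.extend(urljoin(url, a["href"]) for a in soup.find_all("a", href=True))
    return index

# Queries are answered from the index, not by re-fetching pages:
# index = crawl_and_index(["https://example.com"]); index.get("crawler", set())
```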

