Understanding the Concept of Different Types of Web Crawling and Its Implementation

Author(s):  
Palika Jajoo

Web crawling is the process by which topics and information on the World Wide Web are browsed and then stored in large repositories, from which users can retrieve them as needed. This paper explains the use of web crawling in the digital world and the difference it makes for search engines. A variety of web crawling approaches exist, each of which is explained briefly in this paper. Web crawlers have many advantages over other traditional methods of searching for information online, and many tools are available that support web crawling and simplify the process.
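The browse-and-store cycle described above can be sketched over a toy in-memory "web" (the page map, URLs, and contents below are all hypothetical; a real crawler would fetch pages over HTTP and extract links from HTML):

```python
from collections import deque

# A toy "web": URL -> (page text, outgoing links). In a real crawler these
# would come from HTTP fetches and HTML link extraction.
TOY_WEB = {
    "a": ("page a", ["b", "c"]),
    "b": ("page b", ["c"]),
    "c": ("page c", []),
}

def crawl(seed, web):
    """Browse pages starting from `seed`, storing each page's text once."""
    store = {}                 # the "big storage device": URL -> page text
    frontier = deque([seed])   # URLs waiting to be visited
    seen = {seed}
    while frontier:
        url = frontier.popleft()
        text, links = web[url]
        store[url] = text      # persist the page for later user queries
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store

print(sorted(crawl("a", TOY_WEB)))  # → ['a', 'b', 'c']
```

Every page reachable from the seed ends up in the store exactly once; user queries are then answered from the store rather than from the live web.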

2017
Vol 4 (1)
pp. 95-110
Author(s):
Deepika Punj
Ashutosh Dixit

In order to manage the vast information available on the web, the crawler plays a significant role. The working of the crawler should be optimized to retrieve the maximum amount of unique information from the World Wide Web. In this paper, an architecture for a migrating crawler is proposed that is based on URL ordering, URL scheduling, and a document redundancy elimination mechanism. The proposed ordering technique is based on URL structure, which plays a crucial role in utilizing the web efficiently. Scheduling ensures that each URL goes to the optimum agent for downloading; to achieve this, the characteristics of both agents and URLs are taken into consideration. Duplicate documents are also removed to keep the database unique, and to reduce matching time, documents are matched on the basis of their meta-information only. The agents of the proposed migrating crawler work more efficiently than a traditional single crawler by providing ordering and scheduling of URLs.
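The meta-information-based redundancy check can be illustrated with a small sketch (the field names and documents below are hypothetical; the paper's actual meta fields are not specified here). Hashing only a document's meta fields is far cheaper than comparing full contents:

```python
import hashlib

def meta_fingerprint(meta):
    """Hash only the meta-information of a document, not its full body."""
    key = "|".join(str(meta.get(f, "")) for f in ("title", "description", "length"))
    return hashlib.sha256(key.encode()).hexdigest()

def deduplicate(docs):
    """Keep only the first document seen with each meta signature."""
    unique, seen = [], set()
    for doc in docs:
        fp = meta_fingerprint(doc["meta"])
        if fp not in seen:
            seen.add(fp)
            unique.append(doc)
    return unique
```

Two documents with identical meta fields are treated as duplicates without their bodies ever being compared, which is the time saving the abstract refers to.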


2010
Vol 106 (2)
pp. 490-498
Author(s):
Aurelie Dommes
Aline Chevalier
Marilyne Rossetti

This pilot study investigated the age-related differences in searching for information on the World Wide Web with a search engine. 11 older adults (6 men, 5 women; M age = 59 yr., SD = 2.76, range = 55–65 yr.) and 12 younger adults (2 men, 10 women; M = 23.7 yr., SD = 1.07, range = 22–25 yr.) had to conduct six searches differing in complexity, and for which a search method was or was not induced. The results showed that the younger and older participants provided with an induced search method were less flexible than the others and produced fewer new keywords. Moreover, older participants took longer than the younger adults, especially in the complex searches. The younger participants were flexible in the first request and spontaneously produced new keywords (spontaneous flexibility), whereas the older participants only produced new keywords when confronted by impasses (reactive flexibility). Aging may influence web searches, especially the nature of keywords used.


2019
Author(s):  
Adrienne Canino

This essay examines Google Dataset Search, a beta tool from Google announced in September 2018: a search engine specific to finding research data published on the internet. The structure and methods of the search engine are examined, as well as the methods Google recommends to web developers to make it an effective tool across the World Wide Web. The column concludes with a discussion of the pros and cons of this tool in the research information landscape.
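The method Google recommends to web developers is schema.org `Dataset` structured data, typically embedded in a page as JSON-LD. A minimal hedged example (all names and URLs below are hypothetical) might look like:

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example rainfall measurements",
  "description": "Hypothetical daily rainfall readings, 2010-2018.",
  "url": "https://example.org/datasets/rainfall",
  "creator": {"@type": "Organization", "name": "Example Research Group"},
  "distribution": {
    "@type": "DataDownload",
    "encodingFormat": "text/csv",
    "contentUrl": "https://example.org/datasets/rainfall.csv"
  }
}
```

Pages carrying markup of this shape are what the Dataset Search crawler recognizes and indexes as datasets rather than as ordinary web pages.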


2018
pp. 742-748
Author(s):  
Viveka Vardhan Jumpala

The Internet, an information superhighway, has practically compressed the world into a cyber colony through various networks and internets. The development of the Internet and the emergence of the World Wide Web (WWW) have provided a common vehicle for communication and instantaneous access to search engines and databases. A search engine is designed to facilitate the search for information on the WWW. Search engines are essentially tools that help in finding required information on the web quickly and in an organized manner. Different search engines do the same job in different ways, thus giving different results for the same query. Search strategies are the new trend on the Web.


Author(s):  
Deepak Mayal

The World Wide Web (WWW), also referred to as the web, acts as a vital source of information, and searching the web has become easy nowadays thanks to search engines such as Google and Yahoo. A search engine is basically a complex program that allows users to search for information available on the web, and for that purpose it uses web crawlers. A web crawler systematically browses the World Wide Web. Effective search helps avoid downloading and visiting irrelevant web pages; to this end, web crawlers use different searching algorithms. This paper reviews the different web crawling algorithms that determine the fate of the search system.
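One family of algorithms such reviews typically compare is best-first (focused) crawling against plain breadth-first traversal: instead of visiting links in discovery order, the crawler pops the highest-scoring frontier URL next. A minimal sketch over a hypothetical link graph and hypothetical relevance scores:

```python
import heapq

# Toy link graph and per-page relevance estimates (both hypothetical; a real
# focused crawler scores pages with a classifier or link-context heuristic).
WEB = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
RELEVANCE = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.7}

def best_first_crawl(seed, web, relevance):
    """Visit pages in order of estimated relevance, not discovery order."""
    frontier = [(-relevance[seed], seed)]  # max-heap via negated scores
    seen, order = {seed}, []
    while frontier:
        _, url = heapq.heappop(frontier)   # most promising URL first
        order.append(url)
        for link in web[url]:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance[link], link))
    return order

print(best_first_crawl("a", WEB, RELEVANCE))  # → ['a', 'c', 'b', 'd']
```

Breadth-first would visit `a, b, c, d`; best-first reaches the high-relevance page `c` before the low-relevance `b`, which is exactly the "avoid irrelevant pages early" property the abstract mentions.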


2009
Vol 1 (4)
pp. 58-69
Author(s):  
Chad M.S. Steel

While the supply of child pornography through the World Wide Web has frequently been speculated upon, the demand has not been adequately explored. Quantification and qualification of the demand provide forensic examiners a behavioral basis for determining the sophistication of individuals seeking child pornography. Additionally, the research assists an examiner in searching for and presenting evidence of child pornography browsing. The overall search engine demand for child pornography is bounded between 0.19% and 0.49%, depending on the inclusion of ambiguous phrases, with the top search for child pornography being “lolita bbs”. Unlike on peer-to-peer networks, however, the top child pornography related query ranks only as the 198th most popular query overall. Queries on search engines appear to be decreasing as well, and the techniques employed are becoming less reliant on direct links to content.


Author(s):  
Abhishek Das
Ankit Jain

In this chapter, the authors describe the key indexing components of today’s web search engines. As the World Wide Web has grown, the systems and methods for indexing have changed significantly. The authors present the data structures used, the features extracted, the infrastructure needed, and the options available for designing a brand new search engine. They highlight techniques that improve the relevance of results, discuss trade-offs for best utilizing machine resources, and cover distributed processing concepts in this context. In particular, the authors delve into the topics of indexing phrases instead of terms, storage in memory vs. on disk, and data partitioning. Some thoughts on information organization for newly emerging data-forms conclude the chapter.
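The central data structure behind all of this is the inverted index: a map from each term to the documents containing it, with conjunctive queries answered by intersecting posting sets. A minimal sketch (document ids and texts below are hypothetical, and real engines store compressed posting lists with positions for phrase queries):

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Conjunctive query: ids of documents containing every query term."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []
```

Indexing phrases instead of single terms, as the chapter discusses, amounts to using multi-word keys (or term positions) in the same structure, trading index size for cheaper phrase lookups.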


Author(s):  
Kamal Taha
Ramez Elmasri

With the emergence of the World Wide Web, businesses’ databases are increasingly being queried directly by customers. The customers may not be aware of the exact structure of the underlying data and might never have learned a query language that would enable them to issue structured queries. Some of the employees who query the databases may also be unaware of the structure of the data, but they are likely to know some labels of the elements containing the data. There is therefore a need for a dual search engine that accommodates both business employees and customers. We propose in this chapter an XML search engine called SEEC, which accepts Keyword-Based queries (which can be used for answering customers’ queries) and Loosely Structured queries (which can be used for answering employees’ queries). We previously proposed a stand-alone Loosely Structured search engine called OOXSearch (Taha & Elmasri, 2007). SEEC integrates OOXSearch with a Keyword-Based search engine and uses novel search techniques; it is built on top of an XQuery search engine (Katz, 2005). SEEC was evaluated experimentally and compared with three recently proposed systems: XSEarch (Cohen, Mamou, & Sagiv, 2003), Schema-Free XQuery (Li, Yu, & Jagadish, 2004), and XKSearch (Xu & Papakonstantinou, 2005). The results showed marked improvement.
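The flavor of keyword-based XML search (a generic sketch, not SEEC's actual algorithm) is to return the smallest XML fragments whose text contains every query keyword, so customers get focused answers without knowing the schema. The sample document and element labels below are hypothetical:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<store>
  <customer><name>Alice Smith</name><city>Dallas</city></customer>
  <customer><name>Bob Jones</name><city>Austin</city></customer>
</store>"""

def smallest_matches(root, keywords):
    """Smallest elements whose subtree text contains every keyword."""
    kws = [k.lower() for k in keywords]

    def contains_all(elem):
        text = " ".join(elem.itertext()).lower()
        return all(k in text for k in kws)

    hits = []
    for elem in root.iter():
        # Keep an element only if no single child fragment already answers
        # the query, so ancestors of a match are not returned as well.
        if contains_all(elem) and not any(contains_all(c) for c in elem):
            hits.append(elem)
    return hits

root = ET.fromstring(SAMPLE)
```

For the query "Alice Dallas" this returns the first `<customer>` element (the smallest fragment containing both keywords) rather than the whole `<store>` document.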


1999
Vol 17 (2)
pp. 385-387
Author(s):  
Bernard J. Hibbitts

Legal historians have had an ambivalent relationship with new technology. As students and spokespersons of the somewhat-stodgy legal past, our sympathies have predictably been with traditional methods of doing things rather than with the latest and greatest devices of our own age. In the twentieth century we have tended to champion writing and books more than radio, television, and computers. Today we may use new tools to help us create our scholarship and even to help us teach, but like most of our academic colleagues in law and in history we generally employ those tools as extensions of established media instead of exploiting their potential to deploy information and develop ideas in new ways.

