Understanding the Concept of Different Types of Web Crawling and Its Implementation

Author(s):  
Palika Jajoo

Web crawling is the process by which topics and information on the World Wide Web are browsed and then stored in large repositories, from which users can retrieve them as needed. This paper explains the use of web crawling in the digital world and the difference it makes for search engines. A variety of web crawling approaches exist, each of which is explained briefly in this paper. Web crawlers have many advantages over other traditional methods of searching for information online, and many tools are available that support web crawling and simplify the process.
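The browse-and-store cycle described above can be sketched over a toy in-memory "web" (the page map, URLs, and contents below are all hypothetical; a real crawler would fetch pages over HTTP and extract links from HTML):

```python
from collections import deque

# A toy "web": URL -> (page text, outgoing links). In a real crawler these
# would come from HTTP fetches and HTML link extraction.
TOY_WEB = {
    "a": ("page a", ["b", "c"]),
    "b": ("page b", ["c"]),
    "c": ("page c", []),
}

def crawl(seed, web):
    """Browse pages starting from `seed`, storing each page's text once."""
    store = {}                 # the "big storage device": URL -> page text
    frontier = deque([seed])   # URLs waiting to be visited
    seen = {seed}
    while frontier:
        url = frontier.popleft()
        text, links = web[url]
        store[url] = text      # persist the page for later user queries
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store

print(sorted(crawl("a", TOY_WEB)))  # → ['a', 'b', 'c']
```

Every page reachable from the seed ends up in the store exactly once; user queries are then answered from the store rather than from the live web.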

2017
Vol 4 (1)
pp. 95-110
Author(s):
Deepika Punj
Ashutosh Dixit

In order to manage the vast information available on the web, the crawler plays a significant role. The working of the crawler should be optimized to retrieve the maximum amount of unique information from the World Wide Web. In this paper, an architecture for a migrating crawler is proposed that is based on URL ordering, URL scheduling, and a document redundancy elimination mechanism. The proposed ordering technique is based on URL structure, which plays a crucial role in utilizing the web efficiently. Scheduling ensures that each URL goes to the optimum agent for downloading; to achieve this, the characteristics of both agents and URLs are taken into consideration. Duplicate documents are also removed to keep the database unique, and to reduce matching time, documents are matched on the basis of their meta-information only. The agents of the proposed migrating crawler work more efficiently than a traditional single crawler by providing ordering and scheduling of URLs.
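The meta-information-based redundancy check can be illustrated with a small sketch (the field names and documents below are hypothetical; the paper's actual meta fields are not specified here). Hashing only a document's meta fields is far cheaper than comparing full contents:

```python
import hashlib

def meta_fingerprint(meta):
    """Hash only the meta-information of a document, not its full body."""
    key = "|".join(str(meta.get(f, "")) for f in ("title", "description", "length"))
    return hashlib.sha256(key.encode()).hexdigest()

def deduplicate(docs):
    """Keep only the first document seen with each meta signature."""
    unique, seen = [], set()
    for doc in docs:
        fp = meta_fingerprint(doc["meta"])
        if fp not in seen:
            seen.add(fp)
            unique.append(doc)
    return unique
```

Two documents with identical meta fields are treated as duplicates without their bodies ever being compared, which is the time saving the abstract refers to.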


2010
Vol 106 (2)
pp. 490-498
Author(s):
Aurelie Dommes
Aline Chevalier
Marilyne Rossetti

This pilot study investigated the age-related differences in searching for information on the World Wide Web with a search engine. 11 older adults (6 men, 5 women; M age = 59 yr., SD = 2.76, range = 55–65 yr.) and 12 younger adults (2 men, 10 women; M = 23.7 yr., SD = 1.07, range = 22–25 yr.) had to conduct six searches differing in complexity, and for which a search method was or was not induced. The results showed that the younger and older participants provided with an induced search method were less flexible than the others and produced fewer new keywords. Moreover, older participants took longer than the younger adults, especially in the complex searches. The younger participants were flexible in the first request and spontaneously produced new keywords (spontaneous flexibility), whereas the older participants only produced new keywords when confronted by impasses (reactive flexibility). Aging may influence web searches, especially the nature of keywords used.


2019
Author(s):  
Adrienne Canino

This essay examines Google Dataset Search, a beta tool from Google announced in September 2018: a search engine specific to finding research data published on the internet. The structure and methods of the search engine are examined, as well as the methods Google recommends to web developers to make it an effective tool across the World Wide Web. The column concludes with a discussion of the pros and cons of this tool in the research information landscape.
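The method Google recommends to web developers is schema.org `Dataset` structured data, typically embedded in a page as JSON-LD. A minimal hedged example (all names and URLs below are hypothetical) might look like:

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example rainfall measurements",
  "description": "Hypothetical daily rainfall readings, 2010-2018.",
  "url": "https://example.org/datasets/rainfall",
  "creator": {"@type": "Organization", "name": "Example Research Group"},
  "distribution": {
    "@type": "DataDownload",
    "encodingFormat": "text/csv",
    "contentUrl": "https://example.org/datasets/rainfall.csv"
  }
}
```

Pages carrying markup of this shape are what the Dataset Search crawler recognizes and indexes as datasets rather than as ordinary web pages.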


2018
pp. 742-748
Author(s):  
Viveka Vardhan Jumpala

The Internet, an information superhighway, has practically compressed the world into a cyber colony through various networks and internets. The development of the Internet and the emergence of the World Wide Web (WWW) have provided a common vehicle for communication and instantaneous access to search engines and databases. A search engine is designed to facilitate the search for information on the WWW. Search engines are essentially tools that help in finding required information on the web quickly and in an organized manner. Different search engines do the same job in different ways, thus giving different results for the same query. Search strategies are the new trend on the Web.


Author(s):  
Deepak Mayal

The World Wide Web (WWW), also referred to as the web, acts as a vital source of information, and searching the web has become easy nowadays thanks to search engines such as Google and Yahoo. A search engine is basically a complex program that allows users to search for information available on the web, and for that purpose it uses web crawlers. A web crawler systematically browses the World Wide Web. Effective search helps avoid downloading and visiting irrelevant web pages; to this end, web crawlers use different searching algorithms. This paper reviews the different web crawling algorithms that determine the fate of the search system.
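One family of algorithms such reviews typically compare is best-first (focused) crawling against plain breadth-first traversal: instead of visiting links in discovery order, the crawler pops the highest-scoring frontier URL next. A minimal sketch over a hypothetical link graph and hypothetical relevance scores:

```python
import heapq

# Toy link graph and per-page relevance estimates (both hypothetical; a real
# focused crawler scores pages with a classifier or link-context heuristic).
WEB = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
RELEVANCE = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.7}

def best_first_crawl(seed, web, relevance):
    """Visit pages in order of estimated relevance, not discovery order."""
    frontier = [(-relevance[seed], seed)]  # max-heap via negated scores
    seen, order = {seed}, []
    while frontier:
        _, url = heapq.heappop(frontier)   # most promising URL first
        order.append(url)
        for link in web[url]:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-relevance[link], link))
    return order

print(best_first_crawl("a", WEB, RELEVANCE))  # → ['a', 'c', 'b', 'd']
```

Breadth-first would visit `a, b, c, d`; best-first reaches the high-relevance page `c` before the low-relevance `b`, which is exactly the "avoid irrelevant pages early" property the abstract mentions.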


2009
Vol 1 (4)
pp. 58-69
Author(s):  
Chad M.S. Steel

While the supply of child pornography through the World Wide Web has frequently been speculated upon, the demand has not been adequately explored. Quantification and qualification of the demand provide forensic examiners a behavioral basis for determining the sophistication of individuals seeking child pornography. Additionally, the research assists an examiner in searching for and presenting evidence of child pornography browsing. The overall search engine demand for child pornography is bounded between 0.19% and 0.49%, depending on the inclusion of ambiguous phrases, with the top search for child pornography being “lolita bbs”. Unlike on peer-to-peer networks, however, the top child pornography related query ranks only as the 198th most popular query overall. Queries on search engines appear to be decreasing as well, and the techniques employed are becoming less reliant on direct links to content.


Author(s):  
Abhishek Das
Ankit Jain

In this chapter, the authors describe the key indexing components of today’s web search engines. As the World Wide Web has grown, the systems and methods for indexing have changed significantly. The authors present the data structures used, the features extracted, the infrastructure needed, and the options available for designing a brand new search engine. They highlight techniques that improve the relevance of results, discuss trade-offs for best utilizing machine resources, and cover distributed processing concepts in this context. In particular, the authors delve into the topics of indexing phrases instead of terms, storage in memory vs. on disk, and data partitioning. Some thoughts on information organization for newly emerging data-forms conclude the chapter.
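The central data structure behind all of this is the inverted index: a map from each term to the documents containing it, with conjunctive queries answered by intersecting posting sets. A minimal sketch (document ids and texts below are hypothetical, and real engines store compressed posting lists with positions for phrase queries):

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Conjunctive query: ids of documents containing every query term."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []
```

Indexing phrases instead of single terms, as the chapter discusses, amounts to using multi-word keys (or term positions) in the same structure, trading index size for cheaper phrase lookups.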


Author(s):  
Kamal Taha
Ramez Elmasri

With the emergence of the World Wide Web, businesses’ databases are increasingly being queried directly by customers. The customers may not be aware of the exact structure of the underlying data and might never have learned a query language that would enable them to issue structured queries. Some of the employees who query the databases may also be unaware of the structure of the data, but they are likely to know some labels of the elements containing the data. There is therefore a need for a dual search engine that accommodates both business employees and customers. We propose in this chapter an XML search engine called SEEC, which accepts Keyword-Based queries (which can be used for answering customers’ queries) and Loosely Structured queries (which can be used for answering employees’ queries). We previously proposed a stand-alone Loosely Structured search engine called OOXSearch (Taha & Elmasri, 2007). SEEC integrates OOXSearch with a Keyword-Based search engine and uses novel search techniques; it is built on top of an XQuery search engine (Katz, 2005). SEEC was evaluated experimentally and compared with three recently proposed systems: XSEarch (Cohen, Mamou, & Sagiv, 2003), Schema-Free XQuery (Li, Yu, & Jagadish, 2004), and XKSearch (Xu & Papakonstantinou, 2005). The results showed marked improvement.
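The flavor of keyword-based XML search (a generic sketch, not SEEC's actual algorithm) is to return the smallest XML fragments whose text contains every query keyword, so customers get focused answers without knowing the schema. The sample document and element labels below are hypothetical:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<store>
  <customer><name>Alice Smith</name><city>Dallas</city></customer>
  <customer><name>Bob Jones</name><city>Austin</city></customer>
</store>"""

def smallest_matches(root, keywords):
    """Smallest elements whose subtree text contains every keyword."""
    kws = [k.lower() for k in keywords]

    def contains_all(elem):
        text = " ".join(elem.itertext()).lower()
        return all(k in text for k in kws)

    hits = []
    for elem in root.iter():
        # Keep an element only if no single child fragment already answers
        # the query, so ancestors of a match are not returned as well.
        if contains_all(elem) and not any(contains_all(c) for c in elem):
            hits.append(elem)
    return hits

root = ET.fromstring(SAMPLE)
```

For the query "Alice Dallas" this returns the first `<customer>` element (the smallest fragment containing both keywords) rather than the whole `<store>` document.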


1999
Vol 17 (2)
pp. 385-387
Author(s):  
Bernard J. Hibbitts

Legal historians have had an ambivalent relationship with new technology. As students and spokespersons of the somewhat-stodgy legal past, our sympathies have predictably been with traditional methods of doing things rather than with the latest and greatest devices of our own age. In the twentieth century we have tended to champion writing and books more than radio, television, and computers. Today we may use new tools to help us create our scholarship and even to help us teach, but like most of our academic colleagues in law and in history we generally employ those tools as extensions of established media instead of exploiting their potential to deploy information and develop ideas in new ways.

