YumiInt — A deep Web integration system for local search engines for Geo-referenced objects

Author(s):  
E. Dragut ◽  
B. P. Beirne ◽  
A. Neyestani ◽  
B. Atassi ◽  
C. Yu ◽  
...


2013 ◽  
Vol 756-759 ◽  
pp. 1855-1859 ◽  
Author(s):  
Meng Juan Li ◽  
Lian Yin Jia ◽  
Jin Guo You ◽  
Jia Man Ding ◽  
Hai He Zhou

Deep Web data integration has become the focus of many research efforts in recent years. Near-duplicate detection is very important for a deep Web integration system, yet few studies address deep Web integration and near-duplicate detection together. In this paper, we develop an integration system, DWI-ndfree, to solve this problem. The wrapper of DWI-ndfree consists of four parts: the form filler, the navigator, the extractor, and the near-duplicate detector. To find near-duplicate records, we propose an efficient algorithm, CheckNearDuplicate. DWI-ndfree can integrate deep Web data free of near duplicates and has been used to carry out several Web extraction and integration tasks efficiently.
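
The abstract names CheckNearDuplicate but gives no internals, so the following is only a minimal illustrative sketch of near-duplicate detection over extracted records, based on word shingles and Jaccard similarity. It is not the paper's actual algorithm; the record format, shingle size k, and 0.9 threshold are assumptions chosen for illustration.

# Illustrative near-duplicate check over extracted records.
# NOTE: a generic shingling + Jaccard-similarity sketch, NOT the
# paper's CheckNearDuplicate algorithm; the shingle size k and the
# 0.9 threshold are assumed values for illustration only.

def shingles(text, k=3):
    """Return the set of k-word shingles of a record's text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(record_text, kept, threshold=0.9):
    """True if record_text nearly duplicates a kept record; otherwise keep it."""
    s = shingles(record_text)
    for t in kept:
        if jaccard(s, t) >= threshold:
            return True
    kept.append(s)  # record is new: remember its shingles
    return False

# Usage: deduplicating a stream of extracted records.
# kept = []
# unique = [r for r in records if not is_near_duplicate(r, kept)]

A production integration system would replace the linear scan over previously kept records with an index such as MinHash/LSH so that detection stays sub-linear in the number of records.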


2010 ◽  
Vol 3 (1-2) ◽  
pp. 1613-1616 ◽  
Author(s):  
Thomas Kabisch ◽  
Eduard C. Dragut ◽  
Clement Yu ◽  
Ulf Leser
Keyword(s):  
Deep Web

Author(s):  
Punam Bedi ◽  
Neha Gupta ◽  
Vinita Jindal

The World Wide Web is a part of the Internet that provides a data dissemination facility to people. The contents of the Web are crawled and indexed by search engines so that they can be retrieved, ranked, and displayed in response to users' search queries. These contents, which can be easily retrieved using Web browsers and search engines, comprise the Surface Web. All information that cannot be crawled by search engines falls under the Deep Web. Deep Web content never appears in the results displayed by search engines. Though this part of the Web remains hidden, it can be reached using targeted searches in normal Web browsers. Unlike the Deep Web, there exists a portion of the World Wide Web that cannot be accessed without special software; this is known as the Dark Web. This chapter describes how the Dark Web differs from the Deep Web and elaborates on the software commonly used to enter the Dark Web. It highlights the illegitimate and legitimate sides of the Dark Web and specifies the role played by cryptocurrencies in the expansion of the Dark Web's user base.


The Dark Web ◽  
2018 ◽  
pp. 359-374
Author(s):  
Dilip Kumar Sharma ◽  
A. K. Sharma

ICT plays a vital role in human development through information extraction and includes computer networks and telecommunication networks. One of the important modules of ICT is computer networks, which are the backbone of the World Wide Web (WWW). Search engines are computer programs that browse and extract information from the WWW in a systematic and automatic manner. This paper examines the three main components of search engines: the Extractor, a web crawler that starts with a URL; the Analyzer, an indexer that processes the words on a web page and stores the resulting index in a database; and the Interface Generator, a query handler that understands the needs and preferences of the user. The paper concentrates on the information available on the surface web through general web pages and on the hidden information behind query interfaces, called the deep web. It emphasizes the extraction of relevant information so that the preferred content appears as the first result of the user's search query, and it discusses aspects of the deep web with an analysis of a few existing deep web search engines.
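
The three components named in the abstract map naturally onto a small pipeline. The sketch below is a self-contained toy under that reading, not the authors' system: extractor fetches one page from a seed URL, analyzer builds an inverted index, and interface_generator answers conjunctive keyword queries against it. The function names and the seed URL are placeholders.

# Toy version of the three components the paper names. Fetching,
# tag stripping, and ranking are deliberately naive; the seed URL in
# the usage note is a placeholder, and this is not the authors' system.

import re
import urllib.request
from collections import defaultdict

def extractor(url):
    """Extractor: the crawler step; fetch one page starting from a URL."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def analyzer(pages):
    """Analyzer: the indexer step; map each word to the URLs containing it."""
    index = defaultdict(set)
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)  # crude HTML tag stripping
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def interface_generator(index, query):
    """Interface Generator: the query handler; intersect posting lists."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Usage (seed URL is a placeholder):
# pages = {"https://example.com": extractor("https://example.com")}
# print(interface_generator(analyzer(pages), "example domain"))

Note that such a crawler reaches only surface web pages; deep web content behind query forms is invisible to it, which is precisely the gap the paper highlights.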


2021 ◽  
pp. 50-71
Author(s):  
Shakeel Ahmed ◽  
Shubham Sharma ◽  
Saneh Lata Yadav

Information retrieval is the task of finding material of an unstructured nature within large collections stored on computers. The surface web consists of indexed content accessible by traditional browsers, whereas deep or hidden web content cannot be found with traditional search engines and requires a password or network permissions. Within the deep web, the dark web is also growing, as new tools make it easier to navigate hidden content that is accessible only with special software such as Tor. According to a study in Nature, Google indexes no more than 16% of the surface web and misses all of the deep web, so any given search turns up just 0.03% of the information that exists online. Thus, the key part of the hidden web remains inaccessible to users. This chapter poses some questions about this research, explains detailed definitions and analogies, discusses related work, and sets out the advantages and limitations of the existing work proposed by researchers. It identifies the need for a system that will process both surface and hidden web data and return integrated results to users.



