vertical search engine
Recently Published Documents


Total documents: 57 (five years: 4)

H-index: 3 (five years: 0)

2021 ◽  
pp. 338-356
Author(s):  
Tarfah Alrashed ◽  
Dimitris Paparas ◽  
Omar Benjelloun ◽  
Ying Sheng ◽  
Natasha Noy

Semantic markup allows providers on the Web to describe content using a shared, controlled vocabulary. This markup is invaluable in enabling a broad range of applications, from vertical search engines to rich snippets in search results to actions on emails. In this paper, we focus on semantic markup for datasets, specifically in the context of developing a vertical search engine for datasets on the Web: Google's Dataset Search. Dataset Search relies on this markup to identify pages that describe datasets. While the markup was the core enabling technology for this vertical search, we also discovered a problem we had to address: pages from 61% of internet hosts that provide this markup do not actually describe datasets. We analyze the veracity of dataset markup across Dataset Search's Web-scale corpus and categorize the pages where this markup is not reliable. We then propose a way to drastically increase the quality of the dataset metadata corpus: a deep neural-network classifier that identifies whether a page with this markup is a dataset page. Our classifier achieves 96.7% recall at the 95% precision point. This level of precision enables Dataset Search to circumvent the noise in semantic markup and to use the metadata to provide high-quality results to users.
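The reported figure, 96.7% recall at the 95% precision point, reads as follows: among classifier score thresholds at which at least 95% of accepted pages really are dataset pages, the best threshold recovers 96.7% of all true dataset pages. The sketch below computes this metric from scores and ground-truth labels. It is a generic evaluation helper under that reading, not the authors' classifier or evaluation code; the function name `recall_at_precision` and the sample data are hypothetical.

```python
from typing import Sequence


def recall_at_precision(scores: Sequence[float], labels: Sequence[int],
                        min_precision: float = 0.95) -> float:
    """Best recall over all score thresholds whose precision >= min_precision.

    scores: classifier scores (higher = more likely a dataset page).
    labels: ground truth, 1 = dataset page, 0 = not a dataset page.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    # Rank pages from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = 0.0
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Pages with equal scores share one threshold, so only evaluate the
        # cut after the last page with this score.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / total_pos)
    return best


# Five hypothetical pages with classifier scores and ground-truth labels.
r = recall_at_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0])
print(round(r, 3))  # 0.667: the top two pages are kept at 100% precision
```

Lowering the precision target trades noise for coverage: with `min_precision=0.75`, the same data admits a threshold that keeps all three dataset pages.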


Information ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 200
Author(s):  
Adrian Alexandrescu

This paper presents the processing steps needed to build a fully functional vertical search engine. Four actions are identified (retrieval, extraction, presentation, and delivery); together they crawl websites, extract product information from the retrieved webpages, process that data, and let the end user search for products. The whole application flow is designed for low resource usage, especially in the delivery action, which consists of a web application that uses cloud resources and is optimized for cost efficiency. Novel methods are identified and explained for representing the crawl and extraction template, for optimizing the product index, and for deploying and storing data in the cloud database. In addition, key aspects of ethics and security in the proposed solution are discussed. A practical use case is also presented, in which products are extracted from seven online board and card game retailers. Finally, the paper discusses the potential for researching new methods that would further improve the cost efficiency and scalability of the proposed solution.
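To make the four actions concrete, here is a minimal in-memory sketch of how they might chain together, assuming our reading of the abstract: retrieval fetches pages, extraction applies a per-site template, presentation normalizes the extracted data, and delivery answers user searches. The template format (field mapped to a CSS class), the function names, and the sample shop pages are all hypothetical; the paper's actual template representation and cloud deployment are not reproduced here.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical extraction template: output field -> CSS class of the element
# whose text holds that field on a given retailer's product pages.
TEMPLATE = {"name": "product-title", "price": "product-price"}


class TemplateExtractor(HTMLParser):
    """Extraction: capture the text of elements whose class matches the template.
    Assumes each value is the direct text of its element (no nested tags)."""

    def __init__(self, template: dict[str, str]):
        super().__init__()
        self.by_class = {cls: field for field, cls in template.items()}
        self.current = None          # field currently being captured, if any
        self.record: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.by_class:
                self.current = self.by_class[cls]

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.record[self.current] = self.record.get(self.current, "") + data.strip()


@dataclass
class Product:
    name: str
    price: float
    url: str


def retrieve(pages: dict[str, str]) -> dict[str, str]:
    # Retrieval: a real crawler would fetch these URLs; here the HTML is given.
    return pages


def extract(html: str) -> dict[str, str]:
    parser = TemplateExtractor(TEMPLATE)
    parser.feed(html)
    parser.close()
    return parser.record


def present(raw: dict[str, str], url: str) -> Product:
    # Presentation: turn extracted strings into a typed, normalized record.
    price = float(raw["price"].replace("$", "").replace(",", ""))
    return Product(raw["name"], price, url)


def deliver(index: list[Product], query: str) -> list[Product]:
    # Delivery: case-insensitive keyword match, cheapest offer first.
    q = query.lower()
    return sorted((p for p in index if q in p.name.lower()), key=lambda p: p.price)


pages = {
    "https://shop-a.example/catan":
        '<div><h2 class="product-title">Catan</h2>'
        '<span class="product-price">$44.99</span></div>',
    "https://shop-b.example/seafarers":
        '<div><h2 class="product-title">Catan Seafarers Expansion</h2>'
        '<span class="product-price">$39.00</span></div>',
    "https://shop-b.example/ticket":
        '<div><h2 class="product-title">Ticket to Ride</h2>'
        '<span class="product-price">$49.50</span></div>',
}
index = [present(extract(html), url) for url, html in retrieve(pages).items()]
results = deliver(index, "catan")
print([p.name for p in results])  # ['Catan Seafarers Expansion', 'Catan']
```

Keeping the template as plain data rather than code is one way to add a new retailer without redeploying the extractor, which fits the abstract's emphasis on low resource usage.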


The Dark Web ◽  
2018 ◽  
pp. 319-333
Author(s):  
Sudhakar Ranjan ◽  
Komal Kumar Bhatia

Nowadays, with the advent of Internet technologies and e-commerce, the need for smart search engines in everyday life is rising. Traditional search engines are not sufficiently intelligent, which leads to higher search costs. In this paper, the architecture of a vertical search engine based on a domain-specific hidden web crawler is proposed. To build a low-cost vertical search engine, improvements are suggested in searching, indexing, ranking, transactions, and the query interface. The domain term analyzer filters out as much useless information as possible and ultimately provides users with high-precision information. Experimental results show that the system reduces access, computation, storage, and communication time and increases efficiency.
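A domain term analyzer of the kind described can be approximated by scoring each crawled page against a domain vocabulary and discarding pages that score too low. The sketch below is a hypothetical illustration, not the authors' analyzer: the vocabulary (a book-search vertical), the scoring rule (fraction of domain terms present), and the 0.3 threshold are all assumptions chosen for readability.

```python
import re

# Hypothetical domain vocabulary for, e.g., a book-search vertical.
DOMAIN_TERMS = {"isbn", "author", "publisher", "edition", "paperback", "hardcover"}


def domain_score(text: str, terms: set[str] = DOMAIN_TERMS) -> float:
    """Fraction of the domain vocabulary that occurs in the page text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(tokens & terms) / len(terms)


def filter_pages(pages: dict[str, str], threshold: float = 0.3) -> list[str]:
    """Keep URLs whose text looks on-topic; everything else is discarded."""
    return [url for url, text in pages.items() if domain_score(text) >= threshold]


pages = {
    "https://books.example/1": "Paperback edition. ISBN 978-0-00-000000-0. "
                               "Author: A. Writer. Publisher: Example Press.",
    "https://news.example/2": "Weather forecast: rain expected this afternoon.",
    "https://blog.example/3": "Interview with the author of a new novel.",
}
print(filter_pages(pages))  # ['https://books.example/1']
```

The third page mentions one domain term but is still dropped; raising or lowering the threshold shifts the balance between precision and coverage.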


2018 ◽  
Vol 176 ◽  
pp. 03014
Author(s):  
Yaru Cao ◽  
Ning Ma ◽  
Fucheng Wan ◽  
Xiangzhen He

Based on research into vertical search engines and cross-language information retrieval, a cross-language vertical search engine design for e-commerce platforms is proposed. It aims to address the difficulty Internet users, especially ethnic minority users, face in searching for valuable products quickly, efficiently, and comprehensively. In this article, "cross-language" mainly refers to conversion among Chinese, English, and Tibetan. A dictionary-based query translation method translates query words to provide the cross-language function. An improved version of the Heritrix crawler is used to collect web information, HtmlParser performs structured information extraction, and Lucene builds the index and performs retrieval.
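Dictionary-based query translation can be sketched as replacing each query term with its dictionary translations and then matching documents that satisfy every translated term. The example below makes several assumptions: the toy Chinese-to-English dictionary is hypothetical, multiple translations of one term are OR-ed together, and a small in-memory inverted index stands in for Lucene (the paper uses Lucene in Java; this Python index only illustrates the idea).

```python
from collections import defaultdict

# Hypothetical bilingual dictionary (Chinese -> English); a real system would
# load full Chinese/English/Tibetan lexicons instead of these toy entries.
ZH_EN = {"手机": ["mobile phone", "cellphone"], "红色": ["red"]}


def translate_query(terms: list[str], dictionary: dict[str, list[str]]) -> list[list[str]]:
    """Map each query term to its translations; unknown terms pass through.
    Each inner list is an OR-group of alternatives for one source term."""
    return [dictionary.get(term, [term]) for term in terms]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Toy inverted index (token -> doc ids), standing in for Lucene."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(groups: list[list[str]], index: dict[str, set[int]]) -> set[int]:
    """AND across source terms, OR across each term's translations.
    A multi-word translation matches documents that contain all of its words."""
    result = None
    for alternatives in groups:
        matches: set[int] = set()
        for phrase in alternatives:
            words = phrase.lower().split()
            matches |= set.intersection(*(index.get(w, set()) for w in words))
        result = matches if result is None else result & matches
    return result or set()


docs = {1: "red mobile phone case", 2: "blue cellphone", 3: "red dress"}
groups = translate_query(["红色", "手机"], ZH_EN)
print(search(groups, build_index(docs)))  # {1}
```

OR-ing the alternatives keeps recall up when a term has several valid translations, at the cost of occasionally matching an unintended sense.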

