Classification of means and methods of the Web semantic retrieval

2017 ◽  
pp. 030-050
Author(s):  
J.V. Rogushina ◽  

Problems associated with improving information retrieval in an open environment are considered, and the need for its semantization is substantiated. The current state and development prospects of semantic search engines focused on processing Web information resources are analysed, and the criteria for classifying such systems are reviewed. In this analysis, significant attention is paid to the use in semantic search of ontologies that contain knowledge about the subject area and the search users. The sources of ontological knowledge and the methods of processing them to improve search procedures are considered. Examples are given of semantic search systems that use structured query languages (e.g., SPARQL), lists of keywords, and natural-language queries. Classification criteria for semantic search engines such as architecture, coupling, transparency, user context, query modification, ontology structure, etc. are considered. Different ways of supporting semantic and ontology-based modification of user queries that improve the completeness and accuracy of the search are analysed. Based on an analysis of the properties of existing semantic search engines in terms of these criteria, areas for further improvement of these systems are identified: the development of metasearch systems, semantic modification of user queries, determination of a user-acceptable transparency level for search procedures, flexibility of domain knowledge management tools, and increased productivity and scalability. In addition, the development of semantic Web search tools requires the use of an external knowledge base containing knowledge about the domain of the user's information needs, and requires giving users the ability to select independently the knowledge used in the search process.
It is necessary to take into account the history of user interaction with the retrieval system and the search context in order to personalize the query results and order them according to the user's information needs. All these aspects were taken into account in the design and implementation of the semantic search engine "MAIPS", which is based on an ontological model of cooperation between users and resources on the Web.
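The ontology-based modification of user queries mentioned in this abstract can be illustrated with a minimal sketch. It is not the MAIPS implementation: the toy ontology, the `synonyms` and `narrower` keys, and the `expand_query` function are all assumptions made purely for illustration.

```python
# Toy ontology (hypothetical): each concept maps to synonyms and narrower terms.
ONTOLOGY = {
    "car": {"synonyms": ["automobile"], "narrower": ["sedan", "hatchback"]},
    "engine": {"synonyms": ["motor"], "narrower": []},
}


def expand_query(query, ontology=ONTOLOGY, include_narrower=True):
    """Return the query terms plus ontology-derived terms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        candidates = [term]
        entry = ontology.get(term)
        if entry:
            # Synonyms tend to raise completeness (recall) of the search.
            candidates += entry["synonyms"]
            if include_narrower:
                # Narrower terms widen the query further; the caller can turn
                # this off, which is one simple form of user-controlled transparency.
                candidates += entry["narrower"]
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

A real system would draw the related terms from a domain ontology chosen by the user rather than from a hard-coded dictionary, and would likely weight the added terms lower than the original ones.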

Author(s):  
Jon Atle Gulla ◽  
Hans Olaf Borch ◽  
Jon Espen Ingvaldsen

Due to the large amount of information on the web and the difficulties of relating users' expressed information needs to document content, large-scale web search engines tend to return thousands of ranked documents. This chapter discusses the use of clustering to help users navigate through the result sets and explore the domain. A newly developed system, HOBSearch, makes use of suffix tree clustering to overcome many of the weaknesses of traditional clustering approaches. Using result snippets rather than full documents, HOBSearch both speeds up clustering substantially and manages to tailor the clustering to the topics indicated in the user's query. An inherent problem with clustering, though, is the choice of cluster labels. Our experiments with HOBSearch show that cluster labels of an acceptable quality can be generated with no supervision or predefined structures and within the constraints given by large-scale web search.
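The core idea behind clustering snippets by shared phrases can be sketched as follows. This is a deliberately simplified stand-in, not HOBSearch's algorithm: it enumerates word n-grams directly instead of building a suffix tree, and it omits the step in which overlapping base clusters are merged. The function names and thresholds are assumptions.

```python
from collections import defaultdict


def phrases(snippet, max_len=3):
    """Yield every word n-gram (1..max_len words) found in a snippet."""
    words = snippet.lower().split()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def base_clusters(snippets, min_docs=2, min_words=2):
    """Map each shared phrase to the set of snippet indices that contain it."""
    index = defaultdict(set)
    for doc_id, snippet in enumerate(snippets):
        for phrase in phrases(snippet):
            index[phrase].add(doc_id)
    # Keep only multi-word phrases shared by several snippets; the phrase
    # itself then serves as a readable, unsupervised cluster label.
    return {
        phrase: docs
        for phrase, docs in index.items()
        if len(docs) >= min_docs and len(phrase.split()) >= min_words
    }
```

Because the label is the shared phrase itself, no predefined category structure is needed, which is the property the abstract highlights.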


Author(s):  
Abdelkrim Bouramoul

Users of Web search engines are generally confronted with numerous responses that are rarely structured, making it difficult to analyze the available results. Indeed, linear result lists ordered according to a relevance criterion, although still widely used, often seem endless. A solution to this problem is to improve the interfaces for better visualization of large numbers of results. In this paper, we propose the modeling and implementation of a tool for graphical visualization and manipulation of results returned by search engines. The goal is to facilitate the analysis, the interpretation and the supervision of users' information needs. The architecture of the 'Gravisor' tool is based on the Multi-Agent paradigm. It is composed of four agents working in full cooperation and coordination. We hope that, beyond the web information retrieval field, the three graphical visualization modes offered by the 'Gravisor' tool will be a promising alternative for better information visualization in other areas.


2011 ◽  
pp. 317-342 ◽  
Author(s):  
D. Beneventano

As the use of the World Wide Web has become increasingly widespread, the business of commercial search engines has become a vital and lucrative part of the Web. Search engines are commonplace tools for virtually every user of the Internet, and companies such as Google and Yahoo! have become household names. Semantic search engines try to augment and improve traditional Web search engines by using not just words, but concepts and logical relationships. In this chapter, a relevant class of semantic search engines, based on a peer-to-peer, data integration mediator-based architecture, is described. The architectural and functional features are presented with respect to two projects, SEWASIE and WISDOM, involving the authors. The methodology to create a two-level ontology and query processing in the SEWASIE project are fully described.


Author(s):  
Shahid Kamal ◽  
Jamal Abdul Nasir ◽  
Zia Uddin ◽  
Bakhtiar Khan

Web search engines are perhaps the most significant of all software systems. Indeed, the vast information needs of users could not be satisfied without a way to represent the data and provide the interface components. The fast growth of the information world and the penetration of the internet into all fields have increased the importance of the problem of information access. To communicate with a web search system, the user must express his or her needs in the form of queries. However, due to a lack of domain knowledge and the limitations of natural language, such as synonymy and polysemy, many users cannot formulate their needs as effective queries. In this chapter, the authors attempt to give a detailed description of the concept of ambiguity, of search queries, and of the disambiguation process with respect to different types. Disambiguating the search intent and improving the accuracy of the resulting information is a crucial issue in the domain of web search systems, especially when most users are unable to clearly express their information needs.


Author(s):  
R. Subhashini ◽  
V.Jawahar Senthil Kumar

The World Wide Web is a large distributed digital information space. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. Information Retrieval (IR) plays an important role in search engines. Today's most advanced engines use the keyword-based ("bag of words") paradigm, which has inherent disadvantages. Organizing web search results into clusters helps the user browse search results quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. This paper proposes an approach for clustering web search results based on a phrase-based clustering algorithm. It is an alternative to the single ordered result list of search engines: the approach presents a list of clusters to the user. Experimental results verify the method's feasibility and effectiveness.


Author(s):  
B. J. Jansen ◽  
A. Spink

People are now confronted with the task of locating electronic information needed to address the issues of their daily lives. The Web is presently the major information source for many people in the U.S. (Cole, Suman, Schramm, Lunn, & Aquino, 2003), used more than newspapers, magazines, and television as a source of information. Americans are expanding their use of the Web for all sorts of information and commercial purposes (Horrigan, 2004; Horrigan & Rainie, 2002; National Telecommunications and Information Administration, 2002). Searching for information is one of the most popular Web activities, second only to the use of e-mail (Nielsen Media, 1997). However, successfully locating needed information remains a difficult and challenging task (Eastman & Jansen, 2003). Locating relevant information affects not only individuals but also commercial, educational, and governmental organizations. This is especially true with regard to people interacting with their governmental agencies. Executive Order 13011 (Clinton, 1996) directed the U.S. federal government to move aggressively with strategies to utilize the Internet. Birdsell and Muzzio (1999) present the growing presence of governmental Web sites, classifying them into three general categories: (1) provision of information, (2) delivery of forms, and (3) transactions. In 2004, 29% of Americans said they visited a government Web site to contact some governmental entity, 18% sent an e-mail, and 22% used multiple means (Horrigan, 2004). It seems clear that the Web is a major conduit for accessing governmental information and possibly services. Search engines are the primary means for people to locate Web sites (Nielsen Media, 1997). Given the Web's importance, we need to understand how Web search engines perform (Lawrence & Giles, 1998) and how people use and interact with Web search engines to locate governmental information.
Examining Web searching for governmental information is an important area of research with the potential to increase our understanding of users of Web-based governmental information, advance our knowledge of Web searchers’ governmental information needs, and positively impact the design of Web search engines and sites that specialize in governmental information.


Author(s):  
Denis Shestakov

Finding information on the Web using a web search engine is one of the primary activities of today's web users. For a majority of users, the results returned by conventional search engines are an essentially complete set of links to all pages on the Web relevant to their queries. However, current-day search engines do not crawl and index a significant portion of the Web and, hence, web users relying on search engines only are unable to discover and access a large amount of information from the non-indexable part of the Web. Specifically, dynamic pages generated based on parameters provided by a user via web search forms are not indexed by search engines and cannot be found in search engines' results. Such search interfaces provide web users with online access to myriad databases on the Web. In order to obtain some information from a web database of interest, a user issues his/her query by specifying query terms in a search form and receives the query results, a set of dynamic pages that embed the required information from a database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale.
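To make concrete why form-generated pages are hard for crawlers to reach, the following minimal sketch shows how a filled-in search form turns into a request URL. The address `https://example.org/search` and the field names `title` and `year` are hypothetical; only `urllib.parse.urlencode` is a real standard-library call.

```python
from urllib.parse import urlencode


def build_form_query(action_url, fields):
    """Build the GET URL a browser would request when a search form is submitted."""
    return f"{action_url}?{urlencode(fields)}"


# Hypothetical form with two fields, filled in by a user.
url = build_form_query(
    "https://example.org/search",
    {"title": "semantic web", "year": "2017"},
)
print(url)
```

The resulting page exists only for this particular combination of field values. A crawler that follows hyperlinks never constructs such a URL unless it can work out both the field names and meaningful values for them, which is the difficulty the abstract describes.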


2018 ◽  
Vol 6 (3) ◽  
pp. 67-78
Author(s):  
Tian Nie ◽  
Yi Ding ◽  
Chen Zhao ◽  
Youchao Lin ◽  
Takehito Utsuro

The background of this article is the issue of how to obtain an overview of the knowledge related to a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The Web search information needs for a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify redundant search engine suggests based on a topic model. However, one limitation of the topic model based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. In order to overcome the problem of the coarse-grained classification of search engine suggests, this article further applies the word embedding technique to the webpages used during the training of the topic model, in addition to the text data of the whole Japanese version of Wikipedia. Then, the authors examine the word embedding based similarity between search engine suggests and further classify search engine suggests within a single topic into finer-grained subtopics based on the similarity of word embeddings. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.
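Grouping suggests into subtopics by embedding similarity can be sketched as below. The abstract does not specify the authors' grouping procedure, so the greedy threshold-based grouping here is an assumption, as are the function names, the threshold value, and the idea of representing each suggest by a single vector.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_by_similarity(vectors, threshold=0.8):
    """Greedily place each item in the first group whose seed vector is similar enough."""
    groups = []  # list of (seed_vector, [item names])
    for name, vec in vectors.items():
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            # No existing group is close enough: start a new subtopic.
            groups.append((vec, [name]))
    return [members for _, members in groups]
```

In practice the vectors would come from a trained embedding model, for example by averaging the word vectors of each suggest. Lowering the threshold yields fewer, coarser subtopics and raising it yields more, finer ones.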


2015 ◽  
Vol 39 (2) ◽  
pp. 197-213 ◽  
Author(s):  
Ahmet Uyar ◽  
Farouk Musa Aliyu

Purpose – The purpose of this paper is to better understand three main aspects of two semantic web search engines, Google Knowledge Graph and Bing Satori. The authors investigated the coverage of entity types, the extent of support for list search services and the capabilities of the natural language query interfaces. Design/methodology/approach – The authors manually submitted selected queries to these two semantic web search engines and evaluated the returned results. To test the coverage of entity types, the authors selected the entity types from the Freebase database. To test the capabilities of natural language query interfaces, the authors used a manually developed query data set about US geography. Findings – The results indicate that both semantic search engines cover only the very common entity types. In addition, the list search service is provided for a small percentage of entity types. Moreover, both search engines support queries with very limited complexity and with a limited set of recognised terms. Research limitations/implications – Both companies are continually working to improve their semantic web search engines. Therefore, the findings show their capabilities at the time of conducting this research. Practical implications – The results show that in the near future the authors can expect both semantic search engines to expand their entity databases and improve their natural language interfaces. Originality/value – As far as the authors know, this is the first study evaluating any aspect of newly developing semantic web search engines. It shows the current capabilities and limitations of these semantic web search engines. It provides directions to researchers by pointing out the main problems for semantic web search engines.

