Ontology-Based Metasearch Engine in Electronics Area

The goal of search engines is to return accurate and complete results. Satisfying concrete user information needs becomes more and more difficult because these needs cannot be specified completely and explicitly, and because of the shortcomings of keyword-based searching and indexing. General search engines have indexed millions of web resources and often return thousands of results for a user query (most of them inadequate). To increase the precision of results, users sometimes choose search engines specialized in a concrete domain, or personalized or semantic search. A great variety of specialized search engines can be found (and used) on the internet, but none can guarantee finding the resources that exist on the web and that a concrete user needs. In this paper we present our research on building a metasearch engine that uses domain and user profile ontologies, as well as information (or metadata) extracted directly from web sites, to improve search result quality. We state the main requirements for a search engine for students, PhD students and scientists, propose a conceptual model, and discuss approaches to its practical realization. Our prototype metasearch engine first performs interactive semantic query refinement and then, using the refined query, automatically generates several search queries, sends them to different digital libraries and web search engines, and augments and ranks the returned results using ontologically represented domain and user metadata. To test our model, we develop domain ontologies in the electronics domain. We will use ontological terminology representation to propose recommendations for query disambiguation and to supply knowledge for reranking the returned results. We also present some partial initial implementations of query disambiguation strategies and testing results.
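
The pipeline described in this abstract (generate several queries from a refined query, send them to several engines, then merge and rerank with domain knowledge) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy `ONTOLOGY` dictionary, the function names, the result fields (`url`, `title`, `snippet`), and the term-counting score are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical toy ontology (not from the paper): term -> related terms.
ONTOLOGY: Dict[str, List[str]] = {
    "amplifier": ["operational amplifier", "gain"],
    "diode": ["rectifier", "semiconductor"],
}

# An "engine" is any callable that maps a query string to a list of results.
Engine = Callable[[str], List[dict]]


def expand_query(query: str, ontology: Dict[str, List[str]]) -> List[str]:
    """Return the original query plus one variant per related ontology term."""
    queries = [query]
    for term, related in ontology.items():
        if term in query.lower():
            queries.extend(f"{query} {r}" for r in related)
    return queries


def rerank(results: List[dict], domain_terms: List[str]) -> List[dict]:
    """Order results by the number of domain terms found in title and snippet."""
    def score(result: dict) -> int:
        text = (result["title"] + " " + result["snippet"]).lower()
        return sum(term in text for term in domain_terms)

    # sorted() is stable, so ties keep their original merge order.
    return sorted(results, key=score, reverse=True)


def metasearch(query: str, engines: List[Engine],
               ontology: Dict[str, List[str]]) -> List[dict]:
    """Send every query variant to every engine, deduplicate by URL, rerank."""
    seen_urls = set()
    merged = []
    for variant in expand_query(query, ontology):
        for engine in engines:
            for result in engine(variant):
                if result["url"] not in seen_urls:
                    seen_urls.add(result["url"])
                    merged.append(result)
    # Domain terms: ontology terms matched in the query plus their relatives.
    domain_terms = []
    for term, related in ontology.items():
        if term in query.lower():
            domain_terms.extend([term, *related])
    return rerank(merged, domain_terms)
```

In a real system the engines would wrap calls to digital libraries or web search services, and the score would draw on the user profile ontology as well; the sketch only shows the control flow.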

Classification of means and methods of the Web semantic retrieval

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2017.01.030 ◽

2017 ◽

pp. 030-050

Author(s):

J.V. Rogushina ◽

Keyword(s):

Search Engines ◽

Domain Knowledge ◽

Information Needs ◽

Web Search ◽

User Interaction ◽

Query Languages ◽

Semantic Search ◽

Semantic Retrieval ◽

The Web

Problems associated with the improvement of information retrieval in an open environment are considered, and the need for its semantization is substantiated. The current state and prospects of development of semantic search engines that are focused on processing Web information resources are analysed, and the criteria for the classification of such systems are reviewed. In this analysis, significant attention is paid to the use in semantic search of ontologies that contain knowledge about the subject area and the search users. The sources of ontological knowledge and methods of their processing for the improvement of the search procedures are considered. Examples of semantic search systems that use structured query languages (e.g., SPARQL), lists of keywords, and queries in natural language are given. Classification criteria for semantic search engines such as architecture, coupling, transparency, user context, modification of requests, ontology structure, etc. are considered. Different ways of supporting semantic and ontology-based modification of user queries that improve the completeness and accuracy of the search are analyzed. Based on an analysis of the properties of existing semantic search engines in terms of these criteria, areas for further improvement of these systems are selected: the development of metasearch systems, semantic modification of user requests, the determination of a user-acceptable transparency level of the search procedures, flexibility of domain knowledge management tools, and increased productivity and scalability. In addition, the development of means of semantic Web search requires the use of an external knowledge base that contains knowledge about the domain of user information needs, and requires giving users the ability to independently select the knowledge that is used in the search process.
It is necessary to take into account the history of user interaction with the retrieval system and the search context in order to personalize the query results and order them in accordance with the user information needs. All these aspects were taken into account in the design and implementation of the semantic search engine "MAIPS", which is based on an ontological model of the interaction of users and resources in the Web.

The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines

Social Science Computer Review ◽

10.1177/08944393211006863 ◽

2021 ◽

pp. 089443932110068

Author(s):

Aleksandra Urman ◽

Mykola Makhortykh ◽

Roberto Ulloa

Keyword(s):

Search Engine ◽

Search Engines ◽

Large Scale ◽

Web Search ◽

Primary Elections ◽

Virtual Agents ◽

Search Results ◽

Presidential Primary ◽

Large Scale Analysis ◽

Algorithmic Information

We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default—that is nonpersonalized—conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for “us elections,” “donald trump,” “joe biden,” “bernie sanders” queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. It highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.
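
The "discrepancies within the results generated for different agents" can be quantified in several ways; the abstract does not say which metric the authors use. As one hedged illustration, the sketch below computes the mean pairwise Jaccard index over the result lists that different agents received for the same query. The function names and the choice of Jaccard (which ignores ranking order) are assumptions made here.

```python
from itertools import combinations
from typing import Dict, Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard index of two result sets: |A and B| / |A or B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_jaccard(results_by_agent: Dict[str, List[str]]) -> float:
    """Average Jaccard index over all pairs of agents for one query."""
    pairs = list(combinations(results_by_agent.values(), 2))
    if not pairs:
        raise ValueError("need results from at least two agents")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value of 1.0 means every agent saw the same set of URLs; lower values mean that what a user sees depends more on which agent (that is, which session) issued the query.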

Associating Searching on Search Engines to Subsequent Searching on Sites

International Journal of Information Systems in the Service Sector ◽

10.4018/ijisss.2016040103 ◽

2016 ◽

Vol 8 (2) ◽

pp. 30-43

Author(s):

Adan Ortiz-Cordova ◽

Bernard J. Jansen

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Search ◽

Research Study ◽

Search Queries ◽

Web Search Engine ◽

Search Patterns ◽

Search Information

In this research study, the authors investigate the association between external searching, which is searching on a web search engine, and internal searching, which is searching on a website. They classify 295,571 external-internal searches, where each search is composed of a search engine query that is submitted to a web search engine and then one or more subsequent queries submitted to a commercial website by the same user. The authors examine 891,453 queries from all searches, of which 295,571 were external search queries and 595,882 were internal search queries. They algorithmically classify all queries into states, then cluster the searching episodes into major searching configurations and identify the most commonly occurring search patterns for external, internal, and external-to-internal searching episodes. The research implications of this study are that external sessions and internal sessions must be considered as part of a continuous search episode and that online businesses can leverage external search information to more effectively target potential consumers.

An Intelligent Web Search Using Multi-Document Summarization

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2016040103 ◽

2016 ◽

Vol 6 (2) ◽

pp. 41-65 ◽

Cited By ~ 2

Author(s):

Sheetal A. Takale ◽

Prakash J. Kulkarni ◽

Sahil K. Shah

Keyword(s):

Search Engine ◽

Web Search ◽

Document Clustering ◽

The Internet ◽

Web Pages ◽

Extractive Summarization ◽

Text Understanding ◽

User Query ◽

Sentence Clustering

Information available on the internet is huge, diverse and dynamic. Current search engines act as intelligent assistants to the users of the internet. For a query, a search engine provides a listing of the best-matching or relevant web pages. However, information for the query is often spread across multiple pages which are returned by the search engine. This degrades the quality of search results. So, the search engines are drowning in information, but starving for knowledge. Here, we present a query-focused extractive summarization of search engine results. We propose a two-level summarization process: identification of relevant theme clusters, and selection of top-ranking sentences to form a summarized result for the user query. A new approach to semantic similarity computation using semantic roles and semantic meaning is proposed. Document clustering is achieved by applying the MDL principle, and sentence clustering and ranking are done using SNMF. The experiments conducted demonstrate the effectiveness of the system in semantic text understanding, document clustering and summarization.

Web Search Engine Architectures and their Performance Analysis

Handbook of Research on Web Information Systems Quality ◽

10.4018/978-1-59904-847-5.ch028 ◽

2011 ◽

pp. 491-509

Author(s):

Xiannong Meng

Keyword(s):

Performance Analysis ◽

Search Engine ◽

Search Engines ◽

Web Search ◽

General Purpose ◽

Performance Measurements ◽

Web Documents ◽

System Architectures ◽

Web Search Engine ◽

And Performance

This chapter surveys various technologies involved in a Web search engine with an emphasis on performance analysis issues. The aspects of a general-purpose search engine covered in this survey include system architectures, information retrieval theories as the basis of Web search, indexing and ranking of Web documents, relevance feedback and machine learning, personalization, and performance measurements. The objectives of the chapter are to review the theories and technologies pertaining to Web search, and help us understand how Web search engines work and how to use the search engines more effectively and efficiently.

How People Search for Governmental Information on the Web

Encyclopedia of Digital Government ◽

10.4018/978-1-59140-789-8.ch140 ◽

2011 ◽

pp. 933-939

Author(s):

B. J. Jansen ◽

A. Spink

Keyword(s):

Web Sites ◽

Search Engines ◽

Web Search ◽

Daily Lives ◽

Governmental Organizations ◽

Web Search Engines ◽

People Search ◽

E Mail ◽

The U.S ◽

The Web

People are now confronted with the task of locating electronic information needed to address the issues of their daily lives. The Web is presently the major information source for many people in the U.S. (Cole, Suman, Schramm, Lunn, & Aquino, 2003), used more than newspapers, magazines, and television as a source of information. Americans are expanding their use of the Web for all sorts of information and commercial purposes (Horrigan, 2004; Horrigan & Rainie, 2002; National Telecommunications and Information Administration, 2002). Searching for information is one of the most popular Web activities, second only to the use of e-mail (Nielsen Media, 1997). However, successfully locating needed information remains a difficult and challenging task (Eastman & Jansen, 2003). Locating relevant information affects not only individuals but also commercial, educational, and governmental organizations. This is especially true with regard to people interacting with their governmental agencies. Executive Order 13011 (Clinton, 1996) directed the U.S. federal government to move aggressively with strategies to utilize the Internet. Birdsell and Muzzio (1999) present the growing presence of governmental Web sites, classifying them into three general categories: (1) provision of information, (2) delivery of forms, and (3) transactions. In 2004, 29% of Americans said they visited a government Web site to contact some governmental entity, 18% sent an e-mail, and 22% used multiple means (Horrigan, 2004). It seems clear that the Web is a major conduit for accessing governmental information and possibly services. Search engines are the primary means for people to locate Web sites (Nielsen Media, 1997). Given the Web’s importance, we need to understand how Web search engines perform (Lawrence & Giles, 1998) and how people use and interact with Web search engines to locate governmental information.
Examining Web searching for governmental information is an important area of research with the potential to increase our understanding of users of Web-based governmental information, advance our knowledge of Web searchers’ governmental information needs, and positively impact the design of Web search engines and sites that specialize in governmental information.

A Method of Subtopic Classification of Search Engine Suggests by Integrating a Topic Model and Word Embeddings

International Journal of Software Innovation ◽

10.4018/ijsi.2018070105 ◽

2018 ◽

Vol 6 (3) ◽

pp. 67-78

Author(s):

Tian Nie ◽

Yi Ding ◽

Chen Zhao ◽

Youchao Lin ◽

Takehito Utsuro

Keyword(s):

Search Engine ◽

Information Needs ◽

Web Search ◽

Topic Model ◽

Japanese Version ◽

Word Embedding ◽

Coarse Grained ◽

Web Pages ◽

Word Embeddings

This article addresses the issue of how to obtain an overview of the knowledge related to a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with a given query keyword. The Web search information needs for a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify redundant search engine suggests based on a topic model. However, one limitation of the topic model based classification of search engine suggests is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. In order to overcome the problem of the coarse-grained classification of search engine suggests, this article further applies the word embedding technique to the webpages used during the training of the topic model, in addition to the text data of the whole Japanese version of Wikipedia. Then, the authors examine the word embedding based similarity between search engine suggests and further classify search engine suggests within a single topic into finer-grained subtopics based on the similarity of word embeddings. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.
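
The second stage described here, splitting the suggests inside one topic into finer subtopics by embedding similarity, can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the embedding vectors are supplied as plain lists (in practice they would come from a trained word embedding model), the greedy seed-based grouping and the `threshold` value are hypothetical choices, and the function names are invented here.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_subtopics(vectors: Dict[str, Sequence[float]],
                    threshold: float = 0.8) -> List[List[str]]:
    """Greedily group suggests: each suggest joins the first group whose
    seed (first member) is at least `threshold` similar, else starts a group."""
    groups: List[List[str]] = []
    for suggest, vec in vectors.items():
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(suggest)
                break
        else:
            groups.append([suggest])
    return groups
```

The greedy scheme depends on input order and on the threshold; a hierarchical or centroid-based clustering would be a reasonable alternative when that sensitivity matters.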

WEBCONTENT VISUALIZER: A VISUALIZATION SYSTEM FOR SEARCH ENGINES IN SEMANTIC WEB

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622011004646 ◽

2011 ◽

Vol 10 (05) ◽

pp. 913-931 ◽

Cited By ~ 2

Author(s):

XIANYONG FANG ◽

CHRISTIAN JACQUEMIN ◽

FRÉDÉRIC VERNIER

Keyword(s):

Semantic Web ◽

Search Engine ◽

Search Engines ◽

Web Search ◽

Search Query ◽

Visualization System ◽

Xml Documents ◽

Web Search Engines ◽

New Generation

Since the results from Semantic Web search engines are highly structured XML documents, they cannot be efficiently visualized with traditional explorers. Therefore, the Semantic Web calls for a new generation of search query visualizers that can rely on document metadata. This paper introduces such a visualization system called WebContent Visualizer that is used to display and browse search engine results. The visualization is organized into three levels: (1) Carousels contain documents with the same ranking, (2) carousels are piled into stacks, one for each date, and (3) these stacks are organized along a meta-carousel to display the results for several dates. Carousel stacks are piles of local carousels with increasing radii to visualize the ranks of classes. For document comparison, colored links connect documents between neighboring classes on the basis of shared entities. Based on these techniques, the interface is made of three collaborative components: an inspector window, a visualization panel, and a detailed dialog component. With this architecture, the system is intended to offer an efficient way to explore the results returned by Semantic Web search engines.
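
The three-level organization (documents with the same ranking in a carousel, carousels stacked per date, stacks arranged along a meta-carousel) maps naturally onto a nested grouping. The sketch below shows only that data arrangement, under the assumption that each result carries `id`, `date`, and `rank` fields; those field names and the function name are hypothetical, and no rendering is attempted.

```python
from collections import defaultdict
from typing import Dict, List


def build_meta_carousel(docs: List[dict]) -> Dict[str, Dict[int, List[str]]]:
    """Group document ids as date -> rank -> ids.

    Outer keys play the role of stacks (one per date), inner keys the role
    of carousels (one per rank); both are returned in sorted order.
    """
    stacks = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        stacks[doc["date"]][doc["rank"]].append(doc["id"])
    return {
        date: dict(sorted(by_rank.items()))
        for date, by_rank in sorted(stacks.items())
    }
```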

Rethinking gaming: The ethical work of optimization in web search engines

Social Studies of Science ◽

10.1177/0306312719865607 ◽

2019 ◽

Vol 49 (5) ◽

pp. 707-731 ◽

Cited By ~ 8

Author(s):

Malte Ziewitz

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Search ◽

The United Kingdom ◽

Ethical Work ◽

Web Search Engines ◽

The One ◽

A Site ◽

Shifting Boundaries ◽

Precarious Situation

When measures come to matter, those measured find themselves in a precarious situation. On the one hand, they have a strong incentive to respond to measurement so as to score a favourable rating. On the other hand, too much of an adjustment runs the risk of being flagged and penalized by system operators as an attempt to ‘game the system’. Measures, the story goes, are most useful when they depict those measured as they usually are and not how they intend to be. In this article, I explore the practices and politics of optimization in the case of web search engines. Drawing on materials from ethnographic fieldwork with search engine optimization (SEO) consultants in the United Kingdom, I show how maximizing a website’s visibility in search results involves navigating the shifting boundaries between ‘good’ and ‘bad’ optimization. Specifically, I am interested in the ethical work performed as SEO consultants artfully arrange themselves to cope with moral ambiguities provoked and delegated by the operators of the search engine. Building on studies of ethics as a practical accomplishment, I suggest that the ethicality of optimization has itself become a site of governance and contestation. Studying such practices of ‘being ethical’ not only offers opportunities for rethinking popular tropes like ‘gaming the system’, but also draws attention to often-overlooked struggles for authority at the margins of contemporary ranking schemes.

The Externalities of Search 2.0: The Emerging Privacy Threats when the Drive for the Perfect Search Engine meets Web 2.0

First Monday ◽

10.5210/fm.v13i3.2136 ◽

2008 ◽

Cited By ~ 31

Author(s):

Michael Zimmer

Keyword(s):

Web 2.0 ◽

Search Engine ◽

Search Engines ◽

Information Seeking ◽

Web Search ◽

Personal Information ◽

Informational Privacy ◽

Capture Process ◽

Privacy Threats ◽

Intellectual Activities

Web search engines have emerged as a ubiquitous and vital tool for the successful navigation of the growing online informational sphere. As Google puts it, the goal is to "organize the world's information and make it universally accessible and useful" and to create the "perfect search engine" that provides only intuitive, personalized, and relevant results. Meanwhile, the so-called Web 2.0 phenomenon has blossomed based, largely, on the faith in the power of the networked masses to capture, process, and mashup one's personal information flows in order to make them more useful, social, and meaningful. The (inevitable) combining of Google's suite of information-seeking products with Web 2.0 infrastructures -- what I call Search 2.0 -- intends to capture the best of both technical systems for the touted benefit of users. By capturing the information flowing across Web 2.0, search engines can better predict users' needs and wants, and deliver more relevant and meaningful results. While Search 2.0 is intended to enhance mobility in the online sphere, this paper argues that the drive for it necessarily requires the widespread monitoring and aggregation of users' online personal and intellectual activities, bringing with it particular externalities, such as threats to informational privacy while online.
