KnoE: A Web Mining Tool to Validate Previously Discovered Semantic Correspondences

2017
Author(s):
Jorge Martinez-Gil
José F. Aldana-Montes

The problem of matching schemas or ontologies consists of providing corresponding entities in two or more knowledge models that belong to the same domain but have been developed separately. Nowadays there are many techniques and tools for addressing this problem; however, the complex nature of the matching problem makes existing solutions not fully satisfactory for real-world situations. The Google Similarity Distance has appeared recently. Its purpose is to mine knowledge from the Web using the Google search engine in order to semantically compare text expressions. Our work consists of developing a software application for validating results discovered by schema and ontology matching tools using the philosophy behind this distance. Moreover, we are interested in using not only Google but also other popular search engines with this similarity distance. The results reveal three main facts. Firstly, some web search engines can help us to validate semantic correspondences satisfactorily. Secondly, there are significant differences among the web search engines. And thirdly, the best results are obtained when using combinations of the web search engines that we have studied.
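The distance in question is usually formalized as the Normalized Google Distance (NGD) of Cilibrasi and Vitányi, computed from search-engine hit counts. A minimal sketch of the formula follows; the hit counts and index-size estimate in the example are hypothetical, and any engine's real counts are only rough estimates.

```python
import math

def ngd(hits_x, hits_y, hits_xy, total_pages):
    """Normalized Google Distance from raw hit counts.

    hits_x, hits_y  -- pages returned for each term alone
    hits_xy         -- pages returned for both terms together
    total_pages     -- estimate of the total number of indexed pages
    """
    if hits_x == 0 or hits_y == 0 or hits_xy == 0:
        return float("inf")  # terms never co-occur in the index
    log_x, log_y, log_xy = math.log(hits_x), math.log(hits_y), math.log(hits_xy)
    log_n = math.log(total_pages)
    return (max(log_x, log_y) - log_xy) / (log_n - min(log_x, log_y))

# Hypothetical hit counts: a lower NGD means the terms are more related.
print(ngd(hits_x=9_000_000, hits_y=7_500_000, hits_xy=2_000_000,
          total_pages=50_000_000_000))
```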

Author(s):  
B. J. Jansen
A. Spink

People are now confronted with the task of locating electronic information needed to address the issues of their daily lives. The Web is presently the major information source for many people in the U.S. (Cole, Suman, Schramm, Lunn, & Aquino, 2003), used more than newspapers, magazines, and television as a source of information. Americans are expanding their use of the Web for all sorts of information and commercial purposes (Horrigan, 2004; Horrigan & Rainie, 2002; National Telecommunications and Information Administration, 2002). Searching for information is one of the most popular Web activities, second only to the use of e-mail (Nielsen Media, 1997). However, successfully locating needed information remains a difficult and challenging task (Eastman & Jansen, 2003). Locating relevant information affects not only individuals but also commercial, educational, and governmental organizations. This is especially true with regard to people interacting with their governmental agencies. Executive Order 13011 (Clinton, 1996) directed the U.S. federal government to move aggressively on strategies to utilize the Internet. Birdsell and Muzzio (1999) document the growing presence of governmental Web sites, classifying them into three general categories: (1) provision of information, (2) delivery of forms, and (3) transactions. In 2004, 29% of Americans said they had visited a government Web site to contact some governmental entity, 18% had sent an e-mail, and 22% had used multiple means (Horrigan, 2004). It seems clear that the Web is a major conduit for accessing governmental information and perhaps services as well. Search engines are the primary means for people to locate Web sites (Nielsen Media, 1997). Given the Web's importance, we need to understand how Web search engines perform (Lawrence & Giles, 1998) and how people use and interact with Web search engines to locate governmental information. Examining Web searching for governmental information is an important area of research with the potential to increase our understanding of users of Web-based governmental information, advance our knowledge of Web searchers' governmental information needs, and positively impact the design of Web search engines and sites that specialize in governmental information.


Author(s):  
Ali Shiri
Lydia Zvyagintseva

The purpose of this study is to examine the performance of dynamic query suggestion in three popular web search engines, namely Google, Yahoo! and Bing. Using the TREC Web Track topics, this study conducts a comparative examination of the number, type and variations in the query term suggestions provided by the Web search engines.
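Such dynamic suggestions can be collected programmatically. Below is a minimal sketch using Google's unofficial suggest endpoint; the URL, query parameter, and response shape are assumptions based on commonly observed behavior and may change without notice.

```python
import json
import urllib.parse
import urllib.request

def google_suggestions(query):
    """Fetch dynamic query suggestions from Google's unofficial,
    undocumented suggest endpoint (an assumption, not a stable API)."""
    url = ("https://suggestqueries.google.com/complete/search"
           "?client=firefox&q=" + urllib.parse.quote(query))
    with urllib.request.urlopen(url) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload[1]  # second element is assumed to hold the suggestion list

for suggestion in google_suggestions("web search eng"):
    print(suggestion)
```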


Author(s):  
Jon Atle Gulla
Hans Olaf Borch
Jon Espen Ingvaldsen

Due to the large amount of information on the web and the difficulty of relating users' expressed information needs to document content, large-scale web search engines tend to return thousands of ranked documents. This chapter discusses the use of clustering to help users navigate through the result sets and explore the domain. A newly developed system, HOBSearch, makes use of suffix tree clustering to overcome many of the weaknesses of traditional clustering approaches. Using result snippets rather than full documents, HOBSearch both speeds up clustering substantially and manages to tailor the clustering to the topics indicated in the user's query. An inherent problem with clustering, though, is the choice of cluster labels. Our experiments with HOBSearch show that cluster labels of acceptable quality can be generated without supervision or predefined structures, and within the constraints imposed by large-scale web search.
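The base clusters in suffix tree clustering are sets of snippets that share a phrase, and the shared phrase doubles as a natural cluster label. A simplified phrase-based approximation of that idea (a sketch, not the HOBSearch implementation) might look like this:

```python
from collections import defaultdict

def phrase_clusters(snippets, min_len=2, max_len=4, min_docs=2):
    """Group result snippets by shared word phrases.

    Every phrase of min_len..max_len words occurring in at least
    min_docs snippets becomes a base cluster; the phrase itself
    serves as the cluster label.
    """
    clusters = defaultdict(set)
    for doc_id, snippet in enumerate(snippets):
        words = snippet.lower().split()
        for n in range(min_len, max_len + 1):
            for i in range(len(words) - n + 1):
                clusters[" ".join(words[i:i + n])].add(doc_id)
    return {label: docs for label, docs in clusters.items()
            if len(docs) >= min_docs}

snips = ["suffix tree clustering of snippets",
         "clustering of snippets is fast",
         "suffix tree clustering scales well"]
for label, docs in sorted(phrase_clusters(snips).items()):
    print(label, "->", sorted(docs))
```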


Author(s):  
Rahul Pradhan
Dilip Kumar Sharma

Users issuing a query on a search engine expect the results to be relevant to the query topic rather than merely a textual match with the query terms. Studies conducted by several researchers show that users want the search engine to understand the implicit intent of a query rather than looking for textual matches in the hypertext structure of a document or web page. In this paper, the authors address queries that have a temporal intent and help web search engines classify them into certain categories. These classes or categories help a search engine understand and cater to the need behind a query. The authors consider temporal expressions (e.g. 1943) in documents and categorize queries on the basis of their temporal boundaries. Their experiment classifies queries and suggests a further course of action for search engines. Results show that classifying queries into these classes helps users reach the information they seek faster.
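A toy illustration of this kind of classification: extract explicit year expressions with a regular expression and bucket the query by where those years fall relative to the present. The category names and thresholds below are illustrative assumptions, not the taxonomy defined in the paper.

```python
import re
from datetime import date

YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")  # years 1000-2099

def temporal_class(query, recent_window=2):
    """Classify a query's temporal intent from explicit year mentions."""
    years = [int(y) for y in YEAR.findall(query)]
    if not years:
        return "atemporal"
    now = date.today().year
    if all(y < now - recent_window for y in years):
        return "past"
    if all(y > now for y in years):
        return "future"
    return "recent"

for q in ("battle of stalingrad 1943", "olympics 2032", "weather today"):
    print(q, "->", temporal_class(q))
```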


Author(s):  
Shanfeng Zhu
Xiaotie Deng
Qizhi Fang
Weimin Zhang

Web search engines are one of the most popular services for helping users find useful information on the Web. Although many studies have been carried out to estimate the size and overlap of general web search engines, these may not benefit ordinary web search users, who care more about the overlap of the top N (N = 10, 20 or 50) search results on concrete queries than about the overlap of the total index databases. In this study, we present experimental results on the overlap of the top N (N = 10, 20 or 50) search results from AlltheWeb, Google, AltaVista and WiseNut for the 58 most popular queries, as well as on the distance between the overlapping results. These 58 queries were chosen from the WordTracker service, which records the most popular queries submitted to some famous metasearch engines, such as MetaCrawler and Dogpile. We divide these 58 queries into three categories for further investigation. Through this in-depth study, we observe a number of interesting results: the overlap of the top N results retrieved by different search engines is very small; the search results of the queries in different categories behave in dramatically different ways; Google, on average, has the highest overlap among these four search engines; and each search engine tends to adopt a different ranking algorithm independently.
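Top-N overlap of this kind can be measured directly. A minimal sketch, assuming each engine's results have been reduced to ordered lists of URLs (the lists below are hypothetical):

```python
def top_n_overlap(results_a, results_b, n=10):
    """Overlap of the top n results (by URL) from two engines:
    returns the intersection size and the Jaccard ratio."""
    a, b = set(results_a[:n]), set(results_b[:n])
    inter = a & b
    return len(inter), len(inter) / len(a | b)

# Hypothetical top results for the same query on two engines.
engine_a = ["u1", "u2", "u3", "u4", "u5"]
engine_b = ["u3", "u9", "u1", "u8", "u7"]
print(top_n_overlap(engine_a, engine_b, n=5))  # (2, 0.25)
```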


Author(s):  
Rajeev Gupta
Virender Singh

Purpose: With the popularity and remarkable usage of digital images in various domains, existing image retrieval techniques need to be enhanced. Content-based image retrieval (CBIR) plays a vital role in retrieving requested data from databases available in cyberspace, and CBIR from cyberspace is a popular and interesting research area nowadays. Accurately searching for and downloading requested images from cyberspace based on metadata using CBIR techniques is a challenging task. The purpose of this study is to explore various image retrieval techniques for retrieving data available in cyberspace.
Methodology: Whenever a user wishes to retrieve an image from the web using present search engines, a bunch of images is retrieved based on the user query, but most of the resultant images are unrelated to it. Here, users enter a text-based query into a web-based search engine, and we compute the related images and the retrieval time.
Main Findings: This study compares the accuracy and retrieval time of requested images. After detailed analysis, the main finding is that none of the web search engines used, viz. Flickr, Pixabay, Shutterstock, Bing and Everypixel, retrieved accurately related images based on the entered query.
Implications: This study discusses and performs a comparative analysis of various content-based image retrieval techniques for cyberspace.
Novelty of Study: The research community has been making efforts towards efficient retrieval of useful images from the web, but this problem has not been solved and still prevails as an open research challenge. This study makes some efforts to resolve this challenge and performs a comparative analysis of the outcomes of various web search engines.
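One classic content-based technique of the kind surveyed here compares coarse color histograms of images. The following sketch assumes Pillow is installed and uses hypothetical file paths; it illustrates the general CBIR idea of ranking candidates by visual similarity, not the internals of any of the engines studied.

```python
from PIL import Image  # assumes Pillow: pip install pillow

def color_histogram(path, bins_per_channel=8):
    """Coarse RGB color histogram, normalized to sum to 1."""
    img = Image.open(path).convert("RGB").resize((128, 128))
    hist = [0.0] * (bins_per_channel ** 3)
    step = 256 // bins_per_channel
    for r, g, b in img.getdata():
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel + (b // step))
        hist[idx] += 1
    total = sum(hist)
    return [h / total for h in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Rank candidate images against a query image (paths are hypothetical).
query = color_histogram("query.jpg")
for path in ["img1.jpg", "img2.jpg"]:
    print(path, histogram_intersection(query, color_histogram(path)))
```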


2013
Vol 23 (4)
pp. 823-837
Author(s):  
Ichiro Hofuku
Kunio Oshima

In web search engines such as Google, the ranking of a particular keyword is determined by mathematical tools, e.g., PageRank or HITS. However, as the size of the network increases, it becomes increasingly difficult to use keyword ranking to quickly find the information required by an individual user. One reason for this phenomenon is the interference of superfluous information with the link structure. The World Wide Web can be expressed as an enormous directed graph. The purpose of the present study is to provide tools for studying the web as a directed graph in order to find clues to solving the problem of interference from superfluous information, and to reform the directed graph to clarify the relationships between the nodes.
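As a concrete illustration of these ranking tools, here is a compact power-iteration PageRank over a directed graph given as an adjacency dictionary. The graph and damping factor are illustrative; this is a textbook sketch, not Google's system or the authors' tooling.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on {node: [outgoing links]}."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n, out in graph.items():
            if not out:  # dangling node: spread its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m in out:
                    new[m] += damping * rank[n] / len(out)
        rank = new
    return rank

# A tiny hypothetical web as a directed graph.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for node, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```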


2013
Vol 10 (9)
pp. 1969-1976
Author(s):  
Sathya Bama
M.S.Irfan Ahmed
A. Saravanan

The internet is growing continuously, and with it the need for improving the quality of its services. Web mining is a research area that applies data mining techniques to address this need. With billions of pages on the web, it is a very intricate task for search engines to provide relevant information to users. Web structure mining plays a vital role by ranking web pages based on the user query, which is the most essential endeavour of web search engines. PageRank, Weighted PageRank and HITS are the commonly used algorithms in web structure mining for ranking web pages. However, all of these algorithms treat all links equally when distributing initial rank scores. In this paper, an improved PageRank algorithm is introduced. The results show that the algorithm performs better than the PageRank algorithm.
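For contrast with the even rank splitting just described, here is a minimal sketch of the Weighted PageRank baseline in the style of Xing and Ghorbani, where a page's rank is distributed to its out-links in proportion to their in- and out-degree weights. This sketches the baseline the paper improves on, not the authors' improved algorithm, and the example graph is hypothetical.

```python
def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank: rank flows to out-links weighted by their
    in-degree and out-degree shares rather than being split evenly."""
    nodes = list(graph)
    in_deg = {n: 0 for n in nodes}
    for out in graph.values():
        for m in out:
            in_deg[m] += 1
    out_deg = {n: len(graph[n]) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) for n in nodes}
        for n, out in graph.items():
            if not out:
                continue
            w_in_total = sum(in_deg[m] for m in out) or 1
            w_out_total = sum(out_deg[m] for m in out) or 1
            for m in out:
                w = (in_deg[m] / w_in_total) * (out_deg[m] / w_out_total)
                new[m] += damping * rank[n] * w
        rank = new
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a", "c"]}
print(weighted_pagerank(web))
```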

