RDF Keyword Search by Query Computation

2018 ◽  
Vol 29 (4) ◽  
pp. 1-27 ◽  
Author(s):  
Zongmin Ma ◽  
Xiaoqing Lin ◽  
Li Yan ◽  
Zhen Zhao

Keyword search based on keywords-to-SPARQL translation is attracting growing attention because of the increasing number of mature SPARQL search engines. Existing keywords-to-SPARQL approaches, however, return incomplete or incorrect answers because they lack complete underlying schema information. To overcome these difficulties, this article proposes a new keyword search paradigm that translates keyword queries into SPARQL queries for exploring RDF data. An inter-entity relationship summary with complete schema information is distilled from the RDF data graph and used to compose SPARQL queries. To avoid potentially wasteful expansion of the summary graph, we develop a new search prioritization scheme that combines the degree of a vertex with its distance from the originating keyword element. Starting from this priority list, built in advance, we apply a forward path index to find the top-k subgraphs relevant to the conjunction of the entered keywords more quickly. The experimental results show that our approach is efficient and scalable.
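
As a rough illustration of the prioritization idea described in this abstract, the sketch below orders summary-graph vertices for expansion by combining vertex degree with distance from a keyword element. The toy graph, vertex names, and weighting factor alpha are hypothetical assumptions, not structures taken from the paper.

```python
import heapq

# Hypothetical sketch of a degree-plus-distance search prioritization over a
# summary graph; the graph, vertex names, and alpha are illustrative only.
def prioritized_expansion(adj, keyword_vertices, alpha=0.5, max_hops=3):
    """Order summary-graph vertices for expansion, preferring vertices that
    are close to a keyword element and have a high degree."""
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    heap, visited, order = [], set(), []
    for kv in keyword_vertices:
        # smaller key = higher priority: distance 0 minus a degree bonus
        heapq.heappush(heap, (-alpha * degree.get(kv, 0), 0, kv))
    while heap:
        _, dist, v = heapq.heappop(heap)
        if v in visited or dist > max_hops:
            continue
        visited.add(v)
        order.append(v)
        for u in adj.get(v, []):
            if u not in visited:
                heapq.heappush(heap, (dist + 1 - alpha * degree.get(u, 0), dist + 1, u))
    return order

# toy summary graph of entity classes
adj = {
    "Author": ["Paper"],
    "Paper": ["Author", "Venue", "Topic"],
    "Venue": ["Paper"],
    "Topic": ["Paper"],
}
print(prioritized_expansion(adj, ["Author", "Topic"]))
```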

Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1861-1873 ◽  
Author(s):  
Xiaoqing Lin ◽  
Fu Zhang ◽  
Danling Wang ◽  
Jingwei Cheng

Since SPARQL is the standard language for querying RDF data, keyword search based on keywords-to-SPARQL translation is attracting increasing attention. However, existing approaches of this kind have two limitations: the schema used for the translation is incomplete, so wrong or incomplete answers are returned, and the advantages of indexes are not fully exploited. To address these issues, we construct an inter-entity relationship summary (ER-summary) by distilling all inter-entity relationships from the RDF data graph. On the ER-summary, we draw a circle of a given radius r around each vertex and, within these circles, build the shortest property path index (SP-index), the shortest distance index (SD-index), and the r-neighborhoods index using a dynamic programming algorithm. Rather than directly searching for top-k subgraphs connecting all the keywords, as most existing methods do, we use these indexes to translate keyword queries into SPARQL queries, trading space for time. Extensive experiments show that our approach is efficient and effective.
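
The following sketch shows one way r-bounded distance, path, and neighborhood indexes of this kind could be built with a breadth-first expansion; the dictionary layout and function name are assumptions made for illustration, not the authors' implementation.

```python
from collections import deque

# Minimal sketch of r-bounded index construction over a summary graph; the
# concrete index representation below is an illustrative assumption.
def build_indexes(adj, r):
    sd_index = {}       # (src, dst) -> shortest distance within radius r
    sp_index = {}       # (src, dst) -> one shortest property path within r
    neighborhoods = {}  # src -> set of vertices reachable within r hops
    for src in adj:
        dist, path = {src: 0}, {src: [src]}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            if dist[v] == r:
                continue
            for u in adj.get(v, []):
                if u not in dist:
                    dist[u] = dist[v] + 1
                    path[u] = path[v] + [u]
                    queue.append(u)
        for dst, d in dist.items():
            if dst != src:
                sd_index[(src, dst)] = d
                sp_index[(src, dst)] = path[dst]
        neighborhoods[src] = {v for v, d in dist.items() if 0 < d <= r}
    return sd_index, sp_index, neighborhoods
```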


1996 ◽  
Vol 35 (04/05) ◽  
pp. 309-316 ◽  
Author(s):  
M. R. Lehto ◽  
G. S. Sorock

Abstract: Bayesian inferencing as a machine learning technique was evaluated for identifying pre-crash activity and crash type from accident narratives describing 3,686 motor vehicle crashes. It was hypothesized that a Bayesian model could learn from a computer search for 63 keywords related to accident categories. Learning was described in terms of the ability to accurately classify previously unclassifiable narratives not containing the original keywords. When narratives contained keywords, the results obtained using both the Bayesian model and keyword search corresponded closely to expert ratings (P(detection) ≥ 0.9, and P(false positive) ≤ 0.05). For narratives not containing keywords, when the threshold used by the Bayesian model was varied between p > 0.5 and p > 0.9, the overall probability of detecting a category assigned by the expert varied between 67% and 12%. False positives correspondingly varied between 32% and 3%. These latter results demonstrated that the Bayesian system learned from the results of the keyword searches.
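
A minimal sketch of a keyword-based Bayesian classifier with a tunable probability threshold is given below, to illustrate the detection/false-positive trade-off reported above. The model structure, Laplace smoothing, and token handling are assumptions for illustration, not the study's actual system.

```python
import math
from collections import defaultdict

# Illustrative naive-Bayes-style classifier over keyword features, showing how
# varying the probability threshold (e.g. p > 0.5 versus p > 0.9) trades
# detection rate against false positives.
class KeywordBayes:
    def __init__(self):
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.class_counts = defaultdict(int)

    def train(self, labeled_narratives):
        # labeled_narratives: iterable of (tokens, category) pairs
        for tokens, label in labeled_narratives:
            self.class_counts[label] += 1
            for w in set(tokens):
                self.word_counts[label][w] += 1

    def posterior(self, tokens, label):
        total = sum(self.class_counts.values())
        log_p = {c: math.log(n / total) for c, n in self.class_counts.items()}
        for c, n in self.class_counts.items():
            for w in set(tokens):
                # Laplace-smoothed probability that a class-c narrative contains w
                log_p[c] += math.log((self.word_counts[c][w] + 1) / (n + 2))
        norm = math.log(sum(math.exp(v) for v in log_p.values()))
        return math.exp(log_p[label] - norm)

    def detect(self, tokens, label, threshold=0.9):
        # raising the threshold lowers false positives but also lowers detection
        return self.posterior(tokens, label) > threshold
```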


Author(s):  
Roberto De Virgilio ◽  
Antonio Maccioni ◽  
Paolo Cappellari

JAMIA Open ◽  
2020 ◽  
Vol 3 (2) ◽  
pp. 225-232 ◽  
Author(s):  
Anita M Preininger ◽  
Brett South ◽  
Jeff Heiland ◽  
Adam Buchold ◽  
Mya Baca ◽  
...  

Abstract
Objective: This article describes the system architecture, training, initial use, and performance of Watson Assistant (WA), an artificial intelligence-based conversational agent accessible within Micromedex®.
Materials and methods: The number and frequency of intents (the target of a user’s query) triggered in WA during its initial use were examined; intents triggered over 9 months were compared to the frequency of topics accessed via keyword search of Micromedex. The accuracy of WA intents assigned to 400 queries was compared to assignments by 2 independent subject matter experts (SMEs), with inter-rater reliability measured by Cohen’s kappa.
Results: In over 126 000 conversations with WA, the intents most frequently triggered involved dosing (N = 30 239, 23.9%) and administration (N = 14 520, 11.5%). SMEs with substantial inter-rater agreement (kappa = 0.71) agreed with the intent mapping in 247 of 400 queries (62%), including 16 queries related to content that WA and SMEs agreed was unavailable in WA. SMEs found 57 (14%) of the 400 queries incorrectly mapped by WA; the 112 (28%) queries unanswerable by WA were either ambiguous, contained unrecognized typographical errors, or addressed topics unavailable to WA. Of the 288 queries answerable by WA, SMEs determined that 231 (80%) were correctly linked to an intent.
Discussion: A conversational agent successfully linked most queries to intents in Micromedex. Ongoing system training seeks to widen the scope of WA and improve its matching capabilities.
Conclusion: WA enabled Micromedex users to obtain answers to many medication-related questions using natural language, with the conversational agent facilitating mapping to a broader distribution of topics than standard keyword searches.
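
The inter-rater statistic reported above, Cohen’s kappa, can be computed as in the short sketch below; the example intent labels are invented purely for illustration.

```python
from collections import Counter

# Cohen's kappa: chance-corrected agreement between two raters' label lists.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # expected agreement if both raters labeled independently at their own rates
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

sme_1 = ["dosing", "dosing", "administration", "interaction", "dosing"]
sme_2 = ["dosing", "administration", "administration", "interaction", "dosing"]
print(round(cohens_kappa(sme_1, sme_2), 2))  # 0.69 on this toy example
```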


Author(s):  
Karen Corral ◽  
David Schuff ◽  
Robert D. St. Louis ◽  
Ozgur Turetken

Inefficient and ineffective search is widely recognized as a problem for businesses. The shortcomings of keyword searches have been elaborated upon by many authors, and many enhancements to keyword searches have been proposed. To date, however, no one has provided a quantitative model or systematic process for evaluating the savings that accrue from enhanced search procedures. This paper presents a model for estimating the total cost to a company of relying on keyword searches versus a dimensional search approach. The model is based on the Zipf-Mandelbrot law in quantitative linguistics. Our analysis of the model shows that a surprisingly small number of searches are required to justify the cost associated with encoding the metadata necessary to support a dimensional search engine. The results imply that it is cost effective for almost any business organization to implement a dimensional search strategy.
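
A back-of-the-envelope sketch of the kind of break-even analysis described above is shown below. The Zipf-Mandelbrot parameters (q, s), per-search costs, and one-time metadata encoding cost are hypothetical values chosen for illustration and do not come from the paper's model.

```python
# Hypothetical break-even comparison between keyword and dimensional search.
def zipf_mandelbrot_weights(n_terms, q=2.7, s=1.0):
    # relative access frequency of the k-th ranked term: proportional to 1 / (k + q)^s
    raw = [1.0 / (k + q) ** s for k in range(1, n_terms + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def expected_keyword_cost(weights, cost_per_rank=1.0):
    # expected effort per query if finding the k-th ranked item costs k units
    return sum(w * (k + 1) * cost_per_rank for k, w in enumerate(weights))

def break_even_searches(keyword_cost, dimensional_cost, encoding_cost):
    # number of searches after which one-time metadata encoding pays for itself
    saving = keyword_cost - dimensional_cost
    return float("inf") if saving <= 0 else encoding_cost / saving

weights = zipf_mandelbrot_weights(n_terms=100)
kw_cost = expected_keyword_cost(weights, cost_per_rank=2.0)
print(break_even_searches(kw_cost, dimensional_cost=1.0, encoding_cost=5000.0))
```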


2018 ◽  
Vol 14 (3) ◽  
pp. 299-316 ◽  
Author(s):  
Chang-Sup Park

Purpose: This paper aims to propose a new keyword search method on graph data that improves the relevance of search results and reduces duplication of content nodes in the answer trees obtained by previous approaches based on distinct root semantics. The previous approaches are restricted to answer trees with different root nodes and thus often generate results consisting of answer trees with low relevance to the query or with duplicated content nodes. The proposed method allows limited redundancy in the root nodes of the top-k answer trees to produce more effective query results.
Design/methodology/approach: A measure of redundancy in a set of answer trees with respect to their root nodes is defined, and according to this metric, a set of answer trees with limited root redundancy is proposed as the result of a keyword query on graph data. For efficient query processing, an index on the useful paths in the graph, built with inverted lists and a hash map, is suggested. Based on this path index, a top-k query processing algorithm is presented that finds the most relevant and diverse answer trees given a maximum amount of root redundancy allowed for a set of answer trees.
Findings: Experiments on real graph datasets show that the proposed approach produces query answers that are more diverse in their content nodes and more relevant to the query than the previous approach based on distinct root semantics.
Originality/value: This paper is the first to take redundancy in the root nodes of answer trees into account, improving the relevance and reducing the content-node redundancy of query results compared with the previous distinct root semantics. It can satisfy users’ varied information needs on large and complex graph data using keyword-based queries.
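
The sketch below illustrates one way a path index built from inverted lists and a hash map could support top-k retrieval with limited root redundancy; the scoring scheme and per-root redundancy cap are illustrative assumptions rather than the paper's exact definitions.

```python
from collections import defaultdict

# Sketch of a path index (inverted lists keyed by keyword, plus a hash map of
# root -> paths) and a top-k selection that limits root-node redundancy.
def build_path_index(paths):
    # paths: iterable of (keyword, root, path_nodes, relevance_score)
    inverted = defaultdict(list)   # keyword -> [(score, root, path_nodes), ...]
    by_root = defaultdict(list)    # hash map: root -> paths anchored at it
    for keyword, root, nodes, score in paths:
        inverted[keyword].append((score, root, nodes))
        by_root[root].append((keyword, nodes))
    return inverted, by_root

def top_k_limited_root_redundancy(inverted, keywords, k, max_per_root=2):
    # collect candidate answers for every query keyword, take them best-first,
    # and admit at most max_per_root answers sharing the same root node
    candidates = []
    for kw in keywords:
        candidates.extend(inverted.get(kw, []))
    results, per_root = [], defaultdict(int)
    for score, root, nodes in sorted(candidates, key=lambda t: -t[0]):
        if per_root[root] < max_per_root:
            per_root[root] += 1
            results.append((score, root, nodes))
        if len(results) == k:
            break
    return results
```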


Author(s):  
Ji Ke ◽  
J. S. Wallace ◽  
L. H. Shu

Biology is a good source of analogies for engineering design. One approach of retrieving biological analogies is to perform keyword searches on natural-language sources such as books, journals, etc. A challenge of retrieving information from natural-language sources is the potential requirement to process a large number of search results. This paper describes a categorization method that organizes a large group of diverse biological information into meaningful categories. The benefits of the categorization functionality are demonstrated through a case study on the redesign of a fuel cell bipolar plate. In this case study, our categorization method reduced the effort to systematically identify biological phenomena by up to ∼80%.

