RDF Keyword Search by Query Computation

2018 ◽  
Vol 29 (4) ◽  
pp. 1-27 ◽  
Author(s):  
Zongmin Ma ◽  
Xiaoqing Lin ◽  
Li Yan ◽  
Zhen Zhao

Keyword search based on keywords-to-SPARQL translation is attracting growing attention because of the increasing number of mature SPARQL search engines. Existing keywords-to-SPARQL approaches, however, return incomplete or incorrect answers because they lack complete underlying schema information. To overcome these difficulties, this article proposes a new keyword search paradigm that translates keyword queries into SPARQL queries for exploring RDF data. An inter-entity relationship summary with complete schema information is distilled from the RDF data graph and used to compose SPARQL queries. To avoid potentially wasteful expansion of the summary graph, we develop a new search prioritization scheme that combines the degree of a vertex with its distance from the originating keyword element. Starting from this priority list, built in advance, we apply a forward path index to find the top-k subgraphs relevant to the conjunction of the entered keywords more quickly. The experimental results show that our approach is efficient and scalable.
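
As a rough illustration of the prioritization idea described in this abstract, the sketch below orders summary-graph vertices for expansion by combining vertex degree with distance from a keyword element. The toy graph, vertex names, and weighting factor alpha are hypothetical assumptions, not structures taken from the paper.

```python
import heapq

# Hypothetical sketch of a degree-plus-distance search prioritization over a
# summary graph; the graph, vertex names, and alpha are illustrative only.
def prioritized_expansion(adj, keyword_vertices, alpha=0.5, max_hops=3):
    """Order summary-graph vertices for expansion, preferring vertices that
    are close to a keyword element and have a high degree."""
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    heap, visited, order = [], set(), []
    for kv in keyword_vertices:
        # smaller key = higher priority: distance 0 minus a degree bonus
        heapq.heappush(heap, (-alpha * degree.get(kv, 0), 0, kv))
    while heap:
        _, dist, v = heapq.heappop(heap)
        if v in visited or dist > max_hops:
            continue
        visited.add(v)
        order.append(v)
        for u in adj.get(v, []):
            if u not in visited:
                heapq.heappush(heap, (dist + 1 - alpha * degree.get(u, 0), dist + 1, u))
    return order

# toy summary graph of entity classes
adj = {
    "Author": ["Paper"],
    "Paper": ["Author", "Venue", "Topic"],
    "Venue": ["Paper"],
    "Topic": ["Paper"],
}
print(prioritized_expansion(adj, ["Author", "Topic"]))
```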

Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1861-1873 ◽  
Author(s):  
Xiaoqing Lin ◽  
Fu Zhang ◽  
Danling Wang ◽  
Jingwei Cheng

Since SPARQL is the standard language for querying RDF data, keyword search based on keywords-to-SPARQL translation is attracting increasing attention. However, existing approaches of this kind have two limitations: the schema used for the translation is incomplete, so wrong or incomplete answers are returned, and the advantages of indexes are not fully exploited. To address these issues, we construct an inter-entity relationship summary (ER-summary) by distilling all inter-entity relationships from the RDF data graph. On the ER-summary, we draw a circle of a given radius r around each vertex and, within these circles, build the shortest property path index (SP-index), the shortest distance index (SD-index), and the r-neighborhoods index using a dynamic programming algorithm. Rather than directly searching for top-k subgraphs connecting all the keywords, as most existing methods do, we use these indexes to translate keyword queries into SPARQL queries, trading space for time. Extensive experiments show that our approach is efficient and effective.
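
The following sketch shows one way r-bounded distance, path, and neighborhood indexes of this kind could be built with a breadth-first expansion; the dictionary layout and function name are assumptions made for illustration, not the authors' implementation.

```python
from collections import deque

# Minimal sketch of r-bounded index construction over a summary graph; the
# concrete index representation below is an illustrative assumption.
def build_indexes(adj, r):
    sd_index = {}       # (src, dst) -> shortest distance within radius r
    sp_index = {}       # (src, dst) -> one shortest property path within r
    neighborhoods = {}  # src -> set of vertices reachable within r hops
    for src in adj:
        dist, path = {src: 0}, {src: [src]}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            if dist[v] == r:
                continue
            for u in adj.get(v, []):
                if u not in dist:
                    dist[u] = dist[v] + 1
                    path[u] = path[v] + [u]
                    queue.append(u)
        for dst, d in dist.items():
            if dst != src:
                sd_index[(src, dst)] = d
                sp_index[(src, dst)] = path[dst]
        neighborhoods[src] = {v for v, d in dist.items() if 0 < d <= r}
    return sd_index, sp_index, neighborhoods
```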


1996 ◽  
Vol 35 (04/05) ◽  
pp. 309-316 ◽  
Author(s):  
M. R. Lehto ◽  
G. S. Sorock

Abstract: Bayesian inferencing as a machine learning technique was evaluated for identifying pre-crash activity and crash type from accident narratives describing 3,686 motor vehicle crashes. It was hypothesized that a Bayesian model could learn from a computer search for 63 keywords related to accident categories. Learning was described in terms of the ability to accurately classify previously unclassifiable narratives not containing the original keywords. When narratives contained keywords, the results obtained using both the Bayesian model and keyword search corresponded closely to expert ratings (P(detection) ≥ 0.9, and P(false positive) ≤ 0.05). For narratives not containing keywords, when the threshold used by the Bayesian model was varied between p > 0.5 and p > 0.9, the overall probability of detecting a category assigned by the expert varied between 67% and 12%. False positives correspondingly varied between 32% and 3%. These latter results demonstrated that the Bayesian system learned from the results of the keyword searches.
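
A minimal sketch of a keyword-based Bayesian classifier with a tunable probability threshold is given below, to illustrate the detection/false-positive trade-off reported above. The model structure, Laplace smoothing, and token handling are assumptions for illustration, not the study's actual system.

```python
import math
from collections import defaultdict

# Illustrative naive-Bayes-style classifier over keyword features, showing how
# varying the probability threshold (e.g. p > 0.5 versus p > 0.9) trades
# detection rate against false positives.
class KeywordBayes:
    def __init__(self):
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.class_counts = defaultdict(int)

    def train(self, labeled_narratives):
        # labeled_narratives: iterable of (tokens, category) pairs
        for tokens, label in labeled_narratives:
            self.class_counts[label] += 1
            for w in set(tokens):
                self.word_counts[label][w] += 1

    def posterior(self, tokens, label):
        total = sum(self.class_counts.values())
        log_p = {c: math.log(n / total) for c, n in self.class_counts.items()}
        for c, n in self.class_counts.items():
            for w in set(tokens):
                # Laplace-smoothed probability that a class-c narrative contains w
                log_p[c] += math.log((self.word_counts[c][w] + 1) / (n + 2))
        norm = math.log(sum(math.exp(v) for v in log_p.values()))
        return math.exp(log_p[label] - norm)

    def detect(self, tokens, label, threshold=0.9):
        # raising the threshold lowers false positives but also lowers detection
        return self.posterior(tokens, label) > threshold
```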


Author(s):  
Roberto De Virgilio ◽  
Antonio Maccioni ◽  
Paolo Cappellari

JAMIA Open ◽  
2020 ◽  
Vol 3 (2) ◽  
pp. 225-232 ◽  
Author(s):  
Anita M Preininger ◽  
Brett South ◽  
Jeff Heiland ◽  
Adam Buchold ◽  
Mya Baca ◽  
...  

Abstract
Objective: This article describes the system architecture, training, initial use, and performance of Watson Assistant (WA), an artificial intelligence-based conversational agent accessible within Micromedex®.
Materials and methods: The number and frequency of intents (the target of a user’s query) triggered in WA during its initial use were examined; intents triggered over 9 months were compared to the frequency of topics accessed via keyword search of Micromedex. The accuracy of WA intents assigned to 400 queries was compared to assignments by 2 independent subject matter experts (SMEs), with inter-rater reliability measured by Cohen’s kappa.
Results: In over 126 000 conversations with WA, the intents most frequently triggered involved dosing (N = 30 239, 23.9%) and administration (N = 14 520, 11.5%). SMEs with substantial inter-rater agreement (kappa = 0.71) agreed with the intent mapping in 247 of 400 queries (62%), including 16 queries related to content that WA and SMEs agreed was unavailable in WA. SMEs found 57 (14%) of the 400 queries incorrectly mapped by WA; the 112 (28%) queries unanswerable by WA were either ambiguous, contained unrecognized typographical errors, or addressed topics unavailable to WA. Of the 288 queries answerable by WA, SMEs determined that 231 (80%) were correctly linked to an intent.
Discussion: A conversational agent successfully linked most queries to intents in Micromedex. Ongoing system training seeks to widen the scope of WA and improve its matching capabilities.
Conclusion: WA enabled Micromedex users to obtain answers to many medication-related questions using natural language, with the conversational agent facilitating mapping to a broader distribution of topics than standard keyword searches.
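
The inter-rater statistic reported above, Cohen’s kappa, can be computed as in the short sketch below; the example intent labels are invented purely for illustration.

```python
from collections import Counter

# Cohen's kappa: chance-corrected agreement between two raters' label lists.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # expected agreement if both raters labeled independently at their own rates
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

sme_1 = ["dosing", "dosing", "administration", "interaction", "dosing"]
sme_2 = ["dosing", "administration", "administration", "interaction", "dosing"]
print(round(cohens_kappa(sme_1, sme_2), 2))  # 0.69 on this toy example
```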


Author(s):  
Karen Corral ◽  
David Schuff ◽  
Robert D. St. Louis ◽  
Ozgur Turetken

Inefficient and ineffective search is widely recognized as a problem for businesses. The shortcomings of keyword searches have been elaborated upon by many authors, and many enhancements to keyword searches have been proposed. To date, however, no one has provided a quantitative model or systematic process for evaluating the savings that accrue from enhanced search procedures. This paper presents a model for estimating the total cost to a company of relying on keyword searches versus a dimensional search approach. The model is based on the Zipf-Mandelbrot law in quantitative linguistics. Our analysis of the model shows that a surprisingly small number of searches are required to justify the cost associated with encoding the metadata necessary to support a dimensional search engine. The results imply that it is cost effective for almost any business organization to implement a dimensional search strategy.
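
A back-of-the-envelope sketch of the kind of break-even analysis described above is shown below. The Zipf-Mandelbrot parameters (q, s), per-search costs, and one-time metadata encoding cost are hypothetical values chosen for illustration and do not come from the paper's model.

```python
# Hypothetical break-even comparison between keyword and dimensional search.
def zipf_mandelbrot_weights(n_terms, q=2.7, s=1.0):
    # relative access frequency of the k-th ranked term: proportional to 1 / (k + q)^s
    raw = [1.0 / (k + q) ** s for k in range(1, n_terms + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def expected_keyword_cost(weights, cost_per_rank=1.0):
    # expected effort per query if finding the k-th ranked item costs k units
    return sum(w * (k + 1) * cost_per_rank for k, w in enumerate(weights))

def break_even_searches(keyword_cost, dimensional_cost, encoding_cost):
    # number of searches after which one-time metadata encoding pays for itself
    saving = keyword_cost - dimensional_cost
    return float("inf") if saving <= 0 else encoding_cost / saving

weights = zipf_mandelbrot_weights(n_terms=100)
kw_cost = expected_keyword_cost(weights, cost_per_rank=2.0)
print(break_even_searches(kw_cost, dimensional_cost=1.0, encoding_cost=5000.0))
```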


2018 ◽  
Vol 14 (3) ◽  
pp. 299-316 ◽  
Author(s):  
Chang-Sup Park

Purpose: This paper aims to propose a new keyword search method on graph data that improves the relevance of search results and reduces duplication of content nodes in the answer trees obtained by previous approaches based on distinct root semantics. The previous approaches are restricted to answer trees with different root nodes and thus often generate results consisting of answer trees with low relevance to the query or with duplicated content nodes. The proposed method allows limited redundancy in the root nodes of the top-k answer trees to produce more effective query results.
Design/methodology/approach: A measure of redundancy in a set of answer trees with respect to their root nodes is defined, and according to this metric, a set of answer trees with limited root redundancy is proposed as the result of a keyword query on graph data. For efficient query processing, an index on the useful paths in the graph, built with inverted lists and a hash map, is suggested. Based on this path index, a top-k query processing algorithm is presented that finds the most relevant and diverse answer trees given a maximum amount of root redundancy allowed for a set of answer trees.
Findings: Experiments on real graph datasets show that the proposed approach produces query answers that are more diverse in their content nodes and more relevant to the query than the previous approach based on distinct root semantics.
Originality/value: This paper is the first to take redundancy in the root nodes of answer trees into account, improving the relevance and reducing the content-node redundancy of query results compared with the previous distinct root semantics. It can satisfy users’ varied information needs on large and complex graph data using keyword-based queries.
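
The sketch below illustrates one way a path index built from inverted lists and a hash map could support top-k retrieval with limited root redundancy; the scoring scheme and per-root redundancy cap are illustrative assumptions rather than the paper's exact definitions.

```python
from collections import defaultdict

# Sketch of a path index (inverted lists keyed by keyword, plus a hash map of
# root -> paths) and a top-k selection that limits root-node redundancy.
def build_path_index(paths):
    # paths: iterable of (keyword, root, path_nodes, relevance_score)
    inverted = defaultdict(list)   # keyword -> [(score, root, path_nodes), ...]
    by_root = defaultdict(list)    # hash map: root -> paths anchored at it
    for keyword, root, nodes, score in paths:
        inverted[keyword].append((score, root, nodes))
        by_root[root].append((keyword, nodes))
    return inverted, by_root

def top_k_limited_root_redundancy(inverted, keywords, k, max_per_root=2):
    # collect candidate answers for every query keyword, take them best-first,
    # and admit at most max_per_root answers sharing the same root node
    candidates = []
    for kw in keywords:
        candidates.extend(inverted.get(kw, []))
    results, per_root = [], defaultdict(int)
    for score, root, nodes in sorted(candidates, key=lambda t: -t[0]):
        if per_root[root] < max_per_root:
            per_root[root] += 1
            results.append((score, root, nodes))
        if len(results) == k:
            break
    return results
```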


Author(s):  
Ji Ke ◽  
J. S. Wallace ◽  
L. H. Shu

Biology is a good source of analogies for engineering design. One approach of retrieving biological analogies is to perform keyword searches on natural-language sources such as books, journals, etc. A challenge of retrieving information from natural-language sources is the potential requirement to process a large number of search results. This paper describes a categorization method that organizes a large group of diverse biological information into meaningful categories. The benefits of the categorization functionality are demonstrated through a case study on the redesign of a fuel cell bipolar plate. In this case study, our categorization method reduced the effort to systematically identify biological phenomena by up to ∼80%.

