Path-Oriented Keyword Search Query over RDF

Author(s):  
Roberto De Virgilio ◽  
Antonio Maccioni ◽  
Riccardo Torlone ◽  
Paolo Cappellari
Keyword(s):  
2015 ◽  
Vol 11 (1) ◽  
pp. 33-53
Author(s):  
Abubakar Roko ◽  
Shyamala Doraisamy ◽  
Azrul Hazri Jantan ◽  
Azreen Azman

Purpose – The purpose of this paper is to propose and evaluate XKQSS, a query structuring method that shifts the task of generating structured queries from the user to the search engine while retaining the simple keyword search query interface. Structured queries are a more effective way of searching an XML database. However, expressing queries in a query language proves difficult for most users, since it requires learning the query language and knowing the underlying data schema. On the other hand, the success of Web search engines has made many users familiar with keyword search, and they therefore prefer a keyword search query interface for searching XML data. Design/methodology/approach – Existing query structuring approaches require users to provide structural hints in their input keyword queries even though their interface is keyword based. Other problems with existing systems include their inability to take keyword query ambiguities into consideration during query structuring and to select the generated structured query that best represents a given keyword query. To address these problems, this study allows users to submit a schema-independent keyword query, uses named entity recognition (NER) to categorize query keywords and resolve query ambiguities, and computes semantic information for a node from its data content. Algorithms are proposed that find user search intentions and convert them into a set of ranked structured queries. Findings – Experiments with the Sigmod and IMDB datasets were conducted to evaluate the effectiveness of the method. The experimental results show that XKQSS is about 20 per cent more effective than XReal, a state-of-the-art system for XML retrieval, in terms of return node identification. Originality/value – Existing systems do not take keyword query ambiguities into account. XKQSS consists of two guidelines based on NER that help to resolve these ambiguities before converting the submitted query. It also includes a ranking function that computes a score for each generated query using both semantic information and data statistics, as opposed to the data-statistics-only approach used by existing systems.


2021 ◽  
Vol 3 (1) ◽  
pp. 263-283
Author(s):  
Callum Hughes ◽  
Maxim Filimonov ◽  
Alison Wray ◽  
Irena Spasić

Idioms are multi-word expressions whose meaning cannot always be deduced from the literal meaning of constituent words. A key feature of idioms that is central to this paper is their peculiar mixture of fixedness and variability, which poses challenges for their retrieval from large corpora using traditional search approaches. These challenges hinder insights into idiom usage, affecting users who are conducting linguistic research as well as those involved in language education. To facilitate access to idiom examples taken from real-world contexts, we introduce an information retrieval system designed specifically for idioms. Given a search query that represents an idiom, typically in its canonical form, the system expands it automatically to account for the most common types of idiom variation including inflection, open slots, adjectival or adverbial modification and passivisation. As a by-product of query expansion, other types of idiom variation captured include derivation, compounding, negation, distribution across multiple clauses as well as other unforeseen types of variation. The system was implemented on top of Elasticsearch, an open-source, distributed, scalable, real-time search engine. Flexible retrieval of idioms is supported by a combination of linguistic pre-processing of the search queries, their translation into a set of query clauses written in a query language called Query DSL, and analysis, an indexing process that involves tokenisation and normalisation. Our system outperformed the phrase search in terms of recall and outperformed the keyword search in terms of precision. Out of the three, our approach was found to provide the best balance between precision and recall. By providing a fast and easy way of finding idioms in large corpora, our approach can facilitate further developments in fields such as linguistics, language education and natural language processing.
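As a rough illustration of how an idiom query might be translated into Query DSL clauses, the sketch below builds a request body that combines an exact phrase clause with a "sloppy" phrase clause that tolerates intervening words. It is not the authors' expansion procedure, which the abstract does not specify. The function name `idiom_query`, the field name `text` and the default `slop` value are assumptions made for this example; `bool`, `should`, `match_phrase` and `slop` are standard Query DSL constructs.

```python
import json


def idiom_query(idiom, field="text", slop=2):
    """Build an Elasticsearch Query DSL body for an idiom (illustrative only).

    The field name and slop value are hypothetical defaults, not taken
    from the paper.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact phrase: matches the canonical form only.
                    {"match_phrase": {field: {"query": idiom}}},
                    # Sloppy phrase: allows extra or moved words, e.g. an
                    # inserted modifier such as "spill the royal beans".
                    {"match_phrase": {field: {"query": idiom, "slop": slop}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(idiom_query("spill the beans"), indent=2))
```

Because both clauses sit under `should`, a document containing the canonical form satisfies both and would normally score higher than one matched only by the sloppy clause. Inflectional variation would additionally depend on how the index analyzer normalises tokens, which this sketch does not configure.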


Author(s):  
Paolo Cappellari ◽  
Roberto De Virgilio ◽  
Antonio Maccioni ◽  
Mark Roantree

2017 ◽  
Vol 6 (1) ◽  
pp. 1-16
Author(s):  
Pranav Murali

Search engines use indexing techniques to minimize the time taken to find information relevant to a search query. They maintain a keyword list that may reside either in memory or in external storage, such as a hard disk. While a pure binary search can be used for this purpose, it suffers from performance issues when keywords are stored in external storage. Some search engine implementations use a B-tree and sparse indexes to reduce access time. This paper aims to reduce keyword access time further. It presents a keyword search technique that combines a trie data structure with a new keyword prefixing method. Experimental results show a good improvement in performance over pure binary search. The merits of incorporating a trie-based approach into contemporary indexing methods are also discussed. The keyword prefixing method is described, and some salient steps in the process of keyword generation are outlined.
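For readers unfamiliar with the data structure, the following is a minimal in-memory trie sketch showing why lookup cost depends on keyword length and not on the size of the keyword list. It is a generic textbook trie, not the paper's technique: the keyword prefixing method is not described in the abstract and is not implemented here. All class and function names (`TrieNode`, `Trie`, `build_trie`) are chosen for this example.

```python
class TrieNode:
    """One node per character; children are keyed by the next character."""

    def __init__(self):
        self.children = {}
        self.is_keyword = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Add a keyword, creating nodes along its character path as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_keyword = True

    def _walk(self, text):
        """Follow `text` from the root; return the final node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        """Exact lookup: one dictionary access per character of `word`."""
        node = self._walk(word)
        return node is not None and node.is_keyword

    def with_prefix(self, prefix):
        """Return all stored keywords starting with `prefix`, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        results = []
        stack = [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_keyword:
                results.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(results)


def build_trie(keywords):
    """Convenience helper: build a trie from an iterable of keywords."""
    trie = Trie()
    for keyword in keywords:
        trie.insert(keyword)
    return trie
```

A lookup such as `build_trie(["index", "indexing"]).contains("index")` touches one node per character, whereas a binary search over a sorted list performs a number of full string comparisons that grows with the logarithm of the list size.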


1996 ◽  
Vol 35 (04/05) ◽  
pp. 309-316 ◽  
Author(s):  
M. R. Lehto ◽  
G. S. Sorock

Abstract: Bayesian inferencing as a machine learning technique was evaluated for identifying pre-crash activity and crash type from accident narratives describing 3,686 motor vehicle crashes. It was hypothesized that a Bayesian model could learn from a computer search for 63 keywords related to accident categories. Learning was described in terms of the ability to accurately classify previously unclassifiable narratives not containing the original keywords. When narratives contained keywords, the results obtained using both the Bayesian model and keyword search corresponded closely to expert ratings (P(detection)≥0.9, and P(false positive)≤0.05). For narratives not containing keywords, when the threshold used by the Bayesian model was varied between p>0.5 and p>0.9, the overall probability of detecting a category assigned by the expert varied between 67% and 12%. False positives correspondingly varied between 32% and 3%. These latter results demonstrated that the Bayesian system learned from the results of the keyword searches.
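The thresholded classification described above can be illustrated with a small multinomial naive Bayes sketch. The abstract does not state the exact form of the Bayesian model, so this is an assumed stand-in and not the authors' system. The function names, the whitespace tokenisation, the Laplace smoothing and the toy narratives in the usage note are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled_narratives):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for text, category in labeled_narratives:
        words = text.lower().split()
        word_counts[category].update(words)
        category_counts[category] += 1
        vocabulary.update(words)
    return word_counts, category_counts, vocabulary


def posterior(text, model):
    """Return P(category | text) for every category, normalised to sum to 1."""
    word_counts, category_counts, vocabulary = model
    total_docs = sum(category_counts.values())
    log_scores = {}
    for category, n_docs in category_counts.items():
        total_words = sum(word_counts[category].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.lower().split():
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        log_scores[category] = score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}


def classify(text, model, threshold=0.5):
    """Return the most probable category, or None if it does not exceed the threshold."""
    probs = posterior(text, model)
    best = max(probs, key=probs.get)
    return best if probs[best] > threshold else None
```

With a toy training set such as `[("vehicle struck from rear while stopped", "rear-end"), ("car rolled over on curve", "rollover")]`, calling `classify(narrative, model, threshold=0.9)` accepts fewer narratives than `threshold=0.5`. This is the same trade-off the abstract reports: raising the threshold lowers both detections and false positives.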

