A Natural Language Interface to Relational Databases Using an Online Analytic Processing Hypercube

AI ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 720-737
Author(s):  
Fadi H. Hazboun ◽  
Majdi Owda ◽  
Amani Yousef Owda

Structured Query Language (SQL) is commonly used in Relational Database Management Systems (RDBMS) and is currently one of the most popular data definition and manipulation languages. Its core functionality is implemented, with only some minor variations, throughout all RDBMS products. It is an effective tool in the process of managing and querying data in relational databases. This paper describes a method to effectively automate the conversion of a data query from a Natural Language Query (NLQ) to Structured Query Language (SQL) with Online Analytical Processing (OLAP) cube data warehouse objects. To obtain or manipulate the data from relational databases, the user must be familiar with SQL and must also write an appropriate and valid SQL statement. However, users who are not familiar with SQL are unable to obtain relevant data through relational databases. To address this, we propose a Natural Language Processing (NLP) model to convert an NLQ into an SQL query. This allows novice users to obtain the required data without having to know any complicated SQL details. The model is also capable of handling complex queries using the OLAP cube technique, which allows data to be pre-calculated and stored in a multi-dimensional and ready-to-use format. A multi-dimensional cube (hypercube) is used to connect with the NLP interface, thereby eliminating long-running data queries and enabling self-service business intelligence. The study demonstrated how the use of hypercube technology helps to increase the system response speed and the ability to process very complex query sentences. The system achieved impressive performance in terms of NLP and the accuracy of generating different query sentences. Using OLAP hypercube technology, the study achieved distinguished results compared to previous studies in terms of the speed of the response of the model to NLQ analysis, the generation of complex SQL statements, and the dynamic display of the results. 
As a plan for future work, it is recommended to use infinite-dimension (n-D) cubes instead of 4-D cubes to enable ingesting as much data as possible in a single object and to facilitate the execution of query statements that may be too complex in query interfaces running in a data warehouse.
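The pre-calculation idea behind an OLAP cube can be sketched in a few lines. The following is a minimal illustration, not the authors' system: the dimension names, the fact rows, and the helpers `build_cube` and `lookup` are all hypothetical. It pre-aggregates a measure for every combination of dimensions so that a query produced by an NLP front end becomes a dictionary lookup rather than a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical 4-D fact table: four dimensions plus one measure ("sales").
DIMENSIONS = ("region", "product", "year", "quarter")
FACTS = [
    {"region": "north", "product": "laptop", "year": 2020, "quarter": 1, "sales": 100},
    {"region": "north", "product": "phone", "year": 2020, "quarter": 2, "sales": 50},
    {"region": "south", "product": "laptop", "year": 2021, "quarter": 1, "sales": 70},
    {"region": "south", "product": "phone", "year": 2021, "quarter": 1, "sales": 30},
]


def build_cube(facts, dimensions, measure):
    """Pre-aggregate the measure for every subset of dimensions (every cuboid)."""
    cube = defaultdict(float)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in facts:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return dict(cube)


def lookup(cube, dimensions, **filters):
    """Answer SUM(measure) for equality filters with a single dictionary lookup."""
    # Keep the filter dimensions in the same order used when building the cube.
    dims = tuple(d for d in dimensions if d in filters)
    return cube.get((dims, tuple(filters[d] for d in dims)), 0.0)


CUBE = build_cube(FACTS, DIMENSIONS, "sales")
total_sales = lookup(CUBE, DIMENSIONS)
north_2020 = lookup(CUBE, DIMENSIONS, region="north", year=2020)
```

The trade-off is the usual one for materialized aggregates: the number of cuboids grows as 2^n in the number of dimensions, so build time and storage rise quickly as dimensions are added.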

2021 ◽  
Vol 8 ◽  
Author(s):  
Benjamin Hunter ◽  
Sara Reis ◽  
Des Campbell ◽  
Sheila Matharu ◽  
Prashanthi Ratnakumar ◽  
...  

Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation. Objective: To automate lung nodule identification in a tertiary cancer centre. Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients. Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear-support vector machine model predicted concerning sentence features with 94% accuracy. Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
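For readers unfamiliar with the reported metrics, sensitivity and specificity follow directly from confusion-matrix counts. The sketch below uses made-up counts chosen only for illustration; they are not the study's data, and the function name is hypothetical.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of true nodule reports that were flagged
    specificity = tn / (tn + fp)  # share of nodule-free reports correctly cleared
    return sensitivity, specificity


# Hypothetical counts, NOT taken from the study.
sens, spec = sensitivity_specificity(tp=45, fp=10, tn=90, fn=5)
```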


2021 ◽  
Vol 40 ◽  
pp. 03018
Author(s):  
Dhairya Shah ◽  
Aniruddha Das ◽  
Aniket Shahane ◽  
Dharmik Parikh ◽  
Pranit Bari

Generating SQL queries from natural language is a long-standing open problem that has attracted considerable interest recently. A Natural Language Interface (NLI) is the confluence of Natural Language Processing (NLP) and Human-Computer Interaction, and allows humans and computers to interact through natural language. Here we address the problem of automatically generating Structured Query Language (SQL) queries. SQL is a database language for querying and manipulating relational databases. Despite the spectacular rise in the adoption of relational databases, there is a fundamental limitation on the ability to fetch data from them. One of the major reasons is that users of these relational databases need to understand convoluted structured query languages. In this work, we present an interface that allows users to interact with databases using natural language as opposed to conventional structured query languages.


2017 ◽  
Vol 62 (10) ◽  
pp. 2713-2718 ◽  
Author(s):  
Joseph S. Redman ◽  
Yamini Natarajan ◽  
Jason K. Hou ◽  
Jingqi Wang ◽  
Muzammil Hanif ◽  
...  

2017 ◽  
Vol 30 (3) ◽  
pp. 503-525
Author(s):  
Kamal Hamaz ◽  
Fouzia Benchikha

Purpose: With the development of systems and applications, the number of users interacting with databases has increased considerably. The relational database model is still considered the most widely used model for data storage and manipulation. However, it does not offer any semantic support for the stored data that could facilitate data access for users. Indeed, a large number of users are intimidated when retrieving data because they are non-technical or have little technical knowledge. To overcome this problem, researchers are continuously developing new techniques for Natural Language Interfaces to Databases (NLIDB). Nowadays, the usage of existing NLIDBs is not widespread due to their deficiencies in understanding natural language (NL) queries. In this sense, the purpose of this paper is to propose a novel method for an intelligent understanding of NL queries using semantically enriched database sources. Design/methodology/approach: First, a reverse engineering process is applied to extract the hidden semantics of the relational database. In the second step, the extracted semantics are enriched further using a domain ontology. After this, all semantics are stored in the same relational database. The NL query processing phase uses the stored semantics to generate a semantic tree. Findings: The evaluation part of the work shows the advantages of using a semantically enriched database source to understand NL queries. Additionally, enriching a relational database has given more flexibility to understand contextual and synonymous words that may be used in an NL query. Originality/value: Existing NLIDBs are not yet a standard option for interfacing with a relational database due to their limited understanding of NL queries. Indeed, the techniques used in the literature have their limits. This paper addresses those limits by identifying the NL elements by their semantic nature in order to generate a semantic tree. The semantic tree is a key step towards an intelligent understanding of NL queries to relational databases.


2021 ◽  
Vol 14 (5) ◽  
pp. 813-821
Author(s):  
Arif Usta ◽  
Akifhan Karakayali ◽  
Özgür Ulusoy

Translating Natural Language Queries (NLQs) to Structured Query Language (SQL) in interfaces deployed in relational databases is a challenging task, which has been widely studied in the database community recently. Conventional rule-based systems use a series of solutions as a pipeline to deal with each step of this task, namely stop word filtering, tokenization, stemming/lemmatization, parsing, tagging, and translation. Recent works have mostly focused on the translation step, overlooking the earlier steps by using ad hoc solutions. In the pipeline, one of the most critical and challenging problems is keyword mapping: constructing a mapping between tokens in the query and relational database elements (tables, attributes, values, etc.). We define the keyword mapping problem as a sequence tagging problem, and propose a novel deep-learning-based supervised approach that utilizes POS tags of NLQs. Our proposed approach, called DBTagger (DataBase Tagger), is an end-to-end and schema-independent solution, which makes it practical for various relational databases. We evaluate our approach on eight different datasets, and report new state-of-the-art accuracy results, 92.4% on average. Our results also indicate that DBTagger is up to 10,000 times faster than its counterparts and scales to bigger databases.
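To make the keyword mapping task concrete, the sketch below shows its input and output shape: each query token receives a tag naming the kind of database element it refers to, or `O` for none. This is a naive dictionary-lookup baseline over a hypothetical schema vocabulary, shown only to illustrate the tagging format; it is not DBTagger's neural model, and the tag names are assumptions.

```python
# Hypothetical schema vocabulary; a real system would derive this from the database.
SCHEMA = {
    "TABLE": {"movies", "actors"},
    "ATTR": {"title", "year", "name"},
    "VALUE": {"2010", "nolan"},
}


def tag_tokens(query):
    """Tag each whitespace token as TABLE, ATTR, VALUE, or O (no schema match)."""
    tokens = query.lower().replace("?", "").split()
    tagged = []
    for token in tokens:
        # First schema category whose vocabulary contains the token, else "O".
        tag = next((label for label, words in SCHEMA.items() if token in words), "O")
        tagged.append((token, tag))
    return tagged


TAGGED = tag_tokens("Show the title of movies released in 2010")
```

A lookup baseline like this cannot resolve ambiguous tokens or unseen synonyms, which is the gap a learned sequence tagger is meant to close.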


2020 ◽  
Author(s):  
Han Wang ◽  
Wesley Yeung ◽  
Mengling Feng

Electronic Health Record (EHR) systems used in hospitals and healthcare institutes generate vast amounts of data stored in relational databases. Structured Query Language (SQL) is a common language used to update, extract and pre-process data in EHR databases. Pre-processing is a necessary step before statistical modeling and causal inference studies can be carried out in observational studies. Data extraction and pre-processing using SQL require a collaborative effort between data engineers and researchers such as clinicians or biostatisticians. Natural Language to SQL (NL2SQL) models convert study designs in natural language to SQL queries to obtain the desired cohort and risk factors. While they cannot completely replace the need for cross-disciplinary collaboration, they have the potential to enable clinicians and biostatisticians who are not trained in SQL to explore EHR databases on their own and reduce the burden placed on data engineers by automating less-complex tasks. There has been substantial research on NL2SQL tasks on general knowledge databases, but their application in EHR databases that contain domain-specific knowledge is not well studied. In this paper, we introduce the general NL2SQL tasks and discuss in depth the potential challenges in developing NL2SQL tools for EHR databases.


2011 ◽  
Vol 22 (3) ◽  
pp. 533-547 ◽  
Author(s):  
ALEKSANDAR PEROVIĆ ◽  
ALEKSANDAR TAKAČI ◽  
SRDJAN ŠKRBIĆ

Using the concept of a generalised priority constraint satisfaction problem, we previously found a way to introduce priority queries into fuzzy relational databases. The results were PFSQL (Priority Fuzzy Structured Query Language) together with a database independent interpreter for it. In an effort to improve the performance of the resolution of PFSQL queries, the aim of the current paper is to formalise PFSQL queries by obtaining their interpretation in an existing fuzzy logic. We have found that the ŁΠ logic provides sufficient elements. The SELECT line of PFSQL queries is semantically a formula of some fuzzy logic, and we show that such formulas can be naturally expressed in a conservative extension of the ŁΠ logic. Furthermore, we prove a theorem that gives the PSPACE containment for the complexity of finding a model for a given ŁΠ logic formula.
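As background, the ŁΠ logic combines the connectives of Łukasiewicz logic and product logic over truth values in [0, 1]. The sketch below gives the standard semantics of the two conjunctions and their residual implications; the function names are illustrative, and nothing here reproduces the paper's PFSQL interpretation or its priority mechanism.

```python
def lukasiewicz_and(a, b):
    """Lukasiewicz strong conjunction (t-norm): max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def lukasiewicz_implies(a, b):
    """Lukasiewicz implication: min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def product_and(a, b):
    """Product conjunction (t-norm): a * b."""
    return a * b


def product_implies(a, b):
    """Product (Goguen) implication: 1 if a <= b, otherwise b / a."""
    return 1.0 if a <= b else b / a
```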

