A Natural Language Interface to Relational Databases Using an Online Analytic Processing Hypercube

Structured Query Language (SQL) is commonly used in Relational Database Management Systems (RDBMS) and is currently one of the most popular data definition and manipulation languages. Its core functionality is implemented, with only some minor variations, throughout all RDBMS products. It is an effective tool in the process of managing and querying data in relational databases. This paper describes a method to effectively automate the conversion of a data query from a Natural Language Query (NLQ) to Structured Query Language (SQL) with Online Analytical Processing (OLAP) cube data warehouse objects. To obtain or manipulate the data from relational databases, the user must be familiar with SQL and must also write an appropriate and valid SQL statement. However, users who are not familiar with SQL are unable to obtain relevant data through relational databases. To address this, we propose a Natural Language Processing (NLP) model to convert an NLQ into an SQL query. This allows novice users to obtain the required data without having to know any complicated SQL details. The model is also capable of handling complex queries using the OLAP cube technique, which allows data to be pre-calculated and stored in a multi-dimensional and ready-to-use format. A multi-dimensional cube (hypercube) is used to connect with the NLP interface, thereby eliminating long-running data queries and enabling self-service business intelligence. The study demonstrated how the use of hypercube technology helps to increase the system response speed and the ability to process very complex query sentences. The system achieved impressive performance in terms of NLP and the accuracy of generating different query sentences. Using OLAP hypercube technology, the study achieved distinguished results compared to previous studies in terms of the speed of the response of the model to NLQ analysis, the generation of complex SQL statements, and the dynamic display of the results. 
For future work, we recommend using n-dimensional (n-D) cubes instead of 4-D cubes, so that as much data as possible can be ingested into a single object and query statements that may be too complex for query interfaces running in a data warehouse can still be executed.
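
The pre-calculation idea described in this abstract can be illustrated with a minimal sketch. It assumes a tiny in-memory fact table with three dimensions and one measure. The rows, the dimension names, and the helpers `build_cube` and `lookup` are hypothetical and are not taken from the paper, whose implementation is not described here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, region, product, amount)
ROWS = [
    (2020, "EU", "laptop", 100.0),
    (2020, "US", "laptop", 150.0),
    (2021, "EU", "phone", 80.0),
    (2021, "EU", "laptop", 120.0),
]
DIMENSIONS = ("year", "region", "product")


def build_cube(rows):
    """Pre-compute SUM(amount) for every combination of dimensions."""
    cube = defaultdict(float)
    for *dims, amount in rows:
        for k in range(len(dims) + 1):
            for kept in combinations(range(len(dims)), k):
                # None marks a dimension that is aggregated away ("ALL").
                key = tuple(
                    dims[i] if i in kept else None for i in range(len(dims))
                )
                cube[key] += amount
    return cube


def lookup(cube, **filters):
    """Answer an aggregate query with a single dictionary access."""
    key = tuple(filters.get(d) for d in DIMENSIONS)
    return cube.get(key, 0.0)


cube = build_cube(ROWS)
total_eu = lookup(cube, region="EU")
total_2021_laptops = lookup(cube, year=2021, product="laptop")
```

Because every aggregate is stored ahead of time, a generated query such as "total sales in the EU" becomes a key lookup rather than a scan. The trade-off is that the number of stored cells grows quickly with the number of dimensions.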

Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre

Frontiers in Medicine ◽

10.3389/fmed.2021.748168 ◽

2021 ◽

Vol 8 ◽

Author(s):

Benjamin Hunter ◽

Sara Reis ◽

Des Campbell ◽

Sheila Matharu ◽

Prashanthi Ratnakumar ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Query Language ◽

Lung Nodule ◽

Service Evaluation ◽

Support Vector ◽

Lung Nodules ◽

Cancer Centre ◽

Structured Query Language

Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation. Objective: To automate lung nodule identification in a tertiary cancer centre. Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients. Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear-support vector machine model predicted concerning sentence features with 94% accuracy. Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
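
A combined SQL and NLP step of the kind this abstract mentions could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' tool: the `ct_reports` table, the sample report texts, the negation pattern, and the `has_nodule` helper are all hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ct_reports (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO ct_reports (id, body) VALUES (?, ?)",
    [
        (1, "A 6 mm nodule is seen in the right upper lobe."),
        (2, "No pulmonary nodules are identified."),
        (3, "Clear lungs. Normal mediastinum."),
    ],
)

# Step 1 (SQL): cheap keyword prefilter over the report text.
candidates = conn.execute(
    "SELECT id, body FROM ct_reports WHERE body LIKE '%nodule%' ORDER BY id"
).fetchall()

# Step 2 (rule-based NLP): ignore sentences where the mention is negated.
MENTION = re.compile(r"\bnodules?\b", re.IGNORECASE)
NEGATED = re.compile(r"\bno\b[^.]*\bnodules?\b", re.IGNORECASE)


def has_nodule(text):
    """Return True if any sentence mentions a nodule without negation."""
    sentences = [s for s in re.split(r"(?<=[.])\s+", text) if s]
    return any(MENTION.search(s) and not NEGATED.search(s) for s in sentences)


positive_ids = [report_id for report_id, body in candidates if has_nodule(body)]
```

The SQL filter narrows the reports to those that mention the keyword at all, and the sentence-level rule then separates affirmed from negated mentions. A real system would need a far richer negation and uncertainty vocabulary than this single pattern.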

Indonesian Text Translator into Database Structured Query Language with Multi Parameters using Natural Language Processing

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/662/2/022095 ◽

2019 ◽

Vol 662 ◽

pp. 022095

Author(s):

G Hermawan ◽

I Faturohman ◽

N Isharmawan

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Query Language ◽

Structured Query Language

SpeakQL Natural Language to SQL

ITM Web of Conferences ◽

10.1051/itmconf/20214003018 ◽

2021 ◽

Vol 40 ◽

pp. 03018

Author(s):

Dhairya Shah ◽

Aniruddha Das ◽

Aniket Shahane ◽

Dharmik Parikh ◽

Pranit Bari

Keyword(s):

Natural Language ◽

Language Processing ◽

Relational Databases ◽

Query Language ◽

Automatic Generation ◽

Query Languages ◽

Natural Language Interface ◽

Open Issue ◽

Conventional Structure ◽

Structure Query

Generating SQL queries from natural language is a long-standing open problem that has recently attracted considerable interest. A Natural Language Interface (NLI) lies at the confluence of Natural Language Processing (NLP) and Human-Computer Interaction, and allows humans and computers to interact through natural language. Here we address the problem of automatically generating Structured Query Language (SQL) queries. SQL is a database language for querying and manipulating relational databases. Despite the spectacular rise in the acceptance of relational databases, there is a fundamental limitation on the ability to fetch data from those databases. One of the major reasons for this is that users of these relational databases need to comprehend convoluted structured query languages. In this work, we present an interface that allows users to interact with databases using natural language as opposed to conventional structured query languages.

Accurate Identification of Fatty Liver Disease in Data Warehouse Utilizing Natural Language Processing

Digestive Diseases and Sciences ◽

10.1007/s10620-017-4721-9 ◽

2017 ◽

Vol 62 (10) ◽

pp. 2713-2718 ◽

Cited By ~ 3

Author(s):

Joseph S. Redman ◽

Yamini Natarajan ◽

Jason K. Hou ◽

Jingqi Wang ◽

Muzammil Hanif ◽

...

Keyword(s):

Natural Language Processing ◽

Liver Disease ◽

Natural Language ◽

Fatty Liver ◽

Data Warehouse ◽

Language Processing ◽

Fatty Liver Disease ◽

Accurate Identification

Relational Databases: Structured Query Language (SQL)

Principles of Database Management ◽

10.1017/9781316888773.009 ◽

2018 ◽

pp. 146-206

Keyword(s):

Relational Databases ◽

Query Language ◽

Structured Query Language

A novel method for providing relational databases with rich semantics and natural language processing

Journal of Enterprise Information Management ◽

10.1108/jeim-01-2015-0005 ◽

2017 ◽

Vol 30 (3) ◽

pp. 503-525

Author(s):

Kamal Hamaz ◽

Fouzia Benchikha

Keyword(s):

Natural Language ◽

Data Storage ◽

Language Processing ◽

Relational Database ◽

Relational Databases ◽

Data Access ◽

New Techniques ◽

Content Type ◽

Semantic Tree ◽

Novel Method

Purpose: With the development of systems and applications, the number of users interacting with databases has increased considerably. The relational database model is still considered the most widely used model for data storage and manipulation. However, it does not offer any semantic support for the stored data that could facilitate data access for users. Indeed, a large number of users are intimidated when retrieving data because they are non-technical or have little technical knowledge. To overcome this problem, researchers are continuously developing new techniques for Natural Language Interfaces to Databases (NLIDB). Nowadays, the usage of existing NLIDBs is not widespread due to their deficiencies in understanding natural language (NL) queries. In this sense, the purpose of this paper is to propose a novel method for an intelligent understanding of NL queries using semantically enriched database sources. Design/methodology/approach: First, a reverse engineering process is applied to extract the hidden semantics of the relational database. In the second step, the extracted semantics are enriched further using a domain ontology. After this, all semantics are stored in the same relational database. The NL query processing phase uses the stored semantics to generate a semantic tree. Findings: The evaluation part of the work shows the advantages of using a semantically enriched database source to understand NL queries. Additionally, enriching a relational database gives more flexibility in understanding contextual and synonymous words that may be used in an NL query. Originality/value: Existing NLIDBs are not yet a standard option for interfacing with a relational database because of their limited understanding of NL queries. Indeed, the techniques used in the literature have their limits. This paper addresses those limits by identifying the NL elements by their semantic nature in order to generate a semantic tree. The semantic tree is a key step towards an intelligent understanding of NL queries to relational databases.
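
The idea of storing enriched semantics in the same relational database can be sketched as follows. This is a minimal illustration, not the paper's method: the `employee` and `semantics` tables, the synonym entries, and the `resolve` helper are hypothetical, and the sketch stops at term resolution rather than building a semantic tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, emp_name TEXT, salary REAL
    );
    -- Hypothetical semantic layer kept alongside the data it describes.
    CREATE TABLE semantics (
        term TEXT PRIMARY KEY, table_name TEXT, column_name TEXT
    );
    INSERT INTO semantics VALUES
        ('staff',  'employee', NULL),
        ('worker', 'employee', NULL),
        ('wage',   'employee', 'salary'),
        ('pay',    'employee', 'salary');
    """
)


def resolve(term):
    """Map a word from an NL query to a (table, column) pair, if known."""
    return conn.execute(
        "SELECT table_name, column_name FROM semantics WHERE term = ?",
        (term.lower(),),
    ).fetchone()


wage_target = resolve("Wage")    # synonym of a column
staff_target = resolve("staff")  # synonym of a table
```

Keeping the synonyms in an ordinary table means the interface can resolve words such as "wage" or "staff" with a plain query, without a separate knowledge store.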

An Arabic natural language interface for querying relational databases based on natural language processing and graph theory methods

International Journal of Reasoning-based Intelligent Systems ◽

10.1504/ijris.2018.092221 ◽

2018 ◽

Vol 10 (2) ◽

pp. 155 ◽

Cited By ~ 1

Author(s):

Hanane Bais ◽

Mustapha Machkour ◽

Lahcen Koutti

Keyword(s):

Graph Theory ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Relational Databases ◽

Natural Language Interface

DBTagger

Proceedings of the VLDB Endowment ◽

10.14778/3446095.3446103 ◽

2021 ◽

Vol 14 (5) ◽

pp. 813-821

Author(s):

Arif Usta ◽

Akifhan Karakayali ◽

Özgür Ulusoy

Keyword(s):

Deep Learning ◽

Natural Language ◽

Relational Database ◽

Relational Databases ◽

Query Language ◽

State Of The Art ◽

Independent Solution ◽

Rule Based ◽

Mapping Problem ◽

Stop Word

Translating Natural Language Queries (NLQs) to Structured Query Language (SQL) in interfaces deployed in relational databases is a challenging task that has been widely studied in the database community recently. Conventional rule-based systems use a series of solutions as a pipeline to deal with each step of this task, namely stop word filtering, tokenization, stemming/lemmatization, parsing, tagging, and translation. Recent works have mostly focused on the translation step, overlooking the earlier steps by using ad hoc solutions. In the pipeline, one of the most critical and challenging problems is keyword mapping: constructing a mapping between tokens in the query and relational database elements (tables, attributes, values, etc.). We define the keyword mapping problem as a sequence tagging problem, and propose a novel deep-learning-based supervised approach that utilizes POS tags of NLQs. Our proposed approach, called DBTagger (DataBase Tagger), is an end-to-end and schema-independent solution, which makes it practical for various relational databases. We evaluate our approach on eight different datasets, and report new state-of-the-art accuracy results, 92.4% on average. Our results also indicate that DBTagger is up to 10,000 times faster than its counterparts and scales to bigger databases.
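
To make the framing of keyword mapping as sequence tagging concrete, the sketch below assigns one tag per query token. It is a naive dictionary lookup, not DBTagger's learned model, and it only shows the shape of the input and output. The schema, the value set, the tag names, and the `tag_tokens` helper are hypothetical.

```python
# Hypothetical schema elements and known cell values.
SCHEMA = {
    "tables": {"movie", "actor"},
    "attributes": {"title", "year", "name"},
}
KNOWN_VALUES = {"2010"}


def tag_tokens(tokens, schema, known_values):
    """Assign one tag per token: TABLE, ATTR, VALUE, or O (outside)."""
    tags = []
    for token in tokens:
        # Crude singularisation: strips every trailing "s", so it is
        # only adequate for this toy vocabulary.
        word = token.lower().rstrip("s")
        if word in schema["tables"]:
            tags.append("TABLE")
        elif word in schema["attributes"]:
            tags.append("ATTR")
        elif token.lower() in known_values:
            tags.append("VALUE")
        else:
            tags.append("O")
    return tags


tokens = "show titles of movies released in 2010".split()
tags = tag_tokens(tokens, SCHEMA, KNOWN_VALUES)
```

A lookup like this fails as soon as a token is ambiguous or absent from the dictionary, which is the gap a supervised tagger using context and POS tags is meant to close.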

Natural Language to SQL Generation for Observational Study Designs: Current Challenges and Possible Directions (Preprint)

10.2196/preprints.20801 ◽

2020 ◽

Author(s):

Han Wang ◽

Wesley Yeung ◽

Mengling Feng

Keyword(s):

Natural Language ◽

Observational Studies ◽

Relational Databases ◽

Query Language ◽

Data Extraction ◽

Process Data ◽

Specific Knowledge ◽

Domain Specific ◽

Domain Specific Knowledge ◽

Study Designs

Electronic Health Record (EHR) systems used in hospitals and healthcare institutes generate vast amounts of data stored in relational databases. Structured Query Language (SQL) is a common language used to update, extract and pre-process data in EHR databases. Pre-processing is a necessary step before statistical modeling and causal inference studies can be carried out in observational studies. Data extraction and pre-processing using SQL require a collaborative effort between data engineers and researchers such as clinicians or biostatisticians. Natural Language to SQL (NL2SQL) models convert study designs in natural language to SQL queries to obtain the desired cohort and risk factors. While they cannot completely replace the need for cross-disciplinary collaboration, they have the potential to enable clinicians and biostatisticians who are not trained in SQL to explore EHR databases on their own and to reduce the burden placed on data engineers by automating less complex tasks. There has been substantial research on NL2SQL tasks on general knowledge databases, but their application to EHR databases that contain domain-specific knowledge is not well studied. In this paper, we introduce the general NL2SQL tasks and discuss in depth the potential challenges in developing NL2SQL tools for EHR databases.

Formalising PFSQL queries using ŁΠ fuzzy logic

Mathematical Structures in Computer Science ◽

10.1017/s0960129511000673 ◽

2011 ◽

Vol 22 (3) ◽

pp. 533-547 ◽

Cited By ~ 4

Author(s):

ALEKSANDAR PEROVIĆ ◽

ALEKSANDAR TAKAČI ◽

SRDJAN ŠKRBIĆ

Keyword(s):

Fuzzy Logic ◽

Constraint Satisfaction ◽

Relational Databases ◽

Query Language ◽

Constraint Satisfaction Problem ◽

Current Paper ◽

Conservative Extension ◽

Logic Formula ◽

Structured Query Language

Using the concept of a generalised priority constraint satisfaction problem, we previously found a way to introduce priority queries into fuzzy relational databases. The results were PFSQL (Priority Fuzzy Structured Query Language) together with a database independent interpreter for it. In an effort to improve the performance of the resolution of PFSQL queries, the aim of the current paper is to formalise PFSQL queries by obtaining their interpretation in an existing fuzzy logic. We have found that the ŁΠ logic provides sufficient elements. The SELECT line of PFSQL queries is semantically a formula of some fuzzy logic, and we show that such formulas can be naturally expressed in a conservative extension of the ŁΠ logic. Furthermore, we prove a theorem that gives the PSPACE containment for the complexity of finding a model for a given ŁΠ logic formula.
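
For orientation, ŁΠ logic combines the connectives of Łukasiewicz logic and product logic over truth degrees in [0, 1]. The sketch below evaluates the standard conjunctions and implications of the two logics. The function names are hypothetical, and the sketch does not reproduce the paper's interpretation of PFSQL queries.

```python
def luk_and(x, y):
    """Lukasiewicz strong conjunction: max(0, x + y - 1)."""
    return max(0.0, x + y - 1.0)


def luk_implies(x, y):
    """Lukasiewicz implication: min(1, 1 - x + y)."""
    return min(1.0, 1.0 - x + y)


def prod_and(x, y):
    """Product conjunction: x * y."""
    return x * y


def prod_implies(x, y):
    """Product (Goguen) implication: 1 if x <= y, otherwise y / x."""
    return 1.0 if x <= y else y / x


# Two fuzzy conditions satisfied to degrees 0.75 and 0.5.
a, b = 0.75, 0.5
both_luk = luk_and(a, b)
both_prod = prod_and(a, b)
```

The two conjunctions differ on the same inputs, which is why a logic offering both gives more expressive room for formulas that mix additive and multiplicative combinations of degrees.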
