Automatic Generation of Scripts for Database Creation from Scenario Descriptions

Author(s):  
L. W. Amarasinghe ◽  
R. D. Nawarathna

Aims: Database creation is the most critical component of the design and implementation of any software application, and the process of deriving the database from the requirement specification is generally believed to be extremely hard. This study presents a method to automatically generate database scripts from a given scenario description in the requirement specification. Study Design: The method is developed based on a set of natural language processing (NLP) techniques and several new algorithms. Standard database scenario descriptions presented in popular textbooks on database design are used for the validation of the method. Place and Duration of Study: Department of Statistics and Computer Science, Faculty of Science, University of Peradeniya, Sri Lanka, between December 2019 and December 2020. Methodology: The description of the problem scenario is processed using NLP operations such as tokenization, complex word handling, basic group handling, complex phrase handling, structure merging, and template construction to extract the information required for the entity-relationship (ER) model. New algorithms are proposed to automatically convert the ER model to a logical schema and finally to a database script. The system can generate scripts for relational databases (RDB), object-relational databases (ORDB) and Not Only SQL (NoSQL) databases. The proposed method is integrated into a web application where users can type the scenario in natural, free text, select the type of database (RDB, ORDB, or NoSQL) used in their system, and have the application generate the corresponding scripts. Results: The proposed method was evaluated using 10 scenario descriptions from 10 different domains (e.g., company, university, airport) for all three types of databases. The method achieved accuracies of 82.5%, 84.0% and 83.5% for RDB, ORDB and NoSQL scripts, respectively. Conclusion: This study focuses on the automatic generation of database scripts from scenario descriptions in the requirement specification of a software system. Overall, the developed method helps to speed up the database development process. Further, the developed web application provides a learning environment for people who are novices in database technology.
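
As a concrete illustration of the final conversion step, the hypothetical Python sketch below turns scenario sentences of the form "Each X has A, B and C." into RDB CREATE TABLE statements. The sentence pattern, the scenario_to_ddl helper, and the blanket VARCHAR typing are illustrative assumptions, not the paper's actual algorithms.

```python
import re

# Toy pattern for sentences like "Each department has a name and a number."
# (an assumption for illustration, not the paper's template-construction step)
PATTERN = re.compile(r"Each (\w+) has ([\w ,]+?)\.", re.IGNORECASE)

def scenario_to_ddl(scenario: str) -> str:
    """Emit a CREATE TABLE statement per matched entity sentence."""
    statements = []
    for entity, attr_text in PATTERN.findall(scenario):
        # Split the attribute list on commas and "and", drop leading articles.
        attrs = re.split(r",| and ", attr_text)
        cols = []
        for attr in attrs:
            name = re.sub(r"^(a|an|the)\s+", "", attr.strip(), flags=re.IGNORECASE)
            if name:
                cols.append(f"{name.replace(' ', '_')} VARCHAR(255)")
        statements.append(f"CREATE TABLE {entity.lower()} ({', '.join(cols)});")
    return "\n".join(statements)

print(scenario_to_ddl(
    "Each department has a name, a number and a manager. "
    "Each employee has a name and a salary."
))
```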

2018 ◽  
Vol 45 (3) ◽  
pp. 364-386
Author(s):  
Ceri Binding ◽  
Douglas Tudhope ◽  
Andreas Vlachidis

This study investigates the semantic integration of data extracted from archaeological datasets with information extracted via natural language processing (NLP) across different languages. The investigation follows a broad theme relating to wooden objects and their dating via dendrochronological techniques, including types of wooden material, samples taken and wooden objects including shipwrecks. The outcomes are an integrated RDF dataset coupled with an associated interactive research demonstrator query builder application. The semantic framework combines the CIDOC Conceptual Reference Model (CRM) with the Getty Art and Architecture Thesaurus (AAT). The NLP, data cleansing and integration methods are described in detail, together with illustrative scenarios from the web application Demonstrator. Reflections and recommendations from the study are discussed. The Demonstrator is a novel SPARQL web application with CRM/AAT-based data integration. Functionality includes free-text and semantic search combined with browsing on semantic links, plus thesaurus query expansion over hierarchical and associative relationships. Queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over AAT hierarchies of wood types and specialised associative relationships. Following a ‘mapping pattern’ approach (via the STELETO tool) ensured the validity and consistency of all RDF output. The user is shielded from the complexity of the underlying semantic framework by a query builder user interface. The study demonstrates the feasibility of connecting information extracted from datasets and grey literature reports in different languages and of semantic cross-searching over the integrated information. The semantic linking of textual reports and datasets opens new possibilities for integrative research across diverse resources.
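
As a rough idea of how such a query might look behind the query builder, the following Python sketch runs a SPARQL query that expands over a thesaurus hierarchy. The endpoint URL and AAT concept URI are placeholders, the skos:narrower path is an assumption about how the hierarchy is published, and the E22/P45 modelling follows the published CIDOC CRM; the project's actual data model may differ.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://example.org/sparql"       # hypothetical endpoint
WOOD_TYPE = "http://vocab.getty.edu/aat/XXXXXXX"  # placeholder: real AAT ID here

QUERY = f"""
PREFIX crm:  <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?object ?label WHERE {{
  <{WOOD_TYPE}> skos:narrower* ?material .   # the concept or anything narrower
  ?object a crm:E22_Man-Made_Object ;
          crm:P45_consists_of ?material ;    # CRM: object consists of material
          rdfs:label ?label .
}}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["object"]["value"], row["label"]["value"])
```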


2021 ◽  
Vol 40 ◽  
pp. 03018
Author(s):  
Dhairya Shah ◽  
Aniruddha Das ◽  
Aniket Shahane ◽  
Dharmik Parikh ◽  
Pranit Bari

Generating SQL queries from natural language is a long-standing open problem and has been attracting considerable interest of late. A Natural Language Interface (NLI) sits at the confluence of Natural Language Processing (NLP) and Human-Computer Interaction, allowing humans and computers to interact through natural language. Here we deal with the problem of automatic generation of Structured Query Language (SQL) queries. SQL is a database language for querying and manipulating relational databases. Despite the spectacular rise in the adoption of relational databases, there is a fundamental limitation on the ability to fetch data from them. One of the major reasons for this is that users of these relational databases need to comprehend a convoluted structured query language. In this body of work, we present an interface that allows users to interact with databases using natural language as opposed to the conventional structured query language.
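
To make the task concrete, here is a toy, rule-based Python sketch of template matching, one simple way to map a question onto SQL. It is purely illustrative; production NLIs typically rely on parsing or neural semantic parsing, and this is not the interface described above.

```python
import re

# Hypothetical question patterns mapped onto parameterised SQL templates.
TEMPLATES = [
    (re.compile(r"how many (\w+) are there", re.IGNORECASE),
     "SELECT COUNT(*) FROM {0};"),
    (re.compile(r"show all (\w+) where (\w+) is (\w+)", re.IGNORECASE),
     "SELECT * FROM {0} WHERE {1} = '{2}';"),
]

def nl_to_sql(question: str) -> str:
    """Return the SQL for the first template matching the question."""
    for pattern, template in TEMPLATES:
        match = pattern.search(question)
        if match:
            return template.format(*match.groups())
    raise ValueError("No matching template for: " + question)

print(nl_to_sql("How many employees are there?"))
print(nl_to_sql("Show all orders where status is pending"))
```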


2021 ◽  
Author(s):  
George Alexander ◽  
Mohammed Bahja ◽  
Gibran F Butt

Obtaining patient feedback is an essential mechanism for healthcare service providers to assess their quality and effectiveness. Unlike assessments of clinical outcomes, feedback from patients offers insights into their lived experience. The Department of Health and Social Care in England, via NHS Digital, operates a patient feedback web service through which patients can leave feedback on their experiences in structured and free-text report forms. Free-text feedback, compared with structured questionnaires, may be less biased by the feedback collector and thus more representative; however, it is harder to analyse in large quantities, and it is challenging to derive meaningful, quantitative outcomes that better represent general public feedback. This study details the development of a text analysis tool that utilises contemporary natural language processing (NLP) and machine learning models to analyse free-text clinical service reviews and build a robust classification model, together with an interactive visualisation web application (a Vue.js application with NodeJS, working with a C# serverless API and SQL Server, all hosted on the Microsoft Azure platform) that facilitates exploration of the data and is designed for use by all stakeholders. Of the 11,103 possible clinical services that could be reviewed across England, 2030 different services had received a combined total of 51,845 reviews between 1/10/2017 and 31/10/2019; these were included for analysis. Dominant topics were identified for the entire corpus and then for negative and positive sentiment in turn. Reviews containing high- and low-sentiment topics occurred more frequently than less polarised topics. Time series analysis can identify trends in topic and sentiment occurrence frequency across the study period. This tool automates the analysis of large volumes of free text specific to medical services, and the web application summarises the results and presents them in an accessible and interactive format. Such a tool has the potential to considerably reduce administrative burden and increase user uptake.
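
The abstract does not specify the exact models used; as an indicative sketch of the topic-identification stage, the following Python snippet fits an LDA topic model over a handful of stand-in reviews with scikit-learn.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Stand-in reviews; the real corpus is the 51,845 NHS service reviews.
reviews = [
    "Staff were friendly and the appointment was on time",
    "Long waiting times and poor communication from reception",
    "Excellent care from the nurses, very reassuring",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(reviews)

# Two topics only because the toy corpus is tiny.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

terms = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```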


2021 ◽  
Author(s):  
Ian R. Braun ◽  
Diane C. Bassham ◽  
Carolyn J. Lawrence-Dill

Motivation: Finding similarity across phenotypic descriptions is not straightforward, with previous successes in computation requiring significant expert data curation. Natural language processing of free-text phenotype descriptions is often easier to apply than intensive curation. It is therefore critical to understand the extent to which these techniques can be used to organize and analyze biological datasets and enable biological discoveries. Results: A wide variety of approaches from the natural language processing domain perform as well as similarity metrics over curated annotations for predicting shared phenotypes. These approaches also show promise both for helping curators organize and work through large datasets and for enabling researchers to explore relationships among available phenotype descriptions. Here we generate networks of phenotype similarity and share a web application for querying a dataset of associated plant genes using these text mining approaches. Example situations and species for which application of these techniques is most useful are discussed. Availability: The dataset used in this work is available at https://git.io/JTutQ. The code for the analysis performed here is available at https://git.io/JTutN and https://git.io/JTuqv. The code for the web application discussed here is available at https://git.io/Jtv9J, and the application itself is available at https://quoats.dill-picl.org/.
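
One simple representative of the text mining approaches compared in the paper is TF-IDF cosine similarity; the hypothetical sketch below uses it to build a small phenotype-similarity network. The example descriptions, gene names, and edge threshold are stand-ins.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in gene-to-phenotype-description mapping.
phenotypes = {
    "gene_a": "dwarf plants with short internodes",
    "gene_b": "reduced plant height and compact stature",
    "gene_c": "pale green leaves with reduced chlorophyll",
}

names = list(phenotypes)
tfidf = TfidfVectorizer().fit_transform(phenotypes.values())
sim = cosine_similarity(tfidf)  # pairwise description similarity

G = nx.Graph()
G.add_nodes_from(names)
THRESHOLD = 0.1  # arbitrary cut-off for drawing an edge
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if sim[i, j] >= THRESHOLD:
            G.add_edge(names[i], names[j], weight=float(sim[i, j]))

print(G.edges(data=True))
```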


2012 ◽  
Vol 2 (2) ◽  
pp. 112-116
Author(s):  
Shikha Bhatia ◽  
Mr. Harshpreet Singh

With the mounting demand for web applications, a number of issues related to their quality have come into existence. In the field of web applications, it is very difficult to develop high-quality applications. A design pattern is a general, repeatable solution to a commonly occurring problem in software design. It should be noted that a design pattern is not a finished product that can be directly transformed into source code; rather, it is a description or template for how to solve a problem that can be used in many different situations. Past research has shown that design patterns greatly improve the execution speed of a software application. Design patterns are classified as creational, structural, behavioral, and so on. The MVC design pattern is very productive for architecting interactive software systems and web applications. This design pattern is partition-independent, because it is expressed in terms of an interactive application running in a single address space. We will design and analyze an algorithm using the MVC approach to improve the performance of web-based applications. The objective of our study is to reduce coupling, a major object-oriented metric, between the model and view segments of a web-based application. The implementation will be done using the .NET Framework.
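
As a language-agnostic sketch of the coupling reduction the study targets (the actual implementation is in .NET), the Python example below decouples model and view via an observer callback: the model never references a concrete view type, so either side can change independently.

```python
from typing import Callable, List

class Model:
    """Holds state; knows only an abstract callback type, never a View."""
    def __init__(self) -> None:
        self._observers: List[Callable[[int], None]] = []
        self._count = 0

    def subscribe(self, observer: Callable[[int], None]) -> None:
        self._observers.append(observer)

    def increment(self) -> None:
        self._count += 1
        for notify in self._observers:  # no dependency on any view class
            notify(self._count)

class View:
    def render(self, count: int) -> None:
        print(f"Count is now {count}")

class Controller:
    """Wiring between model and view lives here, not in the model."""
    def __init__(self, model: Model, view: View) -> None:
        self.model = model
        model.subscribe(view.render)

    def on_click(self) -> None:
        self.model.increment()

Controller(Model(), View()).on_click()
```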


Author(s):  
Mario Jojoa Acosta ◽  
Gema Castillo-Sánchez ◽  
Begonya Garcia-Zapirain ◽  
Isabel de la Torre Díez ◽  
Manuel Franco-Martín

The use of artificial intelligence in health care has grown quickly. In this context, we present our work on applying Natural Language Processing (NLP) techniques as a tool to analyze the sentiment of users who answered two questions from the CSQ-8 questionnaire with raw Spanish free text. Their responses relate to mindfulness, a novel technique used to control stress and anxiety caused by different factors in daily life. We proposed an online course where this method was applied in order to improve the quality of life of healthcare professionals during the COVID-19 pandemic. We also carried out an evaluation of the satisfaction level of the participants involved, with a view to establishing strategies to improve future experiences. To perform this task automatically, we used NLP models such as Swivel embedding, neural networks, and transfer learning to classify the inputs into three categories: negative, neutral, and positive. Due to the limited amount of data available (86 records for the first question and 68 for the second), transfer learning techniques were required. The length of the text had no limit from the user’s standpoint, and our approach attained maximum accuracies of 93.02% and 90.53%, respectively, based on ground truth labeled by three experts. Finally, we proposed a complementary analysis using graphical text representations based on word frequency, to help researchers identify relevant information about the opinions with an objective approach to sentiment. The main conclusion drawn from this work is that applying NLP techniques with transfer learning on small amounts of data can achieve sufficient accuracy in the sentiment analysis and text classification stages.
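
A minimal sketch of the transfer-learning setup described: a pretrained Swivel embedding as a frozen text encoder with a small classifier head for the three sentiment classes. The TF-Hub module shown is a public English Swivel embedding used purely for illustration; the study worked with Spanish text, and its exact architecture is not reproduced here.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Frozen pretrained Swivel embedding (transfer learning): text -> 20-d vector.
embed = hub.KerasLayer(
    "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
    input_shape=[], dtype=tf.string, trainable=False)

model = tf.keras.Sequential([
    embed,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # negative / neutral / positive
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```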


2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require personnel resources that are not available in most healthcare organisations. This study undertakes a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques to patient feedback from social media sites (unsolicited), followed by structured surveys (solicited). Supervised learning was most frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3) approaches. Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included precision, recall and F-measure, with support vector machines and Naïve Bayes being the best-performing ML classifiers. Conclusion: NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have their role, depending on the data source. With the advancement of data analysis tools, these techniques may help healthcare organisations generate insight from large volumes of unstructured free-text data.
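
For readers unfamiliar with the surveyed methods, a representative supervised pipeline of the kind reported (TF-IDF features with an SVM classifier, evaluated with precision, recall and F-measure) might look like the following sketch; the comments and labels are stand-in data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

# Stand-in patient feedback and sentiment labels.
comments = ["staff were rude", "great care, thank you",
            "waited three hours", "lovely, helpful nurses"]
labels = ["negative", "positive", "negative", "positive"]

# TF-IDF features feeding an SVM, one of the best performers in the review.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(comments, labels)

# Precision, recall and F-measure per class (on training data, for brevity).
print(classification_report(labels, clf.predict(comments)))
```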


BMJ Open ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. e047356
Author(s):  
Carlton R Moore ◽  
Saumya Jain ◽  
Stephanie Haas ◽  
Harish Yadav ◽  
Eric Whitsel ◽  
...  

Objectives: Using free-text clinical notes and reports from hospitalised patients, determine the performance of natural language processing (NLP) ascertainment of Framingham heart failure (HF) criteria and phenotype. Study design: A retrospective observational study of patients hospitalised in 2015 at four hospitals participating in the Atherosclerosis Risk in Communities (ARIC) study was used to determine NLP performance in the ascertainment of Framingham HF criteria and phenotype. Setting: Four ARIC study hospitals, each representing an ARIC study region in the USA. Participants: A stratified random sample of hospitalisations occurring during 2015 and identified using a broad range of International Classification of Diseases, ninth revision, diagnostic codes indicative of an HF event was drawn for this study. A randomly selected set of 394 hospitalisations was used as the derivation dataset and 406 hospitalisations as the validation dataset. Intervention: Use of NLP on free-text clinical notes and reports to ascertain Framingham HF criteria and phenotype. Primary and secondary outcome measures: NLP performance, as measured by sensitivity, specificity, positive-predictive value (PPV) and agreement in the ascertainment of Framingham HF criteria and phenotype. Manual medical record review by trained ARIC abstractors was used as the reference standard. Results: Overall, performance of NLP ascertainment of the Framingham HF phenotype in the validation dataset was good, with 78.8%, 81.7%, 84.4% and 80.0% for sensitivity, specificity, PPV and agreement, respectively. Conclusions: By decreasing the need for manual chart review, our results on the use of NLP to ascertain the Framingham HF phenotype from free-text electronic health record data suggest that validated NLP technology holds the potential to significantly improve the feasibility and efficiency of conducting large-scale epidemiologic surveillance of HF prevalence and incidence.
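
The evaluation arithmetic behind these measures is simple to state; the sketch below computes sensitivity, specificity, PPV and raw agreement from a confusion matrix. The counts passed in the example are illustrative, not the study's.

```python
def evaluate(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix metrics against a manual-review reference standard."""
    return {
        "sensitivity": tp / (tp + fn),           # true positives found
        "specificity": tn / (tn + fp),           # true negatives found
        "ppv":         tp / (tp + fp),           # positive-predictive value
        "agreement":   (tp + tn) / (tp + fp + tn + fn),
    }

# Illustrative counts only; the study reports rates, not raw cells.
print(evaluate(tp=145, fp=27, tn=175, fn=39))
```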


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 183-183
Author(s):  
Javad Razjouyan ◽  
Jennifer Freytag ◽  
Edward Odom ◽  
Lilian Dindo ◽  
Aanand Naik

Patient Priorities Care (PPC) is a model of care that aligns healthcare recommendations with the priorities of older adults with multiple chronic conditions. Social workers (SWs), after online training, document PPC in the patient’s electronic health record (EHR). Our goal is to identify free-text notes with PPC language using a natural language processing (NLP) model and to measure PPC adoption and its effect on long-term services and support (LTSS) use. Free-text notes from the EHR produced by trained SWs were passed through a hybrid NLP model that utilized rule-based and statistical machine learning. NLP accuracy was validated against chart review. Patients who received PPC were propensity-matched with patients not receiving PPC (control) on age, gender, BMI, Charlson comorbidity index, facility and SW. Changes in LTSS utilization over 6-month intervals were compared between groups with univariate analysis. Chart review indicated that 491 notes out of 689 had PPC language, and the NLP model reached a precision of 0.85, a recall of 0.90, an F1 of 0.87, and an accuracy of 0.91. Within-group analysis shows that the intervention group used LTSS 1.8 times more in the 6 months after the encounter compared with the 6 months prior. Between-group analysis shows that the intervention group had significantly higher LTSS utilization (p=0.012). An automated NLP model can be used to reliably measure the adoption of PPC by SWs. PPC seems to encourage use of LTSS, which may delay time to long-term care placement.
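
A hybrid rule-plus-ML classifier in the spirit of the one described might be sketched as follows; the PPC regex, training notes, and logistic-regression fallback are all assumptions for illustration, since the study's actual rules and model are not given here.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# High-precision rule for explicit PPC language (hypothetical).
PPC_RULE = re.compile(r"patient priorit(y|ies)", re.IGNORECASE)

# Tiny stand-in training set for the statistical component.
train_notes = ["discussed what matters most to the patient",
               "routine wound care visit"]
train_labels = [1, 0]
ml_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
ml_model.fit(train_notes, train_labels)

def has_ppc_language(note: str) -> bool:
    """Rule fires first; the statistical classifier handles the rest."""
    if PPC_RULE.search(note):
        return True
    return bool(ml_model.predict([note])[0])

print(has_ppc_language("Patient priorities discussed with family."))
```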


Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Claire M Simpson ◽  
Florian Gnad

Graph representations provide an elegant solution to capture and analyze complex molecular mechanisms in the cell. Co-expression networks are undirected graph representations of transcriptional co-behavior indicating (co-)regulations, functional modules or even physical interactions between the corresponding gene products. The growing avalanche of available RNA sequencing (RNAseq) data fuels the construction of such networks, which are usually stored in relational databases like most other biological data. Inferring linkage by recursive multiple-join statements, however, is computationally expensive and complex to design in relational databases. In contrast, graph databases store and represent complex interconnected data as nodes, edges and properties, making it fast and intuitive to query and analyze relationships. While graph-based database technologies are on their way from a fringe domain to going mainstream, there are only a few studies reporting their application to biological data. We used the graph database management system Neo4j to store and analyze co-expression networks derived from RNAseq data from The Cancer Genome Atlas. Comparing co-expression in tumors versus healthy tissues in six cancer types revealed significant perturbation tracing back to erroneous or rewired gene regulation. Applying centrality, community detection and pathfinding graph algorithms uncovered the destruction or creation of central nodes, modules and relationships in co-expression networks of tumors. Given the speed, accuracy and straightforwardness of managing these densely connected networks, we conclude that graph databases are ready for entering the arena of biological data.
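
To illustrate how such queries read in practice, the sketch below uses the Neo4j Python driver to rank genes by degree (co-expression partner count). The Gene label, COEXPRESSED_WITH relationship type, and connection details are assumed for illustration; the paper's schema may differ.

```python
from neo4j import GraphDatabase

# Hypothetical local Neo4j instance and credentials.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

# Degree centrality: genes with the most co-expression partners.
QUERY = """
MATCH (g:Gene)-[:COEXPRESSED_WITH]-(partner)
RETURN g.symbol AS gene, count(partner) AS degree
ORDER BY degree DESC LIMIT 10
"""

with driver.session() as session:
    for record in session.run(QUERY):
        print(record["gene"], record["degree"])
driver.close()
```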

