Natural Language Interfaces to Domain Specific Knowledge Bases: An Illustration for Querying Elements of the Periodic Table

This paper describes an adaptive learning approach to automatically building knowledge bases for information extraction from text-based web pages. A frame-based representation is introduced to represent domain knowledge as knowledge unit frames. A frame learning algorithm is developed to automatically learn knowledge unit frames from training examples. Some training examples can be obtained by automatically parsing a number of tabular web pages in the same domain, which greatly reduces the amount of time-consuming manual work. This approach was investigated on ten web sites of real estate advertisements and car advertisements, and nearly all the information was successfully extracted with very few false alarms. These results suggest that both the knowledge unit frame representation and the frame learning algorithm work well, that domain-specific knowledge bases can be learned from training examples, and that the domain-specific knowledge base can be used for information extraction from flexible text-based semi-structured web pages on multiple web sites. The investigation of the knowledge representation on five other domains suggests that this approach can be easily applied to other domains by simply changing the training examples.

Natural Language to SQL Generation for Observational Study Designs: Current Challenges and Possible Directions (Preprint)

10.2196/preprints.20801 ◽

2020 ◽

Author(s):

Han Wang ◽

Wesley Yeung ◽

Mengling Feng

Keyword(s):

Natural Language ◽

Observational Studies ◽

Relational Databases ◽

Query Language ◽

Data Extraction ◽

Process Data ◽

Specific Knowledge ◽

Domain Specific ◽

Domain Specific Knowledge ◽

Study Designs

Electronic Health Record (EHR) systems used in hospitals and healthcare institutes generate vast amounts of data stored in relational databases. Structured Query Language (SQL) is a common language used to update, extract and pre-process data in EHR databases. Pre-processing is a necessary step before statistical modeling and causal inference studies can be carried out in observational studies. Data extraction and pre-processing using SQL require a collaborative effort between data engineers and researchers such as clinicians or biostatisticians. Natural Language to SQL (NL2SQL) models convert study designs in natural language to SQL queries to obtain the desired cohort and risk factors. While they cannot completely replace the need for cross-disciplinary collaboration, they have the potential to enable clinicians and biostatisticians who are not trained in SQL to explore EHR databases on their own and reduce the burden placed on data engineers by automating less-complex tasks. There has been substantial research on NL2SQL tasks on general knowledge databases, but their application to EHR databases that contain domain-specific knowledge is not well studied. In this paper, we introduce the general NL2SQL tasks and discuss in depth the potential challenges in developing NL2SQL tools for EHR databases.
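
To make the NL2SQL task concrete, the sketch below pairs a study-design sentence with the kind of SQL query a model would be expected to produce, and runs that query against a toy in-memory database. Everything here is an assumption for illustration: the `patients` and `diagnoses` tables, their columns, the sample rows, and the function name `run_cohort_query` are hypothetical and do not come from the paper. The SQL is written by hand, not generated by a model.

```python
import sqlite3

# Hypothetical toy schema; real EHR databases are far larger and messier.
SCHEMA = """
CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER, sex TEXT);
CREATE TABLE diagnoses (patient_id INTEGER, code TEXT);
"""

# Study design in natural language: the input side of an NL2SQL task.
QUESTION = "Patients aged 65 or older with a type 2 diabetes diagnosis"

# Hand-written target SQL for QUESTION: the output side of the task.
# 'E11' is used here as an illustrative diagnosis code.
TARGET_SQL = """
SELECT DISTINCT p.patient_id
FROM patients AS p
JOIN diagnoses AS d ON d.patient_id = p.patient_id
WHERE p.age >= 65 AND d.code = 'E11'
ORDER BY p.patient_id
"""


def run_cohort_query(sql):
    """Build the toy database, run the query, and return the patient ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?, ?)",
        [(1, 70, "F"), (2, 50, "M"), (3, 80, "M")],
    )
    conn.executemany(
        "INSERT INTO diagnoses VALUES (?, ?)",
        [(1, "E11"), (2, "E11"), (3, "I10"), (1, "I10")],
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return [row[0] for row in rows]


cohort = run_cohort_query(TARGET_SQL)
print(QUESTION, "->", cohort)
```

Even in this small case the query needs a join and a domain-specific code lookup, which hints at why schema and terminology knowledge matter for EHR databases.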

PreFace: Faceted Retrieval of Prerequisites Using Domain-Specific Knowledge Bases

Lecture Notes in Computer Science - The Semantic Web – ISWC 2020 ◽

10.1007/978-3-030-62419-4_34 ◽

2020 ◽

pp. 601-618

Author(s):

Prajna Upadhyay ◽

Maya Ramanath

Keyword(s):

Knowledge Bases ◽

Specific Knowledge ◽

Domain Specific ◽

Domain Specific Knowledge

Building chatbots from large scale domain-specific knowledge bases: challenges and opportunities

2020 IEEE International Conference on Prognostics and Health Management (ICPHM) ◽

10.1109/icphm49022.2020.9187036 ◽

2020 ◽

Author(s):

Walid Shalaby ◽

Adriano Arantes ◽

Teresa GonzalezDiaz ◽

Chetan Gupta

Keyword(s):

Large Scale ◽

Knowledge Bases ◽

Specific Knowledge ◽

Domain Specific ◽

Challenges And Opportunities ◽

Domain Specific Knowledge

AN INTERACTIVE TOOL FOR THE RAPID DEVELOPMENT OF KNOWLEDGE BASES

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213001000428 ◽

2001 ◽

Vol 10 (01n02) ◽

pp. 65-86 ◽

Cited By ~ 15

Author(s):

DAN I. MOLDOVAN ◽

ROXANA C. GÎRJU

Keyword(s):

Rapid Development ◽

Knowledge Bases ◽

General Purpose ◽

Specific Knowledge ◽

Interactive Mode ◽

Domain Specific ◽

New Concepts ◽

Financial Domain ◽

Domain Specific Knowledge ◽

Knowledge Intensive

It is widely accepted that more knowledge means more intelligence. In many knowledge-intensive applications, it is necessary to have extensive domain-specific knowledge in addition to general-purpose knowledge bases. This paper presents a methodology for discovering domain-specific concepts and relationships in an attempt to extend WordNet. The method was tested on five seed concepts selected from the financial domain: interest rate, stock market, inflation, economic growth, and employment. Queries were formed with each of these concepts, and a corpus of 5000 sentences was extracted automatically from the Internet and TREC-8 corpora. On this corpus, the system discovered a total of 264 new concepts not defined in WordNet, of which 221 contain the seeds and 43 are other related concepts. The system also discovered 64 relationships that link these concepts with either WordNet concepts or with each other. The relationships were extracted with the help of 22 distinct lexico-syntactic patterns representing four semantic relations. It takes the system approximately 40 minutes per seed, working in interactive mode, to discover the new concepts and relationships on the 5000-sentence corpus.
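
The abstract does not list its 22 lexico-syntactic patterns, so the sketch below only illustrates the general idea with one well-known pattern, "X such as A, B and C", which suggests that A, B and C are kinds of X. The function name `extract_hyponyms`, the regular expressions, and the example sentence are assumptions made for illustration, not the authors' patterns or system.

```python
import re

# One illustrative pattern: "<hypernym> such as <item>, <item> and <item>".
SUCH_AS = re.compile(r"(\w+) such as ([^.;]+)")

# Separators between listed items: commas (optionally followed by and/or),
# or a bare "and"/"or" between two items.
SEPARATOR = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    match = SUCH_AS.search(sentence)
    if match is None:
        return []
    hypernym = match.group(1)
    items = SEPARATOR.split(match.group(2))
    return [(item.strip(), hypernym) for item in items if item.strip()]


sentence = (
    "Central banks watch indicators such as inflation, "
    "interest rates and employment."
)
for hyponym, hypernym in extract_hyponyms(sentence):
    print(f"{hyponym} IS-A {hypernym}")
```

A real system would need many such patterns plus filtering, since a single surface pattern takes only the one word before "such as" as the hypernym and treats every listed phrase as a concept.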

A pipeline for extracting and deduplicating domain-specific knowledge bases

2015 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata.2015.7363868 ◽

2015 ◽

Cited By ~ 4

Author(s):

Mayank Kejriwal ◽

Qiaoling Liu ◽

Ferosh Jacob ◽

Faizan Javed

Keyword(s):

Knowledge Bases ◽

Specific Knowledge ◽

Domain Specific ◽

Domain Specific Knowledge

A Light-weight Text Summarizer for Fast Access to Medical Evidence

10.1101/2020.05.22.20110742 ◽

2020 ◽

Author(s):

Abeed Sarker ◽

Yuan-Chi Yang ◽

Mohammed Ali Al-Garadi

Keyword(s):

Point Of Care ◽

Knowledge Bases ◽

Evidence Based ◽

Specific Knowledge ◽

Domain Specific ◽

Domain Specific Knowledge ◽

Simple Implementation ◽

Summarization System ◽

Simple Features ◽

Based Medicine

The performance of current medical text summarization systems relies on resource-heavy domain-specific knowledge sources and preprocessing methods (e.g., classification or deep learning) for deriving semantic information. Consequently, these systems are often difficult to customize, extend or deploy in low-resource settings, and are operationally slow. We propose a fast summarization system that can aid practitioners at point-of-care and thus improve evidence-based healthcare. At runtime, our system utilizes similarity measurements derived from pre-trained domain-specific word embeddings in addition to simple features, rather than clunky knowledge bases and resource-heavy preprocessing. Automatic evaluation on a public dataset for evidence-based medicine shows that our system’s performance, despite the simple implementation, is statistically comparable with the state-of-the-art.
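
The abstract does not specify which similarity measurements the system uses. As one plausible reading, the sketch below ranks candidate sentences by the cosine similarity between the averaged word vectors of a query and of each sentence. The three-dimensional vectors in `EMBEDDINGS` are made up and merely stand in for pre-trained domain-specific embeddings; the function names are likewise hypothetical.

```python
import math

# Made-up 3-d vectors standing in for pre-trained domain-specific embeddings.
EMBEDDINGS = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.8, 0.2, 0.1],
    "headache": [0.7, 0.3, 0.0],
    "pain": [0.6, 0.4, 0.1],
    "surgery": [0.0, 0.2, 0.9],
    "recovery": [0.1, 0.3, 0.8],
}


def mean_vector(text):
    """Average the vectors of known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_sentences(query, sentences):
    """Return the sentences ordered from most to least similar to the query."""
    query_vec = mean_vector(query)
    scored = []
    for sentence in sentences:
        sent_vec = mean_vector(sentence)
        if query_vec is None or sent_vec is None:
            score = 0.0
        else:
            score = cosine(query_vec, sent_vec)
        scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored]


ranked = rank_sentences("aspirin headache", ["surgery recovery", "ibuprofen pain"])
print(ranked)
```

Because the embeddings are computed ahead of time, the only runtime work is a dictionary lookup and a few arithmetic operations per sentence, which is consistent with the lightweight design the abstract describes.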
