Natural Language Interfaces to Domain Specific Knowledge Bases: An Illustration for Querying Elements of the Periodic Table

Author(s): Mukesh Kumar Rohil, Rohan Kumar Rohil, Divyesakshi Rohil, Anurag Runthala

2004, Vol 13 (03), pp. 721-738
Author(s): Xiaoying Gao, Mengjie Zhang

This paper describes a learning/adaptive approach to automatically building knowledge bases for information extraction from text-based Web pages. A frame-based representation is introduced to represent domain knowledge as knowledge unit frames, and a frame learning algorithm is developed to learn these frames automatically from training examples. Some training examples can be obtained by automatically parsing tabular Web pages in the same domain, which greatly reduces the amount of time-consuming manual work. The approach was investigated on ten Web sites of real estate and car advertisements, and nearly all the information was successfully extracted with very few false alarms. These results suggest that both the knowledge unit frame representation and the frame learning algorithm work well, that domain-specific knowledge bases can be learned from training examples, and that such a knowledge base can be used to extract information from flexible, text-based, semi-structured Web pages on multiple Web sites. An investigation of the knowledge representation on five other domains suggests that the approach can be easily applied to other domains simply by changing the training examples.


2020
Author(s): Han Wang, Wesley Yeung, Mengling Feng

Electronic Health Record (EHR) systems used in hospitals and healthcare institutes generate vast amounts of data stored in relational databases. Structured Query Language (SQL) is commonly used to update, extract, and pre-process data in EHR databases. Pre-processing is a necessary step before statistical modeling and causal inference can be carried out in observational studies. Data extraction and pre-processing using SQL require a collaborative effort between data engineers and researchers such as clinicians or biostatisticians. Natural Language to SQL (NL2SQL) models convert study designs expressed in natural language into SQL queries that retrieve the desired cohort and risk factors. While they cannot completely replace cross-disciplinary collaboration, they could enable clinicians and biostatisticians who are not trained in SQL to explore EHR databases on their own, and reduce the burden on data engineers by automating less complex tasks. There has been substantial research on NL2SQL for general-knowledge databases, but its application to EHR databases, which contain domain-specific knowledge, is not well studied. In this paper, we introduce the general NL2SQL task and discuss in depth the potential challenges in developing NL2SQL tools for EHR databases.


2001, Vol 10 (01n02), pp. 65-86
Author(s): Dan I. Moldovan, Roxana C. Gîrju

It is widely accepted that more knowledge means more intelligence. Many knowledge-intensive applications need extensive domain-specific knowledge in addition to general-purpose knowledge bases. This paper presents a methodology for discovering domain-specific concepts and relationships in an attempt to extend WordNet. The method was tested on five seed concepts selected from the financial domain: interest rate, stock market, inflation, economic growth, and employment. Queries were formed with each of these concepts, and a corpus of 5000 sentences was extracted automatically from the Internet and the TREC-8 corpora. On this corpus, the system discovered a total of 264 new concepts not defined in WordNet, of which 221 contain the seeds and 43 are other related concepts. The system also discovered 64 relationships that link these concepts either to WordNet concepts or to each other. The relationships were extracted with the help of 22 distinct lexico-syntactic patterns representing four semantic relations. Working in interactive mode, the system takes approximately 40 minutes per seed to discover the new concepts and relationships on the 5000-sentence corpus.
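The abstract does not list its 22 lexico-syntactic patterns. As a rough illustration only, the sketch below implements one classic Hearst-style pattern ("X such as A, B and C", which suggests that A, B, and C are kinds of X) with a regular expression. The function name `extract_hyponyms` and the crude noun-phrase approximation are hypothetical assumptions, not the authors' patterns or implementation.

```python
import re

# One Hearst-style lexico-syntactic pattern: "X such as A, B and C" suggests
# that A, B and C are kinds of X (hyponymy). Hypothetical simplification:
# the hypernym X is approximated as one or two lowercase words, and the
# enumeration as a run of words and commas ending at "." or end of text.
SUCH_AS = re.compile(
    r"\b(?P<hyper>[a-z]+(?: [a-z]+)?) such as (?P<hypos>[a-z ,]+?)(?:\.|$)"
)


def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group("hyper")
        # Split the enumeration "a, b and c" into its individual items.
        items = re.split(r",\s*|\s+and\s+|\s+or\s+", match.group("hypos"))
        pairs.extend((item.strip(), hypernym) for item in items if item.strip())
    return pairs


print(extract_hyponyms(
    "Investors track economic indicators such as inflation, interest rate and employment."
))
# [('inflation', 'economic indicators'), ('interest rate', 'economic indicators'), ('employment', 'economic indicators')]
```

A practical system would delimit noun phrases with a parser or chunker and add patterns for the other semantic relations; the regular expression here only shows the matching idea.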


2020
Author(s): Abeed Sarker, Yuan-Chi Yang, Mohammed Ali Al-Garadi

The performance of current medical text summarization systems relies on resource-heavy domain-specific knowledge sources and on preprocessing methods (e.g., classification or deep learning) for deriving semantic information. Consequently, these systems are often difficult to customize, extend, or deploy in low-resource settings, and are operationally slow. We propose a fast summarization system that can aid practitioners at the point of care and thus improve evidence-based healthcare. At runtime, our system uses similarity measurements derived from pre-trained domain-specific word embeddings, together with simple features, rather than cumbersome knowledge bases and resource-heavy preprocessing. Automatic evaluation on a public dataset for evidence-based medicine shows that our system’s performance, despite its simple implementation, is statistically comparable to the state of the art.
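The abstract describes scoring text with similarity measurements derived from pre-trained domain-specific word embeddings. A minimal sketch of that idea, assuming averaged word vectors, cosine similarity to a query, and extractive selection of the top-k sentences, is shown below. The toy vectors in `EMBEDDINGS` and the function `summarize` are hypothetical stand-ins; the "simple features" the authors combine with similarity are not specified in the abstract and are omitted.

```python
import math
import re

# Hypothetical toy word vectors. A real system would load pre-trained
# domain-specific embeddings instead (assumption for illustration).
EMBEDDINGS = {
    "aspirin": [0.9, 0.1, 0.0],
    "stroke": [0.8, 0.3, 0.1],
    "prevention": [0.7, 0.4, 0.0],
    "weather": [0.0, 0.1, 0.9],
    "sunny": [0.1, 0.0, 0.8],
}


def sentence_vector(text):
    """Average the vectors of known words; return None if none are known."""
    words = re.findall(r"[a-z]+", text.lower())
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(query, sentences, k=1):
    """Return the k sentences most similar to the query, in original order."""
    q = sentence_vector(query)
    scored = []
    for idx, sent in enumerate(sentences):
        v = sentence_vector(sent)
        score = cosine(q, v) if q is not None and v is not None else 0.0
        scored.append((score, idx))
    # Highest scores first, then restore document order for readability.
    top = sorted(scored, reverse=True)[:k]
    return [sentences[idx] for _, idx in sorted(top, key=lambda t: t[1])]


sents = ["Aspirin reduced stroke risk.", "The weather was sunny."]
print(summarize("aspirin stroke prevention", sents))
# ['Aspirin reduced stroke risk.']
```

Averaging word vectors is the simplest way to embed a sentence; it needs no model at runtime beyond a lookup table, which fits the abstract's emphasis on speed, but it ignores word order.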


2017, Vol 10 (12), pp. 1965-1968
Author(s): S. Bharadwaj, L. Chiticariu, M. Danilevsky, S. Dhingra, S. Divekar, ...
