Building chatbots from large scale domain-specific knowledge bases: challenges and opportunities

Author(s):  
Walid Shalaby ◽  
Adriano Arantes ◽  
Teresa Gonzalez Diaz ◽  
Chetan Gupta
2017 ◽  
Vol 10 (12) ◽  
pp. 1965-1968 ◽  
Author(s):  
S. Bharadwaj ◽  
L. Chiticariu ◽  
M. Danilevsky ◽  
S. Dhingra ◽  
S. Divekar ◽  
...  

2004 ◽  
Vol 13 (03) ◽  
pp. 721-738 ◽  
Author(s):  
XIAOYING GAO ◽  
MENGJIE ZHANG

This paper describes a learning/adaptive approach to automatically building knowledge bases for information extraction from text-based web pages. A frame-based representation is introduced to represent domain knowledge as knowledge unit frames, and a frame learning algorithm is developed to learn these frames automatically from training examples. Some training examples can be obtained by automatically parsing a number of tabular web pages in the same domain, which greatly reduces the amount of time-consuming manual work. The approach was evaluated on ten web sites of real-estate and car advertisements, where nearly all of the information was successfully extracted with very few false alarms. These results suggest that the knowledge unit frame representation and the frame learning algorithm both work well, that domain-specific knowledge bases can be learned from training examples, and that the learned knowledge bases can be used for information extraction from flexible, text-based, semi-structured web pages across multiple sites. An investigation of the knowledge representation in five other domains suggests that the approach can be applied to new domains simply by changing the training examples.
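As a rough illustration of the idea, the sketch below shows what a knowledge unit frame might look like as executable code: a concept plus a set of slot-filling patterns applied to page text. The class name, slot patterns, and real-estate example are hypothetical stand-ins; the paper's actual frame representation and learning algorithm are not reproduced here.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sketch of a knowledge unit frame; the paper's actual
# representation may differ. Each slot pairs a name with a pattern
# that could be learned from training examples (e.g., parsed tabular pages).
@dataclass
class KnowledgeUnitFrame:
    concept: str                                # e.g., "price" in real-estate ads
    slots: dict = field(default_factory=dict)   # slot name -> regex filler

    def extract(self, text: str) -> dict:
        """Apply each slot's pattern to the page text and collect matches."""
        results = {}
        for name, pattern in self.slots.items():
            m = re.search(pattern, text)
            if m:
                results[name] = m.group(1)
        return results

# Illustrative frame for a real-estate advertisement.
price_frame = KnowledgeUnitFrame(
    concept="price",
    slots={"amount": r"\$\s?([\d,]+)", "bedrooms": r"(\d+)\s+bedrooms?"},
)

print(price_frame.extract("Sunny cottage, 3 bedrooms, offered at $249,000."))
# {'amount': '249,000', 'bedrooms': '3'}
```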


2001 ◽  
Vol 10 (01n02) ◽  
pp. 65-86 ◽  
Author(s):  
DAN I. MOLDOVAN ◽  
ROXANA C. GÎRJU

It is widely accepted that more knowledge means more intelligence. Many knowledge-intensive applications require extensive domain-specific knowledge in addition to general-purpose knowledge bases. This paper presents a methodology for discovering domain-specific concepts and relationships in an attempt to extend WordNet. The method was tested on five seed concepts selected from the financial domain: interest rate, stock market, inflation, economic growth, and employment. Queries were formed with each of these concepts, and a corpus of 5000 sentences was extracted automatically from the Internet and the TREC-8 corpora. On this corpus, the system discovered a total of 264 new concepts not defined in WordNet, of which 221 contain the seeds and 43 are other related concepts. The system also discovered 64 relationships that link these concepts to WordNet concepts or to each other. The relationships were extracted with the help of 22 distinct lexico-syntactic patterns representing four semantic relations. Working in interactive mode, the system takes approximately 40 minutes per seed to discover the new concepts and relationships in the 5000-sentence corpus.
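The pattern-based extraction step can be pictured with a short sketch in the spirit of Hearst-style lexico-syntactic patterns. The two patterns and relation labels below are illustrative assumptions; the paper's 22 actual patterns and four semantic relations are not reproduced.

```python
import re

# Illustrative lexico-syntactic patterns in the spirit of Hearst (1992);
# each pattern maps a surface template to a semantic relation label.
PATTERNS = [
    # "NP such as NP" -> second NP is a kind of the first
    (re.compile(r"(\w[\w ]*?)\s+such as\s+(\w[\w ]*)"), "HYPERNYM"),
    # "NP, a kind of NP" -> first NP is a kind of the second
    (re.compile(r"(\w[\w ]*?),\s+a kind of\s+(\w[\w ]*)"), "HYPONYM"),
]

def extract_relations(sentence: str):
    """Return (relation, arg1, arg2) triples found in one sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(sentence):
            triples.append((relation, m.group(1).strip(), m.group(2).strip()))
    return triples

print(extract_relations("economic indicators such as inflation"))
# [('HYPERNYM', 'economic indicators', 'inflation')]
```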


2020 ◽  
Author(s):  
Abeed Sarker ◽  
Yuan-Chi Yang ◽  
Mohammed Ali Al-Garadi

The performance of current medical text summarization systems relies on resource-heavy domain-specific knowledge sources and on preprocessing methods (e.g., classification or deep learning) for deriving semantic information. Consequently, these systems are often difficult to customize, extend, or deploy in low-resource settings, and are operationally slow. We propose a fast summarization system that can aid practitioners at the point of care and thus improve evidence-based healthcare. At runtime, our system uses similarity measurements derived from pre-trained domain-specific word embeddings together with simple features, rather than bulky knowledge bases and resource-heavy preprocessing. Automatic evaluation on a public dataset for evidence-based medicine shows that, despite the simple implementation, our system's performance is statistically comparable with the state of the art.
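The runtime scoring idea lends itself to a compact sketch: rank candidate sentences by the cosine similarity between averaged word vectors for the question and for each sentence. The helper names and the toy embedding table are assumptions for illustration; the actual system uses pre-trained domain-specific embeddings plus additional simple features.

```python
import numpy as np

def sentence_vector(tokens, embeddings):
    """Average the word vectors of the tokens that have embeddings."""
    dim = len(next(iter(embeddings.values())))  # all vectors share one dimension
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def rank_sentences(question, sentences, embeddings, top_k=3):
    """Score each candidate sentence against the question and keep the top k."""
    q = sentence_vector(question.lower().split(), embeddings)
    scored = [(cosine(q, sentence_vector(s.lower().split(), embeddings)), s)
              for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]

# Toy 2-d embeddings, purely for demonstration.
toy = {"fever": np.array([1.0, 0.0]),
       "temperature": np.array([0.9, 0.1]),
       "fracture": np.array([0.0, 1.0])}
print(rank_sentences("fever",
                     ["high temperature was noted", "no fracture seen"],
                     toy, top_k=1))
# ['high temperature was noted']
```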


Author(s):  
Nidhi Goyal ◽  
Niharika Sachdeva ◽  
Vijay Choudhary ◽  
Rijula Kar ◽  
Ponnurangam Kumaraguru ◽  
...  

2020 ◽  
Vol 34 (03) ◽  
pp. 2901-2908 ◽  
Author(s):  
Weijie Liu ◽  
Peng Zhou ◽  
Zhe Zhao ◽  
Zhiruo Wang ◽  
Qi Ju ◽  
...  

Pre-trained language representation models such as BERT capture a general language representation from large-scale corpora but lack domain-specific knowledge. When reading a domain text, experts make inferences with relevant knowledge. To give machines this capability, we propose a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into sentences as domain knowledge. However, injecting too much knowledge may divert a sentence from its correct meaning, an issue called knowledge noise (KN). To overcome KN, K-BERT introduces soft-position embeddings and a visible matrix that limit the impact of the injected knowledge. Because K-BERT can load the model parameters of a pre-trained BERT, it can inject domain knowledge simply by being equipped with a KG, without any pre-training of its own. Our investigation reveals promising results on twelve NLP tasks. In domain-specific tasks (including finance, law, and medicine) in particular, K-BERT significantly outperforms BERT, demonstrating that K-BERT is an excellent choice for knowledge-driven problems that require experts.
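The visible-matrix idea can be made concrete with a simplified sketch: tokens injected from a KG triple attend only to the entity they hang off, so the extra knowledge cannot leak into the rest of the sentence. The function below is an illustrative reconstruction under that simplified rule, not the authors' implementation, and it omits the soft-position indices.

```python
import numpy as np

# Simplified sketch of K-BERT's visible matrix (illustrative, not the
# authors' code). Knowledge tokens from an injected triple should see
# only the entity they attach to, which limits knowledge noise (KN).

def build_visible_matrix(sentence_tokens, injections):
    """
    sentence_tokens: tokens of the original sentence (the "trunk").
    injections: {sentence_position: [knowledge tokens]} injected after
                that position, e.g. {0: ["CEO", "Apple"]} for the triple
                (Cook, CEO, Apple) attached to token 0.
    Returns the flattened token list and an n x n 0/1 visible matrix.
    """
    tokens, anchor = [], []          # anchor[i] = (trunk position, is_trunk)
    for pos, tok in enumerate(sentence_tokens):
        tokens.append(tok)
        anchor.append((pos, True))
        for k in injections.get(pos, []):
            tokens.append(k)
            anchor.append((pos, False))   # knowledge token hangs off pos

    n = len(tokens)
    visible = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            pi, trunk_i = anchor[i]
            pj, trunk_j = anchor[j]
            if trunk_i and trunk_j:
                visible[i, j] = 1    # sentence tokens see each other
            elif pi == pj:
                visible[i, j] = 1    # a branch sees its own entity and itself
    return tokens, visible

tokens, vis = build_visible_matrix(
    ["Cook", "visited", "Beijing"],
    {0: ["CEO", "Apple"], 2: ["capital", "China"]})
print(tokens)
# ['Cook', 'CEO', 'Apple', 'visited', 'Beijing', 'capital', 'China']
```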

