Linking Named Entity in a Question with DBpedia Knowledge Base

Author(s):  
Huiying Li ◽  
Jing Shi


Author(s):  
Yu Gong ◽  
Xusheng Luo ◽  
Yu Zhu ◽  
Wenwu Ou ◽  
Zhao Li ◽  
...  

Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art approaches treat it as a sequence labeling problem and adopt models such as BiLSTM-CRF. While these models work relatively well on standard benchmark datasets, they face challenges in the context of E-commerce, where slot labels are more informative and carry richer expressions. In this work, inspired by the unique structure of the E-commerce knowledge base, we propose a novel multi-task model with cascade and residual connections that jointly learns segment tagging, named entity tagging, and slot filling. Experiments show the effectiveness of the proposed cascade and residual structures. Our model shows a 14.6% advantage in F1 score over strong baseline methods on a new Chinese E-commerce shopping assistant dataset, while achieving competitive accuracy on a standard dataset. Furthermore, an online test deployed on a dominant E-commerce platform shows a 130% improvement in the accuracy of understanding user utterances. The model has already gone into production on the E-commerce platform.
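The cascade and residual structure described in the abstract can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the authors' implementation: the layer sizes, label-set sizes, and the exact way lower-level task outputs feed higher-level heads are assumptions, and the CRF decoding layers are omitted for brevity.

```python
# Minimal sketch of a cascaded multi-task tagger with residual connections.
# Hypothetical layer sizes and label sets; CRF layers on each head are omitted.
import torch
import torch.nn as nn

class CascadeTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128,
                 n_seg=3, n_ner=9, n_slot=30):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        enc_dim = 2 * hidden
        self.seg_head = nn.Linear(enc_dim, n_seg)                     # segment tagging
        self.ner_head = nn.Linear(enc_dim + n_seg, n_ner)             # cascaded on segment logits
        self.slot_head = nn.Linear(enc_dim + n_seg + n_ner, n_slot)   # cascade + residual

    def forward(self, token_ids):
        h, _ = self.encoder(self.embed(token_ids))                 # (batch, seq, enc_dim)
        seg = self.seg_head(h)
        ner = self.ner_head(torch.cat([h, seg], dim=-1))           # lower task feeds the next head
        slot = self.slot_head(torch.cat([h, seg, ner], dim=-1))    # residual: raw encoder states reused
        return seg, ner, slot

# Joint training would sum the three per-task (cross-entropy or CRF) losses.
```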


Author(s):  
Dat Ba Nguyen ◽  
Martin Theobald ◽  
Gerhard Weikum

Methods for Named Entity Recognition and Disambiguation (NERD) typically perform NER and NED in two separate stages. Therefore, NED may be penalized with respect to precision by NER false positives, and suffers in recall from NER false negatives. Moreover, NED does not fully exploit information computed by NER, such as the types of mentions. This paper presents J-NERD, a new approach that performs NER and NED jointly, by means of a probabilistic graphical model that captures mention spans, mention types, and the mapping of mentions to entities in a knowledge base. We present experiments with different kinds of texts from the CoNLL’03, ACE’05, and ClueWeb’09-FACC1 corpora. J-NERD consistently outperforms state-of-the-art competitors in end-to-end NERD precision, recall, and F1.
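The benefit of joint decoding over a two-stage pipeline can be illustrated with a toy example: span, type, and entity are chosen together under a compatibility term, so the decision does not decompose into independent NER and NED steps. All candidates and scores below are invented for illustration and are not J-NERD's actual factors or weights.

```python
# Toy joint NER+NED decoding: pick the (span, type, entity) triple that maximizes
# a combined score including a compatibility term (the "joint" part).
from itertools import product

span_scores = {"Page": 0.6, "Larry Page": 0.5}                      # candidate mention spans
type_scores = {"PER": 0.5, "ORG": 0.5}                              # candidate mention types
entity_scores = {"dbpedia:Larry_Page": 0.7, "dbpedia:Jimmy_Page": 0.3}
# Compatibility between span, predicted type, and KB entity (illustrative values).
compatible = {
    ("Larry Page", "PER", "dbpedia:Larry_Page"): 1.0,
    ("Page", "PER", "dbpedia:Jimmy_Page"): 0.4,
}

best = max(
    product(span_scores, type_scores, entity_scores),
    key=lambda t: (span_scores[t[0]] + type_scores[t[1]]
                   + entity_scores[t[2]] + compatible.get(t, 0.0)),
)
print(best)   # ('Larry Page', 'PER', 'dbpedia:Larry_Page')
```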


2013 ◽  
Vol 22 (03) ◽  
pp. 1350018 ◽  
Author(s):  
M. D. JIMÉNEZ ◽  
N. FERNÁNDEZ ◽  
J. ARIAS FISTEUS ◽  
L. SÁNCHEZ

The amount of information available on the Web has grown considerably in recent years, leading to the need to structure it in order to access it in a quick and accurate way. In order to develop techniques to automate the structuring process, the Knowledge Base Population (KBP) track of the Text Analysis Conference (TAC) was created. This forum aims to encourage research in automated systems capable of capturing knowledge from unstructured information. One of the tasks proposed in the context of the KBP track is named entity linking, whose goal is to link named entities mentioned in a document to instances in a reference knowledge base built from Wikipedia. This paper focuses on the entity linking task in the context of KBP 2010, where two different varieties of this task were considered, depending on whether the use of the text from Wikipedia was allowed or not. Specifically, the paper proposes a set of modifications to a system that participated in KBP 2010, named WikiIdRank, in order to improve its performance. The different modifications were evaluated on the official KBP 2010 corpus, showing that the best combination increases the accuracy of the initial system by 7.04%. Though the resulting system, named WikiIdRank++, is unsupervised and does not take advantage of Wikipedia text, a comparison with other approaches in KBP indicates that the system would rank 4th (out of 16) in the global comparison, outperforming other approaches that use human supervision and take advantage of Wikipedia textual contents. Furthermore, the system would rank 1st in the category of systems that do not use Wikipedia text.
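The KBP entity-linking setting itself can be sketched as below: generate candidate knowledge-base entries whose names match a mention, rank them, and return NIL when the mention has no referent in the KB. This is a generic sketch of the task, not the WikiIdRank++ algorithm; the toy KB, the alias matching, and the context-overlap ranking heuristic are all assumptions.

```python
# Generic entity-linking sketch: name-based candidate generation, context-overlap
# ranking, and a NIL answer when no KB entry matches.
def link(mention, doc_tokens, kb):
    """kb: dict mapping entity id -> {"name": str, "aliases": set, "context": set}."""
    candidates = [eid for eid, e in kb.items()
                  if mention.lower() == e["name"].lower()
                  or mention.lower() in {a.lower() for a in e["aliases"]}]
    if not candidates:
        return "NIL"                      # mention has no referent in the reference KB
    doc = {t.lower() for t in doc_tokens}
    return max(candidates, key=lambda eid: len(doc & kb[eid]["context"]))

kb = {
    "E1": {"name": "Georgia", "aliases": {"State of Georgia"},
           "context": {"atlanta", "usa", "state"}},
    "E2": {"name": "Georgia", "aliases": {"Republic of Georgia"},
           "context": {"tbilisi", "caucasus", "country"}},
}
print(link("Georgia", "Protests erupted in Tbilisi , Georgia".split(), kb))  # -> E2
```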


2015 ◽  
Vol 22 (3) ◽  
pp. 423-456 ◽  
Author(s):  
MENA B. HABIB ◽  
MAURICE VAN KEULEN

Twitter is a rich source of continuously and instantly updated information. The shortness and informality of tweets are challenges for Natural Language Processing tasks. In this paper, we present TwitterNEED, a hybrid approach to Named Entity Extraction and Named Entity Disambiguation for tweets. We believe that disambiguation can help to improve the extraction process; this mimics the way humans understand language and reduces error propagation in the whole system. Our extraction approach aims for high extraction recall first, after which a Support Vector Machine attempts to filter out false positives among the extracted candidates, using features derived from the disambiguation phase in addition to other word-shape and Knowledge Base features. For Named Entity Disambiguation, we obtain a list of entity candidates from the YAGO Knowledge Base in addition to top-ranked pages from the Google search engine for each extracted mention. We use a Support Vector Machine to rank the candidate pages according to a set of URL and context similarity features. For evaluation, five data sets are used to evaluate the extraction approach, and three of them to evaluate both the disambiguation approach and the combined extraction and disambiguation approach. Experiments show better results compared to our competitors DBpedia Spotlight, Stanford Named Entity Recognition, and the AIDA disambiguation system.
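The SVM-based candidate ranking described above can be illustrated with a minimal sketch: each (mention, candidate) pair gets a feature vector, an SVM is trained on correct/incorrect pairs, and candidates are ranked by the decision score at prediction time. The two features and the toy training data below are placeholders, not the TwitterNEED feature set.

```python
# Minimal SVM candidate-ranking sketch (toy features: URL similarity, context similarity).
import numpy as np
from sklearn.svm import SVC

# Toy training pairs: [url_similarity, context_similarity] -> 1 if the candidate is correct.
X_train = np.array([[0.9, 0.8], [0.2, 0.1], [0.7, 0.6], [0.1, 0.4]])
y_train = np.array([1, 0, 1, 0])

ranker = SVC(kernel="linear").fit(X_train, y_train)

def rank_candidates(candidates):
    """candidates: list of (candidate_id, feature_vector); highest-scoring first."""
    scores = ranker.decision_function(np.array([f for _, f in candidates]))
    return sorted(zip((c for c, _ in candidates), scores), key=lambda x: -x[1])

print(rank_candidates([("yago:Paris", [0.8, 0.7]), ("yago:Paris_Hilton", [0.3, 0.2])]))
```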


2019 ◽  
Vol 06 (04) ◽  
pp. 471-487 ◽  
Author(s):  
Ngoc-Vu Nguyen ◽  
Thi-Lan Nguyen ◽  
Cam-Van Nguyen Thi ◽  
Mai-Vu Tran ◽  
Tri-Thanh Nguyen ◽  
...  

Named entity recognition (NER) is a fundamental task that affects the performance of dependent tasks, e.g. machine translation. Lifelong machine learning (LML) is a continuous learning process in which the knowledge base accumulated from previous tasks is used to improve future learning tasks that have few samples. Since there are few studies on LML based on deep neural networks for NER, especially in Vietnamese, we propose a lifelong learning model based on deep learning with a CRF layer, named DeepLML–NER, for NER in Vietnamese texts. DeepLML–NER includes an algorithm to extract the knowledge of “prefix features” of named entities in previous domains; the model then uses this knowledge base to solve the current NER task. Preprocessing and model parameter tuning are also investigated to improve performance. The effectiveness of the model was demonstrated by in-domain and cross-domain experiments, achieving promising results.
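A rough sketch of accumulating prefix knowledge across tasks is shown below: prefixes of entity tokens collected from earlier, label-rich domains become extra indicator features for a new domain. The exact definition of prefix features in DeepLML–NER may differ; the prefix length, tag scheme, and feature encoding here are assumptions.

```python
# Sketch of a lifelong "prefix feature" knowledge base for NER.
from collections import defaultdict

knowledge_base = defaultdict(set)   # entity type -> set of token prefixes seen so far

def accumulate(labelled_sentences, prefix_len=3):
    """labelled_sentences: iterable of [(token, BIO-tag), ...] from previous domains."""
    for sent in labelled_sentences:
        for token, tag in sent:
            if tag != "O":
                knowledge_base[tag.split("-")[-1]].add(token[:prefix_len].lower())

def prefix_features(token, prefix_len=3):
    """Indicator features to be consumed alongside embeddings by the tagger."""
    p = token[:prefix_len].lower()
    return {f"kb_prefix_{etype}": int(p in prefixes)
            for etype, prefixes in knowledge_base.items()}

accumulate([[("Hanoi", "B-LOC"), ("is", "O"), ("beautiful", "O")]])
print(prefix_features("Hanoian"))   # {'kb_prefix_LOC': 1}
```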


Author(s):  
N. Senthil Kumar ◽  
Dinakaran Muruganantham

The web and the social web hold a huge amount of unstructured data, which makes searching more cumbersome. The principal task is to turn this unstructured data into structured data through the appropriate use of named entity detection. The goal of the paper is to automatically build and store a deep knowledge base of important facts and to construct comprehensive details about those facts, such as their related named entities, the semantic classes of those entities, and their mutual relationships, so that their temporal context can be thoroughly analyzed and probed. In this paper, the authors propose a model to identify all the major interpretations of named entities and effectively link them to the appropriate mentions in the knowledge base (DBpedia). They finally evaluate approaches that uniquely identify the DBpedia URIs of the selected entities and eliminate the other candidate mentions based on the authority rankings of those candidates.
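As an illustration of retrieving DBpedia candidates for a mention and ordering them by a crude authority proxy (here, the count of inbound wiki links), a sketch against the public SPARQL endpoint is given below. This is not the authors' ranking method: the choice of proxy, the exact-label candidate generation, and the endpoint usage are assumptions made for the example, and exact-label matching is far more brittle than real candidate generation.

```python
# Illustrative DBpedia candidate retrieval and authority-style ranking via SPARQL.
import requests

ENDPOINT = "https://dbpedia.org/sparql"

def dbpedia_candidates(mention, limit=5):
    # Note: the mention is interpolated directly for brevity; a real system should escape it.
    query = f"""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    SELECT ?uri (COUNT(?inlink) AS ?authority) WHERE {{
      ?uri rdfs:label "{mention}"@en .
      OPTIONAL {{ ?inlink dbo:wikiPageWikiLink ?uri }}
    }}
    GROUP BY ?uri ORDER BY DESC(?authority) LIMIT {limit}
    """
    resp = requests.get(ENDPOINT, params={"query": query,
                                          "format": "application/sparql-results+json"})
    resp.raise_for_status()
    return [(b["uri"]["value"], int(b["authority"]["value"]))
            for b in resp.json()["results"]["bindings"]]

print(dbpedia_candidates("Apple Inc."))
```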


2017 ◽  
Vol 5 ◽  
pp. 233-246 ◽  
Author(s):  
Bhavana Dalvi Mishra ◽  
Niket Tandon ◽  
Peter Clark

Our goal is to construct a domain-targeted, high-precision knowledge base (KB), containing general (subject, predicate, object) statements about the world, in support of a downstream question-answering (QA) application. Despite recent advances in information extraction (IE) techniques, no suitable resource for our task exists; existing resources are either too noisy, too named-entity centric, or too incomplete, and typically have not been constructed with a clear scope or purpose. To address these shortcomings, we have created a domain-targeted, high-precision knowledge extraction pipeline, leveraging Open IE, crowdsourcing, and a novel canonical schema learning algorithm (called CASI), that produces high-precision knowledge targeted to a particular domain, in our case elementary science. To measure the KB’s coverage of the target domain’s knowledge (its “comprehensiveness” with respect to science), we measure recall with respect to an independent corpus of domain text, and show that our pipeline produces output with over 80% precision and 23% recall with respect to that target, a substantially higher coverage of tuple-expressible science knowledge than other comparable resources. We have made the KB publicly available.
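The comprehensiveness measurement described above can be sketched in simplified form: recall of the KB against (subject, predicate, object) tuples drawn from an independent corpus of domain text. The tuple sets and the normalization below are stand-ins; the paper's actual extraction and matching procedure is more involved.

```python
# Simplified "comprehensiveness" (recall) measurement of a KB against domain-text tuples.
def comprehensiveness(kb_tuples, corpus_tuples):
    """Fraction of corpus tuples that the KB covers (recall w.r.t. the corpus)."""
    normalize = lambda t: tuple(x.strip().lower() for x in t)
    kb = {normalize(t) for t in kb_tuples}
    corpus = {normalize(t) for t in corpus_tuples}
    return len(kb & corpus) / len(corpus) if corpus else 0.0

kb = [("butterfly", "has part", "wings"), ("water", "has state", "liquid")]
corpus = [("Butterfly", "has part", "wings"), ("magnet", "attracts", "iron")]
print(comprehensiveness(kb, corpus))   # 0.5
```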

