Joint Posterior Revision of NLP Annotations via Ontological Knowledge

Author(s):  
Marco Rospocher ◽  
Francesco Corcoglioniti

Several well-established NLP tasks, such as Named Entity Recognition and Classification (NERC) and Entity Linking (EL), contribute to eliciting the semantics of entities mentioned in natural language text. However, combining the outcomes of these tasks may produce NLP annotations (such as a NERC organization linked by EL to a person) that are unlikely or contradictory when interpreted in the light of common world knowledge about the entities these annotations refer to. We thus propose a general probabilistic model that explicitly captures the relations between multiple NLP annotations for an entity mention, the ontological entity classes implied by those annotations, and the background ontological knowledge those classes may be consistent with. We use the model to estimate the posterior probability of NLP annotations given their confidences (prior probabilities) and the ontological knowledge, and consequently to revise the best annotation choice made by the NLP tools. In a concrete scenario with two state-of-the-art tools for NERC and EL, we show experimentally on three reference datasets that the joint annotation revision performed by the model consistently improves on the original results of the tools.
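To make the revision step concrete, here is a minimal sketch of the idea in Python. All names are hypothetical and the tool confidences are assumed independent; this is an illustration of posterior revision under ontological consistency, not the authors' implementation.

```python
# Candidate annotations for one mention pair a NERC class with an EL entity;
# the ontology rules out inconsistent pairs, and the surviving joint priors
# are renormalized into posteriors over the joint annotation.
from itertools import product

def revise(nerc_priors, el_priors, consistent):
    """nerc_priors / el_priors: dicts mapping labels to prior probabilities.
    consistent(nerc_label, el_entity) -> bool, backed by ontological knowledge
    (e.g., an entity of ontological class Person is inconsistent with a NERC
    'organization' label)."""
    joint = {}
    for (c, p_c), (e, p_e) in product(nerc_priors.items(), el_priors.items()):
        # Independence of the two tools' confidences is assumed here.
        joint[(c, e)] = p_c * p_e if consistent(c, e) else 0.0
    z = sum(joint.values())
    # Posterior over joint annotations, given the ontological evidence.
    return {k: v / z for k, v in joint.items()} if z > 0 else joint

# Usage: the revised best choice can differ from each tool's independent argmax.
nerc = {"ORG": 0.55, "PER": 0.45}
el = {"dbpedia:Steve_Jobs": 0.7, "dbpedia:Apple_Inc.": 0.3}
ok = lambda c, e: not (c == "ORG" and "Jobs" in e) and not (c == "PER" and "Apple" in e)
posterior = revise(nerc, el, ok)
print(max(posterior, key=posterior.get))  # ('PER', 'dbpedia:Steve_Jobs')
```

Note how the NERC tool's independent best guess (ORG, prior 0.55) is overturned once the ontology rules out an organization linked to a person.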

2021 ◽  
Author(s):  
SHANHAO ZHONG ◽  
QINGSONG YU

Abstract. Medical named entity recognition is the first step in processing electronic medical records and the basis for turning medical natural language text into structured medical information, a task of high research and application value. In this paper, we propose a model that identifies several types of named entities, such as disease, imaging examination, laboratory examination, operation, drug, and anatomy, from Chinese electronic medical records. We construct a BERT-based model that fuses glyph and lexicon information. Our experiments show that enriching the character-level semantic representation improves named entity recognition performance. To this end, our model adds two components: (1) a CNN structure that captures glyph information, and (2) the Soft-Lexicon method, which encodes lexicon information. The model improves over the baseline BERT-BiLSTM-CRF model: on the CCKS2019 dataset it achieves an F1 score of 84.64, 1.99 points higher than the baseline.
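A minimal PyTorch sketch of the fusion idea follows. The layer sizes, glyph bitmap resolution, and feature dimensions are illustrative assumptions, not the paper's exact configuration: a small CNN encodes each character's rendered glyph, Soft-Lexicon features encode matched lexicon words, and both are concatenated with the BERT embedding before the BiLSTM-CRF tagging layers.

```python
import torch
import torch.nn as nn

class GlyphLexiconFusion(nn.Module):
    def __init__(self, bert_dim=768, glyph_dim=64, lex_dim=200, hidden=256, n_tags=13):
        super().__init__()
        # CNN over a rendered 24x24 glyph bitmap of each Chinese character.
        self.glyph_cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 24x24 -> 12x12
            nn.Flatten(),
            nn.Linear(16 * 12 * 12, glyph_dim),
        )
        self.bilstm = nn.LSTM(bert_dim + glyph_dim + lex_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden, n_tags)  # scores fed to a CRF

    def forward(self, bert_emb, glyph_imgs, lexicon_feats):
        # bert_emb: (B, T, 768); glyph_imgs: (B, T, 1, 24, 24);
        # lexicon_feats: (B, T, 200) Soft-Lexicon B/M/E/S word-set embeddings.
        B, T = bert_emb.shape[:2]
        glyph = self.glyph_cnn(glyph_imgs.view(B * T, 1, 24, 24)).view(B, T, -1)
        fused = torch.cat([bert_emb, glyph, lexicon_feats], dim=-1)
        out, _ = self.bilstm(fused)
        return self.emissions(out)  # decode with a CRF layer on top
```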


2004 ◽  
Vol 01 (04) ◽  
pp. 611-626 ◽  
Author(s):  
LORRAINE TANABE ◽  
W. JOHN WILBUR

The identification of gene/protein names in natural language text is an important problem in named entity recognition. In previous work we processed MEDLINE® documents to obtain a collection of over two million names, of which we estimate that perhaps two thirds are valid gene/protein names. Our problem has been how to purify this set to obtain a high-quality subset of gene/protein names. Here we describe an approach based on generating classes of names characterized by common morphological features. Within each class, inductive logic programming (ILP) is applied to learn the characteristics of those names that are gene/protein names. The criteria learned in this manner are then applied to our large set of names. We generated 193 classes of names, and ILP led to criteria defining a select subset of 1,240,462 names. A simple false-positive filter removed 8% of this set, leaving 1,145,913 names. Examination of a random sample from this gene/protein name lexicon suggests it is composed of 82% (±3%) complete and accurate gene/protein names, 12% names related to genes/proteins (too generic, a valid name plus additional text, part of a valid name, etc.), and 6% names unrelated to genes/proteins. The lexicon is freely available at .
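The ILP learner itself is not reproduced here, but the class-then-filter pipeline can be illustrated with a toy Python sketch in which hand-written morphological patterns and per-class rules stand in for the 193 generated classes and the ILP-learned criteria.

```python
import re

def morph_class(name):
    """Assign a candidate name to a coarse morphological class."""
    if re.fullmatch(r"[A-Za-z]+-?\d+", name):
        return "alpha_numeric"          # e.g. "BRCA1", "p53"
    if name.isupper() and 2 <= len(name) <= 6:
        return "short_acronym"          # e.g. "TNF", "EGFR"
    if name[:1].islower() and any(c.isupper() for c in name[1:]):
        return "mixed_case"             # e.g. "hnRNP"
    return "other"

# Hypothetical per-class criteria standing in for the ILP-learned ones,
# plus a simple false-positive filter for common non-gene acronyms.
CRITERIA = {
    "alpha_numeric": lambda n: True,
    "short_acronym": lambda n: n not in {"DNA", "RNA", "PCR"},
    "mixed_case":    lambda n: True,
    "other":         lambda n: False,   # conservative default
}

candidates = ["BRCA1", "p53", "DNA", "hnRNP", "the gene"]
lexicon = [n for n in candidates if CRITERIA[morph_class(n)](n)]
print(lexicon)  # ['BRCA1', 'p53', 'hnRNP']
```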


Author(s):  
Simone Tedeschi ◽  
Simone Conia ◽  
Francesco Cecconi ◽  
Roberto Navigli

2021 ◽  
Author(s):  
Ghadeer Mobasher ◽  
Lukrecia Mertova ◽  
Sucheta Ghosh ◽  
Olga Krebs ◽  
Bettina Heinlein ◽  
...  

Chemical named entity recognition (NER) is a key step for many downstream applications in the chemical text-mining pipeline, such as entity linking. However, identifying chemical entities in biomedical text is challenging due to the diverse morphology of chemical entities and the different types of chemical nomenclature. In this work, we describe our submission to Track 2 of the BioCreative VII challenge, focusing on the "Chemical Identification" task of identifying chemical entities and linking them to MeSH. For this purpose, we applied a two-stage approach: (a) a fine-tuned BioBERT model to identify chemical entities, and (b) a semantic approximate search in the MeSH and PubChem databases for entity linking. There was some friction between the two stages, as our rule-based search did not harmonize optimally with the partially recognized words forwarded by the BERT component. In future work, we aim to resolve the artefacts arising from BERT tokenizers, to develop joint learning of chemical named entity recognition and entity linking using pretrained transformer-based models, and to compare their performance with our preliminary approach. We will also improve the efficiency of our approximate search in reference databases during entity linking. This task is non-trivial, as it entails computing similarity scores of large sets of trees with respect to a query tree; ideally, this will enable flexible parametrization and rule selection for the entity-linking search.
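A minimal sketch of the second stage (linking by approximate search) is shown below; the in-memory term list and stdlib difflib scoring are simplifying assumptions, not the submitted system. Each span recognized by the fine-tuned NER model is matched against MeSH term strings, and the best match above a threshold is returned as the linked identifier.

```python
import difflib

# Tiny stand-in for the MeSH chemical vocabulary: surface form -> MeSH ID.
MESH_TERMS = {
    "acetaminophen": "D000082",
    "ibuprofen": "D007052",
    "acetylsalicylic acid": "D001241",
}

def link_entity(mention, threshold=0.8):
    """Return (term, mesh_id, score) for the closest MeSH term, or None."""
    best, best_score = None, 0.0
    for term in MESH_TERMS:
        score = difflib.SequenceMatcher(None, mention.lower(), term).ratio()
        if score > best_score:
            best, best_score = term, score
    if best_score >= threshold:
        return best, MESH_TERMS[best], best_score
    return None  # unlinked, e.g. a partially recognized token from the tokenizer

print(link_entity("acetaminophe"))  # a truncated mention still links to D000082
```

The usage line also hints at the friction noted above: fuzzy matching can rescue partially recognized tokens, but only up to the chosen threshold.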


2019 ◽  
Author(s):  
Pedro Henrique Martins ◽  
Zita Marinho ◽  
André F. T. Martins

Author(s):  
Greg Durrett ◽  
Dan Klein

We present a joint model of three core tasks in the entity analysis stack: coreference resolution (within-document clustering), named entity recognition (coarse semantic typing), and entity linking (matching to Wikipedia entities). Our model is formally a structured conditional random field. Unary factors encode local features from strong baselines for each task. We then add binary and ternary factors to capture cross-task interactions, such as the constraint that coreferent mentions have the same semantic type. On the ACE 2005 and OntoNotes datasets, we achieve state-of-the-art results for all three tasks. Moreover, joint modeling improves performance on each task over strong independent baselines.
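As a rough illustration of how such a factor graph scores one joint assignment, here is a self-contained Python sketch; the stub unary scorers, toy entity-type map, and weights are placeholders, not Durrett and Klein's feature set. Inference would pick the assignment maximizing this score.

```python
# Stub unary scorers standing in for the strong per-task baselines.
def unary_coref(i, antecedent):  return 0.5 if antecedent != i else 0.1
def unary_ner(mention, t):       return 0.3
def unary_link(mention, entity): return 0.2

# Toy map from Wikipedia entities to coarse types (assumption for the demo).
WIKI_TYPE = {"Barack_Obama": "PER", "Chicago": "LOC"}

def score(mentions, antecedents, types, links, w):
    """Score one joint assignment over coreference, NER typing, and linking."""
    total = 0.0
    for i, m in enumerate(mentions):
        total += unary_coref(i, antecedents[i])      # coref unary factor
        total += unary_ner(m, types[i])              # NER unary factor
        total += unary_link(m, links[i])             # linking unary factor
        j = antecedents[i]                           # binary cross-task factor:
        if j != i and types[i] == types[j]:          # coreferent mentions agree
            total += w["type_agree"]                 # on semantic type
        if WIKI_TYPE.get(links[i]) == types[i]:      # cross-task factor: linked
            total += w["link_type"]                  # entity consistent with type
    return total

m = ["Obama", "he"]
print(score(m, [0, 0], ["PER", "PER"], ["Barack_Obama", "Barack_Obama"],
            {"type_agree": 1.0, "link_type": 1.0}))  # 4.6
```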

