Joint Posterior Revision of NLP Annotations via Ontological Knowledge

Author(s):  
Marco Rospocher ◽  
Francesco Corcoglioniti

Several well-established NLP tasks, such as Named Entity Recognition and Classification (NERC) and Entity Linking (EL), contribute to eliciting the semantics of entities mentioned in natural language text. However, combining the outcomes of these tasks may produce NLP annotations (such as a NERC organization linked by EL to a person) that are unlikely or contradictory when interpreted in the light of common world knowledge about the entities these annotations refer to. We thus propose a general probabilistic model that explicitly captures the relations between multiple NLP annotations for an entity mention, the ontological entity classes implied by those annotations, and the background ontological knowledge those classes may be consistent with. We use the model to estimate the posterior probability of NLP annotations given their confidences (prior probabilities) and the ontological knowledge, and consequently to revise the best annotation choice made by the NLP tools. In a concrete scenario with two state-of-the-art tools for NERC and EL, we show experimentally on three reference datasets that the joint annotation revision performed by the model consistently improves on the original results of the tools.
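To make the revision step concrete, here is a minimal sketch of the idea in Python. All names are hypothetical and the tool confidences are assumed independent; this is an illustration of posterior revision under ontological consistency, not the authors' implementation.

```python
# Candidate annotations for one mention pair a NERC class with an EL entity;
# the ontology rules out inconsistent pairs, and the surviving joint priors
# are renormalized into posteriors over the joint annotation.
from itertools import product

def revise(nerc_priors, el_priors, consistent):
    """nerc_priors / el_priors: dicts mapping labels to prior probabilities.
    consistent(nerc_label, el_entity) -> bool, backed by ontological knowledge
    (e.g., an entity of ontological class Person is inconsistent with a NERC
    'organization' label)."""
    joint = {}
    for (c, p_c), (e, p_e) in product(nerc_priors.items(), el_priors.items()):
        # Independence of the two tools' confidences is assumed here.
        joint[(c, e)] = p_c * p_e if consistent(c, e) else 0.0
    z = sum(joint.values())
    # Posterior over joint annotations, given the ontological evidence.
    return {k: v / z for k, v in joint.items()} if z > 0 else joint

# Usage: the revised best choice can differ from each tool's independent argmax.
nerc = {"ORG": 0.55, "PER": 0.45}
el = {"dbpedia:Steve_Jobs": 0.7, "dbpedia:Apple_Inc.": 0.3}
ok = lambda c, e: not (c == "ORG" and "Jobs" in e) and not (c == "PER" and "Apple" in e)
posterior = revise(nerc, el, ok)
print(max(posterior, key=posterior.get))  # ('PER', 'dbpedia:Steve_Jobs')
```

Note how the NERC tool's independent best guess (ORG, prior 0.55) is overturned once the ontology rules out an organization linked to a person.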

2021 ◽  
Author(s):  
SHANHAO ZHONG ◽  
QINGSONG YU

Abstract. Medical named entity recognition is the first step in processing electronic medical records and the basis for turning medical natural language text into structured medical information, a task of high research and application value. In this paper, we propose a model that identifies several types of named entities, such as disease, imaging examination, laboratory examination, operation, drug, and anatomy, from Chinese electronic medical records. We construct a BERT-based model that fuses glyph and lexicon information. Our experiments show that enriching the character-level semantic representation improves named entity recognition performance. To this end, our model adds two components: (1) a CNN structure that captures glyph information, and (2) the Soft-Lexicon method, which encodes lexicon information. The model improves over the baseline BERT-BiLSTM-CRF model: on the CCKS2019 dataset it achieves an F1 score of 84.64, 1.99 points higher than the baseline.
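A minimal PyTorch sketch of the fusion idea follows. The layer sizes, glyph bitmap resolution, and feature dimensions are illustrative assumptions, not the paper's exact configuration: a small CNN encodes each character's rendered glyph, Soft-Lexicon features encode matched lexicon words, and both are concatenated with the BERT embedding before the BiLSTM-CRF tagging layers.

```python
import torch
import torch.nn as nn

class GlyphLexiconFusion(nn.Module):
    def __init__(self, bert_dim=768, glyph_dim=64, lex_dim=200, hidden=256, n_tags=13):
        super().__init__()
        # CNN over a rendered 24x24 glyph bitmap of each Chinese character.
        self.glyph_cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 24x24 -> 12x12
            nn.Flatten(),
            nn.Linear(16 * 12 * 12, glyph_dim),
        )
        self.bilstm = nn.LSTM(bert_dim + glyph_dim + lex_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden, n_tags)  # scores fed to a CRF

    def forward(self, bert_emb, glyph_imgs, lexicon_feats):
        # bert_emb: (B, T, 768); glyph_imgs: (B, T, 1, 24, 24);
        # lexicon_feats: (B, T, 200) Soft-Lexicon B/M/E/S word-set embeddings.
        B, T = bert_emb.shape[:2]
        glyph = self.glyph_cnn(glyph_imgs.view(B * T, 1, 24, 24)).view(B, T, -1)
        fused = torch.cat([bert_emb, glyph, lexicon_feats], dim=-1)
        out, _ = self.bilstm(fused)
        return self.emissions(out)  # decode with a CRF layer on top
```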


2004 ◽  
Vol 01 (04) ◽  
pp. 611-626 ◽  
Author(s):  
LORRAINE TANABE ◽  
W. JOHN WILBUR

The identification of gene/protein names in natural language text is an important problem in named entity recognition. In previous work we processed MEDLINE® documents to obtain a collection of over two million names, of which we estimate that perhaps two thirds are valid gene/protein names. Our problem has been how to purify this set to obtain a high-quality subset of gene/protein names. Here we describe an approach based on generating classes of names characterized by common morphological features. Within each class, inductive logic programming (ILP) is applied to learn the characteristics of those names that are gene/protein names. The criteria learned in this manner are then applied to our large set of names. We generated 193 classes of names, and ILP led to criteria defining a select subset of 1,240,462 names. A simple false-positive filter removed 8% of this set, leaving 1,145,913 names. Examination of a random sample from this gene/protein name lexicon suggests it is composed of 82% (±3%) complete and accurate gene/protein names, 12% names related to genes/proteins (too generic, a valid name plus additional text, part of a valid name, etc.), and 6% names unrelated to genes/proteins. The lexicon is freely available at .
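The ILP learner itself is not reproduced here, but the class-then-filter pipeline can be illustrated with a toy Python sketch in which hand-written morphological patterns and per-class rules stand in for the 193 generated classes and the ILP-learned criteria.

```python
import re

def morph_class(name):
    """Assign a candidate name to a coarse morphological class."""
    if re.fullmatch(r"[A-Za-z]+-?\d+", name):
        return "alpha_numeric"          # e.g. "BRCA1", "p53"
    if name.isupper() and 2 <= len(name) <= 6:
        return "short_acronym"          # e.g. "TNF", "EGFR"
    if name[:1].islower() and any(c.isupper() for c in name[1:]):
        return "mixed_case"             # e.g. "hnRNP"
    return "other"

# Hypothetical per-class criteria standing in for the ILP-learned ones,
# plus a simple false-positive filter for common non-gene acronyms.
CRITERIA = {
    "alpha_numeric": lambda n: True,
    "short_acronym": lambda n: n not in {"DNA", "RNA", "PCR"},
    "mixed_case":    lambda n: True,
    "other":         lambda n: False,   # conservative default
}

candidates = ["BRCA1", "p53", "DNA", "hnRNP", "the gene"]
lexicon = [n for n in candidates if CRITERIA[morph_class(n)](n)]
print(lexicon)  # ['BRCA1', 'p53', 'hnRNP']
```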


Author(s):  
Simone Tedeschi ◽  
Simone Conia ◽  
Francesco Cecconi ◽  
Roberto Navigli

2021 ◽  
Author(s):  
Ghadeer Mobasher ◽  
Lukrecia Mertova ◽  
Sucheta Ghosh ◽  
Olga Krebs ◽  
Bettina Heinlein ◽  
...  

Chemical named entity recognition (NER) is a key step for many downstream applications in the chemical text-mining pipeline, such as entity linking. However, identifying chemical entities in biomedical text is challenging due to the diverse morphology of chemical entities and the different types of chemical nomenclature. In this work, we describe our submission to Track 2 of the BioCreative VII challenge, focusing on the "Chemical Identification" task of identifying chemical entities and linking them to MeSH. For this purpose, we applied a two-stage approach: (a) a fine-tuned BioBERT model to identify chemical entities, and (b) a semantic approximate search in the MeSH and PubChem databases for entity linking. There was some friction between the two stages, as our rule-based search did not harmonize optimally with the partially recognized words forwarded by the BERT component. In future work, we aim to resolve the artefacts arising from BERT tokenizers, to develop joint learning of chemical named entity recognition and entity linking using pretrained transformer-based models, and to compare their performance with our preliminary approach. We will also improve the efficiency of our approximate search in reference databases during entity linking. This task is non-trivial, as it entails computing similarity scores of large sets of trees with respect to a query tree; ideally, this will enable flexible parametrization and rule selection for the entity-linking search.
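A minimal sketch of the second stage (linking by approximate search) is shown below; the in-memory term list and stdlib difflib scoring are simplifying assumptions, not the submitted system. Each span recognized by the fine-tuned NER model is matched against MeSH term strings, and the best match above a threshold is returned as the linked identifier.

```python
import difflib

# Tiny stand-in for the MeSH chemical vocabulary: surface form -> MeSH ID.
MESH_TERMS = {
    "acetaminophen": "D000082",
    "ibuprofen": "D007052",
    "acetylsalicylic acid": "D001241",
}

def link_entity(mention, threshold=0.8):
    """Return (term, mesh_id, score) for the closest MeSH term, or None."""
    best, best_score = None, 0.0
    for term in MESH_TERMS:
        score = difflib.SequenceMatcher(None, mention.lower(), term).ratio()
        if score > best_score:
            best, best_score = term, score
    if best_score >= threshold:
        return best, MESH_TERMS[best], best_score
    return None  # unlinked, e.g. a partially recognized token from the tokenizer

print(link_entity("acetaminophe"))  # a truncated mention still links to D000082
```

The usage line also hints at the friction noted above: fuzzy matching can rescue partially recognized tokens, but only up to the chosen threshold.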


2019 ◽  
Author(s):  
Pedro Henrique Martins ◽  
Zita Marinho ◽  
André F. T. Martins

Author(s):  
Greg Durrett ◽  
Dan Klein

We present a joint model of three core tasks in the entity analysis stack: coreference resolution (within-document clustering), named entity recognition (coarse semantic typing), and entity linking (matching to Wikipedia entities). Our model is formally a structured conditional random field. Unary factors encode local features from strong baselines for each task. We then add binary and ternary factors to capture cross-task interactions, such as the constraint that coreferent mentions have the same semantic type. On the ACE 2005 and OntoNotes datasets, we achieve state-of-the-art results for all three tasks. Moreover, joint modeling improves performance on each task over strong independent baselines.
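As a rough illustration of how such a factor graph scores one joint assignment, here is a self-contained Python sketch; the stub unary scorers, toy entity-type map, and weights are placeholders, not Durrett and Klein's feature set. Inference would pick the assignment maximizing this score.

```python
# Stub unary scorers standing in for the strong per-task baselines.
def unary_coref(i, antecedent):  return 0.5 if antecedent != i else 0.1
def unary_ner(mention, t):       return 0.3
def unary_link(mention, entity): return 0.2

# Toy map from Wikipedia entities to coarse types (assumption for the demo).
WIKI_TYPE = {"Barack_Obama": "PER", "Chicago": "LOC"}

def score(mentions, antecedents, types, links, w):
    """Score one joint assignment over coreference, NER typing, and linking."""
    total = 0.0
    for i, m in enumerate(mentions):
        total += unary_coref(i, antecedents[i])      # coref unary factor
        total += unary_ner(m, types[i])              # NER unary factor
        total += unary_link(m, links[i])             # linking unary factor
        j = antecedents[i]                           # binary cross-task factor:
        if j != i and types[i] == types[j]:          # coreferent mentions agree
            total += w["type_agree"]                 # on semantic type
        if WIKI_TYPE.get(links[i]) == types[i]:      # cross-task factor: linked
            total += w["link_type"]                  # entity consistent with type
    return total

m = ["Obama", "he"]
print(score(m, [0, 0], ["PER", "PER"], ["Barack_Obama", "Barack_Obama"],
            {"type_agree": 1.0, "link_type": 1.0}))  # 4.6
```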

