BiTeM at WNUT 2020 Shared Task-1: Named Entity Recognition over Wet Lab Protocols using an Ensemble of Contextual Language Models

Abstract The health and life science domains are well known for their wealth of entities. These entities are presented as free text in large corpora, such as biomedical scientific and electronic health records. To enable the secondary use of these corpora and unlock their value, named entity recognition (NER) methods are proposed. Inspired by the success of deep masked language models, we present an ensemble approach for NER using these models. Results show a statistically significant improvement of the ensemble models over baselines based on individual models in multiple domains (chemical, clinical and wet lab) and languages (English and French). The ensemble model achieves an overall performance of 79.2% macro F1-score, a 4.6 percentage point increase over the baseline in multiple domains and languages. These results suggest that ensembles are a more effective strategy for tackling NER. We further perform a detailed analysis of their performance based on a set of entity properties.
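The abstract does not say how the individual models' outputs are combined. As a rough illustration only, the sketch below assumes token-level majority voting, one common way to build an NER ensemble. The function name `majority_vote` and the tag labels in the usage note are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List


def majority_vote(predictions: List[List[str]]) -> List[str]:
    """Combine per-token tag sequences from several models by majority vote.

    `predictions` holds one tag sequence per model, all over the same tokens.
    Ties are broken in favour of the earliest model in the list
    (an assumption made for this sketch, not a detail from the paper).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must tag the same tokens")

    voted = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # The first tag (in model order) that reaches the top count wins ties.
        voted.append(next(t for t in token_tags if counts[t] == best))
    return voted
```

For example, given three models that tag the same three tokens as `["B-Reagent", "O", "O"]`, `["B-Reagent", "B-Action", "O"]` and `["O", "B-Action", "O"]`, the function keeps the tag chosen by at least two models at each position. Voting per token can produce inconsistent tag sequences, so a real system would likely add a repair step or vote over whole entity spans.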

WNUT 2020 Shared Task-1: Conditional Random Field(CRF) based Named Entity Recognition(NER) for Wet Lab Protocols

10.18653/v1/2020.wnut-1.37 ◽

2020 ◽

Author(s):

Kaushik Acharya

Keyword(s):

Random Field ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Entity Recognition ◽

Shared Task ◽

Named Entity ◽

Wet Lab

Towards corpus and model: Hierarchical structured-attention-based features for Indonesian named entity recognition

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202286 ◽

2021 ◽

pp. 1-12

Author(s):

Yingwen Fu ◽

Nankai Lin ◽

Xiaotian Lin ◽

Shengyi Jiang

Keyword(s):

Language Processing ◽

State Of The Art ◽

Named Entity Recognition ◽

Entity Recognition ◽

Language Models ◽

Neural Models ◽

Performance Models ◽

Named Entity ◽

High Resource ◽

Benchmark Datasets

Named entity recognition (NER) is fundamental to natural language processing (NLP). Most state-of-the-art research on NER is based on pre-trained language models (PLMs) or classic neural models. However, this research mainly targets high-resource languages such as English, while for Indonesian, related resources (both datasets and technology) are not yet well developed. Moreover, affixes are an important part of word formation in Indonesian, which makes character and token features essential for token-wise Indonesian NLP tasks. However, the features extracted by current top-performing models are insufficient. To address the Indonesian NER task, in this paper we build an Indonesian NER dataset (IDNER) comprising over 50 thousand sentences (over 670 thousand tokens) to alleviate the shortage of labeled resources in Indonesian. Furthermore, we construct a hierarchical structured-attention-based model (HSA) for Indonesian NER to extract sequence features from different perspectives. Specifically, we use an enhanced convolutional structure as well as an enhanced attention structure to extract deeper features from characters and tokens. Experimental results show that HSA achieves competitive performance on IDNER and three benchmark datasets.

Study of Pre-trained Language Models for Named Entity Recognition in Clinical Trial Eligibility Criteria from Multiple Corpora

10.1109/ichi52183.2021.00095 ◽

2021 ◽

Author(s):

Jianfu Li ◽

Qiang Wei ◽

Omid Ghiasvand ◽

Miao Chen ◽

Victor Lobanov ◽

...

Keyword(s):

Clinical Trial ◽

Named Entity Recognition ◽

Entity Recognition ◽

Language Models ◽

Eligibility Criteria ◽

Named Entity

A Study on the Impact of Intradomain Finetuning of Deep Language Models for Legal Named Entity Recognition in Portuguese

Intelligent Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-030-61377-8_46 ◽

2020 ◽

pp. 648-662

Author(s):

Luiz Henrique Bonifacio ◽

Paulo Arantes Vilela ◽

Gustavo Rocha Lobato ◽

Eraldo Rezende Fernandes

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Language Models ◽

Named Entity ◽

The Impact

Robust Multilingual Named Entity Recognition with Shallow Semi-supervised Features (Extended Abstract)

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/703 ◽

2017 ◽

Cited By ~ 1

Author(s):

Rodrigo Agerri ◽

German Rigau

Keyword(s):

Reproducibility Of Results ◽

State Of The Art ◽

Named Entity Recognition ◽

Local Information ◽

Entity Recognition ◽

Shared Task ◽

Competitive System ◽

Named Entity ◽

Text Understanding ◽

Domain Models

We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state-of-the-art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate their use and guarantee the reproducibility of results.

Overview of CCKS 2020 Task 3: Named Entity Recognition and Event Extraction in Chinese Electronic Medical Records

Data Intelligence ◽

10.1162/dint_a_00093 ◽

2021 ◽

pp. 1-13

Author(s):

Xia Li ◽

Qinghua Wen ◽

Zengtao Jiao ◽

Jiangtao Zhang

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Named Entity Recognition ◽

Event Extraction ◽

Entity Recognition ◽

Language Models ◽

Data Sets ◽

External Resources ◽

Named Entity ◽

Evaluation Task

Abstract The China Conference on Knowledge Graph and Semantic Computing (CCKS) 2020 Evaluation Task 3 presented clinical named entity recognition and event extraction for Chinese electronic medical records. Two annotated data sets and some additional resources for these two subtasks were provided to participants. This evaluation competition attracted 354 teams, 46 of which successfully submitted valid results. Pre-trained language models were widely applied in this evaluation task. Data augmentation and external resources were also helpful.

Information Extraction from Electronic Medical Records Using Multitask Recurrent Neural Network with Contextual Word Embedding

Applied Sciences ◽

10.3390/app9183658 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3658 ◽

Cited By ~ 6

Author(s):

Jianliang Yang ◽

Yuenan Liu ◽

Minghui Qian ◽

Chenghua Guan ◽

Xiangfei Yuan

Keyword(s):

Electronic Medical Records ◽

Medical Records ◽

Large Scale ◽

Short Term Memory ◽

Conditional Random Field ◽

Named Entity Recognition ◽

Recognition Task ◽

Entity Recognition ◽

Language Models ◽

Named Entity

Clinical named entity recognition is an essential task for humans to analyze large-scale electronic medical records efficiently. Traditional rule-based solutions need considerable human effort to build rules and dictionaries; machine learning-based solutions need laborious feature engineering. Currently, deep learning solutions like Long Short-term Memory with Conditional Random Field (LSTM–CRF) achieve considerable performance on many datasets. In this paper, we developed a multitask attention-based bidirectional LSTM–CRF (Att-biLSTM–CRF) model with pretrained Embeddings from Language Models (ELMo) in order to achieve better performance. In the multitask system, an additional task named entity discovery was designed to enhance the model’s perception of unknown entities. Experiments were conducted on the 2010 Informatics for Integrating Biology & the Bedside/Veterans Affairs (I2B2/VA) dataset. Experimental results show that our model outperforms the state-of-the-art solution in both the single-model and ensemble-model settings. Our work proposes an approach to improve recall in the clinical named entity recognition task based on the multitask mechanism.

RUREBUS-2020 SHARED TASK: RUSSIAN RELATION EXTRACTION FOR BUSINESS

Computational Linguistics and Intellectual Technologies ◽

10.28995/2075-7182-2020-19-416-431 ◽

2020 ◽

Cited By ~ 1

Author(s):

V. A. Ivanin ◽

E. L. Artemova ◽

T. V. Batura ◽

V. V. Ivanov ◽

...

Keyword(s):

Named Entity Recognition ◽

Relation Extraction ◽

Business Case ◽

Entity Recognition ◽

Quality Data ◽

Shared Task ◽

Annotation Scheme ◽

Data Set ◽

Named Entity ◽

Short Time

In this paper, we present a shared task on core information extraction problems: named entity recognition and relation extraction. In contrast to popular shared tasks on related problems, we try to move away from strictly academic rigor and rather model a business case. As a source of textual data we chose the corpus of Russian strategic documents, which we annotated according to our own annotation scheme. To speed up the annotation process, we exploited various active learning techniques. In total we ended up with more than two hundred annotated documents. Thus we managed to create a high-quality data set in a short time. The shared task consisted of three tracks, devoted to 1) named entity recognition, 2) relation extraction and 3) joint named entity recognition and relation extraction. We provided the annotated texts as well as a set of unannotated texts, which could have been used in any way to improve solutions. In the paper we overview and compare the solutions submitted by the shared task participants. We release both raw and annotated corpora along with annotation guidelines, evaluation scripts and results at https://github.com/dialogue-evaluation/RuREBus.

Combining word-based features, statistical language models, and parsing for named entity recognition

10.21437/interspeech.2010-404 ◽

2010 ◽

Author(s):

Joseph Polifroni ◽

Stephanie Seneff

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Language Models ◽

Named Entity ◽

Statistical Language Models
