Evaluating Dutch Named Entity Recognition and De-Identification Methods in the Human Resource Domain

The human resource (HR) domain contains various types of privacy-sensitive textual data, such as e-mail correspondence and performance appraisals. Doing research on these documents brings several challenges, one of them being anonymisation. In this paper, we evaluate the current Dutch text de-identification methods for the HR domain in three steps. First, we update one of these methods with the latest named entity recognition (NER) models. The result is that the NER model based on the CoNLL 2002 corpus in combination with the BERTje transformer gives the best results for suppressing persons (recall 0.94) and locations (recall 0.82). For suppressing gender, DEDUCE performs best (recall 0.53). Second, the NER evaluation is based on strict de-identification of entities (a person must be suppressed as a person), and third, on a loose sense of de-identification (no matter how a person is suppressed, as long as it is suppressed).

Dutch Named Entity Recognition and De-Identification Methods for the Human Resource Domain

International Journal on Natural Language Computing ◽

10.5121/ijnlc.2020.9602 ◽

2020 ◽

Vol 9 (6) ◽

pp. 23-34

Author(s):

Chaïm van Toledo ◽

Friso van Dijk ◽

Marco Spruit

Keyword(s):

Performance Appraisal ◽

Human Resource ◽

Dutch Text ◽

Named Entity Recognition ◽

Entity Recognition ◽

Identification Methods ◽

Named Entity ◽

Job Titles ◽

Textual Data ◽

And Performance

The human resource (HR) domain contains various types of privacy-sensitive textual data, such as e-mail correspondence and performance appraisals. Doing research on these documents brings several challenges, one of them being anonymisation. In this paper, we evaluate the current Dutch text de-identification methods for the HR domain in four steps. First, we update one of these methods with the latest named entity recognition (NER) models. The result is that the NER model based on the CoNLL 2002 corpus in combination with the BERTje transformer gives the best results for suppressing persons (recall 0.94) and locations (recall 0.82). For suppressing gender, DEDUCE performs best (recall 0.53). Second, the NER evaluation is based on strict de-identification of entities (a person must be suppressed as a person), and third, on a loose sense of de-identification (no matter how a person is suppressed, as long as it is suppressed). In the fourth and last step, a new kind of NER dataset is tested for recognising job titles in texts.
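
The difference between the strict and the loose evaluation described in this abstract can be illustrated with a small recall calculation. The sketch below is not the authors' evaluation code: the function name `recall`, the token-level alignment of labels, and the toy label sequences are all assumptions made for illustration.

```python
from typing import List, Optional


def recall(
    gold: List[Optional[str]],
    predicted: List[Optional[str]],
    target: str,
    strict: bool,
) -> float:
    """Recall for one gold entity type over aligned token labels.

    gold[i] is the annotated label of token i (None if not an entity).
    predicted[i] is the label under which a tool suppressed token i,
    or None if the token was left in clear text.
    (Hypothetical data layout, assumed for this sketch.)
    """
    relevant = [(g, p) for g, p in zip(gold, predicted) if g == target]
    if not relevant:
        return 0.0
    if strict:
        # Strict: the token must be suppressed under the same label.
        hits = sum(1 for _, p in relevant if p == target)
    else:
        # Loose: any suppression counts, whatever label was assigned.
        hits = sum(1 for _, p in relevant if p is not None)
    return hits / len(relevant)


# Toy example (invented labels, not data from the paper).
gold = ["PER", "PER", "LOC", "PER", None, "LOC"]
pred = ["PER", "LOC", "LOC", None, None, "PER"]

for label in ("PER", "LOC"):
    print(
        label,
        "strict:", round(recall(gold, pred, label, strict=True), 2),
        "loose:", round(recall(gold, pred, label, strict=False), 2),
    )
```

In this toy data, a person token that is suppressed as a location counts as a miss under the strict setting but as a hit under the loose one, which is why loose recall can never be lower than strict recall.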

Named Entity Recognition and Relation Extraction

ACM Computing Surveys ◽

10.1145/3445965 ◽

2021 ◽

Vol 54 (1) ◽

pp. 1-39

Author(s):

Zara Nasar ◽

Syed Waqar Jaffry ◽

Muhammad Kamran Malik

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Named Entity Recognition ◽

Relation Extraction ◽

The State ◽

Entity Recognition ◽

Joint Models ◽

Named Entity ◽

Textual Data ◽

Benchmark Datasets

With the advent of Web 2.0, many online platforms have emerged that produce massive amounts of textual data. With ever-increasing textual data at hand, it is of immense importance to extract information nuggets from this data. One approach to harnessing this unstructured textual data effectively is to transform it into structured text. Hence, this study aims to present an overview of approaches that can be applied to extract key insights from textual data in a structured way. To this end, this review focuses mainly on Named Entity Recognition and Relation Extraction. The former deals with the identification of named entities, and the latter deals with the problem of extracting relations between sets of entities. This study covers early approaches as well as the developments made up to now using machine learning models. The survey findings indicate that deep-learning-based hybrid and joint models currently dominate the state of the art. It is also observed that annotated benchmark datasets for various textual-data generators, such as Twitter and other social forums, are not available. This scarcity of datasets has resulted in relatively little progress in these domains. Additionally, the majority of state-of-the-art techniques are offline and computationally expensive. Last, with the increasing focus on deep-learning frameworks, there is a need to understand and explain the processes at work in deep architectures.

Making Travel Smarter: Extracting Travel Information From E-Mail Itineraries Using Named Entity Recognition

RANLP 2017 - Recent Advances in Natural Language Processing Meet Deep Learning ◽

10.26615/978-954-452-049-6_047 ◽

2017 ◽

Author(s):

Divyansh Kaushik ◽

Shashank Gupta ◽

Chakradhar Raju ◽

Reuben Dias ◽

...

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Travel Information ◽

Named Entity ◽

E Mail

Datasets and Performance Metrics for Greek Named Entity Recognition

11th Hellenic Conference on Artificial Intelligence ◽

10.1145/3411408.3411437 ◽

2020 ◽

Author(s):

Nikos Bartziokas ◽

Thanassis Mavropoulos ◽

Constantine Kotropoulos

Keyword(s):

Performance Metrics ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

And Performance

An Enhanced Malay Named Entity Recognition using Combination Approach for Crime Textual Data Analysis

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2018.090960 ◽

2018 ◽

Vol 9 (9) ◽

Author(s):

Siti Azirah Asmai ◽

Muhammad Sharilazlan ◽

Halizah Basiron ◽

Sabrina Ahmad

Keyword(s):

Data Analysis ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Textual Data ◽

Combination Approach

ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition

Complexity ◽

10.1155/2021/6633213 ◽

2021 ◽

Vol 2021 ◽

pp. 1-6

Author(s):

Nada Boudjellal ◽

Huaping Zhang ◽

Asif Khan ◽

Arshad Ahmad ◽

Rashid Naseem ◽

...

Keyword(s):

Named Entity Recognition ◽

Model Performance ◽

Recognition Task ◽

Entity Recognition ◽

Small Scale ◽

Text Data ◽

Named Entities ◽

Named Entity ◽

Textual Data ◽

Biomedical Named Entity Recognition

The web is loaded daily with a huge volume of data, mainly unstructured textual data, which significantly increases the need for information extraction and NLP systems. The named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. As English is widely used globally, it dominates the research conducted in this field, especially in the biomedical domain. Unlike other languages, Arabic suffers from a lack of resources. This work presents a BERT-based model to identify biomedical named entities in Arabic text data (specifically disease and treatment named entities). It investigates how effectively pretraining a monolingual BERT model on a small-scale biomedical dataset enhances the model's understanding of Arabic biomedical text. The model's performance was compared with two state-of-the-art models (namely, AraBERT and multilingual BERT cased), and it outperformed both with an F1-score of 85%.
