A hybrid approach to Arabic named entity recognition

Protecting sensitive data is becoming increasingly important, especially as a result of the directives and laws imposed by the European Union. Efforts to build automatic systems are ongoing, but in most cases the underlying processes are still manual or semi-automatic. In this work, we developed a component that extracts and classifies sensitive data from unstructured text in European Portuguese. The objective was to create a system that allows organizations to understand their data and to comply with legal and security requirements. We studied a hybrid approach to Named Entity Recognition for the Portuguese language that combines several techniques: rule-based and lexical-based models, machine learning algorithms, and neural networks. The rule-based and lexical-based approaches were used only for a set of specific classes. For the remaining entity classes, two statistical models were tested, Conditional Random Fields and Random Forest, and finally a Bidirectional LSTM (Bi-LSTM) approach was tried. Of the statistical models, Conditional Random Fields obtained the best results, with an F1-score of 65.50%. The Bi-LSTM approach achieved a result of 83.01%. The corpora used for training and testing were the HAREM Golden Collection, the SIGARRA News Corpus, and the DataSense NER Corpus.
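The division of labour described in the abstract (rules for some classes, a learned model for the rest) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `EMAIL` and `DATE` patterns, the `hybrid_ner` and `toy_model` names, and the policy that rule matches take precedence over overlapping model predictions. The abstract does not say which classes were rule-based or how conflicts were resolved.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end is exclusive

# Hypothetical patterns for classes with a regular surface form.
RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def rule_spans(text: str) -> List[Span]:
    """Collect every span matched by one of the hand-written patterns."""
    spans = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return spans

def overlaps(a: Span, b: Span) -> bool:
    """True when two half-open character ranges share at least one position."""
    return a[0] < b[1] and b[0] < a[1]

def hybrid_ner(text: str, model: Callable[[str], List[Span]]) -> List[Span]:
    """Merge rule spans with model spans; rules win on overlap (an assumption)."""
    rules = rule_spans(text)
    learned = [s for s in model(text) if not any(overlaps(s, r) for r in rules)]
    return sorted(rules + learned)

def toy_model(text: str) -> List[Span]:
    """Stand-in for a trained CRF or Bi-LSTM tagger; only knows one place name."""
    return [(m.start(), m.end(), "LOCATION") for m in re.finditer(r"Lisboa", text)]

text = "Contacto: ana@example.pt, Lisboa, 12/03/2021"
found = [(text[s:e], label) for s, e, label in hybrid_ner(text, toy_model)]
print(found)
# [('ana@example.pt', 'EMAIL'), ('Lisboa', 'LOCATION'), ('12/03/2021', 'DATE')]
```

Passing the learned tagger in as a callable keeps the merge logic independent of whichever statistical or neural model is plugged in.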

Data-Augmented Hybrid Named Entity Recognition for Disaster Management by Transfer Learning

Applied Sciences, 10.3390/app10124234, 2020, Vol 10 (12), pp. 4234, Cited By ~ 3

Author(s): Hung-Kai Kung, Chun-Mo Hsieh, Cheng-Yu Ho, Yun-Cheng Tsai, Hao-Yung Chan, ...

Keyword(s): Transfer Learning, Disaster Management, Data Augmentation, Reference Model, Conditional Random Field, Hybrid Approach, Named Entity Recognition, Entity Recognition, Data Set, Named Entity

This research aims to build a Mandarin named entity recognition (NER) module using transfer learning to facilitate damage information gathering and analysis in disaster management. The proposed hybrid NER approach includes three modules: (1) data augmentation, which constructs a concise data set for disaster management; (2) a reference model, which uses the bidirectional long short-term memory–conditional random field framework to implement NER; and (3) an augmented model, built by integrating the first two modules via cross-domain transfer with disparate label sets. By combining established rules with learned sentence patterns, the hybrid approach performs well in NER tasks for disaster management and successfully recognizes unfamiliar words. We applied the proposed NER module to disaster management, where it handled the NER tasks of our related work favorably and achieved our desired outcomes. Through proper transfer, the results of this work can be extended to other fields and bring valuable advantages in diverse applications.
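In a bidirectional LSTM with a conditional random field on top, the CRF layer typically selects the highest-scoring label sequence with Viterbi decoding over per-token emission scores and label-to-label transition scores. The sketch below shows that decoding step only. It is not taken from the paper: the `viterbi` function, the label set, and all scores are made up for illustration, and in a real model the emissions would come from the Bi-LSTM and the transitions would be learned.

```python
from typing import Dict, List

def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the label sequence with the highest total score."""
    labels = list(emissions[0])
    # best[i][y]: score of the best path that ends in label y at position i
    best = [dict(emissions[0])]
    back: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        cur, ptr = {}, {}
        for y in labels:
            # Pick the previous label that maximizes path score plus transition.
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            cur[y] = best[-1][prev] + transitions[prev][y] + scores[y]
            ptr[y] = prev
        best.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Made-up scores: O -> I-LOC is heavily penalized, B-LOC -> I-LOC is rewarded.
transitions = {
    "O":     {"O": 0.0, "B-LOC": 0.0, "I-LOC": -10.0},
    "B-LOC": {"O": 0.0, "B-LOC": 0.0, "I-LOC": 1.0},
    "I-LOC": {"O": 0.0, "B-LOC": 0.0, "I-LOC": 0.5},
}
emissions = [
    {"O": 2.0, "B-LOC": 0.5, "I-LOC": 0.1},
    {"O": 0.4, "B-LOC": 1.0, "I-LOC": 1.2},
    {"O": 0.3, "B-LOC": 0.2, "I-LOC": 1.5},
]
decoded = viterbi(emissions, transitions)
print(decoded)  # ['O', 'B-LOC', 'I-LOC']
```

Picking the top emission at each token independently would give `O, I-LOC, I-LOC`, which starts an entity with an inside tag; the transition scores steer the decoder to a well-formed sequence instead.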
