A comparative study of Named Entity Recognition for Arabic using ensemble learning approaches

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems that use machine learning approaches is tokenization, in which raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is that it uses the extracted rules to merge tokens that were split in earlier steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of both classification performance and the number of incorrectly segmented entities.
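The split-then-merge idea described in this abstract can be illustrated with a minimal Python sketch. It is not ChemTok itself: the abstract does not give the rule format, so the token shapes, the `learn_merge_rules` helper, and the example entity names below are all assumptions made for illustration.

```python
import re

# Fine-grained first pass: runs of letters/digits, or one punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def fine_tokenize(text):
    """Return (token, start, end) triples for the aggressive first-pass split."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def shape(token):
    """Coarse token class (hypothetical): D = digits, W = word, else the symbol."""
    if token.isdigit():
        return "D"
    if token.isalnum():
        return "W"
    return token


def learn_merge_rules(training_entities):
    """Collect shape pairs of touching tokens seen inside annotated entities."""
    rules = set()
    for entity in training_entities:
        toks = fine_tokenize(entity)
        for (a, _, a_end), (b, b_start, _) in zip(toks, toks[1:]):
            if a_end == b_start:  # no whitespace between the two tokens
                rules.add((shape(a), shape(b)))
    return rules


def tokenize(text, rules):
    """Split finely, then merge touching tokens whose shape pair is a known rule."""
    merged = []
    prev = None  # previous fine token as (token, start, end)
    for tok in fine_tokenize(text):
        touching = prev is not None and prev[2] == tok[1]
        if touching and (shape(prev[0]), shape(tok[0])) in rules:
            merged[-1] += tok[0]
        else:
            merged.append(tok[0])
        prev = tok
    return merged


rules = learn_merge_rules(["1,2-dichloroethane", "N-acetylcysteine"])
tokens = tokenize("Treated with 1,2-dichloroethane, then washed.", rules)
```

Because the rules here are learned only from entity-internal token pairs, the comma inside `1,2-dichloroethane` is merged while the comma that follows the entity is left as its own token.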

A Comparative Study on the Performance of Named Entity Recognition in Materials and Chemistry Fields through Multiple Embedding Combination Based on a Pre-trained Neural Network Language Model

Journal of KIISE ◽

10.5626/jok.2021.48.6.696 ◽

2021 ◽

Vol 48 (6) ◽

pp. 696-706

Author(s):

Myunghoon Lee ◽

Hyeonho Shin ◽

Hong-Woo Chun ◽

Jae-Min Lee ◽

Taehyun Ha ◽

...

Keyword(s):

Neural Network ◽

Comparative Study ◽

Language Model ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Trained Neural Network ◽

Network Language

A Comparative Study of Named Entity Recognition for Telugu

Proceedings of the 9th annual meeting of the Forum for Information Retrieval Evaluation - FIRE'17 ◽

10.1145/3158354.3158358 ◽

2017 ◽

Author(s):

SaiKiranmai Gorla ◽

N. L. Bhanu Murthy ◽

Aruna Malapati

Keyword(s):

Comparative Study ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity

A Comparative Study of Named Entity Recognition for Hindi Using Sequential Learning Algorithms

2009 IEEE International Advance Computing Conference ◽

10.1109/iadcc.2009.4809179 ◽

2009 ◽

Cited By ~ 3

Author(s):

Awaghad Ashish Krishnarao ◽

Himanshu Gahlot ◽

Amit Srinet ◽

D. S. Kushwaha

Keyword(s):

Comparative Study ◽

Learning Algorithms ◽

Named Entity Recognition ◽

Entity Recognition ◽

Sequential Learning ◽

Named Entity

The Application of Ensemble Learning on Named Entity Recognition for Legal Knowledgebase of Properties Involved in Criminal Cases

2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA) ◽

10.1109/aeeca49918.2020.9213660 ◽

2020 ◽

Author(s):

Zongshen Jiang

Keyword(s):

Ensemble Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Criminal Cases ◽

Named Entity

Biomedical named entity recognition using deep neural networks with contextual information

BMC Bioinformatics ◽

10.1186/s12859-019-3321-4 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 8

Author(s):

Hyejin Cho ◽

Hyunju Lee

Keyword(s):

Deep Learning ◽

Short Term Memory ◽

Contextual Information ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Approaches ◽

Short Term ◽

Term Memory ◽

Named Entity ◽

Long Short Term Memory

Abstract

Background: In biomedical text mining, named entity recognition (NER) is an important task used to extract information from biomedical articles. Previously proposed methods for NER are dictionary- or rule-based methods and machine learning approaches. However, these traditional approaches rely heavily on large-scale dictionaries, target-specific rules, or well-constructed corpora. They have been superseded by deep learning-based approaches, which are independent of hand-crafted features. However, although such NER methods add conditional random fields (CRF) to capture important correlations between neighboring labels, they often do not incorporate all the contextual information from the text into the deep learning layers.

Results: We propose herein an NER system for biomedical entities that incorporates n-grams with bi-directional long short-term memory (BiLSTM) and CRF; this system is referred to as a contextual long short-term memory network with CRF (CLSTM). We assess the CLSTM model on three corpora: the disease corpus of the National Center for Biotechnology Information (NCBI), the BioCreative II Gene Mention corpus (GM), and the BioCreative V Chemical Disease Relation corpus (CDR). Our framework was compared with several deep learning approaches, such as BiLSTM, BiLSTM with CRF, GRAM-CNN, and BERT. On the NCBI corpus, our model recorded an F-score of 85.68% for the NER of diseases, an improvement of 1.50% over previous methods. Moreover, although BERT used transfer learning by incorporating more than 2.5 billion words, our system performed comparably to BERT, with an F-score of 81.44% for gene NER on the GM corpus, and outperformed it with an F-score of 86.44% for the NER of chemicals and diseases on the CDR corpus. We conclude that our method significantly improves performance on biomedical NER tasks.

Conclusion: The proposed approach is robust in recognizing biological entities in text.
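For readers unfamiliar with how F-scores such as those reported above are typically computed for NER, the following Python sketch scores predicted BIO tags against gold tags at the entity level. The abstract does not state the matching criterion used, so exact span-and-label matching is an assumption here, and the tag sequences are invented for illustration.

```python
def bio_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, label) spans."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold_tags, pred_tags):
    """Entity-level scores, assuming a span counts only on an exact match."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: the disease span matches, the chemical span is misplaced.
gold = ["B-Disease", "I-Disease", "O", "B-Chemical", "O"]
pred = ["B-Disease", "I-Disease", "O", "O", "B-Chemical"]
scores = precision_recall_f1(gold, pred)
```

In this toy example one of two predicted spans is correct and one of two gold spans is found, so precision, recall, and F1 are all 0.5.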
