scholarly journals A Comparison between Conditional Random Field and Structured Support Vector Machine for Arabic Named Entity Recognition

2020 ◽  
Vol 16 (1) ◽  
pp. 117-125
Author(s):  
Marwa Muhammad ◽  
Muhammad Rohaim ◽  
Alaa Hamouda ◽  
Salah Abdel-Mageid
2020 ◽  
Vol 8 ◽  
pp. 605-620 ◽  
Author(s):  
Takashi Shibuya ◽  
Eduard Hovy

When an entity name contains other names within it, the identification of all combinations of names can become difficult and expensive. We propose a new method to recognize not only outermost named entities but also inner nested ones. We design an objective function for training a neural model that treats the tag sequence for nested entities as the second best path within the span of their parent entity. In addition, we provide the decoding method for inference that extracts entities iteratively from outermost ones to inner ones in an outside-to-inside way. Our method has no additional hyperparameters to the conditional random field based model widely used for flat named entity recognition tasks. Experiments demonstrate that our method performs better than or at least as well as existing methods capable of handling nested entities, achieving F1-scores of 85.82%, 84.34%, and 77.36% on ACE-2004, ACE-2005, and GENIA datasets, respectively.


2014 ◽  
Vol 37 (1) ◽  
pp. 1-22 ◽  
Author(s):  
Asif Ekbal ◽  
Sivaji Bandyopadhyay

This paper reports a voted Named Entity Recognition (NER) system that exploits appropriate unlabeled data. Initially, we develop NER systems using the supervised machine learning algorithms such as Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM). Each of these models makes use of the language independent features in the form of different contextual and orthographic word-level features along with the language dependent features extracted from the Part-of-Speech (POS) tagger and gazetteers. Context patterns generated from the unlabeled data using an active learning method are also used as the features in each of the classifiers. A semi-supervised method is proposed to describe the measures to automatically select effective unlabeled documents as well as sentences from the unlabeled data. Finally, the supervised models are combined together into a final system by defining appropriate weighted voting technique. Experimental results for a resource-poor language like Bengali show the effectiveness of the proposed approach with the overall recall, precision and F-measure values of 93.81%, 92.18% and 92.98%, respectively.


Author(s):  
Jonalyn Castillo ◽  
Marck Augustus L. Mateo ◽  
Antonio D. C. Paras ◽  
Ria A. Sagum ◽  
Vina Danica F. Santos

Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 82
Author(s):  
SaiKiranmai Gorla ◽  
Lalita Bhanu Murthy Neti ◽  
Aruna Malapati

Named entity recognition (NER) is a fundamental step for many natural language processing tasks and hence enhancing the performance of NER models is always appreciated. With limited resources being available, NER for South-East Asian languages like Telugu is quite a challenging problem. This paper attempts to improve the NER performance for Telugu using gazetteer-related features, which are automatically generated using Wikipedia pages. We make use of these gazetteer features along with other well-known features like contextual, word-level, and corpus features to build NER models. NER models are developed using three well-known classifiers—conditional random field (CRF), support vector machine (SVM), and margin infused relaxed algorithms (MIRA). The gazetteer features are shown to improve the performance, and theMIRA-based NER model fared better than its counterparts SVM and CRF.


Sign in / Sign up

Export Citation Format

Share Document