A Bootstrapping Approach for Geographic Named Entity Annotation

Author(s):
Seungwoo Lee
Gary Geunbae Lee

Author(s):
Elena Álvarez-Mellado
María Luisa Díez-Platas
Pablo Ruiz-Fabo
Helena Bermúdez
Salvador Ros
...

Abstract: Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of texts can provide valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents (i.e. journalistic text) in mind, and general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when traditional NER categories are applied to such texts, and we propose a novel humanist-friendly, TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.
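
As an illustration of what TEI-compliant inline entity annotation can look like, the following Python sketch builds a small TEI fragment using the standard persName and placeName elements. The example sentence and the entity choices are illustrative assumptions; the abstract does not detail the specific categories of the proposed scheme.

```python
# Illustrative sketch only (not the paper's actual scheme): inline entity
# annotation with standard TEI elements, serialized via the stdlib.
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)

# Build a paragraph with two annotated entities (example sentence is invented).
p = ET.Element(f"{{{TEI_NS}}}p")
p.text = "In the year 1230, "
pers = ET.SubElement(p, f"{{{TEI_NS}}}persName")
pers.text = "Fernando III"
pers.tail = " confirmed the charter of "
place = ET.SubElement(p, f"{{{TEI_NS}}}placeName")
place.text = "Burgos"
place.tail = "."

print(ET.tostring(p, encoding="unicode"))
```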


Semantic Web, 2018, Vol 9 (3), pp. 355-379
Author(s):
Oluwaseyi Feyisetan
Elena Simperl
Markus Luczak-Roesch
Ramine Tinati
Nigel Shadbolt

Symmetry, 2020, Vol 12 (10), 1673
Author(s):
Jiabao Sheng
Aishan Wumaier
Zhe Li

To improve the performance of deep learning methods when labeled data for entity annotation is scarce, this study proposes transfer learning schemes that combine character- and word-level representations so that low-resource data can be mapped symmetrically onto high-resource data. Building on the BiLSTM-CRF model, we combine character embeddings, word embeddings, and label-feature embeddings from the high- and low-resource data, and perform feature transfer and parameter sharing across the two domains of the BiLSTM network to annotate the target domain with zero resources. Before transfer, we first compute the label similarity between the two domains and select the label features with high similarity for the feature-transfer mapping. All training parameters of the source domain are shared through the BiLSTM network and the CRF layer. In addition, combining characters and words mitigates cross-domain word-segmentation problems and reduces the error rate of the label mapping. Experimental results show that, in terms of overall F1 score, the proposed unsupervised model outperforms a general parameter-sharing transfer learning method by 9.76 percentage points, and two recent high-to-low-resource learning methods by 9.08 and 12.38 percentage points, respectively. The proposed scheme thus improves transfer learning between high- and low-resource data and can label data in the target domain.
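
The following PyTorch sketch illustrates, under stated assumptions, the general architecture the abstract describes: character and word embeddings feed a BiLSTM encoder whose parameters are shared between the source (high-resource) and target (low-resource) domains, while each domain keeps its own output layer. This is not the authors' code; the class name CharWordBiLSTM, the embedding sizes, and the replacement of the CRF layer with a per-token linear projection are simplifications for brevity.

```python
# Minimal sketch of a shared BiLSTM encoder with character + word embeddings
# and domain-specific tag projections (CRF omitted for brevity).
import torch
import torch.nn as nn

class CharWordBiLSTM(nn.Module):
    def __init__(self, word_vocab, char_vocab, src_tags, tgt_tags,
                 word_dim=100, char_dim=30, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # Character-level BiLSTM summarizes each word's characters.
        self.char_lstm = nn.LSTM(char_dim, char_dim,
                                 bidirectional=True, batch_first=True)
        # Sentence-level BiLSTM whose parameters are shared by both domains.
        self.shared_lstm = nn.LSTM(word_dim + 2 * char_dim, hidden,
                                   bidirectional=True, batch_first=True)
        # Domain-specific output layers (stand-ins for per-domain CRF layers).
        self.src_proj = nn.Linear(2 * hidden, src_tags)
        self.tgt_proj = nn.Linear(2 * hidden, tgt_tags)

    def forward(self, words, chars, domain="source"):
        # words: (batch, seq_len); chars: (batch, seq_len, char_len)
        b, s, c = chars.shape
        char_out, _ = self.char_lstm(self.char_emb(chars.reshape(b * s, c)))
        char_feat = char_out[:, -1, :].reshape(b, s, -1)  # final char-LSTM output per word
        feats = torch.cat([self.word_emb(words), char_feat], dim=-1)
        enc, _ = self.shared_lstm(feats)                   # shared across domains
        proj = self.src_proj if domain == "source" else self.tgt_proj
        return proj(enc)                                   # per-token tag scores
```

In use, one would train on the source domain with domain="source" and then reuse the shared encoder (together with the mapped label features the abstract mentions) when tagging target-domain sentences.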


Author(s):
E. Boldrini
S. Ferrandez
R. Izquierdo
D. Tomas
O. Ferrandez
...

Author(s):
Kuzman Ganchev
Fernando Pereira
Mark Mandel
Steven Carroll
Peter White

Author(s):
Takenobu Tokunaga
Hitoshi Nishikawa
Tomoya Iwakura
...
