Named Entity Recognition in Historic Legal Text: A Transformer and State Machine Ensemble Method

Author(s):  
Fernando Trias ◽  
Hongming Wang ◽  
Sylvain Jaume ◽  
Stratos Idreos
Author(s):  
Christopher Dozier ◽  
Ravikumar Kondadadi ◽  
Marc Light ◽  
Arun Vachher ◽  
Sriharsha Veeramachaneni ◽  
...  

Author(s):  
Mohammad Sadegh Sheikhaei ◽  
Hasan Zafari ◽  
Yuan Tian

In this article, we propose a new encoding scheme for named entity recognition (NER) called Joined Type-Length encoding (JoinedTL). Unlike most existing named entity encoding schemes, which focus on flat entities, JoinedTL can label nested named entities in a single sequence. JoinedTL uses a packed encoding to represent both the type and the span of a named entity, which not only results in fewer tagged tokens than existing encoding schemes but also enables it to support nested NER. We evaluate the effectiveness of JoinedTL on three nested NER datasets: GENIA in English, GermEval in German, and PerNest, our newly created nested NER dataset in Persian. We apply CharLSTM+WordLSTM+CRF, a three-layer sequence tagging model, to the three datasets encoded using JoinedTL and two existing nested NE encoding schemes, namely JoinedBIO and JoinedBILOU. Our experimental results show that CharLSTM+WordLSTM+CRF trained on JoinedTL-encoded datasets achieves F1 scores competitive with those of models trained on datasets encoded with the other two schemes, while using 27%–48% fewer tagged tokens. To leverage the strengths of the three encodings (JoinedTL, JoinedBIO, and JoinedBILOU), we propose an encoding-based ensemble method for nested NER. Evaluation results show that the ensemble achieves higher F1 scores on all datasets than any of the three models trained on a single encoding. By using nested NE encodings, including JoinedTL, with CharLSTM+WordLSTM+CRF, we establish new state-of-the-art performance with F1 scores of 83.7 on PerNest, 74.9 on GENIA, and 70.5 on GermEval, surpassing two recent neural models specially designed for nested NER.
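The abstract does not spell out the JoinedTL tag format, so the following is only a hypothetical illustration of how a tag carrying both type and length could encode nested spans in one sequence while tagging fewer tokens than a BIO-style scheme. The `TYPE-LENGTH` tag shape, the `+` joiner, and the function names `encode_type_length` and `decode_type_length` are all assumptions made for this sketch, not the authors' definition.

```python
from typing import List, Tuple

# (start token index, length in tokens, entity type) -- hypothetical span format
Span = Tuple[int, int, str]


def encode_type_length(n_tokens: int, spans: List[Span], sep: str = "+") -> List[str]:
    """Tag only the first token of each entity with TYPE-LENGTH.

    Entities that start on the same token have their tags joined with `sep`.
    All other tokens get "O". This is an assumed format, not the paper's.
    Entity types must not contain `sep`.
    """
    per_token: List[List[str]] = [[] for _ in range(n_tokens)]
    # Sort so that, at a shared start token, longer (outer) spans come first.
    for start, length, etype in sorted(spans, key=lambda s: (s[0], -s[1], s[2])):
        if start < 0 or length < 1 or start + length > n_tokens:
            raise ValueError(f"span out of range: {(start, length, etype)}")
        per_token[start].append(f"{etype}-{length}")
    return [sep.join(tags) if tags else "O" for tags in per_token]


def decode_type_length(tags: List[str], sep: str = "+") -> List[Span]:
    """Recover (start, length, type) spans from the tag sequence."""
    spans: List[Span] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            continue
        for part in tag.split(sep):
            # rsplit keeps entity types that themselves contain a hyphen intact.
            etype, length = part.rsplit("-", 1)
            spans.append((i, int(length), etype))
    return spans


# Made-up example: an ORG spanning three tokens with a LOC nested inside it.
tokens = ["Bank", "of", "England", "governor"]
nested = [(0, 3, "ORG"), (2, 1, "LOC")]
tags = encode_type_length(len(tokens), nested)
print(list(zip(tokens, tags)))
```

In this toy sentence only two tokens carry a non-`O` tag, whereas a BIO-style labelling of the same spans would tag all three tokens of the outer entity. That is the kind of reduction in tagged tokens the abstract reports, although the real scheme may differ in its details.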


Author(s):  
Pedro Henrique Luz de Araujo ◽  
Teófilo E. de Campos ◽  
Renato R. R. de Oliveira ◽  
Matheus Stauffer ◽  
Samuel Couto ◽  
...  

Author(s):  
Yassine Benajiba ◽  
Mona Diab ◽  
Paolo Rosso
