R-BERT FOR RELATIONSHIP EXTRACTION ON RUSSIAN BUSINESS DOCUMENTS

Author(s):  
V. A. Korzun

This paper presents the results of our participation in the Russian Relation Extraction for Business shared task (RuREBus) at Dialogue Evaluation 2020. Our team took first place, ahead of five other teams, in the Relation Extraction with Named Entities track. The experiments showed that the best-performing model is based on R-BERT, which achieved significant improvements over models based on convolutional or recurrent neural networks on the SemEval-2010 Task 8 relation dataset. To adapt this model to the RuREBus task, we added modifications such as negative sampling. In addition, we tested other models for the Relation Extraction and Named Entity Recognition tasks.
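The abstract does not detail the negative-sampling modification. A minimal sketch of one common scheme for relation extraction — treating entity pairs that carry no annotated relation as "no relation" examples, sampled at a fixed ratio to the positives — might look like this (the function name, parameters, and ratio are illustrative assumptions, not the authors' actual procedure):

```python
import random

def sample_negatives(entities, positive_pairs, ratio=1.0, seed=0):
    """Draw 'no_relation' training candidates from ordered entity pairs
    that are not annotated as related.

    This is a generic negative-sampling sketch, not the exact scheme
    used in the RuREBus submission.
    """
    rng = random.Random(seed)
    positives = set(positive_pairs)
    # Every ordered pair of distinct entities without a gold relation
    # is a potential negative example.
    candidates = [(head, tail)
                  for head in entities for tail in entities
                  if head != tail and (head, tail) not in positives]
    # Cap the negatives at ratio * (number of positives) to keep the
    # label distribution from being swamped by "no relation".
    k = min(len(candidates), int(len(positives) * ratio))
    return rng.sample(candidates, k)
```

For example, with three entity mentions and one gold pair, `ratio=2.0` yields at most two sampled negatives, none of which coincide with the gold pair.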

Author(s):  
V. A. Ivanin  
E. L. Artemova  
T. V. Batura  
V. V. Ivanov  
...  

In this paper, we present a shared task on core information extraction problems: named entity recognition and relation extraction. In contrast to popular shared tasks on related problems, we move away from strict academic rigor and instead model a business case. As a source of textual data we chose a corpus of Russian strategic documents, which we annotated according to our own annotation scheme. To speed up the annotation process, we exploited various active learning techniques. In total we annotated more than two hundred documents, producing a high-quality data set in a short time. The shared task consisted of three tracks, devoted to 1) named entity recognition, 2) relation extraction and 3) joint named entity recognition and relation extraction. We provided the annotated texts as well as a set of unannotated texts, which could have been used in any way to improve solutions. In the paper we overview and compare the solutions submitted by the shared task participants. We release both raw and annotated corpora along with annotation guidelines, evaluation scripts and results at https://github.com/dialogue-evaluation/RuREBus.


2019  
Vol 27 (3)  
pp. 457-470  
Author(s):  
Stephen Wu  
Kirk Roberts  
Surabhi Datta  
Jingcheng Du  
Zongcheng Ji  
...  

Abstract

Objective: This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning the methods, scope, and context of current research.

Materials and Methods: We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers.

Results: The number of DL publications in clinical NLP more than doubled each year through 2018. Recurrent neural networks (60.8%) and word2vec embeddings (74.1%) were the most popular methods; the information extraction tasks of text classification, named entity recognition, and relation extraction were dominant (89.2%). However, there was a "long tail" of other methods and specific tasks. Most contributions were methodological variants or applications, but 20.8% were new methods of some kind. The earliest adopters were in the NLP community, but the medical informatics community was the most prolific.

Discussion: Our analysis shows growing acceptance of deep learning as a baseline for NLP research, and of DL-based NLP in the medical community. A number of common associations were substantiated (eg, the preference for recurrent neural networks in sequence-labeling named entity recognition), while others were surprisingly nuanced (eg, the scarcity of French-language clinical NLP with deep learning).

Conclusion: Deep learning has not yet fully penetrated clinical NLP but is growing rapidly. This review highlighted both the popular and the unique trends in this active field.

