ChEMU shared task: chemical entity recognition and event extraction of chemical reactions from patents

Author(s):  
Christian Druckenbrodt ◽  
Saber Akhondi ◽  
Karin Verspoor ◽  
Zenan Zhai ◽  
Camilo Thorne ◽  
...  
Author(s):  
Darshini Mahendran ◽  
Gabrielle Gurdin ◽  
Nastassja Lewinski ◽  
Christina Tang ◽  
Bridget T. McInnes

Chemical patents are an essential source of information about novel chemicals and chemical reactions. However, with the increasing volume of such patents, mining information about these chemicals and chemical reactions has become a time-intensive and laborious endeavor. In this study, we present a system to extract chemical reaction events from patents automatically. Our approach consists of two steps: 1) named entity recognition (NER), the automatic identification of chemical reaction parameters from the corresponding text, and 2) event extraction (EE), the automatic classification and linking of entities based on their relationships to each other. For our NER system, we evaluate bidirectional long short-term memory (BiLSTM)-based and bidirectional encoder representations from transformers (BERT)-based methods. For our EE system, we evaluate BERT-based, convolutional neural network (CNN)-based, and rule-based methods. We evaluate our NER and EE components independently and as an end-to-end system, reporting the precision, recall, and F1 score. Our results show that the BiLSTM-based method performed best at identifying the entities, and the CNN-based method performed best at extracting events.
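The precision, recall, and F1 score mentioned in this abstract can be illustrated with a minimal sketch. It assumes exact-match scoring over entity spans, which is a common convention but is not confirmed by the abstract. The `Entity` tuple layout, the function name `precision_recall_f1`, and the label strings are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

# Assumed representation: (start_offset, end_offset, label).
# The abstract does not specify how entities are encoded.
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of entity spans."""
    # A prediction counts as correct only if span and label both match.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only: the second prediction has the right span
# but the wrong label, so it counts as an error.
gold = {(0, 7, "PRODUCT"), (12, 19, "SOLVENT"), (25, 30, "TEMPERATURE")}
predicted = {(0, 7, "PRODUCT"), (12, 19, "CATALYST"), (25, 30, "TEMPERATURE")}
p, r, f = precision_recall_f1(gold, predicted)
```

Here two of the three predictions match the gold annotations, so precision, recall, and F1 are all 2/3. Relaxed (overlap-based) matching would score the mislabeled span differently.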


Author(s):  
Dat Quoc Nguyen ◽  
Zenan Zhai ◽  
Hiyori Yoshikawa ◽  
Biaoyan Fang ◽  
Christian Druckenbrodt ◽  
...  

Author(s):  
Jingqi Wang ◽  
Yuankai Ren ◽  
Zhi Zhang ◽  
Hua Xu ◽  
Yaoyun Zhang

Chemical reactions and experimental conditions are fundamental information for chemical research and pharmaceutical applications. However, the latest information on chemical reactions is usually embedded in the free text of patents. The rapid accumulation of chemical patents calls for automatic tools based on natural language processing (NLP) techniques for efficient and accurate information extraction. This work describes the participation of the Melax Tech team in the CLEF 2020 ChEMU Task of Chemical Reaction Extraction from Patent. The task consisted of two subtasks: (1) named entity recognition to identify compounds and different semantic roles in the chemical reaction and (2) event extraction to identify event triggers of chemical reactions and their relations with the semantic roles recognized in subtask 1. To build an end-to-end system with high performance, multiple strategies tailored to chemical patents were applied and evaluated, ranging from optimized tokenization and self-supervised pre-training of patent language models to domain-knowledge-based rules. Our hybrid approaches combining different strategies achieved state-of-the-art results in both subtasks, with the top-ranked F1 of 0.957 for entity recognition and the top-ranked F1 of 0.9536 for event extraction, indicating that the proposed approaches are promising.


Author(s):  
Jiayuan He ◽  
Dat Quoc Nguyen ◽  
Saber A. Akhondi ◽  
Christian Druckenbrodt ◽  
Camilo Thorne ◽  
...  

Chemical patents represent a valuable source of information about new chemical compounds, which is critical to the drug discovery process. Automated information extraction over chemical patents is, however, a challenging task due to the large volume of existing patents and the complex linguistic properties of chemical patents. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), was introduced to support the development of advanced text mining techniques for chemical patents. The ChEMU 2020 lab proposed two fundamental information extraction tasks focusing on chemical reaction processes described in chemical patents: (1) chemical named entity recognition, requiring identification of essential chemical entities and their roles in chemical reactions, as well as reaction conditions; and (2) event extraction, which aims at identification of event steps relating the entities involved in chemical reactions. The ChEMU 2020 lab received 37 team registrations and 46 runs. Overall, the performance of submissions for these tasks exceeded our expectations, with the top systems outperforming strong baselines. We further show the methods to be robust to variations in sampling of the test data. We provide a detailed overview of the ChEMU 2020 corpus and its annotation, showing that inter-annotator agreement is very strong. We also present the methods adopted by participants, provide a detailed analysis of their performance, and carefully consider the potential impact of data leakage on interpretation of the results. The ChEMU 2020 Lab has shown the viability of automated methods to support information extraction of key information in chemical patents.


2021 ◽  
Vol 11 (4) ◽  
pp. 267-273
Author(s):  
Wen-Juan Hou ◽  
Bamfa Ceesay

Information extraction (IE) is the process of automatically identifying structured information from unstructured or partially structured text. IE processes can involve several activities, such as named entity recognition, event extraction, relationship discovery, and document classification, with the overall goal of translating text into a more structured form. Information on the changes in the effect of a drug, when taken in combination with a second drug, is known as drug–drug interaction (DDI). DDIs can delay, decrease, or enhance absorption of drugs and thus decrease or increase their efficacy or cause adverse effects. Recent research trends have shown several adaptations of recurrent neural networks (RNNs) for extracting information from text. In this study, we highlight significant challenges of using RNNs in biomedical text processing and propose an automatic DDI extraction approach that aims to overcome some of these challenges. Our results show that the system is competitive against other systems for the task of extracting DDIs.


Author(s):  
Rodrigo Agerri ◽  
German Rigau

We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state-of-the-art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate their use and guarantee the reproducibility of results.


2021 ◽  
pp. 1-13
Author(s):  
Xia Li ◽  
Qinghua Wen ◽  
Zengtao Jiao ◽  
Jiangtao Zhang

The China Conference on Knowledge Graph and Semantic Computing (CCKS) 2020 Evaluation Task 3 presented clinical named entity recognition and event extraction for Chinese electronic medical records. Two annotated data sets and some additional resources for these two subtasks were provided to participants. This evaluation competition attracted 354 teams, 46 of which submitted valid results. Pre-trained language models were widely applied in this evaluation task. Data augmentation and external resources were also helpful.


2016 ◽  
Vol 12 (4) ◽  
pp. 21-44
Author(s):  
R. Hema ◽  
T. V. Geetha

The two main challenges in chemical entity recognition are: (i) new chemical compounds are constantly being synthesized, and (ii) chemical representation is highly ambiguous, since a single chemical entity can be described by different nomenclatures. Therefore, the identification and maintenance of chemical terminologies is a difficult task. Since most existing text mining methods follow term-based approaches, they suffer from the problems of polysemy and synonymy. To address this, a Named Entity Recognition (NER) system based on pattern matching in the chemical domain is developed to extract chemical entities from chemical documents. The Tf-idf and PMI association measures are used to filter out non-chemical terms. An F-score of 92.19% is achieved for chemical NER. The proposed method is compared with the baseline method and other existing approaches. As the final step, the filtered chemical entities are classified into sixteen functional groups. The classification is done using a one-against-all multiclass SVM approach and achieves an accuracy of 87%. One-way ANOVA is used to test the quality of the pattern matching method against other existing chemical NER methods.
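As a rough illustration of the PMI association measure named in this abstract, the sketch below computes pointwise mutual information from raw co-occurrence counts using the standard definition, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The abstract does not say which term pairs are scored or what threshold is used for filtering, so the function name `pmi`, its parameters, and the example counts are all assumptions.

```python
import math


def pmi(joint_count: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information (base 2) from co-occurrence counts.

    joint_count: number of contexts containing both x and y
    count_x, count_y: number of contexts containing x and y respectively
    total: total number of contexts
    """
    if joint_count == 0:
        # x and y never co-occur; PMI is negative infinity by convention.
        return float("-inf")
    p_xy = joint_count / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Made-up counts: the pair co-occurs ten times more often than
# independence would predict, giving a PMI of about log2(10).
strong_association = pmi(joint_count=20, count_x=50, count_y=40, total=1000)

# Made-up counts matching independence, giving a PMI near zero.
no_association = pmi(joint_count=2, count_x=50, count_y=40, total=1000)
```

A filter built on this measure would keep candidate terms whose PMI with some chemical context exceeds a chosen cutoff. Both the context and the cutoff would have to come from the full paper.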

