Chemical Named Entity Recognition: Improving Recall Using a Comprehensive List of Lexical Features

Author(s):  
Andre Lamurias ◽  
João Ferreira ◽  
Francisco M. Couto
Author(s):  
Hema R. ◽  
Ajantha Devi

Chemical entities can be represented in different forms, such as chemical names, chemical formulae, and chemical structures. Because chemical names follow several different classification frameworks, identifying and extracting chemical entities with little ambiguity is considered a major challenge. Chemical named entity recognition (NER) is the initial phase of any chemical information extraction strategy. Most chemical NER has been done using dictionary-based, rule-based, and machine learning approaches. Recently, deep learning methods have evolved, and in this chapter the authors outline the various deep learning techniques applied to chemical NER. First, the authors introduce the fundamental concepts of chemical named entity recognition, the textual contents of chemical documents, and how chemicals are represented in the chemical literature. The chapter concludes with the strengths and weaknesses of these methods and the types of chemical entities extracted.
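To make the dictionary-based and rule-based approaches mentioned in this abstract concrete, here is a minimal Python sketch. It is not taken from the chapter: the lexicon `LEXICON`, the pattern `FORMULA_RE`, and the function `find_chemicals` are hypothetical names chosen for illustration, and the formula rule is deliberately crude.

```python
import re

# Hypothetical toy lexicon; a real system would load a curated chemical dictionary.
LEXICON = ["aspirin", "ethanol", "sodium chloride"]

# Illustrative rule for formulae: two or more element-like units such as H2O or NaCl.
# It also fires on all-caps acronyms (e.g. "DNA"), which a real rule set would filter.
FORMULA_RE = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")


def find_chemicals(text):
    """Return (start, end, surface, source) tuples sorted by position."""
    mentions = []
    # Dictionary-based step: case-insensitive whole-word lookup of each term.
    for term in LEXICON:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            mentions.append((m.start(), m.end(), m.group(), "dictionary"))
    # Rule-based step: pattern match for formula-like tokens.
    for m in FORMULA_RE.finditer(text):
        mentions.append((m.start(), m.end(), m.group(), "rule"))
    return sorted(mentions)


text = "Patients received aspirin dissolved in ethanol; NaCl and H2O were added."
for start, end, surface, source in find_chemicals(text):
    print(start, end, surface, source)
```

The sketch does not resolve overlapping mentions or ambiguous names. Machine learning and deep learning taggers are typically brought in to handle those cases.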


2014 ◽  
Vol 11 (3) ◽  
pp. 1-16 ◽  
Author(s):  
Andre Lamurias ◽  
João D. Ferreira ◽  
Francisco M. Couto

Summary: Interactions between chemical compounds described in biomedical text can be of great importance to drug discovery and design, as well as pharmacovigilance. We developed a novel system, “Identifying Interactions between Chemical Entities” (IICE), to identify chemical interactions described in text. Kernel-based Support Vector Machines first identify the interactions and then an ensemble classifier validates and classifies the type of each interaction. This relation extraction module was evaluated with the corpus released for the DDI Extraction task of SemEval 2013, obtaining results comparable to state-of-the-art methods for this type of task. We integrated this module with our chemical named entity recognition module and made the whole system available as a web tool at www.lasige.di.fc.ul.pt/webtools/iice.


2021 ◽  
Author(s):  
Ghadeer Mobasher ◽  
Lukrecia Mertova ◽  
Sucheta Ghosh ◽  
Olga Krebs ◽  
Bettina Heinlein ◽  
...  

Chemical named entity recognition (NER) is a significant step for many downstream applications, such as entity linking in the chemical text-mining pipeline. However, identifying chemical entities in biomedical text is a challenging task due to the diverse morphology of chemical entities and the different types of chemical nomenclature. In this work, we describe the approach we submitted to the BioCreative version 7 challenge, Track 2, focusing on the "Chemical Identification" task of identifying chemical entities and linking them using MeSH. For this purpose, we applied a two-stage approach: (a) a fine-tuned BioBERT model to identify chemical entities, and (b) semantic approximate search in the MeSH and PubChem databases for entity linking. There was some friction between the two approaches, as our rule-based approach did not harmonise optimally with partially recognized words forwarded by the BERT component. In future work, we aim to resolve the artefacts arising from BERT tokenizers, develop joint learning of chemical named entity recognition and entity linking using pretrained transformer-based models, and compare their performance with our preliminary approach. Next, we will improve the efficiency of our approximate search in reference databases during entity linking. This task is non-trivial, as it entails determining similarity scores of large sets of trees with respect to a query tree. Ideally, this will enable flexible parametrization and rule selection for the entity linking search.
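The two stages described in this abstract can be illustrated with a small Python sketch. It is not the authors' implementation: plain string similarity from the standard library's `difflib` stands in for their semantic, tree-based approximate search, and `TOY_LEXICON`, its `TOY:` identifiers, `merge_wordpieces`, and `link` are all hypothetical. The first helper shows one common way to rejoin WordPiece fragments (marked with a `##` prefix by BERT tokenizers) before lookup, which is where the partially recognized words mentioned above tend to come from.

```python
from difflib import get_close_matches

# Hypothetical stand-in for a reference vocabulary such as MeSH or PubChem.
# The identifiers are placeholders, not real database IDs.
TOY_LEXICON = {
    "acetaminophen": "TOY:0001",
    "acetylcholine": "TOY:0002",
    "ibuprofen": "TOY:0003",
}


def merge_wordpieces(tokens):
    """Rejoin WordPiece fragments ('##' marks a continuation) into words."""
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return " ".join(words)


def link(mention, cutoff=0.8):
    """Return the ID of the closest lexicon entry, or None below the cutoff."""
    matches = get_close_matches(
        mention.lower(), list(TOY_LEXICON), n=1, cutoff=cutoff
    )
    return TOY_LEXICON[matches[0]] if matches else None


mention = merge_wordpieces(["acet", "##amin", "##ophen"])
print(mention, link(mention))
```

The `cutoff` parameter trades recall against precision: lowering it links more misspelled or truncated mentions but also admits more wrong candidates.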

