UTNLP at SemEval-2021 Task 5: A Comparative Analysis of Toxic Span Detection using Attention-based, Named Entity Recognition, and Ensemble Models

Author(s):  
Alireza Salemi ◽  
Nazanin Sabri ◽  
Emad Kebriaei ◽  
Behnam Bahrak ◽  
Azadeh Shakery
2021 ◽  
Author(s):  
Breno David Lopes Pinheiro ◽  
Ellen Polliana Ramos Souza ◽  
Douglas Vitório ◽  
Hidelberg Oliveira Albuquerque

Textual information, although digital, is not computationally structured, so techniques are needed to structure it and extract information from it. This work aims to evaluate named entity recognition (NER) tools that use machine learning for the Brazilian and European variants of Portuguese. The tools Apache OpenNLP, Stanford CoreNLP, and spaCy were selected; the HAREM corpus was used to train and evaluate the models; and a tool was developed to preprocess the HAREM corpus. Two types of comparison were carried out: a general one and one between the variants of Portuguese. The results show that the variants can affect the training and evaluation of NER models.


2020 ◽  
Author(s):  
Shintaro Tsuji ◽  
Andrew Wen ◽  
Naoki Takahashi ◽  
Hongjian Zhang ◽  
Katsuhiko Ogasawara ◽  
...  

BACKGROUND Named entity recognition (NER) plays an important role in extracting the features of descriptions for mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities they recognize depends on dictionary lookup. In particular, recognizing compound terms is difficult because they follow a variety of patterns. OBJECTIVE The objective of the study is to develop and evaluate an NER tool that handles compound terms, using RadLex, for mining free-text radiology reports. METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (Cts) in noun phrases and used them as the gold standard for the performance evaluation (precision, recall, and F-measure). We also created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. In addition, we evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR). RESULTS The F-measure of cTAKES+RadLex+GPD was 32.2% (Precision 92.1%, Recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (Precision 98.1%, Recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but that the capture of Cts was still lacking. The MR showed that 71.9% of stem terms matched those of ontologies, and that RadLex improved the MR by about 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using ontologies.
CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem term analysis have the potential to improve dictionary-based NER performance toward expanding vocabularies.
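
The precision, recall, and F-measure figures above can be related with a short sketch. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative only. Because the reported precision and recall are themselves rounded, a recomputed value may differ from the reported one in the last digit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is retrieved correctly
    return 2 * precision * recall / (precision + recall)


# Rounded precision/recall pairs quoted in the abstract.
baseline = f_measure(0.921, 0.196)   # cTAKES+RadLex+GPD
with_cted = f_measure(0.981, 0.510)  # pipeline combined with the CtED

print(f"baseline F-measure:  {baseline:.3f}")
print(f"with CtED F-measure: {with_cted:.3f}")
```

The low recall dominates the harmonic mean in the baseline case, which is why a precision above 90% still yields an F-measure near 32%.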


Author(s):  
Aditya Kiran Brahma ◽  
Prathyush Potluri ◽  
Meghana Kanapaneni ◽  
Sumanth Prabhu ◽  
Sundeep Teki
