One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition

Mapping Intimacies ◽

10.1101/067132 ◽

2016 ◽

Cited By ~ 1

Author(s):

Lars Juhl Jensen

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Automatic Annotation ◽

Biomedical Ontologies ◽

Health Records ◽

Pubmed Central ◽

Named Entity ◽

Open Biomedical Ontologies ◽

Bulk Processing ◽

Different Sources

Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80–90% precision and 70–80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.
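
To make the idea of dictionary-based NER concrete, here is a minimal sketch of a longest-match tagger. It is not the authors' implementation: the dictionary entries, the `EX:` identifiers and the function name `tag` are hypothetical, and a production tagger would also handle orthographic variation and ambiguous names.

```python
import re

# Hypothetical toy dictionary: lowercase name -> made-up identifier.
DICTIONARY = {
    "breast cancer": "EX:0001",
    "cancer": "EX:0002",
    "tp53": "EX:0003",
}
# Longest dictionary name, measured in tokens, bounds the search window.
MAX_TOKENS = max(len(name.split()) for name in DICTIONARY)


def tag(text):
    """Return (start, end, matched_text, identifier) tuples.

    At each token position the longest dictionary name is preferred,
    so "breast cancer" wins over the shorter "cancer".
    """
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        found = False
        # Try the longest candidate first, then shrink the window.
        for n in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            candidate = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if candidate in DICTIONARY:
                matches.append((start, end, text[start:end], DICTIONARY[candidate]))
                i += n
                found = True
                break
        if not found:
            i += 1
    return matches
```

Calling `tag("TP53 mutations are common in breast cancer.")` returns `[(0, 4, "TP53", "EX:0003"), (29, 42, "breast cancer", "EX:0001")]`. Because lookup is a hash-table probe per candidate window, the cost grows roughly linearly with text length, which is one reason this style of tagger suits bulk processing.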

Real-time tagging of biomedical entities

10.1101/078469 ◽

2016 ◽

Author(s):

Evangelos Pafilis ◽

Lars Juhl Jensen

Keyword(s):

Real Time ◽

Named Entity Recognition ◽

Recognition System ◽

Entity Recognition ◽

Manual Annotation ◽

Automatic Annotation ◽

Web Resources ◽

Named Entity ◽

Bulk Processing

Automatic annotation of text is an important complement to manual annotation, because the latter is highly labor intensive. We have developed a fast dictionary-based named entity recognition system, which is used for both real-time and bulk processing of text in a variety of biomedical web resources. We propose to adapt the system to make it interoperable with the PubAnnotation and Open Annotation standards.
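
As a rough illustration of what interoperability with PubAnnotation could involve, the sketch below converts tagger output into a PubAnnotation-style JSON document. It assumes the commonly documented layout with a `text` field and a `denotations` list of `id`, `span` (`begin`, `end`) and `obj` entries; the function name and the example identifier are hypothetical and not taken from the paper.

```python
import json


def to_pubannotation(text, matches):
    """Build a PubAnnotation-style dict from (start, end, identifier) matches.

    Assumption: character offsets are 0-based with an exclusive end,
    matching Python slicing.
    """
    denotations = [
        {"id": f"T{i}", "span": {"begin": start, "end": end}, "obj": identifier}
        for i, (start, end, identifier) in enumerate(matches, start=1)
    ]
    return {"text": text, "denotations": denotations}


# Example usage with a made-up identifier.
doc = to_pubannotation("TP53 is a tumour suppressor.", [(0, 4, "EX:0003")])
serialized = json.dumps(doc, ensure_ascii=False)
```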

De-identifying Spanish medical texts - named entity recognition applied to radiology reports

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00236-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Irene Pérez-Díez ◽

Raúl Pérez-Moraga ◽

Adolfo López-Cerdán ◽

Jose-Maria Salinas-Serrano ◽

María de la Iglesia-Vayá

Keyword(s):

Electronic Health Records ◽

English Language ◽

Personal Information ◽

Named Entity Recognition ◽

Entity Recognition ◽

Medical Texts ◽

Health Records ◽

Named Entity ◽

Radiology Reports ◽

Electronic Health

Background: Medical texts such as radiology reports or electronic health records are a powerful source of data for researchers. Anonymization methods must be developed to de-identify documents containing personal information from both patients and medical staff. Although currently there are several anonymization strategies for the English language, they are also language-dependent. Here, we introduce a named entity recognition strategy for Spanish medical texts, translatable to other languages. Results: We tested 4 neural networks on our radiology reports dataset, achieving a recall of 97.18% of the identifying entities. Alongside, we developed a randomization algorithm to substitute the detected entities with new ones from the same category, making it virtually impossible to differentiate real data from synthetic data. The three best architectures were tested with the MEDDOCAN challenge dataset of electronic health records as an external test, achieving a recall of 69.18%. Conclusions: The strategy proposed, combining named entity recognition tasks with randomization of entities, is suitable for Spanish radiology reports. It does not require a big training corpus, thus it could be easily extended to other languages and medical texts, such as electronic health records.
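
The substitution step described above can be sketched as follows. This is only an illustration under stated assumptions, not the published algorithm: the surrogate lists, category names and the function `randomize_entities` are hypothetical, and the detected entities are assumed to arrive as non-overlapping character spans with a category label.

```python
import random

# Hypothetical surrogate pools, one per entity category.
SURROGATES = {
    "NAME": ["Ana García", "Luis Romero", "Marta Vidal"],
    "HOSPITAL": ["Hospital Central", "Clínica del Norte"],
}


def randomize_entities(text, entities, rng=None):
    """Replace each (start, end, category) span with a same-category surrogate.

    Spans are assumed not to overlap. A surrogate identical to the
    original string is never chosen.
    """
    rng = rng or random.Random()
    pieces = []
    cursor = 0
    for start, end, category in sorted(entities):
        pieces.append(text[cursor:start])  # untouched text before the entity
        original = text[start:end]
        choices = [s for s in SURROGATES[category] if s != original]
        pieces.append(rng.choice(choices))
        cursor = end
    pieces.append(text[cursor:])  # untouched tail
    return "".join(pieces)
```

Rebuilding the string from left to right avoids the offset drift that would occur if replacements of different lengths were applied in place.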

Named Entity Recognition Using BERT BiLSTM CRF for Chinese Electronic Health Records

2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) ◽

10.1109/cisp-bmei48845.2019.8965823 ◽

2019 ◽

Cited By ~ 5

Author(s):

Zhenjin Dai ◽

Xutao Wang ◽

Pin Ni ◽

Yuming Li ◽

Gangmin Li ◽

...

Keyword(s):

Electronic Health Records ◽

Named Entity Recognition ◽

Entity Recognition ◽

Health Records ◽

Named Entity ◽

Electronic Health

Concept Attribute Labeling and Context-Aware Named Entity Recognition in Electronic Health Records

International Journal of Reliable and Quality E-Healthcare ◽

10.4018/ijrqeh.2018010101 ◽

2018 ◽

Vol 7 (1) ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

Alexandra Pomares-Quimbaya ◽

Rafael A. Gonzalez ◽

Oscar Mauricio Muñoz Velandia ◽

Angel Alberto Garcia Peña ◽

Julián Camilo Daza Rodríguez ◽

...

Keyword(s):

Electronic Health Records ◽

Ad Hoc ◽

Named Entity Recognition ◽

Ensemble Classification ◽

Entity Recognition ◽

Classification Model ◽

Health Records ◽

Named Entity ◽

Electronic Health ◽

Concept Attribute

Extracting valuable knowledge from Electronic Health Records (EHR) represents a challenging task due to the presence of both structured and unstructured data, including codified fields, images and test results. Narrative text in particular contains a variety of notes which are diverse in language and detail, as well as being full of ad hoc terminology, including acronyms and jargon, which is especially challenging in non-English EHR, where there is a dearth of annotated corpora or trained case sets. This paper proposes an approach for NER and concept attribute labeling for EHR that takes into consideration the contextual words around the entity of interest to determine its sense. The approach combines three different NER methods with an analysis of the context (neighboring words) using an ensemble classification model. This helps to disambiguate NER results and to label each concept as confirmed, negated, speculative, pending or antecedent. Results show an improvement in recall and a limited impact on precision for the NER process.
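
To illustrate how neighboring words can drive attribute labeling, the sketch below inspects a small window of tokens before an entity. Note that it swaps in simple cue-word rules as a stand-in for the ensemble classifier the paper describes; the cue lists, the window size and the function name are all assumptions, and only three of the five labels are covered.

```python
# Hypothetical cue words; a real system would learn these from annotated data.
NEGATION_CUES = {"no", "denies", "without"}
SPECULATION_CUES = {"possible", "probable", "suspected"}


def label_concept(tokens, entity_index, window=3):
    """Label the entity at entity_index using the preceding context window."""
    lo = max(0, entity_index - window)
    context = [t.lower() for t in tokens[lo:entity_index]]
    if any(t in NEGATION_CUES for t in context):
        return "negated"
    if any(t in SPECULATION_CUES for t in context):
        return "speculative"
    return "confirmed"


# Example usage: the entity "chest" starts at token index 2.
label = label_concept("patient denies chest pain".split(), 2)
```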

Syntactic analyses and named entity recognition for PubMed and PubMed Central — up-to-the-minute

10.18653/v1/w16-2913 ◽

2016 ◽

Cited By ~ 6

Author(s):

Kai Hakala ◽

Suwisa Kaewphan ◽

Tapio Salakoski ◽

Filip Ginter

Keyword(s):

Named Entity Recognition ◽

Entity Recognition ◽

Pubmed Central ◽

Named Entity

Generating features for named entity recognition by learning prototypes in semantic space: The case of de-identifying health records

2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2014.6999199 ◽

2014 ◽

Cited By ~ 9

Author(s):

Aron Henriksson ◽

Hercules Dalianis ◽

Stewart Kowalski

Keyword(s):

Named Entity Recognition ◽

Semantic Space ◽

Entity Recognition ◽

Health Records ◽

Named Entity

Clinical Named Entity Recognition From Chinese Electronic Health Records via Machine Learning Methods

JMIR Medical Informatics ◽

10.2196/medinform.9965 ◽

2018 ◽

Vol 6 (4) ◽

pp. e50 ◽

Cited By ~ 10

Author(s):

Yu Zhang ◽

Xuwen Wang ◽

Zhen Hou ◽

Jiao Li

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Methods ◽

Health Records ◽

Named Entity ◽

Machine Learning Methods ◽

Electronic Health

Automatic Annotation of Text Classification Data Set in Specific Field Using Named Entity Recognition

2019 IEEE 19th International Conference on Communication Technology (ICCT) ◽

10.1109/icct46805.2019.8947058 ◽

2019 ◽

Author(s):

Zhou Xin ◽

Wu Tianbo ◽

Chen Haiqiang ◽

Yang Qiang ◽

He Xiaohai

Keyword(s):

Text Classification ◽

Named Entity Recognition ◽

Entity Recognition ◽

Automatic Annotation ◽

Data Set ◽

Specific Field ◽

Named Entity

Named Entity Recognition for Clinical Portuguese Corpus with Conditional Random Fields and Semantic Groups

10.5753/sbcas.2019.6269 ◽

2019 ◽

Cited By ~ 1

Author(s):

João Vitor Andrioli De Souza ◽

Yohan Bonescki Gumiel ◽

Lucas Emanuel Silva e Oliveira ◽

Claudia Maria Cabral Moro

Keyword(s):

Electronic Health Records ◽

Random Fields ◽

Conditional Random Fields ◽

Named Entity Recognition ◽

Entity Recognition ◽

Health Records ◽

Named Entity ◽

Electronic Health

Considering the difficulties of extracting entities from Electronic Health Records (EHR) texts in Portuguese, we explore the Conditional Random Fields (CRF) algorithm to build a Named Entity Recognition (NER) system based on a corpus of clinical Portuguese data annotated by experts. We present the challenges and methods involved in classifying Abbreviations, Disorders, Procedures and Chemicals within the texts. By selecting a meaningful set of features and the best-performing parameters, we obtain results suggesting that the method is promising and may support other biomedical tasks. Nonetheless, further experiments with more features, different architectures and sophisticated preprocessing steps are needed.
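
CRF-based NER depends heavily on per-token features. The sketch below shows one plausible feature extractor of the kind typically fed to a CRF toolkit. The specific features, the `<BOS>`/`<EOS>` markers and the function names are assumptions for illustration; the abstract does not list the features the authors selected.

```python
def token_features(tokens, i):
    """Return a feature dict for the token at position i."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_upper": word.isupper(),          # all-caps tokens are often abbreviations
        "is_title": word.istitle(),
        "has_digit": any(c.isdigit() for c in word),
        "suffix3": word[-3:].lower(),
        # Neighboring words give the model local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Feature dicts for every token in a sentence, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Example usage on a short clinical-style sentence.
feats = sentence_features(["Paciente", "com", "HAS", "e", "DM2"])
```

Each sentence becomes a list of dicts paired with a list of labels, which is the input shape most CRF libraries expect.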

Generating Synthetic Training Data for Supervised De-Identification of Electronic Health Records

Future Internet ◽

10.3390/fi13050136 ◽

2021 ◽

Vol 13 (5) ◽

pp. 136

Author(s):

Claudia Alessandra Libbi ◽

Jan Trienes ◽

Dolf Trieschnigg ◽

Christin Seifert

Keyword(s):

Electronic Health Records ◽

Named Entity Recognition ◽

Synthetic Data ◽

Real Data ◽

Entity Recognition ◽

Language Models ◽

Text Generation ◽

Health Records ◽

Named Entity ◽

Electronic Health

A major hurdle in the development of natural language processing (NLP) methods for Electronic Health Records (EHRs) is the lack of large, annotated datasets. Privacy concerns prevent the distribution of EHRs, and the annotation of data is known to be costly and cumbersome. Synthetic data presents a promising solution to the privacy concern, if synthetic data has comparable utility to real data and if it preserves the privacy of patients. However, the generation of synthetic text alone is not useful for NLP because of the lack of annotations. In this work, we propose the use of neural language models (LSTM and GPT-2) for generating artificial EHR text jointly with annotations for named-entity recognition. Our experiments show that artificial documents can be used to train a supervised named-entity recognition model for de-identification, which outperforms a state-of-the-art rule-based baseline. Moreover, we show that combining real data with synthetic data improves the recall of the method, without manual annotation effort. We conduct a user study to gain insights on the privacy of artificial text. We highlight privacy risks associated with language models to inform future research on privacy-preserving automated text generation and metrics for evaluating privacy-preservation during text generation.
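
One way a language model can emit text jointly with annotations is to place entity markers inline, so that generated text can be parsed back into token-level labels. The sketch below assumes a hypothetical `<CATEGORY> ... </CATEGORY>` marker scheme with whitespace-separated tokens; the markup actually used in the paper may differ.

```python
import re

# Matches an opening or closing marker such as <NAME> or </NAME>.
TAG = re.compile(r"<(/?)([A-Z]+)>")


def inline_to_bio(annotated):
    """Convert inline-tagged text into parallel token and BIO label lists."""
    tokens, labels = [], []
    current = None   # category of the entity we are inside, if any
    begin = False    # True until the first token of an entity is emitted
    for piece in annotated.split():
        m = TAG.fullmatch(piece)
        if m:
            closing, category = m.groups()
            current = None if closing else category
            begin = not closing
            continue
        tokens.append(piece)
        if current is None:
            labels.append("O")
        else:
            labels.append(("B-" if begin else "I-") + current)
            begin = False
    return tokens, labels


# Example usage on a made-up generated sentence.
tokens, labels = inline_to_bio("Patient <NAME> Jan Jansen </NAME> was admitted")
```

The resulting token and label lists can be used directly as training data for a supervised NER model, which is what removes the need for manual annotation of the synthetic text.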