concept recognition Latest Research Papers

Abstract Background Automated assignment of specific ontology concepts to mentions in text is a critical task in biomedical natural language processing, and the subject of many open shared tasks. Although the current state of the art involves the use of neural network language models as a post-processing step, the very large number of ontology classes to be recognized and the limited amount of gold-standard training data have impeded the creation of end-to-end systems based entirely on machine learning. Recently, Hailu et al. recast the concept recognition problem as a type of machine translation and demonstrated that sequence-to-sequence machine learning models have the potential to outperform multi-class classification approaches. Methods We systematically characterize the factors that contribute to the accuracy and efficiency of several approaches to sequence-to-sequence machine learning through extensive studies of alternative methods and hyperparameter selections. We not only identify the best-performing systems and parameters across a wide variety of ontologies but also provide insights into the widely varying resource requirements and hyperparameter robustness of alternative approaches. Analysis of the strengths and weaknesses of such systems suggests promising avenues for future improvements as well as design choices that can increase computational efficiency with small costs in performance. Results Bidirectional encoder representations from transformers for biomedical text mining (BioBERT) for span detection along with the open-source toolkit for neural machine translation (OpenNMT) for concept normalization achieve state-of-the-art performance for most ontologies annotated in the CRAFT Corpus. This approach uses substantially fewer computational resources, including hardware, memory, and time, than several alternative approaches.
Conclusions Machine translation is a promising avenue for fully machine-learning-based concept recognition that achieves state-of-the-art results on the CRAFT Corpus, evaluated via a direct comparison to previous results from the 2019 CRAFT shared task. Experiments illuminating the reasons for the surprisingly good performance of sequence-to-sequence methods targeting ontology identifiers suggest that further progress may be possible by mapping to alternative target concept representations. All code and models can be found at: https://github.com/UCDenver-ccp/Concept-Recognition-as-Translation.
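To make the translation framing above concrete, the following is a minimal sketch of how mention/identifier pairs might be turned into parallel source and target sequences for a sequence-to-sequence model. It is not taken from the authors' repository. The character-level tokenization, the "_" space marker, the function names, and the two example annotations are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def to_char_sequence(text: str) -> str:
    """Split a string into space-separated characters.

    Real spaces are written as '_' so they survive whitespace tokenization
    (an assumed convention, not one stated in the abstract).
    """
    return " ".join("_" if ch == " " else ch for ch in text)


def build_parallel_corpus(annotations: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (mention text, concept identifier) pairs into (source, target) lines."""
    return [
        (to_char_sequence(mention.lower()), to_char_sequence(concept_id))
        for mention, concept_id in annotations
    ]


# Illustrative annotations only; treat the identifiers as placeholders.
annotations = [
    ("apoptotic process", "GO:0006915"),
    ("neuron", "CL:0000540"),
]

pairs = build_parallel_corpus(annotations)
for source, target in pairs:
    print(source, "->", target)
```

Each pair could then be written to separate source and target text files, one sequence per line, which is the general shape of input that whitespace-tokenized translation toolkits consume. The span detection step that finds the mentions in the first place is outside this sketch.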


Context-aware multi-token concept recognition of biological entities

BMC Bioinformatics ◽

10.1186/s12859-021-04248-8 ◽

2021 ◽

Vol 22 (S11) ◽

Author(s):

Kwangmin Kim ◽

Doheon Lee

Keyword(s):

Language Processing ◽

Contextual Information ◽

Named Entity Recognition ◽

Knowledge Bases ◽

Entity Recognition ◽

Biological Knowledge ◽

Concept Recognition ◽

Named Entity ◽

Named Entity Normalization ◽

Biological Entities

Abstract Background Concept recognition is a term that corresponds to the two sequential steps of named entity recognition and named entity normalization, and plays an essential role in the field of bioinformatics. However, conventional dictionary-based methods have not sufficiently addressed the variation of concepts as they are actually used in the literature, resulting in particularly degraded performance in the recognition of multi-token concepts. Results In this paper, we propose a concept recognition method for multi-token biological entities using neural models combined with literature contexts. The key aspect of our method is utilizing the contextual information from biological knowledge bases for concept normalization, which is followed by a named entity recognition procedure. The model showed improved performance over conventional methods, particularly for multi-token concepts with higher variation. Conclusions We expect that our model can be utilized for effective concept recognition and a variety of natural language processing tasks in bioinformatics.


Comparative study using inverse ontology cogency and alternatives for concept recognition in the annotated National Library of Medicine database

Neural Networks ◽

10.1016/j.neunet.2021.01.018 ◽

2021 ◽

Vol 139 ◽

pp. 86-104

Author(s):

George J. Shannon ◽

Naga Rayapati ◽

Steven M. Corns ◽

Donald C. Wunsch

Keyword(s):

Comparative Study ◽

National Library ◽

Concept Recognition


SIENA: Semi-automatic semantic enhancement of datasets using concept recognition

Journal of Biomedical Semantics ◽

10.1186/s13326-021-00239-z ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Andreea Grigoriu ◽

Amrapali Zaveri ◽

Gerhard Weiss ◽

Michel Dumontier

Keyword(s):

Machine Learning ◽

Scientific Research ◽

Recognition Task ◽

Biomedical Ontology ◽

Machine Learning Techniques ◽

Published Data ◽

Full Potential ◽

Biomedical Data ◽

Concept Recognition ◽

Research Questions

Abstract Background The amount of available data, which can facilitate answering scientific research questions, is growing. However, the different formats of published data are expanding as well, creating a serious challenge when multiple datasets need to be integrated for answering a question. Results This paper presents a semi-automated framework that provides semantic enhancement of biomedical data, specifically gene datasets. The framework involves a concept recognition task using machine learning, in combination with the BioPortal annotator. Compared to methods that use only the BioPortal annotator for semantic enhancement, the proposed framework achieves the highest results. Conclusions Using concept recognition combined with machine learning techniques and annotation with a biomedical ontology, the proposed framework can help datasets reach their full potential of providing meaningful information, which can answer scientific research questions.


PhenoTagger: a hybrid method for phenotype concept recognition using human phenotype ontology

Bioinformatics ◽

10.1093/bioinformatics/btab019 ◽

2021 ◽

Author(s):

Ling Luo ◽

Shankai Yan ◽

Po-Ting Lai ◽

Daniel Veltri ◽

Andrew Oler ◽

...

Keyword(s):

Machine Learning ◽

Hybrid Method ◽

Human Phenotype Ontology ◽

Training Data ◽

Supplementary Information ◽

Training Dataset ◽

Biomedical Text ◽

Phenotype Ontology ◽

Concept Recognition ◽

Human Phenotype

Abstract Motivation Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous works that address the task typically use dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to identify biomedical concepts, which can recognize more unseen concept synonyms by automatic feature learning. However, most methods require large corpora of manually annotated data for model training, which is difficult to obtain due to the high cost of human annotation. Results In this article, we propose PhenoTagger, a hybrid method that combines both dictionary and machine learning-based methods to recognize Human Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use all concepts and synonyms in HPO to construct a dictionary, which is then used to automatically build a distantly supervised training dataset for machine learning. Next, a cutting-edge deep learning model is trained to classify each candidate phrase (n-gram from input sentence) into a corresponding concept label. Finally, the dictionary and machine learning-based prediction results are combined for improved performance. Our method is validated with two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods. In addition, to demonstrate the generalizability of our method, we retrained PhenoTagger using the disease ontology MEDIC for disease concept recognition to investigate the effect of training on different ontologies. Experimental results on the NCBI disease corpus show that PhenoTagger, without requiring manually annotated training data, achieves competitive performance as compared with state-of-the-art supervised methods. Availability and implementation The source code, API information and data for PhenoTagger are freely available at https://github.com/ncbi-nlp/PhenoTagger.
Supplementary information Supplementary data are available at Bioinformatics online.
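The dictionary and candidate-phrase steps described in this abstract can be sketched roughly as follows. This is an illustrative toy, not PhenoTagger's implementation: the function names, the three-token n-gram limit, the "NONE" label, and the tiny one-concept ontology are assumptions, and the deep learning classifier and the final result-merging step are left out entirely.

```python
from typing import Dict, List, Tuple


def build_dictionary(ontology: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every lowercased concept name or synonym to its concept ID."""
    lookup: Dict[str, str] = {}
    for concept_id, names in ontology.items():
        for name in names:
            lookup[name.lower()] = concept_id
    return lookup


def ngrams(tokens: List[str], max_n: int) -> List[Tuple[int, int, str]]:
    """Return (start, end, phrase) for every n-gram of up to max_n tokens."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_n, len(tokens)) + 1):
            spans.append((start, end, " ".join(tokens[start:end])))
    return spans


def label_candidates(sentence: str, lookup: Dict[str, str], max_n: int = 3) -> List[Tuple[str, str]]:
    """Label each candidate n-gram with a concept ID, or 'NONE' if unmatched.

    Labels produced this way could serve as distantly supervised training
    examples; no human annotation is involved.
    """
    tokens = sentence.lower().split()
    return [(phrase, lookup.get(phrase, "NONE")) for _, _, phrase in ngrams(tokens, max_n)]


# Toy ontology fragment for illustration; treat the ID and synonyms as placeholders.
ontology = {"HP:0001250": ["Seizure", "Seizures", "Epileptic seizure"]}
lookup = build_dictionary(ontology)

labels = label_candidates("the patient had epileptic seizure episodes", lookup)
for phrase, concept in labels:
    if concept != "NONE":
        print(phrase, "->", concept)
```

Exact string lookup like this shows why dictionary matching alone tends toward high precision and lower recall: any surface form missing from the synonym list is labeled "NONE", which is the gap a trained classifier is meant to close.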


Exploring Word Segmentation and Medical Concept Recognition for Chinese Medical Texts

10.18653/v1/2021.bionlp-1.23 ◽

2021 ◽

Author(s):

Yang Liu ◽

Yuanhe Tian ◽

Tsung-Hui Chang ◽

Song Wu ◽

Xiang Wan ◽

...

Keyword(s):

Word Segmentation ◽

Medical Texts ◽

Concept Recognition ◽

Medical Concept


Relationship Between Concept Recognition of a Product/Service Brand and Willingness to Pay

International Symposium on Affective Science and Engineering ◽

10.5057/isase.2021-c000005 ◽

2021 ◽

Vol ISASE2021 (0) ◽

pp. 1-4

Author(s):

Takumi KATO

Keyword(s):

Willingness To Pay ◽

Concept Recognition ◽

Product Service ◽

Service Brand


Concept Recognition as a Machine Translation Problem

10.1101/2020.12.03.410829 ◽

2020 ◽

Author(s):

Mayla R Boguslav ◽

Negacy D Hailu ◽

Michael Bada ◽

William A Baumgartner ◽

Lawrence E Hunter

Keyword(s):

Machine Learning ◽

Machine Translation ◽

Language Processing ◽

State Of The Art ◽

Training Data ◽

Language Models ◽

Alternative Methods ◽

Automated Assignment ◽

Concept Recognition ◽

Alternative Approaches

Abstract Background Automated assignment of specific ontology concepts to mentions in text is a critical task in biomedical natural language processing, and the subject of many open shared tasks. Although the current state of the art involves the use of neural network language models as a post-processing step, the very large number of ontology classes to be recognized and the limited amount of gold-standard training data have impeded the creation of end-to-end systems based entirely on machine learning. Recently, Hailu et al. recast the concept recognition problem as a type of machine translation and demonstrated that sequence-to-sequence machine learning models had the potential to outperform multi-class classification approaches. Here we systematically characterize the factors that contribute to the accuracy and efficiency of several approaches to sequence-to-sequence machine learning. Results We report on our extensive studies of alternative methods and hyperparameter selections. The results not only identify the best-performing systems and parameters across a wide variety of ontologies but also illuminate the widely varying resource requirements and hyperparameter robustness of alternative approaches. Analysis of the strengths and weaknesses of such systems suggests promising avenues for future improvements as well as design choices that can increase computational efficiency with small costs in performance. Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT) for span detection (as previously found) along with the Open-source Toolkit for Neural Machine Translation (OpenNMT) for concept normalization achieve state-of-the-art performance for most ontologies in the CRAFT Corpus.
This approach uses substantially fewer computational resources, including hardware, memory, and time, than several alternative approaches. Conclusions Machine translation is a promising avenue for fully machine-learning-based concept recognition that achieves state-of-the-art results on the CRAFT Corpus, evaluated via a direct comparison to previous results from the 2019 CRAFT Shared Task. Experiments illuminating the reasons for the surprisingly good performance of sequence-to-sequence methods targeting ontology identifiers suggest that further progress may be possible by mapping to alternative target concept representations. All code and models can be found at: https://github.com/UCDenver-ccp/Concept-Recognition-as-Translation.


A Query Understanding Framework for Earth Data Discovery

Applied Sciences ◽

10.3390/app10031127 ◽

2020 ◽

Vol 10 (3) ◽

pp. 1127

Author(s):

Yun Li ◽

Yongyao Jiang ◽

Justin C. Goldstein ◽

Lewis J. Mcgibbney ◽

Chaowei Yang

Keyword(s):

Search Engine ◽

Query Expansion ◽

Entity Recognition ◽

Free Text ◽

Data Discovery ◽

Concept Recognition ◽

Semantic Query ◽

User Query ◽

Query Understanding ◽

Search Intent

One longstanding complication with Earth data discovery involves understanding a user’s search intent from the input query. Most geospatial data portals use keyword-based matching to search data. Little attention has focused on the spatial and temporal information in a query or on understanding the query with an ontology. No research in the geospatial domain has investigated user queries in a systematic way. Here, we propose a query understanding framework and apply it to fill the gap by better interpreting a user’s search intent for Earth data search engines and adopting knowledge that was mined from metadata and user query logs. The proposed query understanding tool contains four components: spatial and temporal parsing; concept recognition; Named Entity Recognition (NER); and semantic query expansion. Spatial and temporal parsing detects the spatial bounding box and temporal range from a query. Concept recognition isolates clauses from free text and provides the search engine with phrases instead of a list of words. Named entity recognition detects entities in the query, which tell the search engine which entities to query. The semantic query expansion module expands the original query by adding synonyms and acronyms, discovered from Web usage data and metadata, to phrases in the query. The four modules interact to parse a user’s query from multiple perspectives, with the goal of understanding the user’s search intent for data. As a proof-of-concept, the framework is applied to oceanographic data discovery. It is demonstrated that the proposed framework accurately captures a user’s intent.
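Two of the four components named in this abstract, temporal parsing and semantic query expansion, can be illustrated with a small sketch. Nothing here comes from the paper's implementation: the year-matching regular expression, the function names, and the one-entry synonym table are assumptions, and the paper mines its synonyms from usage data and metadata rather than hard-coding them.

```python
import re
from typing import Dict, List, Optional, Tuple


def parse_temporal_range(query: str) -> Optional[Tuple[int, int]]:
    """Return (earliest, latest) four-digit year found in the query, or None."""
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", query)]
    if not years:
        return None
    return (min(years), max(years))


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the original query followed by expansions of any known phrase.

    Phrases are matched on word boundaries so that a short acronym does not
    fire inside a longer word.
    """
    lowered = query.lower()
    expansions = []
    for phrase, alternatives in synonyms.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            expansions.extend(alternatives)
    return [query] + expansions


# Hypothetical synonym table for illustration only.
synonyms = {"sst": ["sea surface temperature"]}

query = "SST Pacific 2010 to 2015"
print(parse_temporal_range(query))
print(expand_query(query, synonyms))
```

A real system would also need to handle months, relative dates, and place names mapped to bounding boxes, none of which this sketch attempts.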


Towards eXplainable AI in Text Features Engineering for Concept Recognition

Statistical Language and Speech Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-59430-5_10 ◽

2020 ◽

pp. 122-133

Author(s):

Andreas Waldis ◽

Luca Mazzola ◽

Alexander Denzler

Keyword(s):

Concept Recognition ◽

Explainable Ai ◽

Text Features

