Automatic concept recognition using the Human Phenotype Ontology reference and test suite corpora

Database ◽  
2015 ◽  
Vol 2015 (0) ◽  
pp. bav005-bav005 ◽  
Author(s):  
T. Groza ◽  
S. Kohler ◽  
S. Doelken ◽  
N. Collier ◽  
A. Oellrich ◽  
...  
Author(s):  
Ling Luo ◽  
Shankai Yan ◽  
Po-Ting Lai ◽  
Daniel Veltri ◽  
Andrew Oler ◽  
...  

Abstract
Motivation: Automatic phenotype concept recognition from unstructured text remains a challenging task in biomedical text mining research. Previous work on the task typically uses dictionary-based matching methods, which can achieve high precision but suffer from lower recall. Recently, machine learning-based methods have been proposed to identify biomedical concepts; through automatic feature learning, they can recognize more unseen concept synonyms. However, most of these methods require large corpora of manually annotated data for model training, which are difficult to obtain due to the high cost of human annotation.
Results: In this article, we propose PhenoTagger, a hybrid method that combines dictionary and machine learning-based methods to recognize Human Phenotype Ontology (HPO) concepts in unstructured biomedical text. We first use all concepts and synonyms in HPO to construct a dictionary, which is then used to automatically build a distantly supervised training dataset for machine learning. Next, a cutting-edge deep learning model is trained to classify each candidate phrase (an n-gram from the input sentence) into a corresponding concept label. Finally, the dictionary and machine learning-based prediction results are combined for improved performance. Our method is validated on two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods. In addition, to demonstrate the generalizability of our method and to investigate the effect of training on a different ontology, we retrained PhenoTagger on the disease ontology MEDIC for disease concept recognition. Experimental results on the NCBI disease corpus show that PhenoTagger, without requiring manually annotated training data, achieves performance competitive with state-of-the-art supervised methods.
Availability and implementation: The source code, API information and data for PhenoTagger are freely available at https://github.com/ncbi-nlp/PhenoTagger.
Supplementary information: Supplementary data are available at Bioinformatics online.
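
The dictionary and candidate-phrase steps described in the abstract can be illustrated with a minimal Python sketch. It is not the PhenoTagger implementation: the dictionary entries, the `TOY:` concept IDs, the function names, the score threshold and the merge rule (dictionary hits take precedence over model hits) are all assumptions made for illustration, and the machine learning model is replaced by a hand-written list of scored predictions.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: lowercase surface form -> placeholder concept ID.
# A real dictionary would hold every HPO label and synonym with its HPO ID.
TOY_DICTIONARY: Dict[str, str] = {
    "short stature": "TOY:0001",
    "seizure": "TOY:0002",
}


def candidate_ngrams(tokens: List[str], max_n: int = 3) -> List[Tuple[int, int, str]]:
    """Return (start, end, phrase) for every n-gram of up to max_n tokens."""
    candidates = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_n, len(tokens)) + 1):
            candidates.append((start, end, " ".join(tokens[start:end])))
    return candidates


def dictionary_tag(sentence: str, dictionary: Dict[str, str],
                   max_n: int = 3) -> List[Tuple[str, str]]:
    """Exact-match each candidate n-gram against the dictionary."""
    tokens = sentence.lower().split()  # naive whitespace tokenization (assumption)
    hits = []
    for _, _, phrase in candidate_ngrams(tokens, max_n):
        concept = dictionary.get(phrase)
        if concept is not None:
            hits.append((phrase, concept))
    return hits


def combine(dict_hits: List[Tuple[str, str]],
            model_hits: List[Tuple[str, str, float]],
            threshold: float = 0.5) -> Dict[str, str]:
    """Merge both sources: keep model hits scoring at least `threshold`,
    then let dictionary hits overwrite them (an assumed merge rule)."""
    merged = {phrase: concept
              for phrase, concept, score in model_hits if score >= threshold}
    merged.update(dict(dict_hits))
    return merged
```

In this sketch, `dictionary_tag("The patient has short stature and seizure", TOY_DICTIONARY)` finds both dictionary phrases, while a synonym absent from the dictionary (for example a hypothetical model prediction for "small height") would reach the output only through the `model_hits` argument of `combine`. That is the recall gap the abstract attributes to purely dictionary-based matching.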


2021 ◽  
Vol 132 ◽  
pp. S149
Author(s):  
Anne Slavotinek ◽  
Hannah Prasad ◽  
Hannah Hoban ◽  
Tiffany Yip ◽  
Shannon Rego ◽  
...  

2013 ◽  
Vol 6 ◽  
pp. BII.S10729
Author(s):  
Tudor Groza ◽  
Jane Hunter ◽  
Andreas Zankl

Over the course of the last few years there has been a significant amount of research performed on ontology-based formalization of phenotype descriptions. The intrinsic value and knowledge captured within such descriptions can only be expressed by taking advantage of their inner structure that implicitly combines qualities and anatomical entities. We present a meta-model (the Phenotype Fragment Ontology) and a processing pipeline that enable together the automatic decomposition and conceptualization of phenotype descriptions for the human skeletal phenome. We use this approach to showcase the usefulness of the generic concept of phenotype decomposition by performing an experimental study on all skeletal phenotype concepts defined in the Human Phenotype Ontology.
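
To make the idea of decomposing a phenotype description into a quality and an anatomical entity concrete, here is a minimal Python sketch. It is not the Phenotype Fragment Ontology or the authors' pipeline: the two word lists, the `Fragment` type and the `decompose` function are hypothetical, and a real system would draw on ontologies rather than hand-written sets.

```python
from typing import NamedTuple, Optional

# Hypothetical toy lexicons; a real pipeline would look terms up in ontologies.
QUALITIES = {"short", "long", "broad", "narrow", "absent"}
ENTITIES = {"femur", "tibia", "rib", "ribs", "thumb"}


class Fragment(NamedTuple):
    quality: Optional[str]  # e.g. "short"
    entity: Optional[str]   # e.g. "femur"


def decompose(description: str) -> Fragment:
    """Pick the first quality word and the first anatomical-entity word."""
    quality = entity = None
    for token in description.lower().split():
        if quality is None and token in QUALITIES:
            quality = token
        elif entity is None and token in ENTITIES:
            entity = token
    return Fragment(quality, entity)
```

Under these assumptions, `decompose("Short femur")` yields the quality `short` and the entity `femur`, whereas a single-word label such as "Scoliosis" yields neither, which shows why word-list lookup alone is not enough and a richer meta-model is needed.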


2015 ◽  
Vol 97 (1) ◽  
pp. 111-124 ◽  
Author(s):  
Tudor Groza ◽  
Sebastian Köhler ◽  
Dawid Moldenhauer ◽  
Nicole Vasilevsky ◽  
Gareth Baynam ◽  
...  
