Improving Broad-Coverage Medical Entity Linking with Semantic Type Prediction and Large-Scale Datasets

2021 ◽  
pp. 103880
Author(s):  
Shikhar Vashishth ◽  
Denis Newman-Griffis ◽  
Rishabh Joshi ◽  
Ritam Dutt ◽  
Carolyn P. Rosé
2021 ◽  
Vol 21 (S9) ◽  
Author(s):  
Cheng Yan ◽  
Yuanzhe Zhang ◽  
Kang Liu ◽  
Jun Zhao ◽  
Yafei Shi ◽  
...  

Abstract
Background: Many medical mentions can be extracted from large volumes of medical text. To make use of these mentions, a prerequisite step is to link them to a medical domain knowledge base (KB). Linking mentions to a well-defined, unambiguous KB is a necessary part of downstream applications such as disease diagnosis and drug prescription. The need is even greater in colloquial and informal settings such as online medical consultation, where the medical language is more casual and vague. In this article, we propose an unsupervised method for linking Chinese medical symptom mentions to the ICD10 classification in a colloquial setting.
Methods: We propose an unsupervised entity linking model using multi-instance learning (MIL). Our approach builds on a basic unsupervised entity linking method (named BEL), an embedding-similarity-based EL model, and uses the MIL training paradigm to boost BEL's performance. First, we construct a dataset from a large unlabeled Chinese medical consultation corpus with the help of BEL. We then use a variety of encoders to obtain representations of mention contexts and ICD10 entities, and feed these representations into a ranking network that scores candidate entities.
Results: We evaluate the proposed model on a test dataset annotated by professional doctors. Our method achieves 60.34% accuracy, exceeding the baseline BEL by 1.72 percentage points.
Conclusions: We propose an unsupervised, MIL-trained entity linking method for the medical domain and annotate a test set for evaluation. The experimental results show that our model outperforms the baseline BEL and offers insights for future research.
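The embedding-similarity baseline (BEL) described above can be sketched minimally: embed the mention context and each candidate entity, then rank candidates by cosine similarity. The function names, IDs, and vectors below are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def link_mention(mention_vec, entity_vecs, entity_ids):
    """Return the entity ID whose embedding is most cosine-similar
    to the mention-context embedding, plus its similarity score."""
    m = mention_vec / np.linalg.norm(mention_vec)
    e = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    scores = e @ m  # cosine similarity against every candidate
    best = int(np.argmax(scores))
    return entity_ids[best], float(scores[best])

# Toy example: two ICD10-style candidates; the mention vector
# lies close to the first candidate's embedding.
entities = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
ids = ["A00", "B01"]
best_id, score = link_mention(np.array([0.9, 0.1, 0.0]), entities, ids)
```

In the paper's full model, this similarity score is replaced by a learned ranking network, with MIL supplying the (noisy) training signal.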


Author(s):  
Greg Durrett ◽  
Dan Klein

We present a joint model of three core tasks in the entity analysis stack: coreference resolution (within-document clustering), named entity recognition (coarse semantic typing), and entity linking (matching to Wikipedia entities). Our model is formally a structured conditional random field. Unary factors encode local features from strong baselines for each task. We then add binary and ternary factors to capture cross-task interactions, such as the constraint that coreferent mentions have the same semantic type. On the ACE 2005 and OntoNotes datasets, we achieve state-of-the-art results for all three tasks. Moreover, joint modeling improves performance on each task over strong independent baselines.


2019 ◽  
Vol 76 (2) ◽  
pp. 948-963 ◽  
Author(s):  
Yingchun Xia ◽  
Xingyue Wang ◽  
Lichuan Gu ◽  
Qijuan Gao ◽  
Jun Jiao ◽  
...  

2019 ◽  
Vol 1 (1) ◽  
pp. 77-98 ◽  
Author(s):  
Hailong Jin ◽  
Chengjiang Li ◽  
Jing Zhang ◽  
Lei Hou ◽  
Juanzi Li ◽  
...  

Knowledge bases (KBs) are often greatly incomplete, creating a demand for KB completion. Although XLORE is an English-Chinese bilingual knowledge graph, it contains only 423,974 cross-lingual links between English and Chinese instances. We present XLORE2, an extension of XLORE built automatically from Wikipedia, Baidu Baike and Hudong Baike. We add more facts by performing cross-lingual knowledge linking, cross-lingual property matching and fine-grained type inference. We also design an entity linking system to demonstrate the effectiveness and broad coverage of XLORE2.


2016 ◽  
Vol 26 ◽  
pp. 641 ◽  
Author(s):  
Aaron Steven White ◽  
Kyle Rawlins

We develop a probabilistic model of S(emantic)-selection that encodes both the notion of systematic mappings from semantic type signature to syntactic distribution—i.e., projection rules—and the notion of selectional noise—e.g., C(ategory)-selection, L(exical)-selection, and/or other independent syntactic processes. We train this model on data from a large-scale judgment study assessing the acceptability of 1,000 English clause-taking verbs in 50 distinct syntactic frames, finding that this model infers coherent semantic type signatures. We focus in on type signatures relevant to interrogative and declarative selection, arguing that our results suggest a principled split between cognitive verbs, which select distinct proposition and question types, and communicative verbs, which select a single hybrid type.


2019 ◽  
Author(s):  
Ishani Mondal ◽  
Sukannya Purkayastha ◽  
Sudeshna Sarkar ◽  
Pawan Goyal ◽  
Jitesh Pillai ◽  
...  

2020 ◽  
Vol 34 (05) ◽  
pp. 9757-9764
Author(s):  
Ming Zhu ◽  
Busra Celikkaya ◽  
Parminder Bhatia ◽  
Chandan K. Reddy

Entity linking is the task of linking mentions of named entities in natural language text to entities in a curated knowledge base. This is of significant importance in the biomedical domain, where it can be used to semantically annotate a large volume of clinical records and biomedical literature with standardized concepts described in an ontology such as the Unified Medical Language System (UMLS). We observe that with precise type information, entity disambiguation becomes a straightforward task. However, fine-grained type information is usually not available in the biomedical domain. We therefore propose LATTE, a LATent Type Entity linking model that improves entity linking by modeling latent fine-grained type information about mentions and entities. Unlike previous methods that perform entity linking directly between mentions and entities, LATTE jointly performs entity disambiguation and latent fine-grained type learning, without direct supervision. We evaluate our model on two biomedical datasets: MedMentions, a large-scale public dataset annotated with UMLS concepts, and a de-identified corpus of dictated doctor's notes annotated with ICD concepts. Extensive experimental evaluation shows that our model achieves significant performance improvements over several state-of-the-art techniques.
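The core idea above can be illustrated with a toy scoring function in the spirit of LATTE (not the authors' implementation): each candidate's score blends surface-embedding similarity with agreement between the mention's and the entity's inferred type distributions. All names, logits, and the blending weight are hypothetical.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def score_candidates(mention_vec, mention_type_logits,
                     cand_vecs, cand_type_logits, alpha=0.5):
    """Blend embedding similarity with latent-type agreement
    for each candidate entity; higher scores are better."""
    m = mention_vec / np.linalg.norm(mention_vec)
    c = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    sim = c @ m  # cosine similarity of surface embeddings
    m_type = softmax(mention_type_logits)
    # Agreement between the mention's and each candidate's type distribution.
    type_agree = np.array([softmax(t) @ m_type for t in cand_type_logits])
    return alpha * sim + (1 - alpha) * type_agree

# Toy example: candidate 0 matches the mention in both embedding and type.
scores = score_candidates(np.array([1.0, 0.0]), np.array([2.0, 0.0]),
                          np.array([[1.0, 0.0], [0.0, 1.0]]),
                          np.array([[2.0, 0.0], [0.0, 2.0]]))
best = int(np.argmax(scores))
```

In LATTE itself the type distributions are latent and learned jointly with disambiguation rather than supplied as inputs; this sketch only shows why type agreement helps break ties between surface-similar candidates.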

