Utilizing BERT for biomedical and clinical text mining

Author(s):  
Runjie Zhu ◽  
Xinhui Tu ◽  
Jimmy Xiangji Huang
Keyword(s):  
Author(s):  
Bethany Percha

Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2014 ◽  
Vol 32 (30_suppl) ◽  
pp. 183-183
Author(s):  
Suzanne Tamang ◽  
Manali I. Patel ◽  
Sam Finlayson ◽  
Xuemei Chen ◽  
Julie Lawrence Kuznetsov ◽  
...  

183 Background: Unplanned care can result in poor outcomes that are potentially preventable. The design of effective interventions to improve outcomes for cancer patients requires a better understanding of the true nature of unplanned care. Although cancer care teams document each patient’s care trajectory in detailed free-text notes, care outcomes are typically measured from structured patient record data, which do not contain key information necessary for quality improvement efforts, such as the etiology of emergent events or events that occur at outside facilities. To inform clinical effectiveness work at Stanford’s Cancer Institute, we describe our application of text-mining to improve the assessment of post-diagnosis morbidity outcomes. Methods: We conducted a retrospective study of unplanned care among 3,318 patients with a new diagnosis of breast, gastrointestinal, or thoracic cancer during 2010-13. Using a validated framework for clinical text-mining, we analyzed 308,000 notes for two tasks. First, we extracted information on external unplanned events documented by providers. Second, we profiled symptom mentions in Emergency Department (ED) notes. Results: For all cancer patients, text-mining detected over 400 unplanned events (93% PPV) at outside facilities, resulting in patient rates of 5% in the first 30 days, and 11% up to one year post-diagnosis. Among breast cancer patients, the top three symptoms reported in ED notes were pain (89%), nausea (37%) and fever (18%). Pain was consistently the most prevalent symptom up to one year after diagnosis, whereas other symptoms exhibited more dynamic trends: wound-related disorders and nausea were more prevalent among ED admissions in the first three months, while fever, cognitive impairment and mental health issues became more prevalent among admissions after the first three months of cancer care.
Conclusions: The application of text-mining methods can improve the quantification of morbidity outcomes by improving the estimation of unplanned care rates and by providing continued learning for symptom-driven interventions to mitigate preventable emergent care. Although additional information gaps in care trajectories may continue to exist, text-mining can aid in assessing the true nature of unplanned care.


Author(s):  
Luke T Slater ◽  
William Bradlow ◽  
Simon Ball ◽  
Robert Hoehndorf ◽  
Georgios V Gkoutos

Background: Biomedical ontologies contain a wealth of metadata that constitutes a fundamental infrastructural resource for text mining. For several reasons, redundancies exist in the ontology ecosystem, which lead to the same concepts being described by several terms in the same or similar contexts across several ontologies. While these terms describe the same concepts, they contain different sets of complementary metadata. Linking these definitions to make use of their combined metadata could lead to improved performance in ontology-based information retrieval, extraction, and analysis tasks. Results: We develop and present an algorithm that expands the set of labels associated with an ontology class using a combination of strict lexical matching and cross-ontology reasoner-enabled equivalency queries. Across all disease terms in the Disease Ontology, the approach found 51,362 additional labels, more than tripling the number defined by the ontology itself. Manual validation by a clinical expert on a random sampling of expanded synonyms over the Human Phenotype Ontology yielded a precision of 0.912. Furthermore, we found that annotating patient visits in MIMIC-III with an extended set of Disease Ontology labels led to a semantic similarity score derived from those labels being a significantly better predictor of matching first diagnosis, with a mean average precision of 0.88 for the unexpanded set of annotations, and 0.913 for the expanded set. Conclusions: Inter-ontology synonym expansion can lead to a vast increase in the scale of vocabulary available for text mining applications. While the accuracy of the extended vocabulary is not perfect, it nevertheless led to a significantly improved ontology-based characterisation of patients from text in one setting. Furthermore, where run-on error is not acceptable, the technique can be used to provide candidate synonyms which can be checked by a domain expert.
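
As a rough illustration of the strict lexical matching step only, the following sketch expands the labels of classes in one ontology with labels from classes in other ontologies that share an identical (case-folded, whitespace-normalised) label. It is not the authors' implementation: the function names, the dictionary representation of an ontology, and the normalisation rule are assumptions made for this example, and the reasoner-enabled equivalency queries described in the abstract are not covered.

```python
from collections import defaultdict


def normalise(label: str) -> str:
    """Hypothetical strict lexical key: case-fold and collapse whitespace."""
    return " ".join(label.casefold().split())


def expand_labels(target, others):
    """Return, per class in `target`, labels gained from other ontologies.

    `target` maps class id -> set of labels; `others` is a list of such maps.
    A class in another ontology counts as a match when it shares at least
    one normalised label with the target class (assumed matching rule).
    """
    # Index each class's label set in the other ontologies by normalised label.
    index = defaultdict(list)
    for ontology in others:
        for labels in ontology.values():
            for label in labels:
                index[normalise(label)].append(labels)

    expanded = {}
    for class_id, labels in target.items():
        known = {normalise(label) for label in labels}
        gained = set()
        for key in known:
            for matched_labels in index.get(key, []):
                # Keep only labels the target class does not already have.
                gained.update(
                    label for label in matched_labels
                    if normalise(label) not in known
                )
        expanded[class_id] = gained
    return expanded
```

With toy input such as `expand_labels({"A:1": {"myocardial infarction"}}, [{"B:7": {"Myocardial Infarction", "heart attack"}}])`, the class `A:1` would gain the label `heart attack`. The identifiers here are made up for illustration. Candidate synonyms produced this way would still need the expert review that the abstract recommends where errors are not acceptable.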

