Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction

2021 · Vol 4 (1) · Author(s): Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, Degui Zhi

Abstract: Deep learning (DL)-based predictive models built from electronic health records (EHRs) deliver impressive performance on many clinical tasks. However, these models often require large training cohorts to achieve high accuracy, hindering their adoption in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous success in the natural language processing domain. Pretraining BERT on a very large corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. Inspired by BERT, we propose Med-BERT, which adapts the BERT framework, originally developed for the text domain, to the structured EHR domain. Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients. Fine-tuning experiments showed that Med-BERT substantially improves prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 1.21–6.14% on two disease prediction tasks from two clinical databases. In particular, the pretrained Med-BERT performs well on tasks with small fine-tuning training sets: compared with deep learning models without Med-BERT, it can boost the AUC by more than 20% or match the AUC of a model trained on a set ten times larger. We believe Med-BERT will benefit disease prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence-aided healthcare.
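The abstract does not include code; the following is a minimal PyTorch sketch of the fine-tuning setup it describes: a classification head placed on top of a pretrained Med-BERT-style encoder, trained for binary disease prediction. The `pretrained_encoder` object, its output shape, the first-token pooling, and the `train_loader` are all assumptions for illustration, not the authors' released implementation.

    import torch
    import torch.nn as nn

    class DiseasePredictionHead(nn.Module):
        """Binary disease-prediction head on a pretrained encoder (sketch)."""
        def __init__(self, encoder: nn.Module, hidden_size: int = 768):
            super().__init__()
            self.encoder = encoder                      # hypothetical Med-BERT-style encoder
            self.classifier = nn.Linear(hidden_size, 1)

        def forward(self, code_ids, attention_mask):
            # encoder is assumed to return one contextualized embedding per code: (B, T, H)
            hidden = self.encoder(code_ids, attention_mask)
            pooled = hidden[:, 0]                       # first-token pooling (an assumption)
            return self.classifier(pooled).squeeze(-1)  # (B,) logits

    model = DiseasePredictionHead(pretrained_encoder)   # pretrained_encoder: hypothetical
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    loss_fn = nn.BCEWithLogitsLoss()
    for code_ids, mask, labels in train_loader:         # train_loader: hypothetical DataLoader
        logits = model(code_ids, mask)
        loss = loss_fn(logits, labels.float())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Whether to fully fine-tune the encoder or freeze most of its layers is a standard trade-off; with very small fine-tuning cohorts, freezing more of the pretrained encoder is often the safer choice.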

2019 · Vol 182 · pp. 105055 · Author(s): Binh P. Nguyen, Hung N. Pham, Hop Tran, Nhung Nghiem, Quang H. Nguyen, et al.

Author(s): Milica Milutinovic, Bart De Decker

Electronic Health Records (EHRs) are becoming the ubiquitous technology for managing patient records in many countries, allowing patient data to be transferred and analyzed more easily and at large scale. However, privacy concerns linked to this technology are emerging: patients rarely fully understand how their EHRs are managed, and the records are not necessarily stored within the organization where the patient receives healthcare. Storage may be delegated to a remote provider, and it is not always clear which health-provisioning entities have access to the data. In this chapter, the authors therefore propose an alternative in which users keep and manage their records in their existing eHealth systems. The approach is user-centric and gives patients better control over their data, while still allowing special measures to be taken in emergency situations so that the patient receives the required care.
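The chapter is described only at an architectural level; as a loose illustration of the "patient-managed grants plus audited emergency override" idea, here is a hypothetical break-glass access check. Every name and structure below (Record, audit_log, can_access) is invented for illustration and does not come from the chapter.

    from dataclasses import dataclass, field

    @dataclass
    class Record:
        patient_id: str
        granted_to: set = field(default_factory=set)   # grants managed by the patient

    audit_log = []                                     # reviewed after any emergency access

    def can_access(record: Record, requester: str, emergency: bool = False) -> bool:
        if requester in record.granted_to:             # normal, patient-authorized access
            return True
        if emergency:
            # break-glass override: care is never blocked, but the access is always logged
            audit_log.append((requester, record.patient_id, "EMERGENCY_ACCESS"))
            return True
        return False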


2019 · Vol 97 · pp. 103256 · Author(s): Awais Ashfaq, Anita Sant’Anna, Markus Lingman, Sławomir Nowaczyk

Database · 2019 · Vol 2019 · Author(s): Tao Chen, Mingfen Wu, Hexi Li

Abstract: The automatic extraction of meaningful relations from biomedical literature or clinical records is crucial in various biomedical applications. Most current deep learning approaches to medical relation extraction require large-scale training data to prevent the model from overfitting. We propose using a pre-trained model and a fine-tuning technique to improve these approaches without additional time-consuming human labeling. First, we describe the architecture of Bidirectional Encoder Representations from Transformers (BERT), an approach for pre-training a model on large-scale unstructured text. We then combine BERT with a one-dimensional convolutional neural network (1d-CNN) and fine-tune the pre-trained model for relation extraction. Extensive experiments on three datasets, namely the BioCreative V chemical-disease relation corpus, the traditional Chinese medicine literature corpus, and the i2b2 2012 temporal relation challenge corpus, show that the proposed approach achieves state-of-the-art results (relative improvements of 22.2%, 7.77%, and 38.5% in F1 score, respectively, over a traditional 1d-CNN classifier). The source code is available at https://github.com/chentao1999/MedicalRelationExtraction.
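As a rough illustration of the BERT + 1d-CNN combination described above (the authors' actual code is at the GitHub link), the sketch below feeds BERT's contextualized token embeddings through a one-dimensional convolution and global max-pooling before a linear relation classifier. The checkpoint name, kernel size, filter count, and label-set size are assumptions, not values taken from the paper.

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class BertCnnRelationClassifier(nn.Module):
        """BERT embeddings -> 1-D convolution -> max pool -> relation logits (sketch)."""
        def __init__(self, n_relations: int, n_filters: int = 128, kernel: int = 3):
            super().__init__()
            self.bert = AutoModel.from_pretrained("bert-base-uncased")  # checkpoint is an assumption
            self.conv = nn.Conv1d(self.bert.config.hidden_size, n_filters, kernel)
            self.classifier = nn.Linear(n_filters, n_relations)

        def forward(self, input_ids, attention_mask):
            # contextualized token embeddings: (B, T, H)
            hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
            feats = torch.relu(self.conv(hidden.transpose(1, 2)))  # (B, F, T - kernel + 1)
            pooled = feats.max(dim=-1).values                      # global max pool over time
            return self.classifier(pooled)                         # (B, n_relations) logits

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    batch = tokenizer(["Aspirin induces gastric ulcers in some patients."],
                      return_tensors="pt", padding=True)
    model = BertCnnRelationClassifier(n_relations=5)               # label count is an assumption
    logits = model(batch["input_ids"], batch["attention_mask"])

Max-pooling over the convolved sequence yields a fixed-size feature vector regardless of sentence length, which is the usual motivation for stacking a CNN on top of variable-length BERT outputs.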


2020 · Vol 101 · pp. 103337 · Author(s): Jose Roberto Ayala Solares, Francesca Elisa Diletta Raimondi, Yajie Zhu, Fatemeh Rahimian, Dexter Canoy, et al.
