Modified Bidirectional Encoder Representations From Transformers Extractive Summarization Model for Hospital Information Systems Based on Character-Level Tokens (AlphaBERT): Development and Performance Evaluation

10.2196/17787 ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. e17787 ◽  
Author(s):  
Yen-Pin Chen ◽  
Yi-Ying Chen ◽  
Jr-Jiun Lin ◽  
Chien-Hua Huang ◽  
Feipei Lai

Background Doctors must care for many patients simultaneously, and it is time-consuming to find and examine all patients’ medical histories. Discharge diagnoses provide hospital staff with sufficient information to enable handling multiple patients; however, the excessive number of words in the diagnostic sentences poses problems. Deep learning may be an effective solution to overcome this problem, but the use of such a heavy model may also add another obstacle to systems with limited computing resources. Objective We aimed to build a diagnoses-extractive summarization model for hospital information systems and provide a service that can be operated even with limited computing resources. Methods We used a Bidirectional Encoder Representations from Transformers (BERT)-based structure with a two-stage training method based on 258,050 discharge diagnoses obtained from the National Taiwan University Hospital Integrated Medical Database, and the highlighted extractive summaries written by experienced doctors were labeled. The model size was reduced using a character-level token, the number of parameters was decreased from 108,523,714 to 963,496, and the model was pretrained using randomly masked characters in the discharge diagnoses and International Statistical Classification of Diseases and Related Health Problems sets. We then fine-tuned the model using summary labels and cleaned up the prediction results by averaging all probabilities for entire words to prevent character level–induced fragment words. Model performance was evaluated against existing models BERT, BioBERT, and Long Short-Term Memory (LSTM) using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) L score, and a questionnaire website was built to collect feedback from more doctors for each summary proposal.
Results The area under the receiver operating characteristic curve values of the summary proposals were 0.928, 0.941, 0.899, and 0.947 for BERT, BioBERT, LSTM, and the proposed model (AlphaBERT), respectively. The ROUGE-L scores were 0.697, 0.711, 0.648, and 0.693 for BERT, BioBERT, LSTM, and AlphaBERT, respectively. The mean (SD) critique scores from doctors were 2.232 (0.832), 2.134 (0.877), 2.207 (0.844), 1.927 (0.910), and 2.126 (0.874) for reference-by-doctor labels, BERT, BioBERT, LSTM, and AlphaBERT, respectively. Based on the paired t test, there was a statistically significant difference in LSTM compared to the reference (P<.001), BERT (P=.001), BioBERT (P<.001), and AlphaBERT (P=.002), but not in the other models. Conclusions Use of character-level tokens in a BERT model can greatly decrease the model size without significantly reducing performance for diagnoses summarization. A well-developed deep-learning model will enhance doctors’ abilities to manage patients and promote medical studies by providing the capability to use extensive unstructured free-text notes.
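The word-level clean-up step described in the Methods (averaging character probabilities over entire words) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace-based definition of a word, and the 0.5 selection threshold are all assumptions made for illustration.

```python
from typing import List


def average_by_word(text: str, char_probs: List[float]) -> List[float]:
    """Replace each character's probability with the mean over its word.

    Assumption: a "word" is a run of non-whitespace characters.
    Whitespace characters keep their original probability.
    """
    if len(text) != len(char_probs):
        raise ValueError("one probability per character is required")
    smoothed = list(char_probs)
    start = None  # index where the current word began, or None between words
    for i, ch in enumerate(text + " "):  # sentinel space closes the last word
        if not ch.isspace() and start is None:
            start = i
        elif ch.isspace() and start is not None:
            mean = sum(char_probs[start:i]) / (i - start)
            smoothed[start:i] = [mean] * (i - start)
            start = None
    return smoothed


def extract_summary(text: str, char_probs: List[float], threshold: float = 0.5) -> str:
    """Keep whole words whose averaged probability reaches the (assumed) threshold."""
    smoothed = average_by_word(text, char_probs)
    kept, pos = [], 0
    for word in text.split():
        pos = text.index(word, pos)  # locate this word's first character
        if smoothed[pos] >= threshold:
            kept.append(word)
        pos += len(word)
    return " ".join(kept)


# Hypothetical character-level probabilities for a three-word diagnosis.
example_text = "acute MI old"
example_probs = [0.9, 0.8, 0.2, 0.9, 0.7, 0.1, 0.9, 0.7, 0.1, 0.6, 0.1, 0.2]
print(extract_summary(example_text, example_probs))  # acute MI
```

Without the averaging step, the low-probability "u" in "acute" would be dropped on its own and leave a fragment; averaging makes the keep-or-drop decision per word.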


1995 ◽  
Vol 34 (04) ◽  
pp. 378-396 ◽  
Author(s):  
A. Winter ◽  
R. Haux

Abstract: Information processing in hospitals, especially in university hospitals, is currently faced with two major issues: first, low-cost hardware and progress in networking technology lead to a further decentralization of computing capacity; second, due to the increasing need for information processing in hospitals and due to economic restrictions, it is necessary to use commercial software products. This leads to heterogeneous hospital information systems using a variety of software and hardware products, and to a stronger demand for integrating these products and, in general, for a dedicated methodology for the management of hospital information systems to support patient care and medical research. We present a three-level graph-based model (3LGM) to support the systematic management of hospital information systems. 3LGM can serve as a basis for assessing the quality of information processing in hospitals. 3LGM distinguishes between a procedural level for describing the information procedures (and their information interchange) of a hospital information system and thus its functionality, a logical tool level, focusing on application systems and communication links, and a physical tool level with physical subsystems (e.g., computer systems) and data transmission. The examples that are presented have been taken from the Heidelberg University Hospital Information System.
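To make the three levels concrete, the following is a rough sketch of how such a model might be held in memory and queried. It is not the 3LGM meta model itself: the class name, the two inter-level mappings (procedures supported by application systems, application systems hosted on physical subsystems), and the example entries are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ThreeLevelModel:
    # Procedural level -> logical tool level (assumed relation):
    # information procedure -> application systems that support it.
    supported_by: Dict[str, Set[str]] = field(default_factory=dict)
    # Logical tool level -> physical tool level (assumed relation):
    # application system -> physical subsystems it runs on.
    runs_on: Dict[str, Set[str]] = field(default_factory=dict)

    def procedures_affected_by(self, physical_subsystem: str) -> Set[str]:
        """Procedures that depend, via some application system, on a subsystem."""
        apps = {a for a, hosts in self.runs_on.items() if physical_subsystem in hosts}
        return {p for p, used in self.supported_by.items() if used & apps}


# Hypothetical example entries.
model = ThreeLevelModel(
    supported_by={
        "patient admission": {"ADT system"},
        "lab result reporting": {"lab system", "ADT system"},
        "archiving": {"archive system"},
    },
    runs_on={
        "ADT system": {"server-1"},
        "lab system": {"server-2"},
        "archive system": {"server-2"},
    },
)
print(sorted(model.procedures_affected_by("server-1")))
```

A query of this kind (which hospital functions are touched when a given computer system fails) is one example of the quality assessments a layered model is meant to support.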


2021 ◽  
Author(s):  
Joon-myoung Kwon ◽  
Ye Rang Lee ◽  
Min-Seung Jung ◽  
Yoon-Ji Lee ◽  
Yong-Yeon Jo ◽  
...  

Abstract Background: Sepsis is a life-threatening organ dysfunction and is a major healthcare burden worldwide. Although sepsis is a medical emergency that requires immediate management, it is difficult to screen for sepsis. In this study, we propose a deep learning-based model (DLM) for screening sepsis using electrocardiography (ECG). Methods: This retrospective cohort study included 46,017 patients who were admitted to two hospitals. A total of 1,548 and 639 patients had sepsis and septic shock, respectively. The DLM was developed using 73,727 ECGs of 18,142 patients and internal validation was conducted using 7,774 ECGs of 7,774 patients. Furthermore, we conducted an external validation with 20,101 ECGs of 20,101 patients from another hospital to verify the applicability of the DLM across centers. Results: During the internal and external validation, the areas under the receiver operating characteristic curve (AUCs) of the DLM using 12-lead ECG for screening sepsis were 0.901 (95% confidence interval [CI] 0.882–0.920) and 0.863 (95% CI 0.846–0.879), respectively. During internal and external validation, the AUCs of the DLM for detecting septic shock were 0.906 (95% CI = 0.877–0.936) and 0.899 (95% CI = 0.872–0.925), respectively. The AUCs of the DLM for detecting sepsis using 6-lead and single-lead ECGs were 0.845–0.882. A sensitivity map showed that the QRS complex and T wave were associated with sepsis. A subgroup analysis was conducted using ECGs from 4,609 patients who were admitted with infectious disease; the AUC of the DLM for predicting in-hospital mortality was 0.817 (0.793–0.840). There was a significant difference in the prediction score of the DLM using ECG according to the presence of infection in the validation dataset (0.277 vs 0.574, p<0.001), including severe acute respiratory syndrome coronavirus 2 (0.260 vs 0.725, p=0.018). Conclusions: The DLM demonstrated reasonable performance for screening sepsis using 12-, 6-, and single-lead ECG.
The results suggest that sepsis can be screened using not only conventional ECG devices, but also diverse life-type ECG machines employing the DLM, thereby preventing irreversible disease progression and mortality.
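For readers unfamiliar with the AUC figures reported above, the following minimal sketch computes the area under the receiver operating characteristic curve from its probabilistic definition: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name and the toy data are illustrative assumptions and are unrelated to the study's data or software.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a positive outranks a negative.

    labels are 1 (condition present) or 0 (condition absent).
    This pairwise version is O(n_pos * n_neg) and is meant for clarity, not speed.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: two cases without and two with the condition.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.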


2021 ◽  
pp. 20200611
Author(s):  
Masako Nishiyama ◽  
Kenichiro Ishibashi ◽  
Yoshiko Ariji ◽  
Motoki Fukuda ◽  
Wataru Nishiyama ◽  
...  

Objective: The present study aimed to verify the classification performance of deep learning (DL) models for diagnosing fractures of the mandibular condyle on panoramic radiographs using data sets from two hospitals and to compare their internal and external validities. Methods: Panoramic radiographs of 100 condyles with and without fractures were collected from two hospitals and a fivefold cross-validation method was employed to construct and evaluate the DL models. The internal and external validities of classification performance were evaluated as accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Results: For internal validity, high classification performance was obtained, with AUC values of >0.85. Conversely, external validity for the data sets from the two hospitals exhibited low performance. Using combined data sets from both hospitals, the DL model exhibited high performance, which was slightly superior or equal to that of the internal validity but without a statistically significant difference. Conclusion: The constructed DL model can be clinically employed for diagnosing fractures of the mandibular condyle using panoramic radiographs. However, the domain shift phenomenon should be considered when generalizing DL systems.
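The accuracy, sensitivity, and specificity named in the Methods can be derived from a confusion matrix as in the following sketch. The function name and the toy labels are illustrative assumptions; the abstract does not describe the software used.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = fracture)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fractures detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fractures missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # intact, called intact
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # intact, called fracture
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy example with three fractured and two intact condyles.
print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Computing these on a held-out set from the same hospital gives the internal validity; computing them on data from the other hospital gives the external validity that the abstract reports as lower.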


2009 ◽  
Vol 48 (06) ◽  
pp. 531-539 ◽  
Author(s):  
L. Ißler ◽  
A. Winter ◽  
K. Takabayashi ◽  
F. Jahn

Summary Objectives: To examine the architectural differences and similarities of a Japanese and German hospital information system (HIS) in a case study. This cross-cultural comparison, which focuses on structural quality characteristics, offers the chance to get new insights into different HIS architectures, which possibly cannot be obtained by inner-country comparisons. Methods: A reference model for the domain layer of hospital information systems containing the typical enterprise functions of a hospital provides the basis of comparison for the two different hospital information systems. 3LGM2 models, which describe the two HISs and which are based on that reference model, are used to assess several structural quality criteria. Four of these criteria are introduced in detail. Results: The two examined HISs are different in terms of the four structural quality criteria examined. Whereas the centralized architecture of the hospital information system at Chiba University Hospital causes only few functional redundancies and leads to a low implementation of communication standards, the hospital information system at the University Hospital of Leipzig, having a decentralized architecture, exhibits more functional redundancies and a higher use of communication standards. Conclusions: Using a model-based comparison, it was possible to detect remarkable differences between the observed hospital information systems of completely different cultural areas. However, the usability of 3LGM2 models for comparisons has to be improved in order to apply key figures and to assess or benchmark the structural quality of health information systems architectures more thoroughly.


Diagnostics ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1812
Author(s):  
Joseph Bae ◽  
Saarthak Kapse ◽  
Gagandeep Singh ◽  
Rishabh Gattu ◽  
Syed Ali ◽  
...  

In this study, we aimed to predict mechanical ventilation requirement and mortality using computational modeling of chest radiographs (CXRs) for coronavirus disease 2019 (COVID-19) patients. This two-center, retrospective study analyzed 530 deidentified CXRs from 515 COVID-19 patients treated at Stony Brook University Hospital and Newark Beth Israel Medical Center between March and August 2020. Linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and random forest (RF) machine learning classifiers to predict mechanical ventilation requirement and mortality were trained and evaluated using radiomic features extracted from patients’ CXRs. Deep learning (DL) approaches were also explored for the clinical outcome prediction task and a novel radiomic embedding framework was introduced. All results are compared against radiologist grading of CXRs (zone-wise expert severity scores). Radiomic classification models had mean area under the receiver operating characteristic curve (mAUCs) of 0.78 ± 0.05 (sensitivity = 0.72 ± 0.07, specificity = 0.72 ± 0.06) and 0.78 ± 0.06 (sensitivity = 0.70 ± 0.09, specificity = 0.73 ± 0.09), compared with expert scores mAUCs of 0.75 ± 0.02 (sensitivity = 0.67 ± 0.08, specificity = 0.69 ± 0.07) and 0.79 ± 0.05 (sensitivity = 0.69 ± 0.08, specificity = 0.76 ± 0.08) for mechanical ventilation requirement and mortality prediction, respectively. Classifiers using both expert severity scores and radiomic features for mechanical ventilation (mAUC = 0.79 ± 0.04, sensitivity = 0.71 ± 0.06, specificity = 0.71 ± 0.08) and mortality (mAUC = 0.83 ± 0.04, sensitivity = 0.79 ± 0.07, specificity = 0.74 ± 0.09) demonstrated improvement over either artificial intelligence or radiologist interpretation alone. Our results also suggest instances in which the inclusion of radiomic features in DL improves model predictions over DL alone. 
The models proposed in this study and the prognostic information they provide might aid physician decision making and efficient resource allocation during the COVID-19 pandemic.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yuta Nakamura ◽  
Shouhei Hanaoka ◽  
Yukihiro Nomura ◽  
Takahiro Nakao ◽  
Soichiro Miki ◽  
...  

Abstract Background It is essential for radiologists to communicate actionable findings to the referring clinicians reliably. Natural language processing (NLP) has been shown to help identify free-text radiology reports including actionable findings. However, the application of recent deep learning techniques to radiology reports, which can improve the detection performance, has not been thoroughly examined. Moreover, free-text that clinicians input in the ordering form (order information) has seldom been used to identify actionable reports. This study aims to evaluate the benefits of two new approaches: (1) bidirectional encoder representations from transformers (BERT), a recent deep learning architecture in NLP, and (2) using order information in addition to radiology reports. Methods We performed a binary classification to distinguish actionable reports (i.e., radiology reports tagged as actionable in actual radiological practice) from non-actionable ones (those without an actionable tag). 90,923 Japanese radiology reports in our hospital were used, of which 788 (0.87%) were actionable. We evaluated four methods, statistical machine learning with logistic regression (LR) and with gradient boosting decision tree (GBDT), and deep learning with a bidirectional long short-term memory (LSTM) model and a publicly available Japanese BERT model. Each method was used with two different inputs, radiology reports alone and pairs of order information and radiology reports. Thus, eight experiments were conducted to examine the performance. Results Without order information, BERT achieved the highest area under the precision-recall curve (AUPRC) of 0.5138, which showed a statistically significant improvement over LR, GBDT, and LSTM, and the highest area under the receiver operating characteristic curve (AUROC) of 0.9516. 
Simply coupling the order information with the radiology reports slightly increased the AUPRC of BERT but did not lead to a statistically significant improvement. This may be due to the complexity of clinical decisions made by radiologists. Conclusions BERT was assumed to be useful to detect actionable reports. More sophisticated methods are required to use order information effectively.
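The AUPRC value is easier to interpret next to the class imbalance stated in the Methods. The sketch below computes average precision, one common estimator of AUPRC (the abstract does not state which estimator was used, so this choice is an assumption), and the prevalence that a random ranking would be expected to achieve. The function name and toy scores are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the precision values at each rank where a positive is retrieved.

    Items are ranked by descending score; tied scores keep their input order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos


# Prevalence from the abstract: 788 actionable reports out of 90,923.
prevalence = 788 / 90923
print(round(prevalence, 4))  # 0.0087

# Toy ranking: positives at ranks 1 and 3.
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Because a random ranking yields an expected AUPRC close to the prevalence (about 0.0087 here), an AUPRC of 0.5138 represents a large gain even though it looks modest beside an AUROC of 0.9516.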


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
S Ueno ◽  
M Ito ◽  
K Uchiyama ◽  
T Okimura ◽  
A Yabuuchi ◽  
...  

Abstract Study question How does the performance of an automated embryo scoring system for pregnancy prediction after single-vitrified blastocyst transfer (SVBT) compare to that of other, annotation-dependent blastocyst grading systems? Summary answer Automatic embryo ranking by iDAScore shows a higher or equal performance, with regard to pregnancy prediction after SVBT, compared to manual, annotation-dependent grading systems. What is known already Blastocyst viability can be assessed by blastocyst morphology grades and/or morphokinetic parameters. However, morphological and morphokinetic embryo assessment is prone to both inter- and intra-observer variation. Recently, embryo ranking models have been developed based on artificial intelligence (AI) and deep learning. Such models rank embryos according to their potential for pregnancy only based on images and do not require any user-dependent annotation. So far, no study has independently assessed the performance of AI models compared to other embryo scoring models, including traditional morphological grading. Study design, size, duration A total of 3,014 SVBT cycles were retrospectively analysed. Embryos were stratified according to SART age groups. The quality and scoring of embryos were assessed by iDAScore v1.0 (iDAS, Vitrolife, Sweden), KIDScore™ D5 v3 (KS; Vitrolife), and Gardner criteria. The performance of the pregnancy prediction for each embryo scoring model was compared using the area under curve (AUC) of the receiver operating characteristic curve for each maternal age group. Participants/materials, setting, methods Embryos were cultured in the EmbryoScope+ and EmbryoScopeFlex (Vitrolife). iDAS was automatically calculated using the iDAScore model running on the EmbryoViewer (Vitrolife). KS was calculated in EmbryoViewer after annotation of the required parameters. ICM and TE were annotated according to the Gardner criteria. The degree of expansion in all blastocysts was Grade 4 due to our freezing policy.
Furthermore, Gardner’s scores were stratified into four grades (Excellent: AA, Good: AB BA, Fair: BB, Poor: others). Main results and the role of chance The AUCs of the < 35 years age group (n = 389) for pregnancy prediction were 0.72 for iDAS, 0.66 for KS and 0.64 for Gardner criteria. The AUC of iDAS was significantly higher (P < 0.05) compared to the other two models. For the 35–37 years age group (n = 514) the AUCs were 0.68, 0.68, and 0.65 for iDAS, KS and Gardner, respectively, and were not significantly different. The AUCs of the 38–40 years age group (n = 796) were 0.67 for iDAS, 0.65 for KS and 0.64 for Gardner criteria and were not significantly different. The AUCs of the 41–42 years age group (n = 636) were 0.66, 0.66, and 0.63 for iDAS, KS and Gardner, respectively, and there was no significant difference among the pregnancy prediction models. For the > 42 years age group (n = 389) AUCs were 0.76 for iDAS, 0.75 for KS and 0.75 for Gardner criteria and were not significantly different. Thus, for all age groups, iDAS was either highest or equal to the highest AUC, although a significant difference was only observed for the youngest age group. Limitations, reasons for caution In this study, SVBT was performed after minimal stimulation and natural cycle in vitro fertilisation (IVF). Therefore, we had only a few cycles with elective blastocyst transfer. However, there was also no bias in selecting the embryos for SVBT. Wider implications of the findings Our results showed that objective embryo assessment by a completely automatic and annotation-free model, iDAScore, performs as well as or even better than more traditional embryo assessment or an annotation-dependent ranking tool. iDAS could be an optimal pregnancy prediction model after SVBT, especially in young and advanced age patients. Trial registration number not applicable
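The four-grade stratification of Gardner scores stated above (Excellent: AA, Good: AB BA, Fair: BB, Poor: others) amounts to a simple lookup, sketched here. The function name and the convention of passing the ICM and TE grades as separate letters are assumptions for illustration.

```python
def stratify_gardner(icm: str, te: str) -> str:
    """Map ICM and TE letter grades to the four strata given in the abstract."""
    pair = (icm + te).strip().upper()
    if pair == "AA":
        return "Excellent"
    if pair in ("AB", "BA"):
        return "Good"
    if pair == "BB":
        return "Fair"
    return "Poor"  # every other combination, e.g. any grade C


for icm, te in [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B"), ("B", "C")]:
    print(icm + te, stratify_gardner(icm, te))
```

Collapsing the letter pairs into four ordered strata gives the manual grading an ordinal score that can be ranked against outcomes, which is what allows an AUC to be computed for the Gardner criteria alongside iDAS and KS.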


2004 ◽  
Vol 43 (03) ◽  
pp. 256-267 ◽  
Author(s):  
A. Häber ◽  
B. Brigl ◽  
A. Winter ◽  
T. Wendt

Summary Objectives: We introduce the 3LGM2 tool, a tool for modeling information systems, and describe the process of modeling parts of the hospital information system of the Leipzig University Hospital (UKL). We modeled the sub information systems of five patient record archiving sections to support the creation of a proposal for governmental financial support for a new document management and archiving system. We explain the steps of identifying the model elements and their relations as well as the analyzing capabilities of the 3LGM2 tool to answer questions about the information system. Methods: The 3LGM2 tool was developed on the basis of the meta model 3LGM2 which is described in detail in [1]. 3LGM2 defines an ontological basis, divided into three layers and their relationships. In addition to usual meta CASE tools, the 3LGM2 tool meets certain requirements of information management in hospitals. The model described in this article was created on the basis of on-site surveys in five archiving sections of the UKL. Results: A prototype of the 3LGM2 tool is available and is currently being tested in some projects at the UKL and partner institutions. The model presented in this article is a structured documentation of the current state of patient record archiving at the UKL. The analyzing capabilities of the 3LGM2 tool help to use the model and to answer questions about the information system. Conclusions: The 3LGM2 tool can be used to model and analyze information systems. The presentation capabilities and the reliability of the prototype have to be improved. The initial modeling effort of an institution is only valuable if the model is maintained regularly and reused in other projects. Reference catalogues and reference models are needed to decrease this effort and to support the creation of comparable models.

