Unstructured data in electronic health records, represented by clinical texts, are a vast source of healthcare information because they describe a patient's journey, including clinical findings, procedures, and information about the continuity of care. The publication of several studies on temporal relation extraction from clinical texts during the last decade and the organization of multiple shared tasks highlight the importance of this research theme. We therefore present a review of temporal relation extraction in clinical texts. We analyzed 105 articles and found that relations between events and the document creation time, a coarse temporality type, were mostly addressed with traditional machine learning–based models, with few recent initiatives to push the state of the art with deep learning–based models. For temporal relations between entities (events and temporal expressions) in the document, factors such as dataset imbalance caused by candidate pair generation and task complexity directly affect system performance. The state of the art rests on attention-based models, with contextualized word representations fine-tuned for temporal relation extraction. However, further experiments and advances in the research topic are required before real-time clinical domain applications can be released. Furthermore, most of the publications rely on the same dataset, highlighting the need for new annotation projects that provide datasets for different medical specialties, clinical text types, and even languages.
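The imbalance introduced by candidate pair generation can be illustrated with a minimal sketch. The entity names, the gold relations, and the all-pairs generation strategy below are hypothetical and are not taken from any of the reviewed systems; the point is only that pairing every entity with every other one yields far more unrelated pairs than related ones.

```python
from itertools import combinations

# Hypothetical entities (events and temporal expressions) from one clinical note.
entities = ["admission", "chest pain", "ECG", "2 days ago", "discharge", "aspirin"]

# Hypothetical gold annotations: only a few pairs carry a temporal relation.
gold_relations = {
    ("admission", "discharge"): "BEFORE",
    ("chest pain", "2 days ago"): "OVERLAP",
    ("ECG", "discharge"): "BEFORE",
}


def generate_candidates(entities, gold):
    """Pair every entity with every later one; unannotated pairs get the label NONE."""
    return [(a, b, gold.get((a, b), "NONE")) for a, b in combinations(entities, 2)]


candidates = generate_candidates(entities, gold_relations)
n_none = sum(1 for _, _, label in candidates if label == "NONE")
print(f"{len(candidates)} candidate pairs, {n_none} labeled NONE")
```

Because the number of pairs grows quadratically with the number of entities while annotated relations typically do not, the NONE class tends to dominate as documents get longer.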
Current criteria for depression are imprecise and do not accurately characterize its distinct clinical presentations. As a result, its diagnosis lacks clinical utility in both treatment and research settings. Data-driven efforts to refine criteria have typically focused on a limited set of symptoms that do not reflect the disorder’s heterogeneity. By contrast, clinicians often write about patients in depth, creating descriptions that may better characterize depression. However, clinical text is not commonly used to this end. Here we show that clinically relevant depressive subtypes can be derived from unstructured electronic health records. Five subtypes were identified amongst 18,314 patients with depression treated at a large mental healthcare provider by using unsupervised machine learning: severe-typical, psychotic, mild-typical, agitated, and anergic-apathetic. Subtypes were used to place patients in groups for validation; groups were found to be associated with future outcomes and characteristics that were consistent with the subtypes. These associations suggest that these categorizations are actionable due to their validity with respect to disease prognosis. Moreover, they were derived with automated techniques that might theoretically be widely implemented, allowing for future analyses in more varied populations and settings. Additional research, especially with respect to treatment response, may prove useful in further evaluation.
Generalized language models that are pre-trained on a large corpus have achieved strong performance on natural language tasks. While many pre-trained transformers for English have been published, few models are available for Japanese text, especially in clinical medicine. In this work, we describe the development of a clinical-specific BERT model using a large volume of Japanese clinical text and evaluate it on the NTCIR-13 MedWeb task, which consists of fake Twitter messages about medical concerns annotated with eight labels. Approximately 120 million clinical texts stored at the University of Tokyo Hospital were used as our dataset. A BERT-base model was pre-trained on the entire dataset with a vocabulary of 25,000 tokens. The pre-training was almost saturated at about 4 epochs, and the accuracies of Masked-LM and Next Sentence Prediction were 0.773 and 0.975, respectively. The developed BERT did not show significantly higher performance on the MedWeb task than the other BERT models that were pre-trained with Japanese Wikipedia text. The advantage of pre-training on clinical text may become apparent in more complex tasks on actual clinical text, and such an evaluation set needs to be developed.
Clinical notes are unstructured text documents generated by clinicians during patient encounters. They are generally annotated with International Classification of Diseases (ICD) codes, which give formatted information about the diagnosis and treatment. ICD codes have shown their potential in many fields, but manual coding is labor-intensive and error-prone, which has led to research on automatic coding. Two specific challenges of this task are: (1) given annotated clinical notes, the reasons behind specific diagnoses and treatments are implicit; (2) explainability is important for a practical automatic coding method, which should not only explain its prediction output but also have explainable internal mechanics. This study aims to develop an explainable CNN approach to address these two challenges.
Our key idea is that, for the automatic ICD coding task, the presence of informative snippets in the clinical text that correlate with each code plays an important role in the prediction of codes, and an informative snippet can be considered a local and low-level feature. We infer that there exists a correspondence between a convolution filter and a local and low-level feature. Based on this inference, we propose the Shallow and Wide Attention convolutional Mechanism (SWAM) to improve the ability of CNN-based models to learn local and low-level features for each label.
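The idea of one wide convolutional layer followed by per-label attention can be sketched in plain Python. This is a generic illustration and not the SWAM implementation: the text above does not specify the architecture, so the function names, the reading of "wide" as many filters in a single layer, the separate attention and output vectors, and all sizes and random weights are assumptions.

```python
import math
import random


def conv1d(embeddings, filters, width):
    """Single convolutional layer with tanh: one feature vector per text window."""
    feats = []
    for start in range(len(embeddings) - width + 1):
        # Flatten the window of token embeddings into one vector.
        window = [x for tok in embeddings[start:start + width] for x in tok]
        feats.append([math.tanh(sum(w * x for w, x in zip(f, window))) for f in filters])
    return feats


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(feats, attn_vectors, out_vectors):
    """Per-label attention over windows; returns label probabilities and attention weights."""
    probs, weights = [], []
    for u, beta in zip(attn_vectors, out_vectors):
        # Score every window against this label's attention vector.
        alpha = softmax([sum(a * b for a, b in zip(u, h)) for h in feats])
        # Label-specific document representation: attention-weighted sum of windows.
        context = [sum(alpha[t] * feats[t][d] for t in range(len(feats)))
                   for d in range(len(u))]
        logit = sum(a * b for a, b in zip(beta, context))
        probs.append(1.0 / (1.0 + math.exp(-logit)))
        weights.append(alpha)
    return probs, weights


# Hypothetical sizes and random weights, for illustration only.
random.seed(0)
n_tokens, emb_dim, width, n_filters, n_labels = 12, 4, 3, 16, 5


def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


embeddings = [rand_vec(emb_dim) for _ in range(n_tokens)]
filters = [rand_vec(width * emb_dim) for _ in range(n_filters)]
attn_vectors = [rand_vec(n_filters) for _ in range(n_labels)]
out_vectors = [rand_vec(n_filters) for _ in range(n_labels)]

feats = conv1d(embeddings, filters, width)
probs, weights = label_attention(feats, attn_vectors, out_vectors)
```

In this sketch, `weights[k]` holds one attention weight per text window for label `k`, so the highest-weighted windows can be read off as candidate informative snippets for that code.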
We evaluate our approach on MIMIC-III, an open-access dataset of ICU medical records. Our approach substantially outperforms previous results on top-50 medical code prediction on the MIMIC-III dataset: the precision of the worst-performing 10% of labels in previous works is increased from 0% to 53% on average. We attribute this improvement to SWAM, whose wide architecture with an attention mechanism gives the model the ability to learn the unique features of different codes more extensively, and we support this attribution with an ablation experiment. In addition, we manually analyze the performance imbalance between different codes and draw preliminary conclusions about the characteristics that determine the difficulty of learning specific codes.
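The metric reported here, the average precision over the worst-performing 10% of labels, can be computed as in the following sketch. The function names are hypothetical, and scoring a label that is never predicted as 0.0 is an assumed convention; the text does not say how such labels were handled.

```python
import math


def per_label_precision(y_true, y_pred, n_labels):
    """Precision per label; y_true and y_pred are lists of sets of label indices.

    Assumption: a label that is never predicted gets precision 0.0.
    """
    tp = [0] * n_labels
    fp = [0] * n_labels
    for true_set, pred_set in zip(y_true, y_pred):
        for label in pred_set:
            if label in true_set:
                tp[label] += 1
            else:
                fp[label] += 1
    return [tp[i] / (tp[i] + fp[i]) if tp[i] + fp[i] else 0.0 for i in range(n_labels)]


def worst_decile_mean(precisions):
    """Mean precision over the lowest-scoring 10% of labels (at least one label)."""
    k = max(1, math.ceil(0.1 * len(precisions)))
    worst = sorted(precisions)[:k]
    return sum(worst) / k


# Toy example with three labels and two documents.
toy_precisions = per_label_precision(
    y_true=[{0, 1}, {1}],
    y_pred=[{0}, {0, 1}],
    n_labels=3,
)
print(toy_precisions, worst_decile_mean(toy_precisions))
```

With 50 labels, as in the top-50 setting, `worst_decile_mean` averages over the five lowest-precision labels.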
Our main contributions are threefold: (1) We show that local and low-level features, i.e., informative snippets, play an important role in the automatic ICD coding task, and that the informative snippets extracted from the clinical text provide explanations for each code. (2) We propose that there exists a correspondence between a convolution filter and a local and low-level feature, and that combining a wide and shallow convolutional layer with an attention layer can help CNN-based models better learn local and low-level features. (3) We improve the precision of the worst-performing 10% of labels from 0% to 53% on average.
Natural language processing (NLP) is a set of automated methods to organise and evaluate the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology and illustrate the opportunities that this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, arXiv, Embase, IEEE Xplore, PubMed and Scopus) for studies published in 2015–2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. Studies that were not published in English, lacked a description of NLP methods or were not cardiac-focused, as well as duplicates, were excluded. Two independent reviewers extracted general study information, clinical details and NLP details and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used NLP to identify patients with a specific diagnosis and extract disease severity using rule-based NLP methods. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies due to vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.