JLAN: medical code prediction via joint learning attention networks and denoising mechanism

Abstract Background Clinical notes are documents that contain detailed information about the health status of patients. Medical codes generally accompany them. However, the manual diagnosis is costly and error-prone. Moreover, large datasets in clinical diagnosis are susceptible to noise labels because of erroneous manual annotation. Therefore, machine learning has been utilized to perform automatic diagnoses. Previous state-of-the-art (SOTA) models used convolutional neural networks to build document representations for predicting medical codes. However, the clinical notes are usually long-tailed. Moreover, most models fail to deal with the noise during code allocation. Therefore, denoising mechanism and long-tailed classification are the keys to automated coding at scale. Results In this paper, a new joint learning model is proposed to extend our attention model for predicting medical codes from clinical notes. On the MIMIC-III-50 dataset, our model outperforms all the baselines and SOTA models in all quantitative metrics. On the MIMIC-III-full dataset, our model outperforms in the macro-F1, micro-F1, macro-AUC, and precision at eight compared to the most advanced models. In addition, after introducing the denoising mechanism, the convergence speed of the model becomes faster, and the loss of the model is reduced overall. Conclusions The innovations of our model are threefold: firstly, the code-specific representation can be identified by adopted the self-attention mechanism and the label attention mechanism. Secondly, the performance of the long-tailed distributions can be boosted by introducing the joint learning mechanism. Thirdly, the denoising mechanism is suitable for reducing the noise effects in medical code prediction. Finally, we evaluate the effectiveness of our model on the widely-used MIMIC-III datasets and achieve new SOTA results.

Download Full-text

Explainable automated coding of clinical notes using hierarchical label-wise attention networks and label embedding initialisation

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2021.103728 ◽

2021 ◽

Vol 116 ◽

pp. 103728

Author(s):

Hang Dong ◽

Víctor Suárez-Paniagua ◽

William Whiteley ◽

Honghan Wu

Keyword(s):

Attention Networks ◽

Clinical Notes ◽

Automated Coding

Download Full-text

SANTM: Efficient Self-attention-driven Network for Text Matching

ACM Transactions on Internet Technology ◽

10.1145/3426971 ◽

2022 ◽

Vol 22 (3) ◽

pp. 1-21

Author(s):

Prayag Tiwari ◽

Amit Kumar Jaiswal ◽

Sahil Garg ◽

Ilsun You

Keyword(s):

Natural Language ◽

State Of The Art ◽

The State ◽

Attention Mechanism ◽

Matching Problems ◽

Attention Model ◽

Extra Information ◽

Textual Entailment ◽

Benchmark Datasets ◽

Text Matching

Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. Self-attention model takes only one sentence as an input with no extra information, i.e., one can utilize the final hidden state or pooling. However, text-matching problems can be interpreted either in symmetrical or asymmetrical scopes. For instance, paraphrase detection is an asymmetrical task, while textual entailment classification and question-answer matching are considered asymmetrical tasks. In this article, we leverage attractive properties of self-attention mechanism and proposes an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features while updating the rest of the components. Our model follows evaluation on two benchmark datasets cover tasks of textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with much fewer parameters.

Download Full-text

Multi-Turn Chatbot Based on Query-Context Attentions and Dual Wasserstein Generative Adversarial Networks

Applied Sciences ◽

10.3390/app9183908 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3908 ◽

Cited By ~ 3

Author(s):

Jintae Kim ◽

Shinhyeok Oh ◽

Oh-Woog Kwon ◽

Harksoo Kim

Keyword(s):

Performance Measures ◽

State Of The Art ◽

Attention Mechanism ◽

Generative Adversarial Networks ◽

Training Method ◽

Adversarial Networks ◽

Proposed Model ◽

Previous State ◽

Vector Representations

To generate proper responses to user queries, multi-turn chatbot models should selectively consider dialogue histories. However, previous chatbot models have simply concatenated or averaged vector representations of all previous utterances without considering contextual importance. To mitigate this problem, we propose a multi-turn chatbot model in which previous utterances participate in response generation using different weights. The proposed model calculates the contextual importance of previous utterances by using an attention mechanism. In addition, we propose a training method that uses two types of Wasserstein generative adversarial networks to improve the quality of responses. In experiments with the DailyDialog dataset, the proposed model outperformed the previous state-of-the-art models based on various performance measures.

Download Full-text

Context-Aware Self-Attention Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301387 ◽

2019 ◽

Vol 33 ◽

pp. 387-394 ◽

Cited By ~ 4

Author(s):

Baosong Yang ◽

Jian Li ◽

Derek F. Wong ◽

Lidia S. Chao ◽

Xing Wang ◽

...

Keyword(s):

English Translation ◽

Contextual Information ◽

The Self ◽

Context Aware ◽

Short Term ◽

Internal Representations ◽

Attention Networks ◽

External Resources ◽

Attention Model ◽

Neural Representations

Self-attention model has shown its flexibility in parallel computation and the effectiveness on modeling both long- and short-term dependencies. However, it calculates the dependencies between representations without considering the contextual information, which has proven useful for modeling dependencies among neural representations in various natural language tasks. In this work, we focus on improving self-attention networks through capturing the richness of context. To maintain the simplicity and flexibility of the self-attention networks, we propose to contextualize the transformations of the query and key layers, which are used to calculate the relevance between elements. Specifically, we leverage the internal representations that embed both global and deep contexts, thus avoid relying on external resources. Experimental results on WMT14 English⇒German and WMT17 Chinese⇒English translation tasks demonstrate the effectiveness and universality of the proposed methods. Furthermore, we conducted extensive analyses to quantify how the context vectors participate in the self-attention model.

Download Full-text

2166

Journal of Clinical and Translational Science ◽

10.1017/cts.2017.59 ◽

2017 ◽

Vol 1 (S1) ◽

pp. 12-12

Author(s):

Jianyin Shao ◽

Ram Gouripeddi ◽

Julio C. Facelli

Keyword(s):

Clinical Trial ◽

Clinical Trials ◽

Power Distribution ◽

Large Scale ◽

Search Space ◽

Free Text ◽

Eligibility Criteria ◽

Clinical Notes ◽

Semantic Concepts ◽

Mimic Iii

OBJECTIVES/SPECIFIC AIMS: This poster presents a detailed characterization of the distribution of semantic concepts used in the text describing eligibility criteria of clinical trials reported to ClincalTrials.gov and patient notes from MIMIC-III. The final goal of this study is to find a minimal set of semantic concepts that can describe clinical trials and patients for efficient computational matching of clinical trial descriptions to potential participants at large scale. METHODS/STUDY POPULATION: We downloaded the free text describing the eligibility criteria of all clinical trials reported to ClinicalTrials.gov as of July 28, 2015, ~195,000 trials and ~2,000,000 clinical notes from MIMIC-III. Using MetaMap 2014 we extracted UMLS concepts (CUIs) from the collected text. We calculated the frequency of presence of the semantic concepts in the texts describing the clinical trials eligibility criteria and patient notes. RESULTS/ANTICIPATED RESULTS: The results show a classical power distribution, Y=210X(−2.043), R2=0.9599, for clinical trial eligibility criteria and Y=513X(−2.684), R2=0.9477 for MIMIC patient notes, where Y represents the number of documents in which a concept appears and X is the cardinal order the concept ordered from more to less frequent. From this distribution, it is possible to realize that from the over, 100,000 concepts in UMLS, there are only ~60,000 and 50,000 concepts that appear in less than 10 clinical trial eligibility descriptions and MIMIC-III patient clinical notes, respectively. This indicates that it would be possible to describe clinical trials and patient notes with a relatively small number of concepts, making the search space for matching patients to clinical trials a relatively small sub-space of the overall UMLS search space. DISCUSSION/SIGNIFICANCE OF IMPACT: Our results showing that the concepts used to describe clinical trial eligibility criteria and patient clinical notes follow a power distribution can lead to tractable computational approaches to automatically match patients to clinical trials at large scale by considerably reducing the search space. While automatic patient matching is not the panacea for improving clinical trial recruitment, better low cost computational preselection processes can allow the limited human resources assigned to patient recruitment to be redirected to the most promising targets for recruitment.

Download Full-text

MRE: A Military Relation Extraction Model Based on BiGRU and Multi-Head Attention

Symmetry ◽

10.3390/sym13091742 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1742

Author(s):

Yiwei Lu ◽

Ruopeng Yang ◽

Xuping Jiang ◽

Dan Zhou ◽

Changshen Yin ◽

...

Keyword(s):

Comprehensive Evaluation ◽

Language Model ◽

Relation Extraction ◽

Attention Mechanism ◽

Semantic Features ◽

Chinese Word Segmentation ◽

Conceptual Foundation ◽

Attention Model ◽

Operational Information ◽

Military Relations

A great deal of operational information exists in the form of text. Therefore, extracting operational information from unstructured military text is of great significance for assisting command decision making and operations. Military relation extraction is one of the main tasks of military information extraction, which aims at identifying the relation between two named entities from unstructured military texts. However, the traditional methods of extracting military relations cannot easily resolve problems such as inadequate manual features and inaccurate Chinese word segmentation in military fields, failing to make full use of symmetrical entity relations in military texts. With our approach, based on the pre-trained language model, we present a Chinese military relation extraction method, which combines the bi-directional gate recurrent unit (BiGRU) and multi-head attention mechanism (MHATT). More specifically, the conceptual foundation of our method lies in constructing an embedding layer and combining word embedding with position embedding, based on the pre-trained language model; the output vectors of BiGRU neural networks are symmetrically spliced to learn the semantic features of context, and they fuse the multi-head attention mechanism to improve the ability of expressing semantic information. On the military text corpus that we have built, we conduct extensive experiments. We demonstrate the superiority of our method over the traditional non-attention model, attention model, and improved attention model, and the comprehensive evaluation value F1-score of the model is improved by about 4%.

Download Full-text

Word embeddings trained on published case reports are lightweight, effective for clinical tasks, and free of protected health information

10.1101/19013268 ◽

2019 ◽

Author(s):

Zachary N. Flamholz ◽

Lyle H. Ungar ◽

Gary E. Weissman

Keyword(s):

Health Information ◽

Case Reports ◽

Clinical Information ◽

Mortality Prediction ◽

Word Embeddings ◽

Protected Health Information ◽

Clinical Notes ◽

Consistent Performance ◽

Linguistic Regularity ◽

Mimic Iii

AbstractRationaleWord embeddings are used to create vector representations of text data but not all embeddings appropriately capture clinical information, are free of protected health information, and are computationally accessible to most researchers.MethodsWe trained word embeddings on published case reports because their language mimics that of clinical notes, the manuscripts are already de-identified by virtue of being published, and the corpus is much smaller than those trained on large, publicly available datasets. We tested the performance of these embeddings across five clinically relevant tasks and compared the results to embeddings trained on a large Wikipedia corpus, all publicly available manuscripts, notes from the MIMIC-III database using fastText, GloVe, and word2vec, and using different dimensions. Tasks included clinical applications of lexicographic coverage, semantic similarity, clustering purity, linguistic regularity, and mortality prediction.ResultsThe embeddings trained using the published case reports performed as well as if not better on most tasks than those using other corpora. The embeddings trained using all published manuscripts had the most consistent performance across all tasks and required a corpus with 100 times as many tokens as the corpus comprised of only case reports. Embeddings trained on the MIMIC-III dataset had small but marginally better scores on the clustering tasks which was also based on clinical notes from the MIMIC-III dataset. Embeddings trained on the Wikipedia corpus, although containing almost twice as many tokens as all available published manuscripts, performed poorly compared to those trained on medical and clinical corpora.ConclusionWord embeddings trained on freely available published case reports performed well for most clinical task, are free of protected health information, and are small compared to commonly used embeddings trained on larger clinical and non-clinical corpora. The optimal corpus, dimension size, and which embedding model to use for a given task involves tradeoffs in privacy, reproducibility, performance, and computational resources.

Download Full-text

A Self-Attention Model for Inferring Cooperativity between Regulatory Features

10.1101/2020.01.31.927996 ◽

2020 ◽

Author(s):

Fahad Ullah ◽

Asa Ben-Hur

Keyword(s):

Gene Expression ◽

Simulated Data ◽

Relevant Information ◽

Regulatory Elements ◽

Dnase I ◽

Attention Mechanism ◽

Feature Interaction ◽

Biologically Relevant ◽

Attention Model ◽

Biological Phenomena

AbstractMotivationDeep learning has demonstrated its predictive power in modeling complex biological phenomena such as gene expression. The value of these models hinges not only on their accuracy, but also on the ability to extract biologically relevant information from the trained models. While there has been much recent work on developing feature attribution methods that discover the most important features for a given sequence, inferring cooperativity between regulatory elements, which is the hallmark of phenomena such as gene expression, remains an open problemResultsWe present SATORI, a Self-ATtentiOn based model to predict Regulatory element Interactions. Our approach combines convolutional and recurrent layers with a self-attention mechanism that helps us capture a global view of the landscape of interactions between regulatory elements in a sequence. We evaluate our method on simulated data and three complex datasets: human TAL1-GATA1 transcription factor ChIP-Seq, DNase I Hypersensitive Sites (DHSs) in human promoters across 164 cell lines, and genome-wide DNase I-Seq and ATAC-Seq peaks across 36 arabidopsis samples. In each of the three experiments SATORI identified numerous statistically significant TF-TF interactions, many of which have been previously reported. Our method is able to detect higher numbers of these experimentally verified TF-TF interactions than the existing Feature Interaction Score, and also has the advantage of not requiring a computationally expensive post-processing step. Finally, SATORI can be used for detection of any type of feature interaction in models that use a similar attention mechanism, and is not limited to the detection of TF-TF interactionsAvailabilityThe source code for SATORI is available at https://github.com/fahadahaf/[email protected]

Download Full-text

Multi-Modality Global Fusion Attention Network for Visual Question Answering

Electronics ◽

10.3390/electronics9111882 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1882

Author(s):

Cheng Yang ◽

Weijia Wu ◽

Yuxing Wang ◽

Hong Zhou

Keyword(s):

Correct Answer ◽

Question Answering ◽

Human Beings ◽

Single Model ◽

Attention Network ◽

Attention Model ◽

Global Perspectives ◽

Visual Question Answering ◽

Previous State ◽

High Level

Visual question answering (VQA) requires a high-level understanding of both questions and images, along with visual reasoning to predict the correct answer. Therefore, it is important to design an effective attention model to associate key regions in an image with key words in a question. Up to now, most attention-based approaches only model the relationships between individual regions in an image and words in a question. It is not enough to predict the correct answer for VQA, as human beings always think in terms of global information, not only local information. In this paper, we propose a novel multi-modality global fusion attention network (MGFAN) consisting of stacked global fusion attention (GFA) blocks, which can capture information from global perspectives. Our proposed method computes co-attention and self-attention at the same time, rather than computing them individually. We validate our proposed method on the two most commonly used benchmarks, the VQA-v2 datasets. Experimental results show that the proposed method outperforms the previous state-of-the-art. Our best single model achieves 70.67% accuracy on the test-dev set of VQA-v2.

Download Full-text

Converging Semantic Knowledge and Deep Learning for Medical Coding

International Journal of Privacy and Health Information Management ◽

10.4018/ijphim.2019070103 ◽

2019 ◽

Vol 7 (2) ◽

pp. 33-52

Author(s):

Nuria Garcia-Santa ◽

Beatriz San Miguel ◽

Takanori Ugai

Keyword(s):

Deep Learning ◽

International Classification Of Diseases ◽

Semantic Knowledge ◽

Clinical Notes ◽

Medical Coding ◽

Advantages And Disadvantages ◽

Classification Of Diseases ◽

Medical Reports ◽

Mimic Iii

The field of medical coding enables to assign codes of medical classifications such as the international classification of diseases (ICD) to clinical notes, which are medical reports about patients' conditions written by healthcare professionals in natural language. These texts potentially include medical terms that define diagnosis, symptoms, drugs, treatments, etc., and the use of spontaneous language is challenging for automatic processing. Medical coding is usually performed manually by human medical coders becoming time-consuming and prone to errors. This research aims at developing new approaches that combine deep learning elements together with traditional technologies. A semantic-based proposal supported by a proprietary knowledge graph (KG), neural network implementations, and an ensemble model to resolve the medical coding are presented. A comparative discussion between the proposals where the advantages and disadvantages of each one is analysed. To evaluate approaches, two main corpus have been used: MIMIC-III and private de-identified clinical notes.

Download Full-text