An explainable CNN approach for medical codes prediction from clinical text

2021 ◽  
Vol 21 (S9) ◽  
Author(s):  
Shuyuan Hu ◽  
Fei Teng ◽  
Lufei Huang ◽  
Jun Yan ◽  
Haibo Zhang

Abstract
Background: Clinical notes are unstructured text documents generated by clinicians during patient encounters. They are generally annotated with International Classification of Diseases (ICD) codes, which give formatted information about diagnosis and treatment. ICD codes have shown their potential in many fields, but manual coding is labor-intensive and error-prone, which has motivated research on automatic coding. Two specific challenges of this task are: (1) given annotated clinical notes, the reasons behind specific diagnoses and treatments are implicit; (2) explainability is important for a practical automatic coding method, which should not only explain its prediction output but also have explainable internal mechanics. This study aims to develop an explainable CNN approach to address these two challenges.
Method: Our key idea is that, for the automatic ICD coding task, the presence of informative snippets in the clinical text that correlate with each code plays an important role in code prediction, and that an informative snippet can be considered a local, low-level feature. We infer that there is a correspondence between a convolution filter and a local, low-level feature. Based on this inference, we propose the Shallow and Wide Attention convolutional Mechanism (SWAM) to improve the ability of CNN-based models to learn local, low-level features for each label.
Results: We evaluate our approach on MIMIC-III, an open-access dataset of ICU medical records. Our approach substantially outperforms previous results on top-50 medical code prediction on the MIMIC-III dataset: the precision of the worst-performing 10% of labels in previous work increases from 0% to 53% on average. We attribute this improvement to SWAM, whose wide architecture with an attention mechanism lets the model learn the unique features of different codes more extensively, and we support this with an ablation experiment. In addition, we manually analyze the performance imbalance between different codes and draw preliminary conclusions about the characteristics that determine how difficult specific codes are to learn.
Conclusions: Our main contributions are threefold: (1) We show that local, low-level features, i.e. informative snippets, play an important role in the automatic ICD coding task, and that the informative snippets extracted from the clinical text provide explanations for each code. (2) We propose that there is a correspondence between a convolution filter and a local, low-level feature; combining a wide, shallow convolutional layer with an attention layer helps CNN-based models better learn such features. (3) We improve the precision of the worst-performing 10% of labels from 0% to 53% on average.
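The role of per-label attention over convolutional features can be illustrated with a minimal sketch. This is not the authors' SWAM implementation: the function names, the toy feature matrix, the label names, and the dot-product form of attention are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_attention(features, label_query):
    """Attend over text positions for one label.

    features: per-position feature vectors, standing in for the outputs of
              the convolution filters at each position (hypothetical input).
    label_query: one vector per label (assumed dot-product attention).
    Returns (attention weights, attended feature vector).
    """
    scores = [sum(f * q for f, q in zip(vec, label_query)) for vec in features]
    weights = softmax(scores)
    dim = len(features[0])
    attended = [
        sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)
    ]
    return weights, attended

# Toy example: 4 text positions, 2 convolution filters.
features = [[0.1, 0.0], [0.9, 0.1], [0.0, 0.8], [0.1, 0.1]]
queries = {"code_A": [5.0, 0.0], "code_B": [0.0, 5.0]}  # hypothetical labels

top_position = {}
for label, query in queries.items():
    weights, _ = label_attention(features, query)
    # The highest-weight position marks the snippet most associated with the label.
    top_position[label] = weights.index(max(weights))
```

In this toy setting each label attends to a different position, which is the sense in which attention weights can point to an informative snippet for each code.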

Author(s):  
Nuria Garcia-Santa ◽  
Beatriz San Miguel ◽  
Takanori Ugai

Medical coding assigns codes from medical classifications, such as the International Classification of Diseases (ICD), to clinical notes, which are medical reports about patients' conditions written by healthcare professionals in natural language. These texts may include medical terms that define diagnoses, symptoms, drugs, treatments, etc., and their use of spontaneous language is challenging for automatic processing. Medical coding is usually performed manually by human medical coders, which makes it time-consuming and prone to errors. This research aims to develop new approaches that combine deep learning elements with traditional technologies. It presents a semantic-based proposal supported by a proprietary knowledge graph (KG), neural network implementations, and an ensemble model for medical coding. A comparative discussion analyses the advantages and disadvantages of each proposal. To evaluate the approaches, two main corpora were used: MIMIC-III and private de-identified clinical notes.


2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Weidong Bao ◽  
Hongfei Lin ◽  
Yijia Zhang ◽  
Jian Wang ◽  
Shaowu Zhang

Abstract
Background: Clinical notes record the health status, clinical manifestations and other detailed information of each patient. International Classification of Diseases (ICD) codes are important labels for electronic health records. Automatically assigning medical codes to clinical notes with deep learning models can not only improve work efficiency and accelerate the development of medical informatization but also help resolve many issues related to medical insurance. Recently, neural network-based methods have been proposed for automatic medical code assignment. However, clinical notes are usually long documents containing many complex sentences, and most current methods cannot effectively learn representations of potential features from the document text.
Methods: In this paper, we propose a hybrid capsule network model. Specifically, we use a bi-directional LSTM (Bi-LSTM), with forward and backward directions, to merge information from both sides of the sequence. A label embedding framework embeds the text and labels together to leverage label information. We then use a dynamic routing algorithm in the capsule network to extract valuable features for the medical code prediction task.
Results: We applied our model to automatic medical code assignment for clinical notes and conducted a series of experiments on MIMIC-III data. The experimental results show that our method achieves a micro F1-score of 67.5% on the MIMIC-III dataset, outperforming other state-of-the-art methods.
Conclusions: The proposed model, which employs the dynamic routing algorithm and the label embedding framework, can effectively capture important features across sentences. Both capsule networks and domain knowledge are helpful for the medical code prediction task.
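For readers unfamiliar with the reported metric, the following sketch shows how a micro-averaged F1-score is commonly computed for multi-label code assignment. It is a generic illustration under assumed inputs (one set of codes per document), not the evaluation script used in the study, and the documents and codes below are toy data.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over all (document, code) pairs.

    gold, predicted: lists of sets of codes, one set per document.
    Counts are pooled across documents before computing precision and recall.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # codes correctly predicted
        fp += len(p - g)   # codes predicted but not annotated
        fn += len(g - p)   # codes annotated but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data: three documents with their annotated and predicted codes.
gold = [{"401.9", "428.0"}, {"250.00"}, {"427.31", "401.9"}]
pred = [{"401.9"}, {"250.00", "428.0"}, {"427.31", "401.9"}]
score = micro_f1(gold, pred)  # 4 true positives, 1 false positive, 1 false negative
```

Because counts are pooled before averaging, frequent codes dominate the micro-averaged score, which is worth keeping in mind when comparing it with per-label results.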


2020 ◽  
Vol 83 (6) ◽  
pp. AB220
Author(s):  
Sara J. Li ◽  
Jonathan Lavian ◽  
Eunice Lee ◽  
Lindsey Bordone ◽  
Fernanda Caroline da Graca Polubriaginof ◽  
...  

2021 ◽  
Vol 11 (21) ◽  
pp. 10046
Author(s):  
Anandakumar Singaravelan ◽  
Chung-Ho Hsieh ◽  
Yi-Kai Liao ◽  
Jia-Lien Hsu

The International Classification of Diseases (ICD) is a globally recognized medical classification system that aids in the identification of diseases and the regulation of health trends. The ICD framework makes it easy to keep track of records and evaluate medical data for evidence-based decision-making. Several methods have predicted ICD-9 codes based on the discharge summary, clinical notes, and nursing notes. Our approach uses only the subjective component to predict ICD-9 codes. Data cleaning, segmentation, and Natural Language Processing (NLP) techniques are applied to the subjective component during pre-processing. Our study builds Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models for predicting ICD-9 codes. ICD-9 codes are organized into different levels: chapter, block, three-digit code, and full code. The GRU model achieves the highest recall, 57.91%, at the chapter level, and the top-10 experiment has a recall of 67.37%. Based on the subjective component, the model can help patients in the form of a remote assistance tool.
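The ICD-9 levels mentioned above can be made concrete with a short sketch that derives the three-digit code and the chapter from a full, dotted code. This is an illustration only, not the pre-processing used in the study: the function name and output keys are assumptions, the block level is omitted for brevity, and E and V codes are grouped together as supplementary.

```python
# Numeric ICD-9-CM chapter ranges over three-digit categories (inclusive).
ICD9_CHAPTERS = [
    (1, 139, "Infectious and parasitic diseases"),
    (140, 239, "Neoplasms"),
    (240, 279, "Endocrine, nutritional and metabolic diseases, and immunity disorders"),
    (280, 289, "Diseases of the blood and blood-forming organs"),
    (290, 319, "Mental disorders"),
    (320, 389, "Diseases of the nervous system and sense organs"),
    (390, 459, "Diseases of the circulatory system"),
    (460, 519, "Diseases of the respiratory system"),
    (520, 579, "Diseases of the digestive system"),
    (580, 629, "Diseases of the genitourinary system"),
    (630, 679, "Complications of pregnancy, childbirth, and the puerperium"),
    (680, 709, "Diseases of the skin and subcutaneous tissue"),
    (710, 739, "Diseases of the musculoskeletal system and connective tissue"),
    (740, 759, "Congenital anomalies"),
    (760, 779, "Certain conditions originating in the perinatal period"),
    (780, 799, "Symptoms, signs, and ill-defined conditions"),
    (800, 999, "Injury and poisoning"),
]

def icd9_levels(full_code):
    """Split a dotted ICD-9 diagnosis code into coarser levels (hypothetical helper)."""
    category = full_code.split(".")[0]  # part before the dot
    if category[0] in "EV":
        # Simplification: supplementary codes are not split further here.
        chapter = "Supplementary classification (E and V codes)"
    else:
        number = int(category)
        chapter = next(name for lo, hi, name in ICD9_CHAPTERS if lo <= number <= hi)
    return {"chapter": chapter, "three_digit": category, "full": full_code}

levels = icd9_levels("428.0")
```

Predicting at the chapter level turns the task into a choice among a few broad groups, whereas the full-code level has thousands of possible labels, which is one reason recall is reported separately per level.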


2020 ◽  
Vol 34 (05) ◽  
pp. 8180-8187 ◽  
Author(s):  
Fei Li ◽  
Hong Yu

Automated ICD coding, which assigns International Classification of Diseases codes to patient visits, has attracted much research attention since it can save time and labor for billing. The previous state-of-the-art model utilized one convolutional layer to build document representations for predicting ICD codes. However, the lengths and grammar of text fragments, which are closely related to ICD coding, vary considerably across documents. Therefore, a flat and fixed-length convolutional architecture may not be capable of learning good document representations. In this paper, we propose a Multi-Filter Residual Convolutional Neural Network (MultiResCNN) for ICD coding. The innovations of our model are twofold: it utilizes a multi-filter convolutional layer to capture various text patterns with different lengths and a residual convolutional layer to enlarge the receptive field. We evaluated the effectiveness of our model on the widely used MIMIC dataset. On the full code set of MIMIC-III, our model outperformed the state-of-the-art model in 4 out of 6 evaluation metrics. On the top-50 code set of MIMIC-III and the full code set of MIMIC-II, our model outperformed all the existing and state-of-the-art models in all evaluation metrics. The code is available at https://github.com/foxlf823/Multi-Filter-Residual-Convolutional-Neural-Network.
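The idea of combining several filter widths with a residual connection can be sketched in a few lines. This is a deliberately simplified illustration, not the MultiResCNN code from the linked repository: it assumes a single input channel, fixed (not learned) kernels, an identity shortcut, and hypothetical function names.

```python
import math

def conv1d_same(seq, kernel):
    """1-D convolution with zero padding so the output keeps the input length.

    seq: list of floats (a single-channel stand-in for token embeddings).
    kernel: list of floats with odd length.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(seq))
    ]

def residual_block(seq, kernel):
    """Convolve, apply tanh, then add the input back (identity shortcut)."""
    conv = conv1d_same(seq, kernel)
    return [math.tanh(c) + x for c, x in zip(conv, seq)]

def multi_filter_features(seq, kernels):
    """Run one residual branch per kernel width and concatenate per position."""
    branches = [residual_block(seq, k) for k in kernels]
    return [list(position) for position in zip(*branches)]

seq = [0.2, -0.1, 0.4, 0.0, 0.3]
kernels = [[1.0], [0.5, 0.0, 0.5], [0.2, 0.2, 0.2, 0.2, 0.2]]  # widths 1, 3, 5
features = multi_filter_features(seq, kernels)  # 5 positions x 3 branches
```

Each position ends up with one feature per filter width, so short and long text patterns are represented side by side, while the shortcut keeps the original signal available to later layers.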


2012 ◽  
Vol 51 (06) ◽  
pp. 519-528 ◽  
Author(s):  
S. Nitsuwat ◽  
W. Paoin

Summary
Objectives: The International Classification of Diseases and Related Health Problems, 10th Revision, Thai Modification (ICD-10-TM) ontology is a knowledge base created from the Thai modification of the World Health Organization International Classification of Diseases and Related Health Problems, 10th Revision. The objectives of this research were to develop the ICD-10-TM ontology as a knowledge base for use in a semi-automated ICD coding system and to test the usability of this system.
Methods: ICD concepts and relations were identified from a tabular list and alphabetical indexes. An ICD-10-TM ontology was defined in the resource description framework (RDF), notation-3 (N3) format. All ICD-10-TM contents available as Microsoft Word documents were transformed into N3 format using Python scripts. Final RDF files were validated by ICD experts. The ontology was implemented as a knowledge base by using a novel semi-automated ICD coding system. Evaluation of usability was performed by a survey of forty volunteer users.
Results: The ICD-10-TM ontology consists of two main knowledge bases (a tabular list knowledge base and an index knowledge base) containing a total of 309,985 concepts and 162,092 relations. The tabular list knowledge base can be divided into an upper level ontology, which defines hierarchical relationships between 22 ICD chapters, and a lower level ontology, which defines relations between chapters, blocks, categories, rubrics and basic elements (include, exclude, synonym etc.) of the ICD tabular list. The index knowledge base describes relations between keywords, modifiers in general format and a table format of the ICD index. In this research, the creation of an ICD index ontology revealed interesting findings on problems with the current ICD index structure. One problem with the current structure is that it defines conditions that complicate pregnancy and perinatal conditions on the same hierarchical level as organ system diseases. This could mislead a coding algorithm into a wrong selection of ICD code. To prevent these coding errors by an algorithm, the ICD-10-TM index structure was modified by raising conditions complicating pregnancy and perinatal conditions into a higher hierarchical level of the index knowledge base. The modified ICD-10-TM ontology was implemented as a knowledge base in semi-automated ICD-10-TM coding software. A survey of users of the software revealed a high percentage of correct results obtained from ontology searches (> 95%) and user satisfaction with the usability of the ontology.
Conclusion: The ICD-10-TM ontology is the first ICD-10 ontology with a comprehensive description of all concepts and relations in an ICD-10-TM tabular list and alphabetical index. A researcher developing an automated ICD coding system should be aware of the ICD index structure and the complexity of coding processes. These coding systems are not a word matching process. ICD-10 ontology should be used as a knowledge base in the ICD coding software. It can be used to facilitate successful implementation of ICD in developing countries, especially in those countries which do not have an adequate number of competent ICD coders.
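To give a sense of what transforming ICD content into N3 with Python might look like, here is a minimal sketch. It is not the authors' script: the `icd:` namespace IRI is a placeholder, the use of `rdfs:label` and `rdfs:subClassOf` to model the hierarchy is an assumption, and the three rows are illustrative examples rather than ICD-10-TM data.

```python
# Hypothetical namespace: the real ICD-10-TM ontology IRIs are not given in the abstract.
PREFIXES = (
    "@prefix icd: <http://example.org/icd10tm#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)

def local_name(code):
    """Turn an ICD code into a prefixed name, replacing the dot (assumed convention)."""
    return "icd:" + code.replace(".", "_")

def to_n3(concepts):
    """Serialize (code, label, parent_code) rows as N3 triples.

    parent_code may be None for top-level concepts.
    """
    lines = [PREFIXES]
    for code, label, parent in concepts:
        subject = local_name(code)
        # Escape backslashes and double quotes inside the string literal.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} rdfs:label "{escaped}" .')
        if parent is not None:
            lines.append(f"{subject} rdfs:subClassOf {local_name(parent)} .")
    return "\n".join(lines) + "\n"

# Illustrative rows only.
rows = [
    ("J09-J18", "Influenza and pneumonia", None),
    ("J18", "Pneumonia, organism unspecified", "J09-J18"),
    ("J18.9", "Pneumonia, unspecified", "J18"),
]
document = to_n3(rows)
```

A coding system could then walk the `rdfs:subClassOf` links to move between levels of the hierarchy instead of relying on word matching alone.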


2013 ◽  
Vol 22 (4) ◽  
pp. 297-301 ◽  
Author(s):  
D. Kingdon ◽  
L. Taylor ◽  
K. Ma ◽  
Y. Kinoshita

Names matter! Schizophrenia has negative associations which impede individual recovery and induce societal and self-stigmatization. Alternatives have been proposed and are worthy of debate; changes made in Japan have generally been considered successful. The group of ‘schizophrenia and other psychoses’ could be further differentiated based on the major social factors identified, i.e. drug misuse and the effects of severe childhood trauma. The use of appropriate International Classification of Diseases (ICD) coding and definitions could usefully differentiate these groups – the former is a drug-induced psychosis and the latter frequently presents as comorbid schizophrenia and borderline personality disorder (often attracting a diagnosis of schizoaffective disorder). The current established differentiation between early onset (‘stress-sensitive’ – ‘Kraepelinian’ schizophrenia) and later onset (DSM5 delusional disorder, i.e. with ‘non-bizarreness’ criterion removed) psychosis may also be worthy of further investigation to establish validity and reliability. Psychosocially descriptive terms have been found to be more acceptable to patients and perceived as less stigmatizing by others. Subgroups of psychosis with greater homogeneity would benefit research, clinical and therapeutic practice and public understanding, attitudes and behaviour.


Author(s):  
Lucia Otero Varela ◽  
Catherine Eastwood ◽  
Pallavi Mathur ◽  
Hude Quan

Introduction: The International Classification of Diseases (ICD) is globally used for coding morbidity statistics; however, its use, as well as the training provided to individuals assigning codes, varies greatly across countries.
Objectives and Approach: The goal is to understand the quality of coder training worldwide. After an in-depth grey and academic literature review, an online survey was created to poll the 194 World Health Organization (WHO) member countries. Questions focused on hospital data collection systems and the training provided to the coding professionals. The survey was distributed to potential participants who met the specific criteria, as well as to organizations specialized in the topic, such as WHO-CC (WHO Collaborating Centers) and IFHIMA (International Federation of Health Information Management Association), to be forwarded to their representatives. Answers will be analyzed using descriptive statistics.
Results: This ongoing project aims to capture responses from as many countries as possible, and thus far, data from 45 respondents from 20 different countries have been collected. Initial results reveal worldwide use of ICD, with variations in the maximum allowable coding fields for diagnoses and interventions. Coding specialists are the main personnel assigning codes, followed by physicians, and although minimum training is not mandatory in all countries (Sweden, Italy, Germany and Thailand), in those where it is, a college/university degree is the most common requirement. Coding certificates most frequently entail passing a certification exam. Continuing education for coders is offered in all countries except one (Nigeria). Once more information is available, countries will be ranked and those showing better performance will be highlighted.
Conclusion/Implications: These survey data will establish the current state of ICD use and coding training internationally, which will ultimately be valuable to the WHO for the promotion of ICD and the rollout of ICD-11, for better international comparisons of health data, and for further research on how to improve ICD coding.


2017 ◽  
Vol 56 (05) ◽  
pp. 401-406 ◽  
Author(s):  
Jesper Lagergren ◽  
Nele Brusselaers

Summary
Background: Comorbidities may have an important impact on survival, and comorbidity scores are often implemented in studies assessing prognosis. The Charlson Comorbidity Index is most widely used, yet several adaptations have been published, all using slightly different conversions of the International Classification of Diseases (ICD) coding.
Objective: To evaluate which coding should be used to assess and quantify comorbidity for the Charlson Comorbidity Index for registry-based research, in particular if older ICD versions will be used.
Methods: A systematic literature search was used to identify adaptations and modifications of the ICD coding of the Charlson Comorbidity Index for general purpose in adults, published in English. Back-translation to ICD version 8 and version 9 was conducted by means of the ICD-code converter of Statistics Sweden.
Results: In total, 16 studies were identified reporting ICD adaptations of the Charlson Comorbidity Index. The Royal College of Surgeons in the United Kingdom combined 5 versions into an adapted and updated version which appeared appropriate for research purposes. Their ICD-10 codes were back-translated into ICD-9 and ICD-8 according to their proposed adaptations, and verified with previous versions of the Charlson Comorbidity Index.
Conclusion: Many versions of the Charlson Comorbidity Index are used in parallel, so clear reporting of the version, exact ICD coding and weighting is necessary to obtain transparency and reproducibility in research. Yet, the version of the Royal College of Surgeons is up-to-date and easy to use, and therefore an acceptable comorbidity score to be used in registry-based research, especially for surgical patients.
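How a comorbidity score is derived from ICD codes can be shown with a small sketch. The table below is an illustrative subset with assumed ICD-10 prefixes and the weights of the original Charlson index; it is not the Royal College of Surgeons version discussed above, and a real analysis would need the complete mapping of whichever adaptation is being reported.

```python
# Illustrative subset only: (ICD-10 prefix, condition, weight).
# Prefixes are assumptions for this sketch; published adaptations differ in detail.
CHARLSON_SUBSET = [
    ("I21", "myocardial infarction", 1),
    ("I22", "myocardial infarction", 1),
    ("I50", "congestive heart failure", 1),
    ("C77", "metastatic solid tumour", 6),
    ("C78", "metastatic solid tumour", 6),
    ("C79", "metastatic solid tumour", 6),
    ("C80", "metastatic solid tumour", 6),
]

def charlson_score(icd10_codes, mapping=CHARLSON_SUBSET):
    """Sum weights over distinct conditions matched by any code prefix.

    Each condition counts once, however many of a patient's codes map to it.
    Returns (score, sorted list of matched conditions).
    """
    matched = {}
    for code in icd10_codes:
        normalized = code.replace(".", "").upper()
        for prefix, condition, weight in mapping:
            if normalized.startswith(prefix):
                matched[condition] = weight
    return sum(matched.values()), sorted(matched)

# Two infarction codes count once; J18.9 is not in the subset and adds nothing.
score, conditions = charlson_score(["I21.0", "I22.1", "C78.7", "J18.9"])
```

Swapping in a different prefix table changes the score for the same patient, which is why the exact ICD coding and weighting need to be reported alongside results.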

