A Deep Learning Framework for Automated ICD-10 Coding

Author(s):  
Abdelahad Chraibi ◽  
David Delerue ◽  
Julien Taillard ◽  
Ismat Chaib Draa ◽  
Régis Beuscart ◽  
...  

The International Statistical Classification of Diseases and Related Health Problems (ICD) is one of the most widely used classification systems for diagnoses and procedures; its diagnosis codes are assigned to the Electronic Health Record (EHR) associated with a patient’s stay. The aim of this paper is to propose an automated coding system to assist physicians in assigning ICD codes to EHRs. For this purpose, we created a pipeline of Natural Language Processing (NLP) and Deep Learning (DL) models able to extract useful information from French medical texts and to perform classification. In the evaluation phase, our approach predicted 346 diagnosis codes from heterogeneous medical units with an average accuracy of 83%. Our results were then validated by physicians of the Medical Information Department (MID) in charge of coding hospital stays.

2014 ◽  
Vol 22 (1) ◽  
pp. 19-28 ◽  
Author(s):  
Andrew D Boyd ◽  
Young Min Yang ◽  
Jianrong Li ◽  
Colleen Kenost ◽  
Mike D Burton ◽  
...  

Abstract Reporting of hospital adverse events relies on Patient Safety Indicators (PSIs) using International Classification of Diseases, Ninth Edition, Clinical Modification (ICD-9-CM) codes. The US transition to ICD-10-CM in 2015 could result in erroneous comparisons of PSIs. Using the General Equivalent Mappings (GEMs), we compared the accuracy of ICD-9-CM coded PSIs against recommended ICD-10-CM codes from the Centers for Medicare & Medicaid Services (CMS). We further predict their impact in a cohort of 38 644 patients (1 446 581 visits and 399 hospitals). We compared the predicted results to the published PSI-related ICD-10-CM diagnosis codes. We provide the first report of substantial hospital safety reporting errors with five direct comparisons from the 23 types of PSIs (transfusion- and anesthesia-related PSIs). One PSI was excluded from the comparison between code sets due to reorganization, while 15 additional PSIs were inaccurate to a lesser degree due to the complexity of the coding translation. The ICD-10-CM translations proposed by CMS pose impending risks for (1) comparing safety incidents, (2) inflating the number of PSIs, and (3) increasing the variability of calculations attributable to the abundance of coding system translations. Ethical organizations addressing ‘data-, process-, and system-focused’ improvements could be penalized using the new ICD-10-CM Agency for Healthcare Research and Quality PSIs because of apparent increases in PSIs bearing the same PSI identifier and label, yet calculated differently. Here we investigate which PSIs would reliably transition between ICD-9-CM and ICD-10-CM, and those at risk of under-reporting and over-reporting adverse events while the frequency of these adverse events remains unchanged.


2021 ◽  
Vol 27 (Suppl 1) ◽  
pp. i62-i65
Author(s):  
Michael J Clery ◽  
Philip Joseph Hudson ◽  
Jasmine C Moore ◽  
Laura M Mercer Kollar ◽  
Daniel T Wu

Health systems capture injuries using International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Clinical Modification (ICD-10-CM) diagnostic codes and share data with public health to inform injury surveillance. This study analyses provider-assigned ICD-10-CM injury codes among self-reported injuries to determine the effectiveness of ICD-10-CM coding in capturing injury and assault. Methods: Self-reported injury screen records from an urban, level 1 trauma centre collected between 20 November 2015 and 30 September 2019 were compared with corresponding provider-assigned ICD-10-CM codes, discerning the frequency with which intentions are indicated among patients reporting (1) any injury and (2) assault. Results: Of 380 922 patients screened, 32 788 (8.61%) reported any injury and 6763 (1.78%) reported assault. ICD-10-CM codes had a sensitivity of 67.40% (95% CI 66.89% to 67.91%) for any injury and specificity of 89.79% (95% CI 89.69% to 89.89%). For assault, ICD-10-CM codes had sensitivity of 2.25% (95% CI 1.91% to 2.63%) and specificity of 99.97% (95% CI 99.97% to 99.98%). Discussion: This study found provider-assigned ICD-10-CM had limited sensitivity to identify injury and low sensitivity for assault. This study more fully characterises ICD-10-CM coding system effectiveness in identifying assaults.
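The sensitivity and specificity reported above come from comparing provider-assigned codes against self-report in a 2×2 table. A minimal Python sketch of that arithmetic follows, assuming self-report is treated as the reference standard. The function name and the counts in the usage lines are illustrative assumptions, not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 confusion counts.

    tp: self-reported injury AND an injury code was assigned
    fn: self-reported injury, no injury code assigned
    tn: no self-reported injury, no injury code assigned
    fp: no self-reported injury, but an injury code was assigned
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Illustrative counts only (NOT taken from the study).
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=900, fp=100)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")
```

With a rare outcome such as assault, specificity can stay very high even when few true cases are coded, which is why the abstract reports both measures.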


2020 ◽  
Author(s):  
Lingling Zhou ◽  
Cheng Cheng ◽  
Dong Ou ◽  
Hao Huang

Abstract Background The International Classification of Diseases, 10th Revision (ICD-10) has been widely used to describe the diagnosis information of patients. Automatic ICD-10 coding is important because manually assigning codes is expensive, time consuming and error prone. Although numerous approaches have been developed to explore automatic coding, few of them have been applied in practice. Our aim is to construct a practical, automatic ICD-10 coding machine to improve coding efficiency and quality in daily work. Methods In this study, we propose the use of regular expressions (regexps) to establish a correspondence between diagnosis codes and diagnosis descriptions in outpatient settings and at admission and discharge. The description models of the regexps were embedded in our upgraded coding system, which queries a diagnosis description and assigns a unique diagnosis code. As in most studies, the precision (P), recall (R), F-measure (F) and overall accuracy (A) were used to evaluate the system performance. Our study had two stages. The datasets were obtained from the diagnosis information on the homepage of the discharge medical record. The testing sets were from October 1, 2017 to April 30, 2018 and from July 1, 2018 to January 31, 2019. Results The values of P were 89.27% and 88.38% in the first testing phase and the second testing phase, respectively, which demonstrate high precision. The automatic ICD-10 coding system completed more than 160,000 codes in 16 months, which reduced the workload of the coders. In addition, a comparison between the amount of time needed for manual coding and automatic coding indicated the effectiveness of the system: automatic coding took nearly 100 times less time than manual coding. Conclusions Our automatic coding system is well suited for the coding task. Further studies are warranted to perfect the description models of the regexps and to develop synthetic approaches to improve system performance.
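The regexp-based approach described above can be sketched in a few lines of Python. This is only an illustration of the idea of mapping a diagnosis description to a single code. The rule list, the English-language patterns, and the function name `assign_code` are assumptions made for the example and do not reproduce the authors' description models.

```python
import re

# Hypothetical description models: (compiled regexp, ICD-10 code).
# The patterns are illustrative and far simpler than a production rule set.
RULES = [
    (re.compile(r"\btype\s*2\s+diabetes\b", re.IGNORECASE), "E11"),
    (re.compile(r"\b(essential|primary)\s+hypertension\b", re.IGNORECASE), "I10"),
    (re.compile(r"\bpneumonia\b", re.IGNORECASE), "J18.9"),
]


def assign_code(description):
    """Return the code of the first matching rule, or None if no rule matches."""
    for pattern, code in RULES:
        if pattern.search(description):
            return code
    return None


print(assign_code("Type 2 diabetes mellitus"))   # matched by the first rule
print(assign_code("Fracture of femur"))          # no rule matches
```

Returning `None` for unmatched descriptions leaves those cases for a human coder, so the rules affect precision only on the descriptions they actually cover.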


2018 ◽  
Vol 2018 ◽  
pp. 1-7 ◽  
Author(s):  
Ewa Bejer-Oleńska ◽  
Michael Thoene ◽  
Andrzej Włodarczyk ◽  
Joanna Wojtkiewicz

Aim. The aim of the study was to determine the most commonly diagnosed neoplasms in the MRI scanned patient population and indicate correlations based on the descriptive variables. Methods. The SPSS software was used to determine the incidence of neoplasms within the specific diagnoses based on the descriptive variables of the studied population. Over a five-year period, 791 patients and 839 MRI scans were identified in the neoplasm category (C00-D48 according to the International Statistical Classification of Diseases and Related Health Problems ICD-10). Results. More women (56%) than men (44%) represented C00-D48. Three categories of neoplasms were recorded. Furthermore, benign neoplasms were the most numerous, diagnosed mainly in patients in the fifth decade of life, and included benign neoplasms of the brain and other parts of the central nervous system. Conclusions. Males ≤ 30 years of age with neoplasms had an MRI scan rate three times higher than that of females of the same age group, even though females had a much higher scan rate in every other category. Young males are more often selected for these scans if a neoplasm is suspected. Finally, the number of MRI-diagnosed neoplasms showed a linear annual increase.


2018 ◽  
Vol 49 (1) ◽  
pp. 58-61 ◽  
Author(s):  
Joel D Handley ◽  
Hedley CA Emsley

Background: Intracranial venous thrombosis (ICVT) accounts for around 0.5% of all stroke cases. There have been no previously published studies of the International Classification of Diseases, Tenth Edition (ICD-10) validation for the identification of ICVT admissions in adults. Objective: The aims of this study were to validate and quantify the performance of the ICD-10 coding system for identifying cases of ICVT in adults and to derive an estimate of incidence. Method: Administrative data were collected for all patients admitted to a regional neurosciences centre over a 5-year period. We searched for the following ICD-10 codes at any position: G08.X (intracranial and intraspinal phlebitis and thrombophlebitis), I67.6 (non-pyogenic thrombosis of intracranial venous system), I63.6 (cerebral infarction due to cerebral venous thrombosis, non-pyogenic), O22.5 (cerebral venous thrombosis in pregnancy) and O87.3 (cerebral venous thrombosis in the puerperium). Results: Sixty-five admissions were identified by at least one of the relevant ICD-10 codes. The overall positive predictive value (PPV) for confirmed ICVT from all of the admissions combined was 92.3% (60 out of 65) with the results for each code as follows: G08.X 91.5% (54 of 59), O22.5 100% (4 of 4), I67.6 100% (1 of 1), I63.6 100% (1 of 1) and O87.3 100% (1 of 1). There were 40 unique cases of ICVT over a 5-year period giving an annual incidence of ICVT of 5 per million. Conclusions: All codes gave a high PPV. Implications for practice: As demonstrated in previous studies, the incidence of ICVT may be higher than previously thought.
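The positive predictive values above are simple ratios of confirmed cases to flagged admissions. The Python sketch below recomputes them from the counts given in the abstract; the helper name `ppv` is an assumption for illustration. The per-code flagged counts sum to 66 rather than 65, presumably because an admission can carry more than one of the searched codes, so the overall figure uses the admission-level totals rather than the column sums.

```python
# (confirmed ICVT admissions, admissions flagged by the code), as reported above.
counts = {
    "G08.X": (54, 59),
    "O22.5": (4, 4),
    "I67.6": (1, 1),
    "I63.6": (1, 1),
    "O87.3": (1, 1),
}


def ppv(confirmed, flagged):
    """Positive predictive value: confirmed cases / all flagged cases."""
    return confirmed / flagged


for code, (confirmed, flagged) in counts.items():
    print(f"{code}: {ppv(confirmed, flagged):.1%}")

# Overall PPV from the admission-level totals reported in the abstract.
overall = ppv(60, 65)
print(f"overall: {overall:.1%}")
```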


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Juncai Li ◽  
Xiaofei Jiang

Molecular property prediction is an essential task in drug discovery. Most computational approaches based on deep learning focus either on designing novel molecular representations or on combining them with advanced models. However, researchers have paid less attention to the potential benefits of massive unlabeled molecular data (e.g., ZINC). The task is made more challenging by the limited scale of labeled data. Motivated by recent advances in pretrained models for natural language processing, we note that a drug molecule can, to some extent, be viewed as language. In this paper, we investigate how to adapt the pretrained model BERT to extract useful molecular substructure information for molecular property prediction. We present a novel end-to-end deep learning framework, named Mol-BERT, that combines an effective molecular representation with a pretrained BERT model tailored for molecular property prediction. Specifically, a large-scale prediction BERT model is pretrained to generate the embedding of molecular substructures, using four million unlabeled drug SMILES (i.e., ZINC 15 and ChEMBL 27). The pretrained BERT model can then be fine-tuned on various molecular property prediction tasks. To examine the performance of our proposed Mol-BERT, we conduct several experiments on four widely used molecular datasets. In comparison with traditional and state-of-the-art baselines, the results show that Mol-BERT can outperform current sequence-based methods and achieve at least a 2% improvement in ROC-AUC score on the Tox21, SIDER, and ClinTox datasets.


10.2196/23230 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e23230
Author(s):  
Pei-Fu Chen ◽  
Ssu-Ming Wang ◽  
Wei-Chih Liao ◽  
Lu-Cheng Kuo ◽  
Kuan-Chih Chen ◽  
...  

Background The International Classification of Diseases (ICD) code is widely used as a reference for medical systems and billing purposes. However, classifying diseases into ICD codes still mainly relies on humans reading a large amount of written material as the basis for coding. Coding is both laborious and time-consuming. Since the conversion from ICD-9 to ICD-10, the coding task has become much more complicated, and approaches based on deep learning and natural language processing have been studied to assist disease coders. Objective This paper aims at constructing a deep learning model for ICD-10 coding, where the model is meant to automatically determine the corresponding diagnosis and procedure codes based solely on free-text medical notes to improve accuracy and reduce human effort. Methods We used diagnosis records of the National Taiwan University Hospital as resources and applied natural language processing techniques, including global vectors (GloVe), word2vec, embeddings from language models (ELMo), bidirectional encoder representations from transformers (BERT), and single head attention recurrent neural network, on the deep neural network architecture to implement ICD-10 auto-coding. In addition, we introduced the attention mechanism into the classification model to extract the keywords from diagnoses and visualize the coding reference for training newcomers to ICD-10. Sixty discharge notes were randomly selected to examine the change in the F1-score and the coding time by coders before and after using our model. Results In experiments on the medical data set of National Taiwan University Hospital, our prediction results revealed F1-scores of 0.715 and 0.618 for the ICD-10 Clinical Modification code and Procedure Coding System code, respectively, with a BERT embedding approach in the Gated Recurrent Unit classification model. The well-trained models were deployed on the ICD-10 web service for coding and for training ICD-10 users. With this service, coders' F1-score increased significantly from a median of 0.832 to 0.922 (P<.05), but coding time was not reduced. Conclusions The proposed model significantly improved the F1-score but did not decrease the time consumed in coding by disease coders.


2017 ◽  
Vol 12 ◽  
pp. 91
Author(s):  
Iwona Niewiadomska ◽  
Agnieszka Palacz-Chrisidis

The authors address changes, as reflected in the literature, in the diagnostic criteria for gambling-related disorders and for chemical and behavioral addictions. They also present a brief overview of successive editions of the international classification manuals, both the Diagnostic and Statistical Manual of Mental Disorders (DSM) and the International Statistical Classification of Diseases and Related Health Problems (ICD). The article also presents researchers' discussion of where gambling-related disorders should be placed in diagnostic classifications. DSM-V places gambling disorder in the category "Substance-Related and Addictive Disorders", in the subcategory "Non-Substance-Related Disorders". By contrast, according to ICD-10, which remains in force, gambling disorder stays within the area of impulse-control disorders, under the name "pathological gambling".


Antibiotics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 536
Author(s):  
George Germanos ◽  
Patrick Light ◽  
Roger Zoorob ◽  
Jason Salemi ◽  
Fareed Khan ◽  
...  

Objective: To validate the use of electronic algorithms based on International Classification of Diseases (ICD)-10 codes to identify outpatient visits for urinary tract infections (UTI), one of the most common reasons for antibiotic prescriptions. Methods: ICD-10 symptom codes (e.g., dysuria) alone or in addition to UTI diagnosis codes plus prescription of a UTI-relevant antibiotic were used to identify outpatient UTI visits. Chart review (gold standard) was performed by two reviewers to confirm diagnosis of UTI. The positive predictive value (PPV) that the visit was for UTI (based on chart review) was calculated for three different ICD-10 code algorithms using (1) symptoms only, (2) diagnosis only, or (3) both. Results: Of the 1087 visits analyzed, symptom codes only had the lowest PPV for UTI (PPV = 55.4%; 95%CI: 49.3–61.5%). Diagnosis codes alone resulted in a PPV of 85% (PPV = 84.9%; 95%CI: 81.1–88.2%). The highest PPV was obtained by using both symptom and diagnosis codes together to identify visits with UTI (PPV = 96.3%; 95%CI: 94.5–97.9%). Conclusions: ICD-10 diagnosis codes with or without symptom codes reliably identify UTI visits; symptom codes alone are not reliable. ICD-10 based algorithms are a valid method to study UTIs in primary care settings.

