Predicting ICD-9 Codes Using Self-Report of Patients

2021 ◽  
Vol 11 (21) ◽  
pp. 10046
Author(s):  
Anandakumar Singaravelan ◽  
Chung-Ho Hsieh ◽  
Yi-Kai Liao ◽  
Jia-Lien Hsu

The International Classification of Diseases (ICD) is a globally recognized medical classification system that aids in the identification of diseases and the regulation of health trends. The ICD framework makes it easy to keep track of records and evaluate medical data for evidence-based decision-making. Several methods have predicted ICD-9 codes based on the discharge summary, clinical notes, and nursing notes. Our approach uses only the subjective component to predict ICD-9 codes. Data cleaning, segmentation, and Natural Language Processing (NLP) techniques are applied to the subjective component during pre-processing. We build Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models to predict ICD-9 codes. ICD-9 codes span several levels: chapter, block, three-digit code, and full code. The GRU model achieves the highest recall, 57.91%, at the chapter level, and the top-10 experiment has a recall of 67.37%. Based on the subjective component, the model can help patients in the form of a remote assistance tool.
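
To make the level and top-k terminology above concrete, here is a minimal Python sketch. It is not the authors' implementation: the helper names `three_digit_level` and `recall_at_k` are hypothetical, the truncation assumes codes are written in dotted form (such as 428.0), and the recall definition (the true code appears among the k highest-ranked predictions) is one common reading that the abstract does not confirm.

```python
from typing import Sequence


def three_digit_level(full_code: str) -> str:
    """Truncate a dotted ICD-9 code to its category part, e.g. '428.0' -> '428'.

    Assumption: the code is written with a dot separating the category
    from its subdivisions.
    """
    return full_code.split(".")[0]


def recall_at_k(
    true_codes: Sequence[str],
    ranked_predictions: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of examples whose true code is among the top-k predictions.

    Assumption: one true code per example and predictions sorted from most
    to least likely. The paper may define top-k recall differently.
    """
    if not true_codes or len(true_codes) != len(ranked_predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for truth, ranked in zip(true_codes, ranked_predictions)
        if truth in ranked[:k]
    )
    return hits / len(true_codes)
```

With illustrative inputs, `recall_at_k(["428", "250"], [["428", "401"], ["401", "250"]], k=1)` counts only the first example as a hit, while `k=2` counts both.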

Rheumatology ◽  
2020 ◽  
Vol 59 (12) ◽  
pp. 3759-3766 ◽  
Author(s):  
Sicong Huang ◽  
Jie Huang ◽  
Tianrun Cai ◽  
Kumar P Dahal ◽  
Andrew Cagan ◽  
...  

Objective. The objective of this study was to compare the performance of an RA algorithm developed and trained in 2010 utilizing natural language processing and machine learning, using updated data containing ICD10, new RA treatments, and a new electronic medical records (EMR) system. Methods. We extracted data from subjects with ≥1 RA International Classification of Diseases (ICD) codes from the EMR of two large academic centres to create a data mart. Gold standard RA cases were identified by reviewing a random 200 subjects from the data mart, and a random 100 subjects who had only RA ICD10 codes. We compared the performance of the following algorithms using the original 2010 data with updated data: (i) a published 2010 RA algorithm; (ii) an updated algorithm, incorporating ICD10 RA codes and new DMARDs; and (iii) a published algorithm using ICD codes only, ICD RA code ≥3. Results. The gold standard RA cases had mean age 65.5 years, 78.7% female, 74.1% RF or antibodies to cyclic citrullinated peptide (anti-CCP) positive. The positive predictive value (PPV) for ≥3 RA ICD was 54%, compared with 56% in 2010. At a specificity of 95%, the PPV of the 2010 algorithm and the updated version were both 91%, compared with 94% (95% CI: 91, 96%) in 2010. In subjects with ICD10 data only, the PPV for the updated 2010 RA algorithm was 93%. Conclusion. The 2010 RA algorithm, validated with the updated data, showed performance characteristics similar to those on the 2010 data. While the 2010 algorithm continued to perform better than the rule-based approach, the PPV of the latter also remained stable over time.
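
The rule-based comparator in this abstract (at least three RA ICD codes) can be sketched as a simple count threshold. The Python below is only an illustration: the function name `meets_icd_rule` is hypothetical, and the code prefixes (714.0 for ICD-9, M05 and M06 for ICD-10) are an assumption about which codes count as RA, since the abstract does not list them.

```python
from typing import Iterable

# Assumption: prefixes treated as RA codes for this illustration only.
RA_CODE_PREFIXES = ("714.0", "M05", "M06")


def meets_icd_rule(patient_codes: Iterable[str], min_count: int = 3) -> bool:
    """Return True if the patient has at least `min_count` RA ICD codes.

    Each element of `patient_codes` is one recorded code occurrence, so a
    code billed on three visits appears three times.
    """
    count = sum(1 for code in patient_codes if code.startswith(RA_CODE_PREFIXES))
    return count >= min_count
```

A rule like this needs no training data, which is why it serves as a stable baseline against the machine learning algorithm.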


2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S448-S448
Author(s):  
H Nina Kim ◽  
Ayushi Gupta ◽  
Kristine F Lan ◽  
Jenell C Stewart ◽  
Shireesha Dhanireddy ◽  
...  

Background. Studies on infective endocarditis (IE) have relied on International Classification of Diseases (ICD) codes to identify cases, but few have validated this method, which may be prone to misclassification. Examination of clinical narrative data could offer greater accuracy and richness. Methods. We evaluated two algorithms for IE identification from 7/1/2015 to 7/31/2019: (1) a standard query of ICD codes for IE (ICD-9: 424.9, 424.91, 424.99, 421.0, 421.1, 421.9, 112.81, 036.42 and ICD-10: I38, I39, I33, I33.9, B37.6 and A39.51) with or without procedure codes for echocardiogram (93303-93356) and (2) a key word, pattern-based text query of discharge summaries (DS) that selected on the term “endocarditis” in fields headed by “Discharge Diagnosis” or “Admission Diagnosis” or similar. Further coding extracted the nature and type of valve and the organism responsible for the IE if present in DS. All identified cases were chart reviewed using pre-specified criteria for true IE. Positive predictive value (PPV) was calculated as the total number of verified cases over the algorithm-selected cases. Sensitivity was the total number of algorithm-matched cases over a final list of 166 independently identified true IE cases from ID and Cardiology services. Specificity was defined as 119 pre-adjudicated non-cases minus the number of algorithm-matched cases, divided by 119. Results. The ICD-based query identified 612 individuals from July 2015 to July 2019 who had a hospital billing code for infective endocarditis; of these, 534 also had an echocardiogram. The DS query identified 387 cases. PPV for the DS query was 84.5% (95% confidence interval [CI] 80.6%, 87.8%) compared with 72.4% (95% CI 68.7%, 75.8%) for ICD only and 75.8% (95% CI 72.0%, 79.3%) for ICD + echo queries. Sensitivity was 75.9% for the DS query and 86.8-93.4% for the ICD queries. Specificity was high (>94%) for all queries. 
The DS query also yielded valve data (prosthetic, tricuspid, pulmonic, aortic or mitral) in 60% and microbiologic data in 73% of identified cases, with an accuracy of 94% and 90% respectively when assessed by chart review. Table 1. Test Characteristics of Three Electronic Health Record Queries for Infective Endocarditis. Conclusion. Compared to traditional ICD-based queries, text-based queries of discharge summaries have the potential to improve precision of IE case ascertainment and extract key clinical variables. Disclosures. All Authors: No reported disclosures.
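
The three test characteristics defined in the Methods of this abstract reduce to simple ratios. The Python sketch below follows those stated definitions; the function and parameter names are hypothetical, and the counts in the usage note are made up for illustration rather than taken from the study.

```python
def ppv(verified_cases: int, selected_cases: int) -> float:
    """Positive predictive value: verified cases over algorithm-selected cases."""
    if selected_cases <= 0:
        raise ValueError("selected_cases must be positive")
    return verified_cases / selected_cases


def sensitivity(matched_true_cases: int, total_true_cases: int) -> float:
    """Algorithm-matched cases over the independently identified true cases."""
    if total_true_cases <= 0:
        raise ValueError("total_true_cases must be positive")
    return matched_true_cases / total_true_cases


def specificity(matched_non_cases: int, total_non_cases: int) -> float:
    """Non-cases NOT matched by the algorithm, over all adjudicated non-cases."""
    if total_non_cases <= 0:
        raise ValueError("total_non_cases must be positive")
    return (total_non_cases - matched_non_cases) / total_non_cases
```

For example, with hypothetical counts, a query that selects 100 charts of which 80 are verified has `ppv(80, 100) == 0.8`, and a query that wrongly matches 5 of 100 adjudicated non-cases has `specificity(5, 100) == 0.95`.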


Author(s):  
Nuria Garcia-Santa ◽  
Beatriz San Miguel ◽  
Takanori Ugai

Medical coding assigns codes from medical classifications, such as the international classification of diseases (ICD), to clinical notes, which are medical reports about patients' conditions written by healthcare professionals in natural language. These texts potentially include medical terms that define diagnoses, symptoms, drugs, treatments, etc., and their use of spontaneous language is challenging for automatic processing. Medical coding is usually performed manually by human medical coders, which is time-consuming and prone to errors. This research aims at developing new approaches that combine deep learning elements with traditional technologies. A semantic-based proposal supported by a proprietary knowledge graph (KG), neural network implementations, and an ensemble model to resolve the medical coding are presented. A comparative discussion of the proposals analyses the advantages and disadvantages of each. To evaluate the approaches, two main corpora were used: MIMIC-III and private de-identified clinical notes.


Author(s):  
Osuagwu Uchechukwu Levi ◽  
Frederick Webb ◽  
David Simmons

Early identification/diagnosis of diabetes and frequent monitoring of hyperglycemia reduce hospitalizations and diabetes-related complications. The present study investigated the proportion of older adults coded with diabetes or newly diagnosed during their admissions and assessed discharge summary content for diabetes-related information. The study used electronic data on 4796 individuals aged ≥60 years admitted through the emergency department (ED) of a public hospital from 2017 to 2018, extracted using International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes. The proportion of admitted patients who were diagnosed with diabetes over a one-year period, proportion with glycated hemoglobin A1c (HbA1c) and random blood glucose (RBG) test performed during their stay, length of stay, discharge summary information and the factors associated with elevated HbA1c (>7%/53 mmol/mol) were investigated. In total, 8.6% of ED presentations to the hospital were coded with diabetes (excluding gestational diabetes), comprising 879 patients (449 males, 430 females) aged ≥ 60 years (74.6 ± 8.9 years). In total, 98% had type 2 diabetes (n = 863), 53% were Australian-born (n = 467), and the mean body mass index (BMI, 31 ± 7 kg/m2; n = 499, 56.8%), RBG (9.8 ± 5.2 mmol/L; n = 824, 93.7%) and HbA1c (8.0 ± 2.0%; n = 137, 15.6%) and length of stay (6.7 ± 25.4 days) were similar across gender, age, and nationality groups (p > 0.05). Three coded patients (0.3%) were newly diagnosed during the admission. In total, 86% had elevated HbA1c, but this was recorded in 20% of discharge summaries. Patients who were on a combination therapy (adjusted odds ratio 23%, 95% confidence intervals: 7%/38%), those on SGLT2 Inhibitors (aOR, 14%: 2%/26%) or had a change in medication (aOR, 40%: 22%/59%) had lower odds of having elevated HbA1c during admission. 
The low diagnosis rate of diabetes and the lack of clinical assessment of HbA1c in older adults admitted through the ED of a South Western Sydney public hospital suggest that many patients with diabetes remain undiagnosed even during admission and/or present to the ED with unknown diabetes that goes unidentified under current practices. The clinically important HbA1c results were only infrequently communicated to general practitioners (GPs).


2012 ◽  
Vol 2012 ◽  
pp. 1-4 ◽  
Author(s):  
Lakshmi Narayanan Kota ◽  
Bhagyalakshmi Mallapura Shankarappa ◽  
Prafulla Shivakumar ◽  
Shilpa Sadanand ◽  
Bhavani Shankara Bagepally ◽  
...  

Objective. To evaluate the association of Apolipoprotein E4 (ApoE4) in Alzheimer's dementia (AD) with comorbid diabetes mellitus (DM). Methods. The study included subjects with Alzheimer's dementia (AD) (n=209), individuals with non-Alzheimer's dementia (nAD) (n=122), individuals with parental history of AD (f/hAD) (n=70), and control individuals who had normal cognitive functions and no parental history of dementia (NC) (n=193). Dementia was diagnosed using International Classification of Diseases, 10th revision (ICD-10) criteria. DM was assessed on the basis of self-report and/or use of antidiabetic medications. ApoE genotyping was done using sequence-specific primer polymerase chain reaction. Results. ApoE4 allele frequencies were highest among AD with comorbid DM (0.35) followed by AD without DM (0.25), nAD with DM (0.13), nAD without comorbid DM (0.12), and NC (0.08). Frequency of ApoE4 in persons with f/hAD was 0.13. The association of AD with co-morbid DM in ApoE4 carriers was stronger in comparison to NC with DM (OR = 5.68, P = 0.04). Conclusion. There is a significant association between AD with co-morbid DM and ApoE4 genotype.


2021 ◽  
Vol 21 (S9) ◽  
Author(s):  
Shuyuan Hu ◽  
Fei Teng ◽  
Lufei Huang ◽  
Jun Yan ◽  
Haibo Zhang

Background. Clinical notes are unstructured text documents generated by clinicians during patient encounters and are generally annotated with International Classification of Diseases (ICD) codes, which give formatted information about the diagnosis and treatment. ICD codes have shown their potential in many fields, but manual coding is labor-intensive and error-prone, which has led to research on automatic coding. Two specific challenges of this task are: (1) given annotated clinical notes, the reasons behind specific diagnoses and treatments are implicit; (2) explainability is important for a practical automatic coding method: the method should not only explain its prediction output but also have explainable internal mechanics. This study aims to develop an explainable CNN approach to address these two challenges. Method. Our key idea is that, for the automatic ICD coding task, the presence of informative snippets in the clinical text that correlate with each code plays an important role in the prediction of codes, and an informative snippet can be considered a local and low-level feature. We infer that there exists a correspondence between a convolution filter and a local and low-level feature. Based on this inference, we propose the Shallow and Wide Attention convolutional Mechanism (SWAM) to improve the ability of CNN-based models to learn local and low-level features for each label. Results. We evaluate our approach on MIMIC-III, an open-access dataset of ICU medical records. Our approach substantially outperforms previous results on top-50 medical code prediction on the MIMIC-III dataset; the precision of the worst-performing 10% of labels in previous works is increased from 0% to 53% on average. We attribute this improvement to SWAM, whose wide architecture with an attention mechanism gives the model the ability to learn the unique features of different codes more extensively, and we demonstrate this with an ablation experiment. 
In addition, we manually analyze the performance imbalance between different codes and draw preliminary conclusions about the characteristics that determine the difficulty of learning specific codes. Conclusions. Our main contributions are threefold: (1) We show that local and low-level features, a.k.a. informative snippets, play an important role in the automatic ICD coding task, and that the informative snippets extracted from the clinical text provide explanations for each code. (2) We propose that there exists a correspondence between a convolution filter and a local and low-level feature. A combination of a wide and shallow convolutional layer and an attention layer can help CNN-based models better learn local and low-level features. (3) We improved the precision of the worst-performing 10% of labels from 0 to 53% on average.
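
The idea that attention over convolutional features can point to the informative snippet for each code can be illustrated with a generic per-label attention step. The Python sketch below is not SWAM and not the authors' code: it assumes each text position already has a feature vector (for example, the output of a convolution filter bank) and each label has a learned query vector, and the name `label_attention` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def label_attention(
    features: Sequence[Sequence[float]],
    label_query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Generic dot-product attention for a single label.

    features: one feature vector per text position (assumed to come from
        a convolutional layer; not computed here).
    label_query: a per-label vector (assumed learned during training).
    Returns (weights, context): softmax weights over positions and the
    weighted sum of the position features.
    """
    scores = [sum(f * q for f, q in zip(vec, label_query)) for vec in features]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(label_query)
    context = [
        sum(w * vec[i] for w, vec in zip(weights, features)) for i in range(dim)
    ]
    return weights, context
```

The position with the largest weight marks the snippet that contributes most to that label's context vector, which is the sense in which attention weights can serve as an explanation.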


2009 ◽  
Vol 36 (9) ◽  
pp. 2000-2008 ◽  
Author(s):  
JASVINDER A. SINGH

Objective. To study predictors of discordance between self-reported physician diagnosis and administrative database diagnosis of arthritis. Methods. A cohort of all veterans who utilized Veterans Integrated Service Network (VISN)-13 medical facilities were mailed a questionnaire that included patient self-report of physician diagnosis of arthritis and questions regarding demographics, functional limitation, and SF-36V (a validated version of the Medical Outcomes Study Short-Form 36). Kappa coefficient was used to assess the extent of agreement between self-report of physician diagnosis and administrative database definitions that incorporated International Classification of Diseases (ICD) codes and use of medications for arthritis. We identified predictors of overall discordance between self-report and administrative database diagnosis using multivariable logistic regression analyses. Results. Among 70,334 eligible veterans surveyed, 19,749 subjects had an ICD diagnosis of arthritis in the administrative database in the year prior to the survey; 34,440 answered the arthritis question and 18,464 self-reported a physician diagnosis of arthritis. Kappa coefficient showed slight to fair agreement of 0.19–0.32 between self-report and administrative database definitions of arthritis. We found significantly higher overall discordance among veterans with more comorbidities, greater age, worse functional status, lower use of outpatient and inpatient services, lower education level, and among single medical-site users. Conclusion. Low level of agreement between self-report and database diagnosis of arthritis and its significant association with patient demographic, clinical, and functional characteristics highlights the limitation of use of these strategies for identification of patients with arthritis in epidemiological studies.
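
The kappa coefficient used in this abstract measures agreement beyond what chance alone would produce. The Python sketch below computes Cohen's kappa for two binary classifications; it assumes the standard two-rater formula applies here (the abstract says only "Kappa coefficient"), and the function name `cohens_kappa` is hypothetical.

```python
from typing import Sequence


def cohens_kappa(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary ratings of the same subjects.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the agreement expected if the two ratings were independent.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    p_a = sum(1 for a in rater_a if a) / n  # share rated positive by A
    p_b = sum(1 for b in rater_b if b) / n  # share rated positive by B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

Here `rater_a` could hold self-reported diagnoses and `rater_b` the database definition for the same subjects; a value of 1 means perfect agreement and 0 means agreement no better than chance.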


2019 ◽  
Author(s):  
Katherine P. Liao ◽  
Jiehuan Sun ◽  
Tianrun A. Cai ◽  
Nicholas Link ◽  
Chuan Hong ◽  
...  

Objective. Electronic health records (EHR) linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). Method. We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the UMLS. Aggregated ICD and NLP counts along with healthcare utilization were jointly analyzed by fitting an ensemble of latent mixture models. The MAP algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying subjects with phenotype yes/no. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in an independent cohort PheWAS for two SNPs with known associations. Results. The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUC: MAP 0.943, manual 0.941). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes. Conclusion. The MAP approach increased the accuracy of phenotype definition while maintaining scalability, facilitating use in studies requiring large scale phenotyping, such as PheWAS.


2017 ◽  
Vol 9 (1) ◽  
pp. 109-112 ◽  
Author(s):  
Alvin Rajkomar ◽  
Sumant R. Ranji ◽  
Bradley Sharpe

Background. An important component of internal medicine residency is clinical immersion in core rotations to expose first-year residents to common diagnoses. Objective. Quantify intern experience with common diagnoses through clinical documentation in an electronic health record. Methods. We analyzed all clinical notes written by postgraduate year (PGY) 1, PGY-2, and PGY-3 residents on medicine service at an academic medical center July 1, 2012, through June 30, 2014. We quantified the number of notes written by PGY-1s at 1 of 3 hospitals where they rotate, by the number of notes written about patients with a specific principal billing diagnosis, which we defined as diagnosis-days. We used the International Classification of Diseases 9 (ICD-9) and the Clinical Classification Software (CCS) to group the diagnoses. Results. We analyzed 53 066 clinical notes covering 10 022 hospitalizations with 1436 different ICD-9 diagnoses spanning 217 CCS diagnostic categories. The 10 most common ICD-9 diagnoses accounted for 23% of diagnosis-days, while the 10 most common CCS groupings accounted for more than 40% of the diagnosis-days. Of 122 PGY-1s, 107 (88%) spent at least 2 months on the service, and 3% were exposed to all of the top 10 ICD-9 diagnoses, while 31% had experience with fewer than 5 of the top 10 diagnoses. In addition, 17% of PGY-1s saw all top 10 CCS diagnoses, and 5% had exposure to fewer than 5 CCS diagnoses. Conclusions. Automated detection of clinical experience may help programs review inpatient clinical experiences of PGY-1s.

