icd coding
Recently Published Documents


Total documents: 64 (five years: 29)
H-index: 9 (five years: 1)

2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Zhao Shuai ◽  
Diao Xiaolin ◽  
Yuan Jing ◽  
Huo Yanni ◽  
Cui Meng ◽  
...  

Abstract
Background: Automated ICD coding of medical texts via machine learning has been a hot topic. Related studies in the medical field rely heavily on the conventional bag-of-words (BoW) approach for feature extraction and rarely use more complex methods such as word2vec (W2V) or large pretrained models like BERT. This study aimed to identify the most effective feature extraction methods for coding models by comparing BoW, W2V and BERT variants.
Methods: We experimented with a Chinese dataset from Fuwai Hospital, which contains 6947 records and 1532 unique ICD codes, and a public Spanish dataset, which contains 1000 records and 2557 unique ICD codes. We designed coding tasks with different code frequency thresholds (denoted $f_s$), with a lower threshold indicating a more complex task. Using traditional classifiers, we compared BoW, W2V and BERT variants on these coding tasks.
Results: When $f_s$ was equal to or greater than 140 for the Fuwai dataset, and 60 for the Spanish dataset, the BERT variants with the whole network fine-tuned were the best method, reaching a Micro-F1 of 93.9% on the Fuwai dataset when $f_s = 200$ and a Micro-F1 of 85.41% on the Spanish dataset when $f_s = 180$. When $f_s$ fell below 140 for the Fuwai dataset, and 60 for the Spanish dataset, BoW turned out to be the best, reaching a Micro-F1 of 83% on the Fuwai dataset when $f_s = 20$ and a Micro-F1 of 39.1% on the Spanish dataset when $f_s = 20$. Our experiments also showed that both the BERT variants and BoW were readily interpretable, which is important for medical applications of coding models.
Conclusions: This study sheds light on building promising machine learning models for automated ICD coding by revealing the most effective feature extraction methods. Concretely, our results indicate that fine-tuning the whole network of the BERT variants is the optimal method for tasks covering only frequent codes, especially codes that represent unspecified diseases, while BoW is the best for tasks involving both frequent and infrequent codes. The frequency threshold at which the best-performing method changes differs between datasets, owing to factors such as language and code set.
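The task setup described in this abstract (restricting the label set by a code frequency threshold and scoring with Micro-F1) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the "at least $f_s$ records" reading of the threshold, and the choice to drop records left with no codes are all assumptions made for illustration.

```python
from collections import Counter


def filter_by_frequency(records, f_s):
    """Keep only codes that occur in at least f_s records (assumed reading
    of the threshold). Records left with no codes are dropped (assumption)."""
    counts = Counter(code for _, codes in records for code in set(codes))
    kept = {code for code, n in counts.items() if n >= f_s}
    filtered = []
    for text, codes in records:
        remaining = sorted(set(codes) & kept)
        if remaining:
            filtered.append((text, remaining))
    return filtered, kept


def micro_f1(gold, pred):
    """Micro-averaged F1: pool true/false positives and false negatives
    over all records and codes, then compute a single F1 score."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (invented): three records with ICD-style codes.
records = [
    ("chest pain", ["I20", "I10"]),
    ("hypertension", ["I10"]),
    ("cough", ["R05"]),
]
filtered, kept = filter_by_frequency(records, f_s=2)
score = micro_f1([["I10"], ["I10"]], [["I10"], ["I20"]])
```

A lower `f_s` keeps more rare codes in `kept`, which enlarges the label set and makes the classification task harder, in line with the abstract's remark that a lower threshold indicates a more complex task.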


2021 ◽  
Author(s):  
Sajida Raz Bhutto ◽  
Yifan Wu ◽  
Ying Yu ◽  
Akhtar Hussain Jalbani ◽  
Min Li

2021 ◽  
Author(s):  
Ping Gu ◽  
Song Yang ◽  
Qiang Li ◽  
Jiangxing Wang

2021 ◽  
Vol 21 (S9) ◽  
Author(s):  
Shuyuan Hu ◽  
Fei Teng ◽  
Lufei Huang ◽  
Jun Yan ◽  
Haibo Zhang

Abstract
Background: Clinical notes are unstructured text documents generated by clinicians during patient encounters and are generally annotated with International Classification of Diseases (ICD) codes, which give structured information about the diagnosis and treatment. ICD codes have shown their potential in many fields, but manual coding is labor-intensive and error-prone, which has motivated research on automatic coding. Two specific challenges of this task are: (1) given an annotated clinical note, the reasons behind specific diagnoses and treatments are implicit; (2) explainability is important for a practical automatic coding method, which should not only explain its prediction output but also have explainable internal mechanics. This study aims to develop an explainable CNN approach to address these two challenges.
Method: Our key idea is that, for the automatic ICD coding task, the presence of informative snippets in the clinical text that correlate with each code plays an important role in predicting codes, and an informative snippet can be regarded as a local, low-level feature. We infer that there is a correspondence between a convolution filter and a local, low-level feature. Based on this inference, we propose the Shallow and Wide Attention convolutional Mechanism (SWAM) to improve the ability of CNN-based models to learn local, low-level features for each label.
Results: We evaluate our approach on MIMIC-III, an open-access dataset of ICU medical records. Our approach substantially outperforms previous results on top-50 medical code prediction on the MIMIC-III dataset: the precision of the 10% of labels that performed worst in previous work increases from 0% to 53% on average. We attribute this improvement to SWAM, whose wide architecture with an attention mechanism lets the model learn the unique features of different codes more extensively, and we confirm this with an ablation experiment. In addition, we manually analyze the performance imbalance between different codes and draw preliminary conclusions about the characteristics that determine how difficult a specific code is to learn.
Conclusions: Our main contributions are threefold: (1) We show that local, low-level features (informative snippets) play an important role in the automatic ICD coding task, and that the informative snippets extracted from the clinical text provide explanations for each code. (2) We propose that there is a correspondence between a convolution filter and a local, low-level feature. Combining a wide, shallow convolutional layer with an attention layer can help CNN-based models better learn local, low-level features. (3) We improve the precision of the worst-performing 10% of labels from 0% to 53% on average.
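The idea of attending to informative snippets separately for each code can be sketched as generic label-wise attention pooling over convolution outputs. This is only an illustration under stated assumptions, not the SWAM architecture itself, whose details the abstract does not give: the dot-product scoring, the function names, and the toy numbers are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(features, label_queries):
    """features: one vector of filter activations per text position.
    label_queries: one query vector per label (hypothetical parameters).
    Returns, for each label, (attention weights over positions, pooled vector)."""
    results = []
    for query in label_queries:
        # Score each position by its dot product with the label's query.
        scores = [sum(h * q for h, q in zip(pos, query)) for pos in features]
        weights = softmax(scores)
        # Weighted sum of position vectors gives the label-specific summary.
        pooled = [
            sum(w * pos[k] for w, pos in zip(weights, features))
            for k in range(len(query))
        ]
        results.append((weights, pooled))
    return results


# Toy example (invented): 3 positions, 2 filters, 2 labels.
features = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
attended = label_attention(features, queries)
top_positions = [max(range(len(w)), key=w.__getitem__) for w, _ in attended]
```

In this toy case each label puts most of its weight on a different position, so the highest-weight positions can be read back as the text snippet that supports each code. A wider convolutional layer would correspond to longer activation vectors per position.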


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Cheng Wang ◽  
Chenlong Yao ◽  
Pengfei Chen ◽  
Jiamin Shi ◽  
Zhe Gu ◽  
...  

The study aims to explore the application of International Classification of Diseases (ICD) coding technology and an embedded electronic medical record (EMR) system. The study established an EMR information knowledge system and collected patient medical records and the disease diagnostic codes on the front pages of records from 8 clinical departments of endocrinology, oncology, obstetrics and gynecology, ophthalmology, orthopedics, neurosurgery, and cardiovascular medicine for statistical analysis. A natural language processing-bidirectional recurrent neural network (NLP-BIRNN) algorithm was used to optimize medical records. The results showed that coders were unclear about the basic rules of main diagnosis selection and the classification of disease coding, and did not code according to the main diagnosis principles. Diseases were not coded according to different conditions or specific classifications, the codes for postoperative complications were inaccurate, the disease diagnoses were incomplete, and the codes selected were too general. The proposed solution was to strengthen communication and knowledge training for coders and medical personnel. BIRNN was compared with the convolutional neural network (CNN) and the recurrent neural network (RNN) in terms of accuracy, symptom accuracy, and symptom recall, and the results suggested that the proposed BIRNN achieved higher values. Reading pathological language with an artificial intelligence algorithm provides some convenience for disease diagnosis and treatment.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Lucia Otero Varela ◽  
Chelsea Doktorchik ◽  
Natalie Wiebe ◽  
Hude Quan ◽  
Catherine Eastwood

Abstract
Background: The International Classification of Diseases (ICD) is the reference standard for reporting diseases and health conditions globally. Variations in ICD use and data collection across countries can hinder meaningful comparisons of morbidity data. Thus, we aimed to characterize ICD and hospital morbidity data collection features worldwide.
Methods: An online questionnaire was created to poll the World Health Organization (WHO) member countries that were using ICD. The survey included questions focused on ICD meta-features and hospital data collection systems, and was distributed via SurveyMonkey using purposive and snowball sampling. Accordingly, senior representatives from organizations specialized in the topic, such as WHO Collaborating Centers, and other experts in ICD coding were invited to fill out the survey and forward the questionnaire to their peers. Answers were collated by country, analyzed, and presented in a narrative form with descriptive analysis.
Results: Responses from 47 participants were collected, representing 26 different countries using ICD. Results indicated worldwide disparities in the ICD meta-features regarding the maximum allowable coding fields for diagnosis, the definition of main condition, and the mandatory type of data fields in the hospital morbidity database. Accordingly, the most frequently reported answers were “reason for admission” as main condition definition (n = 14), having 31 or more diagnostic fields available (n = 12), and “Diagnoses” (n = 26) and “Patient demographics” (n = 25) for mandatory data fields. Discrepancies in data collection systems occurred between but also within countries, thereby revealing a lack of standardization both at the international and national level. Additionally, some countries reported specific data collection features, including the use or misuse of ICD coding, the national standards for coding or lack thereof, and the electronic abstracting systems utilized in hospitals.
Conclusions: Harmonizing ICD coding standards/guidelines should be a common goal to enhance international comparisons of health data. The current international status of ICD data collection highlights the need for the promotion of ICD and the adoption of the newest version, ICD-11. Furthermore, it will encourage further research on how to improve and standardize ICD coding.


2021 ◽  
Author(s):  
Tong Zhou ◽  
Pengfei Cao ◽  
Yubo Chen ◽  
Kang Liu ◽  
Jun Zhao ◽  
...  

2021 ◽  
Author(s):  
Shang-Chi Tsai ◽  
Chao-Wei Huang ◽  
Yun-Nung Chen
