Multitask Recalibrated Aggregation Network for Medical Code Prediction

2021 ◽  
pp. 367-383
Author(s):  
Wei Sun ◽  
Shaoxiong Ji ◽  
Erik Cambria ◽  
Pekka Marttinen

Abstract: Medical coding translates professionally written medical reports into standardized codes and is an essential part of medical information systems and health insurance reimbursement. Manual coding by trained human coders is time-consuming and error-prone. Automated coding algorithms have therefore been developed, building especially on recent advances in machine learning and deep neural networks. To address the challenges of encoding lengthy and noisy clinical documents and capturing code associations, we propose a multitask recalibrated aggregation network. In particular, multitask learning shares information across different coding schemes and captures the dependencies between different medical codes. Feature recalibration and aggregation in shared modules enhance representation learning for lengthy notes. Experiments with the real-world MIMIC-III dataset show significantly improved predictive performance.
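The idea of sharing one model across several coding schemes can be illustrated with a generic multitask objective. The following is a minimal sketch, not the authors' implementation: it assumes each coding scheme is a multi-label task trained with binary cross-entropy, and the scheme names, weights, and probabilities are hypothetical placeholders.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy over one multi-label code vector."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def multitask_loss(predictions, targets, weights):
    """Weighted sum of per-scheme losses; all dicts are keyed by scheme name."""
    return sum(
        weights[scheme] * binary_cross_entropy(predictions[scheme], targets[scheme])
        for scheme in targets
    )


# Hypothetical predicted probabilities and gold labels for two coding schemes.
predictions = {"scheme_a": [0.9, 0.2, 0.7], "scheme_b": [0.8, 0.1]}
targets = {"scheme_a": [1, 0, 1], "scheme_b": [1, 0]}
weights = {"scheme_a": 1.0, "scheme_b": 0.5}  # assumed task weights

loss = multitask_loss(predictions, targets, weights)
```

Because both losses are computed from the same shared representation in a multitask model, gradients from one coding scheme can inform the other; the weights control how much each scheme contributes.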

Author(s):  
Ulfeta Marovac ◽  
Aldina Avdić ◽  
Dragan Janković ◽  
Sead Marovac

Thanks to medical information systems, many medical reports are collected in electronic form every day. Apart from the fields with a fixed set of allowed input values (the structured part), part of each report consists of free, unstructured text. It contains a more detailed description of the patient's condition that could not be captured in the structured part; symptoms, laboratory results, accompanying diagnoses, etc. can often be found in it. Due to a lack of time, doctors often write these descriptions in non-standard ways, using their own abbreviations and synonyms, and the descriptions often contain typos. All this makes it difficult to extract information from documents specific to the medical domain. This paper presents the creation of medical lexical resources for the automatic labeling of diagnosis terms in medical reports. Automatic labeling of free text requires natural language processing methods as well as appropriate lexical resources. As there are no publicly available medical lexical resources for the Serbian language, nor a corpus of medical reports, the contribution of this paper is the construction of such resources for the automatic labeling of diagnoses. Using the proposed resources, diagnosis codes, Latin terms, and Serbian terms specific to certain ICD-10 codes can be mapped with precision of 83.47%, 86.86%, and 78.29%, respectively.
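Lexicon-based labeling and the precision metric reported above can be sketched as follows. This is an illustrative toy, not the paper's resources or pipeline: the four lexicon entries, the sample note, and the function names `label_terms` and `precision` are assumptions made for the example.

```python
import re

# Toy lexicon mapping surface terms (Latin and Serbian) to ICD-10 codes.
# Illustrative entries only; the paper's actual resources are not shown here.
LEXICON = {
    "asthma bronchiale": "J45",
    "astma": "J45",
    "hypertensio arterialis": "I10",
    "hipertenzija": "I10",
}


def label_terms(text, lexicon=LEXICON):
    """Return sorted (term, code) pairs for lexicon terms found as whole words."""
    lowered = text.lower()
    found = []
    for term, code in lexicon.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((term, code))
    return sorted(found)


def precision(predicted, gold):
    """Fraction of predicted codes that also appear in the gold annotation."""
    predicted, gold = set(predicted), set(gold)
    return len(predicted & gold) / len(predicted) if predicted else 0.0


# Hypothetical free-text fragment in note style.
report = "Dg: Asthma bronchiale; hipertenzija."
labels = label_terms(report)
codes = [code for _, code in labels]
```

Exact matching like this misses the typos and ad hoc abbreviations described above, which is one reason dedicated lexical resources and normalization steps are needed in practice.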


Author(s):  
Nuria Garcia-Santa ◽  
Beatriz San Miguel ◽  
Takanori Ugai

Medical coding assigns codes from medical classifications, such as the International Classification of Diseases (ICD), to clinical notes, which are reports about patients' conditions written by healthcare professionals in natural language. These texts may include medical terms that describe diagnoses, symptoms, drugs, treatments, etc., and their spontaneous language is challenging for automatic processing. Medical coding is usually performed manually by human coders, which is time-consuming and prone to errors. This research aims to develop new approaches that combine deep learning elements with traditional technologies. It presents a semantic-based proposal supported by a proprietary knowledge graph (KG), neural network implementations, and an ensemble model for medical coding. A comparative discussion analyses the advantages and disadvantages of each proposal. To evaluate the approaches, two main corpora were used: MIMIC-III and private de-identified clinical notes.


1983 ◽  
Vol 22 (03) ◽  
pp. 124-130 ◽  
Author(s):  
J. H. Bemmel

At first sight, the many applications of computers in medicine—from payroll and registration systems to computerized tomography, intensive care and diagnostics—do make a rather chaotic impression. The purpose of this article is to propose a scheme or working model for putting medical information systems in order. The model comprises six »levels of complexity«, running parallel to dependence on human interaction. Several examples are treated to illustrate the scheme. The reason why certain computer applications are more frequently used than others is analyzed. It has to be strongly considered that the differences in complexity and dependence on human involvement are not accidental but fundamental. This has consequences for research and education which are also discussed.


2020 ◽  
Author(s):  
Mikołaj Morzy ◽  
Bartłomiej Balcerzak ◽  
Adam Wierzbicki

BACKGROUND With the rapidly accelerating dissemination of false medical information on the Web, establishing the credibility of online sources of medical information has become a pressing necessity. The sheer number of websites offering questionable medical information, presented as reliable and actionable suggestions with possibly harmful effects, poses an additional requirement for potential solutions: they have to scale to the size of the problem. Machine learning is one such solution which, when properly deployed, can be an effective tool in fighting medical disinformation on the Web.
OBJECTIVE We present a comprehensive framework for designing and curating machine learning training datasets for online medical information credibility assessment. We show how the annotation process should be constructed and what pitfalls should be avoided. Our main objective is to provide researchers from the medical and computer science communities with guidelines on how to construct datasets for machine learning models for various areas of medical information wars.
METHODS The key component of our approach is the active annotation process. We begin by outlining the annotation protocol for curating a high-quality training dataset, which can then be augmented and rapidly extended by applying the human-in-the-loop paradigm to machine learning training. To circumvent the cold start problem of insufficient gold standard annotations, we propose a pre-processing pipeline consisting of representation learning, clustering, and re-ranking of sentences to accelerate the training process and optimize the human resources involved in the annotation.
RESULTS We collect over 10 000 annotations of sentences related to selected subjects (psychiatry, cholesterol, autism, antibiotics, vaccines, steroids, birth methods, food allergy testing) for less than $7 000, employing 9 highly qualified annotators (certified medical professionals), and we release this dataset to the general public. We develop an active annotation framework for more efficient annotation of non-credible medical statements. The results of the qualitative analysis support our claims of the efficacy of the presented method.
CONCLUSIONS A very diverse set of incentives is driving the widespread dissemination of medical disinformation on the Web. An effective strategy for countering this spread is to use machine learning to automatically establish the credibility of online medical information. This, however, requires a thoughtful design of the training pipeline. In this paper we present a comprehensive framework of active annotation. In addition, we publish a large curated dataset of medical statements labelled as credible, non-credible, or neutral.
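One common way to realize a human-in-the-loop annotation step is uncertainty sampling: the sentences the current model is least sure about are sent to annotators first. The sketch below shows that generic technique only; it is not the clustering and re-ranking pipeline described in the abstract, and the function name, sentence identifiers, and scores are hypothetical.

```python
def select_for_annotation(scored_sentences, batch_size):
    """Pick the sentences whose predicted probability is closest to 0.5.

    scored_sentences: list of (sentence, probability_non_credible) pairs,
    where the probability comes from some current classifier (assumed).
    """
    ranked = sorted(scored_sentences, key=lambda item: abs(item[1] - 0.5))
    return [sentence for sentence, _ in ranked[:batch_size]]


# Hypothetical pool of unlabeled sentences with model scores.
pool = [
    ("sentence A", 0.97),
    ("sentence B", 0.52),
    ("sentence C", 0.10),
    ("sentence D", 0.45),
]
batch = select_for_annotation(pool, batch_size=2)
```

After annotators label the selected batch, the model would be retrained and the remaining pool rescored, so each round of expert time is spent on the sentences expected to be most informative.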

