Basic Study on Automated Extraction of Symptom-Related Words from Patient Complaints (Preprint)
BACKGROUND Although methods for obtaining knowledge from texts written by healthcare professionals, such as electronic medical records and discharge summaries, have been studied, few reports have analyzed free-text data on patients' complaints written in Japanese.

OBJECTIVE This study aimed to establish a new method for extracting keywords from patients' free-text descriptions accumulated at Japanese medical institutions.

METHODS We developed a system that automatically annotates free-text data with codes from the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10), using electronic medication history data (target period: September 1, 2015 to August 31, 2016). The system's performance was evaluated by comparison with data manually annotated by healthcare workers.

RESULTS From 5,000 patient statements, healthcare workers extracted 2,348 ICD-10 codes and the system extracted 2,236 codes; 1,480 of these codes matched. Using manual extraction as the reference, the system achieved a precision of 0.66, a recall of 0.63, and an F-measure of 0.65.

CONCLUSIONS Our results suggest that the system can help extract and standardize patients' symptom-related words from large volumes of free-text data in place of manual work. After improving its extraction accuracy, we expect to use the system to detect signals of adverse drug reactions from patients' statements.
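The reported scores can be reproduced from the three counts in the results. The sketch below is not the authors' evaluation code; it assumes that manual annotation is the gold standard, that each of the 1,480 matching codes counts as a true positive, and that the F-measure is the balanced F1 (harmonic mean of precision and recall). The function name `evaluate` is hypothetical.

```python
def evaluate(n_manual: int, n_system: int, n_matched: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from code counts.

    Assumption: manual annotations are the reference set, and n_matched
    is the number of system codes that agree with a manual code.
    """
    precision = n_matched / n_system   # fraction of system codes that are correct
    recall = n_matched / n_manual      # fraction of manual codes the system found
    f_measure = 2 * precision * recall / (precision + recall)  # balanced F1
    return precision, recall, f_measure


# Counts taken from the RESULTS section of the abstract.
p, r, f = evaluate(n_manual=2348, n_system=2236, n_matched=1480)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Running it prints `precision=0.66 recall=0.63 F=0.65`, matching the abstract. Equivalently, F1 here equals 2 × 1,480 / (2,236 + 2,348) ≈ 0.646.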