A Medical Decision Support Tool Using Text-mining Techniques with Electronic Medical Records

Inquiry@Queen's Undergraduate Research Conference Proceedings ◽

10.24908/iqurcp.11549 ◽

2018 ◽

Author(s):

Michael Judd

Keyword(s):

Back Pain ◽

Text Mining ◽

Electronic Medical Records ◽

Lower Back Pain ◽

Medical Records ◽

Medical Condition ◽

Medical Decision ◽

Free Text ◽

Clinical Notes ◽

Lower Back

Free-text clinical notes represent a vast amount of information that has historically gone unanalyzed. In this paper we apply text-mining methods to the free text in electronic medical records (EMRs) to define treatment options for patients with lower back pain. The goal of the project is to develop a generalized text-mining framework that can be used not only in the treatment of lower back pain, but also for any medical condition. The framework takes advantage of open-source algorithms for anonymization and the clinical NLP tool Apache Clinical Text Analysis and Knowledge Extraction System (cTAKES) to form structured data from clinical notes. The machine learning algorithm uses seven years of extracted clinical notes from the primary care physician to classify 20 patients’ pattern of back pain. With the small dataset provided, the algorithm achieved a diagnostic accuracy of up to 100%. The twenty-patient dataset was too homogeneous and too small to support statistical claims about sensitivity and specificity. However, the system shows indicators of satisfactory performance, and we are trying to extract more data from patients who do not have back pain so that we can validate the system better.
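
The limitation noted in the abstract can be illustrated with a minimal sketch. It assumes a binary framing (back pain = 1, no back pain = 0) and uses made-up labels, neither of which comes from the paper. When every patient in the sample is a positive case, specificity has no negatives to be computed from and is undefined:

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (sensitivity, specificity); None marks an undefined metric."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Each ratio is defined only if its denominator is non-zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical labels (not the study data): 20 patients, all with back pain,
# all classified correctly.
y_true = [1] * 20
y_pred = [1] * 20
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 1.0 None
```

Under this assumption, 100% accuracy says nothing about how the classifier behaves on patients without back pain, which is consistent with the authors' plan to collect such cases.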


Natural Language Processing to Identify Cancer Treatments With Electronic Medical Records

JCO Clinical Cancer Informatics ◽

10.1200/cci.20.00173 ◽

2021 ◽

pp. 379-393

Author(s):

Jiaming Zeng ◽

Imon Banerjee ◽

A. Solomon Henry ◽

Douglas J. Wood ◽

Ross D. Shachter ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Electronic Medical Records ◽

Language Processing ◽

Medical Records ◽

Structured Data ◽

Free Text ◽

Esophagus Cancer ◽

Clinical Notes ◽

Patients With Cancer

PURPOSE Knowing the treatments administered to patients with cancer is important for treatment planning and for correlating treatment patterns with outcomes in personalized medicine studies. However, existing methods to identify treatments are often lacking. We develop a natural language processing approach that uses structured electronic medical records and unstructured clinical notes to identify the initial treatment administered to patients with cancer.
METHODS We used a total of 4,412 patients with 483,782 clinical notes from the Stanford Cancer Institute Research Database, covering patients with nonmetastatic prostate, oropharynx, and esophagus cancer. We trained treatment identification models for each cancer type separately and compared the performance of using only structured data, only unstructured data (bag-of-words, doc2vec, fasttext), and combinations of both (structured + bow, structured + doc2vec, structured + fasttext). We optimized the identification model among five machine learning methods (logistic regression, multilayer perceptrons, random forest, support vector machines, and stochastic gradient boosting). We used the treatment information recorded in the cancer registry as the gold standard and compared our methods to an identification baseline based on billing codes.
RESULTS For prostate cancer, we achieved an F1-score of 0.99 (95% CI, 0.97 to 1.00) for radiation and 1.00 (95% CI, 0.99 to 1.00) for surgery using structured + doc2vec. For oropharynx cancer, we achieved an F1-score of 0.78 (95% CI, 0.58 to 0.93) for chemoradiation and 0.83 (95% CI, 0.69 to 0.95) for surgery using doc2vec. For esophagus cancer, we achieved an F1-score of 1.0 (95% CI, 1.0 to 1.0) for both chemoradiation and surgery using all combinations of structured and unstructured data. We found that employing the free-text clinical notes outperforms using the billing codes or only structured data for all three cancer types.
CONCLUSION Our results show that treatment identification using free-text clinical notes greatly improves upon the performance using billing codes and simple structured data. The approach can be used for treatment cohort identification and adapted for longitudinal cancer treatment identification.
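
The results above are reported as F1-scores with 95% confidence intervals. The abstract does not state how the intervals were computed, so the sketch below assumes a percentile bootstrap over patients; the function names, labels, and resample count are all hypothetical and chosen only for illustration.

```python
import random
from typing import Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for F1 (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample patients with replacement, keeping label/prediction pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Hypothetical labels: 30 treated patients (1) and 70 untreated (0),
# with 3 false negatives and 4 false positives.
y_true = [1] * 30 + [0] * 70
y_pred = [1] * 27 + [0] * 3 + [1] * 4 + [0] * 66
point = f1_score(y_true, y_pred)
lower, upper = bootstrap_f1_ci(y_true, y_pred)
print(f"F1 = {point:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

With a small or imbalanced cohort the interval widens, which matches the pattern in the abstract, where the oropharynx intervals are much wider than the prostate ones.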
