Clinical oncology textual notes analysis using machine learning and deep learning (Preprint)
Text analysis and classification are important research topics, as advances in this field can improve the quality of existing clinical systems. Our research experimentally explored text classification methods applied to non-synthetic corpora of oncology clinical notes. The experiments were performed on a dataset of 3,308 medical notes. They evaluated the following machine learning and deep learning classification methods: Multilayer Perceptron neural network, Logistic Regression, Decision Tree classifier, Random Forest classifier, K-nearest neighbors classifier, and Long Short-Term Memory. One experiment evaluated the influence of the corpus preprocessing step on the results, showing that the classifier's mean accuracy rose from 26.1% to 86.7% with the per-clinical-event corpus and to 93.9% with the per-patient corpus. The best-performing classifier was the Multilayer Perceptron, which achieved 93.90% accuracy, a macro F1 score of 93.61%, and a weighted F1 score of 93.99%.
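The three metrics reported above differ only in how per-class results are aggregated. The following is a minimal sketch in plain Python, not the authors' evaluation code. The function names and the toy labels in the usage note are assumptions made for illustration. Accuracy counts correct predictions overall, macro F1 averages the per-class F1 scores equally, and weighted F1 averages them in proportion to each class's number of true instances.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of notes whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's true-instance count."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)
```

For example, with the hypothetical labels `y_true = ["a", "a", "a", "a", "b", "b"]` and `y_pred = ["a", "a", "a", "b", "b", "b"]`, accuracy is 5/6, macro F1 is 29/35, and weighted F1 is 88/105. A gap between the macro and weighted scores indicates that performance differs between frequent and rare classes.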