Patient-oriented natural language processing: Defining a new paradigm for research and development to facilitate adoption and utilization by medical experts (Preprint)
The capabilities of natural language processing (NLP) methods have expanded significantly in recent years, driven particularly by advances in data science and machine learning. However, the use of NLP for patient-oriented clinical research and care (POCRC) remains limited. A primary reason may be that clinical NLP methods are developed, optimized, and evaluated on narrow-focus datasets and tasks (e.g., detecting specific symptoms in free text). Such research and development (R&D) approaches may be described as problem-oriented, and the resulting systems perform well only for a given specialized task. As standalone systems, they are also typically not suited to the needs of POCRC, leaving a gap between the capabilities of clinical NLP methods and the needs of patient-facing medical experts. We believe that, to make clinical NLP systems more valuable, future R&D efforts need to follow a new research paradigm, one that explicitly incorporates characteristics crucial for POCRC. We present our viewpoint on four interrelated characteristics relevant to NLP systems suitable for POCRC, three representing NLP system properties and one associated with the R&D process: (i) generalizability (the capability to characterize patients, not clinical problems), (ii) interpretability (the ability to explain system decisions), (iii) customizability (the flexibility to adapt to distinct settings, problems, and cohorts), and (iv) cross-evaluation (validated performance on heterogeneous datasets). Using the NLP task of clinical concept detection as an example, we detail these characteristics and discuss how they may lead to increased uptake of NLP systems for POCRC.