A Multidimensional Data Fusion Model Based on Deep Learning for a Patient Similarity Network (Preprint)
BACKGROUND Precision medicine is a novel approach to patient care. It allows the appropriate drug and suitable treatments to be prescribed to the right patient at the right time. It can be envisioned as the comparison of a new patient with existing patients who have similar characteristics, which is referred to as patient similarity. Several statistical, data mining, and deep learning models have been used to build and apply patient similarity networks (PSNs) for various purposes. However, data heterogeneity and high dimensionality make it difficult for a single model to both reduce data dimensionality and capture the features of diverse data types, including contextual and longitudinal data. Furthermore, applying multiple models raises the additional challenge of developing an optimal aggregation scheme that maintains high accuracy and preserves data veracity.
OBJECTIVE In this study, we propose a multi-model PSN that considers heterogeneous data with static and dynamic characteristics to improve prediction accuracy in disease diagnosis. The static data model manages the data obtained from patient profiles, whereas the dynamic data model manages longitudinal data from patient treatment pathways and clinical data.
METHODS We combine deep learning models with a patient similarity network to obtain abundant clinical evidence and extract relevant information, based on which similar patients can be explored and compared, thereby obtaining more accurate and comprehensive diagnoses and recommendations. We use bidirectional encoder representations from transformers (BERT) to process and analyze the contextual data and generate word embeddings, from which semantic features are captured using a convolutional neural network (CNN). Dynamic data is analyzed using a long short-term memory (LSTM)-based autoencoder, which reduces data dimensionality while preserving the temporal features of the data. Furthermore, we propose an aggregation-based fusion approach in which temporal data and clinical narrative data are combined to estimate patient similarity.
RESULTS We evaluated our proposed method through a series of experiments. The results showed that our proposed deep learning-based PSN fusion model provides higher classification accuracy in determining various patient health outcomes than traditional classification algorithms.
CONCLUSIONS Our multi-model PSN highlights the strength of the similarity between pairs of patients, thereby enabling precise diagnosis and recommendations for a new patient.
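The aggregation-based fusion described in METHODS can be illustrated with a minimal sketch. The abstract does not specify the aggregation scheme, so the following rests on assumptions: each patient is represented by a narrative embedding (standing in for the BERT and CNN output) and a temporal embedding (standing in for the LSTM autoencoder code), the per-modality similarity is cosine similarity, the fusion is a weighted average, and the outcome of a new patient is predicted by majority vote over the most similar patients. All names (`cosine`, `fused_similarity`, `predict_outcome`, the `weight` and `k` parameters) and the toy vectors are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_similarity(patient_a, patient_b, weight=0.5):
    """Weighted average of narrative and temporal similarity (assumed fusion rule)."""
    narrative_sim = cosine(patient_a["narrative"], patient_b["narrative"])
    temporal_sim = cosine(patient_a["temporal"], patient_b["temporal"])
    return weight * narrative_sim + (1 - weight) * temporal_sim


def predict_outcome(new_patient, cohort, k=3, weight=0.5):
    """Majority vote over the k cohort patients most similar to the new patient."""
    ranked = sorted(
        cohort,
        key=lambda p: fused_similarity(new_patient, p, weight),
        reverse=True,
    )
    votes = Counter(p["outcome"] for p in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy cohort with made-up 2-dimensional embeddings and outcome labels.
cohort = [
    {"narrative": [1.0, 0.0], "temporal": [1.0, 0.0], "outcome": "A"},
    {"narrative": [0.9, 0.1], "temporal": [0.8, 0.2], "outcome": "A"},
    {"narrative": [0.0, 1.0], "temporal": [0.1, 0.9], "outcome": "B"},
]
new_patient = {"narrative": [1.0, 0.1], "temporal": [0.9, 0.1]}

predicted = predict_outcome(new_patient, cohort, k=2)
print(predicted)
```

In this sketch the `weight` parameter controls how much the narrative similarity contributes relative to the temporal similarity; a learned or data-driven weighting would be an equally plausible reading of the abstract.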