Machine learning imputation of metastatic status from open claims in melanoma patients.
e21540 Background: Metastatic status is a crucial variable in most oncology studies but is not available in claims data. The objective of this study is to develop a machine learning model for imputation of metastatic status from claims data, with ground truth derived from highly curated electronic medical record (EMR) data.
Methods: We used a set of 11,389 melanoma patients from the ConcertAI real-world database of intersecting claims and EMR data, which includes data from CancerLinQ Discovery. Using features from claims and gold standard labels from the EMR, we built a machine learning model using extreme gradient boosting (XGBoost), an algorithm that iteratively combines a set of decision trees into a single model. We used 60% of the data for training, 20% for hyperparameter tuning, and 20% for holdout testing. The model was built using 55 features.
Results: The table below summarizes the results. Metrics are reported on the final holdout set, which was unseen by the model and composed entirely of highly curated EMR data.
Conclusions: We are able to build a high-precision model for the imputation of metastatic melanoma status using claims data. This could enable significantly better use of claims data, stemming from the ability to find a metastatic cohort with very few false positives, and thus provide more precise cohort identification for comparative effectiveness studies. Features such as secondary neoplasm diagnosis, anti-neoplastic medications, and radiation ranked highly in our analysis of model feature importances. Using techniques to analyze non-linear feature interactions in our model, we found an interaction between long-term anti-neoplastic therapy, reported pain, and metastatic status, which we plan to study further. This work is preliminary and we are working to further improve model performance. [Table: see text]
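The 60/20/20 partition and the precision metric described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `split_60_20_20` and `precision`, the random seed, and the use of integer patient IDs are assumptions made for illustration, and the XGBoost model fitting itself is omitted.

```python
import random


def split_60_20_20(patient_ids, seed=0):
    """Shuffle patient IDs and split them into training, tuning, and holdout sets.

    Hypothetical helper: the abstract states only the 60/20/20 proportions,
    not how the split was drawn.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = int(0.6 * len(ids))
    n_tune = int(0.2 * len(ids))
    train = ids[:n_train]
    tune = ids[n_train:n_train + n_tune]
    holdout = ids[n_train + n_tune:]  # remainder, roughly 20%
    return train, tune, holdout


def precision(y_true, y_pred):
    """Fraction of patients predicted metastatic (1) who are truly metastatic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Assumed stand-in for the 11,389-patient cohort: integer IDs only.
train, tune, holdout = split_60_20_20(range(11389))
```

Precision is the relevant metric for the cohort-selection use case in the conclusions, since a cohort with very few false positives corresponds to a precision close to 1 on the holdout set.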