Missing data in breast cancer: Relationship with survival in national databases.
e19114 Background: National cancer registries are valuable tools for analyzing patterns of care and clinical oncology outcomes; however, missing patient data may affect the accuracy and generalizability of registry-based findings. We sought to evaluate the association between missing data and overall survival (OS). Methods: Using the National Cancer Database (NCDB) and the Surveillance, Epidemiology, and End Results (SEER) program, we compared data missingness among patients diagnosed with invasive breast cancer from 2010 to 2014. Key variables included demographic variables (age, race, ethnicity, insurance, education, income), tumor variables (grade, ER, PR, HER2, TNM stage), and treatment variables (surgery in both databases; chemotherapy and radiation in NCDB). OS was compared between patients with and without missing data using Cox proportional hazards models. Results: Overall, 775,996 patients in the NCDB and 263,016 in SEER were identified; at least 1 key variable was missing for 29% and 13% of patients, respectively. Of those, the majority were missing a tumor variable (NCDB 80%; SEER 88%), while demographic and treatment variables were missing less often. Compared with patients with complete data, patients with missing data had a greater risk of death: NCDB 17% vs 14% (HR 1.23, 99% CI 1.21-1.25) and SEER 27% vs 14% (HR 2.11, 99% CI 2.05-2.18). The rate of death was similar whether the patient was missing 1 or ≥2 variables. When stratified by the type of missing variable, differences in OS between those with and without missing data in the NCDB were small. In SEER, reductions in OS were largest for those missing tumor variables (HR 2.26, 99% CI 2.19-2.33) or surgery data (HR 3.84, 99% CI 3.32-4.45). Among the tumor variables specifically, few clinically meaningful differences in OS were noted in the NCDB, while the most significant differences in SEER were noted in T and N stage (table). Conclusions: Missingness of select variables is associated with worse OS and is not uncommon within large national cancer registries.
Therefore, researchers must use caution when choosing inclusion/exclusion criteria for outcomes studies. Future research is needed to elucidate which patients are most often missing data and why OS differences are observed. [Table: see text]
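
The missingness classification described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the field names, the `make_record` helper, and the toy records are all hypothetical and do not correspond to actual NCDB or SEER columns or patients. It also reports only crude death proportions for complete versus incomplete records; it does not fit the Cox proportional hazards models used for the hazard ratios above.

```python
from collections import Counter

# Hypothetical field names grouped as in the abstract; real NCDB/SEER
# column names differ and are not reproduced here.
KEY_VARIABLES = {
    "demographic": ["age", "race", "ethnicity", "insurance", "education", "income"],
    "tumor": ["grade", "er", "pr", "her2", "t_stage", "n_stage", "m_stage"],
    "treatment": ["surgery"],
}


def missing_groups(record):
    """Return the variable groups in which at least one field is None or absent."""
    return {
        group
        for group, fields in KEY_VARIABLES.items()
        if any(record.get(field) is None for field in fields)
    }


def summarize(records):
    """Tabulate missingness and crude death proportions by completeness status."""
    counts = Counter()        # records per status ("complete" / "missing")
    deaths = Counter()        # deaths per status
    group_counts = Counter()  # records missing something in each variable group
    for record in records:
        groups = missing_groups(record)
        status = "missing" if groups else "complete"
        counts[status] += 1
        deaths[status] += 1 if record["died"] else 0
        group_counts.update(groups)
    total = sum(counts.values())
    return {
        "pct_missing": counts["missing"] / total if total else 0.0,
        "death_rate": {status: deaths[status] / counts[status] for status in counts},
        "missing_by_group": dict(group_counts),
    }


def make_record(died, **overrides):
    """Build a toy record with every key variable present, then apply overrides."""
    record = {field: "known" for fields in KEY_VARIABLES.values() for field in fields}
    record.update(overrides)
    record["died"] = died
    return record


# Made-up toy records for illustration only; not registry data.
toy_records = [
    make_record(False),
    make_record(False),
    make_record(True),
    make_record(True, grade=None),
    make_record(False, t_stage=None, surgery=None),
]

summary = summarize(toy_records)
print(summary)
```

A record missing fields in more than one group is counted once per group in `missing_by_group`, so those counts can sum to more than the number of incomplete records.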