Development and validation of a real-world clinicogenomic database.
2514 Background: Genomic findings have diagnostic, prognostic, and predictive utility in clinical oncology. Population studies have been limited by reliance on trials, registries, or institutional chart review, which are costly and represent narrow populations. Integrating electronic health record (EHR) and genomic data collected as part of routine clinical practice may overcome these hurdles.
Methods: Patients in the Flatiron Health Database with non-small cell lung cancer (NSCLC) who underwent comprehensive genomic profiling (CGP) by Foundation Medicine were included. EHR processing included structured data harmonization and abstraction of variables from unstructured documents. EHR and CGP data were de-identified and linked in a HIPAA-compliant process. Data included clinical characteristics, alterations across >300 genes, tumor mutation burden (TMB), therapies and associated real-world responses, progression, and overall survival (OS).
Results: The cohort (n = 1619) had the expected clinical (mean age 66; 75% with smoking history; 80% non-squamous) and genomic (18% EGFR; 4% ALK; 1% ROS1) properties of NSCLC. Presence of a driver mutation (EGFR, ALK, ROS1, MET, BRAF, RET, or ERBB2; n = 576) was associated with younger age, female gender, non-smoking status, improved OS (35 vs 19 mo, LR p < 0.0001), and prolonged survival when treated with NCCN-recommended therapy (42 vs 28 mo, LR p = 0.001). CGP identified false-negative results in up to 30% of single-biomarker tests for EGFR, ALK, and ROS1. CGP accuracy was supported by clinical outcomes. For example, 5 patients with prior negative ALK-fusion testing began ALK-directed therapy after positive CGP results; all 5 exhibited at least a partial response, as recorded in the EHR by treating clinicians. Immunotherapy was used in 22% of patients (n = 353). TMB predicted response to nivolumab, including in PD-L1-negative populations. We recapitulated known associations with smoking, histology, and driver mutations.
Conclusions: We present and validate a new paradigm for rapidly generating large, research-grade, longitudinal clinicogenomic databases by linking genomic data with EHR clinical annotation. This method offers a powerful tool for understanding cancer genomics and advancing precision medicine.