Background:
Current cardiovascular disease (CVD) risk scores are derived from research cohorts and are particularly inaccurate in women, older adults, and those with missing data. To overcome these limitations, we aimed to build a cohort that capitalizes on the depth and breadth of clinical data in electronic health record (EHR) systems and to use it to develop next-generation sex-specific risk prediction scores for incident CVD.
Methods:
We identified all individuals 30 years of age or older residing in Olmsted County, Minnesota, on 1/1/2006. We developed and validated algorithms to define a variety of risk factors, thus building a comprehensive risk profile for each patient. Outcomes, including myocardial infarction (MI), percutaneous coronary intervention (PCI), coronary artery bypass graft (CABG), and CVD death, were ascertained through 9/30/2017.
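The cohort definition above can be sketched in code. The following is a minimal illustration, not the study's actual algorithm: the function names and the `has_prior_cvd` flag are hypothetical, residency in Olmsted County is assumed to have been checked upstream, and censoring for out-migration or non-CVD death is ignored. Only the index date, the age threshold, and the end of follow-up come from the Methods.

```python
from datetime import date

INDEX_DATE = date(2006, 1, 1)         # cohort entry date (from the Methods)
END_OF_FOLLOW_UP = date(2017, 9, 30)  # last date of outcome ascertainment
MIN_AGE = 30                          # minimum age on the index date


def age_on(birth_date, on_date):
    """Age in completed years on a given date."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)


def is_eligible(birth_date, has_prior_cvd):
    """Hypothetical eligibility rule: age >= 30 on the index date, no prior CVD."""
    return age_on(birth_date, INDEX_DATE) >= MIN_AGE and not has_prior_cvd


def follow_up(event_date):
    """Return (days_at_risk, event_observed).

    People with no event inside the follow-up window are censored
    at the end of follow-up (an assumption of this sketch).
    """
    if event_date is not None and INDEX_DATE <= event_date <= END_OF_FOLLOW_UP:
        return (event_date - INDEX_DATE).days, True
    return (END_OF_FOLLOW_UP - INDEX_DATE).days, False
```

For example, `follow_up(date(2006, 1, 31))` yields 30 days at risk with an observed event, while `follow_up(None)` yields the full follow-up window with no event.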
Results:
We identified 73,069 individuals without CVD (Table). We retrieved a total of 14,962,762 lab results; 14,534,466 diagnoses; 17,062,601 services/procedures; 1,236,998 outpatient prescriptions; 1,079,065 heart rate measurements; and 1,320,115 blood pressure measurements. The median numbers of blood pressure and heart rate measurements per individual were 11 and 9, respectively. The five most prevalent conditions were hypertension, hyperlipidemia, arthritis, depression, and cardiac arrhythmias. During follow-up, 1,455 MIs, 1,581 PCIs, 652 CABGs, and 2,161 CVD-related deaths occurred.
Conclusions:
We developed a cohort with comprehensive risk profiles and follow-up for each patient. Using sophisticated machine learning approaches, this electronic cohort will be used to develop next-generation sex-specific CVD risk prediction scores. These approaches will allow us to address several challenges of using EHR data, namely the need to 1) handle missing values, 2) assess and use a large number of variables without over-fitting, 3) allow for non-linear relationships, and 4) model time-to-event data.
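The abstract does not name the specific machine learning methods. As a much simpler illustration of challenge 4 only, the sketch below computes a Kaplan-Meier survival curve, a standard non-parametric estimator for time-to-event data with censoring. It is not the authors' approach; the function name and the toy inputs are assumptions made for illustration.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each person (any consistent unit)
    observed:  True if the event occurred at that time, False if censored
    """
    # Only times at which at least one event occurred change the estimate.
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone still under observation at time t is at risk.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve
```

Censored individuals contribute to the at-risk count until their censoring time and then drop out without counting as events, which is what distinguishes time-to-event analysis from a simple event proportion.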