Background:
Tobacco smoking is widespread among HIV patients and is likely to contribute to the development of chronic diseases, yet accurate information on smoking is often difficult to obtain from observational data sources. We sought to validate a natural language processing (NLP) classifier to identify smokers among HIV and control cohorts and investigate whether HIV is an independent predictor of smoking and failure to quit smoking.
Methods:
We applied an NLP classifier developed within the Partners HealthCare System to electronic medical records for a cohort of HIV patients and a control cohort matched on age, gender, race, and number of clinical encounters. The NLP classifier searches free-text notes for "tokens" (phrases containing smoking-related words) and assigns a smoking status (current, former, or non-smoker) to each token. We developed an algorithm for combining token classifications from a 12-month period into a single yearly smoking status (current vs. non-smoker). We validated the yearly smoking status on a random sample of 500 patients (250 from each cohort). The gold standard was a trained nurse medical record reviewer, who assigned a tobacco smoking status to each patient for each calendar year observed. We calculated sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). Using NLP, we classified the full cohorts as ever versus never smokers and as current versus non-smokers at last observation. We used logistic regression to assess HIV as a predictor of smoking and of failure to quit (current smoking among ever smokers).
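The aggregation and validation steps described above can be sketched as follows. This is a minimal illustration, not the study's algorithm: the abstract does not specify the combination rule, so the fraction-of-current-tokens rule, the `min_current_fraction` threshold, and both function names are assumptions made for the example.

```python
from collections import Counter


def yearly_status(token_labels, min_current_fraction=0.5):
    """Collapse one patient-year of token labels into a single status.

    Hypothetical rule (an assumption, not the published algorithm): call the
    year "current" when at least `min_current_fraction` of the tokens were
    classified as "current"; otherwise call it "non-smoker".
    Returns None when the year has no smoking-related tokens.
    """
    if not token_labels:
        return None
    counts = Counter(token_labels)
    fraction_current = counts["current"] / len(token_labels)
    return "current" if fraction_current >= min_current_fraction else "non-smoker"


def sensitivity_specificity(predicted, reference, positive="current"):
    """Compare predicted yearly statuses with the reviewer's gold standard.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Assumes the reference contains at least one positive and one negative.
    """
    tp = fn = tn = fp = 0
    for pred, ref in zip(predicted, reference):
        if ref == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

For instance, a year with tokens labelled current, former, and current would be called "current" under this assumed rule, since two of the three tokens are current.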
Results:
Smoking-related tokens were found in the records of 2926/3554 HIV and 7039/9601 non-HIV patients, providing 34,956 patient-years of data. Using NLP to assign smoking status by year yielded sensitivity of 92.4%, specificity of 86.2%, and AUC of 0.89 (95% confidence interval [CI] 0.88-0.91). NLP assignment of ever versus never smoking status yielded sensitivity of 94.3%, specificity of 73.4%, and AUC of 0.84 (95% CI 0.81-0.87). Performance of the classifier did not vary by HIV status, gender, age, calendar year, or number of tokens per year. Ever and current smoking were more common in HIV patients than in controls (54% vs. 44% and 42% vs. 30%, respectively; both P<0.001). In multivariate models adjusting for demographics, cardiovascular risk factors, coronary heart disease, and history of psychiatric illness, HIV was an independent predictor of ever smoking (odds ratio [OR] 1.41, 95% CI 1.29-1.54, P<0.001), current smoking (OR 1.57, 95% CI 1.43-1.72, P<0.001), and failure to quit (OR 1.51, 95% CI 1.31-1.75, P<0.001).
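As a consistency check on the validation figures above: when a classifier yields a single binary call, its ROC curve has one operating point joined by straight lines to (0, 0) and (1, 1), and the area under that curve equals the mean of sensitivity and specificity. The sketch below applies this identity to the reported values. It assumes the reported AUCs were computed from binary calls, which the abstract does not state, and the function name is illustrative.

```python
def single_point_auc(sensitivity, specificity):
    """AUC of a ROC curve with one operating point (trapezoidal rule).

    For a binary classifier this reduces to the mean of sensitivity and
    specificity, also known as balanced accuracy.
    """
    return (sensitivity + specificity) / 2


# Reported yearly-status performance: sensitivity 92.4%, specificity 86.2%
yearly_auc = single_point_auc(0.924, 0.862)

# Reported ever-versus-never performance: sensitivity 94.3%, specificity 73.4%
ever_auc = single_point_auc(0.943, 0.734)

print(round(yearly_auc, 2), round(ever_auc, 2))  # prints 0.89 0.84
```

Both values agree with the reported AUCs of 0.89 and 0.84 to two decimal places.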
Conclusions:
We validated a novel tool to ascertain smoking status from HIV observational cohort data. HIV was independently associated with both smoking and failure to quit smoking. These data underscore the need for aggressive smoking cessation strategies specific to HIV patients.