1180 The Use Of Natural Language Processing To Extract Data From PSG Sleep Study Reports Using National VHA Electronic Medical Record Data

Abstract Introduction In 2007, Congress asked the Department of Veterans Affairs to pay closer attention to the incidence of sleep disorders among veterans. We aimed to use natural language processing (NLP), a method that applies algorithms to understand the meaning and structure of sentences within Electronic Health Record (EHR) patient free-text notes, to identify the number of attended polysomnography (PSG) studies conducted in the Veterans Health Administration (VHA) and to evaluate the performance of NLP in extracting sleep data from the notes. Methods We identified 481,115 sleep studies using CPT code 95810 from 2000 to 2019 in the national VHA. We used a rule-based regular-expression method (phrases: “sleep stage” and “arousal index”) to identify attended PSG reports in the patient free-text notes in the EHR; 69,847 records met the rule-based criteria. We randomly selected 178 notes to compare the accuracy of the algorithm in mining sleep parameters, namely total sleep time (TST), sleep efficiency (SE), and sleep onset latency (SOL), against manual chart review. Results The number of documented PSG studies increased each year from 963 in 2000 to 14,209 in 2018. Performance of NLP against the manually annotated reference standard in detecting sleep parameters was 83% for TST, 87% for SE, and 81% for SOL (accuracy benchmark ≥ 80%). Conclusion This study showed that NLP is a useful technique to mine the EHR and extract data from patients’ free-text notes. NLP was not 100% accurate because note authors either used different phrasing (e.g., “recording duration”) that the NLP algorithm did not detect or extract, or omitted sleep continuity variables from the notes. Nevertheless, this automated strategy to identify and extract sleep data can serve as an effective tool in large health care systems for research and evaluation to improve sleep medicine patient care and outcomes.
Support This material is based upon work supported in part by the Department of Veterans Affairs, Veterans Health Administration, Office of Research and Development, and the Center for Innovations in Quality, Effectiveness and Safety (CIN 13-413). Dr. Nowakowski is also supported by a National Institutes of Health (NIH) Grant (R01NR018342).
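
The rule-based step described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract names only the two screening phrases, so the extraction patterns, the choice to require both phrases, and the function names `is_attended_psg`, `extract_parameters`, and `accuracy` are all assumptions made for illustration.

```python
import re

# Screening phrases named in the abstract. Requiring BOTH is an assumption.
SCREEN_PHRASES = ("sleep stage", "arousal index")

# Hypothetical extraction patterns: a parameter name, up to 20 non-digit
# characters (":", " was ", " = "), then the first number that follows.
PARAM_PATTERNS = {
    "TST": re.compile(r"total\s+sleep\s+time\D{0,20}?(\d+(?:\.\d+)?)", re.I),
    "SE": re.compile(r"sleep\s+efficiency\D{0,20}?(\d+(?:\.\d+)?)", re.I),
    "SOL": re.compile(r"sleep\s+(?:onset\s+)?latency\D{0,20}?(\d+(?:\.\d+)?)", re.I),
}


def is_attended_psg(note: str) -> bool:
    """Return True when the note contains every screening phrase."""
    lowered = note.lower()
    return all(phrase in lowered for phrase in SCREEN_PHRASES)


def extract_parameters(note: str) -> dict:
    """Return the first numeric value found for each parameter, else None."""
    found = {}
    for name, pattern in PARAM_PATTERNS.items():
        match = pattern.search(note)
        found[name] = float(match.group(1)) if match else None
    return found


def accuracy(predicted: list, reference: list) -> float:
    """Share of notes where the extracted value equals the chart-review value."""
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)
```

A note that reports “recording duration” instead of “total sleep time” yields `None` for TST under these patterns, which is the kind of miss the abstract attributes to varied phrasing.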


Abstract 125: Natural Language Processing Identifies an Association Between Canadian Cardiovascular Society Angina Severity and Mortality Within the Department of Veterans Affairs

Circulation Cardiovascular Quality and Outcomes ◽

10.1161/circoutcomes.10.suppl_3.125 ◽

2017 ◽

Vol 10 (suppl_3) ◽

Author(s):

Mina Owlia ◽

John A Dodson ◽

Scott L DuVall ◽

Joanne LaFleur ◽

Olga Patterson ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Stable Angina ◽

Class I ◽

Free Text ◽

Health Administration ◽

Canadian Cardiovascular Society ◽

All Cause Mortality ◽

One Year

Background: Stable angina is estimated to affect more than 10 million Americans and is the presenting symptom in half of patients diagnosed with coronary disease. Documentation of angina severity resides as unstructured data and is often unavailable in large datasets. We used natural language processing (NLP) to identify Canadian Cardiovascular Society (CCS) angina class and determine its association with all-cause mortality in an integrated health system’s electronic health records (EHR). Methods: We performed a historic cohort study using national Veterans Health Administration data between 1/1/06 and 12/31/13. Veterans with incident stable angina were identified by ICD-9-CM codes. We developed an NLP tool to extract CCS class from free-text notes. Risk ratios (RR) for all-cause mortality at one year associated with CCS class were calculated using Poisson regression. Results: There were 299,577 Veterans with angina, of whom 14,216 had at least one CCS class extracted via NLP. Mean age was 66.6 years, 98% were male, and 82% were white. Diabetes increased with CCS class, but other comorbidities were stable (Table). There were 719 deaths at one-year follow-up. The adjusted RR for all-cause mortality at one year comparing Class III to Class I and Class IV to Class I was 1.40 (95% CI 1.16 - 1.68) and 1.52 (95% CI 1.13 - 2.04), respectively. Conclusion: NLP-derived CCS class was independently associated with one-year all-cause mortality. Its application may be limited by inadequate EHR documentation of angina severity.
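
To make the risk-ratio measure concrete, the sketch below computes a crude (unadjusted) risk ratio with a Wald 95% confidence interval on the log scale. This is a simpler stand-in for the adjusted Poisson regression reported in the abstract, not a reproduction of it. The function name `risk_ratio` is an assumption, and any counts passed to it here are hypothetical because the abstract does not give deaths per CCS class.

```python
import math


def risk_ratio(events_exposed, n_exposed, events_ref, n_ref, z=1.96):
    """Crude risk ratio of exposed vs. reference group with a Wald CI.

    The interval is built on the log scale:
    SE(ln RR) = sqrt(1/a - 1/n1 + 1/b - 1/n0).
    """
    rr = (events_exposed / n_exposed) / (events_ref / n_ref)
    se_log = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts only: 60 deaths among 1000 patients in a higher CCS
# class versus 40 deaths among 1000 patients in Class I.
rr, lower, upper = risk_ratio(60, 1000, 40, 1000)
```

A crude ratio like this ignores age and comorbidities, which is why the study reports adjusted estimates instead.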


T-staging pulmonary oncology from radiological reports using natural language processing: translating into a multi-language setting

Insights into Imaging ◽

10.1186/s13244-021-01018-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

J. Martijn Nobel ◽

Sander Puts ◽

Jakob Weiss ◽

Hugo J. W. L. Aerts ◽

Raymond H. Mak ◽

...

Keyword(s):

Lung Cancer ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Tnm Classification ◽

Hybrid Approach ◽

Population Level ◽

Free Text ◽

Rule Based ◽

T Stage

Abstract Background In the era of datafication, it is important that medical data are accurate and structured for multiple applications. Data for oncological staging in particular need to be accurate, both to stage and treat a patient and for population-level surveillance and outcome assessment. To support data extraction from free-text radiological reports, a Dutch natural language processing (NLP) algorithm was built to quantify T-stage of pulmonary tumors according to the tumor node metastasis (TNM) classification. This structuring tool was translated and validated on English radiological free-text reports. Methods A rule-based algorithm to classify T-stage was trained and validated on, respectively, 200 and 225 English free-text radiological reports from diagnostic computed tomography (CT) obtained for staging of patients with lung cancer. The automated T-stage extracted by the algorithm from the report was compared to manual staging. A graphical user interface was built for training purposes to visualize the results of the algorithm by highlighting the extracted concepts and their modifying context. Results Accuracy of the T-stage classifier was 0.89 in the validation set, 0.84 when considering the T-substages, and 0.76 when only considering tumor size. Results were comparable with the Dutch results (respectively, 0.88, 0.89 and 0.79). Most errors were due to ambiguity issues that could not be solved by the rule-based nature of the algorithm. Conclusions NLP can be successfully applied for staging lung cancer from free-text radiological reports in different languages. Machine learning should be introduced in a focused way within a hybrid approach to improve performance.
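
As an illustration of the size-only part of such a rule-based classifier, the following Python sketch pulls tumor measurements from a report and maps the largest one to a T-stage. It assumes the size thresholds of the 8th edition TNM classification (the abstract does not state the edition), ignores every non-size criterion such as invasion or separate nodules, and uses hypothetical names (`largest_size_cm`, `t_stage_from_size`); it is not the published algorithm.

```python
import re

# Matches measurements such as "34 mm" or "2.5 cm".
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)\b", re.I)

# Inclusive upper bounds in cm, assuming TNM 8th edition size criteria.
SIZE_RULES = [
    (1.0, "T1a"),
    (2.0, "T1b"),
    (3.0, "T1c"),
    (4.0, "T2a"),
    (5.0, "T2b"),
    (7.0, "T3"),
]


def largest_size_cm(report: str):
    """Return the largest measurement in the report in cm, or None."""
    sizes = []
    for value, unit in SIZE_PATTERN.findall(report):
        size = float(value)
        sizes.append(size / 10 if unit.lower() == "mm" else size)
    return max(sizes) if sizes else None


def t_stage_from_size(size_cm: float) -> str:
    """Map a tumor diameter in cm to a size-based T-stage."""
    for upper, stage in SIZE_RULES:
        if size_cm <= upper:
            return stage
    return "T4"  # larger than 7 cm
```

Taking the largest number in the text will misfire when the report also measures a lymph node or a second lesion, which is one example of the ambiguity that pure rules struggle with.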


Natural language processing to measure the frequency and mode of communication between healthcare professionals and family members of critically ill patients

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa263 ◽

2020 ◽

Author(s):

Filipe R Lucini ◽

Karla D Krewulak ◽

Kirsten M Fiest ◽

Sean M Bagshaw ◽

Danny J Zuege ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Critically Ill Patients ◽

Healthcare Professionals ◽

Free Text ◽

Learning Approach ◽

Rule Based ◽

Machine Learning Approach

Abstract Objective To apply natural language processing (NLP) techniques to identify individual events and modes of communication between healthcare professionals and families of critically ill patients from electronic medical records (EMR). Materials and Methods Retrospective cohort study of 280 randomly selected adult patients admitted to 1 of 15 intensive care units (ICU) in Alberta, Canada from June 19, 2012 to June 11, 2018. Individual events and modes of communication were independently abstracted using NLP and manual chart review (reference standard). Preprocessing techniques and 2 NLP approaches (rule-based and machine learning) were evaluated using sensitivity, specificity, and area under the receiver operating characteristic curves (AUROC). Results Over 2700 combinations of NLP methods and hyperparameters were evaluated for each mode of communication using a holdout subset. The rule-based approach had the highest AUROC in 65 datasets compared to the machine learning approach in 21 datasets. Both approaches had similar performance in 17 datasets. The rule-based AUROC for the grouped categories of patient documented to have family or friends (0.972, 95% CI 0.934–1.000), visit by family/friend (0.882, 95% CI 0.820–0.943) and phone call with family/friend (0.975, 95% CI 0.952–0.998) were high. Discussion We report an automated method to quantify communication between healthcare professionals and family members of adult patients from free-text EMRs. A rule-based NLP approach had better overall operating characteristics than a machine learning approach. Conclusion NLP can automatically and accurately measure frequency and mode of documented family visitation and communication from unstructured free-text EMRs, to support patient- and family-centered care initiatives.
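
The AUROC used to compare the two approaches can be computed directly from its pairwise definition: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counted as one half. The sketch below is a generic implementation with a hypothetical function name (`auroc`) and toy inputs; it does not reproduce the study's pipeline or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 reference-standard values.
    scores: iterable of classifier outputs (binary or continuous).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")

    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

For a rule-based system that outputs only 0 or 1, this value equals the mean of sensitivity and specificity, so a rule with sensitivity 0.5 and specificity 1.0 scores 0.75.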


Rule-based Natural Language Processing for Automation of Stroke Data Extraction: A Validation Study (Preprint)

10.2196/preprints.35621 ◽

2021 ◽

Author(s):

Dane Gunter ◽

Paulo Puac-Polanco ◽

Olivier Miguel ◽

Rebecca E. Thornhill ◽

Amy Y. X. Yu ◽

...

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Predictive Value ◽

Data Extraction ◽

Posterior Circulation ◽

Free Text ◽

Anterior Circulation ◽

Rule Based ◽

Related Data ◽

Radiology Reports

BACKGROUND Data extraction from radiology free-text reports is time-consuming when performed manually. Recently, more automated extraction methods using natural language processing (NLP) have been proposed. A previously developed rule-based NLP algorithm showed promise in its ability to extract stroke-related data from radiology reports. OBJECTIVE We aimed to externally validate the accuracy of CHARTextract, a rule-based NLP algorithm, to extract stroke-related data from free-text radiology reports. METHODS Free-text reports of CT angiography (CTA) and perfusion (CTP) studies of consecutive patients with acute ischemic stroke admitted to a regional stroke center for endovascular thrombectomy from January 2015 to 2021 were analyzed. Stroke-related variables were manually extracted (reference standard) from the reports, including proximal and distal anterior circulation occlusion, posterior circulation occlusion, presence of ischemia, hemorrhage, Alberta Stroke Program Early CT Score (ASPECTS), and collateral status. These variables were simultaneously extracted using a rule-based NLP algorithm. The NLP algorithm's accuracy, specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV) were assessed. RESULTS The NLP algorithm's accuracy was >90% for identifying distal anterior occlusion, posterior circulation occlusion, hemorrhage, and ASPECTS. Accuracy was 85%, 74%, and 79% for proximal anterior circulation occlusion, presence of ischemia, and collateral status, respectively. The algorithm had an accuracy of 87-100% for the detection of variables not reported in radiology reports. CONCLUSIONS Rule-based NLP has moderate to good performance for stroke-related data extraction from free-text imaging reports. The algorithm's accuracy was affected by inconsistent report styles and lexicon among reporting radiologists.
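
The five validation measures named in the methods all derive from one 2×2 table of NLP output against the manual reference standard. The sketch below shows that derivation for a single binary variable; the names `Confusion`, `tally`, and `metrics` are assumptions for illustration and have no connection to CHARTextract's own code.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # NLP positive, reference positive
    fp: int = 0  # NLP positive, reference negative
    tn: int = 0  # NLP negative, reference negative
    fn: int = 0  # NLP negative, reference positive


def tally(predicted, reference) -> Confusion:
    """Count agreement cells for paired binary NLP and manual labels."""
    c = Confusion()
    for p, r in zip(predicted, reference):
        if p and r:
            c.tp += 1
        elif p and not r:
            c.fp += 1
        elif not p and r:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def metrics(c: Confusion) -> dict:
    """Accuracy, sensitivity, specificity, PPV, and NPV from the counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }
```

Reporting all five matters because accuracy alone can look high for a rare finding even when sensitivity is poor.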


Measuring Adoption of Patient Priorities-Aligned Care Using Natural Language Processing of Electronic Health Records (Preprint)

10.2196/preprints.18756 ◽

2020 ◽

Author(s):

Javad Razjouyan ◽

Jennifer Freytag ◽

Lilian Dindo ◽

Lea Kiefer ◽

Edward Odom ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

P Value ◽

Multiple Chronic Conditions ◽

Free Text ◽

Learning Approaches ◽

Health Administration ◽

Statistical Machine Learning ◽

Electronic Health

BACKGROUND Patient Priorities Care (PPC) is a model of care that aligns health care recommendations with priorities of older adults with multiple chronic conditions. Following identification of patient priorities, this information is documented in the patient’s electronic health record (EHR). OBJECTIVE Our goal is to develop and validate a natural language processing (NLP) model that reliably documents when clinicians identify patient priorities (i.e., values, outcome goals and care preferences) within the EHR as a measure of PPC adoption. METHODS Design: Retrospective analysis of unstructured EHR free-text notes using an NLP model. Setting: National Veterans Health Administration (VHA) EHR. Participants: 778 patient notes of 658 patients from encounters with 144 social workers in the primary care setting. Measurements: Each patient’s free-text clinical note was reviewed by two independent reviewers for the presence of PPC language such as priorities, values, and goals. We developed an NLP model that utilized statistical machine learning approaches. The performance of the NLP model in training and validation with 10-fold cross-validation is reported via accuracy, recall, and precision in comparison to the chart review. RESULTS Out of 778 notes, 589 (76%) were identified as containing PPC language (Kappa = 0.82, p-value < 0.001). The NLP model in the training stage had an accuracy of 0.98 (0.98, 0.99), a recall of 0.98 (0.98, 0.99), and a precision of 0.98 (0.97, 1.00). The NLP model in the validation stage had an accuracy of 0.92 (0.90, 0.94), a recall of 0.84 (0.79, 0.89), and a precision of 0.84 (0.77, 0.91). In contrast, an approach using simple search terms for PPC had a precision of only 0.757. CONCLUSIONS An automated NLP model can reliably measure, with high precision, recall, and accuracy, when clinicians document patient priorities as a key step in the adoption of PPC.
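
The inter-reviewer agreement reported as Kappa can be illustrated with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The abstract does not say which kappa variant was used, so treating it as Cohen's kappa for two reviewers is an assumption, as is the function name `cohens_kappa`.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b) -> float:
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")

    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

With two reviewers who agree on three of four notes and chance agreement of 0.5, the function returns 0.5, well below the 0.82 reported in the study.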


An Intervention With Meaning

Crisis ◽

10.1027/0227-5910/a000433 ◽

2017 ◽

Vol 38 (6) ◽

pp. 376-383 ◽

Cited By ~ 2

Author(s):

Brooke A. Levandowski ◽

Constance M. Cass ◽

Stephanie N. Miller ◽

Janet E. Kemp ◽

Kenneth R. Conner

Keyword(s):

High Risk ◽

Qualitative Methodology ◽

Veterans Health Administration ◽

Veterans Health ◽

Safety Planning ◽

Health Administration ◽

Suicide Method ◽

Treatment Providers ◽

Interactive Nature ◽

Behavioral Health Treatment

Abstract. Background: The Veterans Health Administration (VHA) health-care system utilizes a multilevel suicide prevention intervention that features the use of standardized safety plans with veterans considered to be at high risk for suicide. Aims: Little is known about clinician perceptions on the value of safety planning with veterans at high risk for suicide. Method: Audio-recorded interviews with 29 VHA behavioral health treatment providers in a southeastern city were transcribed and analyzed using qualitative methodology. Results: Clinical providers consider safety planning feasible, acceptable, and valuable to veterans at high risk for suicide owing to the collaborative and interactive nature of the intervention. Providers identified the types of veterans who easily engaged in safety planning and those who may experience more difficulty with the process. Conclusion: Additional research with VHA providers in other locations and with veteran consumers is needed.
