Measuring the Value of a Practical Text Mining Approach to Identify Patients With Housing Issues in the Free-Text Notes in Electronic Health Record: Findings of a Retrospective Cohort Study

2021 ◽  
Vol 9 ◽  
Author(s):  
Elham Hatef ◽  
Gurmehar Singh Deol ◽  
Masoud Rouhizadeh ◽  
Ashley Li ◽  
Katyusha Eibensteiner ◽  
...  

Introduction: Despite the growing efforts to standardize coding for social determinants of health (SDOH), they are infrequently captured in electronic health records (EHRs). Most SDOH variables are still captured in the unstructured fields (i.e., free-text) of EHRs. In this study we attempt to evaluate a practical text mining approach (i.e., advanced pattern matching techniques) in identifying phrases referring to housing issues, an important SDOH domain affecting value-based healthcare providers, using the EHR of a large multispecialty medical group in the New England region, United States. To present how this approach would help health systems to address the SDOH challenges of their patients, we assess the demographic and clinical characteristics of patients with and without housing issues and briefly look into the patterns of healthcare utilization among the study population and for those with and without housing challenges. Methods: We identified five categories of housing issues [i.e., homelessness current (HC), homelessness history (HH), homelessness addressed (HA), housing instability (HI), and building quality (BQ)] and developed several phrases addressing each one through collaboration with SDOH experts, consulting the literature, and reviewing existing coding standards. We developed pattern-matching algorithms (i.e., advanced regular expressions), and then applied them in the selected EHR. We assessed the text mining approach for recall (sensitivity) and precision (positive predictive value) after comparing the identified phrases with manually annotated free-text for different housing issues. Results: The study dataset included EHR structured data for a total of 20,342 patients and 2,564,344 free-text clinical notes. The mean (SD) age in the study population was 75.96 (7.51). Additionally, 58.78% of the cohort were female. BQ and HI were the most frequent housing issues documented in EHR free-text notes and HH was the least frequent one.
The regular expression methodology, when compared to manual annotation, had a high level of precision (positive predictive value) at phrase, note, and patient levels (96.36, 95.00, and 94.44%, respectively) across different categories of housing issues, but the recall (sensitivity) rate was relatively low (30.11, 32.20, and 41.46%, respectively). Conclusion: Results of this study can be used to advance the research in this domain, to assess the potential value of EHR's free-text in identifying patients with a high risk of housing issues, to improve patient care and outcomes, and to eventually mitigate socioeconomic disparities across individuals and communities.
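The pattern-matching and evaluation steps described in this abstract can be illustrated with a minimal Python sketch. The phrase patterns below are hypothetical placeholders (the abstract does not list the study's actual phrases or regular expressions), and the function names `find_housing_phrases` and `precision_recall` are assumptions made for illustration only.

```python
import re

# Hypothetical example patterns per housing category. The study's real
# phrase lists and regular expressions are not given in the abstract.
HOUSING_PATTERNS = {
    "HC": re.compile(r"\b(?:currently|is)\s+homeless\b", re.IGNORECASE),
    "HH": re.compile(r"\b(?:history of|formerly|previously)\s+homeless(?:ness)?\b", re.IGNORECASE),
    "HI": re.compile(r"\b(?:evict(?:ed|ion)|unstable housing|housing instability)\b", re.IGNORECASE),
    "BQ": re.compile(r"\b(?:mold|no heat|pest infestation)\b", re.IGNORECASE),
}


def find_housing_phrases(note):
    """Return (category, matched phrase) pairs found in one free-text note."""
    hits = []
    for category, pattern in HOUSING_PATTERNS.items():
        for match in pattern.finditer(note):
            hits.append((category, match.group(0)))
    return hits


def precision_recall(predicted, annotated):
    """Precision and recall of predicted items against manually annotated items.

    Both arguments are sets of identifiers (phrases, notes, or patients).
    """
    true_pos = len(predicted & annotated)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(annotated) if annotated else 0.0
    return precision, recall
```

A pattern list of this kind tends to be precise but to miss phrasings nobody anticipated, which is one plausible reading of the high precision and low recall reported above.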

2020 ◽  
Vol 41 (S1) ◽  
pp. s39-s39
Author(s):  
Pontus Naucler ◽  
Suzanne D. van der Werff ◽  
John Valik ◽  
Logan Ward ◽  
Anders Ternhag ◽  
...  

Background: Healthcare-associated infection (HAI) surveillance is essential for most infection prevention programs, and continuous epidemiological data can be used to inform healthcare personnel, allocate resources, and evaluate interventions to prevent HAIs. Many HAI surveillance systems today are based on time-consuming and resource-intensive manual reviews of patient records. The objective of HAI-proactive, a Swedish triple-helix innovation project, is to develop and implement a fully automated HAI surveillance system based on electronic health record data. Furthermore, the project aims to develop machine-learning–based screening algorithms for early prediction of HAI at the individual patient level. Methods: The project is performed with support from Sweden’s Innovation Agency in collaboration among academic, health, and industry partners. Development of rule-based and machine-learning algorithms is performed within a research database, which consists of all electronic health record data from patients admitted to the Karolinska University Hospital. Natural language processing is used for processing free-text medical notes. To validate algorithm performance, manual annotation was performed based on international HAI definitions from the European Centre for Disease Prevention and Control, Centers for Disease Control and Prevention, and Sepsis-3 criteria. Currently, the project is building a platform for real-time data access to implement the algorithms within Region Stockholm. Results: The project has developed a rule-based surveillance algorithm for sepsis that continuously monitors patients admitted to the hospital, with a sensitivity of 0.89 (95% CI, 0.85–0.93), a specificity of 0.99 (0.98–0.99), a positive predictive value of 0.88 (0.83–0.93), and a negative predictive value of 0.99 (0.98–0.99).
The healthcare-associated urinary tract infection surveillance algorithm, which is based on free-text analysis and negations to define symptoms, had a sensitivity of 0.73 (0.66–0.80) and a positive predictive value of 0.68 (0.61–0.75). The sensitivity and positive predictive value of an algorithm based on significant bacterial growth in urine culture only were 0.99 (0.97–1.00) and 0.39 (0.34–0.44), respectively. The surveillance system detected differences in incidences between hospital wards and over time. Development of surveillance algorithms for pneumonia, catheter-related infections and Clostridioides difficile infections, as well as machine-learning–based models for early prediction, is ongoing. We intend to present results from all algorithms. Conclusions: With access to electronic health record data, we have shown that it is feasible to develop a fully automated HAI surveillance system based on algorithms using both structured data and free text for the main healthcare-associated infections. Funding: Sweden’s Innovation Agency and Stockholm County Council. Disclosures: None


2019 ◽  
Vol 3 (s1) ◽  
pp. 38-38
Author(s):  
Safa Kaleem ◽  
Christa B. Swisher

OBJECTIVES/SPECIFIC AIMS: 1. Determine positive predictive value, negative predictive value, sensitivity, and specificity of Neuro ICU nurse interpretation of real-time bedside qEEG. 2. Determine difference in time to detection of first seizure between Neuro ICU nurse qEEG interpretation and EEG fellow reads of cEEG. 3. Determine what qualities of seizures make detection by neuro ICU nurses more or less likely – e.g. duration of seizures, type of seizures, spatial extent of seizures. METHODS/STUDY POPULATION: Recruit neuro ICU nurses taking care of 150 patients admitted to the Neuro ICU at Duke University Hospital who are initiated on cEEG monitoring. Nurses will be consented for their participation in the study. Neuro ICU nurses will evaluate the qEE RESULTS/ANTICIPATED RESULTS: From literature estimates of a 20% seizure prevalence in critical care settings, we hope to have 30 patients with seizures and 120 without. Based on prior study in the Duke Neuro ICU, we hypothesize that Neuro ICU nurses will have sensitivity and DISCUSSION/SIGNIFICANCE OF IMPACT: This is the first prospective study of neuro ICU nurse interpretation of real-time bedside qEEG in patients with unknown NCSE/NCS presence. If nurse sensitivity, specificity, and positive predictive value are clinically useful, which we deem would be so at a sensitivity of 70% or greater, with acceptable false alarm rate, nurse readings of qEEG could significantly decrease the time to treatment of seizures in the Neuro ICU patient population, and perhaps could improve patient outcomes.


BMJ Open ◽  
2019 ◽  
Vol 9 (10) ◽  
pp. e031373 ◽  
Author(s):  
Jennifer Anne Davidson ◽  
Amitava Banerjee ◽  
Rutendo Muzambi ◽  
Liam Smeeth ◽  
Charlotte Warren-Gash

Introduction Cardiovascular diseases (CVDs) are among the leading causes of death globally. Electronic health records (EHRs) provide a rich data source for research on CVD risk factors, treatments and outcomes. Researchers must be confident in the validity of diagnoses in EHRs, particularly when diagnosis definitions and use of EHRs change over time. Our systematic review provides an up-to-date appraisal of the validity of stroke, acute coronary syndrome (ACS) and heart failure (HF) diagnoses in European primary and secondary care EHRs. Methods and analysis We will systematically review the published and grey literature to identify studies validating diagnoses of stroke, ACS and HF in European EHRs. MEDLINE, EMBASE, SCOPUS, Web of Science, Cochrane Library, OpenGrey and EThOS will be searched from the dates of inception to April 2019. A prespecified search strategy of subject headings and free-text terms in the title and abstract will be used. Two reviewers will independently screen titles and abstracts to identify eligible studies, followed by full-text review. We require studies to compare clinical codes with a suitable reference standard. Additionally, at least one validation measure (sensitivity, specificity, positive predictive value or negative predictive value) or raw data, for the calculation of a validation measure, is necessary. We will then extract data from the eligible studies using standardised tables and assess risk of bias in individual studies using the Quality Assessment of Diagnostic Accuracy Studies 2 tool. Data will be synthesised into a narrative format and heterogeneity assessed. Meta-analysis will be considered when a sufficient number of homogeneous studies are available. The overall quality of evidence will be assessed using the Grading of Recommendations, Assessment, Development and Evaluation tool. Ethics and dissemination This is a systematic review, so it does not require ethical approval.
Our results will be submitted for peer-review publication. PROSPERO registration number CRD42019123898


2021 ◽  
pp. 1753495X2110409
Author(s):  
Melanie Nana ◽  
Florence Tydeman ◽  
Georgie Bevan ◽  
Harriet Boulding ◽  
Kimberley Kavanagh ◽  
...  

Background Difficulty accessing medication and poor patient experience have been implicated as risk factors for termination of pregnancy and suicidal ideation in women with hyperemesis gravidarum. We aimed to gain further insight into these factors in order to further inform and improve patient care. Methods We performed a sub-analysis on quantitative data generated through a UK-wide survey of 5071 participants. A qualitative analysis of free text comments was performed using an inductive thematic approach. Results 41.2% of women taking prescribed medications had to actively request them. ‘Extremely poor’ or ‘poor’ experiences were described in 39.4% and 30.0% of participants in primary and secondary care respectively. Protective factors for termination of pregnancy and suicidal ideation include holistic support from family, friends and healthcare providers. Conclusion Optimal care in hyperemesis gravidarum should incorporate timely access to pharmacotherapy, assessment of mental health, consideration of referral to specialist services and care being delivered in a compassionate manner.


Rheumatology ◽  
2019 ◽  
Vol 59 (5) ◽  
pp. 1059-1065 ◽  
Author(s):  
Sizheng Steven Zhao ◽  
Chuan Hong ◽  
Tianrun Cai ◽  
Chang Xu ◽  
Jie Huang ◽  
...  

Abstract Objectives To develop classification algorithms that accurately identify axial SpA (axSpA) patients in electronic health records, and compare the performance of algorithms incorporating free-text data against approaches using only International Classification of Diseases (ICD) codes. Methods An enriched cohort of 7853 eligible patients was created from electronic health records of two large hospitals using automated searches (⩾1 ICD codes combined with simple text searches). Key disease concepts from free-text data were extracted using NLP and combined with ICD codes to develop algorithms. We created both supervised regression-based algorithms—on a training set of 127 axSpA cases and 423 non-cases—and unsupervised algorithms to identify patients with high probability of having axSpA from the enriched cohort. Their performance was compared against classifications using ICD codes only. Results NLP extracted four disease concepts of high predictive value: ankylosing spondylitis, sacroiliitis, HLA-B27 and spondylitis. The unsupervised algorithm, incorporating both the NLP concept and ICD code for AS, identified the greatest number of patients. By setting the probability threshold to attain 80% positive predictive value, it identified 1509 axSpA patients (mean age 53 years, 71% male). Sensitivity was 0.78, specificity 0.94 and area under the curve 0.93. The two supervised algorithms performed similarly but identified fewer patients. All three outperformed traditional approaches using ICD codes alone (area under the curve 0.80–0.87). Conclusion Algorithms incorporating free-text data can accurately identify axSpA patients in electronic health records. Large cohorts identified using these novel methods offer exciting opportunities for future clinical research.


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 4689-4689
Author(s):  
Sriman Swarup ◽  
Somedeb Ball ◽  
Nimesh Adhikari ◽  
Anita Sultan ◽  
Khatrina Swarup ◽  
...  

Introduction: Heparin induced thrombocytopenia (HIT) is a severe prothrombotic condition, usually triggered by exposure to heparin products. It is characterized by platelet activation induced by the formation of antibodies to the platelet factor 4 (PF4)/heparin polyanion complexes. The diagnostic algorithm includes clinical scoring (4T score) alongside a serological test for detection of these antibodies (HIT-Ab), while serotonin release assay (SRA) remains the gold standard for confirmation. The automated latex immunoturbidometric assay (LIA) has recently been FDA approved as a screening tool for HIT and is a potential alternative to the conventional particle immunofiltration assay (PIFA) for time-sensitive detection of HIT-Ab to guide treatment considerations. We recently introduced LIA in our institution. In this study, we present our experience with LIA in comparison to PIFA in the diagnosis of HIT. Methods: We retrospectively reviewed the charts of all the patients on whom a PIFA was ordered between March 2017 and March 2018 in our hospital. We collected information on the results of the PIFA and SRA (if available). We replaced PIFA with LIA for HIT screening. Then, we introduced a structured protocol for diagnosis of HIT in our institution by incorporating 4T scoring alongside the LIA order in the electronic medical record (EMR), in December 2018. We reviewed the EMR of all the patients on whom a HIT-Ab test (LIA) was ordered between January and June of 2019, and collected similar information as before. All the data were compiled in a single master Excel sheet for calculation of performance characteristics (sensitivity, specificity, positive and negative predictive values) for both PIFA and LIA. A patient was considered to have the diagnosis of HIT if the result of SRA was available and positive. Results: In the first phase, a total of 31 orders for SRA were noted against 170 PIFA orders. Five patients had a positive SRA, of whom two were PIFA negative.
Half the patients with a negative SRA result were positive for PIFA. Hence, the sensitivity and specificity of the PIFA test for our study population were noted to be 60% and 50%, respectively. PIFA had a positive predictive value (PPV) of a mere 18.75% for the diagnosis of HIT, whereas the negative predictive value (NPV) was found to be 86.66%. Introduction of a structured protocol for HIT diagnosis substantially reduced the number of inappropriate SRA orders in the second phase. On review of data for six months with the new HIT-Ab test LIA, SRA was ordered in only eight patients, against 69 orders for the LIA. The result of LIA was positive in all three patients with a positive SRA, whereas it was false positive in four instances. Only one patient was negative for both LIA and SRA during this period. LIA was found to be 100% sensitive and 20% specific for the diagnosis of HIT in our sample. PPV and NPV for LIA were 42.85% and 100%, respectively. Conclusion: The sensitivity and specificity of LIA were found to be 100% and 20%, respectively, in our study population, which is different from the earlier report (Warkentin et al. 2017). The small sample size is a limitation of our study. Higher PPV and NPV for LIA, with its quick turnaround time, make it a useful alternative for the time-sensitive determination of post-test probability for HIT in patients. [HIT-Ab: Heparin Induced Thrombocytopenia Antibody, PIFA: Particle Immunofiltration Assay, LIA: Latex Immunoturbidometric Assay, SRA: Serotonin Release Assay, +ve: Positive, -ve: Negative, PPV: Positive Predictive Value, NPV: Negative Predictive Value] Disclosures: No relevant conflicts of interest to declare.
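The performance figures in this abstract can be re-derived from the counts it reports. The sketch below assumes that every SRA order (31 in the first phase, 8 in the second) yielded a result paired with a screening result, which the abstract implies but does not state outright; the function name `diagnostic_metrics` is illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy measures, with SRA as the reference.

    Assumes every denominator is non-zero, which holds for the counts below.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# PIFA phase: 5 SRA-positive patients (2 of them PIFA-negative) and
# 26 SRA-negative patients, half of whom were PIFA-positive.
pifa = diagnostic_metrics(tp=3, fp=13, fn=2, tn=13)

# LIA phase: 3 SRA-positive patients (all LIA-positive), 4 false positives,
# and 1 patient negative on both tests.
lia = diagnostic_metrics(tp=3, fp=4, fn=0, tn=1)

for name, result in (("PIFA", pifa), ("LIA", lia)):
    print(name, {key: round(value * 100, 2) for key, value in result.items()})
```

Under these assumptions the counts reproduce the reported sensitivities, specificities and PPVs; the PIFA NPV works out to 13/15, which rounds to 86.67% rather than the truncated 86.66% quoted.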


2019 ◽  
Vol 2019 ◽  
pp. 1-5 ◽  
Author(s):  
Maduka Donatus Ughasoro ◽  
Anazoeze Jude Madu ◽  
Iheoma Clara Kela-Eke

Background. Anaemia in children has high mortality. We present the results of assessment of the accuracy of the Haemoglobin Colour Scale in identifying anaemia compared with the HemoCue assay. Methods. The presence of anaemia in 524 children from four communities was screened using the Haemoglobin Colour Scale (HCS) and HemoCue assay. Independent healthcare providers that estimated the haemoglobin level using the Hb-301 haemoglobinometer were different from those that read the colour scale. The sensitivity, specificity, positive predictive value, and negative predictive value were estimated. Results. Of the 524 children surveyed, 44.5% (233/524), 50% (262/524), and 32.2% (168/524) were found to be anaemic using the HemoCue, HCS (p=0.25), and clinical pallor (p=0.03) respectively. Using the HemoCue as standard, the sensitivity of the HCS and clinical pallor was 89.1% and 72.1%, respectively, and specificity 90.2% and 84.6%, respectively. 74.7% of the colour scale results were within 1.0 g/dl of the HemoCue reading and 23% were within 2.0 g/dl. Conclusion. The HCS can improve the ability to detect anaemia, especially where the use of the HemoCue is not feasible, as in resource-poor countries. However, every case of anaemia requires further investigation to determine the underlying causes.


2019 ◽  
Author(s):  
Daniel Leightley ◽  
David Pernet ◽  
Sumithra Velupillai ◽  
Robert J Stewart ◽  
Katharine M Mark ◽  
...  

BACKGROUND Electronic health care records (EHRs) are a rich source of health-related information, with potential for secondary research use. In the United Kingdom, there is no national marker for identifying those who have previously served in the Armed Forces, making analysis of the health and well-being of veterans using EHRs difficult. OBJECTIVE This study aimed to develop a tool to identify veterans from free-text clinical documents recorded in a psychiatric EHR database. METHODS Veterans were manually identified using the South London and Maudsley (SLaM) Biomedical Research Centre Clinical Record Interactive Search—a database holding secondary mental health care electronic records for the SLaM National Health Service Foundation Trust. An iterative approach was taken; first, a structured query language (SQL) method was developed, which was then refined using natural language processing and machine learning to create the Military Service Identification Tool (MSIT) to identify if a patient was a civilian or veteran. Performance, defined as correct classification of veterans compared with incorrect classification, was measured using positive predictive value, negative predictive value, sensitivity, F1 score, and accuracy (otherwise termed Youden Index). RESULTS A gold standard dataset of 6672 free-text clinical documents was manually annotated by human coders. Of these documents, 66.00% (4470/6672) were then used to train the SQL and MSIT approaches and 34.00% (2202/6672) were used for testing the approaches. To develop the MSIT, an iterative 2-stage approach was undertaken. In the first stage, an SQL method was developed to identify veterans using a keyword rule–based approach. This approach obtained an accuracy of 0.93 in correctly predicting civilians and veterans, a positive predictive value of 0.81, a sensitivity of 0.75, and a negative predictive value of 0.95. 
This method informed the second stage, which was the development of the MSIT using machine learning, which, when tested, obtained an accuracy of 0.97, a positive predictive value of 0.90, a sensitivity of 0.91, and a negative predictive value of 0.98. CONCLUSIONS The MSIT has the potential to be used in identifying veterans in the United Kingdom from free-text clinical documents, providing new and unique insights into the health and well-being of this population and their use of mental health care services.


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 6554-6554
Author(s):  
Robert Michael Daly ◽  
Dmitriy Gorenshteyn ◽  
Lior Gazit ◽  
Stefania Sokolowski ◽  
Kevin Nicholas ◽  
...  

6554 Background: Acute care accounts for half of cancer expenditures and is a measure of poor quality care. Identifying patients at high risk for emergency department (ED) visits enables institutions to target resources to those most likely to benefit. Risk stratification models developed to date have not been meaningfully employed in oncology, and there is a need for clinically relevant models to improve patient care. Methods: We established and applied a predictive framework for clinical use with attention to modeling technique, clinician feedback, and application metrics. The model employs electronic health record data from initial visit to first antineoplastic administration for patients at our institution from January 2014 to June 2017. The binary dependent variable is occurrence of an ED visit within the first 6 months of treatment. The final regularized multivariable logistic regression model was chosen based on clinical and statistical significance. In order to accommodate the needs of the program, parameter selection and model calibration were optimized to suit the positive predictive value of the top 25% of observations as ranked by model-determined risk. Results: There are 5,752 antineoplastic administration starts in our training set, and 1,457 in our test set. The positive predictive value of this model for the top 25% riskiest new start antineoplastic patients is 0.53. From over 1,400 data features, the model was refined to include 400 clinically relevant ones spanning demographics, pathology, clinician notes, labs, medications, and psychosocial information. At the patient level, specific features determining risk are surfaced in a web application, RiskExplorer, to enable clinician review of individual patient risk. This physician-facing application provides the individual risk score for the patient as well as their quartile of risk when compared to the population of new start antineoplastic patients.
For the top quartile of patients, the risk for an ED visit within the first 6 months of treatment is greater than or equal to 49%. Conclusions: We have constructed a framework to build a clinically relevant risk model. We are now piloting it to identify those likely to benefit from a home-based, digital symptom management intervention.
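The metric this abstract optimizes, positive predictive value among the top 25% of patients ranked by predicted risk, can be sketched as follows. This is an illustrative calculation only, not the authors' implementation; the function name `ppv_at_top_fraction` and the toy scores in the usage lines are assumptions, not study data.

```python
def ppv_at_top_fraction(risk_scores, outcomes, fraction=0.25):
    """PPV among the highest-risk `fraction` of observations.

    risk_scores: model-predicted risks, one per observation.
    outcomes: 1 if the event (e.g. an ED visit) occurred, else 0.
    """
    ranked = sorted(zip(risk_scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))  # keep at least one observation
    top = ranked[:top_n]
    return sum(outcome for _, outcome in top) / top_n


# Toy values for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
events = [1, 0, 1, 0, 0, 0, 1, 0]
top_quartile_ppv = ppv_at_top_fraction(scores, events)
```

Tuning toward this quantity instead of overall accuracy matches a program that can only enroll a fixed share of patients in an intervention.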


2018 ◽  
Vol 46 (3-4) ◽  
pp. 150-158 ◽  
Author(s):  
Bendix Labeit ◽  
Hannah Mueller ◽  
Paul Muhle ◽  
Inga Claus ◽  
Tobias Warnecke ◽  
...  

Background: For the early detection of post-stroke dysphagia (PSD), valid screening parameters are crucial as part of a step-wise diagnostic procedure. This study examines the role of the National Institute of Health Stroke Scale (NIH-SS) as a potential low-threshold screening parameter. Methods: During a ten-year period, 687 newly admitted patients at University Hospital Muenster were included in a retrospective analysis, if they had ischemic or haemorrhagic stroke confirmed by neuroimaging and had received NIH-SS scoring and endoscopic swallowing evaluation upon admission. The NIH-SS score was correlated with dysphagia severity as measured by the validated 6-point fiberoptic endoscopic dysphagia severity score (FEDSS), and the ideal cut-off score to predict PSD, defined as FEDSS > 1, was calculated. Supra- and infratentorial strokes were analysed separately due to their differing role in the pathophysiology of neurogenic dysphagia. Results: NIH-SS and dysphagia severity show a significant positive correlation in the whole study population (R2 = 0.745) as well as in both analysed subgroups (R2 = 0.494 for supra- and R2 = 0.646 for infratentorial strokes, p < 0.0005, respectively). For supratentorial strokes, the ideal NIH-SS cut-off is > 9 (sensitivity 68.3%, specificity 61.5%, positive predictive value 89.7%, negative predictive value 28.4%). For infratentorial strokes, a lower ideal cut-off > 5 was calculated (sensitivity 67.4%, specificity 85.0%, positive predictive value 95.1%, negative predictive value 37.8%). Conclusions: NIH-SS may be used as an adjunct to predict dysphagia in acute stroke patients with moderate sensitivity and specificity. Differentiation between supra- and infratentorial regions is essential not to miss dysphagia in infratentorial stroke.

