Early prediction of diagnostic-related groups and estimation of hospital cost by processing clinical notes

2021
Vol 4 (1)
Author(s):  
Jinghui Liu ◽  
Daniel Capurro ◽  
Anthony Nguyen ◽  
Karin Verspoor

Abstract As healthcare providers receive fixed amounts of reimbursement for given services under DRG (Diagnosis-Related Groups) payment, DRG codes are valuable for cost monitoring and resource allocation. However, coding is typically performed retrospectively post-discharge. We seek to predict DRGs and the DRG-based case mix index (CMI) at early inpatient admission using routine clinical text to estimate hospital cost in an acute setting. We examined a deep learning-based natural language processing (NLP) model to automatically predict per-episode DRGs and corresponding cost-reflecting weights on two cohorts (paid under Medicare Severity (MS) DRG or All Patient Refined (APR) DRG), without human coding efforts. It achieved macro-averaged area under the receiver operating characteristic curve (AUC) scores of 0·871 (SD 0·011) on MS-DRG and 0·884 (0·003) on APR-DRG in fivefold cross-validation experiments on the first day of ICU admission. When extended to simulated patient populations to estimate average cost-reflecting weights, the model increased its accuracy over time and obtained absolute CMI errors of 2·40 (1·07%) and 12·79% (2·31%), respectively, on the first day. Because the model can adapt to variations in admission time and cohort size and requires no extra manual coding effort, it shows potential to help estimate costs for active patients and support better operational decision-making in hospitals.
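The DRG-based case mix index (CMI) referred to above is, at its core, the average cost-reflecting DRG weight across a patient population. A minimal sketch with invented weights (not the paper's model or data):

```python
def case_mix_index(drg_weights):
    """Case mix index (CMI): mean cost-reflecting DRG weight across
    a patient population; a higher CMI means a costlier case mix."""
    if not drg_weights:
        raise ValueError("no episodes")
    return sum(drg_weights) / len(drg_weights)

def absolute_cmi_error(predicted, actual):
    """Absolute difference between a predicted and an actual CMI,
    the population-level error metric the abstract reports."""
    return abs(case_mix_index(predicted) - case_mix_index(actual))

# Toy example with invented weights
print(case_mix_index([0.8, 1.2, 2.5]))  # 1.5
```

Predicting per-episode weights and averaging them is what lets the estimate improve as more episodes accumulate over an admission.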

2021
Vol 21 (1)
Author(s):  
Jinying Chen ◽  
Catarina I. Kiefe ◽  
Marc Gagnier ◽  
Darleen Lessard ◽  
David McManus ◽  
...  

Abstract Background Patients with acute coronary syndromes often experience non-specific (generic) pain after hospital discharge. However, evidence about the association between post-discharge non-specific pain and rehospitalization remains limited. Methods We analyzed data from the Transitions, Risks, and Actions in Coronary Events Center for Outcomes Research and Education (TRACE-CORE) prospective cohort. TRACE-CORE followed patients with acute coronary syndromes for 24 months post-discharge from the index hospitalization and collected patient-reported generic pain (using the SF-36), chest pain (using the Seattle Angina Questionnaire) and rehospitalization events. We assessed the association between generic pain and 30-day rehospitalization using multivariable logistic regression (N = 787). We also examined the associations among patient-reported pain, pain documentation identified by natural language processing (NLP) from electronic health record (EHR) notes, and the outcome. Results Patients were on average 62 years old (SD = 11.4); 5.1% were Black or Hispanic individuals and 29.9% were women. Within 30 days post-discharge, 87 (11.1%) patients were re-hospitalized. Patient-reported mild-to-moderate pain without EHR documentation was associated with 30-day rehospitalization (odds ratio [OR]: 2.03, 95% confidence interval [CI]: 1.14–3.62, reference: no pain) after adjusting for baseline characteristics, while patient-reported mild-to-moderate pain with EHR documentation (presumably addressed) was not (OR: 1.23, 95% CI: 0.52–2.90). Severe pain was also associated with 30-day rehospitalization (OR: 3.16, 95% CI: 1.32–7.54), even after further adjusting for chest pain (OR: 2.59, 95% CI: 1.06–6.35). Conclusions Patient-reported post-discharge generic pain was positively associated with 30-day rehospitalization.
Future studies should further disentangle the impact of cardiac and non-cardiac pain on rehospitalization and develop strategies to support the timely management of post-discharge pain by healthcare providers.
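The adjusted odds ratios above come from multivariable logistic regression; the unadjusted counterpart can be read directly off a 2×2 table. A hedged sketch with invented counts (not TRACE-CORE data), using Woolf's log-odds approximation for the confidence interval:

```python
import math

def odds_ratio(events_exposed, total_exposed, events_ref, total_ref):
    """Unadjusted odds ratio of an outcome (e.g. 30-day rehospitalization)
    in an exposed group (e.g. patient-reported pain) vs. a reference group."""
    a, b = events_exposed, total_exposed - events_exposed
    c, d = events_ref, total_ref - events_ref
    return (a / b) / (c / d)

def or_confint(events_exposed, total_exposed, events_ref, total_ref, z=1.96):
    """Approximate 95% CI via the standard error of the log odds ratio."""
    a, b = events_exposed, total_exposed - events_exposed
    c, d = events_ref, total_ref - events_ref
    log_or = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Invented counts: 10/30 events in the exposed group, 5/30 in the reference
print(round(odds_ratio(10, 30, 5, 30), 2))  # 2.5
```

Multivariable regression replaces this 2×2 computation when baseline characteristics must be adjusted for, as in the study above.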


2021
Vol 8 (1)
Author(s):  
Lisa Grossman Liu ◽  
Raymond H. Grossman ◽  
Elliot G. Mitchell ◽  
Chunhua Weng ◽  
Karthik Natarajan ◽  
...  

Abstract The recognition, disambiguation, and expansion of medical abbreviations and acronyms is of utmost importance to prevent medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness, or coverage, of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. The multiple sources and high coverage support application in varied specialties and settings. This allows for cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.
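A toy sketch of the recognition-and-expansion lookup an inventory like this supports; the two-entry dictionary and function below are illustrative assumptions, not the Meta-Inventory's actual schema or API:

```python
# Toy two-entry inventory; the real Meta-Inventory maps 104,057
# abbreviations to 170,426 senses across specialties and settings.
INVENTORY = {
    "MI": ["myocardial infarction", "mitral insufficiency"],
    "RA": ["rheumatoid arthritis", "right atrium"],
}

def candidate_senses(token):
    """Recognition step: list every known sense of an abbreviation.
    Disambiguation (choosing one sense from context) happens downstream."""
    return INVENTORY.get(token.upper(), [])
```

Sense coverage, as reported in the abstract, measures how often the correct expansion for an abbreviation found in new clinical text appears among these candidates.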


2021
Vol 21 (1)
Author(s):  
Pilar López-Úbeda ◽  
Alexandra Pomares-Quimbaya ◽  
Manuel Carlos Díaz-Galiano ◽  
Stefan Schulz

Abstract Background Controlled vocabularies are fundamental resources for information extraction from clinical texts using natural language processing (NLP). Standard language resources available in the healthcare domain, such as the UMLS Metathesaurus or SNOMED CT, are widely used for this purpose, but they have limitations such as the lexical ambiguity of clinical terms. However, most terms are unambiguous within text limited to a given clinical specialty. This is one rationale, among others, for classifying clinical texts by the clinical specialty to which they belong. Results This paper addresses this limitation by proposing and applying a method that automatically extracts Spanish medical terms classified and weighted per sub-domain, using Spanish MEDLINE titles and abstracts as input. The hypothesis is that biomedical NLP tasks benefit from collections of domain terms that are specific to clinical sub-domains. We use PubMed queries that generate sub-domain-specific corpora from Spanish titles and abstracts, from which token n-grams are collected and metrics of relevance, discriminatory power, and broadness per sub-domain are computed. The generated term set, called the Spanish core vocabulary about clinical specialties (SCOVACLIS), was made available to the scientific community and used in a text classification problem, obtaining improvements of 6 percentage points in F-measure over a Multilayer Perceptron baseline, thus supporting the hypothesis that a specialized term set improves NLP tasks. Conclusion The creation and validation of SCOVACLIS support the hypothesis that specific term sets reduce the level of ambiguity when compared to a specialty-independent and broad-scope vocabulary.
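The per-sub-domain weighting of terms can be illustrated with an idf-like discriminatory factor: a term scores highly in a specialty when it is frequent there but present in few other specialties. The mini-corpora and formula below are invented for illustration and are not the SCOVACLIS metrics themselves:

```python
import math
from collections import Counter

# Toy sub-domain corpora (invented Spanish tokens); SCOVACLIS was built
# from real MEDLINE titles and abstracts per specialty.
CORPORA = {
    "cardiology": "infarto agudo de miocardio dolor toracico".split(),
    "neurology": "cefalea migrana dolor neuropatico".split(),
}

def term_weights(corpora):
    """Weight each term by its frequency in a sub-domain, scaled by an
    idf-like factor that rewards terms found in few sub-domains."""
    doc_freq = Counter()
    for terms in corpora.values():
        for t in set(terms):
            doc_freq[t] += 1
    n = len(corpora)
    return {
        domain: {t: tf * math.log((n + 1) / doc_freq[t])
                 for t, tf in Counter(terms).items()}
        for domain, terms in corpora.items()
    }
```

In this toy setup "infarto" (cardiology-only) outweighs "dolor" (shared by both specialties), which is the discriminatory-power behaviour the paper's metrics aim for.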


2020
Author(s):  
Thomas Gross ◽  
Felix Amsler

Abstract Background The aim was to determine to what extent the care of potentially severely injured patients at a Swiss trauma centre covers its costs, and how hospital profits or losses correlate with patient-related accident, treatment, or outcome data. Methods Analysis of all patients of a Swiss major trauma centre admitted as inpatient emergencies in 2018 via the resuscitation room (RR) or with an injury severity of New Injury Severity Score (NISS) ≥ 8 (univariate and multivariate analysis; p < 0.05). Results According to the hospital's cost-unit accounting, the study collective (n = 513; mean NISS = 18) produced a deficit of CHF 1.8 million. At an overall cost-coverage rate of 86%, 66% of all cases were loss-making (71% of patients with basic insurance vs. 42% with supplementary insurance; p < 0.001). The mean deficit was CHF 3,493 per patient (basic insurance: loss of CHF 4,545; supplementary insurance: profit of CHF 1,318; p < 0.001). Even "inliers" and "underliers" were loss-making in 63% of cases. RR cases generated losses more often than non-RR cases (73% vs. 58%; p = 0.002), as did trauma surgery vs. neurosurgery cases (72% vs. 55%; p < 0.001). In the multivariate analysis, the variables examined explained 43% of the variance in revenues received. By contrast, only 11% (adjusted R²) of the cost-coverage rate could be described by the variables RR, surgical specialty, intensive care stay, severity of thoracic injury, and in-hospital mortality. The case mix index according to current Diagnosis Related Groups (DRG) and the insurance class added a further 13%, for a total of 24% of explained variance. Discussion Emergency care of potentially severely injured patients at a Swiss trauma centre proves at least cost-covering in only one third of cases, above all for patients with supplementary insurance, a high case mix index, or an ICU-based or combined polytrauma and traumatic brain injury DRG billing option.


Heart
2021
pp. heartjnl-2021-319769
Author(s):  
Meghan Reading Turchioe ◽  
Alexander Volodarskiy ◽  
Jyotishman Pathak ◽  
Drew N Wright ◽  
James Enlou Tcheng ◽  
...  

Natural language processing (NLP) is a set of automated methods for organising and evaluating the information contained in unstructured clinical notes, which are a rich source of real-world data from clinical care that may be used to improve outcomes and the understanding of disease in cardiology. The purpose of this systematic review is to provide an understanding of NLP, review how it has been used to date within cardiology, and illustrate the opportunities this approach provides for both research and clinical care. We systematically searched six scholarly databases (ACM Digital Library, arXiv, Embase, IEEE Xplore, PubMed and Scopus) for studies published in 2015–2020 describing the development or application of NLP methods for clinical text focused on cardiac disease. Studies were excluded if they were not published in English, lacked a description of NLP methods, were not focused on cardiac disease, or were duplicates. Two independent reviewers extracted general study information, clinical details and NLP details, and appraised quality using a checklist of quality indicators for NLP studies. We identified 37 studies developing and applying NLP in heart failure, imaging, coronary artery disease, electrophysiology, general cardiology and valvular heart disease. Most studies used rule-based NLP methods to identify patients with a specific diagnosis and extract disease severity. Some used NLP algorithms to predict clinical outcomes. A major limitation is the inability to aggregate findings across studies due to vastly different NLP methods, evaluation and reporting. This review reveals numerous opportunities for future NLP work in cardiology with more diverse patient samples, cardiac diseases, datasets, methods and applications.


2013
Vol 07 (04)
pp. 377-405
Author(s):  
TRAVIS GOODWIN ◽  
SANDA M. HARABAGIU

The introduction of electronic medical records (EMRs) enabled access to unprecedented volumes of clinical data, in both structured and unstructured formats. A significant amount of this clinical data is expressed within the narrative portion of the EMRs, requiring natural language processing techniques to unlock the medical knowledge referred to by physicians. This knowledge, derived from the practice of medical care, complements medical knowledge already encoded in various structured biomedical ontologies. Moreover, the clinical knowledge derived from EMRs also exhibits relational information between medical concepts, derived from the cohesion property of clinical text, an attractive attribute that is currently missing from the vast biomedical knowledge bases. In this paper, we describe an automatic method for generating a graph of clinically related medical concepts by considering the belief values associated with those concepts. The belief value is an expression of the clinician's assertion that the concept is qualified as present, absent, suggested, hypothetical, ongoing, etc. Because the method detailed in this paper takes into account the hedging used by physicians when authoring EMRs, the resulting graph encodes qualified medical knowledge in which each medical concept has an associated assertion (or belief value), and such qualified medical concepts are spanned by relations of different strengths derived from the clinical contexts in which the concepts are used. In this paper, we discuss the construction of a qualified medical knowledge graph (QMKG) and treat it as a big-data problem, addressed by using MapReduce to derive the weighted edges of the graph. To assess the value of the QMKG, we demonstrate its use for retrieving patient cohorts by enabling query expansion that produces greatly enhanced results compared with state-of-the-art methods.
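The edge-weighting idea behind the QMKG can be sketched as counting co-occurrences of qualified concepts, that is, (concept, assertion) pairs, within clinical contexts. A `Counter` stands in for the paper's MapReduce job; the contexts and concept names below are invented:

```python
from collections import Counter
from itertools import combinations

# Toy clinical contexts of qualified concepts: (concept, assertion) pairs.
# Assertions (present, absent, hypothetical, ...) reflect the hedging
# physicians use when authoring EMRs.
CONTEXTS = [
    [("pneumonia", "present"), ("cough", "present")],
    [("pneumonia", "present"), ("cough", "present"), ("fever", "absent")],
    [("pneumonia", "hypothetical"), ("fever", "present")],
]

def edge_weights(contexts):
    """Weight an edge between two qualified concepts by the number of
    contexts in which they co-occur; a MapReduce reduce step would sum
    the same per-pair counts at scale."""
    edges = Counter()
    for concepts in contexts:
        for pair in combinations(sorted(concepts), 2):
            edges[pair] += 1
    return edges
```

Note that ("pneumonia", "present") and ("pneumonia", "hypothetical") are distinct nodes, which is exactly what makes the resulting knowledge "qualified".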


2017
Vol 13 (4)
Author(s):  
J. Manimaran ◽  
T. Velmurugan

Abstract Background: The Clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing (NLP) system. In recent cTAKES modules, negation detection (ND) algorithms are used to improve annotation capabilities and simplify the automatic identification of negative context in large clinical documents. In this research, the two types of ND algorithms analyzed are lexicon-based and syntax-based, evaluated on a database made openly available by the National Center for Biomedical Computing. The aim of this analysis is to identify the pros and cons of these algorithms. Methods: Patient medical reports were collected from three institutions included in the 2010 i2b2/VA Clinical NLP Challenge, which serves as the input data for this analysis. This database includes patient discharge summaries and progress notes. The patient data is fed into five ND algorithms: NegEx, ConText, pyConTextNLP, DEEPEN and Negation Resolution (NR). NegEx, ConText and pyConTextNLP are lexicon-based, whereas DEEPEN and NR are syntax-based. The results from these five ND algorithms are post-processed and compared with the annotated data. Finally, the performance of the ND algorithms is evaluated by computing standard measures including F-measure, kappa statistics and ROC, among others, as well as the execution time of each algorithm. Results: Each algorithm is tested through practical implementation, and its performance is evaluated on the accuracy of its results and its computational time, in order to find a robust and reliable ND algorithm. Conclusions: The performance of the chosen ND algorithms is analyzed based on the results produced by this research approach. The time and accuracy of each algorithm are calculated and compared to suggest the best method.
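For orientation, a minimal NegEx-style rule (a lexicon of pre-negation triggers scoped over a fixed token window) can be sketched as follows; this is far simpler than any of the five algorithms compared in the paper:

```python
# Minimal NegEx-style sketch: pre-negation triggers and a fixed token
# window. The real algorithms also handle post-triggers, pseudo-negation
# and scope-terminating terms.
NEG_TRIGGERS = ["no", "denies", "without", "ruled out"]

def is_negated(sentence, concept, window=5):
    """True if a negation trigger appears within `window` tokens
    before the first mention of `concept` in the sentence."""
    tokens = sentence.lower().split()
    ctoks = concept.lower().split()
    for i in range(len(tokens) - len(ctoks) + 1):
        if tokens[i:i + len(ctoks)] == ctoks:
            scope = " ".join(tokens[max(0, i - window):i])
            return any(f" {t} " in f" {scope} " for t in NEG_TRIGGERS)
    return False
```

Lexicon-based detectors like NegEx, ConText and pyConTextNLP elaborate on this window-and-trigger pattern, whereas DEEPEN and NR instead traverse the sentence's syntactic parse.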


2021
Author(s):  
Jiaming Zeng ◽  
Michael F. Gensheimer ◽  
Daniel L. Rubin ◽  
Susan Athey ◽  
Ross D. Shachter

Abstract In medicine, randomized clinical trials (RCTs) are the gold standard for informing treatment decisions. Observational comparative effectiveness research (CER) is often plagued by selection bias, and expert-selected covariates may not be sufficient to adjust for confounding. We explore how the unstructured clinical text in electronic medical records (EMRs) can be used to reduce selection bias and improve medical practice. We develop a method based on natural language processing to uncover interpretable potential confounders from clinical text. We validate our method by comparing the hazard ratio (HR) from survival analysis with and without the confounders against the results from established RCTs. We apply our method to four study cohorts built from localized prostate and lung cancer datasets from the Stanford Cancer Institute Research Database and show that our method adjusts the HR estimate towards the RCT results. We further confirm that the uncovered terms can be interpreted by an oncologist as potential confounders. This research helps enable more credible causal inference using data from EMRs, offers a transparent way to improve the design of observational CER, and could inform high-stakes medical decisions. Our method can also be applied to studies within and beyond medicine to extract important information from observational data to support decisions.


2011
pp. 2085-2095
Author(s):  
John P. Pestian ◽  
Lukasz Itert ◽  
Charlotte Andersen

Approximately 57 different types of clinical annotations make up a patient's medical record. These annotations include radiology reports, discharge summaries, and surgical and nursing notes. Hospitals typically produce millions of text-based medical records over the course of a year. These records are essential for the delivery of care, but many are underutilized or not utilized at all for clinical research. The textual data found in these annotations is a rich source of insights into aspects of clinical care and the clinical delivery system. Recent regulatory actions, however, require that, in many cases, data not obtained through informed consent or not related to the delivery of care must be made anonymous (referred to by regulators as "harmless") before they can be used. This article describes a practical approach by which Cincinnati Children's Hospital Medical Center (CCHMC), a large pediatric academic medical center with more than 761,000 annual patient encounters, developed open-source software for making pediatric clinical text harmless without losing its rich meaning. Development of the software dealt with many of the issues that often arise in natural language processing, such as data collection, disambiguation, and data scrubbing.
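The data-scrubbing step can be illustrated with a rule-based sketch; the patterns and placeholder tags below are invented examples, not CCHMC's actual software:

```python
import re

# Hedged sketch of rule-based scrubbing: replace common PHI-like
# patterns (dates, phone numbers, medical record numbers) with tags.
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bMRN[:\s]*\d+\b"), "[MRN]"),
]

def scrub(text):
    """Apply each pattern in turn. Production de-identification adds
    dictionaries, NLP-based name detection and human review."""
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text
```

Keeping the tags (rather than deleting the matches outright) preserves much of the note's clinical meaning for downstream research, which is the balance the article describes.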

