758. High-Throughput Mining of Electronic Medical Records Using Generalizable Autonomous Scripts

2019 ◽  
Vol 6 (Supplement_2) ◽  
pp. S338-S338
Author(s):  
Ryan H Rochat ◽  
Gail J Demmler-Harrison

Abstract Background The electronic medical record (EMR) has become a modern compendium of health information, from broad clinical assessments down to an individual’s heart rate. The wealth of information in these EMRs holds promise for clinical discovery and hypothesis generation. Unfortunately, as these systems have become more robust, mining them for relevant clinical information has been hindered by the overall data architecture and often requires the expertise of a clinical informatician. However, because the information presented to the clinician through the digital workspace is derived from the core EMR database, its format is well structured and can be mined using text recognition and parsing scripts. Methods Here we present a program that parses output from Epic Hyperspace® and generates a relational database of clinical information. To facilitate ease of use, our protocol capitalizes on the familiarity of Microsoft Excel® as an intermediary for storing the raw output from the EMR, with data parsing and processing scripts written in SAS V9.4 (Cary, North Carolina). Results As a proof of concept, we extracted the diagnosis codes and standard laboratory results for 190 patients seen in our Congenital Cytomegalovirus Clinic at Texas Children’s Hospital in Houston, Texas. Manual extraction of these data into Microsoft Excel® took 1 hour, and the parsing scripts took less than 5 seconds to run. Data from these patients included 3800 ICD-10 codes (along with their metadata) and 33,000 individual laboratory values. In total, more than 850,000 characters were extracted from the EMR using this technique. Manual review of 10 randomly selected charts found the data in perfect concordance with the EMR, a direct reflection of the fidelity of the parsing scripts. On average, an experienced user entering data by hand recorded three ICD-10 codes and six individual laboratory values per minute.
Even at these rates, the same process would have taken at least 110 hours using a conventional chart review technique. Conclusion High-throughput data mining tools have the potential to improve the feasibility of studies that depend on information stored in the EMR. When coupled with specific content knowledge, this approach can consolidate months of data collection into a day’s task. Disclosures All authors: No reported disclosures
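The 110-hour figure follows from the reported entry rates. A quick check, assuming the manual rates apply uniformly to all 3800 codes and 33,000 laboratory values (a simplification; the abstract does not show this calculation):

```python
# Counts and manual entry rates as reported in the abstract above.
icd_codes = 3800
lab_values = 33_000
icd_per_min = 3   # ICD-10 codes entered per minute by hand
labs_per_min = 6  # laboratory values entered per minute by hand

# Total manual entry time, assuming the rates hold for the whole dataset.
minutes = icd_codes / icd_per_min + lab_values / labs_per_min
hours = minutes / 60
print(f"{minutes:.0f} min = {hours:.1f} h")  # prints "6767 min = 112.8 h"
```

The result, roughly 113 hours, is consistent with the "at least 110 hours" stated in the abstract.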

Author(s):  
Ibrahim Sahin ◽  
Canan Ersoy ◽  
Ilker Ercan ◽  
Melahat Dirican

Objective: Our aim was to analyse, using big data, cases of primary hypothyroidism in patients aged 18 and over who presented to our hospital, by evaluating their laboratory and socio-demographic data. Cluster analysis was performed on the large dataset to explore its structure. Methods: Based on ICD-10 diagnoses of hypothyroidism recorded in our hospital between 2005 and 2018, 130,159 patients aged 18 and over with E03 and E06 diagnosis codes were included in the study. Because levothyroxine-containing drugs used to treat primary hypothyroidism affect measured hormone levels, we analysed the TSH, fT3 and fT4 values obtained at first diagnosis, before any treatment had been given, according to demographic characteristics. Patients with one or more missing laboratory values were excluded, and data from 2680 patients with complete data and TSH values above 4.94 mU/L were retained. The data were separated into two sets and analysed with k-means clustering, using age, TSH, fT3 and fT4 as variables. Cliff’s delta effect sizes and their confidence intervals were calculated to quantify the size of the differences. Results: Primary hypothyroidism was more prevalent in females, and in both sexes its frequency peaked in the 4th to 5th decades. In males, clusters with lower ages had higher fT3 and fT4 values and lower TSH values. In females, clusters with lower ages had higher TSH values and lower fT4 values. Conclusion: This is the first big data study of primary hypothyroidism in our country. Despite the difficulties of implementation, such studies are an important means of generating data in our country.
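The two statistical tools named in the Methods can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each patient is a row of (age, TSH, fT3, fT4), standardizes each variable to z-scores before clustering (a common choice that the abstract does not state), seeds k-means deterministically with the first k rows (a hypothetical choice), and estimates Cliff's delta as (number of pairs with x > y minus number with x < y) / (n_x · n_y). Confidence intervals for delta are omitted.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to mean 0, SD 1 (assumed preprocessing step)."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; centroids are seeded with the first k points (hypothetical choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids


def cliffs_delta(xs, ys):
    """Cliff's delta: P(X > Y) - P(X < Y), estimated over all pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))
```

Usage: `labels, _ = kmeans(zscore_columns(rows), k=2)` clusters standardized patient rows; `cliffs_delta(tsh_group_a, tsh_group_b)` then gives an effect size between -1 and 1 for a variable compared across two groups.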


2019 ◽  
Vol 6 (Supplement_2) ◽  
pp. S802-S803
Author(s):  
Ryan H Rochat ◽  
Gail J Demmler-Harrison

Abstract Background There are limited data on the indirect and non-medical costs associated with congenital cytomegalovirus (cCMV). Attempts to predict the economic impact of disease often rely on secondary analyses of large private databases and may not capture the full spectrum of a disease. The granularity of billing codes in the Electronic Medical Record (EMR) makes it possible to track health outcomes over time. However, with over 80,000 unique codes in ICD-10, selecting the appropriate codes requires specific content knowledge and can introduce bias in categorization. The Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT)® gives physicians a tool for finding specific ICD-10 codes on the basis of semantic terms. These terms can be used to build disease state-specific clusters of ICD-10 codes with which to study the economic impact of any disease, including this potentially devastating congenital infection. Methods Using a series of data parsing and processing scripts written in SAS V9.4 (Cary, NC), we extracted the diagnosis codes for 190 patients seen in our Congenital Cytomegalovirus Clinic at Texas Children’s Hospital in Houston, Texas. These data were consolidated into a relational database of clinical information. With a second program we developed, clusters of ICD-10 codes were derived from SNOMED-CT® on the basis of semantic terms associated with cCMV (e.g., “hearing problem,” “developmental disability,” “neurological problem”). Results A total of 190 patients with an ICD-10 diagnosis of CMV infection have been seen in our clinic; 144 of these had cCMV, and 102 of these were born after 1/1/2008 (the inception date of our EMR). Of these patients, 60% were Caucasian (21% Hispanic) and 25% were African American. 54 (53%) had hearing deficits, 17 (16%) had hearing aids, and 55 (54%) had developmental abnormalities. The average time (in years) to development of specific deficits is shown in Figure 1.
Conclusion The disease spectrum of cCMV is broad and has been well studied. The EMR offers the potential to study this disease in finer detail and to identify rates of disease progression by mining the ICD-10 codes associated with these patients over time. These results should prove valuable for building cost models of the economic impact of cCMV. Disclosures All authors: No reported disclosures.
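The cluster-then-track approach in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the term-to-code map below is hypothetical (a real map would come from SNOMED-CT lookups), each diagnosis is assumed to carry the patient's age when it was recorded, and time to a deficit is approximated by the age at the first code in a cluster.

```python
from statistics import mean

# Hypothetical semantic-term -> ICD-10 clusters, for illustration only.
# These are NOT the clusters the authors derived from SNOMED-CT.
CLUSTERS = {
    "hearing problem": {"H90.3", "H90.5"},
    "developmental disability": {"F88", "R62.50"},
}


def first_onset(diagnoses, clusters):
    """diagnoses: iterable of (patient_id, icd10_code, age_in_years).

    Returns {cluster: {patient_id: earliest age at which any code in the cluster appears}}.
    """
    onset = {name: {} for name in clusters}
    for pid, code, age in diagnoses:
        for name, codes in clusters.items():
            if code in codes:
                prev = onset[name].get(pid)
                if prev is None or age < prev:
                    onset[name][pid] = age
    return onset


def summarize(onset, n_patients):
    """Per cluster: (patients affected, proportion of cohort, mean age at first code)."""
    return {
        name: (len(ages), len(ages) / n_patients, mean(list(ages.values())) if ages else None)
        for name, ages in onset.items()
    }
```

The proportion in the second position is the kind of quantity behind figures such as 54 of 102 patients (53%) with hearing deficits; the mean age at first code is one simple proxy for average time to a deficit.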


2020 ◽  
Vol 10 (1) ◽  
pp. 103
Author(s):  
Vida Abedi ◽  
Jiang Li ◽  
Manu K. Shivakumar ◽  
Venkatesh Avula ◽  
Durgesh P. Chaudhary ◽  
...  

Background. Imputation of missing values is a key step in Electronic Health Record (EHR) mining, as it can significantly affect the conclusions of downstream analyses in translational medicine. Laboratory values in the EHR are not missing at random, yet imputation techniques tend to disregard this key distinction. Consequently, developing an adaptive imputation strategy designed specifically for EHR data is an important step toward reducing data imbalance and enhancing the predictive power of modeling tools for healthcare applications. Method. We analyzed laboratory measures derived from Geisinger’s EHR for patients in three distinct cohorts: patients tested for Clostridioides difficile (Cdiff) infection, patients with a diagnosis of inflammatory bowel disease (IBD), and patients with a diagnosis of hip or knee osteoarthritis (OA). We extracted Logical Observation Identifiers Names and Codes (LOINC) and excluded those with 75% or more missing values. Comorbidities, primary and secondary diagnoses, and active problem lists were also extracted. The adaptive imputation strategy was designed based on a hybrid approach. Patients’ comorbidity patterns were transformed into latent patterns and then clustered. Imputation was performed within each patient cluster, for each cohort independently, to show the generalizability of the method. The results were compared with imputation applied to the complete dataset without incorporating information from comorbidity patterns. Results. We analyzed a total of 67,445 patients (11,230 IBD patients, 10,000 OA patients, and 46,215 patients tested for C. difficile infection). We extracted 495 LOINC and 11,230 diagnosis codes for the IBD cohort, 8160 diagnosis codes for the Cdiff cohort, and 2042 diagnosis codes for the OA cohort, based on the primary/secondary diagnoses and active problem lists in the EHR.
Overall, the greatest improvement from this strategy was observed when laboratory measures had higher levels of missingness. The best root mean square error (RMSE) difference for each dataset was −35.5 for Cdiff, −8.3 for IBD, and −11.3 for OA. Conclusions. An adaptive imputation strategy designed specifically for EHR data, which uses complementary information from the patient’s clinical profile, can improve the imputation of missing laboratory values, especially when laboratory codes with high levels of missingness are included in the analysis.
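The comparison in the Methods (imputation within comorbidity clusters versus imputation over the whole dataset, scored by RMSE) can be illustrated with a deliberately simple stand-in. The sketch below swaps in per-cluster mean imputation for the authors' hybrid method, assumes cluster labels are already available, and uses hypothetical lab codes, values, and masking; only the 75% missingness cutoff comes from the abstract.

```python
import math
from statistics import mean


def drop_sparse_labs(table, threshold=0.75):
    """Keep lab columns whose fraction of missing values (None) is below threshold."""
    return {
        code: vals
        for code, vals in table.items()
        if sum(v is None for v in vals) / len(vals) < threshold
    }


def mean_impute(values, groups=None):
    """Fill None with the mean of observed values in the same group.

    groups=None treats the whole column as one group (the 'no clustering' baseline).
    Falls back to the overall mean if a group has no observed values.
    """
    if groups is None:
        groups = [0] * len(values)
    overall = mean(v for v in values if v is not None)
    group_means = {}
    for g in set(groups):
        observed = [v for v, gg in zip(values, groups) if gg == g and v is not None]
        group_means[g] = mean(observed) if observed else overall
    return [v if v is not None else group_means[g] for v, g in zip(values, groups)]


def rmse(pred, truth, idx):
    """Root mean square error over the positions in idx (the masked cells)."""
    return math.sqrt(mean((pred[i] - truth[i]) ** 2 for i in idx))


# Hypothetical lab column with two well-separated patient clusters.
truth = [1.0, 1.2, 0.8, 1.0, 5.0, 5.2, 4.8, 5.0]
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
masked = [3, 7]  # known values hidden to score the imputation
observed = [None if i in masked else v for i, v in enumerate(truth)]

cluster_rmse = rmse(mean_impute(observed, clusters), truth, masked)
global_rmse = rmse(mean_impute(observed), truth, masked)
```

Here `cluster_rmse` is 0.0 and `global_rmse` is 2.0, so the RMSE difference is negative, the same direction as the differences reported above. Masking known values and scoring the imputed ones is one common way to obtain such a comparison; the abstract does not say exactly how its RMSE differences were computed.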


Lab on a Chip ◽  
2021 ◽  
Author(s):  
A. Nicolas ◽  
F. Schavemaker ◽  
K. Kosim ◽  
D. Kurek ◽  
M. Haarmans ◽  
...  

We present an instrument for simultaneously measuring transepithelial electrical resistance (TEER) in up to 80 perfused epithelial tubules on an OrganoPlate. Its sensitivity, speed and ease of use enable screening of tubules during formation, drug exposure and inflammatory processes.


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e18843-e18843
Author(s):  
Helen Latimer ◽  
Samantha Tomicki ◽  
Gabriela Dieguez ◽  
Paul Cockrum ◽  
George P. Kim

e18843 Background: The Department of Health and Human Services (HHS) designed the 340B drug pricing program to allow institutions that serve specialty populations to acquire drugs at lower prices. Objective: To analyze the dispersion in total cost of care (TCOC) for Medicare FFS patients (pts) with metastatic pancreatic cancer (m-PANC) treated at 340B or non-340B institutions, by NCCN Category 1 regimen. Methods: We identified pts with m-PANC using ICD-10 diagnosis codes in the 2016-18 Medicare Parts A/B/D 100% Research Identifiable Files. Study pts had 2+ claims with a pancreatic cancer diagnosis and Medicare FFS coverage for 6 months pre- and 3 months post-metastasis diagnosis. Study pts were treated with NCCN Category 1 regimens: 1L gemcitabine monotherapy (gem-mono), 1L gemcitabine/nab-paclitaxel (gem-nab), 1L FOLFIRINOX (FFX), and 2L liposomal irinotecan-based regimen (nal-IRI). Pts were attributed to 340B or non-340B institutions based on the plurality of their chemotherapy claims. TCOC reflects insurer-paid services per line of therapy (LOT) in 3 categories: chemotherapy/supportive drugs (chemo/Rx), inpatient care (IP), and other outpatient care (OP). We grouped pts by quartile (qrt) and evaluated drivers of TCOC and mean rates of admissions (admits/pt). Results: We identified 2,697 (340B) and 3,839 (non-340B) pts taking NCCN Category 1 regimens. Gem-mono represented 1% and 4% of all pts in 340B and non-340B institutions, respectively. Gem-nab accounted for 72% of pts in both cohorts. For gem-nab, FFX, and nal-IRI pts, median TCOC was similar in both cohorts, although mean TCOC by qrt was lower at 340B than at non-340B institutions, except for gem-nab in the 1st qrt. The components of TCOC were similar between 340B and non-340B institutions in all qrts. In both cohorts, the IP share of costs increased between the 1st and 4th qrt (340B: 15% to 23%; non-340B: 14% to 25%). From the 1st to the 4th qrt, admits/pt increased in both cohorts.
In the 340B cohort, nal-IRI pts had the lowest admits/pt while gem-nab pts had the highest in all qrts. In the non-340B cohort, nal-IRI pts had the lowest admits/pt except in the 1st qrt. Conclusions: Median TCOC was lower at 340B institutions than non-340B institutions for all regimens, and the range of TCOC dispersion was also smaller at 340B institutions. Across qrts, chemotherapy accounted for approximately half of TCOC; however, IP costs were proportionally higher in the 4th qrt. Comparing regimens, although 2L nal-IRI pts were more heavily pretreated, their median costs in each cohort were similar to those of 1L gem-nab and 1L FFX, while their admits/pt were generally lower than with 1L gem-nab and 1L FFX across qrts and cohorts.
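The quartile summary in the Methods can be sketched as below. This is an illustration under assumptions, not the study's code: it assumes pts are ranked into quartiles by TCOC itself (the abstract says only "by quartile"), that each record holds the three cost components named above plus an admission count, and that a real analysis would run it separately per cohort and regimen. All numbers in the usage example are hypothetical.

```python
from statistics import mean


def assign_quartiles(values):
    """Label each value 1-4 by rank; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    quartile = [0] * n
    for rank, i in enumerate(order):
        quartile[i] = rank * 4 // n + 1
    return quartile


def summarize_by_quartile(patients):
    """patients: list of (chemo_rx, inpatient, outpatient, admits) per patient per LOT.

    Returns {quartile: (mean TCOC, inpatient share of TCOC, mean admits per patient)}.
    """
    tcoc = [chemo + ip + op for chemo, ip, op, _ in patients]
    quartiles = assign_quartiles(tcoc)
    out = {}
    for q in range(1, 5):
        rows = [(p, t) for p, t, qq in zip(patients, tcoc, quartiles) if qq == q]
        if not rows:
            continue
        total = sum(t for _, t in rows)
        ip_total = sum(p[1] for p, _ in rows)
        out[q] = (
            mean(t for _, t in rows),
            ip_total / total if total else 0.0,
            mean(p[3] for p, _ in rows),
        )
    return out


# Hypothetical patients: (chemo/Rx, IP, OP, admissions)
example = summarize_by_quartile([(5, 1, 4, 0), (10, 3, 7, 1), (15, 6, 9, 1), (20, 10, 10, 2)])
```

In `example`, the IP share rises from 0.10 in the 1st quartile to 0.25 in the 4th, the same kind of shift the abstract reports (for example, 15% to 23% in the 340B cohort).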


2013 ◽  
Vol 29 (01) ◽  
pp. 1-16
Author(s):  
Yu-Ping Yang ◽  
Harvey Castner ◽  
Randy Dull ◽  
James R. Dydo ◽  
Dennis Fanguy

A weld shrinkage prediction model was developed for thin uniform ship panels to predict in-plane shrinkage. The model consists of a series of empirical equations developed by analyzing shrinkage data from welded panels fabricated in the shipyards. These panels ranged in thickness from 3 mm to 9.5 mm and were welded with processes including submerged arc, flux cored arc, and gas metal arc welding. All fabrication data were carefully recorded using practices common to all of the shipyards. The panels were measured at each step of fabrication to provide accurate weld shrinkage data. The data were then analyzed by regression analysis to produce equations that allow weld shrinkage to be calculated from the fabrication conditions. These equations were embedded in a Microsoft Excel spreadsheet for ease of use.
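The regression step can be illustrated with the simplest case, an ordinary least squares line. This is only a sketch: the published equations use fabrication variables that the abstract does not list, so the single predictor here (for instance, heat input per unit plate thickness) and the data in the usage line are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


def predict(a, b, x):
    """Predicted shrinkage for a hypothetical predictor value x."""
    return a + b * x


# Hypothetical measurements: predictor value vs. measured shrinkage (mm).
a, b = fit_line([1, 2, 3, 4], [1, 3, 5, 7])
```

The two fitted coefficients are the kind of values that can be placed in spreadsheet cells, mirroring how the authors embedded their equations in Excel.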


Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Kori S Zachrison ◽  
Sijia Li ◽  
Mathew J Reeves ◽  
Opeolu M Adeoye ◽  
Carlos A Camargo ◽  
...  

Background: Administrative data are frequently used in stroke research. Accurately identifying ischemic stroke patients, and those receiving thrombolysis and endovascular thrombectomy (EVT), is critical to ensure representativeness and generalizability. We examined differences in patient samples based on different modes of identification, and propose a strategy for future patient and procedure identification in large administrative databases. Methods: We used nonpublic administrative data from the state of California to identify all ischemic stroke patients discharged from an emergency department or inpatient hospitalization from 2010-2017 based on ICD-9 (2010-2015), ICD-10 (2015-2017), and MS-DRG discharge codes. We identified patients with interhospital transfers, patients receiving thrombolytics, and patients treated with EVT based on ICD, CPT and MS-DRG codes. We determined what proportion of these transfers and procedures would have been identified with ICD versus MS-DRG discharge codes. Results: Of 365,099 ischemic stroke encounters, most (87.7%) had both a stroke-related ICD-9 or ICD-10 code and a stroke-related MS-DRG code; 12.3% had only an ICD-9 or ICD-10 code, and 0.02% had only an MS-DRG code. Nearly all transfers (99.9%) were identified using ICD codes. We identified 32,433 thrombolytic-treated patients (8.9% of total) using ICD, CPT, and MS-DRG codes; the combination of ICD and CPT codes identified nearly all (98%). We identified 7,691 patients treated with EVT (2.1% of total) using ICD and MS-DRG codes; both MS-DRG and ICD-9/-10 codes were necessary because ICD codes alone missed 13.2% of EVTs. CPT codes only pertain to outpatient/ED patients and are not useful for EVT identification. Conclusions: ICD-9/-10 diagnosis codes capture nearly all ischemic stroke encounters and transfers, while the combination of ICD-9/-10 and CPT codes is adequate for identifying thrombolytic treatment in administrative datasets.
However, MS-DRG codes are necessary in addition to ICD codes for identifying EVT, likely due to favorable reimbursement for EVT-related MS-DRG codes incentivizing accurate coding.
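The capture comparison in the Methods reduces to set arithmetic over encounter IDs. A minimal sketch, assuming (as a simplification) that the reference standard is the union of encounters flagged by any code system; the system names and IDs in the test are hypothetical.

```python
def capture_summary(flags):
    """flags: dict mapping a code system name to the set of encounter IDs it flags.

    Returns (union size, {system: fraction of the union captured},
             {system: encounters flagged only by that system}).
    """
    union = set().union(*flags.values())
    if not union:
        return 0, {}, {}
    captured = {name: len(ids) / len(union) for name, ids in flags.items()}
    only = {}
    for name, ids in flags.items():
        # Encounters no other system flags are what that system uniquely contributes.
        others = set().union(*(v for k, v in flags.items() if k != name))
        only[name] = ids - others
    return len(union), captured, only
```

For a given system, `1 - captured[name]` is the share of encounters that system alone would miss, the quantity behind statements such as "ICD codes alone missed 13.2% of EVTs."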


2018 ◽  
Vol 10 ◽
pp. 1503-1508 ◽  
Author(s):  
Jacob Bodilsen ◽  
Michael Dalager-Pedersen ◽  
Nicolai Kjærgaard ◽  
Diederik van de Beek ◽  
Matthijs C Brouwer ◽  
...  

2020 ◽  
Author(s):  
Sumantra Monty Ghosh ◽  
Khokan Sikdar ◽  
Adetola Koleade ◽  
Peter Farris ◽  
Jordan Ross ◽  
...  

Abstract Background: Individuals experiencing homelessness (IEH) tend to have an increased length of stay (LOS) in acute care settings, which negatively affects health care costs and resource utilization. It is unclear, however, which specific factors account for this increased LOS. This study attempts to determine which diagnoses most affect LOS for IEH and whether there are differences based on demographics. Methods: A retrospective cohort study was conducted examining ICD-10 diagnosis codes and LOS for patients identified as IEH who were seen in Emergency Departments (ED) or admitted to hospital. Data were stratified by diagnosis, gender and age. Statistical analysis was conducted to determine which ICD-10 diagnoses were significantly associated with increased ED and inpatient LOS for IEH compared to housed individuals. Results: Admissions of IEH were associated with increased LOS regardless of gender or age group. The absolute mean difference in LOS between IEH and housed individuals was 1.62 hours [95% CI 1.49-1.75] in the ED and 3.02 days [95% CI 2.42-3.62] for inpatients. Males aged 18-24 years spent on average 7.12 more days in hospital, and females aged 25-34 spent 7.32 more days in hospital, compared to their housed counterparts. Thirty-one diagnoses were associated with increased ED LOS for IEH compared to their housed counterparts; maternity concerns and coronary artery disease were associated with significantly increased inpatient LOS. Conclusion: Homelessness significantly increases LOS within both ED and inpatient settings. We have identified numerous diagnoses associated with increased LOS in IEH; these can inform the prioritization and development of targeted interventions to improve the health of IEH.
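The abstract reports mean LOS differences with 95% confidence intervals but does not name the method. One standard way to obtain such an interval is a normal approximation with an unpooled (Welch-style) standard error; the sketch below assumes that approach and is not necessarily what the authors used. The LOS values in the test are hypothetical.

```python
import math
from statistics import mean, variance


def mean_diff_ci(a, b, z=1.96):
    """Difference in means (a - b) with a normal-approximation 95% CI.

    Uses the unpooled standard error sqrt(s_a^2/n_a + s_b^2/n_b),
    where s^2 is the sample variance (n - 1 denominator).
    """
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, (diff - z * se, diff + z * se)
```

Usage: `mean_diff_ci(los_ieh, los_housed)` returns the point estimate and interval; with large samples like those in this study, the normal approximation and a t-based interval give nearly identical bounds.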


2021 ◽  
Vol 16 (6) ◽  
Author(s):  
David W Rhee ◽  
Jay Pendse ◽  
Hing Chan ◽  
David T Stern ◽  
Daniel J Sartori

The COVID-19 pandemic has dramatically disrupted the educational experience of medical trainees. However, a detailed characterization of exactly how trainees’ clinical experiences have been affected is lacking. Here, we profile residents’ inpatient clinical experiences across the four training hospitals of NYU’s Internal Medicine Residency Program during the pandemic’s first wave. We mined ICD-10 principal diagnosis codes attributed to residents from February 1, 2020, to May 31, 2020. We translated these codes into discrete medical content areas using a newly developed “crosswalk tool.” Residents’ clinical exposure was enriched in infectious diseases (ID) and cardiovascular disease content at baseline. During the pandemic’s surge, ID became the dominant content area. Exposure to other content was dramatically reduced, with clinical diversity repopulating only toward the end of the study period. Such characterization can be leveraged to provide effective feedback on practice habits, guide didactic and self-directed learning, and potentially predict competency-based outcomes for trainees in the COVID era.
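A code-to-content-area crosswalk can be sketched as a longest-prefix lookup. This is a hypothetical stand-in, not the authors' crosswalk tool: the prefixes follow ICD-10 chapter letters (A and B for infectious and parasitic diseases, I for the circulatory system, J for the respiratory system) plus U07, which includes the COVID-19 code U07.1, and the area names are illustrative.

```python
from collections import Counter

# Hypothetical crosswalk from ICD-10 code prefixes to content areas.
# This is NOT the authors' crosswalk tool.
CROSSWALK = {
    "A": "Infectious disease",
    "B": "Infectious disease",
    "U07": "Infectious disease",
    "I": "Cardiovascular disease",
    "J": "Pulmonary disease",
}


def content_area(code, crosswalk):
    """Map a code to a content area using the longest matching prefix."""
    for n in range(len(code), 0, -1):
        area = crosswalk.get(code[:n])
        if area is not None:
            return area
    return "Unmapped"


def exposure_profile(codes, crosswalk):
    """Fraction of principal diagnoses falling in each content area."""
    counts = Counter(content_area(c, crosswalk) for c in codes)
    total = sum(counts.values())
    return {area: n / total for area, n in counts.items()}
```

Computing this profile per week or per month for the codes attributed to residents would show shifts like the surge-period dominance of ID described above. Longest-prefix matching lets a specific entry such as U07 override a broader chapter-level rule.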

