Word embeddings trained on published case reports are lightweight, effective for clinical tasks, and free of protected health information

2019 ◽  
Author(s):  
Zachary N. Flamholz ◽  
Lyle H. Ungar ◽  
Gary E. Weissman

Abstract. Rationale: Word embeddings are used to create vector representations of text data, but not all embeddings appropriately capture clinical information, are free of protected health information, and are computationally accessible to most researchers. Methods: We trained word embeddings on published case reports because their language mimics that of clinical notes, the manuscripts are already de-identified by virtue of being published, and the corpus is much smaller than the large, publicly available datasets on which other embeddings are trained. We tested the performance of these embeddings across five clinically relevant tasks and compared the results with embeddings trained on a large Wikipedia corpus, all publicly available manuscripts, and notes from the MIMIC-III database, using fastText, GloVe, and word2vec at different dimensions. Tasks included clinical applications of lexicographic coverage, semantic similarity, clustering purity, linguistic regularity, and mortality prediction. Results: On most tasks, the embeddings trained on the published case reports performed as well as, if not better than, those trained on other corpora. The embeddings trained on all published manuscripts had the most consistent performance across all tasks and required a corpus with 100 times as many tokens as the corpus comprising only case reports. Embeddings trained on the MIMIC-III dataset had marginally better scores on the clustering task, which was itself based on clinical notes from the MIMIC-III dataset. Embeddings trained on the Wikipedia corpus, although it contained almost twice as many tokens as all available published manuscripts, performed poorly compared to those trained on medical and clinical corpora. Conclusion: Word embeddings trained on freely available published case reports performed well for most clinical tasks, are free of protected health information, and are small compared to commonly used embeddings trained on larger clinical and non-clinical corpora. The choice of corpus, dimension size, and embedding model for a given task involves tradeoffs in privacy, reproducibility, performance, and computational resources.
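Two of the evaluation tasks named in the abstract, lexicographic coverage and semantic similarity, can be illustrated with a minimal sketch. The code below is not the authors' evaluation code: the function names, the toy three-dimensional vectors, and the term list are all invented for illustration, and coverage is assumed here to mean the fraction of target terms that have a vector in the embedding vocabulary.

```python
import math


def coverage(vocabulary, terms):
    """Fraction of target terms that have a vector in the embedding vocabulary."""
    terms = list(terms)
    if not terms:
        return 0.0
    return sum(1 for t in terms if t in vocabulary) / len(terms)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy 3-dimensional embeddings; the values are invented for illustration only.
toy_embeddings = {
    "sepsis": [0.9, 0.1, 0.0],
    "infection": [0.8, 0.2, 0.1],
    "fracture": [0.0, 0.1, 0.9],
}
# Hypothetical list of clinical terms whose coverage we want to measure.
clinical_terms = ["sepsis", "infection", "fracture", "pneumothorax"]

cov = coverage(toy_embeddings, clinical_terms)
sim_related = cosine_similarity(toy_embeddings["sepsis"], toy_embeddings["infection"])
sim_unrelated = cosine_similarity(toy_embeddings["sepsis"], toy_embeddings["fracture"])
```

With real embeddings, the same two functions would be applied to a trained vocabulary and to a reference set of clinical term pairs; a smaller corpus would be expected to lower coverage before it lowers similarity quality.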

2021 ◽  
Author(s):  
Xianghao Zhan ◽  
Marie Humbert-Droz ◽  
Pritam Mukherjee ◽  
Olivier Gevaert

Abstract. Mining the structured data in electronic health records (EHRs) enables many clinical applications, while the information in free-text clinical notes often remains untapped. Free-text notes are unstructured data that are harder to use in machine learning, while structured diagnostic codes can be missing or even erroneous. To improve the quality of diagnostic codes, this work extracts structured diagnostic codes from unstructured notes concerning cardiovascular diseases. Five old and new word embeddings were used to vectorize over 5 million progress notes from the Stanford EHR, and logistic regression was used to predict eight ICD-10 codes of common cardiovascular diseases. The models were interpreted through the words most important to the predictions and through analyses of false positive cases. To test transferability, the models trained on Stanford notes were used to predict the corresponding ICD-9 codes of the MIMIC-III discharge summaries. The word embeddings and logistic regression showed good performance in diagnostic code extraction, with TF-IDF as the best word embedding model, showing AUROC ranging from 0.9499 to 0.9915 and AUPRC ranging from 0.2956 to 0.8072. The models also showed transferability when tested on the MIMIC-III data set, with AUROC ranging from 0.7952 to 0.9790 and AUPRC ranging from 0.2353 to 0.8084. Model interpretability was shown by important words with clinical meanings matching each disease. This study shows the feasibility of accurately extracting structured diagnostic codes, imputing missing codes, and correcting erroneous codes from free-text clinical notes with models that clinicians can interpret, which helps improve the data quality of diagnostic codes for information retrieval and downstream machine-learning applications.
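The vectorization step described above can be sketched as follows. This is a minimal illustration only: the abstract does not state the study's tokenization or weighting details, so the whitespace tokenizer and the smoothed formula tf * (ln((1 + N) / (1 + df)) + 1) are assumptions, and the three short notes are invented. In the study, vectors of this kind were then passed to logistic regression, one model per ICD-10 code; that step is omitted here.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using tf * (ln((1 + N) / (1 + df)) + 1)."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {
        tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1
        for tok in vocabulary
    }
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[tok] * idf[tok] for tok in vocabulary])
    return vocabulary, vectors


# Invented example notes; not taken from any real record.
notes = [
    "patient with atrial fibrillation on anticoagulation",
    "patient with heart failure and edema",
    "patient with atrial fibrillation and heart failure",
]
vocab, vecs = tfidf_vectors(notes)
# Map each vocabulary term to its weight in the first note.
weights = dict(zip(vocab, vecs[0]))
```

Because each dimension corresponds to one word, the coefficients of a linear model fitted on such vectors can be read directly as word importances, which is consistent with the interpretability analysis the abstract reports.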


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Beau Norgeot ◽  
Kathleen Muenzen ◽  
Thomas A. Peterson ◽  
Xuancheng Fan ◽  
Benjamin S. Glicksberg ◽  
...  

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Michael Rutherford ◽  
Seong K. Mun ◽  
Betty Levine ◽  
William Bennett ◽  
Kirk Smith ◽  
...  

Abstract. We developed a DICOM dataset that can be used to evaluate the performance of de-identification algorithms. DICOM objects (a total of 1,693 CT, MRI, PET, and digital X-ray images) were selected from datasets published in the Cancer Imaging Archive (TCIA). Synthetic Protected Health Information (PHI) was generated and inserted into selected DICOM Attributes to mimic typical clinical imaging exams. The DICOM Standard and TCIA curation audit logs guided the insertion of synthetic PHI into standard and non-standard DICOM data elements. A TCIA curation team tested the utility of the evaluation dataset. With this publication, the evaluation dataset (containing synthetic PHI) and de-identified evaluation dataset (the result of TCIA curation) are released on TCIA in advance of a competition, sponsored by the National Cancer Institute (NCI), for algorithmic de-identification of medical image datasets. The competition will use a much larger evaluation dataset constructed in the same manner. This paper describes the creation of the evaluation datasets and guidelines for their use.
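The evaluation idea, seeding known synthetic PHI and then checking what survives de-identification, can be sketched in a few lines. This is a hedged illustration rather than the TCIA pipeline: a plain dictionary stands in for a DICOM object, the helper names and the toy de-identifier are hypothetical, and the PHI values are invented. Only the attribute keywords (`PatientName`, `PatientBirthDate`, `InstitutionName`) come from the DICOM Standard.

```python
import copy

# Hypothetical synthetic PHI values keyed by DICOM attribute keyword.
SYNTHETIC_PHI = {
    "PatientName": "DOE^JANE",
    "PatientBirthDate": "19700101",
    "InstitutionName": "Example General Hospital",
}


def insert_synthetic_phi(attributes, phi):
    """Return a copy of the attribute dict with synthetic PHI written in."""
    seeded = copy.deepcopy(attributes)
    seeded.update(phi)
    return seeded


def naive_deidentify(attributes):
    """Toy de-identifier under test: blanks only a fixed list of attributes."""
    cleaned = dict(attributes)
    for keyword in ("PatientName", "PatientBirthDate"):
        if keyword in cleaned:
            cleaned[keyword] = ""
    return cleaned


def leaked_phi(cleaned, phi):
    """List attribute keywords whose synthetic PHI value survived de-identification."""
    values = set(phi.values())
    return sorted(k for k, v in cleaned.items() if v in values)


# A dict standing in for an already de-identified DICOM object.
original = {"Modality": "CT", "PatientName": "", "StudyDescription": "CHEST"}
seeded = insert_synthetic_phi(original, SYNTHETIC_PHI)
leaks = leaked_phi(naive_deidentify(seeded), SYNTHETIC_PHI)
```

Because the inserted values are known in advance, any value that reappears after de-identification is an unambiguous miss; here the toy de-identifier overlooks `InstitutionName`.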


2010 ◽  
Vol 01 (01) ◽  
pp. 1-10 ◽  
Author(s):  
S. E. Ross ◽  
B. K. Mellis ◽  
B. L. Beaty ◽  
L. M. Schilling ◽  
A. J. Davidson ◽  
...  

Summary. Objective: Assess the interest in and preferences of ambulatory practitioners in HIE. Background: Health information exchange (HIE) may improve the quality and efficiency of care. Identifying the value proposition for smaller ambulatory practices may help those practices engage in HIE. Methods: Survey of primary care and specialist practitioners in the State of Colorado. Results: Clinical data were commonly (always [2%], often [29%] or sometimes [49%]) missing during clinic visits. Of 12 data types proposed as available through HIE, ten were considered “extremely useful” by most practitioners. “Clinical notes/consultation reports,” “diagnosis or problem lists,” and “hospital discharge summaries” were considered the three most useful data types. Interest in EKG reports, diagnosis/problem lists, childhood immunizations, and discharge summaries differed among ambulatory practitioner groups (primary care, obstetrics-gynecology, and internal medicine subspecialties). Conclusion: Practitioners express strong interest in most of the data types, but opinions differed by specialty on which types were most important. All providers felt that a system that provided all data types would be useful. These results support the potential benefit of HIE in ambulatory practices.


2020 ◽  
Vol 29 (01) ◽  
pp. 104-114
Author(s):  
Ursula H. Hübner ◽  
Nicole Egbert ◽  
Georg Schulte

Objective: The more people there are who use clinical information systems (CIS) beyond their traditional intramural confines, the more promising the benefits are, and the more daunting the risks will be. This review thus explores the areas of ethical debates prompted by CIS conceptualized as smart systems reaching out to patients and citizens. Furthermore, it investigates the ethical competencies and education needed to use these systems appropriately. Methods: A literature review covering ethics topics in combination with clinical and health information systems, clinical decision support, health information exchange, and various mobile devices and media was performed searching the MEDLINE database for articles from 2016 to 2019 with a focus on 2018 and 2019. A second search combined these keywords with education. Results: By far, most of the discourses were dominated by privacy, confidentiality, and informed consent issues. Intertwined with confidentiality and clear boundaries, the provider-patient relationship has gained much attention. The opacity of algorithms and the lack of explicability of the results pose a further challenge. The necessity of sociotechnical ethics education was underpinned in many studies including advocating education for providers and patients alike. However, only a few publications expanded on ethical competencies. In the publications found, empirical research designs were employed to capture the stakeholders’ attitudes, but not to evaluate specific implementations. Conclusion: Despite the broad discourses, ethical values have not yet found their firm place in empirically rigorous health technology evaluation studies. Similarly, sociotechnical ethics competencies obviously need detailed specifications. These two gaps set the stage for further research at the junction of clinical information systems and ethics.


2013 ◽  
Vol 20 (02) ◽  
pp. 308-312
Author(s):  
ABDUS SALAM ◽  
SAIF-UD-DIN SAIF

Background: An incompletely filled Radiology Request Form (RRF) is a common problem faced by both radiologists and radiographers. Objective: The study was carried out to objectively evaluate the adequacy of completion of radiology request forms in a tertiary care centre, covering the indoor and outdoor patient departments of POF Hospital, Wah Cantonment. Design: Descriptive, retrospective study. Setting: Radiology Department, POF Hospital, Wah Cantonment. Period: 01 Jul 2009 to 01 Sep 2009. Methods: A total of 1500 request forms received by the radiology department from 01 Jul 2009 to 01 Sep 2009 were reviewed. These included requests for a variety of examinations from different departments within POF Hospital, Wah Cantonment. A database of the collected forms was created, noting which of the various fields were adequately completed. Results: Only 270 out of the 1500 forms were completed in full and 1230 were not completely filled. The only parameter fulfilled in all the forms was the presence of the referring doctor’s signature. The commonest blank fields were as follows: patient location: 62%, clinical notes: 67.26%, doctor's name: 47.33% and date of referral: 14.2%. Conclusions: The inadequate transmission of clinical information observed in this study is a typical example of the various problems that radiologists have to face.


Author(s):  
G. Sridevi Devasena ◽  
S. Kanmani

Wireless Body Area Networks (WBANs) are a fundamental technology in health care that allows a patient’s essential body parameters to be gathered by sensors. However, the security and privacy protection of the gathered information remains a key unresolved problem. A Hybrid Key Management (HKM) scheme [13] is based on a Public Key Cryptography (PKC) authentication scheme. This scheme uses a one-way hash function to construct a Merkle Tree. The PKC method increases computational complexity and lacks scalability. Additionally, it increases computation cost, communication cost, and delay. To overcome this problem, Robust Security for Protected Health Information by ECC with signature Hash Function in WBAN (RSP) is proposed. The system employs a hash-chain based key signature technique to achieve efficient, secure transmission from sensor to user in the WBAN. Moreover, an Elliptic Curve Cryptography (ECC) algorithm is used to verify the authenticity of the sensor. In addition, experimental results demonstrate that the proposed system achieves efficient data communication in the network.
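The two hash-based building blocks mentioned in this abstract, a Merkle tree built from a one-way hash function and a hash chain, can be sketched with the standard library. This is not the HKM or RSP scheme itself: SHA-256, the rule of pairing an odd node with itself, the function names, and the sample readings are all assumptions made for illustration, and the ECC signature step is omitted.

```python
import hashlib


def h(data: bytes) -> bytes:
    """One-way hash function (SHA-256 is an assumption, not taken from the paper)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Merkle root; an odd node at any level is paired with itself (an assumption)."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def build_hash_chain(seed: bytes, length: int):
    """Return [x_0, ..., x_length] with x_0 = seed and x_{i+1} = h(x_i)."""
    chain = [seed]
    for _ in range(length):
        chain.append(h(chain[-1]))
    return chain


def verify_chain_value(candidate: bytes, anchor: bytes, max_steps: int) -> bool:
    """Check that hashing the candidate at most max_steps times reaches the anchor."""
    value = candidate
    for _ in range(max_steps):
        value = h(value)
        if value == anchor:
            return True
    return False


# Invented sensor readings used as Merkle leaves.
readings = [b"hr=72", b"spo2=98", b"temp=36.8"]
root = merkle_root(readings)
tampered_root = merkle_root([b"hr=72", b"spo2=91", b"temp=36.8"])

# The last chain element acts as a public anchor; earlier elements are revealed later.
chain = build_hash_chain(b"sensor-secret", 5)
anchor = chain[-1]
```

Verifying a chain value costs only a few hash evaluations, which is the kind of saving over repeated public-key operations that motivates hash-chain designs on resource-constrained sensors.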


2021 ◽  
Author(s):  
Yan Luo ◽  
Krystal Dozier ◽  
Carin Ikenberg

BACKGROUND An electronic personal health record (ePHR), also known as a personal health record (PHR), is broadly defined as an electronic application through which individuals can access, manage, and share their health information in a secure and confidential environment. Although ePHRs can benefit individuals as well as caregivers and healthcare providers, the use of ePHRs among individuals remains low. Previous studies have documented a relationship between age and ePHRs use, indicating that younger patients were more likely to use ePHRs. OBJECTIVE The current study aims to examine the relationship between human-technology interaction factors and ePHRs use among adults, and then compare the effects of human-technology interaction factors on ePHRs use between younger adults (18-54 years old) and older adults (55 years of age and over). METHODS We analyzed data from the Health Information National Trends Survey (HINTS5, Cycle 3) collected from U.S. adults aged 18 years old and over in 2019. Descriptive analysis was conducted for all variables and each item of ePHRs use. Bivariate tests (Pearson test for categorical variables and F-test for continuous variables) were conducted over four age groups. Lastly, adjusting for socio-demographics and healthcare resources, a weighted multiple linear regression was conducted to examine the relationship between human-technology interaction factors and ePHRs use. RESULTS The final sample size was 1,363 and was divided into two age groups: 18-54 years old and 55 years of age and older. The average level of ePHRs use was low (Mean=2.76, range=0-8). There was no significant difference in average ePHRs use between the two age groups. Including clinical notes was positively related to ePHRs use in both groups: 18-54 years old (beta=0.28, P<0.01), 55 years old and above (beta=0.15, P<0.01). While accessing ePHRs using a smartphone app was associated with ePHRs use only among younger adults (beta=0.29, P<0.001), ease of understanding health information in ePHRs was positively linked to ePHRs use only among older adults (beta=0.13, P<0.01). CONCLUSIONS This study found that including clinical notes was positively related to ePHRs use in both age groups, which suggests that including clinical notes as a part of ePHRs might improve the effective use of ePHRs among patients. Moreover, accessing ePHRs using a smartphone app was associated with higher ePHRs use among younger adults, while ease of understanding health information in ePHRs was linked to higher ePHRs use among older adults. The design of ePHRs should provide the option of being accessible through mobile devices to promote greater ePHRs use among young people. For older adults, providers could add additional notes to explain health information recorded in the ePHRs.

