Medical Concept Embedding with Time-Aware Attention

Author(s):  
Xiangrui Cai ◽  
Jinyang Gao ◽  
Kee Yuan Ngiam ◽  
Beng Chin Ooi ◽  
Ying Zhang ◽  
...  

Embeddings of medical concepts such as medication, procedure and diagnosis codes in Electronic Medical Records (EMRs) are central to healthcare analytics. Previous work on medical concept embedding treats medical concepts and EMRs as words and documents, respectively. However, such models miss the temporal nature of EMR data. On the one hand, two consecutive medical concepts are not necessarily temporally close, but the correlation between them can be revealed by the time gap. On the other hand, the temporal scopes of medical concepts often vary greatly (e.g., common cold and diabetes). In this paper, we propose to incorporate temporal information when embedding medical codes. Based on the Continuous Bag-of-Words model, we employ the attention mechanism to learn a "soft" time-aware context window for each medical concept. Experiments on public and proprietary datasets through clustering and nearest neighbour search tasks demonstrate the effectiveness of our model, showing that it outperforms five state-of-the-art baselines.
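The idea of a "soft" time-aware context window can be illustrated with a minimal sketch. This is not the authors' formulation, which the abstract does not spell out; it assumes that each context concept is weighted by a softmax over its negative time gap to the target, scaled by a hypothetical per-concept `scope` parameter. The function name `time_aware_context` and all of its parameters are illustrative.

```python
import math

def time_aware_context(context, target_time, scope):
    """Weight context concepts by temporal closeness to the target.

    context: list of (embedding, timestamp) pairs.
    scope: hypothetical positive temporal scope of the target concept,
           in the same units as the timestamps (an assumption, not from the paper).
    Returns (attention_weights, weighted_context_vector).
    """
    # Smaller time gap -> higher score; a larger scope flattens the scores.
    scores = [-abs(t - target_time) / scope for _, t in context]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the attention-weighted sum of the embeddings.
    dim = len(context[0][0])
    vector = [sum(w * emb[i] for w, (emb, _) in zip(weights, context))
              for i in range(dim)]
    return weights, vector

# Two context concepts: one recorded at the same time as the target, one 10 days later.
ctx = [([1.0, 0.0], 0.0), ([0.0, 1.0], 10.0)]
narrow, _ = time_aware_context(ctx, target_time=0.0, scope=5.0)
wide, _ = time_aware_context(ctx, target_time=0.0, scope=500.0)
```

Under these assumptions, a short-scope concept (such as a common cold) concentrates its weight on temporally close neighbours, while a long-scope concept (such as diabetes) spreads its weight almost uniformly.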

Author(s):  
David Liebovitz

Electronic medical records provide potential benefits and also drawbacks. Potential benefits include increased patient safety and efficiency. Potential drawbacks include newly introduced errors and diminished workflow efficiency. In the patient safety context, medication errors account for significant patient harm. Electronic prescribing (e-prescribing) offers the promise of automated drug interaction and dosage verification. In addition, the process of enabling e-prescriptions also provides access to an often unrecognized benefit, that of viewing the dispensed medication history. This information is often critical to understanding patient symptoms. Obtaining significant value from electronic medical records requires use of standardized terminology for both targeted decision support and population-based management. Further, generating documentation for a billable encounter requires usage of proper codes. The emergence of International Classification of Diseases (ICD)-10 holds promise in facilitating identification of a more precise patient code while also presenting drawbacks given its complexity. This article will focus on elements of e-prescribing and use of structured chart content, including diagnosis codes as they relate to physician office practices.


10.2196/29120 ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. e29120
Author(s):  
Bruna Stella Zanotto ◽  
Ana Paula Beck da Silva Etges ◽  
Avner dal Bosco ◽  
Eduardo Gabriel Cortes ◽  
Renata Ruschel ◽  
...  

Background With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management. Objective This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs. Methods Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results. Results The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. 
The support vector machine models produced statistically superior results in 71% (17/24) of tasks, with an F1 score >80% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future. Conclusions Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to clinical conditions of stroke victims, and thus ultimately assess the possibility of proactively using these machine learning techniques in real-world situations.
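The evaluation protocol described above (5-fold cross-validation repeated 6 times with subject-wise sampling) can be sketched as follows. The abstract does not define the procedure in detail, so this sketch assumes that "subject-wise" means all sentences from one patient stay on the same side of every split; the function name `subject_wise_folds` and its parameters are hypothetical.

```python
import random

def subject_wise_folds(subject_ids, n_folds=5, n_repeats=6, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) tuples.

    subject_ids[i] is the patient that sample i belongs to. Every patient's
    samples land entirely in train or entirely in test, so no patient leaks
    across the split (an assumed reading of "subject-wise sampling").
    """
    rng = random.Random(seed)
    subjects = sorted(set(subject_ids))
    for repeat in range(n_repeats):
        shuffled = subjects[:]
        rng.shuffle(shuffled)  # a fresh patient ordering for every repeat
        for fold in range(n_folds):
            # Every n_folds-th patient, starting at offset `fold`, is held out.
            test_subjects = set(shuffled[fold::n_folds])
            test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
            train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
            yield repeat, fold, train_idx, test_idx

# Ten sentences drawn from seven hypothetical patients.
ids = ["p1", "p1", "p2", "p3", "p3", "p4", "p5", "p5", "p6", "p7"]
splits = list(subject_wise_folds(ids))
```

Splitting by patient rather than by sentence avoids overly optimistic scores when sentences from the same record are strongly correlated.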


2018 ◽  
Vol 2018 ◽  
pp. 1-8
Author(s):  
Saroochi Agarwal ◽  
Duc T. Nguyen ◽  
Justin D. Lew ◽  
Brenda Campbell ◽  
Edward A. Graviss

Background. The QuantiFERON Gold In-Tube (QFT-G) assay is used to identify individuals with tuberculosis infection and gives quantitative and qualitative results including positive, negative, or indeterminate results (that cannot be interpreted clinically). Several factors, including immunosuppression and preanalytical factors, have been suggested to be significantly associated with indeterminate QFT-G results. An online education program was designed and implemented to reduce the rate of indeterminate QFT-G test results at Houston Methodist Hospital (HMH). Methods. Data from patients’ electronic medical records having indeterminate QFT-G results between 01/2015 and 05/2016 at HMH in Houston, TX, were administratively extracted for (1) medical unit where QFT-G phlebotomy was performed, (2) demographics, and (3) ICD-9/10 diagnosis codes. Unit nurses identified with high proportions of indeterminate QFT-G results were emailed a link to an online pretest educational program with a QFT-G blood collection and handling presentation, and a posttest assessment. Results. Of the 332 nurses emailed, 94 (28.4%) voluntarily completed both tests within the 6-month time allotted. The nurses that completed the education program had a significantly higher posteducation test score than on the pretest (70.2% versus 55.3%, p<0.001, effect size=0.82). Improved posttest score was seen in 67.0% of participants. No reduction in the proportion of indeterminate test results was seen overall at HMH in the 6 months after education. Conclusions. A targeted education program was able to successfully increase nurses’ knowledge of blood collection and handling procedures for the QFT-G test, but no association was found between the improvement of posttest score and indeterminate QFT-G test results.


2015 ◽  
Vol 58 ◽  
pp. S150-S157 ◽  
Author(s):  
Nai-Wen Chang ◽  
Hong-Jie Dai ◽  
Jitendra Jonnagaddala ◽  
Chih-Wei Chen ◽  
Richard Tzong-Han Tsai ◽  
...  

2012 ◽  
Vol 19 (5) ◽  
pp. 786-791 ◽  
Author(s):  
Ozlem Uzuner ◽  
Andreea Bodnari ◽  
Shuying Shen ◽  
Tyler Forbush ◽  
John Pestian ◽  
...  

2020 ◽  
Vol 38 (4) ◽  
pp. 725-744
Author(s):  
Xiaojuan Zhang ◽  
Xixi Jiang ◽  
Jiewen Qin

Purpose The purpose of this study is to generate diversified results for temporally ambiguous queries, ensuring that the candidate queries have a high coverage of subtopics derived from different temporal periods. Design/methodology/approach Two novel time-aware query suggestion diversification models are developed by integrating the semantic and temporal information involved in queries into two state-of-the-art explicit diversification algorithms (i.e. IA-select and xQuaD), respectively, and then specifying the components on which these two models rely. Most importantly, the study first explores how to explicitly determine query subtopics for each unique query from the query log or clicked documents, and then how to model these subtopics in query suggestion diversification. A discussion of how to mine the temporal intent behind a query from the query log follows. Finally, to verify the effectiveness of the proposal, experiments on a real-world query log are conducted. Findings Preliminary experiments demonstrate that the proposed method can significantly outperform the existing state-of-the-art methods in producing candidate query suggestions for temporally ambiguous queries. Originality/value This study reports the first attempt to generate query suggestions that indicate diverse time points of interest for temporally ambiguous (input) queries. The research will be useful in enhancing users’ search experience by helping them to formulate accurate queries for their search tasks. In addition, the approaches investigated in the paper are general enough to be used in many domains, including experimental information retrieval systems, Web search engines, document archives and digital libraries.
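For orientation, the plain xQuaD greedy selection that the abstract names as one of its base algorithms can be sketched as below. This is the standard explicit-diversification scheme, not the paper's time-aware variant; the function name `xquad`, the probability tables, and the example queries are all hypothetical, with temporal periods standing in as subtopics.

```python
def xquad(candidates, relevance, subtopic_weight, coverage, k, lam=0.5):
    """Greedily pick k candidates, trading relevance against subtopic novelty.

    relevance[c]       ~ P(c | query)
    subtopic_weight[s] ~ P(s | query)
    coverage[c][s]     ~ P(c | s); missing entries count as 0
    lam                : weight of the diversity term (0 = relevance only)
    """
    selected = []
    remaining = list(candidates)
    # not_covered[s]: probability that no selected candidate satisfies subtopic s.
    not_covered = {s: 1.0 for s in subtopic_weight}
    while remaining and len(selected) < k:
        def score(c):
            diversity = sum(subtopic_weight[s] * coverage[c].get(s, 0.0) * not_covered[s]
                            for s in subtopic_weight)
            return (1 - lam) * relevance[c] + lam * diversity
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        # Subtopics the chosen candidate covers contribute less novelty from now on.
        for s in subtopic_weight:
            not_covered[s] *= 1.0 - coverage[best].get(s, 0.0)
    return selected

# Hypothetical suggestions for an ambiguous query, with two temporal subtopics.
cands = ["olympics 2012", "olympics 2012 medals", "olympics 2016"]
rel = {"olympics 2012": 0.9, "olympics 2012 medals": 0.8, "olympics 2016": 0.5}
topics = {"2012": 0.5, "2016": 0.5}
cov = {"olympics 2012": {"2012": 1.0},
       "olympics 2012 medals": {"2012": 1.0},
       "olympics 2016": {"2016": 1.0}}
picked = xquad(cands, rel, topics, cov, k=2)
```

With `lam=0.5` the second pick is "olympics 2016" rather than the more relevant "olympics 2012 medals", because the 2012 period is already covered by the first pick.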


2021 ◽  
Author(s):  
Bruna Stella Zanotto ◽  
Ana Paula Beck da Silva Etges ◽  
Avner dal Bosco ◽  
Eduardo Gabriel Cortes ◽  
Renata Ruschel ◽  
...  

BACKGROUND With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management. OBJECTIVE This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs. METHODS Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results. RESULTS The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. 
The support vector machine models produced statistically superior results in 71% (17/24) of tasks, with an F1 score >80% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future. CONCLUSIONS Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to clinical conditions of stroke victims, and thus ultimately assess the possibility of proactively using these machine learning techniques in real-world situations.


2014 ◽  
Author(s):  
C. McKenna ◽  
B. Gaines ◽  
C. Hatfield ◽  
S. Helman ◽  
L. Meyer ◽  
...  
