Learning Phenotypes and Dynamic Patient Representations via RNN Regularized Collective Non-Negative Tensor Factorization

Author(s):  
Kejing Yin ◽  
Dong Qian ◽  
William K. Cheung ◽  
Benjamin C. M. Fung ◽  
Jonathan Poon

Non-negative Tensor Factorization (NTF) has been shown to be effective for discovering clinically relevant and interpretable phenotypes from Electronic Health Records (EHR). Existing NTF-based computational phenotyping models aggregate data over the observation window, so the learned phenotypes are mixtures of disease states appearing at different times. We argue that by separating the clinical events happening at different times in the input tensor, the temporal dynamics and the disease progression within the observation window can be modeled, and the learned phenotypes will correspond to more specific disease states. Yet how to construct the tensor for data samples with different temporal lengths, and how to properly capture the temporal relationships specific to each individual data sample, remain open challenges. In this paper, we propose a novel Collective Non-negative Tensor Factorization (CNTF) model in which each patient is represented by a temporal tensor, and all of the temporal tensors are factorized collectively, with the phenotype definitions shared across all patients. The proposed CNTF model is also flexible enough to incorporate non-temporal data modalities and RNN-based temporal regularization. We validate the proposed model using the MIMIC-III dataset, and the empirical results show that the learned phenotypes are clinically interpretable. Moreover, the proposed CNTF model outperforms state-of-the-art computational phenotyping models on the mortality prediction task.

2020 ◽  
Author(s):  
Tianran Zhang ◽  
Muhao Chen ◽  
Alex A. T. Bui

Electronic health records (EHRs) contain both ordered and unordered chronologies of clinical events that occur during a patient encounter. However, during data preprocessing, many predictive models impose a predefined order on unordered clinical event sets (e.g., alphabetical, natural order from the chart, etc.), which is potentially incompatible with the temporal nature of the sequence and the predictive task. To address this issue, we propose DPSS, which seeks to capture each patient's clinical event records as sequences of event sets. For each clinical event set, we assume that the predictive model should be invariant to the order of concurrent events and thus employ a novel permutation sampling mechanism. This paper evaluates the use of this permuted sampling method with different data-driven models for predicting a heart failure (HF) diagnosis in subsequent patient visits. Experimental results using the MIMIC-III dataset show that the permutation sampling mechanism offers improved discriminative power, based on the area under the receiver operating curve (AUROC) and precision-recall curve (pr-AUC) metrics, as HF diagnosis prediction becomes more robust to different data ordering schemes.
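The idea of treating concurrent events as unordered can be sketched as follows. This is a minimal illustration, not the DPSS implementation: the function name `sample_permuted_sequence` and the event codes are hypothetical, and the only behaviour assumed from the abstract is that the order of visits is kept while the order of events inside each visit is resampled.

```python
import random


def sample_permuted_sequence(visits, rng):
    """Flatten a list of event sets into one sequence.

    The order of visits is preserved; the order of events inside each
    visit is drawn at random, so repeated calls expose a model to many
    orderings of the same concurrent events.
    """
    sequence = []
    for event_set in visits:
        events = sorted(event_set)  # fixed starting order before shuffling
        rng.shuffle(events)         # random permutation within the visit
        sequence.extend(events)
    return sequence


# Hypothetical patient record: three visits, each an unordered set of events.
visits = [
    {"dx:428.0", "rx:furosemide"},
    {"lab:BNP"},
    {"dx:401.9", "rx:lisinopril", "lab:K"},
]
rng = random.Random(0)
samples = [sample_permuted_sequence(visits, rng) for _ in range(3)]
```

Each element of `samples` contains the same events visit by visit, but the within-visit order may differ from one sample to the next, which is the property an order-invariant model would be trained against.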


Author(s):  
Kejing Yin ◽  
William K. Cheung ◽  
Yang Liu ◽  
Benjamin C. M. Fung ◽  
Jonathan Poon

Non-negative tensor factorization has been shown to be effective for discovering phenotypes from EHR data with minimal human supervision. In most cases, an interaction tensor of the elements in the EHR (e.g., diagnoses and medications) has to be established before the factorization can be applied. Such correspondence information, however, is often missing. While different heuristics can be used to estimate the missing correspondence, any errors introduced will in turn cause inaccuracy in the subsequent phenotype discovery task. This is especially true for patients diagnosed with multiple diseases (e.g., under critical care). To alleviate this limitation, we propose hidden interaction tensor factorization (HITF), in which the diagnosis-medication correspondence and the underlying phenotypes are inferred simultaneously. We formulate it under a Poisson non-negative tensor factorization framework and learn the HITF model via maximum likelihood estimation. For performance evaluation, we applied HITF to the MIMIC-III dataset. Our empirical results show that both the phenotypes and the correspondence inferred are clinically meaningful. In addition, the inferred HITF model outperforms a number of state-of-the-art methods for mortality prediction.
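For orientation, a generic Poisson non-negative tensor factorization objective can be written as below. This is the textbook form for a fully observed three-way count tensor, not the HITF objective itself, which additionally treats the interaction tensor as hidden. The symbols X, A, B, C and the rank R are assumptions introduced here for illustration.

```latex
% Rank-R non-negative CP model of a count tensor X
\hat{X}_{ijk} = \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr},
\qquad a_{ir},\, b_{jr},\, c_{kr} \ge 0,
\qquad X_{ijk} \sim \mathrm{Poisson}\bigl(\hat{X}_{ijk}\bigr)

% Negative log-likelihood minimized in maximum likelihood estimation
-\log p(X \mid A, B, C)
  = \sum_{i,j,k} \Bigl( \hat{X}_{ijk} - X_{ijk} \log \hat{X}_{ijk} + \log X_{ijk}! \Bigr)
```

The last term does not depend on the factors, so minimizing the first two terms is equivalent to maximizing the Poisson likelihood.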


2006 ◽  
Vol 45 (03) ◽  
pp. 240-245 ◽  
Author(s):  
A. Shabo

Summary Objectives: This paper pursues the challenge of sustaining lifetime electronic health records (EHRs) based on a comprehensive socio-economic-medico-legal model. The notion of a lifetime EHR extends the emerging concept of a longitudinal and cross-institutional EHR and is invaluable information for increasing patient safety and quality of care. Methods: The challenge is how to compile and sustain a coherent EHR across the lifetime of an individual. Several existing and hypothetical models are described, analyzed and compared in an attempt to suggest a preferred approach. Results: The vision is that lifetime EHRs should be sustained by new players in the healthcare arena, who will function as independent health record banks (IHRBs). Multiple competing IHRBs would be established and regulated following preemptive legislation. They should be neither owned by healthcare providers nor by health insurer/payers or government agencies. The new legislation should also stipulate that the records located in these banks be considered the medico-legal copies of an individual’s records, and that healthcare providers no longer serve as the legal record keepers. Conclusions: The proposed model is not centered on any of the current players in the field; instead, it is focussed on the objective service of sustaining individual EHRs, much like financial banks maintain and manage financial assets. This revolutionary structure provides two main benefits: 1) Healthcare organizations will be able to cut the costs of long-term record keeping, and 2) healthcare providers will be able to provide better care based on the availability of a lifelong EHR of their new patients.


2018 ◽  
Vol 2018 ◽  
pp. 1-11 ◽  
Author(s):  
Xiangwen Liao ◽  
Lingying Zhang ◽  
Jingjing Wei ◽  
Dingda Yang ◽  
Guolong Chen

User influence is a very important factor for microblog user recommendation in mobile social networks. However, most existing work on user influence analysis ignores users' temporal features and fails to filter out marketing users with low influence, which limits the performance of recommendation methods. In this paper, a Tensor Factorization based User Cluster (TFUC) model is proposed. We first identify latent influential users by neural network clustering. Then, we construct a feature tensor from the latent influential users' opinion, activity, and network centrality information. Furthermore, user influence is predicted from the latent factors resulting from the temporal restrained CP decomposition. Finally, we recommend microblog users considering both user influence and content similarity. Our experimental results show that the proposed model significantly improves recommendation performance. In particular, the mean average precision of TFUC exceeds that of the baselines by at least 3.4%.


2021 ◽  
Vol 35 (1) ◽  
pp. 71-76
Author(s):  
Shaik Shabbeer ◽  
Edara Srinivasa Reddy

Artificial Intelligence (AI) now has roots in nearly every field, and healthcare is one of the markets in which AI has grown greatly in recent years. The tremendous increase in health data generation and the substantial evolution of robust data analysis tools have contributed to AI improvement in health care and research, leading to increased service efficiency. Health reporting is stored as Electronic Health Records (EHR), providing information on the patients sought temporarily. EHR data have different issues, such as heterogeneity, missing values, distortion, noise, time, etc. This study addresses visit irregularity, that is, the irregular timing of patient visits. Congestive heart failure (CHF) is a grave clinical disorder caused by an insufficient blood supply in the bloodstream owing to a heart muscle dysfunction. Most people suffer from CHF, which results in death or immediate recognition. A multi-layer perceptron (MLP) model was used to treat visit stage abnormalities. Experiments on the Medical Information Mart for Intensive Care-III (MIMIC-III) dataset indicate that the lack of a visit stage affects the estimation of the clinical outcome. It has been demonstrated that the readmission and reduction of the prediction model for mortality conditions is beneficial. Compared with baseline models, the proposed model is successful.


Information ◽  
2020 ◽  
Vol 11 (11) ◽  
pp. 512
Author(s):  
William Connor Horne ◽  
Zina Ben Miled

Improved health care services can benefit from a more seamless exchange of medical information between patients and health care providers. This exchange is especially important considering the increasing trends in mobility, comorbidity and outbreaks. However, current Electronic Health Records (EHR) tend to be institution-centric, often leaving the medical information of the patient fragmented and more importantly inaccessible to the patient for sharing with other health providers in a timely manner. Nearly a decade ago, several client–server models for personal health records (PHR) were proposed. The aim of these previous PHRs was to address data fragmentation issues. However, these models were not widely adopted by patients. This paper discusses the need for a new PHR model that can enhance the patient experience by making medical services more accessible. The aims of the proposed model are to (1) help patients maintain a complete lifelong health record, (2) facilitate timely communication and data sharing with health care providers from multiple institutions and (3) promote integration with advanced third-party services (e.g., risk prediction for chronic diseases) that require access to the patient’s health data. The proposed model is based on a Peer-to-Peer (P2P) network as opposed to the client–server architecture of the previous PHR models. This architecture consists of a central index server that manages the network and acts as a mediator, a peer client for patients and providers that allows them to manage health records and connect to the network, and a service client that enables third-party providers to offer services to the patients. This distributed architecture is essential since it promotes ownership of the health record by the patient instead of the health care institution. Moreover, it allows the patient to subscribe to an extended range of personalized e-health services.


2020 ◽  
Vol 27 (8) ◽  
pp. 1244-1251
Author(s):  
Romain Bey ◽  
Romain Goussault ◽  
François Grolleau ◽  
Mehdi Benchoufi ◽  
Raphaël Porcher

Abstract Objective We introduce fold-stratified cross-validation, a validation methodology that is compatible with privacy-preserving federated learning and that prevents data leakage caused by duplicates of electronic health records (EHRs). Materials and Methods Fold-stratified cross-validation complements cross-validation with an initial stratification of EHRs in folds containing patients with similar characteristics, thus ensuring that duplicates of a record are jointly present either in training or in validation folds. Monte Carlo simulations are performed to investigate the properties of fold-stratified cross-validation in the case of a model data analysis using both synthetic data and MIMIC-III (Medical Information Mart for Intensive Care-III) medical records. Results In situations in which duplicated EHRs could induce overoptimistic estimations of accuracy, applying fold-stratified cross-validation prevented this bias, while not requiring full deduplication. However, a pessimistic bias might appear if the covariate used for the stratification was strongly associated with the outcome. Discussion Although fold-stratified cross-validation presents low computational overhead, to be efficient it requires the preliminary identification of a covariate that is both shared by duplicated records and weakly associated with the outcome. When available, the hash of a personal identifier or a patient’s date of birth provides such a covariate. On the contrary, pseudonymization interferes with fold-stratified cross-validation, as it may break the equality of the stratifying covariate among duplicates. Conclusion Fold-stratified cross-validation is an easy-to-implement methodology that prevents data leakage when a model is trained on distributed EHRs that contain duplicates, while preserving privacy.
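A minimal sketch of the fold assignment idea follows. It assumes, as one possible reading of the abstract, that folds are assigned by hashing a covariate shared by duplicates (here a date of birth), so that all records with the same value land in the same fold. The names `fold_of` and `fold_stratified_splits` and the record fields are hypothetical and not taken from the paper.

```python
import hashlib


def fold_of(covariate, n_folds):
    """Map a covariate value to a fold index deterministically via SHA-256."""
    digest = hashlib.sha256(covariate.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def fold_stratified_splits(records, key, n_folds):
    """Yield (train_indices, validation_indices) pairs.

    Records sharing the same value of `key` always fall in the same fold,
    so duplicates never straddle the training/validation boundary.
    """
    folds = [[] for _ in range(n_folds)]
    for idx, rec in enumerate(records):
        folds[fold_of(rec[key], n_folds)].append(idx)
    for k in range(n_folds):
        validation = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, validation


# Hypothetical records; "a1-dup" and "c3-dup" are duplicates of earlier rows.
records = [
    {"id": "a1", "dob": "1950-03-14"},
    {"id": "a1-dup", "dob": "1950-03-14"},
    {"id": "b7", "dob": "1971-11-02"},
    {"id": "c3", "dob": "1988-06-30"},
    {"id": "c3-dup", "dob": "1988-06-30"},
]
splits = list(fold_stratified_splits(records, key="dob", n_folds=3))
```

Because the fold depends only on the covariate value, no deduplication step is needed. As the abstract notes, the covariate should be only weakly associated with the outcome, otherwise the accuracy estimate may become pessimistic.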


BMC Medicine ◽  
2019 ◽  
Vol 17 (1) ◽  
Author(s):  
B. D. Nicholson ◽  
P. Aveyard ◽  
C. R. Bankhead ◽  
W. Hamilton ◽  
F. D. R. Hobbs ◽  
...  

Abstract Background Excess weight and unexpected weight loss are associated with multiple disease states and increased morbidity and mortality, but weight measurement is not routine in many primary care settings. The aim of this study was to characterise who has had their weight recorded in UK primary care, how frequently, by whom and in relation to which clinical events, symptoms and diagnoses. Methods A longitudinal analysis of UK primary care electronic health records (EHR) data from 2000 to 2017. Descriptive statistics were used to summarise weight recording in terms of patient sociodemographic characteristics, health professional encounters, clinical events, symptoms and diagnoses. Negative binomial regression was used to model the likelihood of having a weight record each year, and Cox regression to model the likelihood of repeated weight recording. Results A total of 14,049,871 weight records were identified in the EHR of 4,918,746 patients during the study period, representing 26,998,591 person-years of observation. Around a third of patients had a weight record each year. Forty-nine percent of weight records were repeated within a year with an average time to a repeat weight record of 1.92 years. Weight records were most often taken by nursing staff (38–42%) and GPs (37–39%) as part of routine clinical care, such as chronic disease reviews (16%), medication reviews (6–8%) and health checks (6–7%), or were associated with consultations for contraception (5–8%), respiratory disease (5%) and obesity (1%). Patient characteristics independently associated with an increased likelihood of weight recording were as follows: female sex, younger and older adults, non-drinkers, ex-smokers, low or high BMI, being more deprived, diagnosed with a greater number of comorbidities and consulting more frequently. The effect of policy-level incentives to record weight did not appear to be sustained after they were removed.
Conclusion Weight recording is not a routine activity in UK primary care. It is recorded for around a third of patients each year and is repeated on average every 2 years for these patients. It is more common in females with higher BMI and in those with comorbidity. Incentive payments and their removal appear to be associated with increases and decreases in weight recording.


2017 ◽  
Vol 65 ◽  
pp. 105-119 ◽  
Author(s):  
Jing Zhao ◽  
Panagiotis Papapetrou ◽  
Lars Asker ◽  
Henrik Boström

Author(s):  
Yusuke Tanaka ◽  
Tomoharu Iwata ◽  
Takeshi Kurashima ◽  
Hiroyuki Toda ◽  
Naonori Ueda

Analyzing people flows is important for better navigation and location-based advertising. Since the location information of people is often aggregated for protecting privacy, it is not straightforward to estimate transition populations between locations from aggregated data. Here, aggregated data are incoming and outgoing people counts at each location; they do not contain tracking information of individuals. This paper proposes a probabilistic model for estimating unobserved transition populations between locations from only aggregated data. With the proposed model, temporal dynamics of people flows are assumed to be probabilistic diffusion processes over a network, where nodes are locations and edges are paths between locations. By maximizing the likelihood with flow conservation constraints that incorporate travel duration distributions between locations, our model can robustly estimate transition populations between locations. The statistically significant improvement of our model is demonstrated using real-world datasets of pedestrian data in exhibition halls, bike trip data and taxi trip data in New York City.

