Estimating Missing Values in Multivariate-Time-Series Clinical Data using Gradient Boosting Tree on Temporal and Cross-variable Features

Author(s):  
Xiao Xu ◽  
Junmei Wang ◽  
Xian Xu ◽  
Yuyao Sun ◽  
Quanhe Chen ◽  
...  
2020 ◽  
Vol 34 (04) ◽  
pp. 5956-5963
Author(s):  
Xianfeng Tang ◽  
Huaxiu Yao ◽  
Yiwei Sun ◽  
Charu Aggarwal ◽  
Prasenjit Mitra ◽  
...  

Multivariate time series (MTS) forecasting is widely used in various domains, such as meteorology and traffic. Due to limitations on data collection, transmission, and storage, real-world MTS data usually contain missing values, making it infeasible to apply existing MTS forecasting models such as linear regression and recurrent neural networks. Though many efforts have been devoted to this problem, most of them rely solely on local dependencies for imputing missing values and ignore global temporal dynamics. Local dependencies and patterns become less useful when the missing ratio is high or the data have consecutive missing values, whereas exploring global patterns can alleviate this problem. Thus, jointly modeling local and global temporal dynamics is very promising for MTS forecasting with missing values. However, work in this direction is rather limited. Therefore, we study a novel problem of MTS forecasting with missing values by jointly exploring local and global temporal dynamics. We propose a new framework that leverages a memory network to explore global patterns given estimations from local perspectives. We further introduce adversarial training to enhance the modeling of the global temporal distribution. Experimental results on real-world datasets show the effectiveness of the proposed framework for MTS forecasting with missing values and its robustness under various missing ratios.


2020 ◽  
Vol 34 (01) ◽  
pp. 930-937
Author(s):  
Qingxiong Tan ◽  
Mang Ye ◽  
Baoyao Yang ◽  
Siqi Liu ◽  
Andy Jinhua Ma ◽  
...  

Because diseases and symptoms differ, patients usually visit hospitals irregularly and different physiological variables are examined at each visit, producing large amounts of irregular multivariate time series (IMTS) data with missing values and varying intervals. Existing methods process IMTS into regular data so that standard machine learning models can be employed. However, time intervals are usually determined by the status of patients, while missing values are caused by changes in symptoms. Therefore, we propose a novel end-to-end Dual-Attention Time-Aware Gated Recurrent Unit (DATA-GRU) for IMTS to predict the mortality risk of patients. In particular, DATA-GRU is able to: 1) preserve the informative varying intervals by introducing a time-aware structure to directly adjust the influence of the previous status in coordination with the elapsed time, and 2) tackle missing values by proposing a novel dual-attention structure to jointly consider data quality and medical knowledge. A novel unreliability-aware attention mechanism is designed to handle the diversity in the reliability of different data, while a new symptom-aware attention mechanism is proposed to extract medical reasons from original clinical records. Extensive experimental results on two real-world datasets demonstrate that DATA-GRU can significantly outperform state-of-the-art methods and provide meaningful clinical interpretation.
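The idea of a time-aware structure, where the previous state counts for less as more time elapses, can be illustrated with a small sketch. The decay function below, 1 / log(e + elapsed), is one form used in time-aware recurrent models and is only an assumption here; the abstract does not give DATA-GRU's actual formulation, and the names `time_decay` and `decay_state` are hypothetical.

```python
import math

def time_decay(elapsed):
    """Hypothetical decay weight in (0, 1]: equals 1 at elapsed=0 and shrinks as the gap grows."""
    if elapsed < 0:
        raise ValueError("elapsed time must be non-negative")
    return 1.0 / math.log(math.e + elapsed)

def decay_state(prev_state, elapsed):
    """Scale every component of the previous hidden state by the decay weight."""
    weight = time_decay(elapsed)
    return [weight * h for h in prev_state]

state = [0.8, -0.4]
print(decay_state(state, 0.0))    # no gap between visits: state is kept as-is
print(decay_state(state, 48.0))   # long gap: state is shrunk toward zero
```

In a full recurrent cell the decayed state would replace the raw previous state before the gates are computed; that part is omitted here.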


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Zhengping Che ◽  
Sanjay Purushotham ◽  
Kyunghyun Cho ◽  
David Sontag ◽  
Yan Liu

2021 ◽  
Vol 6 (1) ◽  
pp. 1-4
Author(s):  
Bo Yuan Chang ◽  
Mohamed A. Naiel ◽  
Steven Wardell ◽  
Stan Kleinikkink ◽  
John S. Zelek

Over the past years, researchers have proposed various methods to discover causal relationships among time-series data as well as algorithms to fill in missing entries in time-series data. Little to no work has been done on combining the two strategies for the purpose of learning causal relationships from unevenly sampled multivariate time-series data. In this paper, we examine how the causal parameters learnt from unevenly sampled data (with missing entries) deviate from the parameters learnt using evenly sampled data (without missing entries). However, obtaining the causal relationship from a given time series requires evenly sampled data, which suggests filling in the missing data values before obtaining the causal parameters. Therefore, the proposed method applies a Gaussian Process Regression (GPR) model for missing data recovery, followed by several pairwise Granger causality equations in Vector Autoregressive form to fit the recovered data and obtain the causal parameters. Experimental results show that the causal parameters generated by using GPR data filling offer much lower RMSE than the dummy model (fill with last seen entry) at all missing-value percentages, suggesting that GPR data filling better preserves the causal relationships than dummy data filling and thus should be considered when dealing with unevenly sampled time-series causality learning.
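The dummy baseline mentioned above (fill with last seen entry) and the RMSE metric can be sketched in a few lines. This is a minimal illustration with made-up numbers, not the paper's code: the paper reports RMSE on the causal parameters, whereas here RMSE is computed on the filled values only to show how the metric works. The function names are hypothetical.

```python
import math

def fill_last_seen(series):
    """Replace each None with the most recent observed value (leading gaps stay None)."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def rmse(estimate, truth):
    """Root-mean-square error over positions where both values are present."""
    pairs = [(e, t) for e, t in zip(estimate, truth) if e is not None and t is not None]
    return math.sqrt(sum((e - t) ** 2 for e, t in pairs) / len(pairs))

truth = [1.0, 2.0, 3.0, 4.0, 5.0]          # evenly sampled reference series
observed = [1.0, None, None, 4.0, 5.0]     # same series with two missing entries
filled = fill_last_seen(observed)
print(filled)               # [1.0, 1.0, 1.0, 4.0, 5.0]
print(rmse(filled, truth))  # 1.0
```

Any other filling method, such as a GPR model, could be swapped in for `fill_last_seen` and scored with the same `rmse` function.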


Author(s):  
Yonghong Luo ◽  
Ying Zhang ◽  
Xiangrui Cai ◽  
Xiaojie Yuan

Missing values, which appear in most multivariate time series, prevent advanced analysis of multivariate time series data. Existing imputation approaches try to deal with missing values by deletion, statistical imputation, machine learning based imputation and generative imputation. However, these methods are either incapable of dealing with temporal information or multi-stage. This paper proposes an end-to-end generative model, E²GAN, to impute missing values in multivariate time series. With the help of the discriminative loss and the squared error loss, E²GAN can impute the incomplete time series by the nearest generated complete time series at one stage. Experiments on multiple real-world datasets show that our model outperforms the baselines on imputation accuracy and achieves state-of-the-art classification/regression results on the downstream applications. Additionally, our method achieves better time efficiency than multi-stage methods when training neural networks.


2018 ◽  
Vol 34 (2) ◽  
pp. 503-522
Author(s):  
Markus Fröhlich

Abstract. Early estimates for Austrian short term indices were produced using multivariate time-series models. The article presents a simulation study with different models (vector error correction models, vector autoregressive models in levels – both with unadjusted and seasonally adjusted time-series) used for estimating total turnover, production, etc. In a preliminary step, before time-series were provided for nowcasting, the data had to undergo an editing process. In this case a time-series approach was selected for data-editing as well, because of the very specific structure of Austrian enterprises. For this task, the seasonal adjustment program X13Arima-Seats was mainly used to identify and replace outlying observations, impute missing values, and generate univariate forecasts for every single time series.


GigaScience ◽  
2019 ◽  
Vol 8 (11) ◽  
Author(s):  
Johann de Jong ◽  
Mohammad Asif Emon ◽  
Ping Wu ◽  
Reagon Karki ◽  
Meemansa Sood ◽  
...  

Abstract. Background: Precision medicine requires a stratification of patients by disease presentation that is sufficiently informative to allow for selecting treatments on a per-patient basis. For many diseases, such as neurological disorders, this stratification problem translates into a complex problem of clustering multivariate and relatively short time series because (i) these diseases are multifactorial and not well described by single clinical outcome variables and (ii) disease progression needs to be monitored over time. Additionally, clinical data are often hindered by the presence of many missing values, further complicating any clustering attempts. Findings: The problem of clustering multivariate short time series with many missing values is generally not well addressed in the literature. In this work, we propose a deep learning–based method to address this issue, variational deep embedding with recurrence (VaDER). VaDER relies on a Gaussian mixture variational autoencoder framework, which is further extended to (i) model multivariate time series and (ii) directly deal with missing values. We validated VaDER by accurately recovering clusters from simulated and benchmark data with known ground truth clustering, while varying the degree of missingness. We then used VaDER to successfully stratify patients with Alzheimer disease and patients with Parkinson disease into subgroups characterized by clinically divergent disease progression profiles. Additional analyses demonstrated that these clinical differences reflected known underlying aspects of Alzheimer disease and Parkinson disease. Conclusions: We believe our results show that VaDER can be of great value for future efforts in patient stratification, and multivariate time-series clustering in general.


1987 ◽  
Vol 44 (2) ◽  
pp. 408-421 ◽  
Author(s):  
Roy Mendelssohn ◽  
Philippe Cury

In this paper we analyze time series of catch per unit of effort (CPUE) from 1966 to 1982 of small pelagic species off the Ivory Coast using sea surface temperature (SST) collected by merchant ships. A fill-in model is used to estimate missing values of CPUE and SST in the areas in which the fishery operates. A multivariate time series model of the fortnightly data is able to explain 43% of the observed variance in CPUE from 1966 to 1982. A model estimated by using only the data from 1966 to 1980 produced reasonable forecasts of the fortnightly CPUE for 1981–82. A new approach for estimating optimal transformations of variables in the model is used to examine the form of the relationships between CPUE and its predictors. The biological interpretation of the estimated transformations is consistent with previous results on the dynamics of zooplankton in the same area.


2008 ◽  
Vol 8 (3) ◽  
pp. 12343-12370 ◽  
Author(s):  
A. Nebot ◽  
V. Mugica ◽  
A. Escobet

Abstract. The MILAGRO project was conducted in Mexico City during March 2006 with the main objective of studying the local and global impact of pollution generated by megacities. The research presented in this paper is framed in the MILAGRO project and is focused on the study and development of modeling methodologies that allow the forecasting of daily ozone concentrations. The present work aims to develop Fuzzy Inductive Reasoning (FIR) models using the Visual-FIR platform. FIR offers a model-based approach to modeling and predicting either univariate or multivariate time series. Visual-FIR offers a user-friendly environment to perform this task. In this research, long term prediction of maximum ozone concentration in the downtown of the Mexico City Metropolitan Area is performed. The data were registered every hour and include missing values. Two modeling perspectives are analyzed, i.e. monthly and seasonal models. The results show that the developed models are able to accurately predict the diurnal variation of ozone, including its maximum daily value.


2019 ◽  
Vol 9 (15) ◽  
pp. 3041 ◽  
Author(s):  
Qianting Li ◽  
Yong Xu

Multivariate time series are often accompanied by missing values, especially in clinical time series, which usually contain more than 80% missing data, and the missing rates of different variables vary widely. However, few studies address these missing rate differences and extract univariate missing patterns simultaneously before mixing them in the model training procedure. In this paper, we propose a novel recurrent neural network called variable sensitive GRU (VS-GRU), which utilizes the missing rate of each variable as another input and learns the features of different variables separately, reducing the harmful impact of variables with high missing rates. Experiments show that VS-GRU outperforms the state-of-the-art method on two real-world clinical datasets (MIMIC-III, PhysioNet).
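The per-variable missing rate that the abstract describes as an extra model input can be computed directly from the raw records. The sketch below is an illustration under assumptions: the data are invented, missing entries are encoded as `None`, and `missing_rates` is a hypothetical helper rather than part of VS-GRU.

```python
def missing_rates(records):
    """Return the fraction of missing (None) entries for each variable (column)."""
    n_steps = len(records)
    n_vars = len(records[0])
    return [
        sum(1 for row in records if row[j] is None) / n_steps
        for j in range(n_vars)
    ]

# Rows are time steps; columns are hypothetical variables (heart rate, lactate, pH).
records = [
    [72.0, None, None],
    [75.0, 1.8,  None],
    [None, None, None],
    [80.0, None, 7.4],
]
rates = missing_rates(records)
print(rates)  # [0.25, 0.75, 0.75]
```

A model could then receive `rates` alongside the values, for example to down-weight variables that are rarely observed.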

