Outlier-Robust Multi-Aspect Streaming Tensor Completion and Factorization

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/442 ◽

2019 ◽

Author(s):

Mehrnaz Najafi ◽

Lifang He ◽

Philip S. Yu

Keyword(s):

Real World ◽

Missing Values ◽

Real Data ◽

Low Rank ◽

Tensor Factorization ◽

Tensor Completion ◽

Smooth Optimization ◽

Novel Method ◽

Real World Datasets ◽

Tensor Data

With the increasing popularity of streaming tensor data such as video and audio, tensor factorization and completion have attracted much attention recently. Existing work usually assumes that streaming tensors grow in only one mode. However, in many real-world scenarios, tensors may grow in multiple modes (or dimensions), i.e., multi-aspect streaming tensors. Standard streaming methods cannot handle this type of data directly. Moreover, due to inevitable system errors, data may be contaminated by outliers, which cause significant deviations from real data values and make such research particularly challenging. In this paper, we propose a novel method for Outlier-Robust Multi-Aspect Streaming Tensor Completion and Factorization (OR-MSTC), a technique capable of dealing with missing values and outliers in multi-aspect streaming tensor data. The key idea is to decompose the tensor structure into an underlying low-rank clean tensor and a structured-sparse error (outlier) tensor, along with a weighting tensor to mask missing data. We also develop an efficient algorithm to solve the non-convex and non-smooth optimization problem of OR-MSTC. Experimental results on various real-world datasets show the superiority of the proposed method over the baselines and its robustness against outliers.

Multi-version Tensor Completion for Time-delayed Spatio-temporal Data

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/400 ◽

2021 ◽

Author(s):

Cheng Qian ◽

Nikos Kargas ◽

Cao Xiao ◽

Lucas Glass ◽

Nicholas Sidiropoulos ◽

...

Keyword(s):

Real World ◽

Mean Squared Error ◽

Ground Truth ◽

Low Rank ◽

Tensor Model ◽

Temporal Data ◽

Tensor Completion ◽

Spatio Temporal ◽

Tensor Data ◽

Over Time

Real-world spatio-temporal data is often incomplete or inaccurate due to various data loading delays. For example, a location-disease-time tensor of case counts can have multiple delayed updates of recent temporal slices for some locations or diseases. Recovering such missing or noisy (under-reported) elements of the input tensor can be viewed as a generalized tensor completion problem. Existing tensor completion methods usually assume that i) missing elements are randomly distributed and ii) noise for each tensor element is i.i.d. zero-mean. Both assumptions can be violated for spatio-temporal tensor data. We often observe multiple versions of the input tensor with different under-reporting noise levels. The amount of noise can be time- or location-dependent as more updates are progressively introduced to the tensor. We model such dynamic data as a multi-version tensor with an extra tensor mode capturing the data updates. We propose a low-rank tensor model to predict the updates over time. We demonstrate that our method can accurately predict the ground-truth values of many real-world tensors. We obtain up to 27.2% lower root mean-squared-error compared to the best baseline method. Finally, we extend our method to track the tensor data over time, leading to significant computational savings.

RON-Gauss: Enhancing Utility in Non-Interactive Private Data Release

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2019-0003 ◽

2019 ◽

Vol 2019 (1) ◽

pp. 26-46 ◽

Cited By ~ 2

Author(s):

Thee Chanyaswad ◽

Changchang Liu ◽

Prateek Mittal

Keyword(s):

Machine Learning ◽

Real World ◽

Differential Privacy ◽

Real Data ◽

The Novel ◽

Private Data ◽

Data Release ◽

Machine Learning Applications ◽

Order Of Magnitude ◽

Real World Datasets

A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ε-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
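
The two building blocks named in the abstract above, random orthonormal projection and a Gaussian generative model, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the function names `ron_project` and `fit_and_sample_gaussian` are hypothetical, the data are random placeholders, and no privacy noise is added, so the output is NOT differentially private.

```python
import numpy as np

def ron_project(X, p, seed=0):
    """Project the rows of X (n x d) onto p random orthonormal directions."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Reduced QR of a d x p Gaussian matrix yields orthonormal columns.
    Q, _ = np.linalg.qr(rng.standard_normal((d, p)))
    Xc = X - X.mean(axis=0)  # center the data before projecting
    return Xc @ Q, Q

def fit_and_sample_gaussian(Z, n_samples, seed=1):
    """Fit a Gaussian to Z and draw synthetic samples from it.

    Assumption: a differentially private variant would perturb the
    mean and covariance before sampling; that step is omitted here.
    """
    rng = np.random.default_rng(seed)
    mean = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Placeholder data: 500 points in 50 dimensions, reduced to 5.
X = np.random.default_rng(42).uniform(size=(500, 50))
Z, Q = ron_project(X, p=5)
synthetic = fit_and_sample_gaussian(Z, n_samples=200)
```

The synthetic samples live in the projected 5-dimensional space; the DFM effect is the reason a Gaussian fit is plausible there even when the original data are far from Gaussian.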

Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18168375 ◽

2021 ◽

Vol 18 (16) ◽

pp. 8375

Author(s):

Thelma Dede Baddoo ◽

Zhijia Li ◽

Samuel Nii Odai ◽

Kenneth Rodolphe Chabi Boni ◽

Isaac Kwesi Nooni ◽

...

Keyword(s):

Missing Data ◽

Multiple Imputation ◽

Real World ◽

Missing Values ◽

Total Error ◽

Extensive Study ◽

Error Measurement ◽

Missing Data Imputation ◽

Single Station ◽

Real World Datasets

Reconstructing missing streamflow data can be challenging when additional data are not available, and studies that impute missing data in real-world datasets to investigate how to ascertain the accuracy of imputation algorithms for these datasets are lacking. This study investigated the necessary complexity of missing data reconstruction schemes to obtain the relevant results for a real-world single station streamflow observation to facilitate its further use. This investigation was implemented by applying different missing data mechanisms, spanning from univariate algorithms to multiple imputation methods suited to multivariate data, taking time as an explicit variable. The performance accuracy of these schemes was assessed using the total error measurement (TEM) and a localized error measurement (LEM) recommended in this study. The results show that univariate missing value algorithms, which are specially developed to handle univariate time series, provide satisfactory results, but the ones which provide the best results are usually time- and computationally intensive. Also, multiple imputation algorithms which consider the surrounding observed values and/or which can understand the characteristics of the data provide similar results to the univariate missing data algorithms and, in some cases, perform better without the added time and computational downsides when time is taken as an explicit variable. Furthermore, the LEM would be especially useful when the missing data are in specific portions of the dataset or where very large gaps of ‘missingness’ occur. Finally, proper handling of missing values of real-world hydroclimatic datasets depends on an extensive study of the particular dataset to be imputed.

Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6056 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5956-5963

Author(s):

Xianfeng Tang ◽

Huaxiu Yao ◽

Yiwei Sun ◽

Charu Aggarwal ◽

Prasenjit Mitra ◽

...

Keyword(s):

Time Series ◽

Real World ◽

Missing Values ◽

Temporal Dynamics ◽

Multivariate Time Series ◽

Temporal Distribution ◽

Joint Modeling ◽

Adversarial Training ◽

Global Patterns ◽

Real World Datasets

Multivariate time series (MTS) forecasting is widely used in various domains, such as meteorology and traffic. Due to limitations on data collection, transmission, and storage, real-world MTS data usually contains missing values, making it infeasible to apply existing MTS forecasting models such as linear regression and recurrent neural networks. Though many efforts have been devoted to this problem, most of them rely solely on local dependencies for imputing missing values, which ignores global temporal dynamics. Local dependencies/patterns become less useful when the missing ratio is high or the data have consecutive missing values, while exploring global patterns can alleviate this problem. Thus, jointly modeling local and global temporal dynamics is very promising for MTS forecasting with missing values. However, work in this direction is rather limited. Therefore, we study a novel problem of MTS forecasting with missing values by jointly exploring local and global temporal dynamics. We propose a new framework that leverages a memory network to explore global patterns given estimations from local perspectives. We further introduce adversarial training to enhance the modeling of global temporal distribution. Experimental results on real-world datasets show the effectiveness of the proposed framework for MTS forecasting with missing values and its robustness under various missing ratios.

An Accelerated Symmetric Nonnegative Matrix Factorization Algorithm Using Extrapolation

Symmetry ◽

10.3390/sym12071187 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1187

Author(s):

Peitao Wang ◽

Zhaoshui He ◽

Jun Lu ◽

Beihai Tan ◽

YuLei Bai ◽

...

Keyword(s):

Real World ◽

Matrix Factorization ◽

Nonnegative Matrix Factorization ◽

Nonnegative Matrix ◽

Low Rank ◽

Tensor Factorization ◽

Real World Data ◽

Restart Strategy ◽

Extrapolation Scheme ◽

Symmetric Nonnegative Matrix Factorization

Symmetric nonnegative matrix factorization (SNMF) approximates a symmetric nonnegative matrix by the product of a nonnegative low-rank matrix and its transpose. SNMF has been successfully used in many real-world applications such as clustering. In this paper, we propose an accelerated variant of the multiplicative update (MU) algorithm of He et al. designed to solve the SNMF problem. The accelerated algorithm is derived by using the extrapolation scheme of Nesterov and a restart strategy. The extrapolation scheme plays a leading role in accelerating the MU algorithm of He et al., and the restart strategy ensures that the objective function of SNMF is monotonically decreasing. We apply the accelerated algorithm to clustering problems and symmetric nonnegative tensor factorization (SNTF). Experimental results on both synthetic and real-world data show that it is more than four times faster than the MU algorithm of He et al. and performs favorably compared to recent state-of-the-art algorithms.
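
To make the combination of multiplicative updates, extrapolation, and restart concrete, here is a minimal sketch. It is not the algorithm of the paper or of He et al.: it assumes a commonly used damped multiplicative rule with damping `beta = 0.5`, a fixed extrapolation weight `momentum`, and a simple restart test on the objective. All names are hypothetical.

```python
import numpy as np

def snmf_objective(A, H):
    """Squared Frobenius error of the approximation A ~ H H^T."""
    return np.linalg.norm(A - H @ H.T, "fro") ** 2

def mu_step(A, H, beta=0.5, eps=1e-12):
    """One damped multiplicative update (assumed rule); keeps H >= 0."""
    numer = A @ H
    denom = H @ (H.T @ H) + eps  # eps avoids division by zero
    return H * (1.0 - beta + beta * numer / denom)

def accelerated_snmf(A, r, iters=200, momentum=0.5, seed=0):
    rng = np.random.default_rng(seed)
    H = rng.uniform(0.1, 1.0, size=(A.shape[0], r))
    H_prev = H.copy()
    history = [snmf_objective(A, H)]
    for _ in range(iters):
        # Extrapolate along the last step, clipped to stay nonnegative.
        Y = np.maximum(H + momentum * (H - H_prev), 0.0)
        H_new = mu_step(A, Y)
        if snmf_objective(A, H_new) > history[-1]:
            # Restart: discard the extrapolation, update from H itself.
            H_new = mu_step(A, H)
            if snmf_objective(A, H_new) > history[-1]:
                H_new = H  # keep the current iterate if nothing improves
        H_prev, H = H, H_new
        history.append(snmf_objective(A, H))
    return H, history

# Synthetic symmetric nonnegative matrix of rank at most 3.
B = np.random.default_rng(1).uniform(size=(30, 3))
A = B @ B.T
H, history = accelerated_snmf(A, r=3)
```

Because a step is only accepted when it does not increase the objective, `history` is non-increasing by construction, which mirrors the role the abstract assigns to the restart strategy.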

Tensor Factorization with Total Variation and Tikhonov Regularization for Low-Rank Tensor Completion in Imaging Data

Journal of Mathematical Imaging and Vision ◽

10.1007/s10851-019-00933-9 ◽

2019 ◽

Vol 62 (6-7) ◽

pp. 900-918

Author(s):

Xue-Lei Lin ◽

Michael K. Ng ◽

Xi-Le Zhao

Keyword(s):

Total Variation ◽

Tikhonov Regularization ◽

Low Rank ◽

Tensor Factorization ◽

Tensor Completion ◽

Imaging Data ◽

Rank Tensor

Exploring Periodicity and Interactivity in Multi-Interest Framework for Sequential Recommendation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/197 ◽

2021 ◽

Author(s):

Gaode Chen ◽

Xinghua Zhang ◽

Yanyan Zhao ◽

Cong Xue ◽

Ji Xiang

Keyword(s):

Real World ◽

Information Overload ◽

State Of The Art ◽

Recommendation Systems ◽

Time Interval ◽

Interest Representation ◽

Novel Method ◽

Real World Datasets ◽

Item Representation ◽

Global And Local

Sequential recommendation systems alleviate the problem of information overload, and have attracted increasing attention in the literature. Most prior works obtain an overall representation based on the user’s behavior sequence, which cannot sufficiently reflect the multiple interests of the user. To this end, we propose a novel method called PIMI to mitigate this issue. PIMI can model the user’s multi-interest representation effectively by considering both the periodicity and interactivity in the item sequence. Specifically, we design a periodicity-aware module to utilize the time interval information between the user’s behaviors. Meanwhile, an ingenious graph is proposed to enhance the interactivity between items in the user’s behavior sequence, which can capture both global and local item features. Finally, a multi-interest extraction module is applied to describe the user’s multiple interests based on the obtained item representation. Extensive experiments on two real-world datasets, Amazon and Taobao, show that PIMI consistently outperforms state-of-the-art methods.

Convex Coupled Matrix and Tensor Completion

Neural Computation ◽

10.1162/neco_a_01123 ◽

2018 ◽

Vol 30 (11) ◽

pp. 3095-3127 ◽

Cited By ~ 3

Author(s):

Kishan Wimalawarne ◽

Makoto Yamada ◽

Hiroshi Mamitsuka

Keyword(s):

Optimal Solution ◽

Real Data ◽

Excess Risk ◽

Low Rank ◽

Trace Norm ◽

Tensor Completion ◽

The Matrix ◽

Risk Bounds ◽

Matrix Trace

We propose a set of convex low-rank inducing norms for coupled matrices and tensors (hereafter referred to as coupled tensors), in which information is shared between the matrices and tensors through common modes. More specifically, we first propose a mixture of the overlapped trace norm and the latent norms with the matrix trace norm, and then, propose a completion model regularized using these norms to impute coupled tensors. A key advantage of the proposed norms is that they are convex and can be used to find a globally optimal solution, whereas existing methods for coupled learning are nonconvex. We also analyze the excess risk bounds of the completion model regularized using our proposed norms and show that they can exploit the low-rankness of coupled tensors, leading to better bounds compared to those obtained using uncoupled norms. Through synthetic and real-data experiments, we show that the proposed completion model compares favorably with existing ones.
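
As background for the norms mentioned above, the standard overlapped trace norm of a single tensor is the sum of the matrix trace (nuclear) norms of its mode-wise unfoldings. The sketch below computes only that standard quantity; it does not implement the coupled norms proposed in the paper, and the helper names `unfold` and `overlapped_trace_norm` are assumptions.

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def overlapped_trace_norm(T):
    """Sum of nuclear norms of all mode-wise unfoldings of T."""
    return sum(np.linalg.norm(unfold(T, k), "nuc") for k in range(T.ndim))

# Rank-1 test tensor a o b o c with vector norms 3, 5, and 1.
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 4.0])
c = np.array([0.0, 1.0, 0.0, 0.0])
T = np.einsum("i,j,k->ijk", a, b, c)
value = overlapped_trace_norm(T)
```

For this rank-1 tensor every unfolding has a single nonzero singular value equal to 3 · 5 · 1 = 15, so the three unfoldings sum to 45, which gives a quick sanity check on the implementation.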

Unifying Tensor Factorization and Tensor Nuclear Norm Approaches for Low-Rank Tensor Completion

Neurocomputing ◽

10.1016/j.neucom.2021.06.020 ◽

2021 ◽

Author(s):

Shiqiang Du ◽

Qingjiang Xiao ◽

Yuqing Shi ◽

Rita Cucchiara ◽

Yide Ma

Keyword(s):

Nuclear Norm ◽

Low Rank ◽

Tensor Factorization ◽

Tensor Completion ◽

Rank Tensor ◽

Tensor Nuclear Norm

Model Selection for Non-Negative Tensor Factorization with Minimum Description Length

Entropy ◽

10.3390/e21070632 ◽

2019 ◽

Vol 21 (7) ◽

pp. 632

Author(s):

Yunhui Fu ◽

Shin Matsushima ◽

Kenji Yamanishi

Keyword(s):

Missing Values ◽

Minimum Description Length ◽

Selection Criterion ◽

Real Data ◽

Tensor Factorization ◽

Trial And Error ◽

Negative Factor ◽

Mdl Principle ◽

Normalized Maximum Likelihood ◽

Selection For

Non-negative tensor factorization (NTF) is a widely used multi-way analysis approach that factorizes a high-order non-negative data tensor into several non-negative factor matrices. In NTF, the non-negative rank has to be predetermined to specify the model and it greatly influences the factorized matrices. However, its value is conventionally determined by specialists’ insights or trial and error. This paper proposes a novel rank selection criterion for NTF on the basis of the minimum description length (MDL) principle. Our methodology is unique in that (1) we apply the MDL principle on tensor slices to overcome a problem caused by the imbalance between the number of elements in a data tensor and that in factor matrices, and (2) we employ the normalized maximum likelihood (NML) code-length for histogram densities. We employ synthetic and real data to empirically demonstrate that our method outperforms other criteria in terms of accuracies for estimating true ranks and for completing missing values. We further show that our method can produce ranks suitable for knowledge discovery.