scholarly journals Outlier-Robust Multi-Aspect Streaming Tensor Completion and Factorization

Author(s):  
Mehrnaz Najafi ◽  
Lifang He ◽  
Philip S. Yu

With the increasing popularity of streaming tensor data such as videos and audios, tensor factorization and completion have attracted much attention recently in this area. Existing work usually assume that streaming tensors only grow in one mode. However, in many real-world scenarios, tensors may grow in multiple modes (or dimensions), i.e., multi-aspect streaming tensors. Standard streaming methods cannot directly handle this type of data elegantly. Moreover, due to inevitable system errors, data may be contaminated by outliers, which cause significant deviations from real data values and make such research particularly challenging. In this paper, we propose a novel method for Outlier-Robust Multi-Aspect Streaming Tensor Completion and Factorization (OR-MSTC), which is a technique capable of dealing with missing values and outliers in multi-aspect streaming tensor data. The key idea is to decompose the tensor structure into an underlying low-rank clean tensor and a structured-sparse error (outlier) tensor, along with a weighting tensor to mask missing data. We also develop an efficient algorithm to solve the non-convex and non-smooth optimization problem of OR-MSTC. Experimental results on various real-world datasets show the superiority of the proposed method over the baselines and its robustness against outliers.


Author(s):  
Cheng Qian ◽  
Nikos Kargas ◽  
Cao Xiao ◽  
Lucas Glass ◽  
Nicholas Sidiropoulos ◽  
...  

Real-world spatio-temporal data is often incomplete or inaccurate due to various data loading delays. For example, a location-disease-time tensor of case counts can have multiple delayed updates of recent temporal slices for some locations or diseases. Recovering such missing or noisy (under-reported) elements of the input tensor can be viewed as a generalized tensor completion problem. Existing tensor completion methods usually assume that i) missing elements are randomly distributed and ii) noise for each tensor element is i.i.d. zero-mean. Both assumptions can be violated for spatio-temporal tensor data. We often observe multiple versions of the input tensor with different under-reporting noise levels. The amount of noise can be time- or location-dependent as more updates are progressively introduced to the tensor. We model such dynamic data as a multi-version tensor with an extra tensor mode capturing the data updates. We propose a low-rank tensor model to predict the updates over time. We demonstrate that our method can accurately predict the ground-truth values of many real-world tensors. We obtain up to 27.2% lower root mean-squared-error compared to the best baseline method. Finally, we extend our method to track the tensor data over time, leading to significant computational savings.



2019 ◽  
Vol 2019 (1) ◽  
pp. 26-46 ◽  
Author(s):  
Thee Chanyaswad ◽  
Changchang Liu ◽  
Prateek Mittal

Abstract A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.



Author(s):  
Thelma Dede Baddoo ◽  
Zhijia Li ◽  
Samuel Nii Odai ◽  
Kenneth Rodolphe Chabi Boni ◽  
Isaac Kwesi Nooni ◽  
...  

Reconstructing missing streamflow data can be challenging when additional data are not available, and missing data imputation of real-world datasets to investigate how to ascertain the accuracy of imputation algorithms for these datasets are lacking. This study investigated the necessary complexity of missing data reconstruction schemes to obtain the relevant results for a real-world single station streamflow observation to facilitate its further use. This investigation was implemented by applying different missing data mechanisms spanning from univariate algorithms to multiple imputation methods accustomed to multivariate data taking time as an explicit variable. The performance accuracy of these schemes was assessed using the total error measurement (TEM) and a recommended localized error measurement (LEM) in this study. The results show that univariate missing value algorithms, which are specially developed to handle univariate time series, provide satisfactory results, but the ones which provide the best results are usually time and computationally intensive. Also, multiple imputation algorithms which consider the surrounding observed values and/or which can understand the characteristics of the data provide similar results to the univariate missing data algorithms and, in some cases, perform better without the added time and computational downsides when time is taken as an explicit variable. Furthermore, the LEM would be especially useful when the missing data are in specific portions of the dataset or where very large gaps of ‘missingness’ occur. Finally, proper handling of missing values of real-world hydroclimatic datasets depends on imputing and extensive study of the particular dataset to be imputed.



2020 ◽  
Vol 34 (04) ◽  
pp. 5956-5963
Author(s):  
Xianfeng Tang ◽  
Huaxiu Yao ◽  
Yiwei Sun ◽  
Charu Aggarwal ◽  
Prasenjit Mitra ◽  
...  

Multivariate time series (MTS) forecasting is widely used in various domains, such as meteorology and traffic. Due to limitations on data collection, transmission, and storage, real-world MTS data usually contains missing values, making it infeasible to apply existing MTS forecasting models such as linear regression and recurrent neural networks. Though many efforts have been devoted to this problem, most of them solely rely on local dependencies for imputing missing values, which ignores global temporal dynamics. Local dependencies/patterns would become less useful when the missing ratio is high, or the data have consecutive missing values; while exploring global patterns can alleviate such problem. Thus, jointly modeling local and global temporal dynamics is very promising for MTS forecasting with missing values. However, work in this direction is rather limited. Therefore, we study a novel problem of MTS forecasting with missing values by jointly exploring local and global temporal dynamics. We propose a new framework øurs, which leverages memory network to explore global patterns given estimations from local perspectives. We further introduce adversarial training to enhance the modeling of global temporal distribution. Experimental results on real-world datasets show the effectiveness of øurs for MTS forecasting with missing values and its robustness under various missing ratios.



Symmetry ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 1187
Author(s):  
Peitao Wang ◽  
Zhaoshui He ◽  
Jun Lu ◽  
Beihai Tan ◽  
YuLei Bai ◽  
...  

Symmetric nonnegative matrix factorization (SNMF) approximates a symmetric nonnegative matrix by the product of a nonnegative low-rank matrix and its transpose. SNMF has been successfully used in many real-world applications such as clustering. In this paper, we propose an accelerated variant of the multiplicative update (MU) algorithm of He et al. designed to solve the SNMF problem. The accelerated algorithm is derived by using the extrapolation scheme of Nesterov and a restart strategy. The extrapolation scheme plays a leading role in accelerating the MU algorithm of He et al. and the restart strategy ensures that the objective function of SNMF is monotonically decreasing. We apply the accelerated algorithm to clustering problems and symmetric nonnegative tensor factorization (SNTF). The experiment results on both synthetic and real-world data show that it is more than four times faster than the MU algorithm of He et al. and performs favorably compared to recent state-of-the-art algorithms.



Author(s):  
Gaode Chen ◽  
Xinghua Zhang ◽  
Yanyan Zhao ◽  
Cong Xue ◽  
Ji Xiang

Sequential recommendation systems alleviate the problem of information overload, and have attracted increasing attention in the literature. Most prior works usually obtain an overall representation based on the user’s behavior sequence, which can not sufficiently reflect the multiple interests of the user. To this end, we propose a novel method called PIMI to mitigate this issue. PIMI can model the user’s multi-interest representation effectively by considering both the periodicity and interactivity in the item sequence. Specifically, we design a periodicity-aware module to utilize the time interval information between user’s behaviors. Meanwhile, an ingenious graph is proposed to enhance the interactivity between items in user’s behavior sequence, which can capture both global and local item features. Finally, a multi-interest extraction module is applied to describe user’s multiple interests based on the obtained item representation. Extensive experiments on two real-world datasets Amazon and Taobao show that PIMI outperforms state-of-the-art methods consistently.



2018 ◽  
Vol 30 (11) ◽  
pp. 3095-3127 ◽  
Author(s):  
Kishan Wimalawarne ◽  
Makoto Yamada ◽  
Hiroshi Mamitsuka

We propose a set of convex low-rank inducing norms for coupled matrices and tensors (hereafter referred to as coupled tensors), in which information is shared between the matrices and tensors through common modes. More specifically, we first propose a mixture of the overlapped trace norm and the latent norms with the matrix trace norm, and then, propose a completion model regularized using these norms to impute coupled tensors. A key advantage of the proposed norms is that they are convex and can be used to find a globally optimal solution, whereas existing methods for coupled learning are nonconvex. We also analyze the excess risk bounds of the completion model regularized using our proposed norms and show that they can exploit the low-rankness of coupled tensors, leading to better bounds compared to those obtained using uncoupled norms. Through synthetic and real-data experiments, we show that the proposed completion model compares favorably with existing ones.



2021 ◽  
Author(s):  
Shiqiang Du ◽  
Qingjiang Xiao ◽  
Yuqing Shi ◽  
Rita Cucchiara ◽  
Yide Ma


Entropy ◽  
2019 ◽  
Vol 21 (7) ◽  
pp. 632
Author(s):  
Yunhui Fu ◽  
Shin Matsushima ◽  
Kenji Yamanishi

Non-negative tensor factorization (NTF) is a widely used multi-way analysis approach that factorizes a high-order non-negative data tensor into several non-negative factor matrices. In NTF, the non-negative rank has to be predetermined to specify the model and it greatly influences the factorized matrices. However, its value is conventionally determined by specialists’ insights or trial and error. This paper proposes a novel rank selection criterion for NTF on the basis of the minimum description length (MDL) principle. Our methodology is unique in that (1) we apply the MDL principle on tensor slices to overcome a problem caused by the imbalance between the number of elements in a data tensor and that in factor matrices, and (2) we employ the normalized maximum likelihood (NML) code-length for histogram densities. We employ synthetic and real data to empirically demonstrate that our method outperforms other criteria in terms of accuracies for estimating true ranks and for completing missing values. We further show that our method can produce ranks suitable for knowledge discovery.



Sign in / Sign up

Export Citation Format

Share Document