Beyond Unfolding: Exact Recovery of Latent Convex Tensor Decomposition Under Reshuffling

2020
Vol 34 (04)
pp. 4602-4609
Author(s):  
Chao Li ◽  
Mohammad Emtiyaz Khan ◽  
Zhun Sun ◽  
Gang Niu ◽  
Bo Han ◽  
...  

Exact recovery is a desirable property of tensor decomposition (TD) methods in both unsupervised learning and scientific data analysis. The numerical defects of TD methods, however, limit their practical application to real-world data. As an alternative, convex tensor decomposition (CTD) was proposed to alleviate these problems, but its exact-recovery property has not been properly addressed so far. To this end, we focus on latent convex tensor decomposition (LCTD), a widely used CTD model in practice, and rigorously prove a sufficient condition for its exact-recovery property. Furthermore, we show that this property can also be achieved by a model more general than LCTD. In the new model, we generalize classic tensor (un-)folding into a reshuffling operation, a more flexible mapping that relocates the entries of a matrix into a tensor. Armed with reshuffling operations and the exact-recovery property, we explore a novel application of (generalized) LCTD: image steganography. Experimental results on synthetic data validate our theory, and results on image steganography show that our method outperforms state-of-the-art methods.
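
The reshuffling operation generalizes classic tensor (un-)folding into an arbitrary bijection between matrix and tensor entries. A minimal numpy sketch of this idea (not the authors' implementation; the permutation below is just an illustrative example of one reshuffling map):

```python
import numpy as np

def mode1_unfold(T):
    # Reshape-based mode-1 unfolding: (I, J, K) tensor -> (I, J*K) matrix.
    I, J, K = T.shape
    return T.reshape(I, J * K)

def reshuffle(M, perm, tensor_shape):
    # Relocate the entries of matrix M into a tensor of the given shape
    # according to a fixed bijection (permutation) of the flattened indices.
    flat = M.ravel()[perm]
    return flat.reshape(tensor_shape)

T = np.arange(24).reshape(2, 3, 4)
M = mode1_unfold(T)                   # classic unfolding: a (2, 12) matrix
perm = np.random.permutation(M.size)  # one particular reshuffling map
T_reshuffled = reshuffle(M, perm, (2, 3, 4))
```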

2021
Vol 14 (6)
pp. 1040-1052
Author(s):  
Haibo Wang ◽  
Chaoyi Ma ◽  
Olufemi O Odegbile ◽  
Shigang Chen ◽  
Jih-Kwon Peir

Measuring flow spread in real time from large, high-rate data streams has numerous practical applications, where a data stream is modeled as a sequence of data items from different flows and the spread of a flow is the number of distinct items in the flow. Past decades have witnessed tremendous performance improvement in single-flow spread estimation. However, when dealing with numerous flows in a data stream, it remains a significant challenge to measure per-flow spread accurately while reducing the memory footprint. The goal of this paper is to introduce new multi-flow spread estimation designs that incur much smaller processing and query overhead than the state of the art, yet achieve a significant improvement in estimation accuracy. We formally analyze the performance of these new designs, implement them in both hardware and software, and use real-world data traces to evaluate them against the state of the art. The experimental results show that our best sketch significantly improves over the best existing work in terms of estimation accuracy, data item processing throughput, and online query throughput.
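
For intuition only, here is a minimal per-flow spread estimator in the spirit of the virtual-bitmap designs this line of work builds on. It is not the paper's sketch: the parameters are hypothetical and cross-flow noise in the shared bit pool is not corrected.

```python
import hashlib
import math

POOL_BITS = 1 << 20          # shared bit pool (hypothetical size)
VIRTUAL_BITS = 256           # virtual bits per flow (hypothetical size)
pool = [0] * POOL_BITS

def _h(*parts):
    # Deterministic hash of string parts to a large integer.
    return int(hashlib.sha1("|".join(parts).encode()).hexdigest(), 16)

def record(flow, item):
    # The item selects one of the flow's virtual bits; that virtual bit
    # lives at a pool position derived from (flow, slot).
    slot = _h(flow, item) % VIRTUAL_BITS
    pool[_h(flow, str(slot)) % POOL_BITS] = 1

def estimate(flow):
    # Linear counting on the flow's virtual bitmap: n ~ m * ln(m / zeros).
    zeros = sum(1 for s in range(VIRTUAL_BITS)
                if pool[_h(flow, str(s)) % POOL_BITS] == 0)
    zeros = max(zeros, 1)
    return VIRTUAL_BITS * math.log(VIRTUAL_BITS / zeros)

for item in ["a", "b", "a", "c"]:
    record("flow-1", item)
print(round(estimate("flow-1")))   # roughly 3 distinct items
```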


Author(s):  
Zhi Lu ◽  
Yang Hu ◽  
Bing Zeng

Factorization models have been extensively used for recovering the missing entries of a matrix or tensor. However, directly computing all of the entries with the learned factorization model is prohibitive when the matrix or tensor is large. On the other hand, in many applications, such as collaborative filtering, we are only interested in the few entries that are the largest. In this work, we propose a sampling-based approach for finding the top entries of a tensor that is decomposed with the CANDECOMP/PARAFAC model. We develop an algorithm that samples entries with probabilities proportional to their values, and we further extend it to make the sampling proportional to the $k$-th power of the values, amplifying the focus on the top entries. We provide a theoretical analysis of the sampling algorithm and evaluate its performance on several real-world data sets. Experimental results indicate that the proposed approach is orders of magnitude faster than exhaustive computation. When applied to the special case of searching in a matrix, it also requires fewer samples than the other state-of-the-art method.
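
A hedged sketch of such value-proportional sampling, assuming a non-negative rank-R CP decomposition with factor matrices A, B, C; the $k$-th-power variant and the authors' full algorithm are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 50, 40, 30, 5
A, B, C = (rng.random((n, R)) for n in (I, J, K))   # non-negative CP factors

def sample_entry(A, B, C, rng):
    # Sample (i, j, k) with probability proportional to
    # T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r], without forming T:
    # 1) pick a rank component proportional to its total mass,
    # 2) pick one index per mode proportional to that component's column.
    col_sums = A.sum(0) * B.sum(0) * C.sum(0)
    r = rng.choice(len(col_sums), p=col_sums / col_sums.sum())
    i = rng.choice(A.shape[0], p=A[:, r] / A[:, r].sum())
    j = rng.choice(B.shape[0], p=B[:, r] / B[:, r].sum())
    k = rng.choice(C.shape[0], p=C[:, r] / C[:, r].sum())
    return i, j, k

# Repeated draws concentrate on large entries; rank the sampled candidates.
seen = {sample_entry(A, B, C, rng) for _ in range(2000)}
top = sorted(seen, key=lambda idx: A[idx[0]] @ (B[idx[1]] * C[idx[2]]),
             reverse=True)[:10]
```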


2020
Vol 10 (3)
pp. 797
Author(s):  
Rafał Zdunek ◽  
Tomasz Sadowski

The issue of image completion has developed considerably over the last two decades, and many computational strategies have been proposed to fill in missing regions of an incomplete image. When the incomplete image contains many small, irregular missing areas, a good alternative is the family of matrix and tensor decomposition algorithms that yield low-rank approximations. However, this approach relies on heuristic rank-adaptation techniques, which are problematic especially for images with many details. To tackle these obstacles of low-rank completion methods, we propose to model an incomplete image with overlapping blocks of Tucker decomposition representations, where the factor matrices are determined by a hybrid of Gaussian radial basis function and polynomial interpolation. Experiments carried out on various image completion and resolution up-scaling problems demonstrate that our approach considerably outperforms the baseline and state-of-the-art low-rank completion methods.
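
For reference, a minimal numpy sketch of the per-block Tucker representation underlying this approach; the factor matrices below are random placeholders, whereas the paper determines them via hybrid Gaussian RBF and polynomial interpolation.

```python
import numpy as np

def tucker_reconstruct(core, factors):
    # Multiply the core tensor by one factor matrix along every mode.
    out = core
    for mode, U in enumerate(factors):
        out = np.moveaxis(np.tensordot(U, out, axes=(1, mode)), 0, mode)
    return out

# A (16, 16, 3) image block approximated with multilinear ranks (4, 4, 3).
core = np.random.randn(4, 4, 3)
factors = [np.random.randn(16, 4), np.random.randn(16, 4), np.eye(3)]
block = tucker_reconstruct(core, factors)   # shape (16, 16, 3)
```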


Author(s):  
Ian M. Anderson

B2-ordered iron aluminide intermetallic alloys exhibit a combination of attractive properties such as low density and good corrosion resistance. However, the practical applications of these alloys are limited by their poor fracture toughness and low room-temperature ductility. One current strategy for overcoming these undesirable properties is to modify the basic chemistry of the materials with alloying additions. These changes in the chemistry of the material cannot be fully understood without knowledge of the site distribution of the alloying elements. In this paper, the site distributions of a series of 3d transition metal alloying additions in B2-ordered iron aluminides are studied with ALCHEMI. A series of seven alloys of stoichiometry Fe50Al45Me5, with Me = {Ti, V, Cr, Mn, Co, Ni, Cu}, were prepared with identical heating cycles. Microalloying additions of 0.2% B and 0.1% Zr were also incorporated to strengthen the grain boundaries, but these additions have little influence on the matrix chemistry and are incidental to this study.


Author(s):  
K Sobha Rani

Collaborative filtering suffers from the problems of data sparsity and cold start, which dramatically degrade recommendation performance. To help resolve these issues, we propose TrustSVD, a trust-based matrix factorization technique. By analyzing the social trust data of four real-world data sets, we conclude that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. Hence, we build on top of SVD++, a state-of-the-art recommendation algorithm that inherently involves the explicit and implicit influence of rated items, by further incorporating both the explicit and implicit influence of trusted users on the prediction of items for an active user. To our knowledge, this is the first work to extend SVD++ with social trust information. Experimental results on the four data sets demonstrate that our approach, TrustSVD, achieves better accuracy than ten other counterparts and can better handle the data sparsity and cold-start issues.
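
A hedged sketch of a TrustSVD-style prediction rule (simplified, with illustrative parameter names rather than the authors' code): the SVD++ prediction is extended with the implicit influence of the users whom the active user trusts.

```python
import numpy as np

def predict(u, j, mu, b_user, b_item, P, Q, Y, W, rated_items, trusted_users):
    # Predicted rating of item j for user u.
    I_u = rated_items[u]          # items u has rated (implicit feedback)
    T_u = trusted_users[u]        # users u trusts (implicit social influence)
    latent = P[u].copy()
    if I_u:
        latent += Y[list(I_u)].sum(0) / np.sqrt(len(I_u))
    if T_u:
        latent += W[list(T_u)].sum(0) / np.sqrt(len(T_u))
    return mu + b_user[u] + b_item[j] + Q[j] @ latent

# Toy usage with random parameters.
n_users, n_items, k = 5, 8, 3
rng = np.random.default_rng(0)
P, Q, Y, W = (rng.normal(size=(n, k)) for n in (n_users, n_items, n_items, n_users))
score = predict(0, 2, 3.5, np.zeros(n_users), np.zeros(n_items), P, Q, Y, W,
                {0: {1, 3}}, {0: {2, 4}})
```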


2021
Vol 22 (1)
Author(s):  
João Lobo ◽  
Rui Henriques ◽  
Sara C. Madeira

Background: Three-way data have gained popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions over time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with the state of the art is paramount. These comparisons are usually performed on real data without a known ground truth, thus limiting the assessment. In this context, we propose G-Tric, a synthetic data generator that allows the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social domains, with the additional advantage of providing the ground truth (the triclustering solution) as output.

Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters.

Conclusions: Triclustering evaluation using G-Tric makes it possible to combine intrinsic and extrinsic metrics when comparing solutions, producing more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the evaluation of new triclustering approaches.
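
For illustration, a minimal sketch of planting a single constant tricluster with noise into a numeric three-way dataset and returning the ground truth; G-Tric's actual interface and its richer pattern, structure, and quality options are not reproduced here.

```python
import numpy as np

def plant_tricluster(shape, tric_shape, value=5.0, noise=0.1, seed=0):
    # Background data plus one planted tricluster; returns data and ground truth.
    rng = np.random.default_rng(seed)
    data = rng.normal(size=shape)                                # background distribution
    rows = rng.choice(shape[0], tric_shape[0], replace=False)    # observations
    cols = rng.choice(shape[1], tric_shape[1], replace=False)    # features
    ctxs = rng.choice(shape[2], tric_shape[2], replace=False)    # contexts
    data[np.ix_(rows, cols, ctxs)] = value + rng.normal(scale=noise, size=tric_shape)
    return data, (rows, cols, ctxs)

data, truth = plant_tricluster((100, 50, 10), (10, 5, 3))
```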


2021
Vol 15 (4)
pp. 1-46
Author(s):  
Kui Yu ◽  
Lin Liu ◽  
Jiuyong Li

In this article, we aim to develop a unified view of causal and non-causal feature selection methods. This unified view fills a gap in research on the relation between the two types of methods. Based on the Bayesian network framework and information theory, we first show that causal and non-causal feature selection methods share the same objective: to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We then examine the assumptions made by causal and non-causal feature selection methods when searching for the optimal feature set, and unify these assumptions by mapping them to restrictions on the structure of the Bayesian network model of the studied problem. We further analyze in detail how the structural assumptions lead to the different levels of approximation employed by the methods in their search, which in turn result in approximations of the optimal feature set in the feature sets the methods find. With the unified view, we can interpret the output of non-causal methods from a causal perspective and derive error bounds for both types of methods. Finally, we present a practical understanding of the relation between causal and non-causal methods using extensive experiments with synthetic data and various types of real-world data.
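
As a concrete illustration of the shared objective, the Markov blanket of a class node in a Bayesian network consists of its parents, its children, and its children's other parents (spouses). A small, hypothetical example:

```python
def markov_blanket(parents, target):
    # `parents` maps each node to the set of its parent nodes in the DAG.
    pa = set(parents.get(target, set()))
    children = {n for n, ps in parents.items() if target in ps}
    spouses = {p for c in children for p in parents.get(c, set())} - {target}
    return pa | children | spouses

# X1 -> C <- X2,  C -> X3 <- X4,  X5 isolated
parents = {"C": {"X1", "X2"}, "X3": {"C", "X4"}, "X5": set()}
print(markov_blanket(parents, "C"))   # {'X1', 'X2', 'X3', 'X4'}
```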


2021
pp. 1-13
Author(s):  
Qingtian Zeng ◽  
Xishi Zhao ◽  
Xiaohui Hu ◽  
Hua Duan ◽  
Zhongying Zhao ◽  
...  

Word embeddings have been successfully applied in many natural language processing tasks due to their effectiveness. However, state-of-the-art algorithms for learning word representations from large collections of text documents ignore emotional information, a significant shortcoming that must be addressed. To solve this problem, we propose an emotional word embedding (EWE) model for sentiment analysis. The method first applies pre-trained word vectors to represent document features using two different linear weighting methods. The resulting document vectors are then input to a classification model and used to train a neural-network-based text sentiment classifier. In this way, the emotional polarity of the text is propagated into the word vectors. Experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performance on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.
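
A hedged sketch of the document-representation step: pre-trained word vectors are combined by a linear weighting into a document vector, which then feeds a sentiment classifier. The weighting scheme below is illustrative, not the paper's exact formulation.

```python
import numpy as np

def doc_vector(tokens, word_vecs, weights=None):
    # Weighted average of the pre-trained vectors of the in-vocabulary tokens.
    vecs, ws = [], []
    for t in tokens:
        if t in word_vecs:
            vecs.append(word_vecs[t])
            ws.append(weights.get(t, 1.0) if weights else 1.0)  # e.g. tf-idf weight
    if not vecs:
        return np.zeros(next(iter(word_vecs.values())).shape)
    return np.average(np.vstack(vecs), axis=0, weights=ws)

# Toy pre-trained vectors and one weighted document representation.
word_vecs = {"good": np.array([0.9, 0.1]), "movie": np.array([0.2, 0.8])}
v = doc_vector(["good", "movie", "unknown"], word_vecs, weights={"good": 2.0})
```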


2021
pp. 095400832110003
Author(s):  
Ruiyi Li ◽  
Chengcheng Ding ◽  
Juan Yu ◽  
Xiaodong Wang ◽  
Pei Huang

In this article, polyimide (PI) composite films with synergistically improved thermal conductivity were prepared by adding a small amount of graphene nanoplatelets (GNP) and various contents of hexagonal boron nitride (h-BN) to the PI matrix. The thermal conductivity of the PI composite film with 1 wt% GNP and 30 wt% h-BN was 1.21 W/(m·K), higher than that of the PI composite film with 30 wt% h-BN alone (0.45 W/(m·K)); the synergistic efficiencies of GNP at h-BN contents of 10 wt%, 20 wt%, and 30 wt% were 1.70, 2.71, and 3.09, respectively. It was also found that increasing the h-BN content suppresses the increase in dielectric properties caused by GNP in the matrix: the dielectric permittivity and dielectric loss tangent of the 1 wt% GNP/PI composite film were 10.69 and 0.661 at 10³ Hz, respectively, whereas those of the 30 wt% h-BN + GNP/PI composite film were 4.29 and 0.1367. Moreover, the mechanical properties of the PI composite film were suitable for practical applications, and its heat-resistance index and residual rate at 700°C increased to 326.8°C and 74.43%, respectively, compared with 292.6°C and 59.26% for the neat PI film. Thus, this work may provide a reference for applying filler-hybridized PI films in electronic packaging materials.


2021
Vol 15 (3)
pp. 1-28
Author(s):  
Xueyan Liu ◽  
Bo Yang ◽  
Hechang Chen ◽  
Katarzyna Musial ◽  
Hongxu Chen ◽  
...  

The stochastic blockmodel (SBM) is a widely used statistical network representation model with good interpretability, expressiveness, generalization, and flexibility, and it has become prevalent and important in network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. This severely limits the application of SBMs to large-scale networks because of the high computational overhead of existing SBM models and their learning methods. Reducing the cost of SBM learning and making it scalable to large-scale networks, while maintaining the good theoretical properties of the SBM, remains an unresolved problem. In this work, we address this challenging task from the novel perspective of model redefinition. We propose a redefined SBM with Poisson distribution and a block-wise learning algorithm that can efficiently analyse large-scale networks. Extensive validation on both artificial and real-world data shows that our proposed method significantly outperforms state-of-the-art methods, achieving a reasonable trade-off between accuracy and scalability.
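
For intuition, a hedged sketch of the generative side of a Poisson SBM (edge counts drawn from a Poisson distribution with block-dependent rates); the authors' contribution, the block-wise learning algorithm, is not reproduced here.

```python
import numpy as np

def sample_poisson_sbm(n, pi, rates, seed=0):
    # Draw an undirected multigraph adjacency matrix from a Poisson SBM.
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=n, p=pi)          # block assignment per node
    lam = rates[np.ix_(z, z)]                      # pairwise Poisson rates
    A = np.triu(rng.poisson(lam), 1)               # edge counts, upper triangle
    return A + A.T, z                              # symmetric adjacency + labels

pi = np.array([0.5, 0.5])                          # block proportions
rates = np.array([[2.0, 0.1], [0.1, 2.0]])         # within- vs. between-block rates
A, z = sample_poisson_sbm(200, pi, rates)
```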

