Beyond Unfolding: Exact Recovery of Latent Convex Tensor Decomposition Under Reshuffling

2020
Vol 34 (04)
pp. 4602-4609
Author(s):  
Chao Li ◽  
Mohammad Emtiyaz Khan ◽  
Zhun Sun ◽  
Gang Niu ◽  
Bo Han ◽  
...  

Exact recovery is a desirable property of tensor decomposition (TD) methods in both unsupervised learning and scientific data analysis. The numerical defects of TD methods, however, limit their practical application to real-world data. As an alternative, convex tensor decomposition (CTD) was proposed to alleviate these problems, but its exact-recovery property has not been properly addressed so far. To this end, we focus on latent convex tensor decomposition (LCTD), a widely used CTD model in practice, and rigorously prove a sufficient condition for its exact-recovery property. Furthermore, we show that this property can also be achieved by a model more general than LCTD. In the new model, we generalize classic tensor (un-)folding into a reshuffling operation, a more flexible mapping that relocates the entries of a matrix into a tensor. Armed with reshuffling operations and the exact-recovery property, we explore a novel application of (generalized) LCTD: image steganography. Experimental results on synthetic data validate our theory, and results on image steganography show that our method outperforms state-of-the-art methods.
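
The reshuffling operation generalizes classic tensor (un-)folding into an arbitrary bijection between matrix and tensor entries. A minimal numpy sketch of this idea (not the authors' implementation; the permutation below is just an illustrative example of one reshuffling map):

```python
import numpy as np

def mode1_unfold(T):
    # Reshape-based mode-1 unfolding: (I, J, K) tensor -> (I, J*K) matrix.
    I, J, K = T.shape
    return T.reshape(I, J * K)

def reshuffle(M, perm, tensor_shape):
    # Relocate the entries of matrix M into a tensor of the given shape
    # according to a fixed bijection (permutation) of the flattened indices.
    flat = M.ravel()[perm]
    return flat.reshape(tensor_shape)

T = np.arange(24).reshape(2, 3, 4)
M = mode1_unfold(T)                   # classic unfolding: a (2, 12) matrix
perm = np.random.permutation(M.size)  # one particular reshuffling map
T_reshuffled = reshuffle(M, perm, (2, 3, 4))
```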

2021
Vol 14 (6)
pp. 1040-1052
Author(s):  
Haibo Wang ◽  
Chaoyi Ma ◽  
Olufemi O Odegbile ◽  
Shigang Chen ◽  
Jih-Kwon Peir

Measuring flow spread in real time from large, high-rate data streams has numerous practical applications, where a data stream is modeled as a sequence of data items from different flows and the spread of a flow is the number of distinct items in the flow. Past decades have witnessed tremendous performance improvement in single-flow spread estimation. However, when dealing with numerous flows in a data stream, it remains a significant challenge to measure per-flow spread accurately while reducing the memory footprint. The goal of this paper is to introduce new multi-flow spread estimation designs that incur much smaller processing and query overhead than the state of the art, yet achieve a significant improvement in estimation accuracy. We formally analyze the performance of these new designs, implement them in both hardware and software, and use real-world data traces to evaluate them against the state of the art. The experimental results show that our best sketch significantly improves over the best existing work in terms of estimation accuracy, data item processing throughput, and online query throughput.
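
For intuition only, here is a minimal per-flow spread estimator in the spirit of the virtual-bitmap designs this line of work builds on. It is not the paper's sketch: the parameters are hypothetical and cross-flow noise in the shared bit pool is not corrected.

```python
import hashlib
import math

POOL_BITS = 1 << 20          # shared bit pool (hypothetical size)
VIRTUAL_BITS = 256           # virtual bits per flow (hypothetical size)
pool = [0] * POOL_BITS

def _h(*parts):
    # Deterministic hash of string parts to a large integer.
    return int(hashlib.sha1("|".join(parts).encode()).hexdigest(), 16)

def record(flow, item):
    # The item selects one of the flow's virtual bits; that virtual bit
    # lives at a pool position derived from (flow, slot).
    slot = _h(flow, item) % VIRTUAL_BITS
    pool[_h(flow, str(slot)) % POOL_BITS] = 1

def estimate(flow):
    # Linear counting on the flow's virtual bitmap: n ~ m * ln(m / zeros).
    zeros = sum(1 for s in range(VIRTUAL_BITS)
                if pool[_h(flow, str(s)) % POOL_BITS] == 0)
    zeros = max(zeros, 1)
    return VIRTUAL_BITS * math.log(VIRTUAL_BITS / zeros)

for item in ["a", "b", "a", "c"]:
    record("flow-1", item)
print(round(estimate("flow-1")))   # roughly 3 distinct items
```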


Author(s):  
Zhi Lu ◽  
Yang Hu ◽  
Bing Zeng

Factorization models have been extensively used for recovering the missing entries of a matrix or tensor. However, directly computing all of the entries with the learned factorization model is prohibitive when the matrix or tensor is large. On the other hand, in many applications, such as collaborative filtering, we are only interested in the few entries that are the largest. In this work, we propose a sampling-based approach for finding the top entries of a tensor that is decomposed with the CANDECOMP/PARAFAC model. We develop an algorithm that samples entries with probabilities proportional to their values, and we further extend it to make the sampling proportional to the $k$-th power of the values, amplifying the focus on the top entries. We provide a theoretical analysis of the sampling algorithm and evaluate its performance on several real-world data sets. Experimental results indicate that the proposed approach is orders of magnitude faster than exhaustive computation. When applied to the special case of searching in a matrix, it also requires fewer samples than the other state-of-the-art method.
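
A hedged sketch of such value-proportional sampling, assuming a non-negative rank-R CP decomposition with factor matrices A, B, C; the $k$-th-power variant and the authors' full algorithm are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 50, 40, 30, 5
A, B, C = (rng.random((n, R)) for n in (I, J, K))   # non-negative CP factors

def sample_entry(A, B, C, rng):
    # Sample (i, j, k) with probability proportional to
    # T[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r], without forming T:
    # 1) pick a rank component proportional to its total mass,
    # 2) pick one index per mode proportional to that component's column.
    col_sums = A.sum(0) * B.sum(0) * C.sum(0)
    r = rng.choice(len(col_sums), p=col_sums / col_sums.sum())
    i = rng.choice(A.shape[0], p=A[:, r] / A[:, r].sum())
    j = rng.choice(B.shape[0], p=B[:, r] / B[:, r].sum())
    k = rng.choice(C.shape[0], p=C[:, r] / C[:, r].sum())
    return i, j, k

# Repeated draws concentrate on large entries; rank the sampled candidates.
seen = {sample_entry(A, B, C, rng) for _ in range(2000)}
top = sorted(seen, key=lambda idx: A[idx[0]] @ (B[idx[1]] * C[idx[2]]),
             reverse=True)[:10]
```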


2020
Vol 10 (3)
pp. 797
Author(s):  
Rafał Zdunek ◽  
Tomasz Sadowski

The issue of image completion has developed considerably over the last two decades, and many computational strategies have been proposed to fill in missing regions of an incomplete image. When the incomplete image contains many small, irregular missing areas, a good alternative is the family of matrix and tensor decomposition algorithms that yield low-rank approximations. However, this approach relies on heuristic rank-adaptation techniques, which are problematic especially for images with many details. To tackle these obstacles of low-rank completion methods, we propose to model an incomplete image with overlapping blocks of Tucker decomposition representations, where the factor matrices are determined by a hybrid of Gaussian radial basis function and polynomial interpolation. Experiments carried out on various image completion and resolution up-scaling problems demonstrate that our approach considerably outperforms the baseline and state-of-the-art low-rank completion methods.
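
For reference, a minimal numpy sketch of the per-block Tucker representation underlying this approach; the factor matrices below are random placeholders, whereas the paper determines them via hybrid Gaussian RBF and polynomial interpolation.

```python
import numpy as np

def tucker_reconstruct(core, factors):
    # Multiply the core tensor by one factor matrix along every mode.
    out = core
    for mode, U in enumerate(factors):
        out = np.moveaxis(np.tensordot(U, out, axes=(1, mode)), 0, mode)
    return out

# A (16, 16, 3) image block approximated with multilinear ranks (4, 4, 3).
core = np.random.randn(4, 4, 3)
factors = [np.random.randn(16, 4), np.random.randn(16, 4), np.eye(3)]
block = tucker_reconstruct(core, factors)   # shape (16, 16, 3)
```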


Author(s):  
Ian M. Anderson

B2-ordered iron aluminide intermetallic alloys exhibit a combination of attractive properties such as low density and good corrosion resistance. However, the practical applications of these alloys are limited by their poor fracture toughness and low room-temperature ductility. One current strategy for overcoming these undesirable properties is to modify the basic chemistry of the materials with alloying additions. These changes in the chemistry of the material cannot be fully understood without knowledge of the site distribution of the alloying elements. In this paper, the site distributions of a series of 3d transition metal alloying additions in B2-ordered iron aluminides are studied with ALCHEMI. A series of seven alloys of stoichiometry Fe50Al45Me5, with Me = {Ti, V, Cr, Mn, Co, Ni, Cu}, were prepared with identical heating cycles. Microalloying additions of 0.2% B and 0.1% Zr were also incorporated to strengthen the grain boundaries, but these additions have little influence on the matrix chemistry and are incidental to this study.


Author(s):  
K Sobha Rani

Collaborative filtering suffers from the problems of data sparsity and cold start, which dramatically degrade recommendation performance. To help resolve these issues, we propose TrustSVD, a trust-based matrix factorization technique. By analyzing the social trust data of four real-world data sets, we conclude that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. Hence, we build on top of SVD++, a state-of-the-art recommendation algorithm that inherently involves the explicit and implicit influence of rated items, by further incorporating both the explicit and implicit influence of trusted users on the prediction of items for an active user. To our knowledge, this is the first work to extend SVD++ with social trust information. Experimental results on the four data sets demonstrate that our approach, TrustSVD, achieves better accuracy than ten other counterparts and can better handle the data sparsity and cold-start issues.
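
A hedged sketch of a TrustSVD-style prediction rule (simplified, with illustrative parameter names rather than the authors' code): the SVD++ prediction is extended with the implicit influence of the users whom the active user trusts.

```python
import numpy as np

def predict(u, j, mu, b_user, b_item, P, Q, Y, W, rated_items, trusted_users):
    # Predicted rating of item j for user u.
    I_u = rated_items[u]          # items u has rated (implicit feedback)
    T_u = trusted_users[u]        # users u trusts (implicit social influence)
    latent = P[u].copy()
    if I_u:
        latent += Y[list(I_u)].sum(0) / np.sqrt(len(I_u))
    if T_u:
        latent += W[list(T_u)].sum(0) / np.sqrt(len(T_u))
    return mu + b_user[u] + b_item[j] + Q[j] @ latent

# Toy usage with random parameters.
n_users, n_items, k = 5, 8, 3
rng = np.random.default_rng(0)
P, Q, Y, W = (rng.normal(size=(n, k)) for n in (n_users, n_items, n_items, n_users))
score = predict(0, 2, 3.5, np.zeros(n_users), np.zeros(n_items), P, Q, Y, W,
                {0: {1, 3}}, {0: {2, 4}})
```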


2021
Vol 22 (1)
Author(s):  
João Lobo ◽  
Rui Henriques ◽  
Sara C. Madeira

Background: Three-way data have gained popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions over time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with the state of the art is paramount. These comparisons are usually performed on real data without a known ground truth, thus limiting the assessment. In this context, we propose G-Tric, a synthetic data generator that allows the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social domains, with the additional advantage of providing the ground truth (the triclustering solution) as output.

Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters.

Conclusions: Triclustering evaluation using G-Tric makes it possible to combine intrinsic and extrinsic metrics when comparing solutions, producing more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the evaluation of new triclustering approaches.
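
For illustration, a minimal sketch of planting a single constant tricluster with noise into a numeric three-way dataset and returning the ground truth; G-Tric's actual interface and its richer pattern, structure, and quality options are not reproduced here.

```python
import numpy as np

def plant_tricluster(shape, tric_shape, value=5.0, noise=0.1, seed=0):
    # Background data plus one planted tricluster; returns data and ground truth.
    rng = np.random.default_rng(seed)
    data = rng.normal(size=shape)                                # background distribution
    rows = rng.choice(shape[0], tric_shape[0], replace=False)    # observations
    cols = rng.choice(shape[1], tric_shape[1], replace=False)    # features
    ctxs = rng.choice(shape[2], tric_shape[2], replace=False)    # contexts
    data[np.ix_(rows, cols, ctxs)] = value + rng.normal(scale=noise, size=tric_shape)
    return data, (rows, cols, ctxs)

data, truth = plant_tricluster((100, 50, 10), (10, 5, 3))
```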


2021
Vol 15 (4)
pp. 1-46
Author(s):  
Kui Yu ◽  
Lin Liu ◽  
Jiuyong Li

In this article, we aim to develop a unified view of causal and non-causal feature selection methods. This unified view fills a gap in research on the relation between the two types of methods. Based on the Bayesian network framework and information theory, we first show that causal and non-causal feature selection methods share the same objective: to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We then examine the assumptions made by causal and non-causal feature selection methods when searching for the optimal feature set, and unify these assumptions by mapping them to restrictions on the structure of the Bayesian network model of the studied problem. We further analyze in detail how the structural assumptions lead to the different levels of approximation employed by the methods in their search, which in turn result in approximations of the optimal feature set in the feature sets the methods find. With the unified view, we can interpret the output of non-causal methods from a causal perspective and derive error bounds for both types of methods. Finally, we present a practical understanding of the relation between causal and non-causal methods using extensive experiments with synthetic data and various types of real-world data.
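
As a concrete illustration of the shared objective, the Markov blanket of a class node in a Bayesian network consists of its parents, its children, and its children's other parents (spouses). A small, hypothetical example:

```python
def markov_blanket(parents, target):
    # `parents` maps each node to the set of its parent nodes in the DAG.
    pa = set(parents.get(target, set()))
    children = {n for n, ps in parents.items() if target in ps}
    spouses = {p for c in children for p in parents.get(c, set())} - {target}
    return pa | children | spouses

# X1 -> C <- X2,  C -> X3 <- X4,  X5 isolated
parents = {"C": {"X1", "X2"}, "X3": {"C", "X4"}, "X5": set()}
print(markov_blanket(parents, "C"))   # {'X1', 'X2', 'X3', 'X4'}
```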


2021
pp. 1-13
Author(s):  
Qingtian Zeng ◽  
Xishi Zhao ◽  
Xiaohui Hu ◽  
Hua Duan ◽  
Zhongying Zhao ◽  
...  

Word embeddings have been successfully applied in many natural language processing tasks due to their effectiveness. However, state-of-the-art algorithms for learning word representations from large collections of text documents ignore emotional information, a significant shortcoming that must be addressed. To solve this problem, we propose an emotional word embedding (EWE) model for sentiment analysis. The method first applies pre-trained word vectors to represent document features using two different linear weighting methods. The resulting document vectors are then input to a classification model and used to train a neural-network-based text sentiment classifier. In this way, the emotional polarity of the text is propagated into the word vectors. Experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performance on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.
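
A hedged sketch of the document-representation step: pre-trained word vectors are combined by a linear weighting into a document vector, which then feeds a sentiment classifier. The weighting scheme below is illustrative, not the paper's exact formulation.

```python
import numpy as np

def doc_vector(tokens, word_vecs, weights=None):
    # Weighted average of the pre-trained vectors of the in-vocabulary tokens.
    vecs, ws = [], []
    for t in tokens:
        if t in word_vecs:
            vecs.append(word_vecs[t])
            ws.append(weights.get(t, 1.0) if weights else 1.0)  # e.g. tf-idf weight
    if not vecs:
        return np.zeros(next(iter(word_vecs.values())).shape)
    return np.average(np.vstack(vecs), axis=0, weights=ws)

# Toy pre-trained vectors and one weighted document representation.
word_vecs = {"good": np.array([0.9, 0.1]), "movie": np.array([0.2, 0.8])}
v = doc_vector(["good", "movie", "unknown"], word_vecs, weights={"good": 2.0})
```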


2021
pp. 095400832110003
Author(s):  
Ruiyi Li ◽  
Chengcheng Ding ◽  
Juan Yu ◽  
Xiaodong Wang ◽  
Pei Huang

In this article, polyimide (PI) composite films with synergistically improved thermal conductivity were prepared by adding a small amount of graphene nanoplatelets (GNP) and various contents of hexagonal boron nitride (h-BN) to the PI matrix. The thermal conductivity of the PI composite film with 1 wt% GNP and 30 wt% h-BN was 1.21 W/(m·K), higher than that of the PI composite film with 30 wt% h-BN alone (0.45 W/(m·K)); the synergistic efficiencies of GNP at h-BN contents of 10 wt%, 20 wt%, and 30 wt% were 1.70, 2.71, and 3.09, respectively. It was also found that increasing the h-BN content suppresses the increase in dielectric properties caused by GNP in the matrix: the dielectric permittivity and dielectric loss tangent of the 1 wt% GNP/PI composite film were 10.69 and 0.661 at 10³ Hz, respectively, whereas those of the 30 wt% h-BN + GNP/PI composite film were 4.29 and 0.1367. Moreover, the mechanical properties of the PI composite film were suitable for practical applications, and its heat-resistance index and residual rate at 700°C increased to 326.8°C and 74.43%, respectively, compared with 292.6°C and 59.26% for the neat PI film. Thus, this work may provide a reference for applying filler-hybridized PI films in electronic packaging materials.


2021
Vol 15 (3)
pp. 1-28
Author(s):  
Xueyan Liu ◽  
Bo Yang ◽  
Hechang Chen ◽  
Katarzyna Musial ◽  
Hongxu Chen ◽  
...  

The stochastic blockmodel (SBM) is a widely used statistical network representation model with good interpretability, expressiveness, generalization, and flexibility, and it has become prevalent and important in network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. This severely limits the application of SBMs to large-scale networks because of the high computational overhead of existing SBM models and their learning methods. Reducing the cost of SBM learning and making it scalable to large-scale networks, while maintaining the good theoretical properties of the SBM, remains an unresolved problem. In this work, we address this challenging task from the novel perspective of model redefinition. We propose a redefined SBM with Poisson distribution and a block-wise learning algorithm that can efficiently analyse large-scale networks. Extensive validation on both artificial and real-world data shows that our proposed method significantly outperforms state-of-the-art methods, achieving a reasonable trade-off between accuracy and scalability.
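
For intuition, a hedged sketch of the generative side of a Poisson SBM (edge counts drawn from a Poisson distribution with block-dependent rates); the authors' contribution, the block-wise learning algorithm, is not reproduced here.

```python
import numpy as np

def sample_poisson_sbm(n, pi, rates, seed=0):
    # Draw an undirected multigraph adjacency matrix from a Poisson SBM.
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=n, p=pi)          # block assignment per node
    lam = rates[np.ix_(z, z)]                      # pairwise Poisson rates
    A = np.triu(rng.poisson(lam), 1)               # edge counts, upper triangle
    return A + A.T, z                              # symmetric adjacency + labels

pi = np.array([0.5, 0.5])                          # block proportions
rates = np.array([[2.0, 0.1], [0.1, 2.0]])         # within- vs. between-block rates
A, z = sample_poisson_sbm(200, pi, rates)
```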

