Deep Interest-Shifting Network with Meta-Embeddings for Fresh Item Recommendation

Nowadays, people have an increasing interest in fresh products such as new shoes and cosmetics. To this end, an E-commerce platform Taobao launched a fresh-item hub page on the recommender system, with which customers can freely and exclusively explore and purchase fresh items, namely, the New Tendency page. In this work, we make a first attempt to tackle the fresh-item recommendation task with two major challenges. First, a fresh-item recommendation scenario usually faces the challenge that the training data are highly deficient due to low page views. In this paper, we propose a deep interest-shifting network (DisNet), which transfers knowledge from a huge number of auxiliary data and then shifts user interests with contextual information. Furthermore, three interpretable interest-shifting operators are introduced. Second, since the items are fresh, many of them have never been exposed to users, leading to a severe cold-start problem. Though this problem can be alleviated by knowledge transfer, we further babysit these fully cold-start items by a relational meta-Id-embedding generator (RM-IdEG). Specifically, it trains the item id embeddings in a learning-to-learn manner and integrates relational information for better embedding performance. We conducted comprehensive experiments on both synthetic datasets as well as a real-world dataset. Both DisNet and RM-IdEG significantly outperform state-of-the-art approaches, respectively. Empirical results clearly verify the effectiveness of the proposed techniques, which are arguably promising and scalable in real-world applications.

Download Full-text

Internal and Contextual Attention Network for Cold-start Multi-channel Matching in Recommendation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/379 ◽

2020 ◽

Author(s):

Ruobing Xie ◽

Zhijie Qiu ◽

Jun Rao ◽

Yi Liu ◽

Bo Zhang ◽

...

Keyword(s):

Real World ◽

Large Scale ◽

Recommendation System ◽

Contextual Information ◽

Recommendation Systems ◽

Cold Start ◽

Personalized Recommendation ◽

Small Subset ◽

Multiple Channels ◽

Attention Network

Real-world integrated personalized recommendation systems usually deal with millions of heterogeneous items. It is extremely challenging to conduct full corpus retrieval with complicated models due to the tremendous computation costs. Hence, most large-scale recommendation systems consist of two modules: a multi-channel matching module to efficiently retrieve a small subset of candidates, and a ranking module for precise personalized recommendation. However, multi-channel matching usually suffers from cold-start problems when adding new channels or new data sources. To solve this issue, we propose a novel Internal and contextual attention network (ICAN), which highlights channel-specific contextual information and feature field interactions between multiple channels. In experiments, we conduct both offline and online evaluations with case studies on a real-world integrated recommendation system. The significant improvements confirm the effectiveness and robustness of ICAN, especially for cold-start channels. Currently, ICAN has been deployed on WeChat Top Stories used by millions of users. The source code can be obtained from https://github.com/zhijieqiu/ICAN.

Download Full-text

Multi-Scale Selective Feedback Network with Dual Loss for Real Image Denoising

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/101 ◽

2021 ◽

Author(s):

Xiaowan Hu ◽

Yuanhao Cai ◽

Zhihong Liu ◽

Haoqian Wang ◽

Yulun Zhang

Keyword(s):

Image Denoising ◽

Real World ◽

Contextual Information ◽

Noise Removal ◽

Feedback Mechanism ◽

Iterative Refinement ◽

Training Data ◽

Multi Scale ◽

Feedback Network ◽

Deep Layers

The feedback mechanism in the human visual system extracts high-level semantics from noisy scenes. It then guides low-level noise removal, which has not been fully explored in image denoising networks based on deep learning. The commonly used fully-supervised network optimizes parameters through paired training data. However, unpaired images without noise-free labels are ubiquitous in the real world. Therefore, we proposed a multi-scale selective feedback network (MSFN) with the dual loss. We allow shallow layers to access valuable contextual information from the following deep layers selectively between two adjacent time steps. Iterative refinement mechanism can remove complex noise from coarse to fine. The dual regression is designed to reconstruct noisy images to establish closed-loop supervision that is training-friendly for unpaired data. We use the dual loss to optimize the primary clean-to-noisy task and the dual noisy-to-clean task simultaneously. Extensive experiments prove that our method achieves state-of-the-art results and shows better adaptability on real-world images than the existing methods.

Download Full-text

Information-Theoretic Generalization Bounds for Meta-Learning and Applications

Entropy ◽

10.3390/e23010126 ◽

2021 ◽

Vol 23 (1) ◽

pp. 126

Author(s):

Sharu Theresa Jose ◽

Osvaldo Simeone

Keyword(s):

Learning Algorithm ◽

Broad Class ◽

Performance Measure ◽

Training Data ◽

Learning To Learn ◽

Data Set ◽

Information Theoretic ◽

Meta Learning ◽

Task Training ◽

Test Sets

Meta-learning, or “learning to learn”, refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that use either separate within-task training and test sets, like model agnostic meta-learning (MAML), or joint within-task training and test sets, like reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and corresponding data set to capture within-task uncertainty. Tighter bounds are then developed for the two classes via novel individual task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning.

Download Full-text

Knowledge Transfer with Weighted Adversarial Network for Cold-Start Store Site Recommendation

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3442203 ◽

2021 ◽

Vol 15 (3) ◽

pp. 1-27

Author(s):

Yan Liu ◽

Bin Guo ◽

Daqing Zhang ◽

Djamal Zeghlache ◽

Jingmin Chen ◽

...

Keyword(s):

Large Scale ◽

Cold Start ◽

Weighting Scheme ◽

Training Data ◽

Chain Store ◽

Useful Knowledge ◽

Adversarial Network ◽

Brick And Mortar ◽

New City ◽

Learning Machine

Store site recommendation aims to predict the value of the store at candidate locations and then recommend the optimal location to the company for placing a new brick-and-mortar store. Most existing studies focus on learning machine learning or deep learning models based on large-scale training data of existing chain stores in the same city. However, the expansion of chain enterprises in new cities suffers from data scarcity issues, and these models do not work in the new city where no chain store has been placed (i.e., cold-start problem). In this article, we propose a unified approach for cold-start store site recommendation, Weighted Adversarial Network with Transferability weighting scheme (WANT), to transfer knowledge learned from a data-rich source city to a target city with no labeled data. In particular, to promote positive transfer, we develop a discriminator to diminish distribution discrepancy between source city and target city with different data distributions, which plays the minimax game with the feature extractor to learn transferable representations across cities by adversarial learning. In addition, to further reduce the risk of negative transfer, we design a transferability weighting scheme to quantify the transferability of examples in source city and reweight the contribution of relevant source examples to transfer useful knowledge. We validate WANT using a real-world dataset, and experimental results demonstrate the effectiveness of our proposed model over several state-of-the-art baseline models.

Download Full-text

Structured (De)composable Representations Trained with Neural Networks

Computers ◽

10.3390/computers9040079 ◽

2020 ◽

Vol 9 (4) ◽

pp. 79

Author(s):

Graham Spinks ◽

Marie-Francine Moens

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Contextual Information ◽

Learning To Learn ◽

Class Label ◽

Impact Performance ◽

Novel Technique ◽

Language Data ◽

Concept Classes ◽

Generic Representation

This paper proposes a novel technique for representing templates and instances of concept classes. A template representation refers to the generic representation that captures the characteristics of an entire class. The proposed technique uses end-to-end deep learning to learn structured and composable representations from input images and discrete labels. The obtained representations are based on distance estimates between the distributions given by the class label and those given by contextual information, which are modeled as environments. We prove that the representations have a clear structure allowing decomposing the representation into factors that represent classes and environments. We evaluate our novel technique on classification and retrieval tasks involving different modalities (visual and language data). In various experiments, we show how the representations can be compressed and how different hyperparameters impact performance.

Download Full-text

Improving recommender systems’ performance on cold-start users and controversial items by a new similarity model

International Journal of Web Information Systems ◽

10.1108/ijwis-07-2015-0024 ◽

2016 ◽

Vol 12 (2) ◽

pp. 126-149 ◽

Cited By ~ 4

Author(s):

Masoud Mansoury ◽

Mehdi Shajari

Keyword(s):

Real World ◽

Design Methodology ◽

Cold Start ◽

Selection Function ◽

Data Sets ◽

Real World Data ◽

Content Type ◽

User Similarity ◽

Active User ◽

Similarity Model

Purpose This paper aims to improve the recommendations performance for cold-start users and controversial items. Collaborative filtering (CF) generates recommendations on the basis of similarity between users. It uses the opinions of similar users to generate the recommendation for an active user. As a similarity model or a neighbor selection function is the key element for effectiveness of CF, many variations of CF are proposed. However, these methods are not very effective, especially for users who provide few ratings (i.e. cold-start users). Design/methodology/approach A new user similarity model is proposed that focuses on improving recommendations performance for cold-start users and controversial items. To show the validity of the authors’ similarity model, they conducted some experiments and showed the effectiveness of this model in calculating similarity values between users even when only few ratings are available. In addition, the authors applied their user similarity model to a recommender system and analyzed its results. Findings Experiments on two real-world data sets are implemented and compared with some other CF techniques. The results show that the authors’ approach outperforms previous CF techniques in coverage metric while preserves accuracy for cold-start users and controversial items. Originality/value In the proposed approach, the conditions in which CF is unable to generate accurate recommendations are addressed. These conditions affect CF performance adversely, especially in the cold-start users’ condition. The authors show that their similarity model overcomes CF weaknesses effectively and improve its performance even in the cold users’ condition.

Download Full-text

Attention Enhanced Serial Unet++ Network for Removing Unevenly Distributed Haze

Electronics ◽

10.3390/electronics10222868 ◽

2021 ◽

Vol 10 (22) ◽

pp. 2868

Author(s):

Wenxuan Zhao ◽

Yaqin Zhao ◽

Liqi Feng ◽

Jiaxi Tang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Real World ◽

Large Scale ◽

Learning Strategy ◽

Contextual Information ◽

Small Scale ◽

Image Dehazing ◽

Atmospheric Scattering ◽

Real World Datasets

The purpose of image dehazing is the reduction of the image degradation caused by suspended particles for supporting high-level visual tasks. Besides the atmospheric scattering model, convolutional neural network (CNN) has been used for image dehazing. However, the existing image dehazing algorithms are limited in face of unevenly distributed haze and dense haze in real-world scenes. In this paper, we propose a novel end-to-end convolutional neural network called attention enhanced serial Unet++ dehazing network (AESUnet) for single image dehazing. We attempt to build a serial Unet++ structure that adopts a serial strategy of two pruned Unet++ blocks based on residual connection. Compared with the simple Encoder–Decoder structure, the serial Unet++ module can better use the features extracted by encoders and promote contextual information fusion in different resolutions. In addition, we take some improvement measures to the Unet++ module, such as pruning, introducing the convolutional module with ResNet structure, and a residual learning strategy. Thus, the serial Unet++ module can generate more realistic images with less color distortion. Furthermore, following the serial Unet++ blocks, an attention mechanism is introduced to pay different attention to haze regions with different concentrations by learning weights in the spatial domain and channel domain. Experiments are conducted on two representative datasets: the large-scale synthetic dataset RESIDE and the small-scale real-world datasets I-HAZY and O-HAZY. The experimental results show that the proposed dehazing network is not only comparable to state-of-the-art methods for the RESIDE synthetic datasets, but also surpasses them by a very large margin for the I-HAZY and O-HAZY real-world dataset.

Download Full-text

Holdout-Based Empirical Assessment of Mixed-Type Synthetic Data

Frontiers in Big Data ◽

10.3389/fdata.2021.679939 ◽

2021 ◽

Vol 4 ◽

Author(s):

Michael Platzer ◽

Thomas Reutterer

Keyword(s):

Mixed Type ◽

Synthetic Data ◽

Training Data ◽

Privacy Risk ◽

Individual Level ◽

Empirical Assessment ◽

Model Free ◽

Private Data ◽

Synthetic Datasets ◽

The Individual

AI-based data synthesis has seen rapid progress over the last several years and is increasingly recognized for its promise to enable privacy-respecting high-fidelity data sharing. This is reflected by the growing availability of both commercial and open-sourced software solutions for synthesizing private data. However, despite these recent advances, adequately evaluating the quality of generated synthetic datasets is still an open challenge. We aim to close this gap and introduce a novel holdout-based empirical assessment framework for quantifying the fidelity as well as the privacy risk of synthetic data solutions for mixed-type tabular data. Measuring fidelity is based on statistical distances of lower-dimensional marginal distributions, which provide a model-free and easy-to-communicate empirical metric for the representativeness of a synthetic dataset. Privacy risk is assessed by calculating the individual-level distances to closest record with respect to the training data. By showing that the synthetic samples are just as close to the training as to the holdout data, we yield strong evidence that the synthesizer indeed learned to generalize patterns and is independent of individual training records. We empirically demonstrate the presented framework for seven distinct synthetic data solutions across four mixed-type datasets and compare these then to traditional data perturbation techniques. Both a Python-based implementation of the proposed metrics and the demonstration study setup is made available open-source. The results highlight the need to systematically assess the fidelity just as well as the privacy of these emerging class of synthetic data generators.

Download Full-text

Zero-Shot Feature Selection via Transferring Supervised Knowledge

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2021040101 ◽

2021 ◽

Vol 17 (2) ◽

pp. 1-20

Author(s):

Zheng Wang ◽

Qiao Wang ◽

Tingzhang Zhao ◽

Chaokun Wang ◽

Xiaojun Ye

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Dimensionality Reduction ◽

Real World ◽

Rapid Growth ◽

Learning Systems ◽

Training Data ◽

Effective Technique ◽

Supervised Methods ◽

Real World Datasets

Feature selection, an effective technique for dimensionality reduction, plays an important role in many machine learning systems. Supervised knowledge can significantly improve the performance. However, faced with the rapid growth of newly emerging concepts, existing supervised methods might easily suffer from the scarcity and validity of labeled data for training. In this paper, the authors study the problem of zero-shot feature selection (i.e., building a feature selection model that generalizes well to “unseen” concepts with limited training data of “seen” concepts). Specifically, they adopt class-semantic descriptions (i.e., attributes) as supervision for feature selection, so as to utilize the supervised knowledge transferred from the seen concepts. For more reliable discriminative features, they further propose the center-characteristic loss which encourages the selected features to capture the central characteristics of seen concepts. Extensive experiments conducted on various real-world datasets demonstrate the effectiveness of the method.

Download Full-text

Glean

Proceedings of the VLDB Endowment ◽

10.14778/3447689.3447703 ◽

2021 ◽

Vol 14 (6) ◽

pp. 997-1005

Author(s):

Sandeep Tata ◽

Navneet Potti ◽

James B. Wendt ◽

Lauro Beltrão Costa ◽

Marc Najork ◽

...

Keyword(s):

Machine Learning ◽

Data Management ◽

Real World ◽

Empirical Studies ◽

Ground Truth ◽

Training Data ◽

Ground Truth Data ◽

Document Type ◽

Machine Learning Model ◽

Structured Information

Extracting structured information from templatic documents is an important problem with the potential to automate many real-world business workflows such as payment, procurement, and payroll. The core challenge is that such documents can be laid out in virtually infinitely different ways. A good solution to this problem is one that generalizes well not only to known templates such as invoices from a known vendor, but also to unseen ones. We developed a system called Glean to tackle this problem. Given a target schema for a document type and some labeled documents of that type, Glean uses machine learning to automatically extract structured information from other documents of that type. In this paper, we describe the overall architecture of Glean, and discuss three key data management challenges : 1) managing the quality of ground truth data, 2) generating training data for the machine learning model using labeled documents, and 3) building tools that help a developer rapidly build and improve a model for a given document type. Through empirical studies on a real-world dataset, we show that these data management techniques allow us to train a model that is over 5 F1 points better than the exact same model architecture without the techniques we describe. We argue that for such information-extraction problems, designing abstractions that carefully manage the training data is at least as important as choosing a good model architecture.

Download Full-text