Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks

Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating second-order feature interactions. Despite its effectiveness, FM can be hindered by modelling all feature interactions with the same weight, as not all feature interactions are equally useful and predictive. For example, interactions with useless features may introduce noise and degrade performance. In this work, we improve FM by discriminating the importance of different feature interactions. We propose a novel model named Attentional Factorization Machine (AFM), which learns the importance of each feature interaction from data via a neural attention network. Extensive experiments on two real-world datasets demonstrate the effectiveness of AFM. Empirically, on the regression task AFM outperforms FM with an 8.6% relative improvement, and it consistently outperforms the state-of-the-art deep learning methods Wide&Deep [Cheng et al., 2016] and DeepCross [Shan et al., 2016] with a much simpler structure and fewer model parameters. Our implementation of AFM is publicly available at: https://github.com/hexiangnan/attentional_factorization_machine
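
The idea of weighting each pairwise interaction with an attention score can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (that is in the repository linked above). The parameter names (W_att, b_att, h, p), the one-hidden-layer ReLU attention network, and the softmax normalisation are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def afm_predict(x, w0, w, V, W_att, b_att, h, p):
    """Attention-weighted FM score for one instance (illustrative sketch).

    x: feature values (length n); V: n embeddings of size k;
    W_att (t x k), b_att (t), h (t): assumed attention network; p (k): projection.
    """
    active = [i for i, xi in enumerate(x) if xi != 0.0]
    linear = w0 + sum(w[i] * x[i] for i in active)

    # Element-wise product of each pair of active feature embeddings.
    pairs = [
        [V[i][d] * V[j][d] * x[i] * x[j] for d in range(len(p))]
        for i, j in combinations(active, 2)
    ]
    if not pairs:
        return linear, []

    # A one-hidden-layer network assigns a score to every interaction.
    scores = []
    for vec in pairs:
        hidden = relu(
            [sum(row[d] * vec[d] for d in range(len(vec))) + bias
             for row, bias in zip(W_att, b_att)]
        )
        scores.append(sum(hj * hv for hj, hv in zip(h, hidden)))
    attention = softmax(scores)

    # Attention-weighted sum pooling, then projection to a scalar.
    pooled = [
        sum(a * vec[d] for a, vec in zip(attention, pairs))
        for d in range(len(p))
    ]
    interaction = sum(pd * vd for pd, vd in zip(p, pooled))
    return linear + interaction, attention
```

When the attention parameters are all zero, every interaction receives the same weight, which corresponds to the equal-weight behaviour of FM that the abstract criticises (up to a constant scaling factor and the projection p).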

Interaction-Aware Factorization Machines for Recommender Systems

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33013804 ◽ 2019 ◽ Vol 33 ◽ pp. 3804-3811 ◽ Cited By ~ 2

Author(s): Fuxing Hong ◽ Dongbo Huang ◽ Ge Chen

Keyword(s): Neural Network ◽ Deep Learning ◽ Interaction Effect ◽ State Of The Art ◽ Feature Interaction ◽ Learning Approach ◽ Field Interaction ◽ Feature Interactions ◽ Factorization Machine ◽ Novel Model

Factorization Machine (FM) is a widely used supervised learning approach that effectively models feature interactions. Despite the successful application of FM and its many deep learning variants, treating every feature interaction equally may degrade performance. For example, the interactions of a useless feature may introduce noise; the importance of a feature may also differ when it interacts with different features. In this work, we propose a novel model named Interaction-aware Factorization Machine (IFM) by introducing the Interaction-Aware Mechanism (IAM), which comprises the feature aspect and the field aspect, to learn flexible interactions on two levels. The feature aspect learns feature interaction importance via an attention network, while the field aspect learns the feature interaction effect as a parametric similarity between the feature interaction vector and the corresponding field interaction prototype. IFM introduces more structured control and learns feature interaction importance in a stratified manner, which allows for more leverage in tweaking the interactions on both the feature-wise and field-wise levels. In addition, we give a more generalized architecture and propose the Interaction-aware Neural Network (INN) and DeepIFM to capture higher-order interactions. To further improve both the performance and efficiency of IFM, a sampling scheme is developed to select interactions based on the field aspect importance. Experimental results on two well-known datasets show the superiority of the proposed models over the state-of-the-art methods.

Learning Feature Interactions with Lorentzian Factorization Machine

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i04.6119 ◽ 2020 ◽ Vol 34 (04) ◽ pp. 6470-6477

Author(s): Canran Xu ◽ Ming Wu

Keyword(s): Deep Learning ◽ Hyperbolic Space ◽ Recommendation System ◽ Triangle Inequality ◽ State Of The Art ◽ Learning Methods ◽ New Model ◽ User Behaviors ◽ Feature Interactions ◽ Factorization Machine

Learning representations for feature interactions to model user behaviors is critical for recommendation systems and click-through rate (CTR) prediction. Recent advances in this area are empowered by deep learning methods, which can learn sophisticated feature interactions and achieve state-of-the-art results in an end-to-end manner. These approaches require a large number of training parameters integrated with the low-level representations, and are thus inefficient in both memory and computation. In this paper, we propose a new model named “LorentzFM” that can learn feature interactions embedded in a hyperbolic space in which Lorentz distances can violate the triangle inequality. The learned representation benefits from the peculiar geometric properties of hyperbolic triangles, which results in a significant reduction in the number of parameters (20% to 80%) because none of the top deep learning layers are required. With such a lightweight architecture, LorentzFM achieves results comparable to, and in some cases materially better than, those of deep learning methods such as DeepFM, xDeepFM and Deep & Cross in both recommendation and CTR prediction tasks.
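
The remark about the triangle inequality can be checked numerically. The sketch below is not the LorentzFM score function. It only uses the standard Lorentzian inner product on the unit hyperboloid, with helper names that are assumptions for illustration, and shows three points for which the squared Lorentzian distance of the direct path exceeds the sum over the two-hop path.

```python
import math


def lift_to_hyperboloid(z):
    """Map a Euclidean vector z to the hyperboloid point (sqrt(1 + |z|^2), z)."""
    return [math.sqrt(1.0 + sum(c * c for c in z))] + list(z)


def lorentz_inner(u, v):
    """Lorentzian inner product: minus the time-like term plus the spatial terms."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))


def squared_lorentz_distance(u, v):
    """Squared Lorentzian distance between two points on the unit hyperboloid."""
    return -2.0 - 2.0 * lorentz_inner(u, v)


# Three points on the hyperboloid, built from 1-D spatial coordinates.
a = lift_to_hyperboloid([math.sinh(0.0)])
b = lift_to_hyperboloid([math.sinh(1.0)])
c = lift_to_hyperboloid([math.sinh(2.0)])

direct = squared_lorentz_distance(a, c)
detour = squared_lorentz_distance(a, b) + squared_lorentz_distance(b, c)
print(direct > detour)  # the direct "distance" exceeds the two-hop sum
```

For these points the squared distance reduces to 2·cosh(Δ) − 2, where Δ is the difference of the hyperbolic angles, so it grows faster than linearly and the usual triangle inequality fails.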

An Input-aware Factorization Machine for Sparse Prediction

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/203 ◽ 2019 ◽ Cited By ~ 2

Author(s): Yantao Yu ◽ Zhen Wang ◽ Bo Yuan

Keyword(s): Neural Network ◽ Deep Learning ◽ Real World ◽ State Of The Art ◽ Overall Performance ◽ Factorization Machine ◽ The Impact ◽ Novel Model ◽ Individual Input ◽ Better Than

Factorization machines (FMs) are a class of general predictors that work effectively with sparse data and represent features using factorized parameters and weights. However, the accuracy of FMs can be adversely affected by the fixed representation trained for each feature, as the same feature is usually not equally predictive and useful in different instances. In fact, the inaccurate representation of features may even introduce noise and degrade the overall performance. In this work, we improve FMs by explicitly considering the impact of each individual input upon the representation of features. We propose a novel model named Input-aware Factorization Machine (IFM), which learns a unique input-aware factor for the same feature in different instances via a neural network. Comprehensive experiments on three real-world recommendation datasets demonstrate the effectiveness and mechanism of IFM. Empirical results indicate that IFM is significantly better than the standard FM model and consistently outperforms four state-of-the-art deep learning based methods.
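
A rough sketch of per-instance reweighting on top of a standard second-order FM is given below. It is not the paper's architecture: the logits would come from a neural network over the instance, which is omitted here and replaced by a plain argument, and the softmax rescaled to average 1 is an assumption for illustration. For simplicity the softmax runs over all features, including inactive ones.

```python
import math


def fm_score(x, w0, w, V):
    """Second-order FM score using the linear-time reformulation."""
    k = len(V[0])
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for d in range(k):
        total = sum(V[i][d] * x[i] for i in range(len(x)))
        squares = sum((V[i][d] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (total * total - squares)
    return linear + pairwise


def input_aware_factors(logits):
    """Softmax over per-feature logits, rescaled so the factors average to 1."""
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    norm = sum(exps)
    return [len(logits) * e / norm for e in exps]


def input_aware_fm_score(x, w0, w, V, logits):
    """Reweight each feature's weight and embedding per instance, then apply FM."""
    m = input_aware_factors(logits)
    w_x = [mi * wi for mi, wi in zip(m, w)]
    V_x = [[mi * vd for vd in vi] for mi, vi in zip(m, V)]
    return fm_score(x, w0, w_x, V_x)
```

With equal logits every factor is 1 and the score falls back to plain FM, so the fixed representation described in the abstract is the special case in which the input has no influence on the factors.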

Discrete Factorization Machines for Fast Feature-based Recommendation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2018/479 ◽ 2018 ◽ Cited By ~ 7

Author(s): Han Liu ◽ Xiangnan He ◽ Fuli Feng ◽ Liqiang Nie ◽ Rui Liu ◽ ...

Keyword(s): Mobile Applications ◽ State Of The Art ◽ Side Information ◽ Computational Cost ◽ Model Parameters ◽ Item Score ◽ Efficient Storage ◽ Feature Based ◽ Factorization Machine ◽ Real World Datasets

User and item features from side information are crucial for accurate recommendation. However, the large number of feature dimensions, e.g., usually larger than 10^7, results in expensive storage and computational cost. This prohibits fast recommendation, especially on mobile applications where computational resources are very limited. In this paper, we develop a generic feature-based recommendation model, called Discrete Factorization Machine (DFM), for fast and accurate recommendation. DFM binarizes the real-valued model parameters (e.g., float32) of every feature embedding into binary codes (e.g., boolean), and thus supports efficient storage and fast user-item score computation. To avoid the severe quantization loss of the binarization, we propose a convergent updating rule that resolves the challenging discrete optimization of DFM. Through extensive experiments on two real-world datasets, we show that 1) DFM consistently outperforms state-of-the-art binarized recommendation models, and 2) DFM shows very competitive performance compared to its real-valued version (FM), demonstrating the minimized quantization loss.
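
Why binary codes allow fast score computation can be seen from a small sketch: the inner product of two codes in {-1, +1}^k equals k minus twice their Hamming distance, which needs only an XOR and a bit count. The sign-based binarize helper below is a naive stand-in used for illustration. It is not the paper's optimization procedure, which is designed precisely to avoid the quantization loss of such rounding.

```python
def binarize(embedding):
    """Pack the signs of a real-valued embedding into an int (bit d set if >= 0)."""
    code = 0
    for d, value in enumerate(embedding):
        if value >= 0.0:
            code |= 1 << d
    return code


def binary_dot(code_a, code_b, k):
    """Inner product of two length-k codes in {-1, +1}^k via Hamming distance."""
    hamming = bin(code_a ^ code_b).count("1")
    return k - 2 * hamming


def sign_dot(vec_a, vec_b):
    """Reference: the same inner product computed on explicit +/-1 vectors."""
    def sign(v):
        return 1 if v >= 0.0 else -1
    return sum(sign(a) * sign(b) for a, b in zip(vec_a, vec_b))


user = [0.7, -1.2, 0.1, -0.4]
item = [0.3, 0.9, -0.5, -0.8]
print(binary_dot(binarize(user), binarize(item), k=4), sign_dot(user, item))
```

Each code occupies k bits instead of k 32-bit floats, which is where the storage saving mentioned in the abstract comes from.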

Deep Field-Aware Interaction Machine for Click-Through Rate Prediction

Mobile Information Systems ◽ 10.1155/2021/5575249 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-10

Author(s): Gaofeng Qi ◽ Ping Li

Keyword(s): Field Experiments ◽ State Of The Art ◽ Feature Interaction ◽ Structure Form ◽ Feature Interactions ◽ Great Performance ◽ Field Information ◽ Factorization Machine ◽ Click Through Rate ◽ Modeling Feature

Modeling feature interactions is of crucial importance for predicting click-through rate (CTR) in industrial recommender systems. Because of its strong performance and efficiency, the factorization machine (FM) has been a popular approach to learning feature interactions. Recently, several variants of FM have been proposed to improve its performance, and they have shown that field information plays an important role. However, the feature-length in a field is usually small; we observe that when there are multiple nonzero features within a field, the interaction between fields is not enough to represent the feature interaction between different fields, due to the problem of short feature-length. In this work, we propose a novel neural CTR model named DeepFIM by introducing the Field-aware Interaction Machine (FIM), which provides a layered structure to describe intrafield and interfield feature interactions, to solve the short-expression problem caused by the short feature-length in the field. Experiments show that our model achieves results comparable to, and in some cases materially better than, those of the state-of-the-art methods.

A Dual Input-aware Factorization Machine for CTR Prediction

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2020/434 ◽ 2020

Author(s): Wantong Lu ◽ Yantao Yu ◽ Yongzhe Chang ◽ Zhen Wang ◽ Chenhui Li ◽ ...

Keyword(s): Real World ◽ Predictive Power ◽ State Of The Art ◽ Model Parameters ◽ Feature Vectors ◽ Feature Representations ◽ Factorization Machine ◽ Click Through Rate ◽ Novel Model ◽ Original Feature

Factorization Machines (FMs) refer to a class of general predictors working with real-valued feature vectors. They are well known for their ability to estimate model parameters under significant sparsity and have found successful applications in many areas, such as click-through rate (CTR) prediction. However, standard FMs only produce a single fixed representation for each feature across different input instances, which may limit the CTR model’s expressive and predictive power. Inspired by the success of Input-aware Factorization Machines (IFMs), which aim to learn more flexible and informative representations of a given feature according to different input instances, we propose a novel model named Dual Input-aware Factorization Machines (DIFMs) that can adaptively reweight the original feature representations at the bit-wise and vector-wise levels simultaneously. Furthermore, DIFMs strategically integrate various components including Multi-Head Self-Attention, Residual Networks and DNNs into a unified end-to-end model. Comprehensive experiments on two real-world CTR prediction datasets show that the DIFM model can outperform several state-of-the-art models consistently.

Deep Learning for Transient Image Reconstruction from ToF Data

Sensors ◽ 10.3390/s21061962 ◽ 2021 ◽ Vol 21 (6) ◽ pp. 1962

Author(s): Enrico Buratto ◽ Adriano Simonetto ◽ Gianluca Agresti ◽ Henrik Schäfer ◽ Pietro Zanuttigh

Keyword(s): Deep Learning ◽ State Of The Art ◽ Light Response ◽ Real Data ◽ Depth Image ◽ Learning Approach ◽ Multiple Reflections ◽ Noisy Input ◽ Novel Approach ◽ Incoming Light

In this work, we propose a novel approach for correcting multi-path interference (MPI) in Time-of-Flight (ToF) cameras by estimating the direct and global components of the incoming light. MPI is an error source linked to the multiple reflections of light inside a scene; each sensor pixel receives information coming from different light paths, which generally leads to an overestimation of the depth. We introduce a novel deep learning approach, which estimates the structure of the time-dependent scene impulse response and from it recovers a depth image with a reduced amount of MPI. The model consists of two main blocks: a predictive model that learns a compact encoded representation of the backscattering vector from the noisy input data, and a fixed backscattering model that translates the encoded representation into the high-dimensional light response. Experimental results on real data show the effectiveness of the proposed approach, which reaches state-of-the-art performance.

One for All: Neural Joint Modeling of Entities and Events

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33016851 ◽ 2019 ◽ Vol 33 ◽ pp. 6851-6858 ◽ Cited By ~ 4

Author(s): Trung Minh Nguyen ◽ Thien Huu Nguyen

Keyword(s): Deep Learning ◽ Recent Work ◽ State Of The Art ◽ Contextual Information ◽ Joint Modeling ◽ Event Extraction ◽ Event Trigger ◽ The Individual ◽ Novel Model ◽ Argument Roles

Previous work on event extraction has mainly focused on the predictions for event triggers and argument roles, treating entity mentions as being provided by human annotators. This is unrealistic, as entity mentions are usually predicted by existing toolkits whose errors might be propagated to event trigger and argument role recognition. A few recent works have addressed this problem by jointly predicting entity mentions, event triggers and arguments. However, such work is limited to using discrete engineering features to represent contextual information for the individual tasks and their interactions. In this work, we propose a novel model to jointly perform predictions for entity mentions, event triggers and arguments based on the shared hidden representations from deep learning. The experiments demonstrate the benefits of the proposed method, leading to state-of-the-art performance for event extraction.

A recommendations model with multiaspect awareness and hierarchical user-product attention mechanisms

Computer Science and Information Systems ◽ 10.2298/csis190925024b ◽ 2020 ◽ Vol 17 (3) ◽ pp. 849-865

Author(s): Zhongqin Bi ◽ Shuming Dou ◽ Zhe Liu ◽ Yongbin Li

Keyword(s): State Of The Art ◽ Weight Vector ◽ User Preferences ◽ The Other ◽ Attention Networks ◽ Proposed Model ◽ Network Methods ◽ Public Datasets ◽ Novel Model ◽ Attention Weight

Neural network methods have been trained to satisfactorily learn user/product representations from textual reviews. A representation can be considered as a multiaspect attention weight vector. However, in several existing methods, it is assumed that the user representation remains unchanged even when the user interacts with products having diverse characteristics, which leads to inaccurate recommendations. To overcome this limitation, this paper proposes a novel model to capture the varying attention of a user for different products by using a multilayer attention framework. First, two individual hierarchical attention networks are used to encode the users and products to learn the user preferences and product characteristics from review texts. Then, we design an attention network to reflect the adaptive change in the user preferences for each aspect of the targeted product in terms of the rating and review. The results of experiments performed on three public datasets demonstrate that the proposed model notably outperforms the other state-of-the-art baselines, thereby validating the effectiveness of the proposed approach.

Unsupervised Deep Learning via Affinity Diffusion

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i07.6757 ◽ 2020 ◽ Vol 34 (07) ◽ pp. 11029-11036

Author(s): Jiabo Huang ◽ Qi Dong ◽ Shaogang Gong ◽ Xiatian Zhu

Keyword(s): Deep Learning ◽ State Of The Art ◽ General Purpose ◽ Training Data ◽ Learning Approach ◽ Model Learning ◽ Feature Representations ◽ Discriminative Feature ◽ Training Samples ◽ Unsupervised Deep Learning

Convolutional neural networks (CNNs) have achieved unprecedented success in a variety of computer vision tasks. However, they usually rely on supervised model learning that needs massive labelled training data, which dramatically limits their usability and deployability in real-world scenarios without any labelling budget. In this work, we introduce a general-purpose unsupervised deep learning approach to deriving discriminative feature representations. It is based on self-discovering semantically consistent groups of unlabelled training samples with the same class concepts through a progressive affinity diffusion process. Extensive experiments on object image classification and clustering, using six common image recognition benchmarks (MNIST, SVHN, STL10, CIFAR10, CIFAR100 and ImageNet), show the superior performance of the proposed method over state-of-the-art unsupervised learning models.
