Multi-Level Deep Cascade Trees for Conversion Rate Prediction in Recommendation System

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301338 ◽

2019 ◽

Vol 33 ◽

pp. 338-345 ◽

Cited By ~ 3

Author(s):

Hong Wen ◽

Jing Zhang ◽

Quan Lin ◽

Keping Yang ◽

Pipei Huang

Keyword(s):

Conversion Rate ◽

Recommendation System ◽

Feature Learning ◽

Feature Representation ◽

Gradient Boosting ◽

Data Set ◽

Rate Prediction ◽

Clear Explanation ◽

Cascade Structure ◽

Multi Level

Developing effective and efficient recommendation methods is very challenging for modern e-commerce platforms. Generally speaking, two essential modules named “ClickThrough Rate Prediction” (CTR) and “Conversion Rate Prediction” (CVR) are included, where CVR module is a crucial factor that affects the final purchasing volume directly. However, it is indeed very challenging due to its sparseness nature. In this paper, we tackle this problem by proposing multiLevel Deep Cascade Trees (ldcTree), which is a novel decision tree ensemble approach. It leverages deep cascade structures by stacking Gradient Boosting Decision Trees (GBDT) to effectively learn feature representation. In addition, we propose to utilize the cross-entropy in each tree of the preceding GBDT as the input feature representation for next level GBDT, which has a clear explanation, i.e., a traversal from root to leaf nodes in the next level GBDT corresponds to the combination of certain traversals in the preceding GBDT. The deep cascade structure and the combination rule enable the proposed ldcTree to have a stronger distributed feature representation ability. Moreover, inspired by ensemble learning, we propose an Ensemble ldcTree (E-ldcTree) to encourage the model’s diversity and enhance the representation ability further. Finally, we propose an improved Feature learning method based on EldcTree (F-EldcTree) for taking adequate use of weak and strong correlation features identified by pretrained GBDT models. Experimental results on off-line data set and online deployment demonstrate the effectiveness of the proposed methods.

Download Full-text

Report for Multi-Level Deep Cascade Trees for Conversion Rate Prediction in Recommendation System

2020 4th International Conference on Computer Engineering, Information Science & Application Technology (ICCIA 2020) ◽

10.23977/iccia2020002 ◽

2020 ◽

Keyword(s):

Conversion Rate ◽

Recommendation System ◽

Rate Prediction ◽

Multi Level

Download Full-text

Attention neural collaboration filtering based on GRU for recommender systems

Complex & Intelligent Systems ◽

10.1007/s40747-021-00274-4 ◽

2021 ◽

Author(s):

Hongbin Xia ◽

Yang Luo ◽

Yuan Liu

Keyword(s):

Deep Learning ◽

Collaborative Filtering ◽

Recommendation System ◽

Auxiliary Information ◽

User Preferences ◽

Feature Representation ◽

Long Distance ◽

Data Set ◽

Filtering Method ◽

The Impact

AbstractThe collaborative filtering method is widely used in the traditional recommendation system. The collaborative filtering method based on matrix factorization treats the user’s preference for the item as a linear combination of the user and the item latent vectors, and cannot learn a deeper feature representation. In addition, the cold start and data sparsity remain major problems for collaborative filtering. To tackle these problems, some scholars have proposed to use deep neural network to extract text information, but did not consider the impact of long-distance dependent information and key information on their models. In this paper, we propose a neural collaborative filtering recommender method that integrates user and item auxiliary information. This method fully integrates user-item rating information, user assistance information and item text assistance information for feature extraction. First, Stacked Denoising Auto Encoder is used to extract user features, and Gated Recurrent Unit with auxiliary information is used to extract items’ latent vectors, respectively. The attention mechanism is used to learn key information when extracting text features. Second, the latent vectors learned by deep learning techniques are used in multi-layer nonlinear networks to learn more abstract and deeper feature representations to predict user preferences. According to the verification results on the MovieLens data set, the proposed model outperforms other traditional approaches and deep learning models making it state of the art.

Download Full-text

In silico Prediction of Inhibitory Constant of Thrombin Inhibitors Using Machine Learning

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220130232 ◽

2019 ◽

Vol 21 (9) ◽

pp. 662-669 ◽

Cited By ~ 1

Author(s):

Junnan Zhao ◽

Lu Zhu ◽

Weineng Zhou ◽

Lingfeng Yin ◽

Yuchen Wang ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Regression Tree ◽

Large Data ◽

Thrombin Inhibitors ◽

Coagulation Cascade ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Descriptor Selection

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Taking advantage of finding non-intuitive regularities on high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM) were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best one among these methods with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods such as yrandomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is full of help for designing novel thrombin inhibitors.

Download Full-text

Follow the Prophet: Accurate Online Conversion Rate Prediction in the Face of Delayed Feedback

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ◽

10.1145/3404835.3463045 ◽

2021 ◽

Author(s):

Haoming Li ◽

Feiyang Pan ◽

Xiang Ao ◽

Zhao Yang ◽

Min Lu ◽

...

Keyword(s):

Conversion Rate ◽

Delayed Feedback ◽

Rate Prediction ◽

The Face

Download Full-text

Abstiegserfahrungen in der Kommune und der Aufstieg Unabhängiger Wählergemeinschaften. Neue Befunde aus einem Mehrebenen-Kommunalwahlpanel

Zeitschrift für Parlamentsfragen ◽

10.5771/0340-1758-2021-1-59 ◽

2021 ◽

Vol 52 (1) ◽

pp. 59-77

Author(s):

Christina-Marie Juen ◽

Markus Tepe ◽

Michael Jankowski

Keyword(s):

Panel Data ◽

Voter Turnout ◽

Party System ◽

Local Politics ◽

Tax Revenue ◽

Data Set ◽

Political Fragmentation ◽

Electoral Success ◽

Party Positions ◽

Multi Level

In Germany, Independent Local Lists (UWG) have become an integral part of local politics in recent decades . Despite their growing political importance, the reasons for their electoral rise have hardly been researched . Recent studies argue that Independent Local Lists pursue anti-party positions, which makes them attractive to voters who are dissatisfied with the party system . Assuming that a decline of confidence in established parties corresponds with the experience of local deprivation, this contribution uses a multi-level panel data set to investigate how socio-economic (emigration, aging, declining tax revenue) and politicalcultural (turnout, fragmentation) deprivation processes affect the electoral success of Independent Local Lists . The empirical findings suggest that Independent Local Lists are more successful in municipalities where voter turnout has fallen and political fragmentation has increased .

Download Full-text

mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation

Bioinformatics ◽

10.1093/bioinformatics/bty1047 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2757-2765 ◽

Cited By ~ 63

Author(s):

Balachandran Manavalan ◽

Shaherin Basith ◽

Tae Hwan Shin ◽

Leyi Wei ◽

Gwang Lee

Keyword(s):

Nearest Neighbor ◽

Feature Representation ◽

Superior Performance ◽

Supplementary Information ◽

Gradient Boosting ◽

Support Vector ◽

Pharmaceutical Drugs ◽

K Nearest Neighbor ◽

Feature Descriptors ◽

Predicted Probability

AbstractMotivationCardiovascular disease is the primary cause of death globally accounting for approximately 17.7 million deaths per year. One of the stakes linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there is no comprehensive analysis, assessment of diverse features and implementation of various machine-learning (ML) algorithms applied for antihypertensive peptide (AHTP) model construction.ResultsIn this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM) using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. While ERT-based trained models performed consistently better than other algorithms regardless of various feature descriptors, we treated them as baseline predictors, whose predicted probability of AHTPs was further used as input features separately for four different ML-algorithms (ERT, GB, RF and SVM) and developed their corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance with an overall improvement of approximately 6–7% in both benchmarking and independent datasets.Availability and implementationThe user-friendly online prediction tool, mAHTPred is freely accessible at http://thegleelab.org/mAHTPred.Supplementary informationSupplementary data are available at Bioinformatics online.

Download Full-text

A unified multi-level spectral–temporal feature learning framework for patient-specific seizure onset detection in EEG signals

Knowledge-Based Systems ◽

10.1016/j.knosys.2020.106152 ◽

2020 ◽

Vol 205 ◽

pp. 106152 ◽

Cited By ~ 1

Author(s):

Fang-Gui Tang ◽

Yu Liu ◽

Yang Li ◽

Zi-Wen Peng

Keyword(s):

Feature Learning ◽

Patient Specific ◽

Onset Detection ◽

Eeg Signals ◽

Seizure Onset ◽

Learning Framework ◽

Multi Level ◽

Temporal Feature

Download Full-text

Evaluating Word Similarity Measure of Embeddings Through Binary Classification

Journal of Computer Science Research ◽

10.30564/jcsr.v1i3.1268 ◽

2019 ◽

Vol 1 (3) ◽

Author(s):

A. Aziz Altowayan ◽

Lixin Tao

Keyword(s):

Similarity Measure ◽

Binary Classification ◽

General Purpose ◽

Feature Representation ◽

Entity Recognition ◽

Language Models ◽

Data Set ◽

Word Similarity ◽

Domain Specific ◽

Retrieval Rate

We consider the following problem: given neural language models (embeddings) each of which is trained on an unknown data set, how can we determine which model would provide a better result when used for feature representation in a downstream task such as text classification or entity recognition? In this paper, we assess the word similarity measure through analyzing its impact on word embeddings learned from various datasets and how they perform in a simple classification task. Word representations were learned and assessed under the same conditions. For training word vectors, we used the implementation of Continuous Bag of Words described in [1]. To assess the quality of the vectors, we applied the analogy questions test for word similarity described in the same paper. Further, to measure the retrieval rate of an embedding model, we introduced a new metric (Average Retrieval Error) which measures the percentage of missing words in the model. We observe that scoring a high accuracy of syntactic and semantic similarities between word pairs is not an indicator of better classification results. This observation can be justified by the fact that a domain-specific corpus contributes to the performance better than a general-purpose corpus. For reproducibility, we release our experiments scripts and results.

Download Full-text

Disentangled Feature Learning Network for Vehicle Re-Identification

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/66 ◽

2020 ◽

Author(s):

Yan Bai ◽

Yihang Lou ◽

Yongxing Dai ◽

Jun Liu ◽

Ziqian Chen ◽

...

Keyword(s):

State Of The Art ◽

Feature Learning ◽

Feature Representation ◽

Public Security ◽

The Public ◽

Common Features ◽

Learning Network ◽

Single Feature ◽

Art Performance

Vehicle Re-Identification (ReID) has attracted lots of research efforts due to its great significance to the public security. In vehicle ReID, we aim to learn features that are powerful in discriminating subtle differences between vehicles which are visually similar, and also robust against different orientations of the same vehicle. However, these two characteristics are hard to be encapsulated into a single feature representation simultaneously with unified supervision. Here we propose a Disentangled Feature Learning Network (DFLNet) to learn orientation specific and common features concurrently, which are discriminative at details and invariant to orientations, respectively. Moreover, to effectively use these two types of features for ReID, we further design a feature metric alignment scheme to ensure the consistency of the metric scales. The experiments show the effectiveness of our method that achieves state-of-the-art performance on three challenging datasets.

Download Full-text

Wheel fault diagnosis model based on multichannel attention and supervised contrastive learning

Advances in Mechanical Engineering ◽

10.1177/16878140211067024 ◽

2021 ◽

Vol 13 (12) ◽

pp. 168781402110670

Author(s):

Yanxiang Chen ◽

Zuxing Zhao ◽

Euiyoul Kim ◽

Haiyang Liu ◽

Juan Xu ◽

...

Keyword(s):

Network Architecture ◽

Feature Learning ◽

Feature Representation ◽

Diagnostic Methods ◽

Learning Ability ◽

Noise Interference ◽

Backbone Network ◽

Mechanical Methods ◽

Wheel Flat ◽

Wheel Radius

As wheels are important components of train operation, diagnosing and predicting wheel faults are essential to ensure the reliability of rail transit. Currently, the existing studies always separately deal with two main types of wheel faults, namely wheel radius difference and wheel flat, even though they are both reflected by wheel radius changes. Moreover, traditional diagnostic methods, such as mechanical methods or a combination of data analysis methods, have limited abilities to efficiently extract data features. Deep learning models have become useful tools to automatically learn features from raw vibration signals. However, research on improving the feature-learning capabilities of models under noise interference to yield higher wheel diagnostic accuracies has not yet been conducted. In this paper, a unified training framework with the same model architecture and loss function is established for two homologous wheel faults. After selecting deep residual networks (ResNets) as the backbone network to build the model, we add the squeeze and excitation (SE) module based on a multichannel attention mechanism to the backbone network to learn the global relationships among feature channels. Then the influence of noise interference features is reduced while the extraction of useful information features is enhanced, leading to the improved feature-learning ability of ResNet. To further obtain effective feature representation using the model, we introduce supervised contrastive loss (SCL) on the basis of ResNet + SE to enlarge the feature distances of different fault classes through a comparison between positive and negative examples under label supervision to obtain a better class differentiation and higher diagnostic accuracy. We also complete a regression task to predict the fault degrees of wheel radius difference and wheel flat without changing the network architecture. The extensive experimental results show that the proposed model has a high accuracy in diagnosing and predicting two types of wheel faults.

Download Full-text