real world datasets Latest Research Papers

Passenger Mobility Prediction via Representation Learning for Dynamic Directed and Weighted Graphs

ACM Transactions on Intelligent Systems and Technology ◽ 10.1145/3446344 ◽ 2022 ◽ Vol 13 (1) ◽ pp. 1-25

Author(s): Yuandong Wang ◽ Hongzhi Yin ◽ Tong Chen ◽ Chunyang Liu ◽ Ben Wang ◽ ...

Keyword(s): Fundamental Problem ◽ Route Planning ◽ Representation Learning ◽ Data Representation ◽ Weighted Graphs ◽ Spatial And Temporal Patterns ◽ Graph Representations ◽ Demand Prediction ◽ Passenger Demand ◽ Real World Datasets

In recent years, ride-hailing services have become increasingly prevalent, as they provide huge convenience for passengers. As a fundamental problem, the timely prediction of passenger demands in different regions is vital for effective traffic flow control and route planning. As both spatial and temporal patterns are indispensable for passenger demand prediction, relevant research has evolved from pure time series to graph-structured data for modeling historical passenger demand data, where a snapshot graph is constructed for each time slot by connecting region nodes via different relational edges (origin-destination relationship, geographical distance, etc.). Consequently, the spatiotemporal passenger demand records naturally carry dynamic patterns in the constructed graphs, where the edges also encode important information about the directions and volume (i.e., weights) of passenger demands between two connected regions. Learning a representation for such dynamic, directed, and weighted (DDW) graphs is the key to solving the prediction problem. However, existing graph-based solutions fail to simultaneously consider those three crucial aspects of dynamic, directed, and weighted graphs, leading to limited expressiveness when learning graph representations for passenger demand prediction. Therefore, we propose a novel spatiotemporal graph attention network, namely Gallat (Graph prediction with all attention), as a solution. In Gallat, by comprehensively incorporating those three intrinsic properties of dynamic, directed, and weighted graphs, we build three attention layers to fully capture the spatiotemporal dependencies among different regions across all historical time slots. Moreover, the model employs a subtask to conduct pretraining so that it can obtain accurate results more quickly. We evaluate the proposed model on real-world datasets, and our experimental results demonstrate that Gallat outperforms the state-of-the-art approaches.
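The snapshot-graph construction described above can be illustrated with a minimal sketch. It is not Gallat itself: the trip tuple layout, the one-hour slot length, and the function names `build_snapshots` and `outgoing_demand` are assumptions made for illustration. Each time slot gets one directed graph whose edge weights count the trips from an origin region to a destination region.

```python
from collections import defaultdict


def build_snapshots(trips, slot_seconds=3600):
    """Group trips into one directed, weighted graph per time slot.

    `trips` is an iterable of (timestamp_seconds, origin, destination)
    tuples; this layout is a hypothetical choice for the sketch.
    Returns {slot_index: {(origin, destination): trip_count}}.
    """
    snapshots = defaultdict(lambda: defaultdict(int))
    for timestamp, origin, destination in trips:
        slot = int(timestamp // slot_seconds)
        # Direction is kept by ordering the pair; weight is the trip volume.
        snapshots[slot][(origin, destination)] += 1
    return {slot: dict(edges) for slot, edges in snapshots.items()}


def outgoing_demand(snapshot, region):
    """Total demand leaving `region` in a single snapshot graph."""
    return sum(weight for (src, _dst), weight in snapshot.items() if src == region)


# Toy records: three trips in the first hour, one in the second.
trips = [(10, "A", "B"), (1200, "A", "B"), (1800, "B", "A"), (4000, "A", "C")]
graphs = build_snapshots(trips)
print(graphs)
print(outgoing_demand(graphs[0], "A"))
```

The sequence of per-slot dictionaries is what makes the graph dynamic; a model would consume these snapshots in time order.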

An Uncertainty-based Neural Network for Explainable Trajectory Segmentation

ACM Transactions on Intelligent Systems and Technology ◽ 10.1145/3467978 ◽ 2022 ◽ Vol 13 (1) ◽ pp. 1-18

Author(s): Xin Bi ◽ Chao Zhang ◽ Fangtong Wang ◽ Zhixun Liu ◽ Xiangguo Zhao ◽ ...

Keyword(s): Real World ◽ Feature Learning ◽ Learning Performance ◽ Bed Net ◽ Time Series Segmentation ◽ Real World Applications ◽ Real World Datasets ◽ High Level ◽ High Level Feature ◽ Trajectory Segmentation

As a variant task of time-series segmentation, trajectory segmentation is a key task in the applications of transportation pattern recognition and traffic analysis. However, trajectory segmentation faces the challenges of implicit patterns and sparse results. Although deep neural networks have tremendous advantages in high-level feature learning, deploying them as black boxes seriously limits their real-world applications. Providing explainable segmentations is important for result evaluation and decision making. Thus, in this article, we address trajectory segmentation by proposing a Bayesian Encoder-Decoder Network (BED-Net) to provide accurate detection with explainability and references for the following active-learning procedures. BED-Net consists of a segmentation module based on Monte Carlo dropout and an explanation module based on uncertainty learning that provides results evaluation and visualization. Experimental results on both benchmark and real-world datasets indicate that BED-Net outperforms the rival methods and offers excellent explainability in the applications of trajectory segmentation.
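Monte Carlo dropout, which the segmentation module is said to build on, can be sketched in a few lines. This is a toy linear scorer, not BED-Net: the weights, the feature vector, and the function names are hypothetical. The idea shown is that dropout stays active at prediction time, and the spread of repeated stochastic passes serves as an uncertainty estimate.

```python
import random
import statistics


def forward_with_dropout(features, weights, drop_prob, rng):
    """One stochastic pass of a toy linear scorer with inverted dropout."""
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            # Rescale kept inputs so the expected output matches the
            # deterministic dot product.
            total += (x / keep) * w
    return total


def mc_dropout_predict(features, weights, drop_prob=0.2, passes=200, seed=0):
    """Return (mean score, standard deviation) over repeated dropout passes."""
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(features, weights, drop_prob, rng)
        for _ in range(passes)
    ]
    return statistics.mean(samples), statistics.pstdev(samples)


# Hypothetical features of one trajectory point and hypothetical weights.
mean_score, uncertainty = mc_dropout_predict([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
print(mean_score, uncertainty)
```

A large standard deviation flags points where the prediction is unstable, which is the kind of signal an explanation or active-learning step could use.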

Contrastive Trajectory Learning for Tour Recommendation

ACM Transactions on Intelligent Systems and Technology ◽ 10.1145/3462331 ◽ 2022 ◽ Vol 13 (1) ◽ pp. 1-25

Author(s): Fan Zhou ◽ Pengyu Wang ◽ Xovee Xu ◽ Wenxin Tai ◽ Goce Trajcevski

Keyword(s): Data Augmentation ◽ State Of The Art ◽ Implicit Feedback ◽ Data Sparsity ◽ Weak Supervision ◽ Point Of Interest ◽ Data Correlations ◽ Recommendation Accuracy ◽ Real World Datasets ◽ Trajectory Learning

The main objective of Personalized Tour Recommendation (PTR) is to generate a sequence of points of interest (POIs) for a particular tourist, according to user-specific constraints such as duration, start and end points, the number of attractions planned to visit, and so on. Previous PTR solutions are based on either heuristics for solving the orienteering problem to maximize a global reward with a specified budget or approaches attempting to learn user visiting preferences and transition patterns with stochastic processes or recurrent neural networks. However, existing learning methodologies rely on historical trips to train the model and use the next visited POI as the supervised signal, which may not fully capture the coherence of preferences and thus recommend similar trips to different users, primarily due to the data sparsity problem and long-tailed distribution of POI popularity. This work presents a novel tour recommendation model by distilling knowledge and supervision signals from the trips in a self-supervised manner. We propose Contrastive Trajectory Learning for Tour Recommendation (CTLTR), which utilizes the intrinsic POI dependencies and traveling intent to discover extra knowledge and augments the sparse data via pre-training auxiliary self-supervised objectives. CTLTR provides a principled way to characterize the inherent data correlations while tackling the implicit feedback and weak supervision problems by learning robust representations applicable for tour planning. We introduce a hierarchical recurrent encoder-decoder to identify tourists’ intentions and use the contrastive loss to discover subsequence semantics and their sequential patterns through maximizing the mutual information. Additionally, we observe that a data augmentation step as the preliminary of contrastive learning can solve the overfitting issue resulting from data sparsity. We conduct extensive experiments on a range of real-world datasets and demonstrate that our model can significantly improve the recommendation performance over the state-of-the-art baselines in terms of both recommendation accuracy and visiting orders.
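The abstract does not spell out the contrastive loss, so the following is only a sketch of one common choice, the InfoNCE loss, whose minimization maximizes a lower bound on mutual information. The embeddings, the temperature value, and the function names are assumptions; CTLTR's actual objective may differ.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, one positive, and a list of negatives.

    The loss is the negative log-softmax of the positive's similarity
    among all candidates; it is low when the anchor is closer to the
    positive than to every negative.
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, negative) / temperature for negative in negatives]
    peak = max(logits)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    log_sum = peak + math.log(sum(math.exp(value - peak) for value in logits))
    return log_sum - logits[0]


# Hypothetical 2-d embeddings of a trip and of two sub-trajectories.
trip = [1.0, 0.0]
own_subsequence = [1.0, 0.0]
other_subsequences = [[0.0, 1.0], [-1.0, 0.0]]
loss_matched = info_nce(trip, own_subsequence, other_subsequences)
loss_mismatched = info_nce(trip, [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(loss_matched, loss_mismatched)
```

In a data-augmentation setting, the positive would typically be an augmented view of the same trip and the negatives would come from other trips in the batch.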

An Unsupervised Aspect-Aware Recommendation Model with Explanation Text Generation

ACM Transactions on Information Systems ◽ 10.1145/3483611 ◽ 2022 ◽ Vol 40 (3) ◽ pp. 1-29

Author(s): Peijie Sun ◽ Le Wu ◽ Kun Zhang ◽ Yu Su ◽ Meng Wang

Keyword(s): Data Privacy ◽ Auxiliary Information ◽ Generation Process ◽ Text Generation ◽ Generation Task ◽ Auxiliary Data ◽ Fine Grained ◽ Aspect Extraction ◽ Learning Framework ◽ Real World Datasets

Review-based recommendation utilizes both users’ rating records and the associated reviews for recommendation. Recently, with the rapidly growing demand for explanations of recommendation results, reviews have been used to train encoder–decoder models for explanation text generation. As most reviews are general text without detailed evaluation, some researchers have leveraged auxiliary information about users or items to enrich the generated explanation text. Nevertheless, the auxiliary data is not available in most scenarios and may suffer from data privacy problems. In this article, we argue that the reviews contain abundant semantic information expressing the users’ feelings about various aspects of items, while this information is not fully explored in the current explanation text generation task. To this end, we study how to generate more fine-grained explanation text in review-based recommendation without any auxiliary data. Though the idea is simple, it is non-trivial since the aspect is hidden and unlabeled. Besides, it is also very challenging to inject aspect information for generating explanation text with noisy review input. To solve these challenges, we first leverage an advanced unsupervised neural aspect extraction model to learn the aspect-aware representation of each review sentence. Thus, users and items can be represented in the aspect space based on their historical associated reviews. After that, we detail how to better predict ratings and generate explanation text with the user and item representations in the aspect space. We further dynamically assign larger weights to review sentences that contain a larger proportion of aspect words to control the text generation process, and jointly optimize rating prediction accuracy and explanation text generation quality with a multi-task learning framework. Finally, extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model for both recommendation accuracy and explainability.
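The sentence-weighting idea above can be sketched as follows. The aspect vocabulary, the whitespace tokenizer, and the normalization to a sum of one are assumptions for illustration; the abstract states only that sentences with a larger proportion of aspect words receive larger weights.

```python
def sentence_weights(sentences, aspect_words):
    """Weight each sentence by its share of aspect words, normalized to sum to 1."""
    proportions = []
    for sentence in sentences:
        tokens = [token.strip(".,!?") for token in sentence.lower().split()]
        hits = sum(1 for token in tokens if token in aspect_words)
        proportions.append(hits / len(tokens) if tokens else 0.0)
    total = sum(proportions)
    if total == 0:
        # No aspect words anywhere: fall back to uniform weights.
        return [1.0 / len(sentences)] * len(sentences) if sentences else []
    return [proportion / total for proportion in proportions]


# Hypothetical review sentences and a hypothetical aspect vocabulary.
review = [
    "The battery life is great.",
    "I bought it last week.",
    "Screen is dim.",
]
aspects = {"battery", "life", "screen"}
weights = sentence_weights(review, aspects)
print(weights)
```

Weights like these could scale each sentence's contribution to a generation loss, so that the noisy second sentence has no influence on the explanation text.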

Origin-Aware Location Prediction Based on Historical Vehicle Trajectories

ACM Transactions on Intelligent Systems and Technology ◽ 10.1145/3462675 ◽ 2022 ◽ Vol 13 (1) ◽ pp. 1-18

Author(s): Meng Chen ◽ Qingjie Liu ◽ Weiming Huang ◽ Teng Zhang ◽ Yixuan Zuo ◽ ...

Keyword(s): Travel Time ◽ Markov Models ◽ Prediction Method ◽ Time Difference ◽ Location Prediction ◽ Trajectory Data ◽ Difference Model ◽ The Difference ◽ Real World Datasets ◽ Joint Prediction

Next location prediction is of great importance for many location-based applications and provides essential intelligence to various businesses. In previous studies, a common approach to next location prediction is to learn the sequential transitions from massive historical trajectories based on conditional probability. Nevertheless, due to their time and space complexity, these methods (e.g., Markov models) utilize only the most recently passed locations to predict next locations, neglecting earlier passed locations in the trajectory. In this work, we seek to enhance the prediction performance by incorporating the travel time from all the passed locations in the query trajectory to each candidate next location. To this end, we propose a novel prediction method, namely the Travel Time Difference Model, which exploits the difference between the shortest travel time and the actual travel time to predict next locations. Moreover, we integrate the Travel Time Difference Model with a Sequential and Temporal Predictor to yield a joint model. The joint prediction model integrates local sequential transitions, temporal regularity, and global travel time information in the trajectory for the next location prediction problem. We have conducted extensive experiments on two real-world datasets: the vehicle passage record data and the taxi trajectory data. The experimental results demonstrate significant improvements in prediction accuracy over baseline methods.
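The abstract does not give the scoring formula of the Travel Time Difference Model, so the sketch below shows only one possible reading, stated as an assumption: a candidate is more plausible when the time already spent since each passed location, plus the shortest time from the current location to the candidate, stays close to the shortest time from that passed location to the candidate. The road graph, travel times, and function names are hypothetical; shortest times are computed with Dijkstra's algorithm.

```python
import heapq


def shortest_times(graph, source):
    """Dijkstra's algorithm: shortest travel time from `source` to each node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        elapsed, node = heapq.heappop(heap)
        if elapsed > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph.get(node, {}).items():
            candidate_time = elapsed + cost
            if candidate_time < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate_time
                heapq.heappush(heap, (candidate_time, neighbor))
    return dist


def rank_candidates(graph, trajectory, candidates):
    """Rank candidates by summed (actual - shortest) travel time gap.

    `trajectory` is a list of (location, timestamp); the last entry is the
    current location. The gap definition is an assumption of this sketch.
    """
    current, now = trajectory[-1]
    from_current = shortest_times(graph, current)
    scores = {}
    for candidate in candidates:
        gap = 0.0
        for location, timestamp in trajectory[:-1]:
            best = shortest_times(graph, location).get(candidate)
            onward = from_current.get(candidate)
            if best is None or onward is None:
                gap = float("inf")  # candidate unreachable
                break
            gap += (now - timestamp) + onward - best
        scores[candidate] = gap
    return sorted(candidates, key=lambda candidate: scores[candidate])


# Hypothetical road graph with travel times in minutes.
roads = {"A": {"B": 5.0, "D": 4.0}, "B": {"C": 5.0, "D": 5.0}}
ranking = rank_candidates(roads, [("A", 0.0), ("B", 5.0)], ["C", "D"])
print(ranking)
```

In this toy case a vehicle heading to D would have taken the direct road from A, so having passed B makes C the more consistent next location.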

Toward Fair Recommendation in Two-sided Platforms

ACM Transactions on the Web ◽ 10.1145/3503624 ◽ 2022 ◽ Vol 16 (2) ◽ pp. 1-34

Author(s): Arpita Biswas ◽ Gourab K. Patro ◽ Niloy Ganguly ◽ Krishna P. Gummadi ◽ Abhijnan Chakraborty

Keyword(s): Customer Satisfaction ◽ Real World ◽ Computation Time ◽ Well Being ◽ Personalized Recommendation ◽ Indivisible Goods ◽ Goods And Services ◽ Online Platforms ◽ Real World Datasets ◽ The Cost

Many online platforms today (such as Amazon, Netflix, Spotify, LinkedIn, and AirBnB) can be thought of as two-sided markets with producers and customers of goods and services. Traditionally, recommendation services in these platforms have focused on maximizing customer satisfaction by tailoring the results according to the personalized preferences of individual customers. However, our investigation reinforces the fact that such customer-centric design of these services may lead to unfair distribution of exposure to the producers, which may adversely impact their well-being. Conversely, a pure producer-centric design might become unfair to the customers. As more and more people are depending on such platforms to earn a living, it is important to ensure fairness to both producers and customers. In this work, by mapping a fair personalized recommendation problem to a constrained version of the problem of fairly allocating indivisible goods, we propose to provide fairness guarantees for both sides. Formally, our proposed FairRec algorithm guarantees Maxi-Min Share of exposure for the producers, and Envy-Free up to One Item fairness for the customers. Extensive evaluations over multiple real-world datasets show the effectiveness of FairRec in ensuring two-sided fairness while incurring a marginal loss in overall recommendation quality. Finally, we present a modification of FairRec (named FairRecPlus) that, at the cost of additional computation time, improves the recommendation performance for the customers while maintaining the same fairness guarantees.
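The Envy-Free up to One Item (EF1) notion can be made concrete with a small sketch. This is not the FairRec algorithm: it uses plain round-robin picking, which is known to yield EF1 allocations under additive valuations, and it treats every product as a single copy, which is a simplifying assumption. The relevance scores and names are hypothetical, and the producer-side Maxi-Min Share guarantee is not covered.

```python
def round_robin(relevance, items, picks_per_customer):
    """Customers take turns picking their most relevant remaining item."""
    remaining = set(items)
    allocation = {customer: [] for customer in relevance}
    for _ in range(picks_per_customer):
        for customer, scores in relevance.items():
            if not remaining:
                break
            # sorted() makes tie-breaking deterministic.
            best = max(sorted(remaining), key=lambda item: scores[item])
            allocation[customer].append(best)
            remaining.remove(best)
    return allocation


def is_ef1(relevance, allocation):
    """Check that no customer envies another after removing one item."""
    for customer, scores in relevance.items():
        own_value = sum(scores[item] for item in allocation[customer])
        for other, bundle in allocation.items():
            if other == customer or not bundle:
                continue
            other_value = sum(scores[item] for item in bundle)
            most_valued = max(scores[item] for item in bundle)
            if own_value < other_value - most_valued:
                return False
    return True


# Hypothetical relevance scores of four products for two customers.
relevance = {
    "u1": {"p1": 0.9, "p2": 0.8, "p3": 0.1, "p4": 0.2},
    "u2": {"p1": 0.7, "p2": 0.6, "p3": 0.5, "p4": 0.4},
}
allocation = round_robin(relevance, ["p1", "p2", "p3", "p4"], picks_per_customer=2)
print(allocation, is_ef1(relevance, allocation))
```

Because every product ends up in some recommendation list, each producer also receives some exposure, which hints at why an allocation view helps both sides.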

Factorizing Historical User Actions for Next-Day Purchase Prediction

ACM Transactions on the Web ◽ 10.1145/3468227 ◽ 2022 ◽ Vol 16 (1) ◽ pp. 1-26

Author(s): Bang Liu ◽ Hanlin Zhang ◽ Linglong Kong ◽ Di Niu

Keyword(s): User Behavior ◽ Purchase Behavior ◽ Time Decay ◽ Music Recommendation ◽ Unified Framework ◽ Transaction Data ◽ Proposed Model ◽ Recommendation Algorithms ◽ Real World Datasets ◽ User Actions

It is common practice for many large e-commerce operators to analyze daily logged transaction data to predict customer purchase behavior, which may potentially lead to more effective recommendations and increased sales. Traditional recommendation techniques based on collaborative filtering, although having gained success in video and music recommendation, are not sufficient to fully leverage the diverse information contained in the implicit user behavior on e-commerce platforms. In this article, we analyze user action records in the Alibaba Mobile Recommendation dataset from the Alibaba Tianchi Data Lab, as well as the Retailrocket recommender system dataset from the Retail Rocket website. To estimate the probability that a user will purchase a certain item tomorrow, we propose a new model called Time-decayed Multifaceted Factorizing Personalized Markov Chains (Time-decayed Multifaceted-FPMC), taking into account multiple types of user historical actions not only limited to past purchases but also including various behaviors such as clicks, collects and add-to-carts. Our model also considers the time-decay effect of the influence of past actions. To learn the parameters in the proposed model, we further propose a unified framework named Bayesian Sparse Factorization Machines. It generalizes the theory of traditional Factorization Machines to a more flexible learning structure and trains the Time-decayed Multifaceted-FPMC with the Markov Chain Monte Carlo method. Extensive evaluations based on multiple real-world datasets demonstrate that our proposed approaches significantly outperform various existing purchase recommendation algorithms.
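The time-decay effect mentioned above can be illustrated with a short sketch. The exponential form, the half-life, and the per-action weights are all assumptions; the abstract says only that the influence of past actions decays over time and that clicks, collects, add-to-carts, and purchases are all taken into account.

```python
import math


def decayed_score(actions, today, action_weights, half_life_days=3.0):
    """Sum action weights, halving each action's influence every half-life.

    `actions` is a list of (action_type, day_index) pairs for one
    user-item pair; the layout is hypothetical.
    """
    rate = math.log(2) / half_life_days
    return sum(
        action_weights[kind] * math.exp(-rate * (today - day))
        for kind, day in actions
    )


# Hypothetical weights: stronger purchase intent gets a larger weight.
action_weights = {"click": 1.0, "collect": 2.0, "cart": 3.0, "purchase": 4.0}

recent_history = [("click", 9), ("cart", 10)]
stale_history = [("click", 1), ("cart", 2)]
recent_score = decayed_score(recent_history, today=10, action_weights=action_weights)
stale_score = decayed_score(stale_history, today=10, action_weights=action_weights)
print(recent_score, stale_score)
```

A score of this kind could serve as one input feature for a next-day purchase model; in the paper the decay is part of a factorization model rather than a hand-set formula.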

Topic-aware Incentive Mechanism for Task Diffusion in Mobile Crowdsourcing through Social Network

ACM Transactions on Internet Technology ◽ 10.1145/3487580 ◽ 2022 ◽ Vol 22 (1) ◽ pp. 1-23

Author(s): Jia Xu ◽ Yuanhang Zhou ◽ Gongyu Chen ◽ Yuqing Ding ◽ Dejun Yang ◽ ...

Keyword(s): Social Network ◽ Diffusion Model ◽ Large Scale ◽ Incentive Mechanism ◽ Estimation Algorithm ◽ Task Completion ◽ Incentive Mechanisms ◽ Mobile Crowdsourcing ◽ Real World Datasets ◽ Budget Feasible

Crowdsourcing has become an efficient paradigm for utilizing human intelligence to perform tasks that are challenging for machines. Many incentive mechanisms for crowdsourcing systems have been proposed. However, most existing incentive mechanisms assume that there are sufficient participants to perform crowdsourcing tasks. In large-scale crowdsourcing scenarios, this assumption may not be applicable. To address this issue, we diffuse the crowdsourcing tasks in a social network to increase the number of participants. To make the task diffusion more applicable to crowdsourcing systems, we enhance the classic Independent Cascade model so that the influence is strongly connected with both the types and topics of tasks. Based on the tailored task diffusion model, we formulate the Budget Feasible Task Diffusion (BFTD) problem for maximizing the value function of the platform with a constrained budget. We design a parameter estimation algorithm based on the Expectation Maximization algorithm to estimate the parameters in the proposed task diffusion model. Benefitting from the submodular property of the objective function, we apply the budget-feasible incentive mechanism, which satisfies the desirable properties of computational efficiency, individual rationality, budget feasibility, truthfulness, and guaranteed approximation, to stimulate the task diffusers. The simulation results based on two real-world datasets show that our incentive mechanism can improve the number of active users and the task completion rate by 9.8% and 11%, on average.
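The Independent Cascade model that the paper enhances can be sketched as a simple simulation. The topic-dependent activation probability below is a hypothetical stand-in; the abstract does not specify how task types and topics enter the model, and the graph, interests, and probabilities are made up for illustration.

```python
import random


def independent_cascade(graph, seeds, activation_prob, rng):
    """Run one Independent Cascade diffusion and return the activated nodes.

    Each newly activated node gets exactly one chance to activate each of
    its inactive neighbors, succeeding with activation_prob(src, dst).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                if neighbor not in active and rng.random() < activation_prob(node, neighbor):
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, activation_prob, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated users."""
    rng = random.Random(seed)
    total = sum(
        len(independent_cascade(graph, seeds, activation_prob, rng))
        for _ in range(runs)
    )
    return total / runs


# Hypothetical social graph, user interests, and task topics.
social = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
interests = {"b": {"traffic"}, "c": {"photos"}, "d": {"traffic", "photos"}}
task_topics = {"traffic"}


def topic_prob(_src, dst):
    """Assumed rule: users interested in a task topic are easier to activate."""
    return 0.9 if interests.get(dst, set()) & task_topics else 0.1


spread = expected_spread(social, ["a"], topic_prob)
print(spread)
```

An incentive mechanism would then choose which diffusers to pay so that a spread estimate like this is maximized within the budget.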

Multiple Graphs and Low-Rank Embedding for Multi-Source Heterogeneous Domain Adaptation

ACM Transactions on Knowledge Discovery from Data ◽ 10.1145/3492804 ◽ 2022 ◽ Vol 16 (4) ◽ pp. 1-25

Author(s): Hanrui Wu ◽ Michael K. Ng

Keyword(s): Domain Adaptation ◽ Low Rank ◽ Multiple Sources ◽ Target Domain ◽ Structure Information ◽ Learning Procedure ◽ Original Target ◽ Real World Datasets ◽ Iterative Optimization Algorithm ◽ Multiple Domains

Multi-source domain adaptation is a challenging topic in transfer learning, especially when the data of each domain are represented by different kinds of features, i.e., Multi-source Heterogeneous Domain Adaptation (MHDA). It is important to take advantage of the knowledge extracted from multiple sources as well as bridge the heterogeneous spaces for handling the MHDA paradigm. This article proposes a novel method named Multiple Graphs and Low-rank Embedding (MGLE), which models the local structure information of multiple domains using multiple graphs and learns the low-rank embedding of the target domain. Then, MGLE augments the learned embedding with the original target data. Specifically, we introduce the modules of both domain discrepancy and domain relevance into the multiple graphs and low-rank embedding learning procedure. Subsequently, we develop an iterative optimization algorithm to solve the resulting problem. We evaluate the effectiveness of the proposed method on several real-world datasets. Promising results show that the performance of MGLE is better than that of the baseline methods in terms of several metrics, such as AUC, MAE, accuracy, precision, F1 score, and MCC, demonstrating the effectiveness of the proposed method.
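Two of the evaluation metrics named above, the F1 score and the Matthews correlation coefficient (MCC), can be computed from confusion counts as in the sketch below. The labels are made-up toy data, not results from the paper, and the function names are chosen for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for truth, guess in pairs if truth == 1 and guess == 1)
    tn = sum(1 for truth, guess in pairs if truth == 0 and guess == 0)
    fp = sum(1 for truth, guess in pairs if truth == 0 and guess == 1)
    fn = sum(1 for truth, guess in pairs if truth == 1 and guess == 0)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Toy labels for a target-domain test set (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f1_score(tp, fp, fn), mcc(tp, tn, fp, fn))
```

Unlike accuracy, MCC uses all four confusion counts, which makes it informative when the target-domain classes are imbalanced.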

Exploiting Heterogeneous Graph Neural Networks with Latent Worker/Task Correlation Information for Label Aggregation in Crowdsourcing

ACM Transactions on Knowledge Discovery from Data ◽ 10.1145/3460865 ◽ 2022 ◽ Vol 16 (2) ◽ pp. 1-18

Author(s): Hanlu Wu ◽ Tengfei Ma ◽ Lingfei Wu ◽ Fangli Xu ◽ Shouling Ji

Keyword(s): Neural Network ◽ Neural Networks ◽ State Of The Art ◽ Superior Performance ◽ True Label ◽ Label Aggregation ◽ Correlation Information ◽ Real World Datasets ◽ Graph Neural Networks ◽ High Level

Crowdsourcing has attracted much attention for its convenience in collecting labels from non-expert workers instead of experts. However, due to the high level of noise from the non-experts, a label aggregation model that infers the true label from noisy crowdsourced labels is required. In this article, we propose a novel framework based on graph neural networks for aggregating crowd labels. We construct a heterogeneous graph between workers and tasks and derive a new graph neural network to learn the representations of nodes and the true labels. Besides, we exploit the unknown latent interaction between nodes of the same type (workers or tasks) by adding a homogeneous attention layer in the graph neural networks. Experimental results on 13 real-world datasets show superior performance over state-of-the-art models.
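To make the label aggregation task concrete, the sketch below implements majority voting, the simplest baseline for inferring true labels from noisy crowd labels. It is not the graph neural network proposed in the paper; the (worker, task, label) triple layout and the tie-breaking rule are assumptions for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(crowd_labels):
    """Infer one label per task from (worker, task, label) triples.

    Ties are broken by the lexicographically smallest label so the
    result is deterministic.
    """
    votes = defaultdict(Counter)
    for _worker, task, label in crowd_labels:
        votes[task][label] += 1
    return {
        task: min(counts, key=lambda label: (-counts[label], str(label)))
        for task, counts in votes.items()
    }


# Hypothetical crowd labels: worker w3 disagrees on task t1.
crowd_labels = [
    ("w1", "t1", "cat"),
    ("w2", "t1", "cat"),
    ("w3", "t1", "dog"),
    ("w1", "t2", "dog"),
    ("w3", "t2", "dog"),
]
aggregated = majority_vote(crowd_labels)
print(aggregated)
```

Majority voting treats every worker as equally reliable; the same triples also define the worker-task edges of the heterogeneous graph that a learned model can exploit to weigh workers differently.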
