Fairwalk: Towards Fair Graph Embedding

Author(s):  
Tahleen Rahman ◽  
Bartlomiej Surma ◽  
Michael Backes ◽  
Yang Zhang

Graph embeddings have gained huge popularity in recent years as a powerful tool for analyzing social networks. However, no prior work has studied the potential bias inherent in graph embedding. In this paper, we make a first attempt in this direction. In particular, we concentrate on the fairness of node2vec, a popular graph embedding method. Our analyses on two real-world datasets demonstrate the existence of bias in node2vec when it is used for friendship recommendation. We therefore propose a fairness-aware embedding method, Fairwalk, which extends node2vec. Experimental results demonstrate that Fairwalk reduces bias under multiple fairness metrics while still preserving utility.
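
The abstract does not spell out how Fairwalk changes the node2vec walk. As a rough sketch only, assume the fairness-aware step first picks a sensitive-attribute group uniformly among the groups present in the current node's neighborhood, and then picks a neighbor uniformly within that group. This ignores node2vec's return and in-out biasing. The names `fair_step`, `fair_walk`, `graph`, and `group_of` are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict

def fair_step(graph, group_of, node, rng):
    """Pick the next node: uniform over neighbor groups, then uniform within the group."""
    by_group = defaultdict(list)
    for nbr in graph[node]:
        by_group[group_of[nbr]].append(nbr)
    if not by_group:
        return None  # dead end: the node has no neighbors
    # Sort the group keys so the draw is reproducible for a given seed.
    group = rng.choice(sorted(by_group))
    return rng.choice(by_group[group])

def fair_walk(graph, group_of, start, length, seed=0):
    """Generate one walk of at most `length` nodes starting at `start`."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        nxt = fair_step(graph, group_of, walk[-1], rng)
        if nxt is None:
            break
        walk.append(nxt)
    return walk

# Toy star graph (assumed data): node "a" has three neighbors in group "x"
# and one neighbor in group "y".
graph = {
    "a": ["b", "c", "d", "e"],
    "b": ["a"], "c": ["a"], "d": ["a"], "e": ["a"],
}
group_of = {"a": "x", "b": "x", "c": "x", "d": "x", "e": "y"}

print(fair_walk(graph, group_of, "a", length=7, seed=3))
```

Under this assumption, the lone neighbor from group "y" is visited from "a" about half the time, whereas a uniform walk over neighbors would visit it a quarter of the time. The resulting walks would then be fed to a skip-gram model as in node2vec.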

Author(s):  
Hao Wang ◽  
Huawei Shen ◽  
Wentao Ouyang ◽  
Xueqi Cheng

Point-of-interest (POI) recommendation, i.e., recommending unvisited POIs for users, is a fundamental problem for location-based social networks. POI recommendation distinguishes itself from traditional item recommendation, e.g., movie recommendation, via geographical influence among POIs. Existing methods model the geographical influence between two POIs as the probability or propensity that the two POIs are co-visited by the same user given their physical distance. These methods assume that geographical influence between POIs is determined by their physical distance, failing to capture the asymmetry of geographical influence and the high variation of geographical influence across POIs. In this paper, we exploit POI-specific geographical influence to improve POI recommendation. We model the geographical influence between two POIs using three factors: the geo-influence of a POI, the geo-susceptibility of a POI, and their physical distance. Geo-influence captures a POI's capacity to exert geographical influence on other POIs, and geo-susceptibility reflects a POI's propensity to be geographically influenced by other POIs. Experimental results on two real-world datasets demonstrate that POI-specific geographical influence significantly improves the performance of POI recommendation.


2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Xiaoye Li ◽  
Jing Yang ◽  
Zhenlong Sun ◽  
Jianpei Zhang

Aiming to provide more information about the behaviors between groups or patterns between clusters in social networks, we propose a two-step differentially private method to release the distribution of clustering coefficients across communities. The DPLM algorithm improves the Louvain method to partition a network using an exponential mechanism. We introduce an absolute gain of modularity to sanitize neighboring communities; without it, the algorithm has difficulty converging because of the randomness introduced. The DPCC algorithm charts the noisy distribution of clustering coefficients as a histogram, which presents the results in an intuitive manner. We conduct experiments on three real-world datasets to evaluate the proposed method. The experimental results indicate that the proposed method provides valuable distribution results while guaranteeing ε-differential privacy. Moreover, the DPLM algorithm can obtain better modularity for the networks.
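
The abstract names the exponential mechanism but gives no parameters. The following is a generic sketch of that mechanism, not the DPLM algorithm itself: a candidate is sampled with probability proportional to exp(ε · u / (2Δ)), where u is its utility and Δ is the sensitivity of the utility function. Treating modularity gain as the utility, the numeric gains, the sensitivity value, and the function names are all assumptions made for illustration.

```python
import math
import random

def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    """Return the sampling probability of each candidate under the exponential mechanism."""
    # Subtracting the maximum utility keeps exp() numerically stable
    # and does not change the normalized probabilities.
    top = max(utilities.values())
    weights = {
        c: math.exp(epsilon * (u - top) / (2.0 * sensitivity))
        for c, u in utilities.items()
    }
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def exponential_mechanism_sample(utilities, epsilon, sensitivity, rng):
    """Draw one candidate according to the exponential-mechanism probabilities."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]

# Hypothetical modularity gains for moving a node into each neighboring community.
gains = {"community_A": 0.30, "community_B": 0.10, "community_C": 0.05}
probs = exponential_mechanism_probs(gains, epsilon=1.0, sensitivity=0.1)
choice = exponential_mechanism_sample(gains, 1.0, 0.1, random.Random(0))
print(probs, choice)
```

A smaller ε flattens the probabilities toward uniform (more privacy, more randomness), which is consistent with the convergence difficulty the abstract mentions.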


Data ◽  
2020 ◽  
Vol 6 (1) ◽  
pp. 1
Author(s):  
Ahmed Elmogy ◽  
Hamada Rizk ◽  
Amany M. Sarhan

In data mining, outlier detection is a major challenge, as it plays an important role in many applications such as medical data, image processing, fraud detection, and intrusion detection. A wide variety of clustering-based approaches have been developed to detect outliers. However, they are by nature time-consuming, which restricts their use in real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering-based outlier detection framework, On the Fly Clustering Based Outlier Detection (OFCOD), is presented. OFCOD enables analysts to find outliers effectively, in time and on request, even within huge datasets. The proposed framework has been tested and evaluated using two real-world datasets with different features and applications: one with 699 records and another with five million records. The experimental results show that the proposed framework outperforms other existing approaches across several evaluation metrics.


2021 ◽  
Vol 15 (3) ◽  
pp. 1-33
Author(s):  
Wenjun Jiang ◽  
Jing Chen ◽  
Xiaofei Ding ◽  
Jie Wu ◽  
Jiawei He ◽  
...  

In online systems, including e-commerce platforms, many users rely on the reviews or comments generated by previous consumers for decision making, but they have limited time to read many reviews. Therefore, a review summary that contains all important features in user-generated reviews is desirable. In this article, we study “how to generate a comprehensive review summary from a large number of user-generated reviews.” This can be implemented by text summarization, which has two main types: extractive and abstractive approaches. Both approaches can handle supervised and unsupervised scenarios, but the former may generate redundant and incoherent summaries, while the latter can avoid redundancy but usually can only deal with short sequences. Moreover, both approaches may neglect sentiment information. To address these issues, we propose comprehensive Review Summary Generation frameworks for the supervised and unsupervised scenarios. We design two different preprocessing models, re-ranking and selecting, to identify the important sentences while keeping users’ sentiment in the original reviews. These sentences can then be used to generate review summaries with text summarization methods. Experimental results on seven real-world datasets (Idebate, Rotten Tomatoes, Amazon, Yelp, and three unlabelled product review datasets in Amazon) demonstrate that our work performs well in review summary generation. Moreover, the re-ranking and selecting models show different characteristics.


2020 ◽  
Vol 34 (04) ◽  
pp. 6837-6844
Author(s):  
Xiaojin Zhang ◽  
Honglei Zhuang ◽  
Shengyu Zhang ◽  
Yuan Zhou

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. Distinct from the traditional TBP, the threshold is defined as a function of the rewards of all the arms, which is motivated by the criterion for identifying outliers. The learner needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in the previous rounds. Furthermore, by automatically trading off exploring the individual arms and exploring the outlier threshold, we provide an efficient algorithm in terms of the sample complexity. Experimental results on both synthetic datasets and real-world datasets demonstrate the efficiency of our algorithm.
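
The abstract does not state which function of the arm rewards defines the threshold, nor the adaptive sampling rule. As a naive baseline sketch only, assume the threshold is the mean of the arms' expected rewards plus k population standard deviations, and assume every arm is pulled the same number of times (so there is no adaptive trade-off between exploring arms and exploring the threshold). The names `estimate_means` and `outlier_arms`, the Gaussian noise model, and all numbers are hypothetical.

```python
import random
import statistics

def estimate_means(true_means, pulls, noise_sd, seed=0):
    """Estimate each arm's mean reward by pulling it `pulls` times (uniform sampling)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, noise_sd) for _ in range(pulls))
        for mu in true_means
    ]

def outlier_arms(means, k):
    """Return indices of arms whose mean exceeds mean(means) + k * pstdev(means)."""
    threshold = statistics.fmean(means) + k * statistics.pstdev(means)
    return [i for i, m in enumerate(means) if m > threshold]

# Assumed toy instance: four ordinary arms and one clearly higher arm.
true_means = [0.10, 0.12, 0.09, 0.11, 0.90]
estimates = estimate_means(true_means, pulls=2000, noise_sd=0.1, seed=0)
print(outlier_arms(estimates, k=1.5))
```

Because the threshold depends on every arm's estimate, an error in any single arm shifts the threshold for all of them, which is why the abstract treats the threshold as something that must itself be explored.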


2013 ◽  
Vol 24 (04) ◽  
pp. 1350022 ◽  
Author(s):  
DA-CHENG NIE ◽  
MING-JING DING ◽  
YAN FU ◽  
JUN-LIN ZHOU ◽  
ZI-KE ZHANG

Recommender systems have developed rapidly and successfully. Such systems aim to help users find relevant items from a potentially overwhelming set of choices. However, most existing recommender algorithms focus on traditional user-item similarity computation rather than incorporating social interests into the recommender system. Each user has their own preference field and may influence their friends' preferences within that field of expertise, which in turn shapes the friends' interest in collecting items. To model this social interest, we propose a simple method that computes users' social interest in specific items and then integrates this social interest with similarity preference. Experimental results on two real-world datasets, Epinions and Friendfeed, show that this method can significantly improve not only the algorithmic precision-accuracy but also the diversity-accuracy.


Author(s):  
Jie Liu ◽  
Zhicheng He ◽  
Yalou Huang

Hashtags have always been important elements of many social network platforms and micro-blog services. Semantic understanding of hashtags is a critical and fundamental task for many applications on social networks, such as event analysis, theme discovery, and information retrieval. However, this task is challenging due to the sparsity, polysemy, and synonymy of hashtags. In this paper, we investigate the problem of hashtag embedding by combining the short text content with the various heterogeneous relations in social networks. Specifically, we first establish a network with hashtags as its nodes. Hierarchically, each hashtag node is associated with a set of tweets, and each tweet contains a set of words. Then we devise an embedding model, called Hashtag2Vec, which exploits hashtag-hashtag, hashtag-tweet, tweet-word, and word-word relations based on the hierarchical heterogeneous network. In addition to embedding the hashtags, our proposed framework is capable of embedding the short social texts as well. Extensive experiments are conducted on two real-world datasets, and the results demonstrate the effectiveness of the proposed method.


Author(s):  
Bogumił Kamiński ◽  
Paweł Prałat ◽  
François Théberge

Graph embedding is the transformation of the vertices of a graph into a set of vectors. A good embedding should capture the graph topology, vertex-to-vertex relationships, and other relevant information about the graph, its subgraphs, and its vertices. If these objectives are achieved, an embedding is a meaningful, understandable, and compressed representation of a network. Finally, vector operations are simpler and faster than comparable operations on graphs. The main challenge is ensuring that embeddings describe the properties of the graphs well. In particular, a decision has to be made on the embedding dimensionality, which strongly affects the quality of an embedding. As a result, selecting the best embedding is a challenging task and very often requires domain experts. In this article, we propose a ‘divergence score’ that can be assigned to embeddings to help distinguish good ones from bad ones. This general framework provides a tool for unsupervised graph embedding comparison. To achieve this, we needed to generalize the well-known Chung-Lu model to incorporate geometry, which is an interesting result in its own right. To test our framework, we ran a number of experiments with synthetic networks as well as real-world networks, and with various embedding algorithms.
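
For orientation, the classical Chung-Lu model (without the geometric generalization, which the abstract does not detail) connects vertices i and j with probability w_i · w_j / Σ_k w_k, where w_i is the target expected degree of vertex i. The sketch below covers only this classical form. The function name, the cap at 1, and the example weights are illustrative assumptions.

```python
def chung_lu_edge_probs(weights):
    """Edge probabilities p[i][j] = w_i * w_j / sum(w) of the classical Chung-Lu model."""
    total = sum(weights)
    n = len(weights)
    # The cap at 1.0 only matters when some w_i * w_j exceeds sum(w).
    return [
        [min(1.0, weights[i] * weights[j] / total) for j in range(n)]
        for i in range(n)
    ]

# Assumed expected degrees for a four-vertex toy graph.
weights = [2.0, 2.0, 1.0, 1.0]
probs = chung_lu_edge_probs(weights)

# With self-loops counted and no capping, each row sums to the vertex's weight,
# so the expected degree of vertex i equals w_i.
row_sums = [sum(row) for row in probs]
print(row_sums)
```

This is what makes the model a natural null model: it reproduces the expected degree sequence while leaving everything else random.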


Author(s):  
Feiping Nie ◽  
Jing Li ◽  
Xuelong Li

In multiview learning, it is essential to assign a reasonable weight to each view according to its importance. Thus, for the multiview clustering task, a wise and elegant method should cluster multiview data while learning the view weights. In this paper, we address this problem by exploring a Laplacian rank constrained graph, which can be approximated as the centroid of the graphs built for each view with different confidences. We start our work with the natural thought that the weights can be learned by introducing a hyperparameter. By analyzing the weakness of this approach, we further propose a new multiview clustering method that is fully self-weighted. Furthermore, once the target graph is obtained in our models, we can directly assign the cluster label to each data point and do not need any postprocessing such as $K$-means in standard spectral clustering. Evaluations on two synthetic datasets demonstrate the effectiveness of our methods. Compared with several representative graph-based multiview clustering approaches on four real-world datasets, experimental results demonstrate that the proposed methods achieve better performance and that our new clustering method is more practical to use.
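
The reason no $K$-means step is needed can be illustrated with a standard fact: the multiplicity of the zero eigenvalue of a graph Laplacian equals the number of connected components, so a graph whose Laplacian has rank n − c splits into exactly c components. Assuming the learned target graph is available as a nonnegative affinity matrix, labels can be read directly from its components. The sketch below shows only this last step, not the optimization that learns the graph. `component_labels` and the example matrix are hypothetical.

```python
from collections import deque

def component_labels(adj, tol=1e-12):
    """Label each vertex with the index of its connected component (breadth-first search)."""
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already reached from an earlier start vertex
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                # Treat the graph as undirected: an edge in either direction connects u and v.
                if labels[v] == -1 and (adj[u][v] > tol or adj[v][u] > tol):
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels

# Assumed affinity matrix with two blocks, i.e. exactly two connected components.
target_graph = [
    [0.0, 0.9, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.0],
]
print(component_labels(target_graph))  # [0, 0, 1, 1, 1]
```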


Author(s):  
Chunyang Ruan ◽  
Jiangang Ma ◽  
Ye Wang ◽  
Yanchun Zhang ◽  
Yun Yang

Regularities analysis for prescriptions is a significant task for traditional Chinese medicine (TCM), both in the inheritance of clinical experience and in the improvement of clinical quality. Recently, many methods have been proposed for regularities discovery, but this task is challenging due to the quantity, sparsity, and free-style nature of prescriptions. In this paper, we address the specific problem of regularities discovery and propose a graph embedding based framework for regularities discovery in massive prescriptions. We model this task as relation prediction, in which the correlation between two herbs or between a herb and a symptom is incorporated to characterize the different relationships. Specifically, we first establish a heterogeneous network with herbs and symptoms as its nodes. We then develop a bipartite embedding model, termed HS2Vec, to detect regularities, which explores the herb-herb and herb-symptom relations based on the heterogeneous network. Experiments on four real-world datasets demonstrate that the proposed framework is very effective for regularities discovery.

