ACM Transactions on the Web
Latest Publications

Many online platforms today (such as Amazon, Netflix, Spotify, LinkedIn, and AirBnB) can be thought of as two-sided markets with producers and customers of goods and services. Traditionally, recommendation services in these platforms have focused on maximizing customer satisfaction by tailoring the results according to the personalized preferences of individual customers. However, our investigation reinforces the fact that such customer-centric design of these services may lead to unfair distribution of exposure to the producers, which may adversely impact their well-being. However, a pure producer-centric design might become unfair to the customers. As more and more people are depending on such platforms to earn a living, it is important to ensure fairness to both producers and customers. In this work, by mapping a fair personalized recommendation problem to a constrained version of the problem of fairly allocating indivisible goods, we propose to provide fairness guarantees for both sides. Formally, our proposed FairRec algorithm guarantees Maxi-Min Share of exposure for the producers, and Envy-Free up to One Item fairness for the customers. Extensive evaluations over multiple real-world datasets show the effectiveness of FairRec in ensuring two-sided fairness while incurring a marginal loss in overall recommendation quality. Finally, we present a modification of FairRec (named as FairRecPlus ) that at the cost of additional computation time, improves the recommendation performance for the customers, while maintaining the same fairness guarantees.

Download Full-text

Queryable Compression on Time-evolving Web and Social Networks with Streaming

ACM Transactions on the Web ◽

10.1145/3495012 ◽

2022 ◽

Vol 16 (2) ◽

pp. 1-21

Author(s):

Michael Nelson ◽

Sridhar Radhakrishnan ◽

Chandra Sekharan ◽

Amlan Chatterjee ◽

Sudhindra Gopal Krishna

Keyword(s):

Data Structure ◽

Adjacency Matrix ◽

Binary Tree ◽

Network Data ◽

Data Sets ◽

Undirected Graphs ◽

Sparse Graphs ◽

Compressed Data ◽

Over Time ◽

Better Than

Time-evolving web and social network graphs are modeled as a set of pages/individuals (nodes) and their arcs (links/relationships) that change over time. Due to their popularity, they have become increasingly massive in terms of their number of nodes, arcs, and lifetimes. However, these graphs are extremely sparse throughout their lifetimes. For example, it is estimated that Facebook has over a billion vertices, yet at any point in time, it has far less than 0.001% of all possible relationships. The space required to store these large sparse graphs may not fit in most main memories using underlying representations such as a series of adjacency matrices or adjacency lists. We propose building a compressed data structure that has a compressed binary tree corresponding to each row of each adjacency matrix of the time-evolving graph. We do not explicitly construct the adjacency matrix, and our algorithms take the time-evolving arc list representation as input for its construction. Our compressed structure allows for directed and undirected graphs, faster arc and neighborhood queries, as well as the ability for arcs and frames to be added and removed directly from the compressed structure (streaming operations). We use publicly available network data sets such as Flickr, Yahoo!, and Wikipedia in our experiments and show that our new technique performs as well or better than our benchmarks on all datasets in terms of compression size and other vital metrics.

Download Full-text

Factorizing Historical User Actions for Next-Day Purchase Prediction

ACM Transactions on the Web ◽

10.1145/3468227 ◽

2022 ◽

Vol 16 (1) ◽

pp. 1-26

Author(s):

Bang Liu ◽

Hanlin Zhang ◽

Linglong Kong ◽

Di Niu

Keyword(s):

User Behavior ◽

Purchase Behavior ◽

Time Decay ◽

Music Recommendation ◽

Unified Framework ◽

Transaction Data ◽

Proposed Model ◽

Recommendation Algorithms ◽

Real World Datasets ◽

User Actions

It is common practice for many large e-commerce operators to analyze daily logged transaction data to predict customer purchase behavior, which may potentially lead to more effective recommendations and increased sales. Traditional recommendation techniques based on collaborative filtering, although having gained success in video and music recommendation, are not sufficient to fully leverage the diverse information contained in the implicit user behavior on e-commerce platforms. In this article, we analyze user action records in the Alibaba Mobile Recommendation dataset from the Alibaba Tianchi Data Lab, as well as the Retailrocket recommender system dataset from the Retail Rocket website. To estimate the probability that a user will purchase a certain item tomorrow, we propose a new model called Time-decayed Multifaceted Factorizing Personalized Markov Chains (Time-decayed Multifaceted-FPMC), taking into account multiple types of user historical actions not only limited to past purchases but also including various behaviors such as clicks, collects and add-to-carts. Our model also considers the time-decay effect of the influence of past actions. To learn the parameters in the proposed model, we further propose a unified framework named Bayesian Sparse Factorization Machines. It generalizes the theory of traditional Factorization Machines to a more flexible learning structure and trains the Time-decayed Multifaceted-FPMC with the Markov Chain Monte Carlo method. Extensive evaluations based on multiple real-world datasets demonstrate that our proposed approaches significantly outperform various existing purchase recommendation algorithms.

Download Full-text

A Large-scale Empirical Analysis of Browser Fingerprints Properties for Web Authentication

ACM Transactions on the Web ◽

10.1145/3478026 ◽

2022 ◽

Vol 16 (1) ◽

pp. 1-62

Author(s):

Nampoina Andriamilanto ◽

Tristan Allard ◽

Gaëtan Le Guelvouit ◽

Alexandre Garel

Keyword(s):

Error Rate ◽

Large Scale ◽

Web Security ◽

Biometric Authentication ◽

Equal Error Rate ◽

Current State ◽

Large Scale Dataset ◽

Time Partitioning ◽

Digital Fingerprints ◽

Collection Time

Modern browsers give access to several attributes that can be collected to form a browser fingerprint. Although browser fingerprints have primarily been studied as a web tracking tool, they can contribute to improve the current state of web security by augmenting web authentication mechanisms. In this article, we investigate the adequacy of browser fingerprints for web authentication. We make the link between the digital fingerprints that distinguish browsers, and the biological fingerprints that distinguish Humans, to evaluate browser fingerprints according to properties inspired by biometric authentication factors. These properties include their distinctiveness, their stability through time, their collection time, their size, and the accuracy of a simple verification mechanism. We assess these properties on a large-scale dataset of 4,145,408 fingerprints composed of 216 attributes and collected from 1,989,365 browsers. We show that, by time-partitioning our dataset, more than 81.3% of our fingerprints are shared by a single browser. Although browser fingerprints are known to evolve, an average of 91% of the attributes of our fingerprints stay identical between two observations, even when separated by nearly six months. About their performance, we show that our fingerprints weigh a dozen of kilobytes and take a few seconds to collect. Finally, by processing a simple verification mechanism, we show that it achieves an equal error rate of 0.61%. We enrich our results with the analysis of the correlation between the attributes and their contribution to the evaluated properties. We conclude that our browser fingerprints carry the promise to strengthen web authentication mechanisms.

Download Full-text

Context-aware Distance Measures for Dynamic Networks

ACM Transactions on the Web ◽

10.1145/3476228 ◽

2022 ◽

Vol 16 (1) ◽

pp. 1-34

Author(s):

Yiji Zhao ◽

Youfang Lin ◽

Zhihao Wu ◽

Yang Wang ◽

Haomin Wen

Keyword(s):

Large Scale ◽

Dynamic Networks ◽

Distance Measures ◽

General Definition ◽

Mathematical Representation ◽

Network Clustering ◽

Context Aware ◽

Network Distance ◽

Practical Applications ◽

Spectral Distance

Dynamic networks are widely used in the social, physical, and biological sciences as a concise mathematical representation of the evolving interactions in dynamic complex systems. Measuring distances between network snapshots is important for analyzing and understanding evolution processes of dynamic systems. To the best of our knowledge, however, existing network distance measures are designed for static networks. Therefore, when measuring the distance between any two snapshots in dynamic networks, valuable context structure information existing in other snapshots is ignored. To guide the construction of context-aware distance measures, we propose a context-aware distance paradigm, which introduces context information to enrich the connotation of the general definition of network distance measures. A Context-aware Spectral Distance (CSD) is then given as an instance of the paradigm by constructing a context-aware spectral representation to replace the core component of traditional Spectral Distance (SD). In a node-aligned dynamic network, the context effectively helps CSD gain mainly advantages over SD as follows: (1) CSD is not affected by isospectral problems; (2) CSD satisfies all the requirements of a metric, while SD cannot; and (3) CSD is computationally efficient. In order to process large-scale networks, we develop a kCSD that computes top- k eigenvalues to further reduce the computational complexity of CSD. Although kCSD is a pseudo-metric, it retains most of the advantages of CSD. Experimental results in two practical applications, i.e., event detection and network clustering in dynamic networks, show that our context-aware spectral distance performs better than traditional spectral distance in terms of accuracy, stability, and computational efficiency. In addition, context-aware spectral distance outperforms other baseline methods.

Download Full-text

How Do Home Computer Users Browse the Web?

ACM Transactions on the Web ◽

10.1145/3473343 ◽

2022 ◽

Vol 16 (1) ◽

pp. 1-27

Author(s):

Kyle Crichton ◽

Nicolas Christin ◽

Lorrie Faith Cranor

Keyword(s):

Internet Use ◽

Computer Users ◽

Future Research ◽

Small Scale ◽

The Internet ◽

Home Computer ◽

Security Behavior ◽

Browsing Behavior ◽

The Web ◽

Analyze Data

With the ubiquity of web tracking, information on how people navigate the internet is abundantly collected yet, due to its proprietary nature, rarely distributed. As a result, our understanding of user browsing primarily derives from small-scale studies conducted more than a decade ago. To provide an broader updated perspective, we analyze data from 257 participants who consented to have their home computer and browsing behavior monitored through the Security Behavior Observatory. Compared to previous work, we find a substantial increase in tabbed browsing and demonstrate the need to include tab information for accurate web measurements. Our results confirm that user browsing is highly centralized, with 50% of internet use spent on 1% of visited websites. However, we also find that users spend a disproportionate amount of time on low-visited websites, areas with a greater likelihood of containing risky content. We then identify the primary gateways to these sites and discuss implications for future research.

Download Full-text

On the Aggression Diffusion Modeling and Minimization in Twitter

ACM Transactions on the Web ◽

10.1145/3486218 ◽

2022 ◽

Vol 16 (1) ◽

pp. 1-24

Author(s):

Marinos Poiitis ◽

Athena Vakali ◽

Nicolas Kourtellis

Keyword(s):

Online Social Networks ◽

Important Research ◽

Diffusion Modeling ◽

Propagation Modeling ◽

Twitter Data ◽

Edge Weighting ◽

Linear Threshold ◽

Important Research Topic ◽

The Social ◽

Independent Cascade

Aggression in online social networks has been studied mostly from the perspective of machine learning, which detects such behavior in a static context. However, the way aggression diffuses in the network has received little attention as it embeds modeling challenges. In fact, modeling how aggression propagates from one user to another is an important research topic, since it can enable effective aggression monitoring, especially in media platforms, which up to now apply simplistic user blocking techniques. In this article, we address aggression propagation modeling and minimization in Twitter, since it is a popular microblogging platform at which aggression had several onsets. We propose various methods building on two well-known diffusion models, Independent Cascade ( IC ) and Linear Threshold ( LT ), to study the aggression evolution in the social network. We experimentally investigate how well each method can model aggression propagation using real Twitter data, while varying parameters, such as seed users selection, graph edge weighting, users’ activation timing, and so on. It is found that the best performing strategies are the ones to select seed users with a degree-based approach, weigh user edges based on their social circles’ overlaps, and activate users according to their aggression levels. We further employ the best performing models to predict which ordinary real users could become aggressive (and vice versa) in the future, and achieve up to AUC = 0.89 in this prediction task. Finally, we investigate aggression minimization by launching competitive cascades to “inform” and “heal” aggressors. We show that IC and LT models can be used in aggression minimization, providing less intrusive alternatives to the blocking techniques currently employed by Twitter.

Download Full-text

A Large-scale Empirical Analysis of Ransomware Activities in Bitcoin

ACM Transactions on the Web ◽

10.1145/3494557 ◽

2022 ◽

Vol 16 (2) ◽

pp. 1-29

Author(s):

Kai Wang ◽

Jun Pang ◽

Dingjie Chen ◽

Yu Zhao ◽

Dapeng Huang ◽

...

Keyword(s):

Empirical Analysis ◽

Large Scale ◽

Classification Model ◽

Scale Analysis ◽

Clustering Method ◽

Fine Grained ◽

Large Scale Analysis ◽

The Impact ◽

Transfer Trajectories

Exploiting the anonymous mechanism of Bitcoin, ransomware activities demanding ransom in bitcoins have become rampant in recent years. Several existing studies quantify the impact of ransomware activities, mostly focusing on the amount of ransom. However, victims’ reactions in Bitcoin that can well reflect the impact of ransomware activities are somehow largely neglected. Besides, existing studies track ransom transfers at the Bitcoin address level, making it difficult for them to uncover the patterns of ransom transfers from a macro perspective beyond Bitcoin addresses. In this article, we conduct a large-scale analysis of ransom payments, ransom transfers, and victim migrations in Bitcoin from 2012 to 2021. First, we develop a fine-grained address clustering method to cluster Bitcoin addresses into users, which enables us to identify more addresses controlled by ransomware criminals. Second, motivated by the fact that Bitcoin activities and their participants already formed stable industries, such as Darknet and Miner , we train a multi-label classification model to identify the industry identifiers of users. Third, we identify ransom payment transactions and then quantify the amount of ransom and the number of victims in 63 ransomware activities. Finally, after we analyze the trajectories of ransom transferred across different industries and track victims’ migrations across industries, we find out that to obscure the purposes of their transfer trajectories, most ransomware criminals (e.g., operators of Locky and Wannacry) prefer to spread ransom into multiple industries instead of utilizing the services of Bitcoin mixers. Compared with other industries, Investment is highly resilient to ransomware activities in the sense that the number of users in Investment remains relatively stable. Moreover, we also observe that a few victims become active in the Darknet after paying ransom. Our findings in this work can help authorities deeply understand ransomware activities in Bitcoin. While our study focuses on ransomware, our methods are potentially applicable to other cybercriminal activities that have similarly adopted bitcoins as their payments.

Download Full-text

Measuring International Online Human Values with Word Embeddings

ACM Transactions on the Web ◽

10.1145/3501306 ◽

2022 ◽

Vol 16 (2) ◽

pp. 1-38

Author(s):

Gabriel Magno ◽

Virgilio Almeida

Keyword(s):

Social Sciences ◽

Domain Knowledge ◽

Human Values ◽

The Internet ◽

World Values Survey ◽

Values And Beliefs ◽

Highly Correlated ◽

Global Platform ◽

Online Sources ◽

Source Of Information

As the Internet grows in number of users and in the diversity of services, it becomes more influential on peoples lives. It has the potential of constructing or modifying the opinion, the mental perception, and the values of individuals. What is being created and published online is a reflection of people’s values and beliefs. As a global platform, the Internet is a great source of information for researching the online culture of many different countries. In this work we develop a methodology for measuring data from textual online sources using word embedding models, to create a country-based online human values index that captures cultural traits and values worldwide. Our methodology is applied with a dataset of 1.7 billion tweets, and then we identify their location among 59 countries. We create a list of 22 Online Values Inquiries (OVI) , each one capturing different questions from the World Values Survey, related to several values such as religion, science, and abortion. We observe that our methodology is indeed capable of capturing human values online for different counties and different topics. We also show that some online values are highly correlated (up to c = 0.69, p < 0.05) with the corresponding offline values, especially religion-related ones. Our method is generic, and we believe it is useful for social sciences specialists, such as demographers and sociologists, that can use their domain knowledge and expertise to create their own Online Values Inquiries, allowing them to analyze human values in the online environment.

Download Full-text

Cookie Banners and Privacy Policies: Measuring the Impact of the GDPR on the Web

ACM Transactions on the Web ◽

10.1145/3466722 ◽

2021 ◽

Vol 15 (4) ◽

pp. 1-42

Author(s):

Michael Kretschmer ◽

Jan Pennekamp ◽

Klaus Wehrle

Keyword(s):

Data Processing ◽

Data Protection ◽

Interface Design ◽

Service Providers ◽

Personal Data ◽

Online Services ◽

General Data Protection Regulation ◽

Opt Out ◽

Public Data ◽

The Impact

The General Data Protection Regulation (GDPR) is in effect since May of 2018. As one of the most comprehensive pieces of legislation concerning privacy, it sparked a lot of discussion on the effect it would have on users and providers of online services in particular, due to the large amount of personal data processed in this context. Almost three years later, we are interested in revisiting this question to summarize the impact this new regulation has had on actors in the World Wide Web. Using Scopus, we obtain a vast corpus of academic work to survey studies related to changes on websites since and around the time the GDPR went into force. Our findings show that the emphasis on privacy increased w.r.t. online services, but plenty potential for improvements remains. Although online services are on average more transparent regarding data processing practices in their public data policies, a majority of these policies still either lack information required by the GDPR (e.g., contact information for users to file privacy inquiries) or do not provide this information in a user-friendly form. Additionally, we summarize that online services more often provide means for their users to opt out of data processing, but regularly obstruct convenient access to such means through unnecessarily complex and sometimes illegitimate interface design. Our survey further details that this situation contradicts the preferences expressed by users both verbally and through their actions, and researchers have proposed multiple approaches to facilitate GDPR-conform data processing without negatively impacting the user experience. Thus, we compiled reoccurring points of criticism by privacy researchers and data protection authorities into a list of four guidelines for service providers to consider.

Download Full-text

ACM Transactions on the WebLatest Publications

TOTAL DOCUMENTS

H-INDEX

Published By Association For Computing Machinery

Toward Fair Recommendation in Two-sided Platforms

Queryable Compression on Time-evolving Web and Social Networks with Streaming

Factorizing Historical User Actions for Next-Day Purchase Prediction

A Large-scale Empirical Analysis of Browser Fingerprints Properties for Web Authentication

Context-aware Distance Measures for Dynamic Networks

How Do Home Computer Users Browse the Web?

On the Aggression Diffusion Modeling and Minimization in Twitter

A Large-scale Empirical Analysis of Ransomware Activities in Bitcoin

Measuring International Online Human Values with Word Embeddings

Cookie Banners and Privacy Policies: Measuring the Impact of the GDPR on the Web

ACM Transactions on the Web
Latest Publications