Unifying Online and Counterfactual Learning to Rank: A Novel Counterfactual Estimator that Effectively Utilizes Online Interventions (Extended Abstract)

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2021/656

2021

Author(s):

Harrie Oosterhuis

Maarten de Rijke

Keyword(s):

Selection Bias

State Of The Art

Learning To Rank

Direct Interaction

Experimental Results

Item Selection

Online Interventions

User Interactions

Position Bias

Ranking Systems

State-of-the-art Learning to Rank (LTR) methods for optimizing ranking systems based on user interactions are divided into online approaches, which learn by direct interaction, and counterfactual approaches, which learn from historical interactions. We propose a novel intervention-aware estimator to bridge this online/counterfactual division. The estimator corrects for position bias, trust bias, and item-selection bias using corrections based on the behavior of the logging policy and on online interventions: changes to the logging policy made during the gathering of click data. Our experimental results show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can greatly benefit from online interventions. To the best of our knowledge, this is the first method shown to be highly effective in both online and counterfactual scenarios.
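
The core of the intervention-aware correction can be sketched for a single query. This is a simplified illustration under stated assumptions, not the authors' implementation: clicks are assumed to follow an affine model, P(click on d at rank k) = α_k·R(d) + β_k, with known position-dependent α_k and β_k (capturing position and trust bias) and with only the top len(α) ranks displayed (item-selection bias). Rather than relying on a single logging policy, the sketch averages the expected α and β over every policy that was deployed while the clicks were gathered, so online interventions are accounted for. All names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Ranking = Tuple[str, ...]
# Assumption: a logging policy is a probability distribution over displayed rankings.
Policy = Dict[Ranking, float]


def expected_affine_params(
    policies: Sequence[Policy],
    alpha: Sequence[float],
    beta: Sequence[float],
    doc: str,
) -> Tuple[float, float]:
    """Average, over the policies deployed at each logged step, of the
    expected alpha and beta that `doc` received."""
    a_sum = b_sum = 0.0
    for policy in policies:
        for ranking, prob in policy.items():
            if doc in ranking:
                k = ranking.index(doc)
                # Ranks beyond len(alpha) are not displayed (top-k cutoff),
                # so they contribute nothing to the expectation.
                if k < len(alpha):
                    a_sum += prob * alpha[k]
                    b_sum += prob * beta[k]
    return a_sum / len(policies), b_sum / len(policies)


def intervention_aware_estimate(
    clicks: Sequence[Dict[str, float]],
    policies: Sequence[Policy],
    alpha: Sequence[float],
    beta: Sequence[float],
    doc: str,
) -> float:
    """Relevance estimate (mean click - beta_bar) / alpha_bar, where
    clicks[t] was logged while policies[t] was deployed."""
    if len(clicks) != len(policies):
        raise ValueError("need exactly one logging policy per logged step")
    a_bar, b_bar = expected_affine_params(policies, alpha, beta, doc)
    if a_bar <= 0.0:
        raise ValueError(f"{doc!r} was never displayed; relevance is not identifiable")
    mean_click = sum(c.get(doc, 0.0) for c in clicks) / len(clicks)
    return (mean_click - b_bar) / a_bar


if __name__ == "__main__":
    alpha, beta = [1.0, 0.5], [0.1, 0.05]
    # An online intervention: the displayed ranking is swapped after step one.
    policies = [{("a", "b"): 1.0}, {("b", "a"): 1.0}]
    # Expected click rates for "a" under the affine model with R("a") = 0.8.
    clicks = [{"a": 0.9}, {"a": 0.45}]
    print(intervention_aware_estimate(clicks, policies, alpha, beta, "a"))
```

Feeding expected click rates instead of sampled clicks, the estimate for "a" comes out at the assumed relevance of 0.8 (up to floating-point rounding), even though the displayed ranking changed midway. Because the averaged correction is fixed once the policies are known, the estimate is unbiased in expectation under the assumed click model.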

A Systematic Study of Feature Selection Methods for Learning to Rank Algorithms

International Journal of Information Retrieval Research

10.4018/ijirr.2018070104

2018

Vol 8 (3)

pp. 46-67

Cited By ~ 1

Author(s):

Mehrnoush Barani Shirzad

Mohammad Reza Keyvanpour

Keyword(s):

Feature Selection

State Of The Art

Learning To Rank

Future Research

Selection Methods

Ranking Models

Ranking Systems

Efficiency And Effectiveness

Selection For

This article describes how feature selection for learning to rank algorithms has become an interesting issue. Noisy and irrelevant features degrade performance and cause overfitting in ranking systems; reducing the number of features by eliminating irrelevant and noisy ones is a solution. Several studies have applied feature selection to learning to rank and improved the efficiency and effectiveness of ranking models. As the number of features, and consequently the number of irrelevant and noisy features, keeps increasing, a systematic review of feature selection methods for learning to rank is required. In this article, a framework for examining research on feature selection for learning to rank (FSLR) is proposed. Under this framework, the authors review state-of-the-art methods and suggest several criteria for analyzing them. FSLR offers a structured classification of current algorithms that future research can use to a) select appropriate strategies from existing algorithms according to certain criteria, or b) find ways to develop existing methodologies further.
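
As a concrete example of the simplest kind of FSLR method, a filter method scores each feature independently of any ranking model. The sketch below assumes the importance of a feature is the NDCG@k obtained by ranking each query's documents on that feature alone, averaged over queries; this is one common choice for illustration, not necessarily one of the methods the survey reviews, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG@k with the common (2^rel - 1) / log2(rank + 1) gain."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(scores: Sequence[float], labels: Sequence[float], k: int) -> float:
    """NDCG@k of ranking one query's documents by `scores`."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k([labels[i] for i in order], k) / ideal if ideal > 0 else 0.0


def select_features(
    queries: Sequence[Tuple[Sequence[Sequence[float]], Sequence[float]]],
    n_keep: int,
    k: int = 10,
) -> List[int]:
    """Keep the n_keep features whose single-feature rankings score the
    highest mean NDCG@k. Each query is (features[doc][j], labels[doc])."""
    n_features = len(queries[0][0][0])
    importance = []
    for j in range(n_features):
        per_query = [
            ndcg_at_k([doc[j] for doc in feats], labels, k) for feats, labels in queries
        ]
        importance.append(sum(per_query) / len(per_query))
    ranked = sorted(range(n_features), key=lambda j: importance[j], reverse=True)
    return ranked[:n_keep]
```

A filter like this ignores redundancy between features; greedy and wrapper methods trade extra computation for taking feature interactions or the final ranking model into account.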

Exploiting User-Generated Content to Enrich Web Document Summarization

International Journal of Artificial Intelligence Tools

10.1142/s021821301760017x

2017

Vol 26 (05)

pp. 1760017

Author(s):

Minh-Tien Nguyen

Duc-Vu Tran

Chien-Xuan Tran

Minh-Le Nguyen

Keyword(s):

State Of The Art

Learning To Rank

Experimental Results

User Generated Content

Document Summarization

Additional Information

Web Document

Social Features

Highlight Extraction

User-generated content such as comments or tweets (also called social information) that follows a Web document provides additional information for enriching the description of an event mentioned in the document's sentences. This paper presents a framework named SoSVMRank, which integrates the user-generated content of a Web document to generate a high-quality summary. To do so, summarization is formulated as a learning to rank task in which comments or tweets support sentences in a mutual reinforcement fashion. To model the sentence-comment (or sentence-tweet) relation, a set of local and social features is proposed. After ranking, the top-m ranked sentences and comments (or tweets) are selected as the summary. To validate the framework, sentence and story highlight extraction tasks were taken as a case study on three datasets in two languages, English and Vietnamese. Experimental results indicate that (i) the new features improve the summarization performance of the framework in terms of ROUGE scores compared to state-of-the-art baselines and (ii) integrating user-generated content benefits single-document summarization.
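
The ROUGE scores used in this evaluation measure n-gram overlap between a generated summary and a human-written reference. A minimal sketch of ROUGE-N recall follows, assuming plain lower-cased whitespace tokenization; the official ROUGE toolkit adds options such as stemming and stopword removal, and which ROUGE variants the paper reports is not stated in the abstract. Function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated candidate n-gram cannot be matched more often than in the reference.
    overlap = sum((cand & ref).values())
    return overlap / total
```

For example, the candidate "the cat sat" against the reference "the cat sat on the mat" matches 3 of 6 reference unigrams and 2 of 5 reference bigrams.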

Learning from user interactions with rankings

ACM SIGIR Forum

10.1145/3483382.3483402

2020

Vol 54 (2)

pp. 1-2

Author(s):

Harrie Oosterhuis

Keyword(s):

Supervised Learning

High Performance

Large Scale

State Of The Art

Learning To Rank

User Preferences

User Preference

Experimental Comparison

The Third

Ranking Systems

Ranking systems form the basis of online search engines and recommendation services. They process large collections of items, for instance web pages or e-commerce products, and present the user with a small ordered selection. The goal of a ranking system is to help a user find the items they are looking for with the least amount of effort. Thus, the rankings they produce should place the most relevant or preferred items at the top. Learning to rank is a field within machine learning that covers methods that optimize ranking systems w.r.t. this goal. Traditional supervised learning to rank methods use expert judgements to evaluate and learn; however, in many situations such judgements are impossible or infeasible to obtain. As a solution, methods have been introduced that perform learning to rank based on user clicks instead. The difficulty with clicks is that they are affected not only by user preferences but also by which rankings were displayed. Therefore, these methods must avoid being biased by factors other than user preference. This thesis concerns learning to rank methods based on user clicks and specifically aims to unify the different families of these methods.

The first part of the thesis consists of three chapters that look at online learning to rank algorithms, which learn by directly interacting with users. Its first chapter considers large-scale evaluation and shows that existing methods do not guarantee correctness and user experience; we then introduce a novel method that can guarantee both. The second chapter proposes a novel pairwise method for learning from clicks that contrasts with the previously prevalent dueling-bandit methods. Our experiments show that our pairwise method greatly outperforms the dueling-bandit approach. The third chapter further confirms these findings in an extensive experimental comparison; furthermore, we show that the theory behind the dueling-bandit approach is unsound w.r.t. deterministic ranking systems.

The second part of the thesis consists of four chapters that look at counterfactual learning to rank algorithms, which learn from historically logged click data. Its first chapter takes the existing approach and makes it applicable to top-k settings where not all items can be displayed at once. It also shows that state-of-the-art supervised learning to rank methods can be applied in the counterfactual scenario. The second chapter introduces a method that combines the robust generalization of feature-based models with the high-performance specialization of tabular models. The third chapter looks at evaluation and introduces a method for finding the optimal logging policy, which collects click data in a way that minimizes the variance of estimated ranking metrics. By applying this method during the gathering of clicks, one can turn counterfactual evaluation into online evaluation. The fourth chapter proposes a novel counterfactual estimator that considers the possibility that the logging policy has been updated during the gathering of click data. As a result, it can learn much more efficiently when deployed in an online scenario where interventions can take place. The resulting approach is thus both online and counterfactual; our experimental results show that its performance matches the state of the art in both the online and the counterfactual scenario. As a whole, the second part of this thesis proposes a framework that bridges many gaps between the areas of online, counterfactual, and supervised learning to rank. It takes approaches previously considered independent and unifies them into a single methodology for widely applicable and effective learning to rank from user clicks.

Awarded by: University of Amsterdam, Amsterdam, The Netherlands. Supervised by: Maarten de Rijke. Available at: https://hdl.handle.net/11245.1/8ff3aa38-97fb-4d2a-8127-a29a03af4d5c.
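
Pairwise learning from clicks, as opposed to dueling-bandit comparisons, starts by turning each impression into document preferences. The sketch below shows one common inference heuristic, assuming a user examined every document up to one position past the last click, so each clicked document is preferred over every examined unclicked one. It is a generic illustration; the thesis's pairwise method involves more than this step (for instance, how inferred pairs are weighted), which the sketch does not attempt to reproduce, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def infer_click_preferences(
    ranking: Sequence[str], clicked: Sequence[bool]
) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) document pairs for one impression."""
    if not any(clicked):
        return []  # no clicks: no evidence about preferences
    last_click = max(i for i, c in enumerate(clicked) if c)
    # Assumed examination: everything up to one rank past the last click.
    examined = range(min(last_click + 2, len(ranking)))
    pairs = []
    for i in examined:
        if not clicked[i]:
            continue
        for j in examined:
            if not clicked[j]:
                pairs.append((ranking[i], ranking[j]))
    return pairs
```

For the ranking a, b, c, d with only b clicked, the inferred pairs are b over a and b over c; d is assumed unexamined and yields no preference.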

Browser Security Attacks and Detection Techniques: A Case of Tabnabbing

Science & Technology Journal

10.22232/stj.2020.08.01.03

2020

Vol 8 (1)

pp. 33-41

Author(s):

Dr. S. Sarika

Keyword(s):

Credit Card

State Of The Art

Experimental Results

Detection Technique

Security Attacks

Agent Based

Detection Techniques

Browser Security

Cyber Threats

Multi Agent

Phishing is the malicious and deliberate act of sending counterfeit messages or mimicking a webpage. The goal is either to steal sensitive credentials, such as login information and credit card details, or to install malware on a victim's machine. Browser-based cyber threats have become one of the biggest concerns in networked architectures. The most prolific form of browser attack is tabnabbing, which occurs in inactive browser tabs. In a tabnabbing attack, a fake page disguises itself as a genuine page to steal data. This paper presents a multi-agent-based tabnabbing detection technique. The method uses heuristics to detect changes to a webpage when a tabnabbing attack happens and warns the user. Experimental results show that the method performs better than state-of-the-art tabnabbing detection techniques.
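
The idea of detecting heuristic changes can be illustrated with a toy comparison between two snapshots of the same tab, one taken when it loses focus and one when it regains focus. The snapshot fields, the function name, and the rule that any listed change triggers a warning are all assumptions for illustration; the paper's multi-agent design and its actual heuristics are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TabSnapshot:
    """Hypothetical page features recorded for one browser tab."""

    url: str
    title: str
    favicon_url: str
    layout_hash: str  # e.g. a hash of the rendered page structure
    has_password_field: bool


def tabnabbing_warnings(before: TabSnapshot, after: TabSnapshot) -> List[str]:
    """Compare the snapshot taken when the tab lost focus (`before`) with the
    one taken when it regained focus (`after`) and list suspicious changes."""
    warnings = []
    if after.url != before.url:
        warnings.append("URL changed while the tab was inactive")
    if after.title != before.title:
        warnings.append("title changed while the tab was inactive")
    if after.favicon_url != before.favicon_url:
        warnings.append("favicon changed while the tab was inactive")
    if after.layout_hash != before.layout_hash:
        warnings.append("page layout changed while the tab was inactive")
    if after.has_password_field and not before.has_password_field:
        warnings.append("a password field appeared while the tab was inactive")
    return warnings
```

A practical detector would need tolerance for legitimate background updates, such as unread-message counters in page titles, which is where more careful heuristics come in.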

Unbiased Learning to Rank

ACM Transactions on Information Systems

10.1145/3439861

2021

Vol 39 (2)

pp. 1-29

Author(s):

Qingyao Ai

Tao Yang

Huazheng Wang

Jiaxin Mao

Keyword(s):

Online Learning

Theoretical Foundation

Learning To Rank

Research Question

Parameters Estimation

User Interactions

Empirical Performance

Search Data

Two Sides

Important Research Question

How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR. Existing work on unbiased learning to rank (ULTR) can be broadly categorized into two groups: studies on unbiased learning algorithms with logged data, namely offline unbiased learning, and studies on unbiased parameter estimation with real-time user interactions, namely online learning to rank. While their definitions of unbiasedness differ, these two types of ULTR algorithms share the same goal: to find the best models that rank documents based on their intrinsic relevance or utility. However, most studies on offline and online unbiased learning to rank are carried out in parallel, without detailed comparisons of their background theories and empirical performance. In this article, we formalize the task of unbiased learning to rank and show that existing algorithms for offline unbiased learning and online learning to rank are just two sides of the same coin. We evaluate eight state-of-the-art ULTR algorithms and find that many of them can be used in both offline settings and online environments, with or without minor modifications. Further, we analyze how different offline and online learning paradigms affect the theoretical foundation and empirical effectiveness of each algorithm on both synthetic and real search data. Our findings provide important insights and guidelines for choosing and deploying ULTR algorithms in practice.
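
Many offline ULTR algorithms share one debiasing idea: inverse propensity scoring (IPS), which up-weights each click by the inverse of the probability that its position was examined. A minimal sketch, assuming the position-based examination model with the power-law propensity (1/rank)^η that is common in the ULTR literature and a known η; the function names are hypothetical, and the sketch does not represent any single one of the eight evaluated algorithms.

```python
from typing import Dict, Sequence


def position_propensity(rank: int, eta: float = 1.0) -> float:
    """Examination probability at 1-based `rank`: (1 / rank) ** eta."""
    return (1.0 / rank) ** eta


def ips_relevance_estimates(
    logged_rankings: Sequence[Sequence[str]],
    clicks: Sequence[Sequence[str]],
    eta: float = 1.0,
) -> Dict[str, float]:
    """Per-document IPS estimate: each click is weighted by
    1 / P(examined at its rank), then averaged over logged sessions."""
    totals: Dict[str, float] = {}
    for ranking, clicked in zip(logged_rankings, clicks):
        for doc in clicked:
            rank = ranking.index(doc) + 1
            totals[doc] = totals.get(doc, 0.0) + 1.0 / position_propensity(rank, eta)
    n = len(logged_rankings)
    return {doc: total / n for doc, total in totals.items()}
```

Dividing by small propensities inflates variance, which is why propensity clipping is a common practical addition; in an online setting the same weighting can be applied to clicks as they arrive.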

Automatic Detection of Discrimination Actions from Social Images

Electronics

10.3390/electronics10030325

2021

Vol 10 (3)

pp. 325

Author(s):

Zhihao Wu

Baopeng Zhang

Tianchen Zhou

Yan Li

Jianping Fan

Keyword(s):

Action Recognition

State Of The Art

Automatic Detection

Experimental Results

Practical Approach

Detection And Identification

Art Methods

Image Set

Social Images

Relationship Identification

In this paper, we develop a practical approach for the automatic detection of discrimination actions from social images. First, an image set is established in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Second, a practical approach is developed to automatically detect and identify discrimination actions and relationships from social images. Third, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into a single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and our experimental results demonstrate that it significantly outperforms state-of-the-art approaches.
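
Judging from its name, CVTransE++ extends visual translation embedding, in which subject and object features are projected into a shared space where subject + relation ≈ object. The toy sketch below illustrates only that translation idea: it assumes the projected embeddings are already given (the learned projections and the network itself are omitted), and the relation names, vectors, and function names are invented for illustration.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def translation_distance(subject: Vector, relation: Vector, obj: Vector) -> float:
    """Euclidean distance ||subject + relation - object||."""
    return math.sqrt(sum((s + r - o) ** 2 for s, r, o in zip(subject, relation, obj)))


def predict_relation(subject: Vector, obj: Vector, relations: Dict[str, Vector]) -> str:
    """Pick the relation whose translation best maps the subject onto the object."""
    return min(relations, key=lambda name: translation_distance(subject, relations[name], obj))
```

Scoring relations as translations keeps the relation model small, one vector per relation, which is part of why this family of models is attractive for joint action and relationship recognition.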

PyConvU-Net: a lightweight and multiscale network for biomedical image segmentation

BMC Bioinformatics

10.1186/s12859-020-03943-2

2021

Vol 22 (1)

Author(s):

Changyong Li

Yongxian Fan

Xiaodong Cai

Keyword(s):

Image Segmentation

Deep Learning

State Of The Art

Experimental Results

Actual Situation

Controlled Experiments

Biomedical Image

Segmentation Methods

Art Performance

Background: With the development of deep learning (DL), a growing number of DL-based methods have been proposed that achieve state-of-the-art performance in biomedical image segmentation. However, these methods are usually complex and require powerful computing resources, and in practice it is impractical to use huge computing resources in clinical settings. It is therefore important to develop accurate DL-based biomedical image segmentation methods that work under constrained computing resources.

Results: A lightweight and multiscale network called PyConvU-Net is proposed that can potentially work with low-resource computing. In strictly controlled experiments, PyConvU-Net performs well on three biomedical image segmentation tasks while using the fewest parameters.

Conclusions: Our experimental results preliminarily demonstrate the potential of the proposed PyConvU-Net for biomedical image segmentation under constrained computing resources.
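
The multiscale behavior of PyConvU-Net comes from pyramid convolution, which applies kernels of several sizes to the same input in parallel. The 1D, single-channel toy sketch below only illustrates that idea; actual pyramidal convolution layers are 2D, learned, and typically use grouped convolutions for the larger kernels to keep the parameter count low, so the kernels and function names here are assumptions.

```python
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1D cross-correlation with zero padding so the output length equals
    len(x). Assumes an odd kernel size."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):  # positions outside the input count as zero
                acc += w * x[j]
        out.append(acc)
    return out


def pyramid_conv1d(x: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply kernels of different sizes to the same input in parallel and
    stack the outputs as channels, giving features at several scales."""
    return [conv1d_same(x, kernel) for kernel in kernels]
```

Because every branch uses "same" padding, the outputs line up position by position and can be concatenated along the channel axis, which is what lets a single layer mix small and large receptive fields.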

Evaluation of recent advances in recommender systems on Arabic content

Journal of Big Data

10.1186/s40537-021-00420-2

2021

Vol 8 (1)

Author(s):

Mehdi Srifi

Ahmed Oussous

Ayoub Ait Lahcen

Salma Mouline

Keyword(s):

Recommender Systems

High Performance

Large Scale

State Of The Art

Experimental Results

Recent Advances

Research Gap

Text Preprocessing

Various recommender systems (RSs) have been developed over recent years, and many of them have concentrated on English content. Thus, the majority of RSs in the literature were compared on English content. However, research on RSs for content in other languages, such as Arabic, is minimal, and researchers still neglect the field of Arabic RSs. This study therefore aims to fill this research gap by leveraging recent advances in the English RSs field. Our main goal is to investigate recent RSs in an Arabic context. To that end, we first selected five state-of-the-art RSs originally devoted to English content and then empirically evaluated their performance on Arabic content. As a result of this work, we first built four publicly available large-scale Arabic datasets for recommendation purposes. Second, we provide various text preprocessing techniques for preparing the constructed datasets. Third, our investigation derived well-argued conclusions about the use of modern RSs in the Arabic context. The experimental results show that these systems achieve high performance when applied to Arabic content.
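
The abstract does not list which text preprocessing techniques were provided. As an illustration only, the sketch below shows light normalization steps that are widely used for Arabic text: stripping diacritics (tashkeel) and the tatweel elongation mark, unifying alef variants, and mapping alef maqsura and ta marbuta to single forms. Whether the paper uses these exact steps is an assumption, and the function name is hypothetical.

```python
import re

# Arabic diacritics (tashkeel, U+064B..U+0652) and tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Alef with madda (U+0622), hamza above (U+0623), and hamza below (U+0625).
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")


def normalize_arabic(text: str) -> str:
    """Light normalization commonly applied before tokenizing Arabic text."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)  # map to bare alef
    text = text.replace("\u0649", "\u064A")  # alef maqsura -> ya
    text = text.replace("\u0629", "\u0647")  # ta marbuta -> ha
    return text
```

These mappings conflate some distinct spellings, trading precision for better matching across inconsistently written user text; a recommender that relies on exact word forms may prefer a subset of them.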

Contour Detection for Fibre of Preserved Szechuan Pickle Based on Dilated Convolution

Applied Sciences

10.3390/app9132684

2019

Vol 9 (13)

pp. 2684

Cited By ~ 1

Author(s):

Hongyang Li

Lizhuang Liu

Zhenqi Han

Dan Zhao

Keyword(s):

Edge Detection

State Of The Art

Contour Detection

Experimental Results

Contour Method

Class Differences

Dilated Convolution

The Mean

Art Performance

Fibre peeling is an indispensable step in the production of preserved Szechuan pickle, and its accuracy can significantly influence product quality. Therefore, a contour method for fibre detection, the core algorithm of an automatic peeling device, is studied. The fibre contour is a non-salient contour characterized by large intra-class differences and small inter-class differences, meaning that the contour's features are not discriminative. A method called dilated holistically-nested edge detection (Dilated-HED), built on the HED network and dilated convolution, is proposed to detect the fibre contour. The experimental results on our dataset show that the Pixel Accuracy (PA) is 99.52% and the Mean Intersection over Union (MIoU) is 49.99%, achieving state-of-the-art performance.
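
For reference, the two reported metrics are commonly defined as follows. This sketch uses the standard definitions over flattened label maps, with an assumed labeling of 0 = background and 1 = fibre contour; the abstract does not state the paper's exact averaging (for example, which classes enter the mean), and the function names are hypothetical.

```python
from typing import Sequence


def pixel_accuracy(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Fraction of pixels whose predicted label equals the ground truth."""
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> float:
    """Mean over classes of |pred AND gt| / |pred OR gt|, skipping classes
    that appear in neither map."""
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and g == c for p, g in zip(pred, gt))
        union = sum(p == c or g == c for p, g in zip(pred, gt))
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For pred = [0, 0, 1, 1] and gt = [0, 1, 1, 1], PA is 3/4 while MIoU is (1/2 + 2/3) / 2 = 7/12. When one class dominates, as background pixels typically do in edge detection, PA can stay high even if the minority class is segmented poorly, whereas MIoU weights each class equally.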

Random Forest with Adaptive Local Template for Pedestrian Detection

Mathematical Problems in Engineering

10.1155/2015/767423

2015

Vol 2015

pp. 1-11

Cited By ~ 2

Author(s):

Tao Xiang

Tao Li

Mao Ye

Zijian Liu

Keyword(s):

Computer Vision

Random Forest

Classification Accuracy

Template Matching

Detection Method

State Of The Art

Pedestrian Detection

Sliding Window

Experimental Results

Training Samples

Pedestrian detection with large intra-class variations is still a challenging task in computer vision. In this paper, we propose a novel pedestrian detection method based on Random Forest. First, we generate a few local templates of different sizes at different locations in positive exemplars. Then, a Random Forest is built whose splitting functions are optimized by maximizing the class purity obtained by matching the local templates to the training samples. To improve classification accuracy, we adopt a boosting-like algorithm that updates the weights of the training samples in a layer-wise fashion. During detection, the trained Random Forest votes on the category of each input sliding window. Our contributions are splitting functions based on local template matching with adaptive size and location, and an iterative weight-updating method. We evaluate the proposed method on two well-known, challenging datasets: TUD pedestrians and INRIA pedestrians. The experimental results demonstrate that our method achieves state-of-the-art or competitive performance.
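
The abstract describes a boosting-like, layer-wise update of sample weights without giving the rule. As an assumption, the sketch below uses the classic AdaBoost update, which increases the weight of samples the current layer misclassifies so that later layers focus on them; the paper's actual update may differ, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def reweight_samples(weights: Sequence[float], misclassified: Sequence[bool]) -> List[float]:
    """One AdaBoost-style update: up-weight misclassified samples,
    down-weight correct ones, then renormalize to sum to 1."""
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong) / sum(weights)
    error = min(max(error, 1e-12), 1 - 1e-12)  # avoid log(0) and division by zero
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [
        w * math.exp(alpha if wrong else -alpha) for w, wrong in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated]
```

With four equally weighted samples and one error, the misclassified sample ends up with weight 1/2 and each correct one with 1/6: under this rule the misclassified samples always carry exactly half of the total weight after an update.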
