scholarly journals Counterfactual Fairness: Unidentification, Bound and Algorithm

Author(s):  
Yongkai Wu ◽  
Lu Zhang ◽  
Xintao Wu

Fairness-aware learning studies the problem of building machine learning models that are subject to fairness requirements. Counterfactual fairness is a notion of fairness derived from Pearl's causal model, which considers a model is fair if for a particular individual or group its prediction in the real world is the same as that in the counterfactual world where the individual(s) had belonged to a different demographic group. However, an inherent limitation of counterfactual fairness is that it cannot be uniquely quantified from the observational data in certain situations, due to the unidentifiability of the counterfactual quantity. In this paper, we address this limitation by mathematically bounding the unidentifiable counterfactual quantity, and develop a theoretically sound algorithm for constructing counterfactually fair classifiers. We evaluate our method in the experiments using both synthetic and real-world datasets, as well as compare with existing methods. The results validate our theory and show the effectiveness of our method.

2020 ◽  
Vol 34 (03) ◽  
pp. 2451-2458
Author(s):  
Akansha Bhardwaj ◽  
Jie Yang ◽  
Philippe Cudré-Mauroux

Microblogging platforms such as Twitter are increasingly being used in event detection. Existing approaches mainly use machine learning models and rely on event-related keywords to collect the data for model training. These approaches make strong assumptions on the distribution of the relevant microposts containing the keyword – referred to as the expectation of the distribution – and use it as a posterior regularization parameter during model training. Such approaches are, however, limited as they fail to reliably estimate the informativeness of a keyword and its expectation for model training. This paper introduces a Human-AI loop approach to jointly discover informative keywords for model training while estimating their expectation. Our approach iteratively leverages the crowd to estimate both keyword-specific expectation and the disagreement between the crowd and the model in order to discover new keywords that are most beneficial for model training. These keywords and their expectation not only improve the resulting performance but also make the model training process more transparent. We empirically demonstrate the merits of our approach, both in terms of accuracy and interpretability, on multiple real-world datasets and show that our approach improves the state of the art by 24.3%.


2022 ◽  
Vol 54 (9) ◽  
pp. 1-36
Author(s):  
Dylan Chou ◽  
Meng Jiang

Data-driven network intrusion detection (NID) has a tendency towards minority attack classes compared to normal traffic. Many datasets are collected in simulated environments rather than real-world networks. These challenges undermine the performance of intrusion detection machine learning models by fitting machine learning models to unrepresentative “sandbox” datasets. This survey presents a taxonomy with eight main challenges and explores common datasets from 1999 to 2020. Trends are analyzed on the challenges in the past decade and future directions are proposed on expanding NID into cloud-based environments, devising scalable models for large network data, and creating labeled datasets collected in real-world networks.


2021 ◽  
Author(s):  
Chih-Kuan Yeh ◽  
Been Kim ◽  
Pradeep Ravikumar

Understanding complex machine learning models such as deep neural networks with explanations is crucial in various applications. Many explanations stem from the model perspective, and may not necessarily effectively communicate why the model is making its predictions at the right level of abstraction. For example, providing importance weights to individual pixels in an image can only express which parts of that particular image is important to the model, but humans may prefer an explanation which explains the prediction by concept-based thinking. In this work, we review the emerging area of concept based explanations. We start by introducing concept explanations including the class of Concept Activation Vectors (CAV) which characterize concepts using vectors in appropriate spaces of neural activations, and discuss different properties of useful concepts, and approaches to measure the usefulness of concept vectors. We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats. Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.


2021 ◽  
pp. 164-184
Author(s):  
Saiph Savage ◽  
Carlos Toxtli ◽  
Eber Betanzos-Torres

The artificial intelligence (AI) industry has created new jobs that are essential to the real world deployment of intelligent systems. Part of the job focuses on labelling data for machine learning models or having workers complete tasks that AI alone cannot do. These workers are usually known as ‘crowd workers’—they are part of a large distributed crowd that is jointly (but separately) working on the tasks although they are often invisible to end-users, leading to workers often being paid below minimum wage and having limited career growth. In this chapter, we draw upon the field of human–computer interaction to provide research methods for studying and empowering crowd workers. We present our Computational Worker Leagues which enable workers to work towards their desired professional goals and also supply quantitative information about crowdsourcing markets. This chapter demonstrates the benefits of this approach and highlights important factors to consider when researching the experiences of crowd workers.


Author(s):  
Xueru Zhang ◽  
Mohammad Mahdi Khalili ◽  
Mingyan Liu

Machine learning models developed from real-world data can inherit potential, preexisting bias in the dataset. When these models are used to inform decisions involving human beings, fairness concerns inevitably arise. Imposing certain fairness constraints in the training of models can be effective only if appropriate criteria are applied. However, a fairness criterion can be defined/assessed only when the interaction between the decisions and the underlying population is well understood. We introduce two feedback models describing how people react when receiving machine-aided decisions and illustrate that some commonly used fairness criteria can end with undesirable consequences while reinforcing discrimination.


Author(s):  
Ramazan Esmeli ◽  
Mohamed Bader-El-Den ◽  
Hassana Abdullahi

AbstractPurchase prediction has an important role for decision-makers in e-commerce to improve consumer experience, provide personalised recommendations and increase revenue. Many works investigated purchase prediction for session logs by analysing users’ behaviour to predict purchase intention after a session has ended. In most cases, e-shoppers prefer to be anonymous while browsing the websites and after a session has ended, identifying users and offering discounts can be challenging. Therefore, after a session ends, predicting purchase intention may not be useful for the e-commerce strategists. In this work, we propose and develop an early purchase prediction framework using advanced machine learning models to investigate how early purchase intention in an ongoing session can be predicted. Since users could be anonymous, this could help to give real-time offers and discounts before the session ends. We use dynamically created session features after each interaction in a session, and propose a utility scoring method to evaluate how early machine learning models can predict the probability of purchase intention. The proposed framework is validated with a real-world dataset. Computational experiments show machine learning models can identify purchase intention early with good performance in terms of Area Under Curve (AUC) score which shows success rate of machine learning models on early purchase prediction.


2020 ◽  
Vol 34 (03) ◽  
pp. 2611-2620
Author(s):  
Abir De ◽  
Paramita Koley ◽  
Niloy Ganguly ◽  
Manuel Gomez-Rodriguez

Decisions are increasingly taken by both humans and machine learning models. However, machine learning models are currently trained for full automation—they are not aware that some of the decisions may still be taken by humans. In this paper, we take a first step towards the development of machine learning models that are optimized to operate under different automation levels. More specifically, we first introduce the problem of ridge regression under human assistance and show that it is NP-hard. Then, we derive an alternative representation of the corresponding objective function as a difference of nondecreasing submodular functions. Building on this representation, we further show that the objective is nondecreasing and satisfies α-submodularity, a recently introduced notion of approximate submodularity. These properties allow a simple and efficient greedy algorithm to enjoy approximation guarantees at solving the problem. Experiments on synthetic and real-world data from two important applications—medical diagnosis and content moderation—demonstrate that the greedy algorithm beats several competitive baselines.


2021 ◽  
Author(s):  
Samuel Genheden ◽  
Agnes Mårdh ◽  
Gustav Lahti ◽  
Ola Engkvist ◽  
Simon Olsson ◽  
...  

We present machine learning models for predicting the chemical context for Buchwald-Hartwig coupling reactions. Using reaction data from in-house electronic lab notebooks, we train two models: one based on single-label data and one based on multi-label data. Both models show excellent top-3 accuracy around 90%, which suggests strong predictivity. There seems to be an advantage of including multi-label data because the multi-label model shows higher accuracy and better sensitivity for the individual contexts than the single-label model. Although the models are performant, we also show that such models need to be re-trained periodically. There is a strong temporal characteristic to the usage of different contexts. Therefore, a model trained on historical data will decrease in usefulness with time as newer and better contexts emerge and replace older ones. We hypothesize that these significant transitions in the context-use will likely affect any model predicting chemical contexts trained on historical data. Consequently, training such models warrants careful planning of what data is used for training and how often the model needs to be re-trained.


2021 ◽  
Vol 23 (06) ◽  
pp. 120-131
Author(s):  
Shubhankar Shubhankar ◽  
◽  
Siddhartha Bhaumik ◽  
Prakash Biswagar ◽  
◽  
...  

Phishing is quite possibly the most appealing technique used by attackers in the point of taking the individual subtleties of unsuspected individuals. Phishing sites are essentially tricks that are used by data fraud hoodlums and fakes. They use spam, fake sites made to look like the first sites, email, and direct messages to trick somebody into sharing significant information, like passwords and secret information. New enemies of phishing techniques are coming out each day, yet attackers think of new ways by focusing on all the new enemies of phishing techniques. So there is an earnest requirement for new strategies for the expectation of phishing sites. The paper portrays the correlation models in the classification of phishing sites for expectation utilizing distinctive Machine learning models. Different models are used for predicting which model gives the best exactness in phishing site classification. All the information is classified as either Benign for substantial Websites or Phish as Phishing Websites. Results have generated that show RF gives the best performance on this dataset for the classification of phishing sites.


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the usage of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling for obtaining multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.


Sign in / Sign up

Export Citation Format

Share Document