Interventional Fairness with Indirect Knowledge of Unobserved Protected Attributes

Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1571
Author(s):  
Sainyam Galhotra ◽  
Karthikeyan Shanmugam ◽  
Prasanna Sattigeri ◽  
Kush R. Varshney

The deployment of machine learning (ML) systems in applications with societal impact has motivated the study of fairness for marginalized groups. Often, the protected attribute is absent from the training dataset for legal reasons. However, datasets still contain proxy attributes that capture protected information and can inject unfairness into the ML model. Some deployed systems allow auditors, decision makers, or affected users to report issues or seek recourse by flagging individual samples. In this work, we examine such systems and consider a feedback-based framework in which the protected attribute is unavailable and the flagged samples serve as indirect knowledge. The reported samples are used as guidance to identify the proxy attributes that are causally dependent on the (unknown) protected attribute. We work under the causal interventional fairness paradigm. Without requiring the underlying structural causal model a priori, we propose an approach that performs conditional independence tests on observed data to identify such proxy attributes. We theoretically prove the optimality of our algorithm, bound its complexity, and complement the analysis with an empirical evaluation demonstrating its efficacy on various real-world and synthetic datasets.
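
To make the role of the flagged samples concrete, the sketch below illustrates one highly simplified screening step under assumed inputs: each observed (categorical) attribute is tested for statistical dependence with a hypothetical `flagged` indicator, and dependent attributes are surfaced as proxy candidates. This only illustrates the general idea of running independence tests on observed data; it is not the authors' algorithm and carries none of its optimality guarantees.

```python
# Minimal, illustrative proxy-attribute screen (NOT the paper's method).
# Assumes a DataFrame of categorical attributes plus a hypothetical boolean
# "flagged" column marking user-reported samples.
import pandas as pd
from scipy.stats import chi2_contingency

def screen_proxy_candidates(df: pd.DataFrame, flag_col: str = "flagged",
                            alpha: float = 0.05) -> list[str]:
    """Return attributes whose distribution differs significantly between
    flagged and unflagged samples (a crude stand-in for the conditional
    independence tests described in the abstract)."""
    candidates = []
    for col in df.columns:
        if col == flag_col:
            continue
        table = pd.crosstab(df[col], df[flag_col])   # contingency table
        _, p_value, _, _ = chi2_contingency(table)
        if p_value < alpha:                          # dependence -> possible proxy
            candidates.append(col)
    return candidates
```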

Author(s):  
Gabrielle Samuel ◽  
Jenn Chubb ◽  
Gemma Derrick

The governance of ethically acceptable research in higher education institutions has been under scrutiny over the past half-century. More recently, decision makers have also required researchers to acknowledge the societal impact of their research, as well as to anticipate and respond to the ethical dimensions of this societal impact through responsible research and innovation principles. Using artificial intelligence population health research in the United Kingdom and Canada as a case study, we combine a mapping study of journal publications with 18 interviews with researchers to explore how the ethical dimensions associated with this societal impact are incorporated into research agendas. Researchers separated the ethical responsibility for their research from its societal impact. We discuss the implications for both researchers and actors across the Ethics Ecosystem.


Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 527
Author(s):  
Eran Elhaik ◽  
Dan Graur

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.


2013 ◽  
Vol 29 (4) ◽  
pp. 511-537 ◽  
Author(s):  
Jeroen Pannekoek ◽  
Sander Scholtus ◽  
Mark Van der Loo

Data editing is arguably one of the most resource-intensive processes at national statistical institutes (NSIs). Forced by ever-increasing budget pressure, NSIs keep searching for more efficient forms of data editing. Efficiency gains can be obtained by selective editing, that is, limiting the manual editing to influential errors, and by automating the editing process as much as possible. In our view, an optimal mix of these two strategies should be aimed for. In this article we present a decomposition of the overall editing process into a number of different tasks and give an up-to-date overview of all the possibilities of automatic editing in terms of these tasks. During the design of an editing process, this decomposition may be helpful in deciding which tasks can be done automatically and for which tasks (additional) manual editing is required. Such decisions can be made a priori, based on the specific nature of the task, or by empirical evaluation, which is illustrated by examples. The decomposition into tasks, or statistical functions, also naturally leads to reusable components, resulting in efficiency gains in process design.
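
As an illustration of the selective-editing idea mentioned above, the following sketch scores records with a simple influence measure and routes only high-scoring records to manual editing; the column names, score form, and threshold are hypothetical and not taken from the article.

```python
# Illustrative selective-editing split (assumed columns and threshold).
import pandas as pd

def selective_editing_split(df: pd.DataFrame, threshold: float = 100.0):
    """Score = design weight * |reported - anticipated|; records whose score
    exceeds the threshold are routed to manual (interactive) editing, the
    rest are left to automatic editing."""
    score = df["design_weight"] * (df["reported_value"] - df["anticipated_value"]).abs()
    manual = df[score > threshold]       # influential suspicious records
    automatic = df[score <= threshold]   # handled by automatic editing rules
    return manual, automatic
```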


2019 ◽  
Vol 30 (1) ◽  
pp. 117-139 ◽  
Author(s):  
Clinton Amos ◽  
Sebastian Brockhaus ◽  
Amydee M. Fawcett ◽  
Stanley E. Fawcett ◽  
A. Michael Knemeyer

Purpose: The purpose of this paper is to evaluate how service perceptions influence customer views of the authenticity of corporate sustainability claims. The goal of this paper is to help supply chain decision-makers better understand boundary conditions in order to design more enduring and impactful sustainability programs.
Design/methodology/approach: The authors employ behavioral experiments, subjecting two theoretically derived hypotheses to verification across five diverse industries and two distinct sustainability vignettes.
Findings: Customer service perceptions emerge as a significant boundary condition to the perceived authenticity of sustainability efforts. Subjects attributed significantly higher authenticity toward sustainability efforts in above average vs below average service quality contexts. Further, respondents attributed deceptive motivations to sustainability efforts at companies with below average service.
Research limitations/implications: The authors confirm the underlying tenet of social judgment theory, which suggests that a priori perceptions create a zone of acceptability or rejection. Ultimately, investing in sustainability can lead to counterproductive cynicism.
Practical implications: The authors infer that customers’ willingness to give companies credit for sustainability initiatives extends beyond service issues to any practice that influences a priori perceptions. Supply chain managers must rethink their role in designing both customer service and sustainability systems to achieve positive returns from sustainability investments.
Originality/value: The authors challenge the assumption that customers universally positively view sustainability efforts. If customers hold a priori negative service perceptions, otherwise well-designed sustainability programs may invoke cynical reactions. Thus, sustainability programs may not inoculate firm reputations from adverse incidents. Given they touch both service and sustainability systems, supply chain managers are positioned to holistically influence their design for competitive advantage.


2020 ◽  
Vol 14 (4) ◽  
pp. 640-652
Author(s):  
Abraham Gale ◽  
Amélie Marian

Ranking functions are commonly used to assist in decision-making in a wide variety of applications. As the general public realizes the significant societal impacts of the widespread use of algorithms in decision-making, there has been a push towards explainability and transparency in decision processes and results, as well as demands to justify the fairness of the processes. In this paper, we focus on providing metrics towards explainability and transparency of ranking functions, with the aim of making the ranking process understandable a priori, so that decision-makers can make informed choices when designing their ranking selection process. We propose transparent participation metrics to clarify the ranking process, by assessing the contribution of each parameter used in the ranking function to the creation of the final ranked outcome, using information about the ranking functions themselves, as well as observations of the underlying distributions of the parameter values involved in the ranking. To evaluate the outcome of the ranking process, we propose diversity and disparity metrics to measure how similar the selected objects are to each other and to the underlying data distribution. We evaluate the behavior of our metrics on synthetic data, as well as on data and ranking functions from two real-world scenarios: high school admissions and decathlon scoring.
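
A rough illustration of a participation-style metric for a weighted-sum ranking function is sketched below; the weights, column names, and the specific share-of-score definition are assumptions for illustration, not the authors' exact metrics.

```python
# Illustrative "participation" measure: each parameter's average share of the
# total score among the top-k items selected by a weighted-sum ranking.
import pandas as pd

def participation_shares(df: pd.DataFrame, weights: dict[str, float], k: int) -> pd.Series:
    contributions = pd.DataFrame({c: w * df[c] for c, w in weights.items()})
    total = contributions.sum(axis=1)
    top_k = total.nlargest(k).index                       # items the ranking selects
    shares = contributions.loc[top_k].div(total.loc[top_k], axis=0)
    return shares.mean()                                  # mean contribution per parameter

# Hypothetical example: two score components with unequal weights.
# df = pd.DataFrame({"math": [90, 60, 80], "essay": [70, 95, 65]})
# print(participation_shares(df, {"math": 0.7, "essay": 0.3}, k=2))
```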


Author(s):  
Pan Xu ◽  
Yexuan Shi ◽  
Hao Cheng ◽  
John Dickerson ◽  
Karthik Abinav Sankararaman ◽  
...  

Online bipartite matching and allocation models are widely used to analyze and design markets such as Internet advertising, online labor, and crowdsourcing. Traditionally, vertices on one side of the market are fixed and known a priori, while vertices on the other side arrive online and are matched by a central agent to the offline side. The issue of possible conflicts among offline agents emerges in various real scenarios when we need to match each online agent with a set of offline agents. For example, in event-based social networks (e.g., Meetup), offline events conflict for some users since they will be unable to attend mutually-distant events at proximate times; in advertising markets, two competing firms may prefer not to be shown to one user simultaneously; and in online recommendation systems (e.g., Amazon Books), books of the same type “conflict” with each other in some sense due to the diversity requirement for each online buyer. The conflict nature inherent among certain offline agents raises significant challenges in both modeling and online algorithm design. In this paper, we propose a unifying model, generalizing the conflict models proposed in (She et al., TKDE 2016) and (Chen et al., TKDE 2016). Our model can capture not only a broad class of conflict constraints on the offline side (which is even allowed to be sensitive to each online agent), but also a general arrival pattern for the online side (which is allowed to change over the online phase). We propose an efficient linear programming (LP) based online algorithm and prove theoretically that it has nearly-optimal online performance. Additionally, we propose two LP-based heuristics and test them against two natural baselines on both real and synthetic datasets. Our LP-based heuristics experimentally dominate the baseline algorithms, aligning with our theoretical predictions and supporting our unified approach.
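
For intuition only, the sketch below shows a naive greedy baseline for serving one arriving online agent under pairwise conflict constraints and capacities on the offline side; it is not the paper's LP-based algorithm, and all inputs (weights, capacities, conflict set) are hypothetical.

```python
# Naive greedy matching for one online arrival under offline conflicts (NOT
# the LP-based method from the paper).
from typing import Dict, List, Set, Tuple

def greedy_assign(edge_weights: Dict[str, float],
                  capacity: Dict[str, int],
                  conflicts: Set[Tuple[str, str]],
                  k: int) -> List[str]:
    """Pick up to k offline agents for the arriving online agent, highest
    edge weight first, skipping exhausted or mutually conflicting agents."""
    chosen: List[str] = []
    for offline, _w in sorted(edge_weights.items(), key=lambda kv: -kv[1]):
        if capacity.get(offline, 0) <= 0:
            continue
        if any((offline, c) in conflicts or (c, offline) in conflicts for c in chosen):
            continue
        chosen.append(offline)
        capacity[offline] -= 1       # consume one unit of offline capacity
        if len(chosen) == k:
            break
    return chosen
```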


2020 ◽  
Vol 9 (2) ◽  
pp. 104 ◽  
Author(s):  
Huan Ning ◽  
Zhenlong Li ◽  
Michael E. Hodgson ◽  
Cuizhen (Susan) Wang

This article aims to implement a prototype screening system to identify flooding-related photos from social media. These photos, associated with their geographic locations, can provide free, timely, and reliable visual information about flood events to decision-makers. The screening system, designed for application to social media images, includes several key modules: tweet/image downloading, flooding photo detection, and a WebGIS application for human verification. In this study, a training dataset of 4800 flooding photos was built using an iterative method, and a convolutional neural network (CNN) was developed and trained to detect flooding photos. The system was designed so that the CNN can be re-trained on a larger training dataset as more analyst-verified flooding photos are added to the training set in an iterative manner. The total accuracy of flooding photo detection was 93% on a balanced test set, and precision ranged from 46% to 63% on the highly imbalanced real-time tweets. The system is plug-in enabled, permitting flexible changes to the classification module. Therefore, the system architecture and key components may be utilized in other types of disaster events, such as wildfires and earthquakes, for damage/impact assessment.
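
A minimal sketch of the kind of binary flood/no-flood CNN classifier such a pipeline might use is given below (Keras); the architecture, image size, and training call are assumptions, not the authors' exact model.

```python
# Illustrative binary image classifier for flood photo detection (assumed
# architecture and input size; not the paper's trained model).
import tensorflow as tf

def build_flood_classifier(input_shape=(128, 128, 3)) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),   # flood vs. non-flood
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Iterative re-training as analysts verify more photos could then be a simple
# model.fit(train_ds, validation_data=val_ds, epochs=10) on the enlarged set.
```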


1982 ◽  
Vol 6 (4) ◽  
pp. 41-49 ◽  
Author(s):  
Nathaniel Jones

For many years, bank decision-makers and academic researchers have recognized the significance of both commercial banks and small businesses to the overall economy of America. However, there appears to be little, if any, statistically valid empirical research dealing with the decision-making processes in commercial banks which commit funds to small businesses. This article deals specifically with the decision-making process of 30 commercial loan decision-makers as they are faced with commercial loan selection decisions concerning Small Business (SBA guaranteed) new business loans.


2020 ◽  
Author(s):  
Robin Stoffer ◽  
Caspar van Leeuwen ◽  
Damian Podareanu ◽  
Valeriu Codreanu ◽  
Menno Veerman ◽  
...  

Large-eddy simulation (LES) is an often used technique in the geosciences to simulate turbulent oceanic and atmospheric flows. In LES, the effects of the unresolved turbulence scales on the resolved scales (via the Reynolds stress tensor) have to be parameterized with subgrid models. These subgrid models usually require strong assumptions about the relationship between the resolved flow fields and the Reynolds stress tensor, which are often violated in reality and potentially hamper their accuracy.

In this study, using the finite-difference computational fluid dynamics code MicroHH (v2.0) and turbulent channel flow as a test case (friction Reynolds number Re_τ = 590), we incorporated and tested a newly emerging subgrid modelling approach that does not require those assumptions. Instead, it relies on neural networks that are highly non-linear and flexible. Similar to currently used subgrid models, we designed our neural networks such that they can be applied locally in the grid domain: at each grid point the neural networks receive as an input the locally resolved flow fields (u, v, w), rather than the full flow fields. As an output, the neural networks give the Reynolds stress tensor at the considered grid point. This local application integrates well with our simulation code, and is necessary to run our code in parallel within distributed memory systems.

To allow our neural networks to learn the relationship between the specified input and output, we created a training dataset that contains ~10,000,000 samples of corresponding inputs and outputs. We derived those samples directly from high-resolution 3D direct numerical simulation (DNS) snapshots of turbulent flow fields. Since the DNS explicitly resolves all the relevant turbulence scales, by downsampling the DNS we were able to derive both the Reynolds stress tensor and the corresponding lower-resolution flow fields typical for LES. In this calculation, we took into account both the discretization and interpolation errors introduced by the finite staggered LES grid. Subsequently, using these samples we optimized the parameters of the neural networks to minimize the difference between the predicted and the ‘true’ output derived from DNS.

After that, we tested the performance of our neural networks in two different ways:

1. A priori or offline testing, where we used a withheld part of the training dataset (10%) to test the capability of the neural networks to correctly predict the Reynolds stress tensor for data not used to optimize its parameters. We found that the neural networks were, in general, well able to predict the correct values.

2. A posteriori or online testing, where we incorporated our neural networks directly into our LES. To keep the total involved computational effort feasible, we strongly enhanced the prediction speed of the neural network by relying on highly optimized matrix-vector libraries. The full successful integration of the neural networks within LES remains challenging though, mainly because the neural networks tend to introduce numerical instability into the LES. We are currently investigating ways to minimize this instability, while maintaining the high accuracy in the a priori test and the high prediction speed.
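
As a rough illustration of the local mapping described above, the sketch below defines a small fully connected network from a stencil of resolved velocities to the six independent Reynolds stress components; the stencil size, layer widths, and training call are assumptions, not the study's actual network or its MicroHH coupling.

```python
# Illustrative local subgrid model: stencil of resolved (u, v, w) -> six
# independent Reynolds stress components (assumed sizes, not the study's net).
import tensorflow as tf

STENCIL = 5 * 5 * 5 * 3   # hypothetical 5x5x5 neighbourhood, three velocity components
N_TAU = 6                 # tau_xx, tau_yy, tau_zz, tau_xy, tau_xz, tau_yz

subgrid_net = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(STENCIL,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(N_TAU),           # predicted Reynolds stress components
])
subgrid_net.compile(optimizer="adam", loss="mse")

# Training pairs (local LES-resolution velocities -> DNS-derived stresses)
# would come from the downsampled DNS snapshots described above, e.g.:
# subgrid_net.fit(x_train, y_train, validation_split=0.1, epochs=20)
```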


2020 ◽  
Author(s):  
Zining Yang ◽  
Siyu Zhan ◽  
Mengshu Hou ◽  
Xiaoyang Zeng ◽  
Hao Zhu

Recent pre-trained language models have achieved great success in many NLP tasks. In this paper, we propose an event extraction system based on the pre-trained language model BERT to extract both event triggers and arguments. For a deep-learning-based method, the size of the training dataset has a crucial impact on performance. To address the problem of limited training data for event extraction, we further train the pre-trained language model on a carefully constructed in-domain corpus, injecting event knowledge into our event extraction system with minimal effort. Empirical evaluation on the ACE2005 dataset shows that injecting event knowledge can significantly improve the performance of event extraction.
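
A hedged sketch of continued masked-language-model pre-training on an in-domain corpus with Hugging Face Transformers is shown below; the corpus file, model checkpoint, and hyperparameters are placeholders, and this is not necessarily the authors' exact setup.

```python
# Illustrative continued (in-domain) MLM pre-training of BERT; corpus path,
# checkpoint, and hyperparameters are hypothetical placeholders.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

corpus = load_dataset("text", data_files={"train": "event_domain_corpus.txt"})
tokenized = corpus.map(lambda b: tokenizer(b["text"], truncation=True, max_length=128),
                       batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-event-domain", num_train_epochs=1,
                         per_device_train_batch_size=16)

Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()
# The further-trained encoder would then be fine-tuned for trigger and argument extraction.
```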

