Journal of Artificial Intelligence Research

A Survey of Opponent Modeling in Adversarial Domains

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12889 ◽

2022 ◽

Vol 73 ◽

pp. 277-327

Author(s):

Samer Nashed ◽

Shlomo Zilberstein

Keyword(s):

Method Comparison ◽

Future Research ◽

Comprehensive Overview ◽

Open Problems ◽

Research Directions ◽

Opponent Modeling ◽

Modeling Systems ◽

Future Research Directions ◽

Partially Observable ◽

New Framework

Opponent modeling is the ability to use prior knowledge and observations in order to predict the behavior of an opponent. This survey presents a comprehensive overview of existing opponent modeling techniques for adversarial domains, many of which must address stochastic, continuous, or concurrent actions, and sparse, partially observable payoff structures. We discuss all the components of opponent modeling systems, including feature extraction, learning algorithms, and strategy abstractions. These discussions lead us to propose a new form of analysis for describing and predicting the evolution of game states over time. We then introduce a new framework that facilitates method comparison, analyze a representative selection of techniques using the proposed framework, and highlight common trends among recently proposed methods. Finally, we list several open problems and discuss future research directions inspired by AI research on opponent modeling and related research in other disciplines.

Download Full-text

Doubly Robust Crowdsourcing

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.13304 ◽

2022 ◽

Vol 73 ◽

pp. 209-229

Author(s):

Chong Liu ◽

Yu-Xiang Wang

Keyword(s):

Large Scale ◽

Amazon Mechanical Turk ◽

Test Time ◽

Fair Price ◽

Data Points ◽

Label Aggregation ◽

Doubly Robust Estimation ◽

Sheer Size ◽

Statistical Estimation Problem ◽

Doubly Robust

Large-scale labeled dataset is the indispensable fuel that ignites the AI revolution as we see today. Most such datasets are constructed using crowdsourcing services such as Amazon Mechanical Turk which provides noisy labels from non-experts at a fair price. The sheer size of such datasets mandates that it is only feasible to collect a few labels per data point. We formulate the problem of test-time label aggregation as a statistical estimation problem of inferring the expected voting score. By imitating workers with supervised learners and using them in a doubly robust estimation framework, we prove that the variance of estimation can be substantially reduced, even if the learner is a poor approximation. Synthetic and real-world experiments show that by combining the doubly robust approach with adaptive worker/item selection rules, we often need much lower label cost to achieve nearly the same accuracy as in the ideal world where all workers label all data points.

Download Full-text

Preferences Single-Peaked on a Tree: Multiwinner Elections and Structural Results

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12332 ◽

2022 ◽

Vol 73 ◽

pp. 231-276

Author(s):

Dominik Peters ◽

Lan Yu ◽

Hau Chan ◽

Edith Elkind

Keyword(s):

Polynomial Time ◽

Scoring Function ◽

Large Family ◽

Np Hard ◽

Scoring Functions ◽

Winner Determination ◽

Polynomial Time Algorithms ◽

Minimum Number ◽

Number Of Leaves ◽

Positive Results

A preference profile is single-peaked on a tree if the candidate set can be equipped with a tree structure so that the preferences of each voter are decreasing from their top candidate along all paths in the tree. This notion was introduced by Demange (1982), and subsequently Trick (1989b) described an efficient algorithm for deciding if a given profile is single-peaked on a tree. We study the complexity of multiwinner elections under several variants of the Chamberlin–Courant rule for preferences single-peaked on trees. We show that in this setting the egalitarian version of this rule admits a polynomial-time winner determination algorithm. For the utilitarian version, we prove that winner determination remains NP-hard for the Borda scoring function; indeed, this hardness results extends to a large family of scoring functions. However, a winning committee can be found in polynomial time if either the number of leaves or the number of internal vertices of the underlying tree is bounded by a constant. To benefit from these positive results, we need a procedure that can determine whether a given profile is single-peaked on a tree that has additional desirable properties (such as, e.g., a small number of leaves). To address this challenge, we develop a structural approach that enables us to compactly represent all trees with respect to which a given profile is single-peaked. We show how to use this representation to efficiently find the best tree for a given profile for use with our winner determination algorithms: Given a profile, we can efficiently find a tree with the minimum number of leaves, or a tree with the minimum number of internal vertices among trees on which the profile is single-peaked. We then explore the power and limitations of this framework: we develop polynomial-time algorithms to find trees with the smallest maximum degree, diameter, or pathwidth, but show that it is NP-hard to check whether a given profile is single-peaked on a tree that is isomorphic to a given tree, or on a regular tree.

Download Full-text

Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12440 ◽

2022 ◽

Vol 73 ◽

pp. 173-208

Author(s):

Rodrigo Toro Icarte ◽

Toryn Q. Klassen ◽

Richard Valenzano ◽

Sheila A. McIlraith

Keyword(s):

Reinforcement Learning ◽

Finite State Machine ◽

Expressive Power ◽

State Machine ◽

Function Structure ◽

Efficient Manner ◽

Reward Function ◽

Optimal Policies ◽

Finite State ◽

Reward Functions

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the RL agent so it can exploit the function’s internal structure to learn optimal policies in a more sample efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and as such support loops, sequences and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.

Download Full-text

Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.13350 ◽

2022 ◽

Vol 73 ◽

pp. 117-171

Author(s):

Adrien Bolland ◽

Ioannis Boukas ◽

Mathias Berger ◽

Damien Ernst

Keyword(s):

Reinforcement Learning ◽

Time Horizon ◽

Learning Algorithm ◽

Gradient Methods ◽

Optimization Techniques ◽

Small Scale ◽

Joint Design ◽

Gradient Ascent ◽

And Control ◽

Reinforcement Learning Algorithm

We consider the joint design and control of discrete-time stochastic dynamical systems over a finite time horizon. We formulate the problem as a multi-step optimization problem under uncertainty seeking to identify a system design and a control policy that jointly maximize the expected sum of rewards collected over the time horizon considered. The transition function, the reward function and the policy are all parametrized, assumed known and differentiable with respect to their parameters. We then introduce a deep reinforcement learning algorithm combining policy gradient methods with model-based optimization techniques to solve this problem. In essence, our algorithm iteratively approximates the gradient of the expected return via Monte-Carlo sampling and automatic differentiation and takes projected gradient ascent steps in the space of environment and policy parameters. This algorithm is referred to as Direct Environment and Policy Search (DEPS). We assess the performance of our algorithm in three environments concerned with the design and control of a mass-spring-damper system, a small-scale off-grid power system and a drone, respectively. In addition, our algorithm is benchmarked against a state-of-the-art deep reinforcement learning algorithm used to tackle joint design and control problems. We show that DEPS performs at least as well or better in all three environments, consistently yielding solutions with higher returns in fewer iterations. Finally, solutions produced by our algorithm are also compared with solutions produced by an algorithm that does not jointly optimize environment and policy parameters, highlighting the fact that higher returns can be achieved when joint optimization is performed.

Download Full-text

Online Relaxation Refinement for Satisficing Planning: On Partial Delete Relaxation, Complete Hill-Climbing, and Novelty Pruning

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.13153 ◽

2022 ◽

Vol 73 ◽

Author(s):

Maximilian Fickert ◽

Jörg Hoffmann

Keyword(s):

Local Minimum ◽

State Of The Art ◽

Convergence Property ◽

Search Space ◽

Search Algorithms ◽

Hill Climbing ◽

Ai Planning ◽

Best First Search ◽

Refinement Procedure ◽

Hill Climbing Algorithms

In classical AI planning, heuristic functions typically base their estimates on a relaxation of the input task. Such relaxations can be more or less precise, and many heuristic functions have a refinement procedure that can be iteratively applied until the desired degree of precision is reached. Traditionally, such refinement is performed offline to instantiate the heuristic for the search. However, a natural idea is to perform such refinement online instead, in situations where the heuristic is not sufficiently accurate. We introduce several online-refinement search algorithms, based on hill-climbing and greedy best-first search. Our hill-climbing algorithms perform a bounded lookahead, proceeding to a state with lower heuristic value than the root state of the lookahead if such a state exists, or refining the heuristic otherwise to remove such a local minimum from the search space surface. These algorithms are complete if the refinement procedure satisfies a suitable convergence property. We transfer the idea of bounded lookaheads to greedy best-first search with a lightweight lookahead after each expansion, serving both as a method to boost search progress and to detect when the heuristic is inaccurate, identifying an opportunity for online refinement. We evaluate our algorithms with the partial delete relaxation heuristic hCFF, which can be refined by treating additional conjunctions of facts as atomic, and whose refinement operation satisfies the convergence property required for completeness. On both the IPC domains as well as on the recently published Autoscale benchmarks, our online-refinement search algorithms significantly beat state-of-the-art satisficing planners, and are competitive even with complex portfolios.

Download Full-text

Ranking Sets of Objects: The Complexity of Avoiding Impossibility Results

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.13030 ◽

2022 ◽

Vol 73 ◽

pp. 1-65

Author(s):

Jan Maly

Keyword(s):

Linear Order ◽

Fundamental Problem ◽

Preference Order ◽

Not Given ◽

Linear Orders ◽

Domain Restriction ◽

Impossibility Results ◽

The Family ◽

Np Complete ◽

Ranking Sets

The problem of lifting a preference order on a set of objects to a preference order on a family of subsets of this set is a fundamental problem with a wide variety of applications in AI. The process is often guided by axioms postulating properties the lifted order should have. Well-known impossibility results by Kannai and Peleg and by Barbera and Pattanaik tell us that some desirable axioms – namely dominance and (strict) independence – are not jointly satisfiable for any linear order on the objects if all non-empty sets of objects are to be ordered. On the other hand, if not all non-empty sets of objects are to be ordered, the axioms are jointly satisfiable for all linear orders on the objects for some families of sets. Such families are very important for applications as they allow for the use of lifted orders, for example, in combinatorial voting. In this paper, we determine the computational complexity of recognizing such families. We show that it is \Pi_2^p-complete to decide for a given family of subsets whether dominance and independence or dominance and strict independence are jointly satisfiable for all linear orders on the objects if the lifted order needs to be total. Furthermore, we show that the problem remains coNP-complete if the lifted order can be incomplete. Additionally, we show that the complexity of these problems can increase exponentially if the family of sets is not given explicitly but via a succinct domain restriction. Finally, we show that it is NP-complete to decide for a family of subsets whether dominance and independence or dominance and strict independence are jointly satisfiable for at least one linear order on the objects.

Download Full-text

Constraint-based Diversification of JOP Gadgets

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12848 ◽

2021 ◽

Vol 72 ◽

pp. 1471-1505

Author(s):

Rodothea Myrsini Tsoupidi ◽

Roberto Castañeda Lozano ◽

Benoit Baudry

Keyword(s):

Large Scale ◽

Distance Measure ◽

Neighborhood Search ◽

Software Systems ◽

Code Size ◽

Assembly Code ◽

Precise Control ◽

Application Specific ◽

Specific Distance ◽

Binary Programs

Modern software deployment process produces software that is uniform, and hence vulnerable to large-scale code-reuse attacks, such as Jump-Oriented Programming (JOP) attacks. Compiler-based diversification improves the resilience and security of software systems by automatically generating different assembly code versions of a given program. Existing techniques are efficient but do not have a precise control over the quality, such as the code size or speed, of the generated code variants. This paper introduces Diversity by Construction (DivCon), a constraint-based compiler approach to software diversification. Unlike previous approaches, DivCon allows users to control and adjust the conflicting goals of diversity and code quality. A key enabler is the use of Large Neighborhood Search (LNS) to generate highly diverse assembly code efficiently. For larger problems, we propose a combination of LNS with a structural decomposition of the problem. To further improve the diversification efficiency of DivCon against JOP attacks, we propose an application-specific distance measure tailored to the characteristics of JOP attacks. We evaluate DivCon with 20 functions from a popular benchmark suite for embedded systems. These experiments show that DivCon's combination of LNS and our application-specific distance measure generates binary programs that are highly resilient against JOP attacks (they share between 0.15% to 8% of JOP gadgets) with an optimality gap of 10%. Our results confirm that there is a trade-off between the quality of each assembly code version and the diversity of the entire pool of versions. In particular, the experiments show that DivCon is able to generate binary programs that share a very small number of gadgets, while delivering near-optimal code. For constraint programming researchers and practitioners, this paper demonstrates that LNS is a valuable technique for finding diverse solutions. For security researchers and software engineers, DivCon extends the scope of compiler-based diversification to performance-critical and resource-constrained applications.

Download Full-text

Learning from Disagreement: A Survey

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12752 ◽

2021 ◽

Vol 72 ◽

pp. 1385-1470

Author(s):

Alexandra N. Uma ◽

Tommaso Fornaciari ◽

Dirk Hovy ◽

Silviu Paun ◽

Barbara Plank ◽

...

Keyword(s):

Language Processing ◽

Gold Standard ◽

Training Methods ◽

High Quality ◽

Training Models ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Growing Body ◽

Research Questions ◽

Speech Tagging

Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (AI) still relies on the assumption that a single (gold) interpretation exists for each item, a growing body of research aims to develop learning methods that do not rely on this assumption. In this survey, we review the evidence for disagreements on NLP and CV tasks, focusing on tasks for which substantial datasets containing this information have been created. We discuss the most popular approaches to training models from datasets containing multiple judgments potentially in disagreement. We systematically compare these different approaches by training them with each of the available datasets, considering several ways to evaluate the resulting models. Finally, we discuss the results in depth, focusing on four key research questions, and assess how the type of evaluation and the characteristics of a dataset determine the answers to these questions. Our results suggest, first of all, that even if we abandon the assumption of a gold standard, it is still essential to reach a consensus on how to evaluate models. This is because the relative performance of the various training methods is critically affected by the chosen form of evaluation. Secondly, we observed a strong dataset effect. With substantial datasets, providing many judgments by high-quality coders for each item, training directly with soft labels achieved better results than training from aggregated or even gold labels. This result holds for both hard and soft evaluation. But when the above conditions do not hold, leveraging both gold and soft labels generally achieved the best results in the hard evaluation. All datasets and models employed in this paper are freely available as supplementary materials.

Download Full-text

The Rediscovery Hypothesis: Language Models Need to Meet Linguistics

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12788 ◽

2021 ◽

Vol 72 ◽

pp. 1343-1384

Author(s):

Vassilina Nikoulina ◽

Maxat Tezekbayev ◽

Nuradil Kozhakhmet ◽

Madina Babazhanova ◽

Matthias Gallé ◽

...

Keyword(s):

Language Modeling ◽

Language Models ◽

Linguistic Knowledge ◽

Necessary Condition ◽

Ongoing Debate ◽

Linguistic Information ◽

Information Theoretic ◽

Modern Language ◽

The Impact ◽

Linguistic Structures

There is an ongoing debate in the NLP community whether modern language models contain linguistic knowledge, recovered through so-called probes. In this paper, we study whether linguistic knowledge is a necessary condition for the good performance of modern language models, which we call the rediscovery hypothesis. In the first place, we show that language models that are significantly compressed but perform well on their pretraining objectives retain good scores when probed for linguistic structures. This result supports the rediscovery hypothesis and leads to the second contribution of our paper: an information-theoretic framework that relates language modeling objectives with linguistic information. This framework also provides a metric to measure the impact of linguistic information on the word prediction task. We reinforce our analytical results with various experiments, both on synthetic and on real NLP tasks in English.

Download Full-text

Journal of Artificial Intelligence Research
Latest Publications

TOTAL DOCUMENTS

H-INDEX

Published By Ai Access Foundation

A Survey of Opponent Modeling in Adversarial Domains

Doubly Robust Crowdsourcing

Preferences Single-Peaked on a Tree: Multiwinner Elections and Structural Results

Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent

Online Relaxation Refinement for Satisficing Planning: On Partial Delete Relaxation, Complete Hill-Climbing, and Novelty Pruning

Ranking Sets of Objects: The Complexity of Avoiding Impossibility Results

Constraint-based Diversification of JOP Gadgets

Learning from Disagreement: A Survey

The Rediscovery Hypothesis: Language Models Need to Meet Linguistics

Export Citation Format

Journal of Artificial Intelligence ResearchLatest Publications

TOTAL DOCUMENTS

H-INDEX

Published By Ai Access Foundation

A Survey of Opponent Modeling in Adversarial Domains

Doubly Robust Crowdsourcing

Preferences Single-Peaked on a Tree: Multiwinner Elections and Structural Results

Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning

Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent

Online Relaxation Refinement for Satisficing Planning: On Partial Delete Relaxation, Complete Hill-Climbing, and Novelty Pruning

Ranking Sets of Objects: The Complexity of Avoiding Impossibility Results

Constraint-based Diversification of JOP Gadgets

Learning from Disagreement: A Survey

The Rediscovery Hypothesis: Language Models Need to Meet Linguistics

Journal of Artificial Intelligence Research
Latest Publications