Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms

Author(s):  
Panagiotis Mandros ◽  
Mario Boley ◽  
Jilles Vreeken

The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, justifying worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance of both optimization styles by deriving a novel admissible bounding function that has unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of the time needed for complete branch-and-bound-style search.
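For intuition, here is a minimal Python sketch (not the authors' implementation) of a greedy search that maximizes a bias-corrected dependency score of this kind. The permutation-based bias correction and all function names are illustrative assumptions; the paper derives an analytical correction term rather than estimating it by shuffling.

```python
# Sketch: greedy maximization of a "reliable" dependency score
#   F(X; Y) = (I(X; Y) - bias(X; Y)) / H(Y),
# where bias is estimated here by permuting Y (an assumption for
# illustration; the paper uses an analytical correction).
import math, random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    # plug-in estimate: I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def reliable_fraction(xs, ys, n_perms=50, rng=random.Random(0)):
    hy = entropy(ys)
    if hy == 0:
        return 0.0
    bias = 0.0
    for _ in range(n_perms):
        perm = ys[:]
        rng.shuffle(perm)                 # break the dependency, keep marginals
        bias += mutual_information(xs, perm)
    return (mutual_information(xs, ys) - bias / n_perms) / hy

def greedy_dependency(data, target, attrs):
    # data: dict attr -> column (list); target: list of class labels
    def project(subset):
        return [tuple(data[a][i] for a in subset) for i in range(len(target))]
    chosen, best = [], 0.0
    while True:
        cand = max((a for a in attrs if a not in chosen),
                   key=lambda a: reliable_fraction(project(chosen + [a]), target),
                   default=None)
        if cand is None:
            break
        score = reliable_fraction(project(chosen + [cand]), target)
        if score <= best:                 # no improving extension: stop
            break
        chosen.append(cand)
        best = score
    return chosen, best
```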

2020 ◽  
Vol 62 (11) ◽  
pp. 4223-4253
Author(s):  
Panagiotis Mandros ◽  
Mario Boley ◽  
Jilles Vreeken

Abstract We consider the task of discovering functional dependencies in data for target attributes of interest. To solve it, we have to answer two questions: How do we quantify a dependency in a way that is model-agnostic and interpretable, as well as reliable against sample-size and dimensionality biases? And how can we efficiently discover the exact or α-approximate top-k dependencies? We address the first question by adopting information-theoretic notions. Specifically, we consider the mutual information score, for which we propose a reliable estimator that enables robust optimization in high-dimensional data. To address the second question, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, justifying worst-case exponential-time as well as heuristic search methods. We propose two bounding functions for the estimator, which we use as pruning criteria in branch-and-bound search to efficiently mine dependencies with approximation guarantees. Empirical evaluation shows that the derived estimator has desirable statistical properties and that the bounding functions lead to effective exact and greedy search algorithms; qualitative experiments show that, taken together, the framework indeed discovers highly informative dependencies.
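To make the search concrete, the following is a generic best-first branch-and-bound skeleton of the kind the abstract describes. The `score` and `bound` callbacks stand in for the paper's estimator and bounding functions (all names are assumptions): `bound(S)` must be admissible, i.e. an upper bound on `score(T)` for every superset T of S, and the α-pruning rule shows where the approximation guarantee comes from.

```python
# Sketch: best-first branch-and-bound over attribute subsets.
# alpha = 1.0 returns the exact optimum; alpha < 1 returns an
# alpha-approximation while pruning more aggressively.
import heapq

def branch_and_bound(attrs, score, bound, alpha=1.0):
    best_set, best_val = frozenset(), score(frozenset())
    # heap entries: (-bound, subset, attributes still available to add)
    heap = [(-bound(frozenset()), frozenset(), tuple(attrs))]
    while heap:
        neg_b, subset, remaining = heapq.heappop(heap)
        if alpha * -neg_b <= best_val:        # no extension can beat best/alpha
            continue
        for i, a in enumerate(remaining):
            child = subset | {a}
            v = score(child)
            if v > best_val:
                best_set, best_val = child, v
            b = bound(child)
            # expand only if admissibly promising and extendable
            if alpha * b > best_val and i + 1 < len(remaining):
                heapq.heappush(heap, (-b, child, remaining[i + 1:]))
    return best_set, best_val
```

Pairing `branch_and_bound` with a score such as the `reliable_fraction` sketch above and an admissible upper bound for it recovers the exact (α = 1) or α-approximate top dependency.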


2016 ◽  
Vol 57 ◽  
pp. 307-343 ◽  
Author(s):  
Nathan R. Sturtevant ◽  
Vadim Bulitko

Real-time agent-centered heuristic search is a well-studied problem in which an agent that can only reason locally about the world must travel to a goal location using bounded computation and memory at each step. Many algorithms have been proposed for this problem, and theoretical worst-case results have been derived, along with simple examples demonstrating worst-case performance in practice. Lower bounds, however, have not been widely studied. In this paper we study best-case performance more generally and derive theoretical lower bounds for reaching the goal using LRTA*, a canonical example of a real-time agent-centered heuristic search algorithm. The results show that, given some reasonable restrictions on the state space and the heuristic function, the number of steps an LRTA*-like algorithm requires to reach the goal grows asymptotically faster than the size of the state space, resulting in "scrubbing", where the agent repeatedly revisits the same states. We then show that while the asymptotic analysis does not hold for more complex real-time search algorithms, experimental results suggest it is still descriptive of their practical performance.
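A minimal sketch of LRTA* on a 4-connected grid illustrates the local heuristic updates that cause the scrubbing behavior analyzed above; the grid, unit edge costs, and the Manhattan initial heuristic are assumptions for illustration.

```python
# Sketch: LRTA* with one-step lookahead on a small grid (0 = open, 1 = wall).
def lrta_star(grid, start, goal, max_steps=10_000):
    rows, cols = len(grid), len(grid[0])
    h = {}  # learned heuristic, lazily initialized to Manhattan distance
    def h0(s): return abs(s[0] - goal[0]) + abs(s[1] - goal[1])
    def neighbors(s):
        r, c = s
        for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                yield (nr, nc)
    s, path = start, [start]
    for _ in range(max_steps):
        if s == goal:
            return path
        # one-step lookahead: f(n) = c(s, n) + h(n), unit edge costs
        best = min(neighbors(s), key=lambda n: 1 + h.get(n, h0(n)))
        # learning update keeps h non-decreasing
        h[s] = max(h.get(s, h0(s)), 1 + h.get(best, h0(best)))
        s = best
        path.append(s)
    return None  # step bound exceeded

maze = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(lrta_star(maze, (2, 0), (0, 0)))  # revisits states while raising h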


2020 ◽  
Vol 69 ◽  
pp. 231-295
Author(s):  
Peng Lin ◽  
Martin Neil ◽  
Norman Fenton

Performing efficient inference on high-dimensional discrete Bayesian networks (BNs) is challenging. When using exact inference methods, the space complexity can grow exponentially with the tree-width, making computation intractable. This paper presents a general-purpose approximate inference algorithm, based on a new region belief approximation method, called Triplet Region Construction (TRC). TRC reduces the cluster space complexity for factorized models from worst-case exponential to polynomial by performing graph factorization and producing clusters of limited size. Unlike previous generations of region-based algorithms, TRC is guaranteed to converge and effectively addresses the region-choice problem that bedevils other region-based algorithms used for BN inference. Our experiments demonstrate that it also achieves significantly more accurate results than competing algorithms.
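A back-of-the-envelope illustration (with assumed numbers, not taken from the paper) of why exact inference blows up with tree-width w while fixed-size clusters stay polynomial: a cluster over w discrete variables with d states each stores d**w entries, whereas a triplet cluster stores only d**3.

```python
d = 10  # assumed states per variable
for w in (3, 10, 20):
    print(f"tree-width {w:2d}: exact cluster table = {d**w:.3e} entries, "
          f"triplet cluster = {d**3} entries")
```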


2020 ◽  
Vol 34 (09) ◽  
pp. 13700-13703
Author(s):  
Nikhil Vyas ◽  
Ryan Williams

All known SAT-solving paradigms (backtracking, local search, and the polynomial method) only yield a 2^(n(1−1/O(k)))-time algorithm for solving k-SAT in the worst case, where the big-O constant is independent of k. For this reason, it has been hypothesized that k-SAT cannot be solved in worst-case 2^(n(1−f(k)/k)) time, for any unbounded f : ℕ → ℕ. This hypothesis has been called the "Super-Strong Exponential Time Hypothesis" (Super Strong ETH), modeled after the ETH and the Strong ETH. We prove two results concerning the Super-Strong ETH:

1. It has also been hypothesized that k-SAT is hard to solve for randomly chosen instances near the "critical threshold", where the clause-to-variable ratio is 2^k ln 2 − Θ(1). We give a randomized algorithm which refutes the Super-Strong ETH for the case of random k-SAT and planted k-SAT for any clause-to-variable ratio. In particular, given any random k-SAT instance F with n variables and m clauses, our algorithm decides satisfiability for F in 2^(n(1−Ω(log k)/k)) time, with high probability (over the choice of the formula and the randomness of the algorithm). It turns out that a well-known algorithm from the literature on SAT algorithms does the job: the PPZ algorithm of Paturi, Pudlák, and Zane (1998).

2. The Unique k-SAT problem is the special case where there is at most one satisfying assignment. It is natural to hypothesize that the worst-case (exponential-time) complexity of Unique k-SAT is substantially less than that of k-SAT. Improving prior reductions, we show that the time complexities of Unique k-SAT and k-SAT are very tightly related: if Unique k-SAT is solvable in 2^(n(1−f(k)/k)) time for an unbounded f, then k-SAT is solvable in 2^(n(1−f(k)(1−ε)/k)) time for every ε > 0. Thus, refuting the Super Strong ETH in the unique-solution case would refute it in general.
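Since the abstract singles out PPZ, here is a minimal sketch of that procedure: process the variables in random order, forcing any variable that appears in a unit clause under the current partial assignment and guessing the rest. The clause encoding (nonzero ints, negative = negated) and the restart budget are illustrative assumptions; the 2^(n(1−Ω(log k)/k)) bound comes from the analysis, not from the code.

```python
import random

def simplify(clauses, var, value):
    out = []
    for c in clauses:
        if (var if value else -var) in c:
            continue                       # clause satisfied, drop it
        reduced = tuple(l for l in c if abs(l) != var)
        if not reduced:
            return None                    # empty clause: contradiction
        out.append(reduced)
    return out

def ppz_round(n, clauses, rng):
    assignment = {}
    for v in rng.sample(range(1, n + 1), n):   # random variable order
        unit = next((c[0] for c in clauses
                     if len(c) == 1 and abs(c[0]) == v), None)
        value = (unit > 0) if unit is not None else rng.random() < 0.5
        clauses = simplify(clauses, v, value)
        if clauses is None:
            return None                    # conflict: restart
        assignment[v] = value
    return assignment                      # all clauses satisfied

def ppz(n, clauses, rounds=10_000, seed=0):
    rng = random.Random(seed)
    for _ in range(rounds):
        a = ppz_round(n, list(clauses), rng)
        if a is not None:
            return a                       # satisfying assignment found
    return None                            # none found (formula may be UNSAT)
```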


Author(s):  
Ehsan Ehsaeyan ◽  
Alireza Zolghadrasli

Multilevel thresholding is a basic method in image segmentation. Conventional multilevel thresholding algorithms become computationally expensive when the number of decomposed segments is high. In this paper, a novel and powerful technique based on the Crow Search Algorithm (CSA) is proposed for segmentation applications. The main contribution of our work is to combine Darwinian evolutionary theory with the heuristic CSA: the population is divided into specified groups, and each group tries to find a better location in the search space. A policy of encouragement and punishment is imposed on searching agents to avoid premature solutions and being trapped in local optima. Moreover, to increase the convergence rate of the proposed method, a gray-scale map is applied to out-of-boundary agents. Ten test images are selected to measure the ability of our algorithm, which is compared against the well-known energy curve method. Two popular criteria, the Otsu and Kapur (entropy) objectives, are employed to evaluate the capability of the introduced algorithm, and eight different search algorithms are implemented and compared to it. The obtained results show that our method extracts multilevel thresholds more efficiently than the original CSA and other heuristic search methods.
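For reference, here is a minimal sketch of the plain CSA update that the paper builds on; the Darwinian grouping, encouragement/punishment policy, and gray-scale boundary map are the paper's additions and are omitted here (out-of-bound positions are simply clamped). The bounds, flight length `fl`, and awareness probability `ap` are assumed illustrative values.

```python
import random

def crow_search(fitness, dim, lo, hi, n_crows=20, iters=200,
                fl=2.0, ap=0.1, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_crows)]
    mem = [p[:] for p in pos]              # best position each crow remembers
    for _ in range(iters):
        for i in range(n_crows):
            j = rng.randrange(n_crows)      # crow i tails crow j
            if rng.random() >= ap:          # j is unaware: move toward its memory
                pos[i] = [min(hi, max(lo,   # clamp (simplification, see above)
                              pos[i][d] + rng.random() * fl *
                              (mem[j][d] - pos[i][d])))
                          for d in range(dim)]
            else:                           # j noticed: fly to a random spot
                pos[i] = [rng.uniform(lo, hi) for _ in range(dim)]
            if fitness(pos[i]) > fitness(mem[i]):   # maximize the objective
                mem[i] = pos[i][:]
    return max(mem, key=fitness)

# hypothetical usage with a Kapur-entropy objective over gray levels:
# thresholds = sorted(crow_search(kapur_entropy_of, dim=4, lo=0, hi=255))
```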


Author(s):  
P.D.D. Dominic ◽  
Ahmad Kamil Bin Mahmood ◽  
P. Parthiban ◽  
S.C. Lenny Koh
