Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms

Author(s):  
Panagiotis Mandros ◽  
Mario Boley ◽  
Jilles Vreeken

The reliable fraction of information is an attractive score for quantifying (functional) dependencies in high-dimensional data. In this paper, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, justifying worst-case exponential-time as well as heuristic search methods. We then substantially improve the practical performance of both optimization styles by deriving a novel admissible bounding function that has unbounded potential for additional pruning over the previously proposed one. Finally, we empirically investigate the approximation ratio of the greedy algorithm and show that it produces highly competitive results in a fraction of the time needed for complete branch-and-bound-style search.
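For intuition, here is a minimal Python sketch (not the authors' implementation) of a greedy search that maximizes a bias-corrected dependency score of this kind. The permutation-based bias correction and all function names are illustrative assumptions; the paper derives an analytical correction term rather than estimating it by shuffling.

```python
# Sketch: greedy maximization of a "reliable" dependency score
#   F(X; Y) = (I(X; Y) - bias(X; Y)) / H(Y),
# where bias is estimated here by permuting Y (an assumption for
# illustration; the paper uses an analytical correction).
import math, random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    # plug-in estimate: I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def reliable_fraction(xs, ys, n_perms=50, rng=random.Random(0)):
    hy = entropy(ys)
    if hy == 0:
        return 0.0
    bias = 0.0
    for _ in range(n_perms):
        perm = ys[:]
        rng.shuffle(perm)                 # break the dependency, keep marginals
        bias += mutual_information(xs, perm)
    return (mutual_information(xs, ys) - bias / n_perms) / hy

def greedy_dependency(data, target, attrs):
    # data: dict attr -> column (list); target: list of class labels
    def project(subset):
        return [tuple(data[a][i] for a in subset) for i in range(len(target))]
    chosen, best = [], 0.0
    while True:
        cand = max((a for a in attrs if a not in chosen),
                   key=lambda a: reliable_fraction(project(chosen + [a]), target),
                   default=None)
        if cand is None:
            break
        score = reliable_fraction(project(chosen + [cand]), target)
        if score <= best:                 # no improving extension: stop
            break
        chosen.append(cand)
        best = score
    return chosen, best
```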

2020 ◽  
Vol 62 (11) ◽  
pp. 4223-4253
Author(s):  
Panagiotis Mandros ◽  
Mario Boley ◽  
Jilles Vreeken

Abstract We consider the task of discovering functional dependencies in data for target attributes of interest. To solve it, we have to answer two questions: How do we quantify a dependency in a way that is model-agnostic and interpretable, as well as reliable against sample-size and dimensionality biases? And how can we efficiently discover the exact or α-approximate top-k dependencies? We address the first question by adopting information-theoretic notions. Specifically, we consider the mutual information score, for which we propose a reliable estimator that enables robust optimization in high-dimensional data. To address the second question, we systematically explore the algorithmic implications of using this measure for optimization. We show that the problem is NP-hard, justifying worst-case exponential-time as well as heuristic search methods. We propose two bounding functions for the estimator, which we use as pruning criteria in branch-and-bound search to efficiently mine dependencies with approximation guarantees. Empirical evaluation shows that the derived estimator has desirable statistical properties and that the bounding functions lead to effective exact and greedy search algorithms; qualitative experiments show that, taken together, the framework indeed discovers highly informative dependencies.
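To make the search concrete, the following is a generic best-first branch-and-bound skeleton of the kind the abstract describes. The `score` and `bound` callbacks stand in for the paper's estimator and bounding functions (all names are assumptions): `bound(S)` must be admissible, i.e. an upper bound on `score(T)` for every superset T of S, and the α-pruning rule shows where the approximation guarantee comes from.

```python
# Sketch: best-first branch-and-bound over attribute subsets.
# alpha = 1.0 returns the exact optimum; alpha < 1 returns an
# alpha-approximation while pruning more aggressively.
import heapq

def branch_and_bound(attrs, score, bound, alpha=1.0):
    best_set, best_val = frozenset(), score(frozenset())
    # heap entries: (-bound, subset, attributes still available to add)
    heap = [(-bound(frozenset()), frozenset(), tuple(attrs))]
    while heap:
        neg_b, subset, remaining = heapq.heappop(heap)
        if alpha * -neg_b <= best_val:        # no extension can beat best/alpha
            continue
        for i, a in enumerate(remaining):
            child = subset | {a}
            v = score(child)
            if v > best_val:
                best_set, best_val = child, v
            b = bound(child)
            # expand only if admissibly promising and extendable
            if alpha * b > best_val and i + 1 < len(remaining):
                heapq.heappush(heap, (-b, child, remaining[i + 1:]))
    return best_set, best_val
```

Pairing `branch_and_bound` with a score such as the `reliable_fraction` sketch above and an admissible upper bound for it recovers the exact (α = 1) or α-approximate top dependency.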


2016 ◽  
Vol 57 ◽  
pp. 307-343 ◽  
Author(s):  
Nathan R. Sturtevant ◽  
Vadim Bulitko

Real-time agent-centered heuristic search is a well-studied problem in which an agent that can only reason locally about the world must travel to a goal location using bounded computation and memory at each step. Many algorithms have been proposed for this problem, and theoretical worst-case results have been derived, along with simple examples demonstrating worst-case performance in practice. Lower bounds, however, have not been widely studied. In this paper we study best-case performance more generally and derive theoretical lower bounds for reaching the goal using LRTA*, a canonical example of a real-time agent-centered heuristic search algorithm. The results show that, given some reasonable restrictions on the state space and the heuristic function, the number of steps an LRTA*-like algorithm requires to reach the goal grows asymptotically faster than the size of the state space, resulting in "scrubbing", where the agent repeatedly revisits the same states. We then show that while the asymptotic analysis does not hold for more complex real-time search algorithms, experimental results suggest it is still descriptive of their practical performance.
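A minimal sketch of LRTA* on a 4-connected grid illustrates the local heuristic updates that cause the scrubbing behavior analyzed above; the grid, unit edge costs, and the Manhattan initial heuristic are assumptions for illustration.

```python
# Sketch: LRTA* with one-step lookahead on a small grid (0 = open, 1 = wall).
def lrta_star(grid, start, goal, max_steps=10_000):
    rows, cols = len(grid), len(grid[0])
    h = {}  # learned heuristic, lazily initialized to Manhattan distance
    def h0(s): return abs(s[0] - goal[0]) + abs(s[1] - goal[1])
    def neighbors(s):
        r, c = s
        for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                yield (nr, nc)
    s, path = start, [start]
    for _ in range(max_steps):
        if s == goal:
            return path
        # one-step lookahead: f(n) = c(s, n) + h(n), unit edge costs
        best = min(neighbors(s), key=lambda n: 1 + h.get(n, h0(n)))
        # learning update keeps h non-decreasing
        h[s] = max(h.get(s, h0(s)), 1 + h.get(best, h0(best)))
        s = best
        path.append(s)
    return None  # step bound exceeded

maze = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(lrta_star(maze, (2, 0), (0, 0)))  # revisits states while raising h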


2020 ◽  
Vol 69 ◽  
pp. 231-295
Author(s):  
Peng Lin ◽  
Martin Neil ◽  
Norman Fenton

Performing efficient inference on high-dimensional discrete Bayesian networks (BNs) is challenging. When using exact inference methods, the space complexity can grow exponentially with the tree-width, making computation intractable. This paper presents a general-purpose approximate inference algorithm, based on a new region belief approximation method, called Triplet Region Construction (TRC). TRC reduces the cluster space complexity for factorized models from worst-case exponential to polynomial by performing graph factorization and producing clusters of limited size. Unlike previous generations of region-based algorithms, TRC is guaranteed to converge and effectively addresses the region-choice problem that bedevils other region-based algorithms used for BN inference. Our experiments demonstrate that it also achieves significantly more accurate results than competing algorithms.
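A back-of-the-envelope illustration (with assumed numbers, not taken from the paper) of why exact inference blows up with tree-width w while fixed-size clusters stay polynomial: a cluster over w discrete variables with d states each stores d**w entries, whereas a triplet cluster stores only d**3.

```python
d = 10  # assumed states per variable
for w in (3, 10, 20):
    print(f"tree-width {w:2d}: exact cluster table = {d**w:.3e} entries, "
          f"triplet cluster = {d**3} entries")
```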


2020 ◽  
Vol 34 (09) ◽  
pp. 13700-13703
Author(s):  
Nikhil Vyas ◽  
Ryan Williams

All known SAT-solving paradigms (backtracking, local search, and the polynomial method) only yield a 2^(n(1−1/O(k)))-time algorithm for solving k-SAT in the worst case, where the big-O constant is independent of k. For this reason, it has been hypothesized that k-SAT cannot be solved in worst-case 2^(n(1−f(k)/k)) time, for any unbounded f : ℕ → ℕ. This hypothesis has been called the "Super-Strong Exponential Time Hypothesis" (Super Strong ETH), modeled after the ETH and the Strong ETH. We prove two results concerning the Super-Strong ETH:

1. It has also been hypothesized that k-SAT is hard to solve for randomly chosen instances near the "critical threshold", where the clause-to-variable ratio is 2^k ln 2 − Θ(1). We give a randomized algorithm which refutes the Super-Strong ETH for the case of random k-SAT and planted k-SAT for any clause-to-variable ratio. In particular, given any random k-SAT instance F with n variables and m clauses, our algorithm decides satisfiability for F in 2^(n(1−Ω(log k)/k)) time, with high probability (over the choice of the formula and the randomness of the algorithm). It turns out that a well-known algorithm from the literature on SAT algorithms does the job: the PPZ algorithm of Paturi, Pudlák, and Zane (1998).

2. The Unique k-SAT problem is the special case where there is at most one satisfying assignment. It is natural to hypothesize that the worst-case (exponential-time) complexity of Unique k-SAT is substantially less than that of k-SAT. Improving prior reductions, we show that the time complexities of Unique k-SAT and k-SAT are very tightly related: if Unique k-SAT is solvable in 2^(n(1−f(k)/k)) time for an unbounded f, then k-SAT is solvable in 2^(n(1−f(k)(1−ε)/k)) time for every ε > 0. Thus, refuting the Super Strong ETH in the unique-solution case would refute it in general.
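Since the abstract singles out PPZ, here is a minimal sketch of that procedure: process the variables in random order, forcing any variable that appears in a unit clause under the current partial assignment and guessing the rest. The clause encoding (nonzero ints, negative = negated) and the restart budget are illustrative assumptions; the 2^(n(1−Ω(log k)/k)) bound comes from the analysis, not from the code.

```python
import random

def simplify(clauses, var, value):
    out = []
    for c in clauses:
        if (var if value else -var) in c:
            continue                       # clause satisfied, drop it
        reduced = tuple(l for l in c if abs(l) != var)
        if not reduced:
            return None                    # empty clause: contradiction
        out.append(reduced)
    return out

def ppz_round(n, clauses, rng):
    assignment = {}
    for v in rng.sample(range(1, n + 1), n):   # random variable order
        unit = next((c[0] for c in clauses
                     if len(c) == 1 and abs(c[0]) == v), None)
        value = (unit > 0) if unit is not None else rng.random() < 0.5
        clauses = simplify(clauses, v, value)
        if clauses is None:
            return None                    # conflict: restart
        assignment[v] = value
    return assignment                      # all clauses satisfied

def ppz(n, clauses, rounds=10_000, seed=0):
    rng = random.Random(seed)
    for _ in range(rounds):
        a = ppz_round(n, list(clauses), rng)
        if a is not None:
            return a                       # satisfying assignment found
    return None                            # none found (formula may be UNSAT)
```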


Author(s):  
Ehsan Ehsaeyan ◽  
Alireza Zolghadrasli

Multilevel thresholding is a basic method in image segmentation. Conventional multilevel thresholding algorithms become computationally expensive when the number of decomposed segments is high. In this paper, a novel and powerful technique based on the Crow Search Algorithm (CSA) is proposed for segmentation applications. The main contribution of our work is to combine Darwinian evolutionary theory with the heuristic CSA: the population is divided into specified groups, and each group tries to find a better location in the search space. A policy of encouragement and punishment is imposed on searching agents to avoid premature solutions and being trapped in local optima. Moreover, to increase the convergence rate of the proposed method, a gray-scale map is applied to out-of-boundary agents. Ten test images are selected to measure the ability of our algorithm, which is compared against the well-known energy curve method. Two popular criteria, the Otsu and Kapur (entropy) objectives, are employed to evaluate the capability of the introduced algorithm, and eight different search algorithms are implemented and compared to it. The obtained results show that our method extracts multilevel thresholds more efficiently than the original CSA and other heuristic search methods.
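For reference, here is a minimal sketch of the plain CSA update that the paper builds on; the Darwinian grouping, encouragement/punishment policy, and gray-scale boundary map are the paper's additions and are omitted here (out-of-bound positions are simply clamped). The bounds, flight length `fl`, and awareness probability `ap` are assumed illustrative values.

```python
import random

def crow_search(fitness, dim, lo, hi, n_crows=20, iters=200,
                fl=2.0, ap=0.1, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_crows)]
    mem = [p[:] for p in pos]              # best position each crow remembers
    for _ in range(iters):
        for i in range(n_crows):
            j = rng.randrange(n_crows)      # crow i tails crow j
            if rng.random() >= ap:          # j is unaware: move toward its memory
                pos[i] = [min(hi, max(lo,   # clamp (simplification, see above)
                              pos[i][d] + rng.random() * fl *
                              (mem[j][d] - pos[i][d])))
                          for d in range(dim)]
            else:                           # j noticed: fly to a random spot
                pos[i] = [rng.uniform(lo, hi) for _ in range(dim)]
            if fitness(pos[i]) > fitness(mem[i]):   # maximize the objective
                mem[i] = pos[i][:]
    return max(mem, key=fitness)

# hypothetical usage with a Kapur-entropy objective over gray levels:
# thresholds = sorted(crow_search(kapur_entropy_of, dim=4, lo=0, hi=255))
```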


Author(s):  
P.D.D. Dominic ◽  
Ahmad Kamil Bin Mahmood ◽  
P. Parthiban ◽  
S.C. Lenny Koh
