An Approach to Reduce the Number of Conditional Independence Tests in the PC Algorithm

2021 ◽  
pp. 276-288
Author(s):  
Marcel Wienöbst ◽  
Maciej Liśkiewicz

Entropy ◽
2021 ◽  
Vol 23 (11) ◽  
pp. 1450
Author(s):  
Ádám Zlatniczki ◽  
Marcell Stippinger ◽  
Zsigmond Benkő ◽  
Zoltán Somogyvári ◽  
András Telcs

This work addresses observational causal discovery for deterministic and stochastic dynamic systems. We explore what additional knowledge can be gained from standard conditional independence tests when the interacting systems are located in a geodesic space.
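
As an illustration of the kind of standard conditional independence test the abstract refers to, the following is a minimal sketch of a partial-correlation CI test with a Fisher z-transform. The function name, the Gaussian assumption, and the interface are illustrative and not taken from the paper.

```python
import numpy as np
from scipy import stats

def ci_test(data, x, y, z, alpha=0.05):
    """Test X _||_ Y | Z via partial correlation and a Fisher z-transform.

    data: (n_samples, n_vars) array; x, y: column indices; z: list of indices.
    Returns True if independence is NOT rejected at level alpha (assumes
    approximately Gaussian data).
    """
    n = data.shape[0]
    sub = data[:, [x, y] + list(z)]
    prec = np.linalg.pinv(np.corrcoef(sub, rowvar=False))   # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])      # partial correlation of x, y given z
    r = np.clip(r, -0.999999, 0.999999)
    z_stat = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(z) - 3)
    p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))
    return p_value > alpha
```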


2009 ◽  
Vol 35 ◽  
pp. 449-484 ◽  
Author(s):  
F. Bromberg ◽  
D. Margaritis ◽  
V. Honavar

We present two algorithms for learning the structure of a Markov network from data: GSMN* and GSIMN. Both algorithms use statistical independence tests to infer the structure by successively constraining the set of structures consistent with the results of these tests. Until very recently, algorithms for structure learning were based on maximum likelihood estimation, which has been proved to be NP-hard for Markov networks due to the difficulty of estimating the parameters of the network, needed for the computation of the data likelihood. The independence-based approach does not require the computation of the likelihood, and thus both GSMN* and GSIMN can compute the structure efficiently (as shown in our experiments). GSMN* is an adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN* by additionally exploiting Pearl's well-known properties of the conditional independence relation to infer novel independences from known ones, thus avoiding some of the statistical tests that would otherwise be needed to estimate them. To accomplish this efficiently, GSIMN uses the Triangle theorem, also introduced in this work, which is a simplified version of the set of Markov axioms. Experimental comparisons on artificial and real-world data sets show that GSIMN can yield significant savings with respect to GSMN*, while generating a Markov network of comparable or, in some cases, improved quality. We also compare GSIMN to a forward-chaining implementation, called GSIMN-FCH, that produces all possible conditional independences resulting from repeatedly applying Pearl's theorems to the known conditional independence tests. The results of this comparison show that GSIMN, through the sole use of the Triangle theorem, is nearly optimal in terms of the set of independence tests that it infers.
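
The core idea of inferring new independences from known ones can be sketched as follows. This is not the paper's GSIMN implementation and it does not reproduce the Triangle theorem; it only illustrates caching test outcomes and applying a rule derivable from Pearl's contraction and decomposition properties, so that some statistical tests never have to be run.

```python
# Illustrative sketch (not GSIMN itself). Inference rule used below, which
# follows from Pearl's contraction and decomposition properties:
#     I(X;W|Z) and I(X;Y|Z ∪ {W})  =>  I(X;Y|Z)

class IndependenceOracle:
    def __init__(self, ci_test):
        self.ci_test = ci_test      # callable (x, y, z) -> bool, e.g. a closure over the data
        self.cache = {}             # (min(x,y), max(x,y), frozenset(z)) -> bool
        self.tests_run = 0          # statistical tests actually executed

    def _key(self, x, y, z):
        a, b = sorted((x, y))
        return (a, b, frozenset(z))

    def independent(self, x, y, z):
        z = frozenset(z)
        key = self._key(x, y, z)
        if key in self.cache:
            return self.cache[key]
        # Try to infer I(x;y|z) before testing: look for a w with
        # I(x;w|z) and I(x;y|z ∪ {w}) already recorded as independences.
        for (a, b, cond), indep in list(self.cache.items()):
            if not indep or cond != z:
                continue
            w = b if a == x else (a if b == x else None)
            if w is None or w == y:
                continue
            if self.cache.get(self._key(x, y, z | {w})):
                self.cache[key] = True      # inferred, no test needed
                return True
        # Fall back to an actual statistical test.
        self.tests_run += 1
        result = self.ci_test(x, y, z)
        self.cache[key] = result
        return result
```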


Symmetry ◽  
2022 ◽  
Vol 14 (1) ◽  
pp. 149
Author(s):  
Waqar Khan ◽  
Lingfu Kong ◽  
Brekhna Brekhna ◽  
Ling Wang ◽  
Huigui Yan

Streaming feature selection has long been an effective method for selecting a relevant subset of features from high-dimensional data and reducing learning complexity. However, little attention has been paid to online feature selection through the Markov Blanket (MB). Several studies based on traditional MB learning reported low prediction accuracy and used fewer datasets, because the number of conditional independence tests is high and consumes more time. This paper presents a novel algorithm called Online Feature Selection Via Markov Blanket (OFSVMB), based on statistical conditional independence tests, which offers high accuracy and lower computation time. It reduces the number of conditional independence tests and incorporates online relevance and redundancy analysis to check the relevance of each upcoming feature to the target variable T, discard redundant features from the Parents-Children (PC) and Spouses (SP) sets online, and find PC and SP simultaneously. The performance of OFSVMB is compared with traditional MB learning algorithms, including IAMB, STMB, HITON-MB, BAMB, and EEMB, and with streaming feature selection algorithms, including OSFS, Alpha-investing, and SAOLA, on 9 benchmark Bayesian Network (BN) datasets and 14 real-world datasets. For the performance evaluation, F1, precision, and recall are measured at significance levels of 0.01 and 0.05 on the benchmark BN and real-world datasets, and 12 classifiers are evaluated at a significance level of 0.01. On benchmark BN datasets with 500 and 5000 sample sizes, OFSVMB achieved significantly higher accuracy than IAMB, STMB, HITON-MB, BAMB, and EEMB in terms of F1, precision, and recall, while also running faster. It finds a more accurate MB regardless of the size of the feature set. On real-world datasets, OFSVMB offers substantial improvements in mean prediction accuracy across the 12 classifiers with both small and large sample sizes compared to OSFS, Alpha-investing, and SAOLA, but it is slower than OSFS, Alpha-investing, and SAOLA because these algorithms only find the PC set and not the SP set. Furthermore, a sensitivity analysis shows that OFSVMB is more accurate in selecting the optimal features.
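
A generic sketch of the online relevance/redundancy pattern the abstract describes is shown below; it is not the paper's OFSVMB pseudocode. The function names and the bound on conditioning-set size are assumptions, and `ci_test(data, x, y, z)` is assumed to return True when x and y are conditionally independent given z.

```python
from itertools import combinations

def _subsets(items, max_size):
    """All non-empty subsets of `items` up to size `max_size`."""
    for k in range(1, min(max_size, len(items)) + 1):
        yield from combinations(items, k)

def online_mb_selection(feature_stream, target, data, ci_test, max_cond=3):
    """Maintain a candidate Markov-blanket pool as features arrive one at a time."""
    candidates = []
    for f in feature_stream:
        # Online relevance analysis: skip f if it is independent of the target.
        if ci_test(data, f, target, []):
            continue
        candidates.append(f)
        # Online redundancy analysis: drop any candidate that becomes
        # independent of the target given some small subset of the others.
        for g in list(candidates):
            others = [h for h in candidates if h != g]
            if any(ci_test(data, g, target, list(s))
                   for s in _subsets(others, max_cond)):
                candidates.remove(g)
    return candidates
```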


2013 ◽  
Vol 22 (02) ◽  
pp. 1350005 ◽  
Author(s):  
XIA LIU ◽  
YOULONG YANG ◽  
MINGMIN ZHU

Due to the infeasibility of randomized controlled experiments, the existence of unobserved variables, and the fact that equivalent directed acyclic graphs generally cannot be distinguished, it is difficult to learn the true causal relations of the original graph. This paper presents an algorithm called BSPC, based on adjacent nodes, for learning the structure of causal Bayesian networks with unobserved variables from observational data. Unlike the existing algorithms FCI and MBCS*, it does not have to adjust the structure, while still guaranteeing that the true adjacent nodes are obtained. More importantly, BSPC reduces computational complexity and improves the reliability of conditional independence tests. Theoretical results show that the new algorithm is correct. In addition, the advantages of BSPC in terms of the number of conditional independence tests and the number of orientation errors are illustrated with simulation experiments, which show that it is more suitable for learning the structure of causal Bayesian networks with latent variables. Moreover, a better latent structure representation is returned.
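
For context, the following is a minimal sketch of the adjacency-based skeleton search that constraint-based algorithms in this family (PC, FCI, and, per the abstract, BSPC) build on: an edge is removed once a separating set is found among the current neighbours of its endpoints. This is a generic sketch under assumed names (`ci_test`, `max_cond`), not the BSPC algorithm itself.

```python
from itertools import combinations

def skeleton_search(variables, data, ci_test, max_cond=3):
    """Generic PC-style skeleton search: drop edge x-y once a separating
    set is found among the current neighbours of x (excluding y)."""
    adj = {v: set(variables) - {v} for v in variables}   # start fully connected
    sepsets = {}
    for size in range(max_cond + 1):                     # grow conditioning-set size
        for x in variables:
            for y in list(adj[x]):
                pool = adj[x] - {y}                      # condition on neighbours only
                if len(pool) < size:
                    continue
                for z in combinations(sorted(pool), size):
                    if ci_test(data, x, y, list(z)):     # x _||_ y | z  =>  remove edge
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepsets[frozenset((x, y))] = set(z)
                        break
    return adj, sepsets
```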


Crisis ◽  
2019 ◽  
Vol 40 (3) ◽  
pp. 157-165 ◽  
Author(s):  
Kevin S. Kuehn ◽  
Annelise Wagner ◽  
Jennifer Velloza

Abstract. Background: Suicide is the second leading cause of death among US adolescents aged 12–19 years. Researchers would benefit from a better understanding of the direct effects of bullying and e-bullying on adolescent suicide to inform intervention work. Aims: To explore the direct and indirect effects of bullying and e-bullying on adolescent suicide attempts (SAs) and to estimate the magnitude of these effects while controlling for significant covariates. Method: This study uses data from the 2015 Youth Risk Behavior Surveillance Survey (YRBS), a nationally representative sample of US high school youth. We quantified the association between bullying and the likelihood of SA after adjusting for covariates (e.g., sexual orientation, obesity, sleep) identified with the PC algorithm. Results: Bullying and e-bullying were significantly associated with SA in logistic regression analyses. Bullying had an estimated average causal effect (ACE) of 2.46%, while e-bullying had an ACE of 4.16%. Limitations: The data are cross-sectional, so temporal precedence is not known. Conclusion: These findings highlight the strong association between bullying, e-bullying, and SA.
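
As a hedged illustration of how an average causal effect can be read off a covariate-adjusted logistic regression by standardization (g-computation), the sketch below shows the general pattern. It is not the authors' analysis code, and the column names are hypothetical rather than taken from the YRBS codebook.

```python
from sklearn.linear_model import LogisticRegression

def ace_from_logistic(df, exposure, outcome, covariates):
    """Risk difference for `exposure` on `outcome`, standardized over `covariates`."""
    X = df[[exposure] + covariates]
    model = LogisticRegression(max_iter=1000).fit(X, df[outcome])
    # Predict the outcome probability for everyone with exposure set to 1 and to 0.
    X1, X0 = X.copy(), X.copy()
    X1[exposure], X0[exposure] = 1, 0
    p1 = model.predict_proba(X1)[:, 1].mean()
    p0 = model.predict_proba(X0)[:, 1].mean()
    return p1 - p0   # e.g. 0.0246 would correspond to an ACE of 2.46 percentage points

# Hypothetical usage with illustrative column names (not from the YRBS codebook):
# ace = ace_from_logistic(yrbs_df, "bullied", "suicide_attempt",
#                         ["sexual_orientation", "obesity", "sleep_hours"])
```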

