Scalable Probabilistic Causal Structure Discovery

Author(s):  
Dhanya Sridhar ◽  
Jay Pujara ◽  
Lise Getoor

Complex causal networks underlie many real-world problems, from the regulatory interactions between genes to the environmental patterns used to understand climate change. Computational methods seek to infer these causal networks using observational data and domain knowledge. In this paper, we identify three key requirements for inferring the structure of causal networks for scientific discovery: (1) robustness to noise in observed measurements; (2) scalability to handle hundreds of variables; and (3) flexibility to encode domain knowledge and other structural constraints. We first formalize the problem of joint probabilistic causal structure discovery. We develop an approach using probabilistic soft logic (PSL) that exploits multiple statistical tests, supports efficient optimization over hundreds of variables, and can easily incorporate structural constraints, including imperfect domain knowledge. We compare our method against multiple well-studied approaches on biological and synthetic datasets, showing improvements of up to 20% in F1-score over the best-performing baseline in realistic settings.
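As a rough illustration of how PSL can make logical constraints "soft", the sketch below computes the distance to satisfaction of a weighted rule under the Łukasiewicz relaxation that PSL is generally described as using. The function name and the example truth values are hypothetical and are not taken from the paper; the paper's actual rules and inference procedure are not shown here.

```python
def distance_to_satisfaction(body, head):
    """Hinge penalty for a soft rule body_1 & ... & body_n -> head.

    All truth values lie in [0, 1]. The conjunction uses the
    Lukasiewicz t-norm; the penalty is zero when the head is at
    least as true as the body.
    """
    body_truth = max(0.0, sum(body) - (len(body) - 1))
    return max(0.0, body_truth - head)


# Hypothetical rule: two pieces of soft evidence jointly imply an edge.
fully_violated = distance_to_satisfaction([1.0, 1.0], 0.0)
partly_violated = distance_to_satisfaction([0.9, 0.8], 0.5)
satisfied = distance_to_satisfaction([0.3, 0.4], 0.0)
```

Because each penalty is a hinge function of the truth values, a weighted sum of such penalties is convex, which is one plausible reading of how the approach stays efficient over hundreds of variables.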

2021 ◽  
Author(s):  
Jarmo Mäkelä ◽  
Laila Melkas ◽  
Ivan Mammarella ◽  
Tuomo Nieminen ◽  
Suyog Chandramouli ◽  
...  

Abstract. This is a comment on "Estimating causal networks in biosphere–atmosphere interaction with the PCMCI approach" by Krich et al., Biogeosciences, 17, 1033–1061, 2020, which gives a good introduction to causal discovery but limits its scope to the outcome of a single algorithm. In this comment, we argue that the outputs of causal discovery algorithms should usually be considered not as end results but as starting points and hypotheses for further study. We illustrate how not only different algorithms, but also different initial states and prior information about possible causal model structures, affect the outcome. We demonstrate how to incorporate expert domain knowledge into causal structure discovery and how to detect and take into account overfitting and concept drift.


Molecules ◽  
2018 ◽  
Vol 23 (7) ◽  
pp. 1729
Author(s):  
Yinghan Hong ◽  
Zhifeng Hao ◽  
Guizhen Mai ◽  
Han Huang ◽  
Arun Kumar Sangaiah

Exploring and detecting causal relations among variables has shown great practical value in recent years, with numerous opportunities for scientific discovery, and is commonly seen as the core of data science. Among causal discovery methods, constraint-based approaches can recover causal structures from passive observational data in general cases and have shown broad prospects in numerous real-world applications. However, they do not work well when the graph is sufficiently large. To alleviate this problem, this paper presents an improved causal structure learning algorithm, K2-BSO, which combines K2 with brain storm optimization (BSO). Here BSO is used to search for the optimal topological order of nodes rather than searching the graph space. This paper assumes that the dataset is generated according to a causal diagram in which each variable is generated from its parents by a causal mechanism. We designed an elaborate distance function for the clustering step in BSO according to the mechanism of K2. The graph space is therefore reduced to a smaller topological order space, and the order space can be further reduced by an efficient clustering method. The experimental results on various real-world datasets show that our method outperforms the traditional search-and-score methods and the state-of-the-art genetic algorithm-based methods.
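To make the role of the topological order concrete, here is a minimal sketch of the classic K2 greedy search for a fixed node order, using the Cooper-Herskovits K2 metric in log form. It is not the authors' K2-BSO: the BSO search over orders and the clustering distance are omitted, and the function names, the `max_parents` limit, and the toy dataset are assumptions made for illustration.

```python
from collections import Counter
from math import lgamma


def k2_log_score(data, child, parents, arity):
    """Log K2 metric of `child` given a candidate parent list."""
    counts = Counter()  # (parent configuration, child value) -> N_ijk
    for row in data:
        counts[(tuple(row[p] for p in parents), row[child])] += 1
    totals = Counter()  # parent configuration -> N_ij
    for (config, _), n in counts.items():
        totals[config] += n
    r = arity[child]
    # Unobserved configurations contribute zero, so they can be skipped.
    score = sum(lgamma(r) - lgamma(n + r) for n in totals.values())
    score += sum(lgamma(n + 1) for n in counts.values())
    return score


def k2_search(data, order, arity, max_parents=2):
    """Greedily add parents from earlier nodes in `order` while the score improves."""
    parents = {v: [] for v in order}
    for idx, child in enumerate(order):
        best = k2_log_score(data, child, parents[child], arity)
        candidates = set(order[:idx])
        while candidates and len(parents[child]) < max_parents:
            new_score, new_parent = max(
                (k2_log_score(data, child, parents[child] + [c], arity), c)
                for c in candidates
            )
            if new_score <= best:
                break
            best = new_score
            parents[child].append(new_parent)
            candidates.remove(new_parent)
    return parents


# Toy binary data: column 1 mostly copies column 0; column 2 is independent.
data = (
    [(0, 0, 0)] * 20 + [(0, 0, 1)] * 20 + [(1, 1, 0)] * 20 + [(1, 1, 1)] * 20
    + [(0, 1, 0)] * 2 + [(0, 1, 1)] * 2 + [(1, 0, 0)] * 2 + [(1, 0, 1)] * 2
)
structure = k2_search(data, order=[0, 1, 2], arity={0: 2, 1: 2, 2: 2})
```

Since K2 only considers parents that precede a node in the order, the quality of the result depends on the order supplied, which is why searching the order space can stand in for searching the graph space.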


2000 ◽  
Vol 2 (1) ◽  
pp. 35-60 ◽  
Author(s):  
Vladan Babovic ◽  
Maarten Keijzer

Present-day instrumentation networks already provide immense quantities of data, very little of which provides any insight into the basic physical processes that are occurring in the measured medium. That is, the data by itself contributes little to the knowledge of such processes. Data mining and knowledge discovery aim to change this situation by providing technologies that will greatly facilitate the mining of data for knowledge. In this new setting, the role of a human expert is to provide domain knowledge, interpret models suggested by the computer, and devise further experiments that will provide even better data coverage. Clearly, there is an enormous amount of knowledge and understanding of physical processes that should not simply be thrown away. Consequently, we strongly believe that the most appropriate way forward is to combine the best of the two approaches: the theory-driven, understanding-rich approach with the data-driven discovery process. This paper describes a particular knowledge discovery algorithm, Genetic Programming (GP). Additionally, an augmented version of GP, dimensionally aware GP, which is arguably more useful in the process of scientific discovery, is described in great detail. Finally, the paper concludes with an application of dimensionally aware GP to the problem of inducing an empirical relationship describing the additional resistance to flow induced by flexible vegetation.


2002 ◽  
Vol 02 (01) ◽  
pp. 107-126 ◽  
Author(s):  
MARK LAST ◽  
ABRAHAM KANDEL

Comparing frequency distributions of experimental data is a routine engineering task in the semiconductor industry. The existing statistical approaches to the problem suffer from several limitations, which can be partially overcome via the time-consuming visual examination of frequency histograms by an experienced process engineer. This paper presents a novel, fuzzy-based method for automating the cognitive process of comparing frequency histograms. We use the evolving approach of type-2 fuzzy logic to utilize the domain knowledge of human experts. The proposed method is evaluated on the actual results of an engineering experiment, where it is shown to represent the experts' perception of the visualized data more accurately than a wide range of statistical tests. We also outline the potential directions for integrating the perception-based approach with other methods of data visualization and data mining.


2021 ◽  
pp. 297-315
Author(s):  
Alireza Tamaddoni-Nezhad ◽  
David Bohan ◽  
Ghazal Afroozi Milani ◽  
Alan Raybould ◽  
Stephen Muggleton

Humanity is facing existential, societal challenges related to food security, ecosystem conservation, antimicrobial resistance, etc., and Artificial Intelligence (AI) is already playing an important role in tackling these new challenges. Most current AI approaches are limited when it comes to ‘knowledge transfer’ with humans, i.e. it is difficult to incorporate existing human knowledge, and the output knowledge is not human-comprehensible. In this chapter we demonstrate how a combination of comprehensible machine learning, text-mining and domain knowledge could enhance human-machine collaboration for the purpose of automated scientific discovery where humans and computers jointly develop and evaluate scientific theories. As a case study, we describe a combination of logic-based machine learning (which included human-encoded ecological background knowledge) and text-mining from scientific publications (to verify machine-learned hypotheses) for the purpose of automated discovery of ecological interaction networks (food-webs) to detect change in agricultural ecosystems using the Farm Scale Evaluations (FSEs) of genetically modified herbicide-tolerant (GMHT) crops dataset. The results included novel food-web hypotheses, some confirmed by subsequent experimental studies (e.g. DNA analysis) and published in scientific journals. These machine-learned food-webs were also used as the basis of a recent study revealing resilience of agro-ecosystems to changes in farming management using GMHT crops.


2020 ◽  
Author(s):  
Renato Geh ◽  
Denis Mauá ◽  
Alessandro Antonucci

Probabilistic circuits are deep probabilistic models with neural-network-like semantics capable of accurately and efficiently answering probabilistic queries without sacrificing expressiveness. Probabilistic Sentential Decision Diagrams (PSDDs) are a subclass of probabilistic circuits able to embed logical constraints into the circuit’s structure. In doing so, they gain extra expressiveness with empirically optimal performance. Although PSDDs perform competitively with other state-of-the-art models, there have been very few attempts at learning them from a combination of both data and knowledge in the form of logical formulae. Our work investigates sampling random PSDDs consistent with domain knowledge and evaluating them against state-of-the-art probabilistic models. We propose a method of sampling that retains important structural constraints on the circuit’s graph that guarantee query tractability. Finally, we show that these samples are able to achieve competitive performance even on larger domains.


2019 ◽  
Vol 5 (11) ◽  
pp. eaau4996 ◽  
Author(s):  
Jakob Runge ◽  
Peer Nowack ◽  
Marlene Kretschmer ◽  
Seth Flaxman ◽  
Dino Sejdinovic

Identifying causal relationships and quantifying their strength from observational time series data are key problems in disciplines dealing with complex dynamical systems such as the Earth system or the human body. Data-driven causal inference in such systems is challenging since datasets are often high dimensional and nonlinear with limited sample sizes. Here, we introduce a novel method that flexibly combines linear or nonlinear conditional independence tests with a causal discovery algorithm to estimate causal networks from large-scale time series datasets. We validate the method on time series of well-understood physical mechanisms in the climate system and the human heart and using large-scale synthetic datasets mimicking the typical properties of real-world data. The experiments demonstrate that our method outperforms state-of-the-art techniques in detection power, which opens up entirely new possibilities to discover and quantify causal networks from time series across a range of research fields.
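The kind of linear conditional independence test that such a method could plug in can be sketched with partial correlation: regress out the conditioning variables and correlate the residuals. This is a minimal illustration on simulated data, not the authors' algorithm; the function name, the simulated common driver, and the sample size are assumptions.

```python
import numpy as np


def partial_correlation(x, y, z):
    """Correlation between x and y after linearly regressing out z (plus an intercept)."""
    design = np.column_stack([np.ones(len(x)), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(res_x @ res_y / np.sqrt((res_x @ res_x) * (res_y @ res_y)))


# Simulated series: a common driver at lag 1 influences both x and y,
# so x and y are correlated but have no direct link.
rng = np.random.default_rng(0)
n = 2000
driver = rng.normal(size=n)
driver_lag1 = driver[:-1]
x_now = driver_lag1 + rng.normal(size=n - 1)
y_now = driver_lag1 + rng.normal(size=n - 1)

raw = float(np.corrcoef(x_now, y_now)[0, 1])
partial = partial_correlation(x_now, y_now, driver_lag1)
```

Here the raw correlation is clearly nonzero while the partial correlation given the lagged driver is close to zero, which is the pattern a conditional independence test uses to remove a spurious link. A nonlinear test would replace the linear regression step.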


2011 ◽  
Vol 09 (02) ◽  
pp. 231-250 ◽  
Author(s):  
YAN LIU ◽  
ALEXANDRU NICULESCU-MIZIL ◽  
AURÉLIE LOZANO ◽  
YONG LU

Many genes and biological processes function in similar ways across different species. Cross-species gene expression analysis, as a powerful tool to characterize the dynamical properties of the cell, has found a number of applications, such as identifying a conserved core set of cell cycle genes. However, to the best of our knowledge, there has been limited effort to develop appropriate techniques to capture the causal relations between genes from time-series microarray data across species. In this paper, we present hidden Markov random field regression with an L1 penalty to uncover the regulatory network structure for different species. The algorithm provides a framework for sharing information across species via hidden component graphs and is able to incorporate domain knowledge across species easily. We demonstrate our method on two synthetic datasets and apply it to discover causal graphs from innate immune response data.


2019 ◽  
Vol 42 ◽  
Author(s):  
Don Ross

Abstract. Use of network models to identify causal structure typically blocks reduction across the sciences. Entanglement of mental processes with environmental and intentional relationships, as Borsboom et al. argue, makes reduction of psychology to neuroscience particularly implausible. However, in psychiatry, a mental disorder can involve no brain disorder at all, even when the former crucially depends on aspects of brain structure. Gambling addiction constitutes an example.


Author(s):  
Tom Beckers ◽  
Uschi Van den Broeck ◽  
Marij Renne ◽  
Stefaan Vandorpe ◽  
Jan De Houwer ◽  
...  

Abstract. In a contingency learning task, 4-year-old and 8-year-old children had to predict the outcome displayed on the back of a card on the basis of cues presented on the front. The task was embedded in either a causal or a merely predictive scenario. Within this task, either a forward blocking or a backward blocking procedure was implemented. Blocking occurred in the causal but not in the predictive scenario. Moreover, blocking was affected by the scenario to the same extent in both age groups. The pattern of results was similar for forward and backward blocking. These results suggest that even young children are sensitive to the causal structure of a contingency learning task and that the occurrence of blocking in such a task defies an explanation in terms of associative learning theory.

