A Graphical Criterion for Effect Identification in Equivalence Classes of Causal Diagrams

Author(s):  
Amin Jaber ◽  
Jiji Zhang ◽  
Elias Bareinboim

Computing the effects of interventions from observational data is an important task encountered in many data-driven sciences. The problem is addressed by identifying the post-interventional distribution with an expression that involves only quantities estimable from the pre-interventional distribution over observed variables, given some knowledge about the causal structure. In this work, we relax the requirement of having a fully specified causal structure and study the identifiability of effects under a singleton intervention on a variable X, supposing that the structure is known only up to an equivalence class of causal diagrams, which is the output of standard structural learning algorithms (e.g., FCI). We derive a necessary and sufficient graphical criterion for the identifiability of the effect of X on all observed variables. We further establish a sufficient graphical criterion to identify the effect of X on a subset of the observed variables, and prove that it is strictly more powerful than the current state-of-the-art result on this problem.
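
For orientation, a worked instance of what an identifying expression looks like when the structure is fully known: in a Markovian model (a complete causal DAG over the observed variables, no latent confounders), the post-interventional distribution reduces to the textbook truncated factorization,

$$P(\mathbf{v} \mid do(X = x)) \;=\; \prod_{V_i \in \mathbf{V} \setminus \{X\}} P(v_i \mid pa_i)\,\Big|_{X = x},$$

where $pa_i$ denotes the values of the parents of $V_i$ in the diagram; every factor on the right-hand side is estimable from the pre-interventional distribution. The criterion in this paper decides when a comparable expression exists given only the equivalence class.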

Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 98 ◽  
Author(s):  
Tariq Ahmad ◽  
Allan Ramsay ◽  
Hanady Ahmed

Assigning sentiment labels to documents is, at first sight, a standard multi-label classification task. Many approaches have been used for this task, and the current state-of-the-art solutions use deep neural networks (DNNs), so it seems natural to expect such general-purpose machine learning algorithms to provide an effective approach. We describe an alternative approach, which uses probabilities to construct a weighted lexicon of sentiment terms, then modifies the lexicon and calculates optimal thresholds for each class. We show that this approach outperforms DNNs and other standard algorithms. We believe that DNNs are not a panacea, and that paying attention to the nature of the data you are trying to learn from can be more important than trying out ever more powerful general-purpose machine learning algorithms.
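
As a rough illustration of the lexicon-based alternative (a minimal sketch, not the authors' exact procedure; the function names and the conditional-probability weighting are assumptions for illustration):

from collections import Counter, defaultdict

def build_weighted_lexicon(docs, labels, classes):
    # term -> per-class counts and total occurrence counts
    term_class = defaultdict(Counter)
    term_total = Counter()
    for doc, doc_labels in zip(docs, labels):
        for term in set(doc.lower().split()):
            term_total[term] += 1
            for c in doc_labels:
                term_class[term][c] += 1
    # weight(term, class) ~ P(class | term occurs)
    return {t: {c: term_class[t][c] / term_total[t] for c in classes}
            for t in term_total}

def scores(doc, lexicon, classes):
    # sum the per-class weights of every known term in the document
    totals = dict.fromkeys(classes, 0.0)
    for term in doc.lower().split():
        for c, w in lexicon.get(term, {}).items():
            totals[c] += w
    return totals

def predict(doc, lexicon, thresholds, classes):
    # multi-label decision: assign every class whose score clears its threshold
    s = scores(doc, lexicon, classes)
    return [c for c in classes if s[c] >= thresholds[c]]

The per-class thresholds would then be tuned on held-out data, e.g., by sweeping each threshold to maximise per-class F1.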


2018 ◽  
Vol 61 ◽  
pp. 65-170 ◽  
Author(s):  
Albert Gatt ◽  
Emiel Krahmer

This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes the field has undergone over the past two decades, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; and (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of NLP, with an emphasis on different evaluation methods and the relationships between them.


Author(s):  
Juan D. Correa ◽  
Jin Tian ◽  
Elias Bareinboim

Cause-and-effect relations are one of the most valuable types of knowledge sought after throughout the data-driven sciences, since they translate into stable and generalizable explanations as well as efficient and robust decision-making capabilities. Inferring these relations from data, however, is a challenging task. Two of the most common barriers to this goal are known as confounding and selection bias. The former stems from systematic bias introduced during treatment assignment, while the latter comes from systematic bias during the collection of units into the sample. In this paper, we consider the problem of identifiability of causal effects when both confounding and selection biases are simultaneously present. We first investigate the problem of identifiability when all the available data is biased. We prove that the algorithm proposed by [Bareinboim and Tian, 2015] is, in fact, complete: whenever the algorithm returns a failure condition, no identifiability claim about the causal relation can be made by any other method. We then generalize this setting to the case where, in addition to the biased data, an external, unbiased data source is available; for instance, a subset of the covariates may be measurable without bias (e.g., from census data). We examine the problem of identifiability when a combination of biased and unbiased data is available, and propose a new algorithm that subsumes the current state-of-the-art method based on the back-door criterion.
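
A representative instance of the back-door baseline that the new algorithm subsumes, stated here for orientation: if a set of covariates $Z$ satisfies the back-door criterion relative to $(X, Y)$ and renders $Y$ independent of the selection mechanism $S$ given $X$ and $Z$, the effect can be recovered by combining the two data sources,

$$P(y \mid do(x)) \;=\; \sum_{z} P(y \mid x, z, S = 1)\, P(z),$$

where the first factor is estimable from the biased data (collected under $S = 1$) and the second from the unbiased external measurement of $Z$.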


2019 ◽  
Vol 12 (2) ◽  
pp. 103
Author(s):  
Kuntoro Adi Nugroho ◽  
Yudi Eko Windarto

Various methods are available to perform feature extraction on satellite images. Among the available alternatives, the deep convolutional neural network (ConvNet) is the state-of-the-art method. Although previous studies have reported successful attempts at developing and implementing ConvNets in remote sensing applications, several issues are not well explored, such as the use of depthwise convolution, the size of the final pooling layer, and the comparison between grayscale and RGB settings. The objective of this study is to address these issues. Two feature extraction approaches were evaluated: ConvNet, the current state of the art for satellite image classification, and the Gray Level Co-occurrence Matrix (GLCM), a classic unsupervised feature extraction method. The experiment produced results consistent with previous studies: ConvNet is superior to GLCM in most cases, especially with 3x3xn final pooling. The performance of both approaches is much higher on features from RGB channels, except for ConvNet with a relatively small number of features.
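
To make the design choices concrete, here is a minimal PyTorch sketch (an illustrative architecture, not the one evaluated in the study) showing a depthwise convolution and a configurable final pooling size:

import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Illustrative feature extractor: a depthwise-separable block
    (depthwise conv via groups=channels, then 1x1 pointwise conv)
    followed by an adaptive final pooling layer whose output size
    (e.g. 1x1 vs 3x3) is one of the design choices in question."""
    def __init__(self, in_channels=3, width=32, final_pool=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, width, 3, padding=1),
            nn.ReLU(inplace=True),
            # depthwise: one filter per channel
            nn.Conv2d(width, width, 3, padding=1, groups=width),
            # pointwise: mix channels with 1x1 convolutions
            nn.Conv2d(width, width, 1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(final_pool)  # 3x3xn final pooling

    def forward(self, x):
        return torch.flatten(self.pool(self.features(x)), 1)

# grayscale vs RGB is simply the channel count of the input tensor
rgb_feats = SmallConvNet(in_channels=3)(torch.randn(1, 3, 64, 64))
gray_feats = SmallConvNet(in_channels=1)(torch.randn(1, 1, 64, 64))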


2020 ◽  
Vol 34 (05) ◽  
pp. 7472-7479
Author(s):  
Hengyi Cai ◽  
Hongshen Chen ◽  
Cheng Zhang ◽  
Yonghao Song ◽  
Xiaofang Zhao ◽  
...  

Current state-of-the-art neural dialogue systems are mainly data-driven and are trained on human-generated responses. However, due to the subjectivity and open-ended nature of human conversations, the complexity of training dialogues varies greatly. The noise and uneven complexity of query-response pairs impede the learning efficiency and effectiveness of neural dialogue generation models. Moreover, there is so far no unified measure of dialogue complexity, which spans multiple attributes: specificity, repetitiveness, relevance, and so on. Inspired by human behaviors in learning to converse, where children progress from easy dialogues to complex ones and dynamically adjust their learning progress, in this paper we first analyze five dialogue attributes to measure dialogue complexity from multiple perspectives on three publicly available corpora. Then, we propose an adaptive multi-curricula learning framework to schedule a committee of the organized curricula. The framework is built upon the reinforcement learning paradigm and automatically chooses different curricula during the evolving learning process, according to the learning status of the neural dialogue generation model. Extensive experiments on five state-of-the-art models demonstrate its learning efficiency and effectiveness with respect to 13 automatic evaluation metrics and human judgments.
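
The scheduling idea can be sketched with a simple adversarial-bandit stand-in (the names and the EXP3-style update are assumptions for illustration; the paper's framework is a richer reinforcement learning formulation):

import math, random

class CurriculumScheduler:
    """Bandit over k curricula (e.g. one per dialogue attribute:
    specificity, repetitiveness, relevance, ...): sample a curriculum,
    observe a reward reflecting the dialogue model's learning progress,
    and reweight. A simplified stand-in for the RL-based committee."""
    def __init__(self, n_curricula, gamma=0.1):
        self.gamma = gamma
        self.weights = [1.0] * n_curricula

    def probabilities(self):
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def choose(self):
        return random.choices(range(len(self.weights)),
                              weights=self.probabilities())[0]

    def update(self, arm, reward):
        # reward assumed scaled to [0, 1]; the importance-weighted
        # estimate keeps the update unbiased for unplayed arms
        p = self.probabilities()[arm]
        k = len(self.weights)
        self.weights[arm] *= math.exp(self.gamma * reward / (p * k))

At each training step, choose() selects which curriculum supplies the next batch, and the observed reward (e.g., the drop in validation loss) feeds back into update().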


Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 348 ◽  
Author(s):  
Marten Düring ◽  
Roman Kalyakin ◽  
Estelle Bunout ◽  
Daniele Guido

The automated enrichment of mass-digitised document collections using techniques such as text mining is becoming increasingly popular. Enriched collections offer new opportunities for interface design to allow data-driven and visualisation-based search, exploration and interpretation. Most such interfaces integrate close and distant reading and represent semantic, spatial, social or temporal relations, but often lack contrastive views. Inspect and Compare (I&C) contributes to the current state of the art in interface design for historical newspapers with highly versatile side-by-side comparisons of query results and curated article sets based on metadata and semantic enrichments. I&C takes search queries and pre-curated article sets as inputs and allows comparisons based on the distributions of newspaper titles, publication dates and automatically generated enrichments, such as language, article types, topics and named entities. Contrastive views of such data reveal patterns, help humanities scholars improve their search strategies, and facilitate a critical assessment of the overall data quality. I&C is part of the impresso interface for the exploration of digitised and semantically enriched historical newspapers.
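
The core contrastive operation can be sketched as follows (a minimal sketch with a hypothetical data model in which each article is a dict of enrichment facets; I&C itself is an interactive interface, not this code):

from collections import Counter

def facet_distribution(articles, facet):
    # normalised distribution of one facet (e.g. 'newspaper',
    # 'language', 'topic') over an article set
    counts = Counter(a[facet] for a in articles if facet in a)
    total = sum(counts.values()) or 1
    return {value: n / total for value, n in counts.items()}

def compare(set_a, set_b, facet):
    # side-by-side view: each facet value's share in both sets
    # plus the difference, the basic contrastive reading
    da = facet_distribution(set_a, facet)
    db = facet_distribution(set_b, facet)
    return {v: (da.get(v, 0.0), db.get(v, 0.0),
                da.get(v, 0.0) - db.get(v, 0.0))
            for v in set(da) | set(db)}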


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Víctor Martínez ◽  
Fernando Berzal ◽  
Juan-Carlos Cubero

Role is a fundamental concept in the analysis of the behavior and function of interacting entities in complex networks. Role discovery is the task of uncovering the hidden roles of nodes within a network. Node roles are commonly defined in terms of equivalence classes: two nodes have the same role if they fall within the same equivalence class. Automorphic equivalence, where two nodes are equivalent when their labels can be swapped to form an isomorphic graph, captures this notion of role. However, this binary concept of equivalence is too restrictive, and nodes in real-world networks rarely belong to the same equivalence class. Instead, a relaxed definition in terms of similarity or distance is commonly used to compute the degree to which two nodes are equivalent. In this paper, we propose a novel distance metric called automorphic distance, which measures how far two nodes are from being automorphically equivalent. We also study its application to node embedding, showing how our metric can be used to generate role-preserving vector representations of nodes. Our experiments confirm that the proposed automorphic distance metric outperforms a state-of-the-art automorphic equivalence-based metric and several state-of-the-art techniques for the generation of node embeddings on different role-related tasks.
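
To illustrate what a structural (rather than proximity-based) notion of node distance looks like, here is a toy sketch based on layered degree signatures; this is not the authors' automorphic distance, only a crude fingerprint that automorphically equivalent nodes necessarily share:

import networkx as nx

def degree_signature(g, node, depth=2):
    # sorted degrees of the node's neighbourhood, layer by layer:
    # invariant under relabelling, so automorphically equivalent
    # nodes always produce the same signature
    sig, frontier, seen = [], {node}, {node}
    for _ in range(depth + 1):
        sig.append(sorted(g.degree(n) for n in frontier))
        frontier = {m for n in frontier for m in g.neighbors(n)} - seen
        seen |= frontier
    return sig

def signature_distance(g, u, v, depth=2):
    # zero iff the layered signatures agree; grows with how
    # early (close to the nodes themselves) they diverge
    su = degree_signature(g, u, depth)
    sv = degree_signature(g, v, depth)
    return sum(1.0 / (i + 1)
               for i, (a, b) in enumerate(zip(su, sv)) if a != b)

g = nx.karate_club_graph()
print(signature_distance(g, 0, 33))  # two structurally similar hubs

Sharing such a signature does not imply automorphic equivalence, which is precisely what motivates a finer-grained measure like the automorphic distance proposed here.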


2021 ◽  
Vol 9 (4) ◽  
pp. 250-259 ◽  
Author(s):  
Annelien Smets ◽  
Pieter Ballon ◽  
Nils Walravens

Amid the widespread diffusion of digital communication technologies, our cities are at a critical juncture, as these technologies are entering all aspects of urban life. Data-driven technologies help citizens navigate the city, find friends, or discover new places. While these technology-mediated activities have come within the scope of scholarly research, we lack an understanding of the underlying curation mechanisms that select and present the particular information citizens are exposed to. Such an understanding is, however, crucial to deal with the risk of socio-cultural polarization that this kind of algorithmic curation is assumed to reinforce. Drawing upon the vast amount of work on algorithmic curation in online platforms, we construct an analytical lens that we apply to the urban environment in order to establish an understanding of the algorithmic curation of urban experiences. In this way, the article demonstrates that cities can be considered a new materiality of curational platforms. Our framework outlines the various urban information flows, curation logics, and stakeholders involved. This work contributes to the current state of the art by bridging the gap between online and offline algorithmic curation and by providing a novel conceptual framework to study this timely topic.


2020 ◽  
Vol 11 (6) ◽  
pp. 169-176
Author(s):  
Tsvetelin Anastasov

This article expands on Anastasov, T.'s master's thesis (2019). It aims to give a clear definition and taxonomy of Data-Driven Business Models (DDBMs) and to illustrate the data challenges and opportunities that come along with them. These definitions were cross-analyzed with three cases from the Asia-Pacific region to deliver concrete insights and inspiration for Western companies seeking to reinvent their businesses over the next five years. A comparison between Data-Driven and Data-Centric models, not previously analyzed in the thesis, is also given as a view on the current state of the art in data business models.


10.37236/6808 ◽  
2018 ◽  
Vol 25 (2) ◽  
Author(s):  
Demetris Hadjiloucas ◽  
Ioannis Michos ◽  
Christina Savvidou

Super-strong Wilf equivalence is a type of Wilf equivalence on words that was originally introduced as strong Wilf equivalence by Kitaev et al. [Electron. J. Combin. 16(2)] in 2009. We provide a necessary and sufficient condition for two permutations in $n$ letters to be super-strongly Wilf equivalent, using distances between letters within a permutation. Furthermore, we give a characterization of such equivalence classes via two-colored binary trees. This allows us to prove, in the case of super-strong Wilf equivalence, the conjecture stated in the same article by Kitaev et al. that the cardinality of each Wilf equivalence class is a power of $2$.

