Causal Knowledge Extraction through Large-Scale Text Mining

2020 ◽  
Vol 34 (09) ◽  
pp. 13610-13611
Author(s):  
Oktie Hassanzadeh ◽  
Debarun Bhattacharjya ◽  
Mark Feblowitz ◽  
Kavitha Srinivas ◽  
Michael Perrone ◽  
...  

In this demonstration, we present a system for mining causal knowledge from large corpora of text documents, such as millions of news articles. Our system provides a collection of APIs for causal analysis and retrieval. These APIs enable searching for the effects of a given cause and the causes of a given effect, as well as analyzing whether a causal relation exists between a given pair of phrases. The analysis includes a score that indicates the likelihood that a causal relation exists. It also provides evidence from an input corpus supporting the existence of a causal relation between the input phrases. Our system uses generic unsupervised and weakly supervised methods of causal relation extraction that do not impose semantic constraints on causes and effects. We show example use cases developed for a commercial application in enterprise risk management.

2019 ◽  
Vol 3 (3) ◽  
pp. 155-164 ◽  
Author(s):  
Yukun Zheng ◽  
Yiqun Liu ◽  
Zhen Fan ◽  
Cheng Luo ◽  
Qingyao Ai ◽  
...  

Abstract A number of deep neural networks have been proposed to improve the performance of document ranking in information retrieval studies. However, training these models usually requires large amounts of labeled data, so data shortage has become a major hindrance to improving the performance of neural ranking models. Recently, several weakly supervised methods have been proposed to address this challenge, using heuristics or users’ interactions with Search Engine Result Pages (SERPs) to generate weak relevance labels. In this work, we adopt two kinds of weakly supervised relevance, BM25-based relevance and click model-based relevance, and investigate in depth how they differ in the training of neural ranking models. Experimental results show that BM25-based relevance helps models capture more exact matching signals, while click model-based relevance enhances the rankings of documents that may be preferred by users. We further propose a cascade ranking framework to combine the two kinds of weakly supervised relevance, which significantly improves the ranking performance of neural ranking models and outperforms the best result in the NTCIR-13 We Want Web (WWW) task. This work reveals the potential of constructing better document retrieval systems based on multiple kinds of weak relevance signals.
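To make the idea of BM25-based weak relevance concrete, the following is a minimal sketch of Okapi BM25 scoring over a toy corpus. It is not the authors' pipeline: the function name `bm25_scores`, the toy documents, the smoothed IDF variant, and the parameter values `k1=1.2` and `b=0.75` (common defaults) are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # A commonly used smoothed IDF variant that is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus; the scores act as weak relevance labels.
docs = [
    "neural ranking models for document retrieval".split(),
    "click models estimate user preference".split(),
    "cooking recipes for pasta".split(),
]
weak_labels = bm25_scores("neural ranking".split(), docs)
```

In a weak-supervision setting, scores like these would stand in for human relevance judgments when training a ranker. Because BM25 rewards exact term overlap, it is plausible that such labels emphasize exact matching signals, as the abstract reports.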


Author(s):  
Oktie Hassanzadeh ◽  
Debarun Bhattacharjya ◽  
Mark Feblowitz ◽  
Kavitha Srinivas ◽  
Michael Perrone ◽  
...  

In this paper, we study the problem of answering questions of the type "Could X cause Y?" where X and Y are general phrases without any constraints. Answering such questions will assist with various decision analysis tasks such as verifying and extending presumed causal associations used for decision making. Our goal is to analyze the ability of an AI agent built using state-of-the-art unsupervised methods to answer causal questions derived from collections of cause-effect pairs from human experts. We focus only on unsupervised and weakly supervised methods due to the difficulty of creating a sufficiently large training set of reasonable quality and coverage. The methods we examine rely on a large corpus of text derived from news articles, and range from large-scale application of classic NLP techniques and statistical analysis to the use of neural-network-based phrase embeddings and state-of-the-art neural language models.
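As a rough illustration of what a classic pattern-based approach to "Could X cause Y?" might look like, the sketch below counts sentences in which X precedes Y with a causal cue phrase in between, and normalizes by the number of sentences mentioning both. This is an assumed toy heuristic, not the method evaluated in the paper: the cue list, the function names `causal_evidence` and `causal_score`, and the example sentences are all hypothetical.

```python
import re

# Hypothetical cue phrases; a real system would use a far richer set.
CAUSAL_CUES = ("causes", "caused", "leads to", "led to", "results in", "resulted in")


def causal_evidence(cause, effect, sentences):
    """Return sentences where `cause` precedes `effect` with a causal cue in between."""
    cues = "|".join(re.escape(c) for c in CAUSAL_CUES)
    pattern = re.compile(
        re.escape(cause) + r"\b.*?\b(?:" + cues + r")\b.*?" + re.escape(effect),
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]


def causal_score(cause, effect, sentences):
    """Fraction of sentences mentioning both phrases that also match a causal pattern."""
    both = [
        s for s in sentences
        if cause.lower() in s.lower() and effect.lower() in s.lower()
    ]
    if not both:
        return 0.0
    return len(causal_evidence(cause, effect, both)) / len(both)


# Hypothetical toy corpus.
sentences = [
    "Heavy rain caused flooding in the valley.",
    "Flooding followed heavy rain last week.",
    "The market rallied.",
]
score = causal_score("heavy rain", "flooding", sentences)
evidence = causal_evidence("heavy rain", "flooding", sentences)
```

The matched sentences double as supporting evidence, and the score is directional: swapping the two phrases yields a different value. Embedding-based and language-model-based methods mentioned in the abstract are not covered by this sketch.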


Technologies ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 2
Author(s):  
Ashish Jaiswal ◽  
Ashwin Ramesh Babu ◽  
Mohammad Zaki Zadeh ◽  
Debapriya Banerjee ◽  
Fillia Makedon

Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and using the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.
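The pull-together, push-apart objective described above can be sketched with an InfoNCE-style loss for a single anchor. This is one common contrastive formulation, chosen here as an assumption rather than taken from the review; the function names, the temperature value, and the toy 2-D embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: negative log-softmax of the positive pair's similarity.

    The loss is small when the anchor is closer to its positive (an augmented
    view of the same sample) than to the negatives (other samples).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Hypothetical 2-D embeddings.
anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss_close = info_nce(anchor, [0.9, 0.1], negatives)  # positive near the anchor
loss_far = info_nce(anchor, [0.0, 1.0], negatives)    # positive far from the anchor
```

Minimizing this quantity over many anchors encourages augmented views of the same sample to embed near each other and away from other samples. Practical systems compute it in batches with a deep-learning framework rather than in pure Python.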


Organics ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 142-160
Author(s):  
Keith Smith ◽  
Gamal A. El-Hiti

para-Selective processes for the chlorination of phenols using sulphuryl chloride in the presence of various sulphur-containing catalysts have been successfully developed. Several chlorinated phenols, especially those derived by para-chlorination of phenol, ortho-cresol, meta-cresol, and meta-xylenol, are of significant commercial importance, but chlorination reactions of such phenols are not always as regioselective as would be desirable. We, therefore, undertook the challenge of developing suitable catalysts that might promote greater regioselectivity under conditions that might still be applicable for the commercial manufacture of products on a large scale. In this review, we chart our progress in this endeavour from early studies involving inorganic solids as potential catalysts, through the use of simple dialkyl sulphides, which were effective but unsuitable for commercial application, and through a variety of other types of sulphur compounds, to the eventual identification of particular poly(alkylene sulphide)s as very useful catalysts. When used in conjunction with a Lewis acid such as aluminium or ferric chloride as an activator, and with sulphuryl chloride as the reagent, quantitative yields of chlorophenols can be obtained with very high regioselectivity in the presence of tiny amounts of the polymeric sulphides, usually in solvent-free conditions (unless the phenol starting material is solid at temperatures even above about 50 °C). Notably, poly(alkylene sulphide)s containing longer spacer groups are particularly para-selective in the chlorination of m-cresol and m-xylenol, while those with shorter spacers are particularly para-selective in the chlorination of phenol, 2-chlorophenol, and o-cresol. Such chlorination processes result in some of the highest para/ortho ratios reported for the chlorination of phenols.


Database ◽  
2021 ◽  
Vol 2021 ◽  
Author(s):  
Yifan Shao ◽  
Haoru Li ◽  
Jinghang Gu ◽  
Longhua Qian ◽  
Guodong Zhou

Abstract Extraction of causal relations between biomedical entities in the form of Biological Expression Language (BEL) poses a new challenge to the community of biomedical text mining due to the complexity of BEL statements. We propose a simplified form of BEL statements [Simplified Biological Expression Language (SBEL)] to facilitate BEL extraction and employ BERT (Bidirectional Encoder Representations from Transformers) to improve the performance of causal relation extraction (RE). On the one hand, BEL statement extraction is transformed into the extraction of an intermediate form, the SBEL statement, which is then further decomposed into two subtasks: entity RE and entity function detection. On the other hand, we use a powerful pretrained BERT model to both extract entity relations and detect entity functions, aiming to improve the performance of the two subtasks. Entity relations and functions are then combined into SBEL statements and finally merged into BEL statements. Experimental results on the BioCreative-V Track 4 corpus demonstrate that our method achieves state-of-the-art performance in BEL statement extraction, with F1 scores of 54.8% in Stage 2 evaluation and 30.1% in Stage 1 evaluation. Database URL: https://github.com/grapeff/SBEL_datasets


Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 79 ◽  
Author(s):  
Xiaoyu Han ◽  
Yue Zhang ◽  
Wenkai Zhang ◽  
Tinglei Huang

Relation extraction is a vital task in natural language processing. It aims to identify the relationship between two specified entities in a sentence. Besides the information contained in the sentence, additional information about the entities has been shown to be helpful in relation extraction. Additional information such as entity types obtained by NER (Named Entity Recognition) and descriptions provided by a knowledge base both have their limitations. Nevertheless, there is another way to provide additional information that can overcome these limitations in Chinese relation extraction. Because Chinese characters usually have explicit meanings and can carry more information than English letters, we suggest that the characters that constitute the entities can provide additional information that is helpful for the relation extraction task, especially in large-scale datasets. This assumption has never been verified before. The main obstacle is the lack of large-scale Chinese relation datasets. In this paper, first, we generate a large-scale Chinese relation extraction dataset based on a Chinese encyclopedia. Second, we propose an attention-based model using the characters that compose the entities. The results on the generated dataset show that these characters can provide useful information for the Chinese relation extraction task. By using this information, the attention mechanism can recognize the crucial part of the sentence that expresses the relation. The proposed model outperforms other baseline models on our Chinese relation extraction dataset.


Author(s):  
Guixiang Wang ◽  
Haitao Zou ◽  
Xiaobo Zhu ◽  
Mei Ding ◽  
Chuankun Jia

Abstract Zinc-based redox flow batteries (ZRFBs) have been considered one of the most promising large-scale energy storage technologies owing to their low cost, high safety, and environmental friendliness. However, their commercial application is still hindered by a few key problems. First, hydrogen evolution and zinc dendrite formation cause poor cycling life, which need to be ameliorated or overcome by finding suitable anolytes. Second, the stability and energy density of catholytes are unsatisfactory due to oxidation, corrosion, and low electrolyte concentration. Meanwhile, highly catalytic electrode materials remain to be explored, and the ion selectivity and cost efficiency of membrane materials demand further improvement. In this review, we summarize different types of ZRFBs according to their electrolyte environments, including ZRFBs using neutral, acidic, and alkaline electrolytes, then highlight the advances in key materials, including electrode and membrane materials for ZRFBs, and finally discuss the challenges and perspectives for the future development of high-performance ZRFBs.


Author(s):  
Yi Wang

This article describes an application that illustrates the role of data mining technology in identifying hidden causal knowledge from health and medical data repositories. Across the health care and medical enterprises, a wide variety of data is being generated at a rapid rate. Current information technologies tend to focus on the more static side of causal knowledge and do not address dynamic causal knowledge. This article shows that dynamic causal relation data can be captured for treatment, payment, and operations purposes, as well as for administration-directed insights. Accessing this currently unrealized knowledge potential would enable the delivery of actionable knowledge to medical practitioners, healthcare system managers, policy planners and even patients to make a significant difference in overall healthcare.

