Extracting and analyzing inorganic material synthesis procedures in the literature

Abstract Analyzing material synthesis procedures in the literature is required to collect structural information of material names and synthesis procedures for designing materials computationally. Since synthesis procedures are mostly written in natural language in papers or technical documents, they need to be extracted and structured into a format that can be handled by a computer through information extraction. Moreover, representing a synthesis procedure requires expressing information such as the conditions and the order of operations in the procedure, but existing databases that compile structural information of material names and synthesis procedures do not provide such information. It is, therefore, necessary to create a framework that extracts and organizes the synthesis-procedure information in text so that it is sufficient for material development, including the order of operations and the links among materials, operations, and conditions. In this study, we construct a pipeline system that extracts synthesis procedures from a text in the form of a flow graph. The extraction system consists of preprocessing, deep learning-based entity extraction, rule-based relation extraction, and selection for paragraph-containing procedures. We applied the system to a large body of literature and extracted flow graphs (procedures) that include about 4 million entities and 3 million relations. We computed several statistics on the extracted graphs and performed several analyses on them. We experimentally confirmed that some extracted operations were specific to the target material and that the frequently extracted sub-graphs include reasonable operations.
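As a rough illustration of what a flow-graph representation of a synthesis procedure could look like, the following minimal Python sketch stores entities and typed relations and recovers the order of operations. The entity labels (`MATERIAL`, `OPERATION`, `CONDITION`), the relation names (`NEXT`, `INPUT_OF`, `CONDITION_OF`), and the toy procedure are assumptions made for this example only; the abstract does not specify the schema used by the authors.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    entity_id: str
    label: str  # hypothetical label set: "MATERIAL", "OPERATION", "CONDITION"
    text: str


@dataclass
class FlowGraph:
    entities: dict = field(default_factory=dict)   # entity_id -> Entity
    relations: list = field(default_factory=list)  # (source_id, relation_type, target_id)

    def add_entity(self, entity_id, label, text):
        self.entities[entity_id] = Entity(entity_id, label, text)

    def add_relation(self, source_id, relation_type, target_id):
        self.relations.append((source_id, relation_type, target_id))

    def operation_order(self):
        """Order operations along the hypothetical NEXT relations (Kahn's algorithm)."""
        ops = [e.entity_id for e in self.entities.values() if e.label == "OPERATION"]
        successors = defaultdict(list)
        indegree = {op: 0 for op in ops}
        for src, rel, tgt in self.relations:
            if rel == "NEXT" and src in indegree and tgt in indegree:
                successors[src].append(tgt)
                indegree[tgt] += 1
        queue = deque(op for op in ops if indegree[op] == 0)
        order = []
        while queue:
            op = queue.popleft()
            order.append(self.entities[op].text)
            for nxt in successors[op]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    queue.append(nxt)
        return order

    def conditions_of(self, operation_id):
        """Return the text of every condition linked to the given operation."""
        return [
            self.entities[src].text
            for src, rel, tgt in self.relations
            if rel == "CONDITION_OF" and tgt == operation_id
        ]


# Toy procedure (invented for illustration): a material is mixed, then calcined at 900 C.
graph = FlowGraph()
graph.add_entity("m1", "MATERIAL", "TiO2")
graph.add_entity("o1", "OPERATION", "mixed")
graph.add_entity("o2", "OPERATION", "calcined")
graph.add_entity("c1", "CONDITION", "900 C")
graph.add_relation("m1", "INPUT_OF", "o1")
graph.add_relation("o1", "NEXT", "o2")
graph.add_relation("c1", "CONDITION_OF", "o2")

print(graph.operation_order())
print(graph.conditions_of("o2"))
```

Keeping relations as plain triples makes it straightforward to count entities and relations across many graphs, which is the kind of statistic the abstract reports.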

Extracting and analyzing inorganic material synthesis procedures in the literature

10.21203/rs.3.rs-636735/v1 ◽

2021 ◽

Author(s):

Kohei Makino ◽

Fusataka Kuniyoshi ◽

Jun Ozawa ◽

Makoto Miwa

Keyword(s):

Large Scale ◽

Structural Information ◽

Large Body ◽

Target Material ◽

Relation Extraction ◽

Structural Data ◽

Pipeline System ◽

Entity Extraction ◽

Material Synthesis ◽

Material Development

Abstract Analyzing synthesis procedures from a considerable amount of literature is required to collect structural information of material names and synthesis procedures for designing materials computationally. There are databases comprising structural data of material names and material synthesis procedures. However, the types of material in these databases and the material property values included are limited and insufficient. Moreover, synthesis procedures are primarily described in the literature in the natural language of the researcher who proposed the procedure, and thus they cannot be understood universally. It is, therefore, necessary to create a framework that represents textual synthesis procedures in a flow graph that contains crucial information for material development such as the order of operations and the linkage between operations and conditions. This will facilitate obtaining material insights from the literature. However, there are no large-scale studies on the extraction of synthesis procedures in the form of a graph and analysis thereof. In this study, we propose a pipeline system that extracts synthesis procedures from the literature in the form of a graph with a clear order of operations and objects of conditioning. The system consists of preprocessing, entity extraction based on Mat-ELMo and Bi-LSTM-CRF models, rule-based relation extraction, and selection for paragraph-containing procedures. We applied the system to a large body of literature and extracted various synthesis procedures. We performed basic analyses of the extracted procedures to examine their usability. We experimentally confirmed that some extracted procedures were specific to the target material, and some of the obvious procedures were correctly extracted.

He says, she says. Pat says, Tricia says. How much reference resolution matters for entity extraction, relation extraction, and social network analysis

2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications ◽

10.1109/cisda.2009.5356530 ◽

2009 ◽

Cited By ~ 3

Author(s):

Jana Diesner ◽

Kathleen M. Carley

Keyword(s):

Social Network ◽

Social Network Analysis ◽

Network Analysis ◽

Relation Extraction ◽

Entity Extraction ◽

Reference Resolution

Emission mechanisms in X-ray sources: (Invited discourse)

Symposium - International Astronomical Union ◽

10.1017/s0074180900004320 ◽

1970 ◽

Vol 37 ◽

pp. 208-215

Author(s):

L. Woltjer

Keyword(s):

Structural Information ◽

Large Body ◽

Limited Range ◽

Velocity Fields ◽

Emission Lines ◽

Relativistic Electrons ◽

X Ray ◽

Hot Gas ◽

Relativistic Particles ◽

The Continuum

A large body of spectral information on X-ray sources has now become available, but the interpretation remains ambiguous. If temperature variations and finite optical depth effects are taken into account, almost any spectrum can be fitted to a model of thermal bremsstrahlung. If a suitable energy spectrum is adopted for the relativistic electrons, a wide variety of synchrotron spectra becomes possible. Although one or the other interpretation may seem artificial in some cases, it nevertheless should be pointed out that a strictly isothermal source would be a miracle and that power-law type energy spectra of the relativistic particles can apply over only a limited range of energies. More satisfactory progress can be made when the spectral data are augmented with structural information, when emission lines can be studied and polarization can be measured. Not only do the intensities of emission lines give much more detailed information on the temperature and density in a hot gas than can be derived from the continuum, but at sufficient resolution velocity fields can also be studied. Useful structural information probably can be obtained only if spatial resolution of 1 arc-min or better is achieved. But one has only to look at the situation in radio astronomy to see how essential this information is for the building of quantitative models.

Attention as Relation: Learning Supervised Multi-head Self-Attention for Relation Extraction

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/524 ◽

2020 ◽

Author(s):

Jie Liu ◽

Shaowei Chen ◽

Bingquan Wang ◽

Jiaxin Zhang ◽

Na Li ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Relation Extraction ◽

Attention Mechanism ◽

Entity Extraction ◽

Relation Type ◽

Benchmark Datasets ◽

Relation Learning

Joint entity and relation extraction is critical for many natural language processing (NLP) tasks and has attracted increasing research interest. However, it still faces the challenges of identifying overlapping relation triplets along with the entire entity boundary and detecting multi-type relations. In this paper, we propose an attention-based joint model, which mainly contains an entity extraction module and a relation detection module, to address these challenges. The key to our model is devising a supervised multi-head self-attention mechanism as the relation detection module to learn the token-level correlation for each relation type separately. With the attention mechanism, our model can effectively identify overlapping relations and flexibly predict the relation type with its corresponding intensity. To verify the effectiveness of our model, we conduct comprehensive experiments on two benchmark datasets. The experimental results demonstrate that our model achieves state-of-the-art performance.

Self-Supervised Chinese Ontology Learning from Online Encyclopedias

The Scientific World JOURNAL ◽

10.1155/2014/848631 ◽

2014 ◽

Vol 2014 ◽

pp. 1-13 ◽

Cited By ~ 6

Author(s):

Fanghuai Hu ◽

Zhiqing Shao ◽

Tong Ruan

Keyword(s):

Machine Learning ◽

Structural Information ◽

Relation Extraction ◽

Knowledge Bases ◽

The Self ◽

Supervised Machine Learning ◽

Ontology Learning ◽

High Coverage ◽

Category Labels ◽

Training Examples

Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised learning based Chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. In order to avoid the errors in encyclopedias and enrich the learnt ontology, we also apply some machine learning based methods. First, we prove statistically and experimentally that the self-supervised machine learning method is practicable in Chinese relation extraction (at least for synonymy and hyponymy), and we train some self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are automatically generated from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision; manual evaluation results show that the ontology has excellent precision, and high coverage is concluded by comparing SSCO with other famous ontologies and knowledge bases; the experiment results also indicate that the self-supervised models obviously enrich SSCO.

KGGCN: Knowledge-Guided Graph Convolutional Networks for Distantly Supervised Relation Extraction

Applied Sciences ◽

10.3390/app11167734 ◽

2021 ◽

Vol 11 (16) ◽

pp. 7734

Author(s):

Ningyi Mao ◽

Wenti Huang ◽

Hai Zhong

Keyword(s):

Prior Knowledge ◽

Structural Information ◽

Relation Extraction ◽

Attention Mechanism ◽

Knowledge Graph ◽

Convolutional Network ◽

Convolutional Networks ◽

Lexical Resource ◽

Sentence Level ◽

The Impact

Distantly supervised relation extraction is the most popular technique for identifying semantic relations between two entities. Most prior models only focus on the supervision information present in training sentences. In addition to training sentences, external lexical resources and knowledge graphs often contain other relevant prior knowledge. However, relation extraction models usually ignore such readily available information. Moreover, previous works only utilize a selective attention mechanism over sentences to alleviate the impact of noise; they do not consider the implicit interaction between sentences and relation facts. In this paper, (1) a knowledge-guided graph convolutional network is proposed based on the word-level attention mechanism to encode the sentences. It can capture the key words and cue phrases to generate expressive sentence-level features by attending to the relation indicators obtained from the external lexical resource. (2) A knowledge-guided sentence selector is proposed, which explores the semantic and structural information of triples from the knowledge graph as sentence-level knowledge attention to distinguish the importance of each individual sentence. Experimental results on two widely used datasets, NYT-FB and GDS, show that our approach is able to efficiently use the prior knowledge from the external lexical resource and knowledge graph to enhance the performance of distantly supervised relation extraction.

Mining of Textual Health Information from Reddit: Analysis of Chronic Diseases With Extracted Entities and Their Relations (Preprint)

10.2196/preprints.12876 ◽

2018 ◽

Cited By ~ 1

Author(s):

Vasiliki Foufi ◽

Tatsawan Timakum ◽

Christophe Gaudet-Blavignac ◽

Christian Lovis ◽

Min Song

Keyword(s):

Social Media ◽

Chronic Diseases ◽

Language Processing ◽

Relation Extraction ◽

Entity Recognition ◽

Entity Extraction ◽

Privacy And Security ◽

Mining System ◽

Social Media Platforms ◽

The Way

BACKGROUND Social media platforms constitute a rich data source for natural language processing tasks such as named entity recognition, relation extraction, and sentiment analysis. In particular, social media platforms about health provide a different insight into patients’ experiences with diseases and treatment than those found in the scientific literature. OBJECTIVE This paper aimed to report a study of entities related to chronic diseases and their relations in user-generated text posts. The major focus of our research is the study of biomedical entities found in health social media platforms and their relations and the way people suffering from chronic diseases express themselves. METHODS We collected a corpus of 17,624 text posts from disease-specific subreddits of the social news and discussion website Reddit. For entity and relation extraction from this corpus, we employed the PKDE4J tool developed by Song et al (2015). PKDE4J is a text mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. RESULTS Using PKDE4J, we extracted 2 types of entities and relations: biomedical entities and relations and subject-predicate-object entity relations. In total, 82,138 entities and 30,341 relation pairs were extracted from the Reddit dataset. The most highly mentioned entities were those related to oncological disease (2884 occurrences of cancer) and asthma (2180 occurrences). The relation pair anatomy-disease was the most frequent (5550 occurrences), the most frequent entities in this pair being cancer and lymph. The manual validation of the extracted entities showed a very good performance of the system at the entity extraction task (3682/5151, 71.48% of extracted entities were correctly labeled). CONCLUSIONS This study showed that people are eager to share their personal experience with chronic diseases on social media platforms despite possible privacy and security issues. The results reported in this paper are promising and demonstrate the need for more in-depth studies on the way patients with chronic diseases express themselves on social media platforms.
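The validation figure quoted in the abstract above (3682 correctly labeled entities out of 5151 manually checked) can be reproduced with a short calculation. This is only a sketch of the arithmetic; the function name `precision` is chosen for this example and is not part of PKDE4J.

```python
def precision(correct: int, extracted: int) -> float:
    """Fraction of extracted items that were judged correct."""
    if extracted <= 0:
        raise ValueError("extracted must be a positive count")
    return correct / extracted


# Counts taken from the abstract: 3682 of 5151 validated entities were correctly labeled.
entity_precision = precision(3682, 5151)
print(f"{entity_precision:.2%}")
```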

Function, Structure, and Evolution of the Major Facilitator Superfamily: The LacY Manifesto

Advances in Biology ◽

10.1155/2014/523591 ◽

2014 ◽

Vol 2014 ◽

pp. 1-20 ◽

Cited By ~ 8

Author(s):

M. Gregor Madej

Keyword(s):

Structural Information ◽

Large Body ◽

Major Facilitator Superfamily ◽

Lactose Permease ◽

Test Bed ◽

X Ray ◽

X Ray Crystallography ◽

Major Facilitator ◽

Mfs Transporters ◽

Alternating Access

The major facilitator superfamily (MFS) is a diverse group of secondary transporters with members found in all kingdoms of life. A paradigm for MFS is the lactose permease (LacY) of Escherichia coli, which couples the stoichiometric translocation of a galactopyranoside and an H+ across the cytoplasmic membrane. LacY has been the test bed for the development of many methods applied for the analysis of transport proteins. X-ray structures of an inward-facing conformation and the most recent structure of an almost occluded conformation confirm many conclusions from previous studies. Although structure models are critical, they are insufficient to explain the catalysis of transport. The clues to understanding transport are based on the principles of enzyme kinetics. Secondary transport is a dynamic process—static snapshots of X-ray crystallography describe it only partially. However, without structural information, the underlying chemistry is virtually impossible to conclude. A large body of biochemical/biophysical data derived from systematic studies of site-directed mutants in LacY suggests residues critically involved in the catalysis, and a working model for the symport mechanism that involves alternating access of the binding site is presented. The general concepts derived from the bacterial LacY are examined for their relevance to other MFS transporters.

Teaching Pragmatics: Trends and Issues

Annual Review of Applied Linguistics ◽

10.1017/s0267190511000018 ◽

2011 ◽

Vol 31 ◽

pp. 289-310 ◽

Cited By ~ 63

Author(s):

Naoko Taguchi

Keyword(s):

Second Language ◽

Second Language Acquisition ◽

Skill Acquisition ◽

Classroom Practice ◽

Large Body ◽

Experimental Studies ◽

Practical Interest ◽

Pedagogical Practices ◽

Pragmatic Development ◽

Material Development

Theoretical, empirical, and practical interest in pragmatic competence and development for second language (L2) learners has resulted in a large body of literature on teaching L2 pragmatics. This body of literature has diverged into two major domains: (a) a group of experimental studies directly testing the efficacy of various instructional methods in pragmatics learning and (b) research that explores optimal instructional practice and resources for pragmatic development in formal classroom settings. This article reviews literature in these two domains and aims at providing a collective view of the available options for pragmatics teaching and the ways that pragmatic development can best be promoted in the classroom. In the area of instructional intervention, this article reviews studies under the common theoretical second language acquisition paradigms of explicit versus implicit instruction, input processing instruction, and skill acquisition and practice. In the area of classroom practice and resources, three domains of research and pedagogical practices are reviewed: material development and teacher education, learner strategies and autonomous learning, and incidental pragmatics learning in the classroom. Finally, this article discusses unique challenges and opportunities that have been embraced by pragmatics teaching in the current era of poststructuralism and multiculturalism.

A Hierarchical Framework for Relation Extraction with Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017072 ◽

2019 ◽

Vol 33 ◽

pp. 7072-7079 ◽

Cited By ~ 10

Author(s):

Ryuichi Takanobu ◽

Tianyang Zhang ◽

Jiexi Liu ◽

Minlie Huang

Keyword(s):

Reinforcement Learning ◽

Relation Extraction ◽

Extraction Process ◽

Entity Extraction ◽

Hierarchical Framework ◽

Hierarchical Reinforcement Learning ◽

Distant Supervision ◽

Public Datasets ◽

Determine Relation

Most existing methods determine relation types only after all the entities have been recognized; thus, the interaction between relation types and entity mentions is not fully modeled. This paper presents a novel paradigm to deal with relation extraction by regarding the related entities as the arguments of a relation. We apply a hierarchical reinforcement learning (HRL) framework in this paradigm to enhance the interaction between entity mentions and relation types. The whole extraction process is decomposed into a hierarchy of two-level RL policies for relation detection and entity extraction respectively, so that it is more feasible and natural to deal with overlapping relations. Our model was evaluated on public datasets collected via distant supervision, and results show that it gains better performance than existing methods and is more powerful for extracting overlapping relations.
