Extracting and analyzing inorganic material synthesis procedures in the literature

Author(s):  
Kohei Makino ◽  
Fusataka Kuniyoshi ◽  
Jun Ozawa ◽  
Makoto Miwa

Abstract Analyzing material synthesis procedures in the literature is required to collect structural information on material names and synthesis procedures for designing materials computationally. Since synthesis procedures are mostly written in natural language in papers or technical documents, they need to be extracted and structured into a computer-readable format through information extraction. Moreover, representing a synthesis procedure requires expressing information such as conditions and the order of operations, but existing databases that compile structural information on material names and synthesis procedures do not provide such procedural information. It is, therefore, necessary to create a framework that extracts and organizes the synthesis-procedure information in text so that it is sufficient for material development, covering the order of operations and the links among materials, operations, and conditions. In this study, we construct a pipeline system that extracts synthesis procedures from text in the form of a flow graph. The extraction system consists of preprocessing, deep learning-based entity extraction, rule-based relation extraction, and selection of paragraphs that contain procedures. We applied the system to a large body of literature and extracted flow graphs (procedures) that include about 4 million entities and 3 million relations. We computed several statistics on the extracted graphs and performed several analyses of them. We experimentally confirmed that some extracted operations were specific to the target material and that the frequently extracted sub-graphs include reasonable operations.
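To make the flow-graph idea concrete, the following is a minimal sketch of how entities found in a sentence might be linked by simple ordering rules. It is not the authors' system: the label set (MATERIAL, OPERATION, CONDITION), the relation names (input_of, next, condition_of), and the linking rules are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    idx: int
    text: str
    label: str  # hypothetical label set: "MATERIAL", "OPERATION", "CONDITION"


@dataclass
class FlowGraph:
    entities: list = field(default_factory=list)
    # Each relation is (source entity idx, relation name, target entity idx).
    relations: list = field(default_factory=list)


def link_entities(entities):
    """Toy rule-based relation extraction over entities given in text order.

    Assumed rules (illustrative only):
      * materials mentioned before an operation become its inputs,
      * consecutive operations are chained with a "next" edge,
      * a condition attaches to the most recent operation.
    """
    graph = FlowGraph(entities=list(entities))
    last_op = None
    waiting_materials = []
    for ent in entities:
        if ent.label == "MATERIAL":
            waiting_materials.append(ent.idx)
        elif ent.label == "OPERATION":
            for material_idx in waiting_materials:
                graph.relations.append((material_idx, "input_of", ent.idx))
            waiting_materials = []
            if last_op is not None:
                graph.relations.append((last_op, "next", ent.idx))
            last_op = ent.idx
        elif ent.label == "CONDITION" and last_op is not None:
            graph.relations.append((ent.idx, "condition_of", last_op))
    return graph


# Entities as an upstream tagger might return them for a sentence such as
# "TiO2 and Li2CO3 were mixed, then calcined at 800 C." (made-up example).
example = [
    Entity(0, "TiO2", "MATERIAL"),
    Entity(1, "Li2CO3", "MATERIAL"),
    Entity(2, "mixed", "OPERATION"),
    Entity(3, "calcined", "OPERATION"),
    Entity(4, "800 C", "CONDITION"),
]
for relation in link_entities(example).relations:
    print(relation)
```

The "next" edges carry the order of operations, and the remaining edges tie materials and conditions to the operation they belong to, which is the kind of information the abstract says existing databases lack.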

2021 ◽  
Author(s):  
Kohei Makino ◽  
Fusataka Kuniyoshi ◽  
Jun Ozawa ◽  
Makoto Miwa

Abstract Analyzing synthesis procedures from a considerable amount of literature is required to collect structural information on material names and synthesis procedures for designing materials computationally. There are databases comprising structural data on material names and material synthesis procedures. However, the types of material in these databases and the material property values included are limited and insufficient. Moreover, synthesis procedures are primarily described in the literature in the natural language of the researcher who proposed the procedure, and thus they cannot be understood universally. It is, therefore, necessary to create a framework that represents textual synthesis procedures as a flow graph that contains crucial information for material development, such as the order of operations and the linkage between operations and conditions. This will facilitate obtaining material insights from the literature. However, there are no large-scale studies on the extraction of synthesis procedures in the form of a graph or on their analysis. In this study, we propose a pipeline system that extracts synthesis procedures from text in the literature in the form of a graph with a clear order of operations and clear targets for each condition. The system consists of preprocessing, entity extraction based on Mat-ELMo and Bi-LSTM-CRF models, rule-based relation extraction, and selection of paragraphs that contain procedures. We applied the system to a large body of literature and extracted various synthesis procedures. We performed basic analyses of the extracted procedures to examine their usability. We experimentally confirmed that some extracted procedures were specific to the target material, and some of the obvious procedures were correctly extracted.


1970 ◽  
Vol 37 ◽  
pp. 208-215
Author(s):  
L. Woltjer

A large body of spectral information on X-ray sources has now become available, but the interpretation remains ambiguous. If temperature variations and finite optical depth effects are taken into account, almost any spectrum can be fitted to a model of thermal bremsstrahlung. If a suitable energy spectrum is adopted for the relativistic electrons, a wide variety of synchrotron spectra becomes possible. Although one or the other interpretation may seem artificial in some cases, it nevertheless should be pointed out that a strictly isothermal source would be a miracle and that power-law type energy spectra of the relativistic particles can apply over only a limited range of energies. More satisfactory progress can be made when the spectral data are augmented with structural information, when emission lines can be studied and polarization can be measured. Not only do the intensities of emission lines give much more detailed information on the temperature and density in a hot gas than can be derived from the continuum, but at sufficient resolution velocity fields can also be studied. Useful structural information probably can be obtained only if spatial resolution of 1 arc-min or better is achieved. But one has only to look at the situation in radio astronomy to see how essential this information is for the building of quantitative models.


Author(s):  
Jie Liu ◽  
Shaowei Chen ◽  
Bingquan Wang ◽  
Jiaxin Zhang ◽  
Na Li ◽  
...  

Joint entity and relation extraction is critical for many natural language processing (NLP) tasks and has attracted increasing research interest. However, it still faces the challenges of identifying overlapping relation triplets along with entire entity boundaries and of detecting multi-type relations. In this paper, we propose an attention-based joint model, which mainly contains an entity extraction module and a relation detection module, to address these challenges. The key to our model is a supervised multi-head self-attention mechanism, used as the relation detection module, that learns the token-level correlation for each relation type separately. With the attention mechanism, our model can effectively identify overlapping relations and flexibly predict the relation type with its corresponding intensity. To verify the effectiveness of our model, we conduct comprehensive experiments on two benchmark datasets. The experimental results demonstrate that our model achieves state-of-the-art performance.
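The idea of one attention head per relation type can be illustrated with a small sketch. This is not the architecture from the paper: the function names, the use of a sigmoid instead of a softmax, and the hand-written weight matrices are assumptions made for illustration. Each head projects the token vectors into queries and keys and scores every token pair independently, so the same pair can score highly under several relation types at once.

```python
import math


def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a d_in x d_out matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def relation_scores(tokens, heads):
    """Return scores[rel][i][j] in (0, 1) for every relation type and token pair.

    tokens: list of token vectors.
    heads:  dict mapping a relation name to its own (W_q, W_k) projection
            matrices, i.e. one attention head per relation type.
    """
    scores = {}
    for rel, (w_q, w_k) in heads.items():
        queries = [matvec(tok, w_q) for tok in tokens]
        keys = [matvec(tok, w_k) for tok in tokens]
        scale = math.sqrt(len(queries[0]))
        scores[rel] = [
            [
                # Scaled dot product squashed to (0, 1); read here as the
                # "intensity" of relation rel between token i and token j.
                1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(q, k)) / scale))
                for k in keys
            ]
            for q in queries
        ]
    return scores


# Two toy token vectors and a single hypothetical relation type.
tokens = [[1.0, 0.0], [0.0, 1.0]]
heads = {"works_for": ([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])}
print(relation_scores(tokens, heads)["works_for"])
```

In a supervised setting, these per-relation score matrices would be trained against gold token-pair labels; the training step is omitted here.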


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Fanghuai Hu ◽  
Zhiqing Shao ◽  
Tong Ruan

Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised learning-based Chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. In order to avoid the errors in encyclopedias and enrich the learnt ontology, we also apply some machine learning-based methods. First, we show statistically and experimentally that the self-supervised machine learning method is practicable in Chinese relation extraction (at least for synonymy and hyponymy), and we train some self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are automatically generated from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision; manual evaluation results show that the ontology has excellent precision, and comparison of SSCO with other well-known ontologies and knowledge bases indicates high coverage; the experimental results also indicate that the self-supervised models clearly enrich SSCO.


2021 ◽  
Vol 11 (16) ◽  
pp. 7734
Author(s):  
Ningyi Mao ◽  
Wenti Huang ◽  
Hai Zhong

Distantly supervised relation extraction is the most popular technique for identifying semantic relations between two entities. Most prior models only focus on the supervision information present in training sentences. In addition to training sentences, external lexical resources and knowledge graphs often contain other relevant prior knowledge. However, relation extraction models usually ignore such readily available information. Moreover, previous works only utilize a selective attention mechanism over sentences to alleviate the impact of noise; they do not consider the implicit interaction between sentences and relation facts. In this paper, (1) a knowledge-guided graph convolutional network is proposed based on the word-level attention mechanism to encode the sentences. It can capture the key words and cue phrases to generate expressive sentence-level features by attending to the relation indicators obtained from the external lexical resource. (2) A knowledge-guided sentence selector is proposed, which explores the semantic and structural information of triples from the knowledge graph as sentence-level knowledge attention to distinguish the importance of each individual sentence. Experimental results on two widely used datasets, NYT-FB and GDS, show that our approach is able to efficiently use the prior knowledge from the external lexical resource and knowledge graph to enhance the performance of distantly supervised relation extraction.
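For readers unfamiliar with the selective attention baseline mentioned above, here is a minimal sketch of sentence-level attention over a bag of sentences that mention the same entity pair. It is a generic illustration and not the knowledge-guided selector proposed in the paper; the function name, the dot-product scoring, and the toy vectors are assumptions.

```python
import math


def selective_attention(sentence_vecs, relation_query):
    """Weight each sentence in a bag by its match with a relation query vector.

    Returns (weights, bag_vector): softmax weights over the sentences and the
    weighted sum of the sentence vectors. Noisy sentences that match the query
    poorly receive small weights.
    """
    scores = [
        sum(s * q for s, q in zip(vec, relation_query)) for vec in sentence_vecs
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(sentence_vecs[0])
    bag_vector = [
        sum(w * vec[d] for w, vec in zip(weights, sentence_vecs)) for d in range(dim)
    ]
    return weights, bag_vector


# Two toy sentence encodings; the first aligns with the query, the second does not.
weights, bag = selective_attention([[2.0, 0.0], [0.0, 2.0]], [1.0, 0.0])
print(weights, bag)
```

The abstract's proposal replaces the learned query with attention derived from knowledge-graph triples; that part is not reproduced here.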


Author(s):  
Vasiliki Foufi ◽  
Tatsawan Timakum ◽  
Christophe Gaudet-Blavignac ◽  
Christian Lovis ◽  
Min Song

BACKGROUND Social media platforms constitute a rich data source for natural language processing tasks such as named entity recognition, relation extraction, and sentiment analysis. In particular, social media platforms about health provide a different insight into patients’ experiences with diseases and treatment than that found in the scientific literature. OBJECTIVE This paper aimed to report a study of entities related to chronic diseases and their relations in user-generated text posts. The major focus of our research is the study of biomedical entities found in health social media platforms and their relations and the way people suffering from chronic diseases express themselves. METHODS We collected a corpus of 17,624 text posts from disease-specific subreddits of the social news and discussion website Reddit. For entity and relation extraction from this corpus, we employed the PKDE4J tool developed by Song et al (2015). PKDE4J is a text mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework. RESULTS Using PKDE4J, we extracted 2 types of entities and relations: biomedical entities and relations and subject-predicate-object entity relations. In total, 82,138 entities and 30,341 relation pairs were extracted from the Reddit dataset. The most frequently mentioned entities were those related to oncological disease (2884 occurrences of cancer) and asthma (2180 occurrences). The relation pair anatomy-disease was the most frequent (5550 occurrences), the most frequent entities in this pair being cancer and lymph. The manual validation of the extracted entities showed a very good performance of the system at the entity extraction task (3682 of 5151 extracted entities, 71.48%, were correctly labeled). CONCLUSIONS This study showed that people are eager to share their personal experience with chronic diseases on social media platforms despite possible privacy and security issues. The results reported in this paper are promising and demonstrate the need for more in-depth studies on the way patients with chronic diseases express themselves on social media platforms.


2014 ◽  
Vol 2014 ◽  
pp. 1-20 ◽  
Author(s):  
M. Gregor Madej

The major facilitator superfamily (MFS) is a diverse group of secondary transporters with members found in all kingdoms of life. A paradigm for MFS is the lactose permease (LacY) of Escherichia coli, which couples the stoichiometric translocation of a galactopyranoside and an H+ across the cytoplasmic membrane. LacY has been the test bed for the development of many methods applied for the analysis of transport proteins. X-ray structures of an inward-facing conformation and the most recent structure of an almost occluded conformation confirm many conclusions from previous studies. Although structure models are critical, they are insufficient to explain the catalysis of transport. The clues to understanding transport are based on the principles of enzyme kinetics. Secondary transport is a dynamic process—static snapshots of X-ray crystallography describe it only partially. However, without structural information, the underlying chemistry is virtually impossible to conclude. A large body of biochemical/biophysical data derived from systematic studies of site-directed mutants in LacY suggests residues critically involved in the catalysis, and a working model for the symport mechanism that involves alternating access of the binding site is presented. The general concepts derived from the bacterial LacY are examined for their relevance to other MFS transporters.


2011 ◽  
Vol 31 ◽  
pp. 289-310 ◽  
Author(s):  
Naoko Taguchi

Theoretical, empirical, and practical interest in pragmatic competence and development for second language (L2) learners has resulted in a large body of literature on teaching L2 pragmatics. This body of literature has diverged into two major domains: (a) a group of experimental studies directly testing the efficacy of various instructional methods in pragmatics learning and (b) research that explores optimal instructional practice and resources for pragmatic development in formal classroom settings. This article reviews literature in these two domains and aims at providing a collective view of the available options for pragmatics teaching and the ways that pragmatic development can best be promoted in the classroom. In the area of instructional intervention, this article reviews studies under the common theoretical second language acquisition paradigms of explicit versus implicit instruction, input processing instruction, and skill acquisition and practice. In the area of classroom practice and resources, three domains of research and pedagogical practices are reviewed: material development and teacher education, learner strategies and autonomous learning, and incidental pragmatics learning in the classroom. Finally, this article discusses unique challenges and opportunities that have been embraced by pragmatics teaching in the current era of poststructuralism and multiculturalism.


Author(s):  
Ryuichi Takanobu ◽  
Tianyang Zhang ◽  
Jiexi Liu ◽  
Minlie Huang

Most existing methods determine relation types only after all the entities have been recognized; thus, the interaction between relation types and entity mentions is not fully modeled. This paper presents a novel paradigm to deal with relation extraction by regarding the related entities as the arguments of a relation. We apply a hierarchical reinforcement learning (HRL) framework in this paradigm to enhance the interaction between entity mentions and relation types. The whole extraction process is decomposed into a hierarchy of two-level RL policies for relation detection and entity extraction respectively, so that it is more feasible and natural to deal with overlapping relations. Our model was evaluated on public datasets collected via distant supervision, and results show that it gains better performance than existing methods and is more powerful for extracting overlapping relations.

