Knowledge Graph Completion for the Chinese Text of Cultural Relics Based on Bidirectional Encoder Representations from Transformers with Entity-Type Information

Knowledge graph completion can make knowledge graphs more complete, which is a meaningful research topic. However, the existing methods do not make full use of entity semantic information. Another challenge is that a deep model requires large-scale manually labelled data, which greatly increases manual labour. In order to alleviate the scarcity of labelled data in the field of cultural relics and capture the rich semantic information of entities, this paper proposes a model based on the Bidirectional Encoder Representations from Transformers (BERT) with entity-type information for the knowledge graph completion of the Chinese texts of cultural relics. In this work, the knowledge graph completion task is treated as a classification task, while the entities, relations and entity-type information are integrated as a textual sequence, and the Chinese characters are used as a token unit in which input representation is constructed by summing token, segment and position embeddings. A small number of labelled data are used to pre-train the model, and then, a large number of unlabelled data are used to fine-tune the pre-training model. The experiment results show that the BERT-KGC model with entity-type information can enrich the semantics information of the entities to reduce the degree of ambiguity of the entities and relations to some degree and achieve more effective performance than the baselines in triple classification, link prediction and relation prediction tasks using 35% of the labelled data of cultural relics.

Download Full-text

Distant Supervision for Relation Extraction with Sentence Selection and Interaction Representation

Wireless Communications and Mobile Computing ◽

10.1155/2021/8889075 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Tiantian Chen ◽

Nianbin Wang ◽

Hongbin Wang ◽

Haomin Zhan

Keyword(s):

Large Scale ◽

Semantic Information ◽

State Of The Art ◽

Relation Extraction ◽

Semantic Features ◽

Distant Supervision ◽

Word Level ◽

Proposed Model ◽

Relation Prediction ◽

Better Than

Distant supervision (DS) has been widely used for relation extraction (RE), which automatically generates large-scale labeled data. However, there is a wrong labeling problem, which affects the performance of RE. Besides, the existing method suffers from the lack of useful semantic features for some positive training instances. To address the above problems, we propose a novel RE model with sentence selection and interaction representation for distantly supervised RE. First, we propose a pattern method based on the relation trigger words as a sentence selector to filter out noisy sentences to alleviate the wrong labeling problem. After clean instances are obtained, we propose the interaction representation using the word-level attention mechanism-based entity pairs to dynamically increase the weights of the words related to entity pairs, which can provide more useful semantic information for relation prediction. The proposed model outperforms the strongest baseline by 2.61 in F1-score on a widely used dataset, which proves that our model performs significantly better than the state-of-the-art RE systems.

Download Full-text

A Framework for Service Semantic Description Based on Knowledge Graph

Electronics ◽

10.3390/electronics10091017 ◽

2021 ◽

Vol 10 (9) ◽

pp. 1017

Author(s):

Qitong Sun ◽

Jun Han ◽

Dianfu Ma

Keyword(s):

Service Discovery ◽

Large Scale ◽

Semantic Information ◽

Knowledge Graph ◽

Data Sets ◽

Accuracy Rate ◽

Data Set ◽

File Storage ◽

Representation Method ◽

The Relationship

To construct a large-scale service knowledge graph is necessary. We propose a method, namely semantic information extension, for service knowledge graphs. We insist on the information of services described by Web Services Description Language (WSDL) and we design the ontology layer of web service knowledge graph and construct the service graph, and using the WSDL document data set, the generated service knowledge graph contains 3738 service entities. In particular, our method can give a full performance to its effect in service discovery. To evaluate our approach, we conducted two sets of experiments to explore the relationship between services and classify services that develop by service descriptions. We constructed two experimental data sets, then designed and trained two different deep neural networks for the two tasks to extract the semantics of the natural language used in the service discovery task. In the prediction task of exploring the relationship between services, the prediction accuracy rate reached 95.1%, and in the service classification experiment, the accuracy rate of TOP5 reached 60.8%. Our experience shows that the service knowledge graph has additional advantages over traditional file storage when managing additional semantic information is effective and the new service representation method is helpful for service discovery and composition tasks.

Download Full-text

Knowledge Embedding with Geospatial Distance Restriction for Geographic Knowledge Graph Completion

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8060254 ◽

2019 ◽

Vol 8 (6) ◽

pp. 254 ◽

Cited By ~ 1

Author(s):

Peiyuan Qiu ◽

Jialiang Gao ◽

Li Yu ◽

Feng Lu

Keyword(s):

Large Scale ◽

Semantic Network ◽

Average Error ◽

Knowledge Graph ◽

Web Resource ◽

Related Information ◽

Geographic Knowledge ◽

Relation Prediction ◽

Low Dimensional

A Geographic Knowledge Graph (GeoKG) links geographic relation triplets into a large-scale semantic network utilizing the semantic of geo-entities and geo-relations. Unfortunately, the sparsity of geo-related information distribution on the web leads to a situation where information extraction systems can hardly detect enough references of geographic information in the massive web resource to be able to build relatively complete GeoKGs. This incompleteness, due to missing geo-entities or geo-relations in GeoKG fact triplets, seriously impacts the performance of GeoKG applications. In this paper, a method with geospatial distance restriction is presented to optimize knowledge embedding for GeoKG completion. This method aims to encode both the semantic information and geospatial distance restriction of geo-entities and geo-relations into a continuous, low-dimensional vector space. Then, the missing facts of the GeoKG can be supplemented through vector operations. Specifically, the geospatial distance restriction is realized as the weights of the objective functions of current translation knowledge embedding models. These optimized models output the optimized representations of geo-entities and geo-relations for the GeoKG’s completion. The effects of the presented method are validated with a real GeoKG. Compared with the results of the original models, the presented method improves the metric Hits@10(Filter) by an average of 6.41% for geo-entity prediction, and the Hits@1(Filter) by an average of 31.92%, for geo-relation prediction. Furthermore, the capacity of the proposed method to predict the locations of unknown entities is validated. The results show the geospatial distance restriction reduced the average error distance of prediction by between 54.43% and 57.24%. All the results support the geospatial distance restriction hiding in the GeoKG contributing to refining the embedding representations of geo-entities and geo-relations, which plays a crucial role in improving the quality of GeoKG completion.

Download Full-text

Knowledge Graph Representation via Similarity-Based Embedding

Scientific Programming ◽

10.1155/2018/6325635 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12

Author(s):

Zhen Tan ◽

Xiang Zhao ◽

Yang Fang ◽

Bin Ge ◽

Weidong Xiao

Keyword(s):

Large Scale ◽

Low Complexity ◽

Graph Representation ◽

Knowledge Graph ◽

Continuous Space ◽

Memory Space ◽

Representation Method ◽

Relation Prediction ◽

Low Dimensional ◽

Almost All

Knowledge graph, a typical multi-relational structure, includes large-scale facts of the world, yet it is still far away from completeness. Knowledge graph embedding, as a representation method, constructs a low-dimensional and continuous space to describe the latent semantic information and predict the missing facts. Among various solutions, almost all embedding models have high time and memory-space complexities and, hence, are difficult to apply to large-scale knowledge graphs. Some other embedding models, such as TransE and DistMult, although with lower complexity, ignore inherent features and only use correlations between different entities to represent the features of each entity. To overcome these shortcomings, we present a novel low-complexity embedding model, namely, SimE-ER, to calculate the similarity of entities in independent and associated spaces. In SimE-ER, each entity (relation) is described as two parts. The entity (relation) features in independent space are represented by the features entity (relation) intrinsically owns and, in associated space, the entity (relation) features are expressed by the entity (relation) features they connect. And the similarity between the embeddings of the same entities in different representation spaces is high. In experiments, we evaluate our model with two typical tasks: entity prediction and relation prediction. Compared with the state-of-the-art models, our experimental results demonstrate that SimE-ER outperforms existing competitors and has low time and memory-space complexities.

Download Full-text

Rule-Guided Compositional Representation Learning on Knowledge Graphs with Hierarchical Types

Mathematics ◽

10.3390/math9161978 ◽

2021 ◽

Vol 9 (16) ◽

pp. 1978

Author(s):

Yanying Mao ◽

Honghui Chen

Keyword(s):

Knowledge Representation ◽

Representation Learning ◽

Knowledge Graph ◽

Path Information ◽

Different Types ◽

Rich Information ◽

The Rich ◽

Horn Rules ◽

Low Dimensional ◽

Type Information

The representation learning of the knowledge graph projects the entities and relationships in the triples into a low-dimensional continuous vector space. Early representation learning mostly focused on the information contained in the triplet itself but ignored other useful information. Since entities have different types of representations in different scenarios, the rich information in the types of entity levels is helpful for obtaining a more complete knowledge representation. In this paper, a new knowledge representation frame (TRKRL) combining rule path information and entity hierarchical type information is proposed to exploit interpretability of logical rules and the advantages of entity hierarchical types. Specifically, for entity hierarchical type information, we consider that entities have multiple representations of different types, as well as treat it as the projection matrix of entities, using the type encoder to model entity hierarchical types. For rule path information, we mine Horn rules from the knowledge graph to guide the synthesis of relations in paths. Experimental results show that TRKRL outperforms baselines on the knowledge graph completion task, which indicates that our model is capable of using entity hierarchical type information, relation paths information, and logic rules information for representation learning.

Download Full-text

Neural methods for effective, efficient, and exposure-aware information retrieval

ACM SIGIR Forum ◽

10.1145/3476415.3476434 ◽

2021 ◽

Vol 55 (1) ◽

pp. 1-2

Author(s):

Bhaskar Mitra

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Large Scale ◽

Web Search ◽

Real Life ◽

Inverted Index ◽

Information Need ◽

Product Model ◽

Performance Improvements ◽

Deep Model

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents---or short passages---in response to keyword-based queries. Effective IR systems must deal with query-document vocabulary mismatch problem, by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms---such as a person's name or a product model number---not seen during training, and to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections---such as the document index of a commercial Web search engine---containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018]. Our key contribution towards improving the effectiveness of deep ranking models is developing the Duet principle [Mitra et al., 2017] which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model that enables large-scale precomputation and the use of inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.

Download Full-text

Reasoning over temporal knowledge graph with temporal consistency constraints

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210064 ◽

2021 ◽

pp. 1-10

Author(s):

Xiaojun Chen ◽

Shengbin Jia ◽

Ling Ding ◽

Yang Xiang

Keyword(s):

Short Term Memory ◽

Knowledge Graph ◽

Temporal Consistency ◽

Short Term ◽

Basic Model ◽

Consistency Constraints ◽

Memory Network ◽

Long Short Term Memory ◽

Relation Prediction ◽

Time Information

Knowledge graph reasoning or completion aims at inferring missing facts by reasoning about the information already present in the knowledge graph. In this work, we explore the problem of temporal knowledge graph reasoning that performs inference on the graph over time. Most existing reasoning models ignore the time information when learning entities and relations representations. For example, the fact (Scarlett Johansson, spouse Of, Ryan Reynolds) was true only during 2008 - 2011. To facilitate temporal reasoning, we present TA-TransRILP, which involves temporal information by utilizing RNNs and takes advantage of Integer Linear Programming. Specifically, we utilize a character-level long short-term memory network to encode relations with sequences of temporal tokens, and combine it with common reasoning model. To achieve more accurate reasoning, we further deploy temporal consistency constraints to basic model, which can help in assessing the validity of a fact better. We conduct entity prediction and relation prediction on YAGO11k and Wikidata12k datasets. Experimental results demonstrate that TA-TransRILP can make more accurate predictions by taking time information and temporal consistency constraints into account, and outperforms existing methods with a significant improvement about 6-8% on Hits@10.

Download Full-text

Mean-Field Models for EEG/MEG: From Oscillations to Waves

Brain Topography ◽

10.1007/s10548-021-00842-4 ◽

2021 ◽

Author(s):

Áine Byrne ◽

James Ross ◽

Rachel Nicks ◽

Stephen Coombes

Keyword(s):

Gap Junction ◽

Large Scale ◽

Mean Field ◽

Coarse Grained ◽

Neural Mass Model ◽

Neuron Network ◽

Mass Model ◽

Mean Field Model ◽

Neural Mass ◽

The Rich

AbstractNeural mass models have been used since the 1970s to model the coarse-grained activity of large populations of neurons. They have proven especially fruitful for understanding brain rhythms. However, although motivated by neurobiological considerations they are phenomenological in nature, and cannot hope to recreate some of the rich repertoire of responses seen in real neuronal tissue. Here we consider a simple spiking neuron network model that has recently been shown to admit an exact mean-field description for both synaptic and gap-junction interactions. The mean-field model takes a similar form to a standard neural mass model, with an additional dynamical equation to describe the evolution of within-population synchrony. As well as reviewing the origins of this next generation mass model we discuss its extension to describe an idealised spatially extended planar cortex. To emphasise the usefulness of this model for EEG/MEG modelling we show how it can be used to uncover the role of local gap-junction coupling in shaping large scale synaptic waves.

Download Full-text

ASER: A Large-scale Eventuality Knowledge Graph

Proceedings of The Web Conference 2020 ◽

10.1145/3366423.3380107 ◽

2020 ◽

Cited By ~ 1

Author(s):

Hongming Zhang ◽

Xin Liu ◽

Haojie Pan ◽

Yangqiu Song ◽

Cane Wing-Ki Leung

Keyword(s):

Large Scale ◽

Knowledge Graph

Download Full-text

SPARC Data Structure: Rationale and Design of a FAIR Standard for Biomedical Research Data

10.1101/2021.02.10.430563 ◽

2021 ◽

Author(s):

Anita Bandrowski ◽

Jeffrey S. Grethe ◽

Anna Pilko ◽

Tom Gillespie ◽

Gabi Pine ◽

...

Keyword(s):

Data Structure ◽

Biomedical Research ◽

Large Scale ◽

Open Data ◽

Cell Types ◽

Research Data ◽

Imaging Data ◽

Organ Specific ◽

The Rich ◽

Automated Tools

AbstractThe NIH Common Fund’s Stimulating Peripheral Activity to Relieve Conditions (SPARC) initiative is a large-scale program that seeks to accelerate the development of therapeutic devices that modulate electrical activity in nerves to improve organ function. Integral to the SPARC program are the rich anatomical and functional datasets produced by investigators across the SPARC consortium that provide key details about organ-specific circuitry, including structural and functional connectivity, mapping of cell types and molecular profiling. These datasets are provided to the research community through an open data platform, the SPARC Portal. To ensure SPARC datasets are Findable, Accessible, Interoperable and Reusable (FAIR), they are all submitted to the SPARC portal following a standard scheme established by the SPARC Curation Team, called the SPARC Data Structure (SDS). Inspired by the Brain Imaging Data Structure (BIDS), the SDS has been designed to capture the large variety of data generated by SPARC investigators who are coming from all fields of biomedical research. Here we present the rationale and design of the SDS, including a description of the SPARC curation process and the automated tools for complying with the SDS, including the SDS validator and Software to Organize Data Automatically (SODA) for SPARC. The objective is to provide detailed guidelines for anyone desiring to comply with the SDS. Since the SDS are suitable for any type of biomedical research data, it can be adopted by any group desiring to follow the FAIR data principles for managing their data, even outside of the SPARC consortium. Finally, this manuscript provides a foundational framework that can be used by any organization desiring to either adapt the SDS to suit the specific needs of their data or simply desiring to design their own FAIR data sharing scheme from scratch.

Download Full-text