ASER: A Large-scale Eventuality Knowledge Graph

Constructing a knowledge graph of geological hazards literature can facilitate the reuse of geological hazards literature and provide a reference for geological hazard governance. Named entity recognition (NER), as a core technology for constructing a geological hazard knowledge graph, has to face the challenges that named entities in geological hazard literature are diverse in form, ambiguous in semantics, and uncertain in context. This can introduce difficulties in designing practical features during the NER classification. To address the above problem, this paper proposes a deep learning-based NER model; namely, the deep, multi-branch BiGRU-CRF model, which combines a multi-branch bidirectional gated recurrent unit (BiGRU) layer and a conditional random field (CRF) model. In an end-to-end and supervised process, the proposed model automatically learns and transforms features by a multi-branch bidirectional GRU layer and enhances the output with a CRF layer. Besides the deep, multi-branch BiGRU-CRF model, we also proposed a pattern-based corpus construction method to construct the corpus needed for the deep, multi-branch BiGRU-CRF model. Experimental results indicated the proposed deep, multi-branch BiGRU-CRF model outperformed state-of-the-art models. The proposed deep, multi-branch BiGRU-CRF model constructed a large-scale geological hazard literature knowledge graph containing 34,457 entities nodes and 84,561 relations.

Download Full-text

Knowledge Graph Completion for the Chinese Text of Cultural Relics Based on Bidirectional Encoder Representations from Transformers with Entity-Type Information

Entropy ◽

10.3390/e22101168 ◽

2020 ◽

Vol 22 (10) ◽

pp. 1168

Author(s):

Min Zhang ◽

Guohua Geng ◽

Sheng Zeng ◽

Huaping Jia

Keyword(s):

Chinese Text ◽

Large Scale ◽

Semantic Information ◽

Training Model ◽

Knowledge Graph ◽

Deep Model ◽

The Rich ◽

Relation Prediction ◽

Type Information ◽

Cultural Relics

Knowledge graph completion can make knowledge graphs more complete, which is a meaningful research topic. However, the existing methods do not make full use of entity semantic information. Another challenge is that a deep model requires large-scale manually labelled data, which greatly increases manual labour. In order to alleviate the scarcity of labelled data in the field of cultural relics and capture the rich semantic information of entities, this paper proposes a model based on the Bidirectional Encoder Representations from Transformers (BERT) with entity-type information for the knowledge graph completion of the Chinese texts of cultural relics. In this work, the knowledge graph completion task is treated as a classification task, while the entities, relations and entity-type information are integrated as a textual sequence, and the Chinese characters are used as a token unit in which input representation is constructed by summing token, segment and position embeddings. A small number of labelled data are used to pre-train the model, and then, a large number of unlabelled data are used to fine-tune the pre-training model. The experiment results show that the BERT-KGC model with entity-type information can enrich the semantics information of the entities to reduce the degree of ambiguity of the entities and relations to some degree and achieve more effective performance than the baselines in triple classification, link prediction and relation prediction tasks using 35% of the labelled data of cultural relics.

Download Full-text

Knowledge Graph Construction and Applications for Web Search and Beyond

Data Intelligence ◽

10.1162/dint_a_00019 ◽

2019 ◽

Vol 1 (4) ◽

pp. 333-349 ◽

Cited By ~ 1

Author(s):

Peilu Wang ◽

Hao Jiang ◽

Jingfang Xu ◽

Qi Zhang

Keyword(s):

Search Engine ◽

Intelligent Systems ◽

Large Scale ◽

Web Search ◽

Question Answering ◽

Knowledge Graph ◽

Graph Databases ◽

Distributed Search ◽

Knowledge Based ◽

Dialog System

Knowledge graph (KG) has played an important role in enhancing the performance of many intelligent systems. In this paper, we introduce the solution of building a large-scale multi-source knowledge graph from scratch in Sogou Inc., including its architecture, technical implementation and applications. Unlike previous works that build knowledge graph with graph databases, we build the knowledge graph on top of SogouQdb, a distributed search engine developed by Sogou Web Search Department, which can be easily scaled to support petabytes of data. As a supplement to the search engine, we also introduce a series of models to support inference and graph based querying. Currently, the data of Sogou knowledge graph that are collected from 136 different websites and constantly updated consist of 54 million entities and over 600 million entity links. We also introduce three applications of knowledge graph in Sogou Inc.: entity detection and linking, knowledge based question answering and knowledge based dialog system. These applications have been used in Web search products to help user acquire information more efficiently.

Download Full-text

Robustifying Scholia: paving the way for knowledge discovery and research assessment through Wikidata

Research Ideas and Outcomes ◽

10.3897/rio.5.e35820 ◽

2019 ◽

Vol 5 ◽

Cited By ~ 2

Author(s):

Lane Rasberry ◽

Egon Willighagen ◽

Finn Nielsen ◽

Daniel Mietchen

Keyword(s):

Knowledge Discovery ◽

Large Scale ◽

Knowledge Workers ◽

Open Data ◽

Linked Open Data ◽

Data Curation ◽

Research Assessment ◽

Knowledge Graph ◽

Incomplete Datasets ◽

Active Contributor

Knowledge workers like researchers, students, journalists, research evaluators or funders need tools to explore what is known, how it was discovered, who made which contributions, and where the scholarly record has gaps. Existing tools and services of this kind are not available as Linked Open Data, but Wikidata is. It has the technology, active contributor base, and content to build a large-scale knowledge graph for scholarship, also known as WikiCite. Scholia visualizes this graph in an exploratory interface with profiles and links to the literature. However, it is just a working prototype. This project aims to "robustify Scholia" with back-end development and testing based on pilot corpora. The main objective at this stage is to attain stability in challenging cases such as server throttling and handling of large or incomplete datasets. Further goals include integrating Scholia with data curation and manuscript writing workflows, serving more languages, generating usage stats, and documentation.

Download Full-text

Research progress of large-scale knowledge graph completion technology

Scientia Sinica Informationis ◽

10.1360/n112018-00225 ◽

2020 ◽

Vol 50 (4) ◽

pp. 551-575 ◽

Cited By ~ 1

Author(s):

Zhijuan DU ◽

Xiaofeng MENG ◽

Shuo WANG

Keyword(s):

Large Scale ◽

Research Progress ◽

Knowledge Graph

Download Full-text

An Incremental Reasoning Algorithm for Large Scale Knowledge Graph

Knowledge Science, Engineering and Management - Lecture Notes in Computer Science ◽

10.1007/978-3-319-99365-2_45 ◽

2018 ◽

pp. 503-513 ◽

Cited By ~ 1

Author(s):

Yifei Wang ◽

Jie Luo

Keyword(s):

Large Scale ◽

Knowledge Graph ◽

Incremental Reasoning ◽

Reasoning Algorithm

Download Full-text

CROssBAR: Comprehensive Resource of Biomedical Relations with Deep Learning Applications and Knowledge Graph Representations

10.1101/2020.09.14.296889 ◽

2020 ◽

Author(s):

Tunca Doğan ◽

Heval Atas ◽

Vishal Joshi ◽

Ahmet Atakan ◽

Ahmet Sureyya Rifaioglu ◽

...

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Large Scale ◽

Host Protein ◽

Knowledge Graph ◽

Biomedical Data ◽

Data Resource ◽

Graph Representations ◽

Link Type ◽

Systemic Analysis

AbstractSystemic analysis of available large-scale biological and biomedical data is critical for developing novel and effective treatment approaches against both complex and infectious diseases. Owing to the fact that different sections of the biomedical data is produced by different organizations/institutions using various types of technologies, the data are scattered across individual computational resources, without any explicit relations/connections to each other, which greatly hinders the comprehensive multi-omics-based analysis of data. We aimed to address this issue by constructing a new biological and biomedical data resource, CROssBAR, a comprehensive system that integrates large-scale biomedical data from various resources and store them in a new NoSQL database, enrich these data with deep-learning-based prediction of relations between numerous biomedical entities, rigorously analyse the enriched data to obtain biologically meaningful modules and display them to users via easy-to-interpret, interactive and heterogenous knowledge graph (KG) representations within an open access, user-friendly and online web-service at https://crossbar.kansil.org. As a use-case study, we constructed CROssBAR COVID-19 KGs (available at: https://crossbar.kansil.org/covid_main.php) that incorporate relevant virus and host genes/proteins, interactions, pathways, phenotypes and other diseases, as well as known and completely new predicted drugs/compounds. Our COVID-19 graphs can be utilized for a systems-level evaluation of relevant virus-host protein interactions, mechanisms, phenotypic implications and potential interventions.

Download Full-text

Knowledge Graphs

Biodiversity Information Science and Standards ◽

10.3897/biss.5.73796 ◽

2021 ◽

Vol 5 ◽

Author(s):

Roderic Page

Keyword(s):

Knowledge Management ◽

Large Scale ◽

Personal Knowledge ◽

Knowledge Graph ◽

Specific Knowledge ◽

Management Tools ◽

Global Projects ◽

Knowledge Graphs ◽

Constructing Knowledge ◽

Knowledge Management Tools

Knowledge graphs embody the idea of "everything connected to everything else." As attractive as this seems, there is a substantial gap between the dream of fully interconnected knowledge and the reality of data that is still mostly siloed, or weakly connected by shared strings such as taxonomic names. How do we move forward? Do we focus on building our own domain- or project-specific knowledge graphs, or do we engage with global projects such as Wikidata? Do we construct knowledge graphs, or focus on making our data "knowledge graph ready" by adopting structured markup in the hope that knowledge graphs will spontaneously self-assemble from that data? Do we focus on large-scale, database-driven projects (e.g., triple stores in the cloud), or do we rely on more localised and distributed approaches, such as annotations (e.g., hypothes.is), "content-hash" systems where a cryptographic hash of the data is also its identifier (Elliott et al. 2020), or the growing number of personal knowledge management tools (e.g., Roam, Obsidian, LogSeq)? This talk will share experiences (the good, bad, and the ugly) as I have tried to transition from naïve advocacy to constructing knowledge graphs (Page 2019), or participating in their construction (Page 2021).

Download Full-text

The Coronavirus Network Explorer: Mining a large-scale knowledge graph for effects of SARS-CoV-2 on host cell function

10.1101/2020.09.14.296327 ◽

2020 ◽

Author(s):

Andreas Krämer ◽

Jean-Noël Billaud ◽

Stuart Tugendreich ◽

Dan Shiftman ◽

Martin Jones ◽

...

Keyword(s):

Host Cell ◽

Drug Targets ◽

Large Scale ◽

Cell Function ◽

Affinity Purification ◽

Viral Proteins ◽

Biomedical Literature ◽

Knowledge Graph ◽

Cell Functions ◽

Interactive Network

Building on recent work that identified human host proteins that interact with SARS-CoV-2 viral proteins in the context of an affinity-purification mass spectrometry screen, we use a machine learning-based approach to connect the viral proteins to relevant biological functions and diseases in a large-scale knowledge graph derived from the biomedical literature. Our aim is to explore how SARS-CoV-2 could interfere with various host cell functions, and also to identify additional drug targets amongst the host genes that could potentially be modulated against COVID-19. Results are presented in the form of interactive network visualizations, that allow exploration of underlying experimental evidence. A selection of networks is discussed in the context of recent clinical observations.

Download Full-text

ASER: A Large-scale Eventuality Knowledge Graph

GIS-KG: building a large-scale hierarchical knowledge graph for geographic information science

Deep Learning-Based Named Entity Recognition and Knowledge Graph Construction for Geological Hazards

Knowledge Graph Completion for the Chinese Text of Cultural Relics Based on Bidirectional Encoder Representations from Transformers with Entity-Type Information

Knowledge Graph Construction and Applications for Web Search and Beyond

Robustifying Scholia: paving the way for knowledge discovery and research assessment through Wikidata

Research progress of large-scale knowledge graph completion technology

An Incremental Reasoning Algorithm for Large Scale Knowledge Graph

CROssBAR: Comprehensive Resource of Biomedical Relations with Deep Learning Applications and Knowledge Graph Representations

Knowledge Graphs

The Coronavirus Network Explorer: Mining a large-scale knowledge graph for effects of SARS-CoV-2 on host cell function

Export Citation Format