Building a Large-Scale Knowledge Graph for Elementary Education in China

Author(s):  
Wei Zheng ◽  
Zhichun Wang ◽  
Mingchen Sun ◽  
Yanrong Wu ◽  
Kaiman Li
Author(s):  
Hongming Zhang ◽  
Xin Liu ◽  
Haojie Pan ◽  
Yangqiu Song ◽  
Cane Wing-Ki Leung

2019 ◽  
Vol 9 (1) ◽  
pp. 15 ◽  
Author(s):  
Runyu Fan ◽  
Lizhe Wang ◽  
Jining Yan ◽  
Weijing Song ◽  
Yingqian Zhu ◽  
...  

Constructing a knowledge graph of geological hazards literature can facilitate the reuse of that literature and provide a reference for geological hazard governance. Named entity recognition (NER), a core technology for constructing a geological hazard knowledge graph, faces the challenge that named entities in geological hazard literature are diverse in form, ambiguous in semantics, and uncertain in context, which makes it difficult to design practical features for NER classification. To address this problem, this paper proposes a deep learning-based NER model, the deep, multi-branch BiGRU-CRF model, which combines a multi-branch bidirectional gated recurrent unit (BiGRU) layer with a conditional random field (CRF) model. In an end-to-end, supervised process, the proposed model automatically learns and transforms features with the multi-branch BiGRU layer and refines the output with a CRF layer. We also propose a pattern-based corpus construction method to build the corpus needed for the model. Experimental results indicate that the proposed model outperforms state-of-the-art models. The proposed model was used to construct a large-scale geological hazard literature knowledge graph containing 34,457 entity nodes and 84,561 relations.
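
To illustrate how a CRF layer can refine per-token scores such as those a BiGRU encoder would produce, the following is a minimal sketch of Viterbi decoding in plain Python. It is not the authors' implementation: the tag set (`O`, `B-HAZ`, `I-HAZ`), the emission scores and the transition scores are all hypothetical values chosen for illustration, and the multi-branch encoder itself is not modelled.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]   : score of tag j at position t (would come from the encoder)
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters)
    """
    n_tags = len(tags)
    # best[j] = best score of any path that ends in tag j at the current position
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # Pick the previous tag that gives the best path into tag j
            i_best = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path], best[last]


# Hypothetical tag set and scores (assumptions, not taken from the paper)
TAGS = ["O", "B-HAZ", "I-HAZ"]
EMISSIONS = [
    [2.0, 1.0, 0.0],  # token 1: the encoder slightly prefers O
    [0.5, 0.4, 1.5],  # token 2: the encoder prefers I-HAZ
    [2.0, 0.0, 0.0],  # token 3: the encoder prefers O
]
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O: jumping straight to I-HAZ is heavily penalised
    [0.0, 0.0, 0.5],    # from B-HAZ: continuing with I-HAZ is rewarded
    [0.0, 0.0, 0.0],    # from I-HAZ
]

path, score = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(path, score)
```

Choosing the best tag independently at each position would give the invalid sequence `O`, `I-HAZ`, `O` for these scores; decoding with the transition scores instead selects a sequence in which `I-HAZ` follows `B-HAZ`.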


Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1168
Author(s):  
Min Zhang ◽  
Guohua Geng ◽  
Sheng Zeng ◽  
Huaping Jia

Knowledge graph completion, which makes knowledge graphs more complete, is a meaningful research topic. However, existing methods do not make full use of entity semantic information. Another challenge is that a deep model requires large-scale manually labelled data, which greatly increases manual labour. To alleviate the scarcity of labelled data in the field of cultural relics and capture the rich semantic information of entities, this paper proposes a model based on Bidirectional Encoder Representations from Transformers (BERT) with entity-type information for knowledge graph completion of Chinese texts on cultural relics. In this work, the knowledge graph completion task is treated as a classification task: the entities, relations and entity-type information are integrated into a textual sequence, Chinese characters are used as the token unit, and the input representation is constructed by summing token, segment and position embeddings. A small amount of labelled data is used to pre-train the model, and then a large amount of unlabelled data is used to fine-tune the pre-trained model. The experimental results show that the BERT-KGC model with entity-type information enriches the semantic information of the entities, reduces the ambiguity of entities and relations to some degree, and achieves better performance than the baselines in triple classification, link prediction and relation prediction tasks using 35% of the labelled data of cultural relics.
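
The input construction described above (a triple plus entity types packed into one character-level sequence, then represented as the sum of token, segment and position embeddings) can be sketched as follows. This is only an illustration under stated assumptions: the packing order, the alternating segment ids, the example triple, the embedding size `DIM` and the random embedding tables are all hypothetical, since the abstract does not specify them, and a real model would use learned BERT embeddings.

```python
import random


def build_sequence(head, head_type, relation, tail, tail_type):
    """Pack a triple and its entity types into one character-level sequence.

    Assumed layout: [CLS] head+type [SEP] relation [SEP] tail+type [SEP],
    with segment ids alternating 0, 1, 0 across the three parts.
    """
    parts = [head + head_type, relation, tail + tail_type]
    tokens, segment_ids = ["[CLS]"], [0]
    for index, text in enumerate(parts):
        chars = list(text) + ["[SEP]"]  # one token per Chinese character
        tokens.extend(chars)
        segment_ids.extend([index % 2] * len(chars))
    return tokens, segment_ids


def random_table(keys, dim, rng):
    """Stand-in for a learned embedding table: one random vector per key."""
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}


def input_representation(tokens, segment_ids, tok_emb, seg_emb, pos_emb):
    """Sum token, segment and position embeddings element-wise per position."""
    return [
        [t + s + p for t, s, p in zip(tok_emb[tok], seg_emb[seg], pos_emb[pos])]
        for pos, (tok, seg) in enumerate(zip(tokens, segment_ids))
    ]


DIM = 4  # hypothetical embedding size, far smaller than a real model's
rng = random.Random(0)

# Hypothetical triple: (bronze ding, type "artifact") "unearthed at" (Anyang, type "place")
tokens, segment_ids = build_sequence("青铜鼎", "器物", "出土于", "安阳", "地点")

tok_emb = random_table(sorted(set(tokens)), DIM, rng)
seg_emb = random_table([0, 1], DIM, rng)
pos_emb = random_table(range(len(tokens)), DIM, rng)

rows = input_representation(tokens, segment_ids, tok_emb, seg_emb, pos_emb)
print(len(rows), len(rows[0]))
```

Each row of `rows` is the vector that would be fed to the first Transformer layer for the corresponding character; the classification head would then read the `[CLS]` position.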


2019 ◽  
Vol 1 (4) ◽  
pp. 333-349 ◽  
Author(s):  
Peilu Wang ◽  
Hao Jiang ◽  
Jingfang Xu ◽  
Qi Zhang

Knowledge graphs (KGs) have played an important role in enhancing the performance of many intelligent systems. In this paper, we introduce the solution for building a large-scale multi-source knowledge graph from scratch at Sogou Inc., including its architecture, technical implementation and applications. Unlike previous works that build knowledge graphs with graph databases, we build the knowledge graph on top of SogouQdb, a distributed search engine developed by the Sogou Web Search Department, which can be easily scaled to support petabytes of data. As a supplement to the search engine, we also introduce a series of models to support inference and graph-based querying. Currently, the data of the Sogou knowledge graph, which are collected from 136 different websites and constantly updated, consist of 54 million entities and over 600 million entity links. We also introduce three applications of the knowledge graph at Sogou Inc.: entity detection and linking, knowledge-based question answering and a knowledge-based dialog system. These applications have been used in Web search products to help users acquire information more efficiently.


2013 ◽  
Vol 421 ◽  
pp. 725-730
Author(s):  
Song Bin Bao

English used in the field of manufacturing systems belongs to ESP (English for specific purposes). To improve the effect of ESP education in China, it is necessary to create an English-Chinese parallel corpus to aid ESP teaching and learning. In this paper, a novel method is presented for creating a small-scale English-Chinese parallel corpus by means of a TMS (translation memory system). First, suitable English and Chinese texts are collected from the network, publications and human translation; second, the English and Chinese texts are aligned and formatted using the related TMS functions; then the Chinese texts are split into words using ICWSS (Intelligent Chinese Word Segmentation System); finally, the English-Chinese corpus is stored in a cloud database. This small-scale English-Chinese parallel corpus can be searched through ParaConc and meets the basic needs of ESP teaching and learning. Since the method requires neither a new algorithm nor a new software system, constructing the corpus is much easier and more flexible than building a general large-scale corpus.
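
As a rough illustration of what the resulting corpus looks like and how it can be queried, the sketch below pairs already-aligned English sentences with pre-segmented Chinese sentences and runs a simple keyword lookup. It does not reproduce the TMS alignment, ICWSS segmentation, cloud storage or ParaConc; the sentences, the one-to-one alignment and the function names are hypothetical stand-ins.

```python
import re


def build_parallel_corpus(english_sentences, chinese_sentences):
    """Pair sentences one-to-one; assumes alignment has already been done."""
    if len(english_sentences) != len(chinese_sentences):
        raise ValueError("aligned texts must contain the same number of sentences")
    return list(zip(english_sentences, chinese_sentences))


def concordance(corpus, keyword):
    """Return every aligned pair whose English side contains the keyword."""
    target = keyword.lower()
    return [
        (english, chinese)
        for english, chinese in corpus
        if target in re.findall(r"[a-z]+", english.lower())
    ]


# Hypothetical manufacturing sentences; the Chinese side is shown already
# split into words (spaces mark the boundaries a segmenter would produce).
ENGLISH = [
    "The spindle drives the cutting tool.",
    "The conveyor moves parts between stations.",
]
CHINESE = [
    "主轴 驱动 切削 刀具 。",
    "传送带 在 工位 之间 输送 零件 。",
]

corpus = build_parallel_corpus(ENGLISH, CHINESE)
hits = concordance(corpus, "spindle")
for english, chinese in hits:
    print(english, "|", chinese)
```

A learner searching for a term sees each English sentence next to its Chinese counterpart, which is the basic use the abstract describes for ESP teaching.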


2019 ◽  
Vol 5 ◽  
Author(s):  
Lane Rasberry ◽  
Egon Willighagen ◽  
Finn Nielsen ◽  
Daniel Mietchen

Knowledge workers like researchers, students, journalists, research evaluators or funders need tools to explore what is known, how it was discovered, who made which contributions, and where the scholarly record has gaps. Existing tools and services of this kind are not available as Linked Open Data, but Wikidata is. It has the technology, active contributor base, and content to build a large-scale knowledge graph for scholarship, also known as WikiCite. Scholia visualizes this graph in an exploratory interface with profiles and links to the literature. However, it is just a working prototype. This project aims to "robustify Scholia" with back-end development and testing based on pilot corpora. The main objective at this stage is to attain stability in challenging cases such as server throttling and handling of large or incomplete datasets. Further goals include integrating Scholia with data curation and manuscript writing workflows, serving more languages, generating usage stats, and documentation.


2020 ◽  
Vol 50 (4) ◽  
pp. 551-575 ◽  
Author(s):  
Zhijuan DU ◽  
Xiaofeng MENG ◽  
Shuo WANG

2020 ◽  
Author(s):  
Tunca Doğan ◽  
Heval Atas ◽  
Vishal Joshi ◽  
Ahmet Atakan ◽  
Ahmet Sureyya Rifaioglu ◽  
...  

Systemic analysis of available large-scale biological and biomedical data is critical for developing novel and effective treatment approaches against both complex and infectious diseases. Because different sections of the biomedical data are produced by different organizations/institutions using various types of technologies, the data are scattered across individual computational resources without any explicit relations/connections to each other, which greatly hinders comprehensive multi-omics-based analysis of the data. We aimed to address this issue by constructing a new biological and biomedical data resource, CROssBAR, a comprehensive system that integrates large-scale biomedical data from various resources and stores them in a new NoSQL database, enriches these data with deep-learning-based prediction of relations between numerous biomedical entities, rigorously analyses the enriched data to obtain biologically meaningful modules, and displays them to users via easy-to-interpret, interactive and heterogeneous knowledge graph (KG) representations within an open-access, user-friendly online web service at https://crossbar.kansil.org. As a use-case study, we constructed CROssBAR COVID-19 KGs (available at: https://crossbar.kansil.org/covid_main.php) that incorporate relevant virus and host genes/proteins, interactions, pathways, phenotypes and other diseases, as well as known and completely new predicted drugs/compounds. Our COVID-19 graphs can be utilized for a systems-level evaluation of relevant virus-host protein interactions, mechanisms, phenotypic implications and potential interventions.

