Preliminary Study on the Knowledge Graph Construction of Chinese Ancient History and Culture

Information ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 186 ◽  
Author(s):  
Shuang Liu ◽  
Hui Yang ◽  
Jiayi Li ◽  
Simon Kolmanič

With the continuous improvement of living standards, rapid economic growth, and rapid advances in information science and technology, the domestic population has paid increasing attention to ancient Chinese history and culture. Information technology has been shown to promote the spread and development of historical culture, and it is becoming a necessary means of promoting traditional culture. This paper builds a knowledge graph of ancient Chinese history and culture to help the public understand the relevant knowledge more quickly and accurately. The construction process is as follows. First, crawler technology is used to obtain text and table data related to ancient history and culture from Baidu Encyclopedia (similar to Wikipedia) and from pages dedicated to ancient Chinese history and culture. The crawler collects the semi-structured data in the Baidu Encyclopedia information box (InfoBox), from which the triples required for the knowledge graph are constructed directly. It also collects the introductory text of Baidu Encyclopedia entries and of specialized historical and cultural websites (history Chunqiu.com, On History.com), from which entities and relationships are extracted as unstructured text. Second, entity recognition and relationship extraction are performed on the unstructured text. Entity recognition uses the Bidirectional Long Short-Term Memory-Convolutional Neural Networks-Conditional Random Field (BiLSTM-CNN-CRF) model. Relationship extraction uses the open source tool DeepKE (an information extraction tool with language recognition ability developed by Zhejiang University). The extracted entities and relationships are then supplemented with the triples constructed from the semi-structured data in the existing knowledge base and the Baidu Encyclopedia information box. Finally, ontology construction and quality evaluation of the entire knowledge graph are performed to form the final knowledge graph of ancient Chinese history and culture.
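The step that turns semi-structured infobox fields directly into triples can be illustrated with a minimal sketch. It assumes the infobox has already been crawled and parsed into a plain dictionary; the function name `infobox_to_triples` and the sample fields are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infobox_to_triples(entity: str, infobox: Dict[str, str]) -> List[Triple]:
    """Turn each non-empty infobox field into one triple about `entity`."""
    triples: List[Triple] = []
    for attribute, value in infobox.items():
        value = value.strip()
        if value:  # skip fields that were present but empty on the page
            triples.append((entity, attribute.strip(), value))
    return triples


# Hypothetical, already-parsed infobox content (illustration only).
example_infobox = {"dynasty": "Tang", "born": "701", "occupation": " poet ", "alias": ""}
triples = infobox_to_triples("Li Bai", example_infobox)
for triple in triples:
    print(triple)
```

Because the entry title supplies the subject and each field supplies a predicate and an object, this path needs no entity recognition or relation extraction model, which is why the abstract treats it separately from the unstructured text.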

2021 ◽  
pp. 1-21
Author(s):  
Wenguang Wang ◽  
Yonglin Xu ◽  
Chunhui Du ◽  
Yunwen Chen ◽  
Yijie Wang ◽  
...  

With the development of entity extraction, relationship extraction, knowledge reasoning, and entity linking, knowledge graph technology has been in full swing in recent years. To better promote the development of knowledge graphs, especially in the Chinese language and in the financial industry, we built a high-quality data set, named financial research report knowledge graph (FR2KG), and organized the evaluation on automated construction of financial knowledge graphs at the 2020 China Knowledge Graph and Semantic Computing Conference (CCKS2020). FR2KG consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples covering 10 entity types, 19 relationship types, and 6 attributes. Participants are required to develop a constructor that automatically builds a financial knowledge graph based on FR2KG. In addition, we summarize the technologies for automatically constructing knowledge graphs and introduce the methods used by the winners and the results of this evaluation.


Author(s):  
Farzana Rashid ◽  
Fahmida Hamid

Named Entity Recognition (NER) belongs to the fields of Information Extraction (IE) and Natural Language Processing (NLP). NER aims to find named entities in textual data and categorize them into recognizable classes. Named entities play vital roles in related fields such as question answering, relationship extraction, and machine translation. Researchers have done a significant amount of work (e.g., dataset construction and analysis) in this direction for several languages, including English, Spanish, Chinese, Russian, and Arabic. We do not find a comparable amount of work for several South Asian languages such as Bengali/Bangla. Hence, as part of the initial phase, we have constructed a qualitative dataset in Bengali. In this paper, we identify Named Entities (NEs) in Bengali text (sentences), classify them into standardized categories, and test whether automatic detection of NEs is possible. We present a new corpus and experimental results. Our dataset, annotated by multiple humans, shows promising results (F-measures ranging from 0.72 to 0.84) in different setups: support vector machine (SVM) setups with simple language features and a Long Short-Term Memory (LSTM) setup with various word embeddings.
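The F-measure reported for NER systems is the harmonic mean of precision and recall. The sketch below computes it at the entity level under the assumption that an entity counts as correct only when its span and its class both match the gold annotation; the abstract does not state which matching rule the authors used, and the function name `prf1` and the sample annotations are hypothetical.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start index, end index, class label)


def prf1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: the second prediction has the right span but the wrong class.
gold = {(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "ORG")}
predicted = {(0, 2, "PER"), (5, 6, "ORG")}
precision, recall, f1 = prf1(gold, predicted)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```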


2021 ◽  
Author(s):  
Qingwen Tian ◽  
Shixing Zhou ◽  
Yu Cheng ◽  
Jianxia Chen ◽  
Yi Gao ◽  
...  

A knowledge graph is a semantic network that reveals the relationships between entities; it is constructed to describe entities, concepts, and their relationships in the real world. Because a knowledge graph can effectively reveal the relationships between different knowledge items, it has been widely used in intelligent education. Relation extraction is a critical part of knowledge graph construction. According to the amount of labeled data they require, deep learning approaches to entity relationship extraction can be divided into two categories: supervised and distantly supervised. Supervised approaches can extract entity relationships effectively, but they rely heavily on labeled data, which is time-consuming and labor-intensive to produce. Distant supervision has attracted wide attention from researchers because it can generate training data for entity relation extraction automatically. However, its development and application have been seriously hindered by noise, lack of information, and imbalance in relation extraction tasks. Motivated by this analysis, the paper proposes a novel curriculum points relationship extraction model based on distant supervision. The work has three parts: first, a distantly supervised relationship extraction model based on a sentence bag attention mechanism, used to extract the relationships of curriculum points; second, knowledge graph construction based on the knowledge ontology; third, a Web-based curriculum semantic retrieval platform. Compared with existing advanced models, the AUC of this system is increased by 14.2%; taking the "big data processing" course in the computer field as an example, relationship extraction achieves an F1 value of 88.1%. The experimental results show that the proposed model provides an effective solution for the development and application of knowledge graphs in the field of intelligent education.
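Sentence bag attention can be sketched in a few lines. In distant supervision, all sentences that mention the same entity pair form a bag, and attention weights let sentences that actually express the relation dominate the bag representation while noisy ones are down-weighted. The sketch below assumes sentence vectors have already been produced by some encoder and uses a plain dot product against a relation query vector; the function name, the vectors, and the scoring function are illustrative assumptions, not the paper's model.

```python
import math
from typing import List, Tuple


def bag_attention(sentence_vecs: List[List[float]],
                  relation_query: List[float]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, weighted bag vector) for one sentence bag."""
    # Score each sentence by its dot product with the relation query.
    scores = [sum(s * q for s, q in zip(vec, relation_query)) for vec in sentence_vecs]
    # Softmax over the bag, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The bag representation is the attention-weighted sum of sentence vectors.
    dim = len(relation_query)
    bag = [sum(w * vec[i] for w, vec in zip(weights, sentence_vecs)) for i in range(dim)]
    return weights, bag


# Hypothetical 2-dimensional sentence encodings for one entity pair.
sentences = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [2.0, 0.0]
weights, bag = bag_attention(sentences, query)
print([round(w, 3) for w in weights])
```

The second sentence is orthogonal to the query, so it receives the smallest weight and contributes least to the bag vector.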


Author(s):  
Shuang Liu ◽  
Hui Yang ◽  
Jiayi Li ◽  
Simon Kolmanič

With the rapid development of the Internet, the way people obtain information has changed tremendously. In recent years, the knowledge graph has become a popular tool for the public to acquire knowledge. For knowledge graphs of Chinese history and culture, most researchers have adopted traditional named entity recognition methods to extract entity information from unstructured historical text. However, traditional named entity recognition methods have certain defects and easily ignore the associations between entities. To extract entities from a large amount of historical and cultural information more accurately and efficiently, this paper proposes a named entity recognition model that combines Bidirectional Encoder Representations from Transformers with Bidirectional Long Short-Term Memory and a Conditional Random Field (BERT-BiLSTM-CRF). First, a BERT pre-trained language model encodes each character to obtain its vector representation. Then a Bidirectional Long Short-Term Memory (BiLSTM) layer semantically encodes the input text. Finally, the Conditional Random Field (CRF) layer outputs the label with the highest probability to obtain each character's category. The model uses the BERT pre-trained language model in place of static word vectors trained in the traditional way. In comparison, BERT dynamically generates semantic vectors according to the context of words, which improves the representation ability of the word vectors. The experimental results show that the proposed model achieves excellent results on named entity recognition in the field of historical culture. Compared with existing named entity recognition methods, the precision, recall, and F1 value are significantly improved.
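In a linear-chain CRF, the final labeling is usually found with Viterbi decoding, which returns the highest-scoring label sequence given per-character emission scores and label-to-label transition scores. The sketch below is a generic pure-Python version of that standard algorithm, not the authors' implementation; the label set and all scores are made-up values chosen to show a transition penalty overriding a locally better label.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: List[str]) -> List[str]:
    """Return the label sequence with the highest total emission + transition score."""
    if not emissions:
        return []
    best = [{label: emissions[0][label] for label in labels}]
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in labels:
            # Best previous label for arriving at `cur` at position t.
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda label: best[-1][label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores: "O" looks slightly better for the first character,
# but an "I" tag directly after "O" is heavily penalized.
labels = ["O", "B", "I"]
transitions = {(p, c): 0.0 for p in labels for c in labels}
transitions[("O", "I")] = -10.0
emissions = [{"O": 1.0, "B": 0.9, "I": 0.0}, {"O": 0.0, "B": 0.0, "I": 1.0}]
print(viterbi(emissions, transitions, labels))  # -> ['B', 'I']
```

Choosing each character's label independently would give `O, I`, which is an invalid tag sequence; scoring whole sequences is what lets the CRF layer avoid it.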


2018 ◽  
Vol 10 (9) ◽  
pp. 3292 ◽  
Author(s):  
Hangzhou Yang ◽  
Huiying Gao

Increasingly popular virtualized healthcare services such as online health consultations have significantly changed the way in which health information is sought, and can alleviate geographic barriers, time constraints, and medical resource shortage problems. These online patient–doctor communications have been generating abundant amounts of healthcare-related data. Medical entity extraction from these data is the foundation of medical knowledge discovery, including disease surveillance and adverse drug reaction detection, which can potentially enhance the sustainability of healthcare. Previous studies that focus on health-related entity extraction have certain limitations such as demanding tough handcrafted feature engineering, failing to extract out-of-vocabulary entities, and being unsuitable for the Chinese social media context. Motivated by these observations, this study proposes a novel model named CNMER (Chinese Medical Entity Recognition) using deep neural networks for medical entity recognition in Chinese online health consultations. The designed model utilizes Bidirectional Long Short-Term Memory and Conditional Random Fields as the basic architecture, and uses character embedding and context word embedding to automatically learn effective features to recognize and classify medical-related entities. Exploiting the consultation text collected from a prevalent online health community in China, the evaluation results indicate that the proposed method significantly outperforms the related state-of-the-art models that focus on the Chinese medical entity recognition task. We expect that our model can contribute to the sustainable development of the virtualized healthcare industry.
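A sequence tagger of this kind emits one tag per character, and the tags still have to be grouped into entity mentions. The sketch below assumes the common BIO scheme (`B-X` opens an entity of type X, `I-X` continues it, `O` is outside); the abstract does not state which tagging scheme CNMER uses, and the entity types `SYM` and `DRUG` and the sample sentence are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[str, str, int, int]  # (text, entity type, start, end-exclusive)


def bio_to_spans(chars: List[str], tags: List[str]) -> List[Span]:
    """Group per-character BIO tags into entity spans."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.append(("".join(chars[start:i]), label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # An "I-" tag with no matching open entity is ignored in this sketch.
    return spans


chars = list("头痛吃布洛芬")
tags = ["B-SYM", "I-SYM", "O", "B-DRUG", "I-DRUG", "I-DRUG"]
print(bio_to_spans(chars, tags))
```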


Author(s):  
Fuhua Shang ◽  
Qiuyu Ding ◽  
Ruishan Du ◽  
Maojun Cao ◽  
Huanyu Chen

The analysis of user behavior provides a large amount of useful information. After being extracted, this information is called user knowledge. User knowledge plays a guiding role in implementing user-centric updates for software platforms. A good representation and application of user knowledge can accelerate the development of a software platform and improve its quality. This paper aims to further the utilization of user knowledge by mining the user knowledge that is implicit in user behavior and then constructing a knowledge graph of this behavior. First, the association between a software bug and a software component is mined from the user knowledge. Then, the knowledge entity extraction and relationship extraction are performed from the development code and the user behavior. Finally, the knowledge is stored in the graph database, from which it can be visually retrieved. Relevant experiments on CIFLog, an integrated logging processing software platform, have proved the effectiveness of this research. Constructing a user behavior knowledge graph can improve the utilization of user knowledge as well as the quality of software platform development.


Author(s):  
Xinghui Zhu ◽  
Zhuoyang Zou ◽  
Bo Qiao ◽  
Kui Fang ◽  
Yiming Chen

Knowledge graphs have gradually become one of the core drivers advancing the Internet and AI in recent years, yet there is currently no standard knowledge graph in the field of agriculture. Named Entity Recognition (NER), an important step in constructing knowledge graphs, has become a hot topic in both academia and industry. Building on the Bidirectional Long Short-Term Memory network (Bi-LSTM) and Conditional Random Field (CRF) model, we introduce an ensemble learning method and implement a named entity recognition model, ELER. Our model achieves good results on the CoNLL2003 data set: in the best experimental results, accuracy and F1 value improve by 1.37% and 0.7%, respectively, compared with the BiLSTM-CRF model. In addition, our model achieves an F1 score of 91% on the agricultural data set AgriNER2018, which demonstrates the validity of the ELER model for small agricultural data sets and lays a foundation for the construction of agricultural knowledge graphs.
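The abstract does not say which ensemble scheme ELER uses. As one simple illustration of ensembling several taggers, the sketch below takes a per-token majority vote over the tag sequences predicted by different models, breaking ties in favor of the earliest model in the list. The function name and the sample predictions are hypothetical.

```python
from collections import Counter
from typing import List


def majority_vote(predictions: List[List[str]]) -> List[str]:
    """Combine equal-length tag sequences from several models by per-token voting."""
    if len({len(sequence) for sequence in predictions}) != 1:
        raise ValueError("all models must tag the same number of tokens")
    voted: List[str] = []
    for tags in zip(*predictions):
        counts = Counter(tags)
        top = max(counts.values())
        # Ties go to the earliest model whose tag reached the top count.
        voted.append(next(tag for tag in tags if counts[tag] == top))
    return voted


# Hypothetical outputs of three taggers for a three-token sentence.
model_outputs = [
    ["B-PER", "O", "B-LOC"],
    ["B-PER", "O", "O"],
    ["O", "O", "B-LOC"],
]
print(majority_vote(model_outputs))  # -> ['B-PER', 'O', 'B-LOC']
```

Token-level voting can produce sequences that violate the tagging scheme (for example an `I-` tag without a preceding `B-`), so a real system would typically validate or re-decode the voted sequence.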


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yue Wu ◽  
Jie Huang ◽  
Caie Xu ◽  
Huilin Zheng ◽  
Lei Zhang ◽  
...  

Clinical named entity recognition (CNER) identifies entities in unstructured medical records and classifies them into predefined categories. It is of great significance for follow-up clinical studies. Most existing CNER methods fail to give enough thought to Chinese radical-level characteristics and the specialty of the Chinese field. This paper proposes the Ra-RC model, which combines radical features and a deep learning structure to address this problem. A bidirectional encoder representation of transformer (RoBERTa) is used to learn medical features thoroughly. Simultaneously, we use a bidirectional long short-term memory (BiLSTM) network to extract radical-level information, capture the internal relevance of characteristics, and stitch the result to the eigenvectors generated by RoBERTa. In addition, the relationship between labels is considered by applying a conditional random field (CRF) to obtain the optimal tag sequence. The experimental results demonstrate that the proposed Ra-RC model achieves F1 scores of 93.26% and 82.87% on the CCKS2017 and CCKS2019 datasets, respectively.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Chengyao Lv ◽  
Deng Pan ◽  
Yaxiong Li ◽  
Jianxin Li ◽  
Zong Wang

Entity relationship extraction identifies relationships among entities in natural language text. It provides fundamental support for knowledge graphs, intelligent information retrieval, and semantic analysis, promotes the construction of knowledge bases, and improves the efficiency of search and semantic analysis. Traditional methods of relationship extraction, whether early approaches or those based on traditional machine learning and deep learning, keep relationships and entities in their own silos: entities and relationships are extracted in separate steps before the mappings between them are obtained. To address this problem, this paper proposes a novel Chinese relationship extraction method. First, each triple is treated as an entity relation chain: the model identifies the entity before the relationship, then predicts the corresponding relationship and the entity after the relationship. Second, the Joint Extraction of Entity Mentions and Relations model is built on Bidirectional Long Short-Term Memory and a Maximum Entropy Markov Model (Bi-MEMM). Experimental results indicate that the proposed model achieves a precision of 79.2%, which is much higher than that of traditional models.


2021 ◽  
Vol 2101 (1) ◽  
pp. 012041
Author(s):  
Xiangpeng Chen ◽  
Juntai Xie ◽  
Jianmin Gao ◽  
Rongxi Wang ◽  
Jiandong Jiang

With the deep exploitation of oil and gas resources, non-API oil country tubular goods (OCTG) adapted to specific environments are widely used. How to effectively characterize the quality connotation of non-API OCTG to ensure their quality has therefore become a challenge for the petroleum industry. We propose a dynamic knowledge graph of Quality Infrastructure (QI) to address the diversity of factors influencing non-API OCTG quality, the concealment of the relationships among them, and the ambiguity of the mechanism of quality improvement. First, a knowledge graph ontology framework of quality infrastructure is constructed, which effectively combines product characteristics and basic quality elements. Second, based on a professional dictionary in the field of OCTG, entity recognition uses the LDA-BiLSTM-CRF method, which effectively improves the recognition accuracy of professional vocabulary. Finally, the relationships between entity types are defined as the edges of the knowledge graph, and a graph embedding method is used to supplement the edge connections and calculate the weights of the knowledge graph. The QI knowledge graph constructed with this technology describes the quality connotation of non-API OCTG well and provides opinions and methods for guaranteeing and improving the quality of non-API OCTG.

