Data Set and Evaluation of Automated Construction of Financial Knowledge Graph

2021 ◽  
pp. 1-21
Author(s):  
Wenguang Wang ◽  
Yonglin Xu ◽  
Chunhui Du ◽  
Yunwen Chen ◽  
Yijie Wang ◽  
...  

With the development of entity extraction, relationship extraction, knowledge reasoning, and entity linking, knowledge graph technology has advanced rapidly in recent years. To better promote the development of knowledge graphs, especially for the Chinese language and the financial industry, we built a high-quality data set, named the financial research report knowledge graph (FR2KG), and organized the Automated Construction of Financial Knowledge Graph evaluation task at the 2020 China Conference on Knowledge Graph and Semantic Computing (CCKS2020). FR2KG consists of 17,799 entities, 26,798 relationship triples, and 1,328 attribute triples, covering 10 entity types, 19 relationship types, and 6 attributes. Participants were required to develop a constructor that automatically builds a financial knowledge graph based on FR2KG. In addition, we summarize the technologies for automatically constructing knowledge graphs and introduce the methods used by the winners and the results of this evaluation.
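The abstract distinguishes two kinds of triples in FR2KG: relationship triples, which link two typed entities, and attribute triples, which attach a literal value to an entity. The sketch below illustrates that distinction with a minimal in-memory store; all entity names, relation names, and values are invented for illustration, not taken from FR2KG.

```python
# Minimal sketch of the two triple kinds described for FR2KG:
# relationship triples link two entities, attribute triples attach
# a literal value to an entity. Names and values are illustrative.

from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        self.entities = {}            # name -> entity type
        self.rel_triples = []         # (head, relation, tail)
        self.attr_triples = []        # (entity, attribute, literal)
        self._out = defaultdict(list) # adjacency index for lookups

    def add_entity(self, name, etype):
        self.entities[name] = etype

    def add_relation(self, head, relation, tail):
        # Relationship triples must connect two known entities.
        assert head in self.entities and tail in self.entities
        self.rel_triples.append((head, relation, tail))
        self._out[head].append((relation, tail))

    def add_attribute(self, entity, attribute, value):
        # Attribute triples attach a literal to a known entity.
        assert entity in self.entities
        self.attr_triples.append((entity, attribute, value))

    def neighbors(self, entity):
        return self._out[entity]

kg = KnowledgeGraph()
kg.add_entity("CompanyA", "company")
kg.add_entity("Sector1", "industry")
kg.add_relation("CompanyA", "belongs_to", "Sector1")
kg.add_attribute("CompanyA", "listing_date", "2010-06-01")
print(kg.neighbors("CompanyA"))  # [('belongs_to', 'Sector1')]
```

A real constructor for the evaluation would populate such a store from extracted research-report text rather than by hand.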

Information ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 186 ◽  
Author(s):  
Shuang Liu ◽  
Hui Yang ◽  
Jiayi Li ◽  
Simon Kolmanič

With rising living standards, rapid economic growth, and advances in information science and technology, the Chinese public has paid increasing attention to ancient Chinese history and culture. Information technology has been shown to promote the spread and development of historical culture, and it is becoming a necessary means of promoting traditional culture. This paper builds a knowledge graph of ancient Chinese history and culture so that the public can understand the relevant knowledge more quickly and accurately. The construction process is as follows. First, crawler technology is used to obtain text and table data related to ancient history and culture from Baidu Encyclopedia (similar to Wikipedia) and related pages. The crawler extracts the semi-structured data in the Baidu Encyclopedia information box (InfoBox) to directly construct the triples required for the knowledge graph, and crawls the introductory text of Baidu Encyclopedia entries and specialized historical and cultural websites (history Chunqiu.com, On History.com) to extract unstructured entities and relationships. Second, entity recognition and relationship extraction are performed on the unstructured text. Entity recognition uses the Bidirectional Long Short-Term Memory-Convolutional Neural Network-Conditional Random Field (BiLSTM-CNN-CRF) model, and the relationships between entities are extracted with the open-source tool DeepKE (an information extraction tool with language recognition ability developed by Zhejiang University). After the entities and the relationships between them are obtained, they are supplemented with the triple data constructed from the existing knowledge base and the semi-structured Baidu Encyclopedia InfoBox data. Finally, ontology construction and quality evaluation of the constructed knowledge graph are performed to form the final knowledge graph of ancient Chinese history and culture.
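The pipeline above notes that InfoBox data is semi-structured enough to be mapped directly to triples, without the NER and relation-extraction steps needed for free text. A minimal sketch of that direct mapping, assuming the crawler has already parsed an InfoBox into a dict (the field names and values below are hypothetical):

```python
# Hedged sketch: turning a (hypothetical) already-parsed InfoBox dict
# directly into (subject, predicate, object) triples, as the pipeline
# describes for semi-structured Baidu Encyclopedia data.

def infobox_to_triples(subject, infobox):
    """Map each InfoBox field to one or more triples about the entry's subject."""
    triples = []
    for field, value in infobox.items():
        # Multi-valued fields become one triple per value.
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, field, v))
    return triples

infobox = {
    "dynasty": "Tang",
    "capital": ["Chang'an", "Luoyang"],
}
triples = infobox_to_triples("Tang Dynasty", infobox)
print(triples)
```

In the actual system these triples would then be merged with the entities and relations extracted from unstructured text by BiLSTM-CNN-CRF and DeepKE.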


Author(s):  
Fuhua Shang ◽  
Qiuyu Ding ◽  
Ruishan Du ◽  
Maojun Cao ◽  
Huanyu Chen

The analysis of user behavior yields a large amount of useful information; once extracted, this information is called user knowledge. User knowledge plays a guiding role in implementing user-centric updates for software platforms, and representing and applying it well can accelerate the development of a software platform and improve its quality. This paper aims to further the utilization of user knowledge by mining the knowledge implicit in user behavior and then constructing a knowledge graph of this behavior. First, associations between software bugs and software components are mined from the user knowledge. Then, knowledge entity extraction and relationship extraction are performed on the development code and the user behavior. Finally, the knowledge is stored in a graph database, from which it can be visually retrieved. Experiments on CIFLog, an integrated logging processing software platform, demonstrate the effectiveness of this approach. Constructing a user behavior knowledge graph improves the utilization of user knowledge as well as the quality of software platform development.
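The core retrieval pattern described — storing mined bug-component associations as graph edges and querying them — can be sketched with a plain-Python stand-in for the graph database; the bug IDs and component names below are invented examples, and a production system would use a real graph store as the paper does.

```python
# Sketch (pure-Python stand-in for a graph database): store mined
# bug/component associations as labeled edges, then retrieve every
# component linked to a given bug. All identifiers are illustrative.

from collections import defaultdict

edges = defaultdict(set)  # (node, relation) -> set of neighbor nodes

def add_edge(head, relation, tail):
    edges[(head, relation)].add(tail)

# Associations mined from user behavior (hypothetical examples).
add_edge("BUG-101", "affects", "plot_renderer")
add_edge("BUG-101", "affects", "curve_editor")
add_edge("BUG-102", "affects", "data_importer")

def components_for_bug(bug):
    return sorted(edges[(bug, "affects")])

print(components_for_bug("BUG-101"))  # ['curve_editor', 'plot_renderer']
```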


2021 ◽  
Vol 13 (6) ◽  
pp. 3191
Author(s):  
Jiyuan Tan ◽  
Qianqian Qiu ◽  
Weiwei Guo ◽  
Tingshuai Li

The integration of multi-source transportation data is complex and insufficient in most big cities, making it difficult for researchers to conduct the in-depth data mining needed to improve policy and management. To solve this problem, this paper uses a top-down approach to construct a knowledge graph of an urban traffic system. First, the model layer of the knowledge graph was designed to enable the reuse and sharing of knowledge and was stored in the graph database Neo4j. Second, a representation-learning-based knowledge reasoning model was adopted to perform knowledge completion and improve the knowledge graph. Finally, the proposed method was validated on an urban traffic data set, and the results showed that the model can mine implicit relationships between traffic entities and discover traffic knowledge effectively.
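The abstract does not name the representation-learning model used for knowledge completion, so the sketch below shows one common instance, TransE-style link prediction: a triple (h, r, t) is scored by the distance ||h + r − t||, and completion ranks candidate tails by that score. The entities, relation, and random embeddings are all illustrative.

```python
# Illustrative TransE-style knowledge completion (one common
# representation-learning approach; the paper's exact model is not
# specified in the abstract). Embeddings here are random toy vectors.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
entities = {name: rng.normal(size=dim) for name in
            ["road_A", "road_B", "sensor_1", "district_X"]}
relations = {"located_in": rng.normal(size=dim)}

def transe_score(h, r, t):
    # Lower is better: a valid triple should satisfy h + r ~ t.
    return np.linalg.norm(entities[h] + relations[r] - entities[t])

def complete_tail(h, r):
    # Knowledge completion: rank all candidate tails by TransE score.
    candidates = [t for t in entities if t != h]
    return min(candidates, key=lambda t: transe_score(h, r, t))

best = complete_tail("sensor_1", "located_in")
print(best)
```

With trained (rather than random) embeddings, the top-ranked tail is the model's predicted missing fact, which is how completion improves the stored graph.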


Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 998
Author(s):  
Peng Zhang ◽  
Yi Bu ◽  
Peng Jiang ◽  
Xiaowen Shi ◽  
Bing Lun ◽  
...  

This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first is the Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second is CORD-19, a collection of published scientific articles related to COVID-19. We combined the chemogenomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating the KG with those entities, we performed entity disambiguation on the CORD-19 collection using Wikidata. Our newly built KG contains at least 21,700 genes, 2,500 diseases, 94,000 phenotypes, and other biological entities (e.g., compounds, species, and cell lines). We define 27 relationship types and use them to label each edge in the KG. This research presents two cases to evaluate the KG's usability: analyzing a subgraph (an ego-centered network) around the angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and the IL-6 receptor; chloroquine and STAT1). The ego-centered network captured information related to COVID-19, and we also found significant COVID-19-related information in top-ranked paths with a depth of three based on our path evaluation.
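The path analysis described — enumerating paths of depth up to three between two biological entities — can be sketched with a depth-bounded search over a toy graph. The edges below are invented for illustration and are not taken from the actual AG/CORD-19 graph.

```python
# Sketch of depth-bounded path enumeration between two entities,
# as in the paper's path evaluation (depth = number of edges).
# The toy edges below are illustrative, not from the real KG.

toy_kg = {
    "hydroxychloroquine": ["TLR9", "ACE2"],
    "TLR9": ["IL-6"],
    "ACE2": ["IL-6 receptor"],
    "IL-6": ["IL-6 receptor"],
}

def paths_up_to_depth(graph, start, goal, max_depth=3):
    found, stack = [], [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            found.append(path)
            continue
        # A path with d edges has d + 1 nodes; stop extending at the bound.
        if len(path) > max_depth:
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no revisits)
                stack.append((nxt, path + [nxt]))
    return found

paths = paths_up_to_depth(toy_kg, "hydroxychloroquine", "IL-6 receptor")
print(paths)
```

Ranking such paths (e.g., by edge-type relevance) would then surface the top candidates the study examined.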


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addresses advanced techniques for dealing with missing values in an air quality data set using a multiple imputation (MI) approach. The MCAR, MAR, and NMAR missingness mechanisms are applied to the data set, at five missingness levels: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is missForest, an iterative imputation method based on random forests. Air quality data were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithm transformation was applied to all pollutant data in order to normalize their distributions and minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%). Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR mechanism had the lowest RMSE and MAE. We conclude that MI using the missForest approach estimates missing values with a high level of accuracy: missForest had the lowest imputation error (RMSE and MAE) among the compared imputation methods and can thus be considered appropriate for analyzing air quality data.
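The missForest procedure can be approximated with scikit-learn's `IterativeImputer` using a random-forest estimator: each column with missing values is iteratively regressed on the others until the imputations stabilize. This is a sketch on synthetic data, not the paper's implementation (which uses the missForest package itself) or its Kuwait data set.

```python
# Sketch of missForest-style iterative imputation via scikit-learn's
# IterativeImputer with a RandomForestRegressor (an approximation of
# the missForest procedure; the data below are synthetic).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))    # synthetic "pollutant" columns
X[:, 1] += 0.8 * X[:, 0]         # add correlation so imputation has signal
mask = rng.random(X.shape) < 0.2 # ~20% of values missing at random
X_missing = X.copy()
X_missing[mask] = np.nan

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=20, random_state=0),
    max_iter=5,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)

# Evaluate against the known ground truth, as the paper does with RMSE.
rmse = np.sqrt(np.mean((X_imputed[mask] - X[mask]) ** 2))
print(f"imputation RMSE: {rmse:.3f}")
```

Repeating this over the 5%-40% missingness levels and the three missingness mechanisms reproduces the shape of the paper's comparison.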


Author(s):  
Sebastian Hoppe Nesgaard Jensen ◽  
Mads Emil Brix Doest ◽  
Henrik Aanæs ◽  
Alessio Del Bue

Non-rigid structure from motion (NRSfM) is a long-standing and central problem in computer vision, and its solution is necessary for obtaining 3D information from multiple images when the scene is dynamic. A main obstacle to the further development of this important topic is the lack of high-quality data sets. We address this issue by presenting a publicly available data set created for this purpose, considerably larger than the previous state of the art. To validate the applicability of this data set, and to investigate the state of the art of NRSfM, including potential directions forward, we present a benchmark and a thorough evaluation using this data set. The benchmark evaluates 18 different methods with available code that reasonably span the state of the art in sparse NRSfM. This new public data set and evaluation protocol will provide benchmark tools for further development in this challenging field.


2021 ◽  
Vol 11 (15) ◽  
pp. 7104
Author(s):  
Xu Yang ◽  
Ziyi Huan ◽  
Yisong Zhai ◽  
Ting Lin

Personalized recommendation based on knowledge graphs has become a research hot spot due to its good recommendation performance, and it is the subject of this paper. First, we study knowledge graph construction methods and build a movie knowledge graph, using the Neo4j graph database to store and vividly display the movie data. Then, we study the classical translation model TransE in knowledge graph representation learning and improve it through a cross-training method that uses information from the neighboring feature structures of entities in the knowledge graph; the negative sampling process of the TransE algorithm is also improved. The experimental results show that the improved TransE model vectorizes entities and relations more accurately. Finally, we construct recommendation models by combining knowledge graphs with learning-to-rank and neural networks: a Bayesian personalized ranking model based on knowledge graphs (KG-BPR) and a neural network recommendation model based on knowledge graphs (KG-NN). The semantic information of entities and relations in the knowledge graph is embedded into vector space using the improved TransE method, and the resulting item entity vectors, which carry external knowledge, are integrated into the BPR model and the neural network, respectively, making up for the lack of knowledge about the items themselves. Experimental analysis on the MovieLens-1M data set shows that the two proposed recommendation models effectively improve the accuracy, recall, F1, and MAP of recommendation.
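The abstract improves TransE's negative sampling but does not detail the scheme, so the sketch below shows only the standard baseline it starts from: corrupting either the head or the tail of a true triple with a uniformly sampled entity, rejecting corruptions that happen to be true. The movie entities and triples are illustrative.

```python
# Baseline negative sampling for TransE training (the standard scheme
# the paper improves on; its specific improvement is not detailed in
# the abstract). Entities and triples are illustrative.
import random

entities = ["Inception", "Interstellar", "Nolan", "DiCaprio"]
true_triples = {("Inception", "directed_by", "Nolan"),
                ("Inception", "stars", "DiCaprio")}

def corrupt(triple, rng=random):
    """Return a negative triple not present in the true triple set."""
    h, r, t = triple
    while True:
        e = rng.choice(entities)
        # Corrupt the head or the tail with equal probability.
        neg = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if neg not in true_triples:
            return neg

neg = corrupt(("Inception", "directed_by", "Nolan"))
print(neg)
```

During training, each positive triple is paired with such a negative so the margin loss can push valid triples to score better than corrupted ones.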


2018 ◽  
Vol 141 (3) ◽  
Author(s):  
Artur Joao Carvalho Figueiredo ◽  
Robin Jones ◽  
Oliver J. Pountney ◽  
James A. Scobie ◽  
Gary D. Lock ◽  
...  

This paper presents volumetric velocimetry (VV) measurements for a jet in crossflow that is representative of film cooling. VV employs particle tracking to nonintrusively extract all three components of velocity in a three-dimensional volume. This is its first use in a film-cooling context. The primary research objective was to develop this novel measurement technique for turbomachinery applications, while collecting a high-quality data set that can improve the understanding of the flow structure of the cooling jet. A new facility was designed and manufactured for this study with emphasis on optical access and controlled boundary conditions. For a range of momentum flux ratios from 0.65 to 6.5, the measurements clearly show the penetration of the cooling jet into the freestream, the formation of kidney-shaped vortices, and entrainment of main flow into the jet. The results are compared to published studies using different experimental techniques, with good agreement. Further quantitative analysis of the location of the kidney vortices demonstrates their lift off from the wall and increasing lateral separation with increasing momentum flux ratio. The lateral divergence correlates very well with the self-induced velocity created by the wall–vortex interaction. Circulation measurements quantify the initial roll up and decay of the kidney vortices and show that the point of maximum circulation moves downstream with increasing momentum flux ratio. The potential for nonintrusive VV measurements in turbomachinery flow has been clearly demonstrated.
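The momentum flux ratio that parameterizes the study is conventionally defined as I = (ρ_c V_c²)/(ρ_∞ V_∞²), the ratio of coolant-jet to freestream momentum flux. A trivial calculation with illustrative values (not taken from the experiment):

```python
# Momentum flux ratio I = (rho_c * V_c**2) / (rho_inf * V_inf**2),
# the parameter varied from 0.65 to 6.5 in the study. The densities
# and velocities below are illustrative, not experimental values.

def momentum_flux_ratio(rho_c, v_c, rho_inf, v_inf):
    return (rho_c * v_c ** 2) / (rho_inf * v_inf ** 2)

# A coolant jet at twice the freestream velocity with matched density:
I = momentum_flux_ratio(rho_c=1.2, v_c=20.0, rho_inf=1.2, v_inf=10.0)
print(I)  # 4.0
```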

