Knowledge Graph Construction Framework in the Securities Domain Based on FinBERT-CRF Named Entity Recognition Model

2021 ◽  
Vol 11 (03) ◽  
pp. 135-149
Author(s):  
Qiuyu Ren
Author(s):  
Xinghui Zhu ◽  
Zhuoyang Zou ◽  
Bo Qiao ◽  
Kui Fang ◽  
Yiming Chen

Knowledge graphs have gradually become one of the core drivers advancing the Internet and AI in recent years, yet there is currently no standard knowledge graph for the field of agriculture. Named Entity Recognition (NER), an important step in constructing knowledge graphs, has become a hot topic in both academia and industry. Building on the Bidirectional Long Short-Term Memory network (Bi-LSTM) and Conditional Random Field (CRF) model, we introduce an ensemble learning method and implement a named entity recognition model, ELER. Our model achieves good results on the CoNLL2003 data set: in the best experimental results, accuracy and F1 value improve by 1.37% and 0.7%, respectively, compared with the BiLSTM-CRF model. In addition, our model achieves an F1 score of 91% on the agricultural data set AgriNER2018, which demonstrates the effectiveness of ELER on small agricultural data sets and lays a foundation for constructing agricultural knowledge graphs.
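
The abstract does not say how ELER combines its ensemble members. As a minimal sketch, assuming several BiLSTM-CRF models each output a BIO tag sequence and the sequences are merged by per-token majority voting (a common choice, not confirmed by the source), the combination step could look like this. The helpers `majority_vote` and `repair_bio` are hypothetical names.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine BIO tag sequences from several models by per-token majority vote.

    Assumption: ties are broken in favour of the earliest model in the list.
    """
    if not predictions:
        raise ValueError("need at least one model prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same tokens")
    combined = []
    for i in range(length):
        votes = Counter(p[i] for p in predictions)
        top = max(votes.values())
        # Walk the models in order so that ties go to the earliest model.
        combined.append(next(p[i] for p in predictions if votes[p[i]] == top))
    return combined


def repair_bio(tags: Sequence[str]) -> List[str]:
    """Turn an I-X tag that does not continue an X entity into B-X."""
    fixed, prev = [], "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            tag = "B-" + tag[2:]
        fixed.append(tag)
        prev = tag
    return fixed


models = [
    ["B-PER", "I-PER", "O", "B-LOC"],
    ["B-PER", "O", "O", "B-LOC"],
    ["O", "I-PER", "O", "B-ORG"],
]
print(repair_bio(majority_vote(models)))  # ['B-PER', 'I-PER', 'O', 'B-LOC']
```

Per-token voting can leave an I- tag with no preceding B- tag of the same type, which is why the sketch repairs the merged sequence afterwards.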


2020 ◽  
Author(s):  
Yong Fang ◽  
Yuchi Zhang ◽  
Cheng Huang

Abstract With the rapid development of Internet technology in daily life, cybersecurity has become a focus of public attention for individuals and countries alike. Cybersecurity knowledge analysis methods have advanced considerably with the help of knowledge graph technology; in particular, a great deal of threat intelligence can now be extracted at fine granularity. Named entity recognition (NER) is the primary task in constructing a security knowledge graph. However, traditional NER models struggle to recognize entities with complex structures in the cybersecurity domain, and they have difficulty capturing non-local and non-sequential dependencies. In this paper, we propose CyberEyes, a cybersecurity entity recognition model that uses non-local dependencies extracted by graph convolutional neural networks. The model captures both local context and graph-level non-local dependencies. In the evaluation experiments, our model reached an F1 score of 90.28% on the cybersecurity corpus under the gold evaluation standard for NER, outperforming the 86.49% obtained by the classic CNN-BiLSTM-CRF model.
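
The abstract does not give CyberEyes' exact layer definition. To illustrate how a graph convolution lets tokens that are far apart in the sentence exchange information, here is a minimal plain-Python sketch of the standard GCN layer of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). Which graph is used (for example, dependency-parse edges), the toy features, and all function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(edges: Sequence[Tuple[int, int]], n: int) -> Matrix:
    """D^-1/2 (A + I) D^-1/2 for an undirected graph over n tokens."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, x) for x in row] for row in z]


# Three tokens; a hypothetical edge links token 0 and token 2 directly,
# even though they are not adjacent in the sentence.
a_hat = normalized_adjacency([(0, 2)], 3)
h = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # token features
w = [[1.0, 0.0], [0.0, 1.0]]              # identity weights for illustration
print(gcn_layer(a_hat, h, w))  # [[0.5, 0.0], [0.0, 1.0], [0.5, 0.0]]
```

After one layer, token 2 carries part of token 0's feature through the graph edge, the kind of non-local dependency a purely sequential encoder reaches only indirectly.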


2021 ◽  
Vol 16 ◽  
pp. 1-10
Author(s):  
Husni Teja Sukmana ◽  
JM Muslimin ◽  
Asep Fajar Firmansyah ◽  
Lee Kyung Oh

In Indonesia, philanthropy is closely identified with Zakat. Zakat constitutes a specific domain because it has its own distinctive body of knowledge. This research studies a knowledge graph for the Zakat domain in Indonesia, called KGZ. Because this area has rarely been studied, KGZ is the first knowledge graph for Zakat in Indonesia. It is designed to provide basic knowledge about Zakat and about managing Zakat in Indonesia. Building KGZ raises several issues. First, existing Indonesian named entity recognition (NER) is unrestricted and general-purpose, with data obtained from general sources such as news. Second, there is no NER dataset for the Zakat domain. We define four steps to build KGZ: data acquisition, extraction of entities and their relationships, mapping to an ontology, and deployment of the knowledge graph and visualizations. This research contributes a knowledge graph for Zakat (KGZ) and an NER model for Zakat, called KGZ-NER. We defined 17 new named entity classes related to Zakat, with 272 entities and 169 relationships, and provided publicly accessible labelled datasets for KGZ-NER. We applied the Indonesian-Open Domain Information Extractor framework to identify relationships between entities. We then modeled the information using the Resource Description Framework (RDF) to build the KGZ knowledge base and stored it in GraphDB, a product from Ontotext. The NER model achieves a precision of 0.7641, a recall of 0.4544, and an F1-score of 0.5655. KGZ needs to grow in data size to capture the full knowledge of Zakat and Zakat management in Indonesia, and sufficient resources will be required in future work.
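
The abstract states that KGZ is modeled in RDF and stored in GraphDB. A minimal sketch of that modeling step, assuming entities and relations have already been extracted, serializes them as N-Triples, a line-based RDF format that triple stores such as GraphDB can import. The namespace, class names, predicate names, and example entities are hypothetical; KGZ's actual ontology is not given in the abstract.

```python
from typing import Iterable, Tuple

BASE = "http://example.org/kgz/"  # hypothetical namespace, not the real KGZ one
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(name: str) -> str:
    """Wrap a local name in the hypothetical base IRI."""
    return f"<{BASE}{name}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(entities: Iterable[Tuple[str, str, str]],
                relations: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (name, class, label) entities and (subj, pred, obj) relations."""
    lines = []
    for name, cls, label in entities:
        lines.append(f"{iri(name)} <{RDF_TYPE}> {iri(cls)} .")
        lines.append(f"{iri(name)} <{RDFS_LABEL}> {literal(label)} .")
    for subj, pred, obj in relations:
        lines.append(f"{iri(subj)} {iri(pred)} {iri(obj)} .")
    return "\n".join(lines) + "\n"


entities = [("ZakatFitrah", "ZakatType", "Zakat Fitrah"),
            ("ExampleAmil", "ZakatAgency", "Example amil agency")]
relations = [("ExampleAmil", "manages", "ZakatFitrah")]
print(to_ntriples(entities, relations), end="")
```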


2019 ◽  
Vol 9 (1) ◽  
pp. 15 ◽  
Author(s):  
Runyu Fan ◽  
Lizhe Wang ◽  
Jining Yan ◽  
Weijing Song ◽  
Yingqian Zhu ◽  
...  

Constructing a knowledge graph of geological hazards literature can facilitate the reuse of that literature and provide a reference for geological hazard governance. Named entity recognition (NER), a core technology for constructing a geological hazard knowledge graph, faces the challenge that named entities in geological hazard literature are diverse in form, ambiguous in semantics, and uncertain in context. This makes it difficult to design practical features for NER classification. To address this problem, this paper proposes a deep learning-based NER model, the deep, multi-branch BiGRU-CRF model, which combines a multi-branch bidirectional gated recurrent unit (BiGRU) layer and a conditional random field (CRF) model. In an end-to-end, supervised process, the proposed model automatically learns and transforms features through the multi-branch BiGRU layer and enhances the output with the CRF layer. In addition to the deep, multi-branch BiGRU-CRF model, we propose a pattern-based corpus construction method to build the corpus the model requires. Experimental results indicate that the deep, multi-branch BiGRU-CRF model outperforms state-of-the-art models. Using the proposed model, we constructed a large-scale geological hazard literature knowledge graph containing 34,457 entity nodes and 84,561 relations.
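
The abstract does not detail the pattern-based corpus construction method. One plausible minimal form, assuming hand-written regular-expression patterns per entity type are matched against raw text and converted into BIO training labels, is sketched below. The patterns, entity types, and example sentence are hypothetical and in English for readability.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical patterns: one regular expression per entity type.
PATTERNS: Dict[str, str] = {
    "HAZARD": r"\b(?:debris flow|landslide|rockfall|ground subsidence)\b",
    "LOCATION": r"\b[A-Z][a-z]+ (?:County|Province)\b",
}


def annotate(text: str) -> List[Tuple[str, str]]:
    """Label each token of `text` with a BIO tag derived from pattern matches."""
    spans = [(m.start(), m.end(), label)
             for label, pattern in PATTERNS.items()
             for m in re.finditer(pattern, text)]
    # Prefer the earliest match, then the longest; drop overlapping matches.
    spans.sort(key=lambda s: (s[0], s[0] - s[1]))
    kept: List[Tuple[int, int, str]] = []
    for span in spans:
        if not kept or span[0] >= kept[-1][1]:
            kept.append(span)
    tagged = []
    for tok in re.finditer(r"\w+|[^\w\s]", text):  # words and punctuation marks
        tag = "O"
        for start, end, label in kept:
            if start <= tok.start() and tok.end() <= end:
                tag = ("B-" if tok.start() == start else "I-") + label
                break
        tagged.append((tok.group(), tag))
    return tagged


print(annotate("A debris flow blocked the road in Example County."))
```

Patterns like these trade recall for precision, so a corpus built this way is typically a starting point for training a learned model rather than a complete annotation.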


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Lejun Gong ◽  
Zhifei Zhang ◽  
Shiqi Chen

Background. Clinical named entity recognition is the basic task of mining electronic medical record text. It faces several challenges arising from the language features of Chinese electronic medical records: many compound entities, seriously incomplete sentence components, and unclear entity boundaries. Moreover, corpora of Chinese electronic medical records are difficult to obtain. Methods. To address these characteristics of Chinese electronic medical records, this study proposes a Chinese clinical entity recognition model based on deep learning pretraining. The model uses word embeddings from a domain corpus and fine-tunes an entity recognition model pretrained on a related corpus. BiLSTM and Transformer are then each used as feature extractors to identify four types of clinical entities (diseases, symptoms, drugs, and operations) in the text of Chinese electronic medical records. Results. On the test dataset, the model achieved 75.06% Macro-P, 76.40% Macro-R, and 75.72% Macro-F1. Conclusions. These experiments show that the proposed Chinese clinical entity recognition model based on deep learning pretraining can effectively improve recognition performance.
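
Macro-averaged scores compute precision, recall, and F1 separately for each entity type (here diseases, symptoms, drugs, and operations) and then average them, so every type counts equally regardless of how frequent it is. Two conventions exist for Macro-F1: the mean of the per-type F1 values, or the harmonic mean of Macro-P and Macro-R. As an arithmetic check, 2 × 75.06 × 76.40 / (75.06 + 76.40) ≈ 75.72, so the reported figures are consistent with the second convention. The sketch below computes both; the function names and the per-type counts in the example are hypothetical.

```python
from typing import Dict, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_scores(counts: Dict[str, Tuple[int, int, int]]) -> Dict[str, float]:
    """Macro-average per-type scores; counts maps entity type -> (tp, fp, fn)."""
    per_type = [prf(*c) for c in counts.values()]
    n = len(per_type)
    macro_p = sum(s[0] for s in per_type) / n
    macro_r = sum(s[1] for s in per_type) / n
    return {
        "macro_p": macro_p,
        "macro_r": macro_r,
        # Convention 1: average of the per-type F1 values.
        "macro_f1_mean": sum(s[2] for s in per_type) / n,
        # Convention 2: harmonic mean of macro-P and macro-R.
        "macro_f1_hmean": (2 * macro_p * macro_r / (macro_p + macro_r)
                           if macro_p + macro_r else 0.0),
    }


# Hypothetical entity-level counts (tp, fp, fn) for two of the four types.
print(macro_scores({"disease": (8, 2, 2), "symptom": (6, 4, 0)}))
```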


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Wangping Xiong ◽  
Jun Cao ◽  
Xian Zhou ◽  
Jianqiang Du ◽  
Bin Nie ◽  
...  

Background. Chinese patent medicines are increasingly used in clinical practice, and prescription drug monitoring programs are an effective tool for promoting drug safety and maintaining health. Methods. We constructed a prescription drug monitoring program for Chinese patent medicines based on knowledge graphs. First, we used information extraction to obtain key information about Chinese patent medicines, diseases, and symptoms from a domain-specific corpus. Second, we constructed a knowledge graph from the extracted entities and relationships to form a rule base for monitoring. Then, the named entity recognition model extracted key information from the electronic medical records to be monitored and matched it against the knowledge graph to monitor the Chinese patent medicines in each prescription. Results. Named entity recognition based on the pretrained model achieved an F1 value of 83.3% on the Chinese patent medicines dataset. Building on entity recognition technology and the knowledge graph, we implemented a prescription drug monitoring program for Chinese patent medicines. The program's accuracy in monitoring combined medication of three or more drugs increased from 68% to 86.4%, and its accuracy in drug control monitoring increased from 70% to 97%. The response time for conflicting prescriptions with two drugs was shortened from 1.3 s to 0.8 s, and for conflicting prescriptions with three or more drugs from 5.2 s to 1.4 s. Conclusions. The program constructed in this study responds quickly and improves the efficiency of prescription monitoring, which is of great significance for ensuring the safety of patients' medication.
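
The matching step, in which drug entities extracted from a medical record are checked against a rule base derived from the knowledge graph, can be sketched as a subset test: a rule fires when every drug it names appears in the prescription, which treats two-drug and three-or-more-drug combinations uniformly. The rule base, drug names, and controlled-drug list below are hypothetical placeholders rather than the study's rules, and the study may query its graph differently.

```python
from typing import FrozenSet, Iterable, List, Set

# Hypothetical rule base derived from a knowledge graph: each rule is a set of
# drugs that should not be prescribed together.
CONFLICT_RULES: List[FrozenSet[str]] = [
    frozenset({"DrugA", "DrugB"}),
    frozenset({"DrugC", "DrugD", "DrugE"}),
]
# Hypothetical list of drugs subject to drug-control monitoring.
CONTROLLED: Set[str] = {"DrugF"}


def check_prescription(drugs: Iterable[str]) -> List[str]:
    """Return alerts for conflicting combinations and controlled drugs."""
    prescribed = set(drugs)
    alerts = []
    for rule in CONFLICT_RULES:
        if rule <= prescribed:  # every drug in the rule is prescribed
            alerts.append("conflict: " + ", ".join(sorted(rule)))
    for drug in sorted(prescribed & CONTROLLED):
        alerts.append("controlled: " + drug)
    return alerts


# Drug entities as they might come out of the NER step for one prescription.
print(check_prescription(["DrugA", "DrugB", "DrugF"]))
```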


2020 ◽  
Vol 10 (18) ◽  
pp. 6429
Author(s):  
SungMin Yang ◽  
SoYeop Yoo ◽  
OkRan Jeong

Along with studies on artificial intelligence technology, research is being actively carried out in natural language processing, the field concerned with understanding and processing human language, that is, natural language. For computers to learn on their own, the ability to understand natural language is very important. Natural language processing involves a wide variety of tasks; we focus on named entity recognition and relation extraction, which are considered the most important for understanding sentences. We propose DeNERT-KG, a model that extracts subjects, objects, and relationships to grasp the meaning inherent in a sentence. The named entity recognition (NER) model for extracting subjects and objects is built on the BERT language model and a Deep Q-Network, and a knowledge graph is applied for relation extraction. The DeNERT-KG model can extract the subject, subject type, object, object type, and relationship from a sentence, and we verify the model through experiments.
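
The abstract says relations are resolved with a knowledge graph but does not give the procedure, and the BERT plus Deep Q-Network NER component is beyond a short sketch. Assuming the NER step has already produced a typed subject and object, the following sketch shows the five-field output and a hypothetical lookup that first checks entity-pair facts in the graph and then falls back to a type-level schema. All facts, types, relation names, and the `extract` helper are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Extraction:
    """The five fields the abstract lists for one sentence."""
    subject: str
    subject_type: str
    obj: str
    object_type: str
    relation: Optional[str]


# Hypothetical knowledge-graph facts keyed by (subject, object).
KG_FACTS: Dict[Tuple[str, str], str] = {("Seoul", "South Korea"): "capitalOf"}
# Hypothetical type-level schema keyed by (subject type, object type).
TYPE_SCHEMA: Dict[Tuple[str, str], str] = {("PERSON", "ORGANIZATION"): "memberOf"}


def extract(subject: str, subject_type: str, obj: str, object_type: str) -> Extraction:
    """Resolve the relation from entity-pair facts first, then from the type schema."""
    relation = KG_FACTS.get((subject, obj)) or TYPE_SCHEMA.get((subject_type, object_type))
    return Extraction(subject, subject_type, obj, object_type, relation)


print(extract("Seoul", "LOCATION", "South Korea", "LOCATION"))
```

When neither lookup matches, `relation` is `None`, leaving the decision to whatever downstream component handles unknown pairs.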


2021 ◽  
Author(s):  
Shen Zhou Feng ◽  
Su Qian Min ◽  
Guo Jing Lei

Abstract The recognition of named entities in Chinese clinical electronic medical records is one of the basic tasks in realizing smart medical care. To address the insufficient semantic representation of traditional word vector models and the inability of recurrent neural network (RNN) models to handle long-term dependencies, we propose XLNet-BiLSTM-MHA-CRF, an XLNet-based named entity recognition model for Chinese clinical electronic medical records. The XLNet pretrained language model serves as the embedding layer, vectorizing the medical record text to resolve ambiguity. A bidirectional long short-term memory network (BiLSTM) then obtains the forward and backward semantic feature information of each sentence. Next, the feature sequence is fed to a multi-head attention (MHA) layer, which captures information represented in different subspaces of the feature sequence, strengthening the relevance of contextual semantics and eliminating noise. Finally, a conditional random field (CRF) layer identifies the globally optimal label sequence. Experimental results show that the XLNet-BiLSTM-MHA-CRF model achieves good results on the CCKS-2017 named entity recognition data set.
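
Identifying the globally optimal label sequence in a linear-chain CRF is normally done with Viterbi decoding. A minimal plain-Python sketch follows, assuming the emission scores come from the preceding layers (here, the MHA output) and the transition and start scores are learned CRF parameters; training with the forward algorithm is omitted, and the toy scores and tag indices are made up.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            start: Sequence[float]) -> List[int]:
    """Highest-scoring tag path of a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (from the upstream layers)
    transitions[i][j]: score of moving from tag i to tag j
    start[j]: score of starting the sequence with tag j
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i for ending in tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


O, B, I = 0, 1, 2
transitions = [[0.0, 0.0, -100.0],  # O -> I is effectively forbidden
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, -100.0]          # a sequence should not start with I
emissions = [[1.0, 0.0, 0.0],       # per-token argmax would pick O here
             [0.0, 0.0, 1.5]]       # and I here, giving the invalid sequence O, I
print(viterbi(emissions, transitions, start))  # [1, 2], i.e. B, I
```

Because the decoder scores whole paths, the transition penalties override the locally best tag at the first position, which is what distinguishes CRF decoding from a per-token softmax.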

