A Distributed Event Extraction Framework for Large-Scale Unstructured Text

Author(s):  
Zhigang Kan ◽  
Haibo Mi ◽  
Sen Yang ◽  
Linbo Qiao ◽  
Dawei Feng ◽  
...  

2016 ◽  
Vol 31 (2) ◽  
pp. 97-123 ◽  
Author(s):  
Alfred Krzywicki ◽  
Wayne Wobcke ◽  
Michael Bain ◽  
John Calvo Martinez ◽  
Paul Compton

Abstract: Data mining techniques for extracting knowledge from text have been applied extensively to applications including question answering, document summarisation, event extraction and trend monitoring. However, current methods have mainly been tested on small-scale customised data sets for specific purposes. The availability of large volumes of data and high-velocity data streams (such as social media feeds) motivates the need to automatically extract knowledge from such data sources and to generalise existing approaches to more practical applications. Recently, several architectures have been proposed for what we call knowledge mining: integrating data mining for knowledge extraction from unstructured text (possibly making use of a knowledge base), and at the same time, consistently incorporating this new information into the knowledge base. After describing a number of existing knowledge mining systems, we review the state-of-the-art literature on both current text mining methods (emphasising stream mining) and techniques for the construction and maintenance of knowledge bases. In particular, we focus on mining entities and relations from unstructured text data sources, entity disambiguation, entity linking and question answering. We conclude by highlighting general trends in knowledge mining research and identifying problems that require further research to enable more extensive use of knowledge bases.


2016 ◽  
Vol 9 (2) ◽  
Author(s):  
Janea Triplet ◽  
Andrew Harrison ◽  
Brian Mennecke ◽  
Akmal Mirsadikov

This paper introduces an approach for the examination and organization of unstructured text to identify relationships between networks of individuals. The approach uses discourse analysis to identify information providers and recipients, and it determines the structure of covert organizations irrespective of the language that facilitates conversations between members. The method then applies social network analytics to determine the arrangement of a covert organization without any a priori knowledge of the network structure. The approach is tested and validated using communication data collected in a virtual world setting. Our analysis indicates that the proposed framework successfully detected the covert structure of three information networks, and their cliques, within an online gaming community during a simulation of a large-scale event.
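The network-analysis step described above can be illustrated with a minimal sketch. It assumes that provider and recipient pairs have already been extracted from the text; the names in `messages` are hypothetical, and the basic Bron-Kerbosch clique enumeration is used here only as one common technique, not as the method the authors applied.

```python
from collections import defaultdict

def build_graph(messages):
    """Build an undirected graph from (provider, recipient) pairs."""
    graph = defaultdict(set)
    for provider, recipient in messages:
        if provider != recipient:
            graph[provider].add(recipient)
            graph[recipient].add(provider)
    return graph

def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch algorithm."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return sorted(cliques)

# Hypothetical provider -> recipient pairs, assumed to come from discourse analysis.
messages = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("cat", "dan")]
graph = build_graph(messages)
cliques = maximal_cliques(graph)
print(cliques)
```

No prior knowledge of the network structure is needed here: the graph is derived entirely from who communicates with whom.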


PLoS ONE ◽  
2013 ◽  
Vol 8 (4) ◽  
pp. e55814 ◽  
Author(s):  
Sofie Van Landeghem ◽  
Jari Björne ◽  
Chih-Hsuan Wei ◽  
Kai Hakala ◽  
Sampo Pyysalo ◽  
...  

2020 ◽  
Vol 9 (12) ◽  
pp. 712
Author(s):  
Agung Dewandaru ◽  
Dwi Hendratmo Widyantoro ◽  
Saiful Akbar

A geoparser is a fundamental component of a Geographic Information Retrieval (GIR) system; it performs toponym recognition, disambiguation, and geographic coordinate resolution on unstructured text. However, regular geoparsers, whose scope of resolution is either toponym-level or document-level, do not yet adequately handle news articles that report several events across many place mentions in one document. The capacity to detect multiple events and geolocate their true coordinates along with their numerical arguments is still missing from modern geoparsers, and even more so for Indonesian news corpora. We propose an event geoparser model with three stages of processing, which tightly integrates an event extraction model into geoparsing and provides a precise event-level resolution scope. The model casts geotagging and event extraction as sequence labeling and uses an LSTM-CRF inferencer equipped with features derived using an Aggregated Topic Model from a large corpus to increase generalizability. With the proposed workflow and features, the geoparser significantly improves the identification of pseudo-location entities, resulting in a 23.43% increase in weighted F1 score compared to baseline gazetteer and POS tag features. As a side effect of event extraction, various numerical arguments are also extracted, and the output is easily projected onto a rich choropleth map from a single news document.
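To make the sequence-labeling formulation concrete, the sketch below shows how per-token tags could be grouped into event, location, and numerical-argument spans. It assumes a BIO tagging scheme and hypothetical label names (`EVENT`, `LOC`, `NUM`); the abstract does not state the actual tag set, and the tags here are hand-written rather than produced by an LSTM-CRF.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans.

    An I- tag that does not continue the current span is dropped.
    """
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close any span that is still open.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        spans.append((current_label, " ".join(current_tokens)))
    return spans

# Hypothetical sentence and tags, for illustration only.
tokens = ["Flood", "hit", "West", "Java", "displacing", "300", "residents"]
tags = ["B-EVENT", "O", "B-LOC", "I-LOC", "O", "B-NUM", "O"]
spans = decode_bio(tokens, tags)
print(spans)
```

Decoding spans from a single tag sequence is what lets one pass over the text yield both the place mention and the arguments attached to the event.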


2017 ◽  
Vol 2017 ◽  
pp. 1-5
Author(s):  
Yunyu Shi ◽  
Jianfang Shan ◽  
Xiang Liu ◽  
Yongxiang Xia

Text representation is a basic issue in text information processing, and events play an important role in text understanding; both have attracted scholarly attention. The event network conceals lexical relations in events, and its edges express logical relations between events in a document. However, the events and relations are extracted from event-annotated text, which makes automatic processing of large-scale text difficult. In this paper, with the expanded CEC (Chinese Event Corpus) as the data source and prior knowledge of the manifestation rules of events and relations as the guide, we propose an event extraction method based on knowledge-based rules of event manifestation, to build the event network automatically and improve its text processing performance.


2021 ◽  
Author(s):  
Qi Zhai ◽  
Zhigang Kan ◽  
Linhui Feng ◽  
Linbo Qiao ◽  
Feng Liu

Recently, Chinese event detection has attracted increasing attention. Chinese characters are logographic, so their glyphs are semantically useful, yet they remain unexplored in this task. In this paper, we propose a novel Glyph-Aware Fusion Network, named GlyFN, which introduces glyph information into the pre-trained language model representation. To obtain a better representation, we design a Vector Linear Fusion mechanism to fuse the two. Specifically, it first applies max-pooling to capture salient information and then uses linear operations on vectors to retain unique information. Moreover, for large-scale unstructured text, we distribute the data across different clusters and process it in parallel. Finally, we conduct extensive experiments on ACE2005 and large-scale data. Experimental results show that GlyFN obtains increases of 7.48 (10.18%) and 6.17 (8.7%) in F1-score for trigger identification and classification, respectively, over the state-of-the-art methods. Furthermore, the event detection task for large-scale unstructured text can be accomplished efficiently through distribution.
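The idea of distributing large-scale text for parallel event detection can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: threads stand in for cluster nodes, `detect_triggers` is a hypothetical lexicon lookup rather than GlyFN, and the sample documents are invented.

```python
from concurrent.futures import ThreadPoolExecutor

def shard(documents, num_shards):
    """Split documents into num_shards round-robin partitions."""
    return [documents[i::num_shards] for i in range(num_shards)]

def detect_triggers(shard_docs, trigger_words):
    """Hypothetical stand-in for a trigger detector: lexicon lookup only."""
    results = []
    for doc_id, text in shard_docs:
        hits = [w for w in text.lower().split() if w in trigger_words]
        results.append((doc_id, hits))
    return results

def run_distributed(documents, trigger_words, num_workers=2):
    """Process each shard on its own worker and merge results by document id."""
    shards = shard(documents, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda s: detect_triggers(s, trigger_words), shards))
    merged = [item for part in partials for item in part]
    return sorted(merged, key=lambda item: item[0])

# Invented (doc_id, text) pairs, for illustration only.
documents = [
    (0, "Troops attacked the village"),
    (1, "The firm hired staff"),
    (2, "Quiet day"),
]
output = run_distributed(documents, {"attacked", "hired"})
print(output)
```

Because each document is processed independently, the shards need no coordination beyond the final merge, which is what makes this kind of workload straightforward to spread across machines.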


Author(s):  
Hao Fei ◽  
Yafeng Ren ◽  
Yue Zhang ◽  
Donghong Ji ◽  
Xiaohui Liang

Abstract: Biomedical information extraction (BioIE) is an important task that aims to analyze biomedical texts and extract structured information such as named entities and the semantic relations between them. In recent years, pre-trained language models have largely improved the performance of BioIE. However, they neglect to incorporate external structural knowledge, which can provide rich factual information to support the underlying understanding and reasoning for biomedical information extraction. In this paper, we first evaluate current extraction methods, including vanilla neural networks, general language models and pre-trained contextualized language models, on biomedical information extraction tasks, including named entity recognition, relation extraction and event extraction. We then propose to enrich a contextualized language model by integrating large-scale biomedical knowledge graphs (namely, BioKGLM). To encode knowledge effectively, we explore a three-stage training procedure and introduce different fusion strategies to facilitate knowledge injection. Experimental results on multiple tasks show that BioKGLM consistently outperforms state-of-the-art extraction models. Further analysis shows that BioKGLM can capture the underlying relations between biomedical knowledge concepts, which are crucial for BioIE.


2020 ◽  
Vol 48 (3) ◽  
pp. 129-136
Author(s):  
Qihang Wu ◽  
Daifeng Li ◽  
Lu Huang ◽  
Biyun Ye

Purpose: Entity relation extraction is an important research direction for obtaining structured information. However, most current methods determine the relations between entities in a given sentence in a stepwise manner and seldom treat entities and relations within a unified framework. Joint learning is a solution that handles relations and entities together. This paper aims to optimize the hierarchical reinforcement learning framework and provide an efficient model for extracting entity relations.

Design/methodology/approach: This paper builds on the hierarchical reinforcement learning framework for joint learning and combines the model with BERT, a state-of-the-art language representation model, to optimize the word embedding and encoding process. In addition, this paper adjusts some punctuation marks to make the data set more standardized and introduces positional information to improve the performance of the model.

Findings: Experiments show that the proposed model outperforms the baseline model by 13% and achieves an F1 score of 0.742 on the NYT10 data set. The model can effectively extract entities and relations from large-scale unstructured text and can be applied to multi-domain information retrieval, intelligent understanding and intelligent interaction.

Originality/value: The research provides an efficient solution for researchers in different domains to use artificial intelligence (AI) technologies to process their unstructured text more accurately.
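For readers unfamiliar with the F1 score reported above, the following sketch shows how it could be computed for extracted relation triples. It assumes exact-match scoring over (head, relation, tail) triples; the abstract does not describe the evaluation protocol, and the example triples are invented.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 over exact-match triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented (head, relation, tail) triples, for illustration only.
gold = {
    ("Obama", "born_in", "Hawaii"),
    ("Paris", "capital_of", "France"),
}
predicted = {
    ("Obama", "born_in", "Hawaii"),
    ("Paris", "located_in", "France"),
    ("Obama", "president_of", "USA"),
}
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here one of three predictions is correct and one of two gold triples is recovered, so F1 is the harmonic mean of 1/3 and 1/2.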

