QA4IE: A Question Answering Based Framework for Information Extraction

Author(s):  
Lin Qiu ◽  
Hao Zhou ◽  
Yanru Qu ◽  
Weinan Zhang ◽  
Suoheng Li ◽  
...  
Author(s):  
Xinghua Fan

Entity and relation recognition, i.e. assigning semantic classes (e.g., person, organization and location) to entities in a given sentence and determining the relations (e.g., born-in and employee-of) that hold between the corresponding entities, is an important task in areas such as information extraction (IE) (Califf and Mooney, 1999; Chinchor, 1998; Freitag, 2000; Roth and Yih, 2001), question answering (QA) (Voorhees, 2000; Changki Lee et al., 2007) and story comprehension (Hirschman et al., 1999). In a QA system, many questions ask for the specific entities involved in some relation. For example, the question “Where was Poe born?” in TREC-9 asks for the location entity in which Poe was born. In a typical IE task such as constructing a jobs database from unstructured text, the system has to extract many meaningful entities, such as title and salary, and, ideally, determine whether those entities are associated with the same position.


Events and time are two key concepts in natural language processing; because of the many event-oriented tasks, they have become essential in information extraction. In natural language processing and information extraction or retrieval, events and time underpin applications such as text summarization, document summarization, and question answering systems. In this paper, we present the event-time graph as a new way of constructing event-time information from text. In this graph, nodes are events, whereas edges represent the temporal and co-reference relations between events. Much previous research in natural language processing focused on individual extraction tasks in a domain-specific way; in this work, we present both the extraction of the relationship between events and time and its representation through event-time graph construction. Our overall system is a three-step process that performs event extraction, time extraction, and relation representation. Each step is at a performance level comparable with the state of the art. We present event extraction on the MUC data corpus annotated with event mentions, on which we train and evaluate our model. Next, we present time extraction, with the time model tested on several news articles from the Wikipedia corpus. We then represent event-time relations by constructing event-time graphs. Finally, we evaluate the overall quality of the event graphs with the evaluation metrics and conclude with observations on the entire work.
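The graph described in this abstract (events as nodes, temporal and co-reference relations as edges) can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' implementation: the class names `Event` and `EventTimeGraph`, the relation labels `before`, `after` and `coref`, and the sample events are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical relation labels; the abstract only says that edges carry
# temporal and co-reference relations.
RELATIONS = {"before", "after", "coref"}


@dataclass(frozen=True)
class Event:
    event_id: str
    trigger: str                 # word or phrase that mentions the event
    time: Optional[str] = None   # extracted time expression, if any


@dataclass
class EventTimeGraph:
    events: Dict[str, Event] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_event(self, event: Event) -> None:
        self.events[event.event_id] = event

    def add_relation(self, source_id: str, target_id: str, relation: str) -> None:
        # Reject unknown labels and edges that point at missing nodes.
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if source_id not in self.events or target_id not in self.events:
            raise KeyError("both events must be added before relating them")
        self.edges.append((source_id, target_id, relation))

    def related(self, event_id: str, relation: str) -> List[str]:
        # Targets reachable from event_id over edges with the given label.
        return [t for s, t, r in self.edges if s == event_id and r == relation]


graph = EventTimeGraph()
graph.add_event(Event("e1", "announced", "Monday"))
graph.add_event(Event("e2", "resigned", "Friday"))
graph.add_event(Event("e3", "announcement"))
graph.add_relation("e1", "e2", "before")
graph.add_relation("e1", "e3", "coref")
```

Keeping the edges as labelled triples lets the temporal and co-reference relations share one structure, so a query such as `graph.related("e1", "before")` only filters by label.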


2021 ◽  
Vol 14 (8) ◽  
pp. 1254-1261
Author(s):  
Nan Tang ◽  
Ju Fan ◽  
Fangyi Li ◽  
Jianhong Tu ◽  
Xiaoyong Du ◽  
...  

Can AI help automate human-easy but computer-hard data preparation tasks that burden data scientists, practitioners, and crowd workers? We answer this question by presenting RPT, a denoising autoencoder for tuple-to-X models ("X" could be tuple, token, label, JSON, and so on). RPT is pre-trained for a tuple-to-tuple model by corrupting the input tuple and then learning a model to reconstruct the original tuple. It adopts a Transformer-based neural translation architecture that consists of a bidirectional encoder (similar to BERT) and a left-to-right autoregressive decoder (similar to GPT), leading to a generalization of both BERT and GPT. The pre-trained RPT can already support several common data preparation tasks such as data cleaning, auto-completion and schema matching. Better still, RPT can be fine-tuned on a wide range of data preparation tasks, such as value normalization, data transformation, data annotation, etc. To complement RPT, we also discuss several appealing techniques such as collaborative training and few-shot learning for entity resolution, and few-shot learning and NLP question-answering for information extraction. In addition, we identify a series of research opportunities to advance the field of data preparation.
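The denoising objective mentioned in this abstract (corrupt a tuple, then learn to reconstruct the original) can be illustrated by how one training pair might be built. This is a rough sketch under stated assumptions, not RPT's actual procedure: masking exactly one attribute value, the `[M]` mask token, the function names and the sample row are all hypothetical.

```python
import random
from typing import Dict, Tuple

MASK = "[M]"  # hypothetical mask token, not taken from the paper


def corrupt_tuple(row: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return a copy of `row` with one randomly chosen attribute value masked."""
    attribute = rng.choice(sorted(row))  # sort for a stable choice order
    corrupted = dict(row)
    corrupted[attribute] = MASK
    return corrupted


def make_training_pair(
    row: Dict[str, str], rng: random.Random
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Pair a corrupted tuple (model input) with the original (target)."""
    return corrupt_tuple(row, rng), dict(row)


rng = random.Random(0)
row = {"name": "Ada Lovelace", "field": "Mathematics", "city": "London"}
model_input, target = make_training_pair(row, rng)
```

A sequence-to-sequence model would then be trained to map a serialization of `model_input` back to a serialization of `target`; how tuples are serialized and which corruption schemes are used are left open here.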


AI Magazine ◽  
2010 ◽  
Vol 31 (3) ◽  
pp. 93 ◽  
Author(s):  
Stephen Soderland ◽  
Brendan Roof ◽  
Bo Qin ◽  
Shi Xu ◽  
Mausam ◽  
...  

Information extraction (IE) can identify a set of relations from free text to support question answering (QA). Until recently, IE systems were domain-specific and needed a combination of manual engineering and supervised learning to adapt to each target domain. A new paradigm, Open IE, operates on large text corpora without any manual tagging of relations, and indeed without any pre-specified relations. Due to its open-domain and open-relation nature, Open IE is purely textual and is unable to relate the surface forms to an ontology, even if one is known in advance. We explore the steps needed to adapt Open IE to a domain-specific ontology and demonstrate our approach of mapping domain-independent tuples to an ontology using domains from DARPA’s Machine Reading Project. Our system achieves precision over 0.90 from as few as 8 training examples for an NFL-scoring domain.


2015 ◽  
pp. 293-317
Author(s):  
Jan Kocoń ◽  
Michał Marcińczuk ◽  
Marcin Oleksy ◽  
Tomasz Bernaś ◽  
Michał Wolski

Temporal Expressions in Polish Corpus KPWr

This article presents the results of recent research on the interpretation of Polish expressions that refer to time. These expressions are a source of information about when something happens, how often something occurs or how long something lasts. Temporal information, which can be extracted from text automatically, plays a significant role in many information extraction systems, such as question answering, discourse analysis, event recognition and many more. We prepared PLIMEX, a broad description of Polish temporal expressions with annotation guidelines, based on the state-of-the-art solutions for English, mainly the TimeML specification. We also adapted the solution to capture the local semantics of temporal expressions, called LTIMEX. Temporal description also supports further event identification and extends the event description model, focusing on anchoring events in time, ordering events and reasoning about the persistence of events. We prepared the specification, which is designed to address these issues, and we annotated all documents in the Polish Corpus of Wroclaw University of Technology (KPWr) using our annotation guidelines.

