Temporal Expressions in Polish Corpus KPWr

2015 ◽  
pp. 293-317
Author(s):  
Jan Kocoń ◽  
Michał Marcińczuk ◽  
Marcin Oleksy ◽  
Tomasz Bernaś ◽  
Michał Wolski

This article presents the results of recent research on the interpretation of Polish expressions that refer to time. These expressions are a source of information about when something happens, how often something occurs or how long something lasts. Temporal information, which can be extracted from text automatically, plays a significant role in many information extraction systems, such as question answering, discourse analysis, event recognition and many more. We prepared PLIMEX, a broad description of Polish temporal expressions with annotation guidelines, based on the state-of-the-art solutions for English, mainly the TimeML specification. We also adapted the solution to capture the local semantics of temporal expressions, called LTIMEX. Temporal description also supports further event identification and extends the event description model, focusing on anchoring events in time, ordering events and reasoning about the persistence of events. We prepared a specification designed to address these issues, and we annotated all documents in the Polish Corpus of Wroclaw University of Technology (KPWr) using our annotation guidelines.

Author(s):  
Xiao Yang ◽  
Madian Khabsa ◽  
Miaosen Wang ◽  
Wei Wang ◽  
Ahmed Hassan Awadallah ◽  
...  

Community-based question answering (CQA) websites represent an important source of information. As a result, the problem of matching the most valuable answers to their corresponding questions has become an increasingly popular research topic. We frame this task as a binary (relevant/irrelevant) classification problem and present an adversarial training framework to alleviate the label imbalance issue. We employ a generative model to iteratively sample a subset of challenging negative samples to fool our classification model. Both models are alternately optimized using the REINFORCE algorithm. This differs from previous methods, in which negative samples in the training set are used directly or uniformly down-sampled. Further, we propose Multi-scale Matching, which explicitly inspects the correlation between words and n-grams at different levels of granularity. We evaluate the proposed method on the SemEval 2016 and SemEval 2017 datasets and achieve state-of-the-art or similar performance.


2017 ◽  
Vol 01 (01) ◽  
pp. 1630002 ◽  
Author(s):  
Fattane Zarrinkalam ◽  
Ebrahim Bagheri

Social networks enable users to freely communicate with each other and share their recent news, ongoing activities or views about different topics. As a result, they can be seen as a potentially viable source of information for understanding current emerging topics and events. The ability to model emerging topics is a substantial step towards monitoring and summarizing the information originating from social sources. Traditional event detection methods, which are often proposed for processing large, formal and structured documents, are less effective here due to the short length, noisiness and informality of social posts. Recent event detection techniques address these challenges by exploiting the abundant information available in social networks. This article provides an overview of the state of the art in event detection from social networks.


2016 ◽  
Vol 23 (3) ◽  
pp. 385-418 ◽  
Author(s):  
JAN KOCOŃ ◽  
MICHAŁ MARCIŃCZUK

A key challenge of Information Extraction in Natural Language Processing is the ability to recognise and classify temporal expressions (timexes). They are a crucial source of information about when something happens, how often something occurs or how long something lasts. Timexes extracted automatically from text play a major role in many Information Extraction systems, such as question answering or event recognition. We prepared PLIMEX, a broad specification of Polish timexes. It is based on the state-of-the-art annotation guidelines for English, mainly TIMEX2 and TIMEX3 (a part of TimeML, the Markup Language for Temporal and Event Expressions). We have expanded our specification with a description of the local meaning of timexes, based on the LTIMEX annotation guidelines for English. Temporal description supports further event identification and extends the event description model, focussing on anchoring events in time, ordering events and reasoning about the persistence of events. We prepared a specification designed to address these issues, and we annotated all documents in the Polish Corpus of Wroclaw University of Technology (KPWr) using our annotation guidelines. We also adapted our Liner2 machine learning system to recognise Polish timexes, and we propose a two-phase method to select a subset of features for the Conditional Random Fields sequence labelling method. This article presents the whole process: corpus annotation, evaluation of inter-annotator agreement, extension of the Liner2 system with new features, and evaluation of the recognition models before and after feature selection, with an analysis of the statistical significance of the differences. Liner2 with the presented models is available as open source software under the GNU General Public License.
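To make the sequence labelling framing above concrete, here is a minimal sketch of how annotated timex spans might be turned into per-token labels before training a Conditional Random Fields model. The BIO tagging scheme, the `spans_to_bio` helper and the `TIMEX` label name are assumptions made for illustration; the abstract does not say which encoding Liner2 uses.

```python
def spans_to_bio(tokens, spans, label="TIMEX"):
    """Convert token-index spans into BIO tags (assumed encoding).

    tokens: list of token strings.
    spans:  list of (start, end) token indices, end exclusive.
    label:  hypothetical category name attached to B-/I- tags.
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = f"B-{label}"          # first token of the expression
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


# Example sentence: "Spotkanie odbędzie się 15 maja 2015 roku ."
tokens = ["Spotkanie", "odbędzie", "się", "15", "maja", "2015", "roku", "."]
tags = spans_to_bio(tokens, [(3, 7)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Each token then carries exactly one label, which is the input shape a sequence labeller such as a CRF expects.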


2013 ◽  
Vol 321-324 ◽  
pp. 2013-2016
Author(s):  
Dan Dan Zhao ◽  
Liang Song ◽  
Qi Wei Yang

Temporal expressions are important structures in natural language. Temporal information is useful in many NLP applications, such as information extraction, question answering and summarization. In this paper, we present an approach for extracting temporal expressions from Chinese texts. Using the LEX parser, we define the grammar rules of temporal expressions as a LEX source program and, through the cooperation of LEX and a C compiler, obtain the temporal expressions from an unprocessed Chinese corpus. Our experiments on the TempEval 2010 Chinese corpus show that this approach is valid, with an F1-measure of 93.97%.
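The rule-based idea can be sketched roughly as follows. This is not the authors' LEX program: it swaps in Python regular expressions for the LEX rules, and the patterns, the `TIMEX_RULES` table and the `extract_timexes` function are hypothetical, covering only a few simple date and time forms. Like a generated lexer, it scans left to right and takes the longest match at each position.

```python
import re

# Hypothetical numeral class: ASCII digits plus common Chinese numerals.
_NUM = r"[0-9〇一二三四五六七八九十两]+"

# Each (label, pattern) pair plays the role of one lexer rule (assumed rules).
TIMEX_RULES = [
    ("DATE", re.compile(rf"{_NUM}年(?:{_NUM}月(?:{_NUM}[日号])?)?")),
    ("DATE", re.compile(rf"{_NUM}月(?:{_NUM}[日号])?")),
    ("TIME", re.compile(rf"{_NUM}[点时](?:{_NUM}分)?")),
    ("DATE", re.compile(r"今天|明天|昨天|今年|去年|明年")),
]


def extract_timexes(text):
    """Return (matched_text, label, start_offset) for each rule match."""
    results = []
    pos = 0
    while pos < len(text):
        best = None
        for label, pattern in TIMEX_RULES:
            m = pattern.match(text, pos)
            # Keep the longest match starting at this position.
            if m and (best is None or m.end() > best[1].end()):
                best = (label, m)
        if best:
            label, m = best
            results.append((m.group(), label, m.start()))
            pos = m.end()
        else:
            pos += 1  # no rule fires here; skip one character
    return results


for timex in extract_timexes("他2010年5月3日到达北京，明天8点30分离开。"):
    print(timex)
```

A real rule set would need many more patterns (durations, relative expressions, ranges) and some handling of ambiguous numerals, which this sketch ignores.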


Author(s):  
Xinmeng Li ◽  
Mamoun Alazab ◽  
Qian Li ◽  
Keping Yu ◽  
Quanjun Yin

Knowledge graph question answering is an important technology in intelligent human–robot interaction. It aims to automatically answer a human natural language question using a given knowledge graph. For multi-relation questions, which have higher variety and complexity, the tokens of the question have different priorities for triple selection at each reasoning step. Most existing models take the question as a whole and ignore the priority information in it. To solve this problem, we propose a question-aware memory network for multi-hop question answering, named QA2MN, which updates the attention on the question during the reasoning process. In addition, we incorporate graph context information into the knowledge graph embedding model to increase its ability to represent entities and relations. We use it to initialize the QA2MN model and fine-tune it during training. We evaluate QA2MN on PathQuestion and WorldCup2014, two representative datasets for complex multi-hop question answering. The results demonstrate that QA2MN achieves state-of-the-art Hits@1 accuracy on the two datasets, which validates the effectiveness of our model.
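For readers unfamiliar with the metric, Hits@1 is the fraction of questions whose top-ranked predicted entity is a correct answer. A minimal sketch of the computation follows; the `hits_at_k` function and the toy data are illustrative assumptions, not taken from the paper's evaluation code.

```python
def hits_at_k(ranked_predictions, gold_answers, k=1):
    """Fraction of questions with a correct answer among the top-k predictions.

    ranked_predictions: one list of candidate entities per question, best first.
    gold_answers:       one set of correct entities per question.
    """
    if not gold_answers:
        return 0.0
    hits = 0
    for ranked, gold in zip(ranked_predictions, gold_answers):
        if any(candidate in gold for candidate in ranked[:k]):
            hits += 1
    return hits / len(gold_answers)


# Toy example with made-up entity names.
predictions = [["Paris", "Lyon"], ["Berlin", "Munich"]]
gold = [{"Paris"}, {"Munich"}]
print(hits_at_k(predictions, gold, k=1))
print(hits_at_k(predictions, gold, k=2))
```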


2020 ◽  
pp. 1-21 ◽  
Author(s):  
Clément Dalloux ◽  
Vincent Claveau ◽  
Natalia Grabar ◽  
Lucas Emanuel Silva Oliveira ◽  
Claudia Maria Cabral Moro ◽  
...  

Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. This task is especially important in the biomedical domain, where negation plays an important role. In this work, two main contributions are proposed. First, we work with languages which have been poorly addressed up to now: Brazilian Portuguese and French. We therefore developed new corpora for these two languages, manually annotated with negation cues and their scope. Second, we propose methods based on supervised machine learning for the automatic detection of negation cues and their scopes. The methods prove to be robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical language) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. In addition, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.


2008 ◽  
Vol 96 (3) ◽  
pp. 512-531 ◽  
Author(s):  
W.M. Ahmed ◽  
S.J. Leavesley ◽  
B. Rajwa ◽  
M.N. Ayyaz ◽  
A. Ghafoor ◽  
...  

Author(s):  
Lin Qiu ◽  
Hao Zhou ◽  
Yanru Qu ◽  
Weinan Zhang ◽  
Suoheng Li ◽  
...  

The article describes visitors’ interpretation and understanding of the narrative about the Holocaust in the United States Holocaust Memorial Museum. Visitors’ comments were the material for the analysis, and the methodology used was discourse analysis. Different discourses were singled out in visitors’ comments. Differences between visitors’ comments given in different years were ascertained. Age differences and differences among the narratives of various groups of Museum visitors were shown. It can be concluded that the Museum fulfills various functions. Besides being a place of commemoration, it accomplishes its educational function and serves as a source of information about the Holocaust.

