Events Automatic Extraction from Arabic Texts

2020 ◽  
pp. 1686-1704
Author(s):  
Emna Hkiri ◽  
Souheyl Mallat ◽  
Mounir Zrigui

The event extraction task consists of identifying and classifying events in open-domain text. It is still very new for the Arabic language, whereas it has reached maturity for languages such as English and French. Event extraction has also been shown to improve the performance of Natural Language Processing tasks such as information retrieval, question answering, text mining, and machine translation. In this article, we present an ongoing effort to build a system for event extraction from Arabic texts using the GATE platform and other tools.

2016 ◽  
Vol 6 (1) ◽  
pp. 36-51 ◽  
Author(s):  
Emna Hkiri ◽  
Souheyl Mallat ◽  
Mounir Zrigui



2021 ◽  
Vol 47 (05) ◽  
Author(s):  
NGUYỄN CHÍ HIẾU

Knowledge graphs have been applied in many fields in recent years, such as search engines, semantic analysis, and question answering. However, building knowledge graphs faces many obstacles in terms of methodologies, data, and tools. This paper introduces a novel methodology for building a knowledge graph from heterogeneous documents. We use methods from Natural Language Processing and deep learning to build this graph. The knowledge graph can be used in question answering systems and information retrieval, especially in the computing domain.


Author(s):  
Saravanakumar Kandasamy ◽  
Aswani Kumar Cherukuri

Quantifying semantic similarity between concepts is an essential part of domains such as Natural Language Processing, Information Retrieval, and Question Answering, where it helps to better understand texts and their relationships. Over the last few decades, many measures have been proposed that incorporate various corpus-based and knowledge-based resources. WordNet and Wikipedia are two such knowledge-based resources. WordNet's contribution to these domains is enormous because of the richness with which it defines a word and all of its relationships with other words. In this paper, we propose an approach to quantifying the similarity between concepts that exploits the synsets and gloss definitions of different concepts in WordNet. Our method considers the gloss definitions, the contextual words that help define a word, the synsets of those contextual words, and the confidence of occurrence of a word in another word's definition when calculating similarity. An evaluation on different gold-standard benchmark datasets shows the efficiency of our system in comparison with existing taxonomical and definitional measures.
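
The idea of comparing concepts through their gloss definitions can be illustrated with a minimal sketch. This is not the authors' measure: it is a plain Jaccard overlap between gloss word sets, and the `GLOSSES` dictionary, the `STOPWORDS` set, and both function names are hypothetical stand-ins for WordNet data and the paper's actual scoring.

```python
# Toy glosses: a hypothetical stand-in for WordNet gloss definitions.
GLOSSES = {
    "car": "a motor vehicle with four wheels used for transporting people",
    "bus": "a large motor vehicle used for transporting many people",
    "banana": "a long curved yellow fruit that grows on a tropical plant",
}

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "with", "for", "that", "on", "of", "used"}


def gloss_tokens(concept):
    """Return the set of content words in a concept's gloss."""
    words = GLOSSES[concept].lower().split()
    return {w for w in words if w not in STOPWORDS}


def gloss_similarity(c1, c2):
    """Jaccard overlap between two gloss word sets, in the range [0, 1]."""
    t1, t2 = gloss_tokens(c1), gloss_tokens(c2)
    if not t1 or not t2:
        return 0.0
    return len(t1 & t2) / len(t1 | t2)
```

With these toy glosses, `gloss_similarity("car", "bus")` is higher than `gloss_similarity("car", "banana")`, because the first pair shares gloss words such as "motor" and "vehicle". The method described in the abstract additionally uses synsets of the contextual words and a confidence term, which this sketch leaves out.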


Events and time are two key concepts in natural language processing; because of the many event-oriented tasks, they have become essential to information extraction. In natural language processing and in information extraction or retrieval, events and time support several applications, such as text summarization, document summarization, and question answering systems. In this paper, we present the event-time graph as a new way of structuring event- and time-based information from text. In this graph, nodes are events, whereas edges represent the temporal and co-reference relations between events. Much previous research in natural language processing has focused on individual extraction tasks in a domain-specific way; in this work, we instead extract the relationships between events and time and represent them by constructing an event-time graph. Our overall system is a three-step process that performs event extraction, time extraction, and relation extraction and representation. Each step performs at a level comparable with the state of the art. We present event extraction on the MUC data corpus annotated with event mentions, on which we train and evaluate our model. Next, we present time extraction, with the time model tested on several news articles from a Wikipedia corpus. We then represent event-time relations by constructing event-time graphs. Finally, we evaluate the overall quality of the event graphs with evaluation metrics and summarize the observations of the entire work.
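
The graph structure described above (events as nodes, temporal and co-reference relations as edges) might be sketched as follows. The class name `EventGraph`, its methods, and the relation labels `BEFORE`, `AFTER`, and `COREF` are assumptions made for illustration; the abstract does not specify the authors' data model or label set.

```python
from dataclasses import dataclass, field

# Hypothetical relation labels; the paper's actual label set is not given.
RELATIONS = {"BEFORE", "AFTER", "COREF"}


@dataclass
class EventGraph:
    events: dict = field(default_factory=dict)  # event id -> {"text", "time"}
    edges: list = field(default_factory=list)   # (source, relation, target)

    def add_event(self, event_id, text, time=None):
        """Add an event node, optionally with an extracted time expression."""
        self.events[event_id] = {"text": text, "time": time}

    def add_relation(self, source, relation, target):
        """Add a temporal or co-reference edge between two known events."""
        if source not in self.events or target not in self.events:
            raise KeyError("both events must be added before relating them")
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges.append((source, relation, target))

    def related(self, event_id, relation):
        """Return targets linked from event_id by the given relation."""
        return [t for s, r, t in self.edges if s == event_id and r == relation]


# Usage with made-up events (illustrative only, not from the MUC corpus).
graph = EventGraph()
graph.add_event("e1", "earthquake struck", time="2011-03-11")
graph.add_event("e2", "tsunami warning issued")
graph.add_relation("e1", "BEFORE", "e2")
```

An edge list keeps the sketch short; a system working on a full corpus would more likely index edges by node.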


Author(s):  
Christian Aranha ◽  
Emmanuel Passos

This chapter integrates elements from Natural Language Processing, Information Retrieval, Data Mining and Text Mining to support competitive intelligence (CI). It shows how text mining algorithms can support three important CI functions: Filtering, Event Alerts and Search. Each of them can be mapped to a different pipeline of NLP tasks. The chapter goes into depth on NLP techniques such as spelling correction, stemming, augmenting, normalization, entity recognition, entity classification, acronym handling and co-reference processing. Each of them must be used at a specific moment to do a specific job. All these jobs are integrated into a whole system and 'assembled' in a manner specific to each application. The better the reader understands the NLP theory provided here, the better the resulting 'assembly' will be.


Author(s):  
Isabella Gagliardi ◽  
Maria Teresa Artese

Keyword/keyphrase extraction is an important research activity in text mining, natural language processing, and information retrieval. A large number of algorithms, divided into supervised and unsupervised methods, have been designed and developed to solve the problem of automatic keyphrase extraction. The aim of the chapter is to critically discuss unsupervised automatic keyphrase extraction algorithms, analyzing their characteristics in depth. The methods presented will be tested on different datasets, presenting in detail the data, the algorithms, and the different options tested in the runs. Moreover, most studies and experiments have been conducted on texts in English, while there are few experiments concerning other languages, such as Italian. Particular attention will be paid to evaluating the results of the methods in two different languages, English and Italian.
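
As a rough illustration of what an unsupervised keyphrase extractor does, the sketch below splits text into candidate phrases at stopwords and punctuation, then scores each candidate by the summed frequency of its words. It is a simplified, RAKE-like heuristic chosen for brevity, not one of the algorithms evaluated in the chapter; the stopword list and function names are assumptions.

```python
import re
from collections import Counter

# Tiny English stopword list for illustration (an assumption).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "is", "are",
             "in", "to", "for", "on", "that", "by"}


def candidate_phrases(text):
    """Split text into runs of consecutive non-stopword tokens."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()]", text.lower()):
        current = []
        for word in re.findall(r"[a-z]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rank_keyphrases(text, top_k=3):
    """Rank candidates by the summed frequency of their member words."""
    phrases = candidate_phrases(text)
    word_freq = Counter(w for p in phrases for w in p)
    scores = {" ".join(p): sum(word_freq[w] for w in p) for p in set(phrases)}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_k]
```

Because the tokenizer and stopword list are English-only, applying the same idea to Italian would require language-specific resources, which is the kind of cross-language issue the chapter sets out to evaluate.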


2017 ◽  
Vol 11 (03) ◽  
pp. 345-371
Author(s):  
Avani Chandurkar ◽  
Ajay Bansal

Since the inception of the World Wide Web, the amount of data present on the Internet has become tremendous. This makes navigating through this enormous amount of data quite difficult for the user. As users struggle to navigate through this wealth of information, the need for an automated system that can extract the required information becomes urgent. This paper presents a Question Answering system to ease the process of information retrieval. Question Answering systems have been around for quite some time and are a sub-field of information retrieval and natural language processing. The task of any Question Answering system is to seek an answer to a free-form factual question. The difficulty of pinpointing and verifying the precise answer makes question answering more challenging than the simple information retrieval done by search engines. The research objective of this paper is to develop a novel approach to Question Answering based on a composition of conventional approaches from Information Retrieval (IR) and Natural Language Processing (NLP). The focus is on using a structured and annotated knowledge base instead of an unstructured one. The knowledge base used here is DBpedia, and the final system is evaluated on the Text REtrieval Conference (TREC) 2004 questions dataset.


Terminology ◽  
2004 ◽  
Vol 10 (1) ◽  
pp. 101-130 ◽  
Author(s):  
Michael Carl ◽  
Ecaterina Rascu ◽  
Johann Haller ◽  
Philippe Langlais

Term variation is an important issue in various applications of natural language processing (NLP) such as machine translation, information retrieval and text indexing. In this paper, we describe an ‘Abductive Terminological Database’ (ATDB) aiming to detect translations of terms and their variants in bilingual texts. We describe abduction as the process to infer specific term translation templates from multiple resources which have been induced from a bilingual text. We show that precision and recall of the ATDB increase when using more resources and when the resources interfere in a less restricted way. We discuss a way to feed back evaluation values into the induced resources thus allowing for weighted abduction which further enhances the precision of the tool.
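
For reference, the precision and recall mentioned above are conventionally defined as follows, where TP, FP and FN count the correct, spurious and missed detections. Applying them to detected term translations is an assumption here, since the abstract does not spell out its evaluation setup.

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}
\]
```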


Author(s):  
Wided Bakari ◽  
Patrice Bellot ◽  
Mahmoud Neji

With the development of electronic media and the heterogeneity of Arabic data on the Web, the idea of building a clean corpus for certain natural language processing applications, including machine translation, information retrieval, and question answering, has become more and more pressing. In this manuscript, we seek to create and develop our own corpus of question-text pairs. This corpus will then provide a better basis for our experimentation step. We model its construction with a method for Arabic that retrieves texts from the web that could prove to be answers to our factual questions. To do this, we had to develop a java script that can extract a list of HTML pages from a given query. These pages are then cleaned to obtain a database of texts and a corpus of question-text pairs. In addition, we give preliminary results of our proposed method. Some investigations into the construction of Arabic corpora are also presented in this document.


2014 ◽  
Vol 4 (3) ◽  
pp. 14-33 ◽  
Author(s):  
Vaishali Singh ◽  
Sanjay K. Dwivedi

With the huge amount of data available on the web, it has turned out to be a fertile area for Question Answering (QA) research. Question answering, an instance of information retrieval research, is at the crossroads of several research communities, such as machine learning, statistical learning, natural language processing and pattern learning. In this paper, the authors survey research in the area of question answering from the perspectives of NLP, machine learning, statistical learning and pattern learning. They then situate some of the prominent QA systems with respect to these perspectives and present a comparative study on the basis of question types.

