Events Automatic Extraction from Arabic Texts

The event extraction task consists in determining and classifying events within an open-domain text. It is very new for the Arabic language, whereas it attained its maturity for some languages such as English and French. Events extraction was also proved to help Natural Language Processing tasks such as Information Retrieval and Question Answering, text mining, machine translation etc… to obtain a higher performance. In this article, we present an ongoing effort to build a system for event extraction from Arabic texts using Gate platform and other tools.

Download Full-text

BUILD KNOWLEDGE GRAPH FROM HETEROGENEOUS DOCUMENTS

Journal of Science and Technology - IUH ◽

10.46242/jst-iuh.v47i05.761 ◽

2021 ◽

Vol 47 (05) ◽

Author(s):

NGUYỄN CHÍ HIẾU

Keyword(s):

Information Retrieval ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Question Answering ◽

Semantic Analysis ◽

Knowledge Graph ◽

Question Answering Systems ◽

Knowledge Graphs

Knowledge Graphs are applied in many fields such as search engines, semantic analysis, and question answering in recent years. However, there are many obstacles for building knowledge graphs as methodologies, data and tools. This paper introduces a novel methodology to build knowledge graph from heterogeneous documents. We use the methodologies of Natural Language Processing and deep learning to build this graph. The knowledge graph can use in Question answering systems and Information retrieval especially in Computing domain

Download Full-text

LIS4: Lesk Inspired Sense Specific Semantic Similarity using WordNet

Journal of Information & Knowledge Management ◽

10.1142/s0219649221500064 ◽

2021 ◽

pp. 2150006

Author(s):

Saravanakumar Kandasamy ◽

Aswani Kumar Cherukuri

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Semantic Similarity ◽

Language Processing ◽

Gold Standard ◽

Question Answering ◽

Knowledge Based ◽

Benchmark Datasets ◽

Processing Information

Semantic similarity quantification between concepts is one of the inevitable parts in domains like Natural Language Processing, Information Retrieval, Question Answering, etc. to understand the text and their relationships better. Last few decades, many measures have been proposed by incorporating various corpus-based and knowledge-based resources. WordNet and Wikipedia are two of the Knowledge-based resources. The contribution of WordNet in the above said domain is enormous due to its richness in defining a word and all of its relationship with others. In this paper, we proposed an approach to quantify the similarity between concepts that exploits the synsets and the gloss definitions of different concepts using WordNet. Our method considers the gloss definitions, contextual words that are helping in defining a word, synsets of contextual word and the confidence of occurrence of a word in other word’s definition for calculating the similarity. The evaluation based on different gold standard benchmark datasets shows the efficiency of our system in comparison with other existing taxonomical and definitional measures.

Download Full-text

Building Graph for Events and Time in Natural Language Text

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8419.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 581-586

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Question Answering ◽

Relation Extraction ◽

Event Extraction ◽

Event Time ◽

Time Graph ◽

Question Answering Systems

Events and time are two major key terms in natural language processing due to the various event-oriented tasks these are become an essential terms in information extraction. In natural language processing and information extraction or retrieval event and time leads to several applications like text summaries, documents summaries, and question answering systems. In this paper, we present events-time graph as a new way of construction for event-time based information from text. In this event-time graph nodes are events, whereas edges represent the temporal and co-reference relations between events. In many of the previous researches of natural language processing mainly individually focused on extraction tasks and in domain-specific way but in this work we present extraction and representation of the relationship between events- time by representing with event time graph construction. Our overall system construction is in three-step process that performs event extraction, time extraction, and representing relation extraction. Each step is at a performance level comparable with the state of the art. We present Event extraction on MUC data corpus annotated with events mentions on which we train and evaluate our model. Next, we present time extraction the model of times tested for several news articles from Wikipedia corpus. Next is to represent event time relation by representation by next constructing event time graphs. Finally, we evaluate the overall quality of event graphs with the evaluation metrics and conclude the observations of the entire work

Download Full-text

Automatic NLP for Competitive Intelligence

Emerging Technologies of Text Mining ◽

10.4018/978-1-59904-373-9.ch003 ◽

2008 ◽

pp. 54-76 ◽

Cited By ~ 1

Author(s):

Christian Aranha ◽

Emmanuel Passos

Keyword(s):

Data Mining ◽

Information Retrieval ◽

Natural Language Processing ◽

Text Mining ◽

Language Processing ◽

Entity Recognition ◽

Competitive Intelligence ◽

Reference Process ◽

Processing Information ◽

Mining Algorithms

This chapter integrates elements from Natural Language Processing, Information Retrieval, Data Mining and Text Mining to support competitive intelligence. It shows how text mining algorithms can attend to three important functionalities of CI: Filtering, Event Alerts and Search. Each of them can be mapped as a different pipeline of NLP tasks. The chapter goes in-depth in NLP techniques like spelling correction, stemming, augmenting, normalization, entity recognition, entity classification, acronyms and co-reference process. Each of them must be used in a specific moment to do a specific job. All these jobs will be integrated in a whole system. These will be ‘assembled’ in a manner specific to each application. The reader’s better understanding of the theories of NLP provided herein will result in a better ´assembly´.

Download Full-text

Unsupervised Automatic Keyphrases Extraction on Italian Datasets

Encyclopedia of Information Science and Technology, Fifth Edition - Advances in Information Quality and Management ◽

10.4018/978-1-7998-3479-3.ch009 ◽

2021 ◽

pp. 107-126

Author(s):

Isabella Gagliardi ◽

Maria Teresa Artese

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Language Processing ◽

Important Research ◽

Keyphrase Extraction ◽

Research Activity ◽

Unsupervised Methods

Keyword/keyphrase extraction is an important research activity in text mining, natural language processing, and information retrieval. A large number of algorithms, divided into supervised or unsupervised methods, have been designed and developed to solve the problem of automatic keyphrases extraction. The aim of the chapter is to critically discuss the unsupervised automatic keyphrases extraction algorithms, analyzing in depth their characteristics. The methods presented will be tested on different datasets, presenting in detail the data, the algorithms, and the different options tested in the runs. Moreover, most of the studies and experiments have been conducted on texts in English, while there are few experiments concerning other languages, such as Italian. Particular attention will be paid to the evaluation of the results of the methods in two different languages, English, and Italian.

Download Full-text

A Composite Natural Language Processing and Information Retrieval Approach to Question Answering Using a Structured Knowledge Base

International Journal of Semantic Computing ◽

10.1142/s1793351x17400141 ◽

2017 ◽

Vol 11 (03) ◽

pp. 345-371

Author(s):

Avani Chandurkar ◽

Ajay Bansal

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Knowledge Base ◽

Language Processing ◽

Question Answering ◽

Automated System ◽

Free Form ◽

Question Answering System ◽

Novel Approach

With the inception of the World Wide Web, the amount of data present on the Internet is tremendous. This makes the task of navigating through this enormous amount of data quite difficult for the user. As users struggle to navigate through this wealth of information, the need for the development of an automated system that can extract the required information becomes urgent. This paper presents a Question Answering system to ease the process of information retrieval. Question Answering systems have been around for quite some time and are a sub-field of information retrieval and natural language processing. The task of any Question Answering system is to seek an answer to a free form factual question. The difficulty of pinpointing and verifying the precise answer makes question answering more challenging than simple information retrieval done by search engines. The research objective of this paper is to develop a novel approach to Question Answering based on a composition of conventional approaches of Information Retrieval (IR) and Natural Language processing (NLP). The focus is on using a structured and annotated knowledge base instead of an unstructured one. The knowledge base used here is DBpedia and the final system is evaluated on the Text REtrieval Conference (TREC) 2004 questions dataset.

Download Full-text

Abducing term variant translations in aligned texts

Terminology ◽

10.1075/term.10.1.06car ◽

2004 ◽

Vol 10 (1) ◽

pp. 101-130 ◽

Cited By ~ 2

Author(s):

Michael Carl ◽

Ecaterina Rascu ◽

Johann Haller ◽

Philippe Langlais

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Text Indexing ◽

Multiple Resources ◽

Specific Term ◽

Term Variation ◽

Bilingual Text

Term variation is an important issue in various applications of natural language processing (NLP) such as machine translation, information retrieval and text indexing. In this paper, we describe an ‘Abductive Terminological Database’ (ATDB) aiming to detect translations of terms and their variants in bilingual texts. We describe abduction as the process to infer specific term translation templates from multiple resources which have been induced from a bilingual text. We show that precision and recall of the ATDB increase when using more resources and when the resources interfere in a less restricted way. We discuss a way to feed back evaluation values into the induced resources thus allowing for weighted abduction which further enhances the precision of the tool.

Download Full-text

A Preliminary Study for Building an Arabic Corpus of Pair Questions-texts from the Web: AQA-WebCorp

International Journal of Recent Contributions from Engineering Science & IT (iJES) ◽

10.3991/ijes.v4i2.5345 ◽

2016 ◽

Vol 4 (2) ◽

pp. 38

Author(s):

Wided Bakari ◽

Patrice Bellot ◽

Mahmoud Neji

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Data Base ◽

Machine Translation ◽

Language Processing ◽

Electronic Media ◽

Preliminary Results ◽

Preliminary Study ◽

The Web

With the development of electronic media and the heterogeneity of Arabic data on the Web, the idea of building a clean corpus for certain applications of natural language processing, including machine translation, information retrieval, question answer, become more and more pressing. In this manuscript, we seek to create and develop our own corpus of pair’s questions-texts. This constitution then will provide a better base for our experimentation step. Thus, we try to model this constitution by a method for Arabic insofar as it recovers texts from the web that could prove to be answers to our factual questions. To do this, we had to develop a java script that can extract from a given query a list of html pages. Then clean these pages to the extent of having a data base of texts and a corpus of pair’s question-texts. In addition, we give preliminary results of our proposal method. Some investigations for the construction of Arabic corpus are also presented in this document.

Download Full-text

Question Answering

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2014070102 ◽

2014 ◽

Vol 4 (3) ◽

pp. 14-33 ◽

Cited By ~ 1

Author(s):

Vaishali Singh ◽

Sanjay K. Dwivedi

Keyword(s):

Machine Learning ◽

Information Retrieval ◽

Natural Language Processing ◽

Statistical Learning ◽

Language Processing ◽

Question Answering ◽

Research Question ◽

Pattern Learning ◽

Huge Amount ◽

Fertile Area

With the huge amount of data available on web, it has turned out to be a fertile area for Question Answering (QA) research. Question answering, an instance of information retrieval research is at the cross road from several research communities such as, machine learning, statistical learning, natural language processing and pattern learning. In this paper, the authors survey the research in area of question answering with respect to different prospects of NLP, machine learning, statistical learning and pattern learning. Then they situate some of the prominent QA systems concerning these prospects and present a comparative study on the basis of question types.

Download Full-text