Arabic open information extraction system using dependency parsing


Adapting Open Information Extraction to Domain-Specific Relations

AI Magazine ◽

10.1609/aimag.v31i3.2305 ◽

2010 ◽

Vol 31 (3) ◽

pp. 93 ◽

Cited By ~ 23

Author(s):

Stephen Soderland ◽

Brendan Roof ◽

Bo Qin ◽

Shi Xu ◽

Mausam ◽

...

Keyword(s):

Information Extraction ◽

Question Answering ◽

Free Text ◽

New Paradigm ◽

Target Domain ◽

Domain Specific ◽

Text Corpora ◽

Open Information Extraction ◽

Training Examples ◽

Domain Independent

Information extraction (IE) can identify a set of relations from free text to support question answering (QA). Until recently, IE systems were domain-specific and needed a combination of manual engineering and supervised learning to adapt to each target domain. A new paradigm, Open IE, operates on large text corpora without any manual tagging of relations, and indeed without any pre-specified relations. Due to its open-domain and open-relation nature, Open IE is purely textual and is unable to relate surface forms to an ontology, even if one is known in advance. We explore the steps needed to adapt Open IE to a domain-specific ontology and demonstrate our approach of mapping domain-independent tuples to an ontology using domains from DARPA’s Machine Reading Project. Our system achieves precision over 0.90 from as few as 8 training examples for an NFL-scoring domain.
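The idea of mapping domain-independent tuples to an ontology can be illustrated with a minimal Python sketch. The abstract reports learning from a handful of training examples; the hand-written rule table below is only an illustrative stand-in. The names `OpenTuple`, `map_to_ontology`, the trigger words, and the relation labels are all hypothetical and do not come from the paper.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class OpenTuple(NamedTuple):
    """A domain-independent Open IE tuple: (argument, relation phrase, argument)."""
    arg1: str
    rel: str
    arg2: str


def map_to_ontology(
    t: OpenTuple, rules: Dict[str, str]
) -> Optional[Tuple[str, str, str]]:
    """Return (ontology_relation, arg1, arg2) if a trigger word occurs
    in the relation phrase, otherwise None.

    `rules` maps a lowercase trigger word to an ontology relation name.
    Both sides of this mapping are hypothetical examples.
    """
    words = t.rel.lower().split()
    for trigger, relation in rules.items():
        if trigger in words:
            return (relation, t.arg1, t.arg2)
    return None


# Hypothetical rule table for a sports-scoring domain (assumed, not from the paper).
RULES = {"scored": "Touchdown", "kicked": "FieldGoal"}

mapped = map_to_ontology(OpenTuple("Smith", "scored on", "a 3-yard run"), RULES)
unmapped = map_to_ontology(OpenTuple("Smith", "was born in", "Ohio"), RULES)
```

Tuples whose relation phrase matches no rule are left unmapped rather than forced into the ontology, which is one simple way to favour precision.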

Relation Extraction With Clause-Based Open Information Extraction

10.32920/17303840.v1 ◽

2021 ◽

Author(s):

Duc Thuan Vo

Keyword(s):

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Question Answering ◽

Relation Extraction ◽

Linguistic Knowledge ◽

Dependency Parsing ◽

Grammatical Structure ◽

Open Information Extraction ◽

Wide Range

Information Extraction (IE) is one of the challenging tasks in natural language processing. The goal of relation extraction is to discover the relevant segments of information in large numbers of textual documents such that they can be used for structuring data. IE aims at discovering various semantic relations in natural language text and has a wide range of applications such as question answering, information retrieval, and knowledge presentation, among others. This thesis proposes approaches for relation extraction with clause-based Open Information Extraction that use linguistic knowledge to capture a variety of information, including semantic concepts, words, POS tags, shallow and full syntax, and dependency parsing in rich syntactic and semantic structures. Among the many Open Information Extraction systems that use syntactic and dependency parsing to detect relations, incoherent and uninformative relation extractions can still be found. The extracted relations can be erroneous at times and fail to have a meaningful interpretation. As such, we first propose refinements to the grammatical structure of syntactic and dependency parsing with clause structures and clause types in an effort to generate propositions that can be deemed meaningful extractable relations. Second, considering that choosing the most efficient seeds is pivotal to the success of the bootstrapping process when extracting relations, we propose an extended clause-based pattern extraction method with self-training for unsupervised relation extraction. The proposed self-training algorithm relies on the clause-based approach to extract a small set of seed instances in order to identify and derive new patterns. Third, we employ matrix factorization and collaborative filtering for relation extraction. To avoid the need for manually predefined schemas, we employ the notion of universal schemas, formed as a collection of patterns derived from Open Information Extraction tools as well as from relation schemas of pre-existing datasets. While previous systems have trained relations only for entities, we exploit advanced features from relation characteristics such as clause types and semantic topics for predicting new relation instances. Finally, we present an event network representation for temporal and causal event relation extraction that benefits from existing Open IE systems to generate a set of triple relations that are then used to build an event network. The event network is bootstrapped by labeling the temporal and causal disposition of events that are directly linked to each other. The event network can then be systematically traversed to identify temporal and causal relations between indirectly connected events.
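The last step described above, traversing the event network to relate indirectly connected events, can be illustrated with a small Python sketch. It assumes, for illustration only, that the network is reduced to directed "before" edges between directly linked events and that it contains no cycles. The function name `infer_before` and the sample events are hypothetical and are not taken from the thesis.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def infer_before(edges: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Given (earlier, later) pairs for directly linked events, return every
    (a, b) pair such that a precedes b, including indirect pairs.

    Assumes an acyclic network; this is a plain breadth-first reachability
    search, not the labeling procedure of the thesis.
    """
    graph: Dict[str, Set[str]] = {}
    for earlier, later in edges:
        graph.setdefault(earlier, set()).add(later)

    inferred: Set[Tuple[str, str]] = set()
    for start in graph:
        seen: Set[str] = set()
        queue = deque(graph[start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            inferred.add((start, node))
            # Follow outgoing edges to reach indirectly connected events.
            queue.extend(graph.get(node, ()))
    return inferred


# Hypothetical directly linked events.
direct = [("earthquake", "tsunami"), ("tsunami", "evacuation")]
relations = infer_before(direct)
```

Here the pair `("earthquake", "evacuation")` is not given directly but is recovered by following the chain through `"tsunami"`.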

The Use of Stemming in the Arabic Text and Its Impact on the Accuracy of Classification

Scientific Programming ◽

10.1155/2021/1367210 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Jaffar Atwan ◽

Mohammad Wedyan ◽

Qusay Bsoul ◽

Ahmad Hammadeen ◽

Ryan Alturki

Keyword(s):

Decision Tree ◽

High Performance ◽

Nearest Neighbor ◽

Arabic Language ◽

English Text ◽

Complex Nature ◽

Arabic Text ◽

K Nearest Neighbor ◽

The Internet Of Things

The ongoing growth in the amount of digital documents and other data in the Arabic language available online has increased the need for classification methods that can deal with the complex nature of such data. The classification of Arabic text plays a large and important role in many modern applications and intersects with other fields, from search engines to the Internet of Things. However, high-performance approaches to Arabic classification remain largely insufficient for the huge quantities of Arabic documents; while some work has been carried out on the classification of Arabic text, most research has focused on English text. The methods proposed for English are not suitable for Arabic because the morphology of the two languages differs substantially. Moreover, from a morphological standpoint, the preprocessing of Arabic text is a particularly challenging task. In this study, three commonly used classification algorithms, namely K-nearest neighbor, Naïve Bayes, and decision tree, were implemented for Arabic text in order to assess their effectiveness with and without the use of a light stemmer in the preprocessing phase. In the experiment, a dataset from the Agence France-Presse (AFP) Arabic Newswire 2001 corpus consisting of four categories and 800 files was classified using the three classifiers. The results showed that the decision tree with the light stemmer had the best classification accuracy, at 93%.
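As a rough illustration of what a light stemmer does in the preprocessing phase, the Python sketch below strips a small set of common Arabic prefixes and suffixes without attempting to find the root. The affix lists, the minimum-length rule, and the name `light_stem` are assumptions made for this example; they are not the stemmer used in the study.

```python
# Simplified affix lists (assumed for illustration, not the study's stemmer).
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")


def light_stem(word: str, min_len: int = 3) -> str:
    """Strip at most one prefix, then strip suffixes repeatedly,
    never leaving fewer than `min_len` characters."""
    # Try longer prefixes first so that "وال" wins over "و".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break

    # Remove suffixes until none applies.
    changed = True
    while changed:
        changed = False
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


stem_libraries = light_stem("المكتبات")  # "the libraries"
stem_and_book = light_stem("والكتاب")    # "and the book"
```

In a pipeline like the one the abstract describes, each token would pass through such a function before feature extraction, so that inflected forms of a word map to the same feature for the classifier.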

A survey of Arabic text classification models

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v8i1.pp25-28 ◽

2019 ◽

Vol 8 (1) ◽

pp. 25

Author(s):

Ahed M. F. Al Sbou

Keyword(s):

Language Processing ◽

Text Classification ◽

Arabic Language ◽

Arabic Text ◽

Classification Models ◽

Natural Languages ◽

Text Organization ◽

Arabic Text Classification ◽

Arabic Language Processing

A large volume of Arabic text is available online and needs to be organized. As a result, many natural language processing (NLP) applications are concerned with text organization. One of them is text classification (TC), which makes unorganized text easier to handle by assigning it to suitable classes or labels. This paper is a survey of Arabic text classification. It also compares different methods for classifying Arabic texts, which are considered complex because of their vocabulary. Arabic is one of the richest languages in the world and has many linguistic bases, yet research in Arabic language processing is scarce compared with English. These problems pose challenges for the classification and organization of specific Arabic texts. TC helps users access documents or information that have already been classified into one or more classes or categories. In addition, classifying documents helps a search engine reduce the number of documents it must consider, making it easier to search and match against queries.

Relation Extraction With Clause-Based Open Information Extraction

10.32920/17303840 ◽

2021 ◽

Author(s):

Duc Thuan Vo

Keyword(s):

Natural Language ◽

Information Extraction ◽

Language Processing ◽

Question Answering ◽

Relation Extraction ◽

Linguistic Knowledge ◽

Dependency Parsing ◽

Grammatical Structure ◽

Open Information Extraction ◽

Wide Range


Task-Oriented Evaluation of Dependency Parsing with Open Information Extraction

Lecture Notes in Computer Science - Computational Processing of the Portuguese Language ◽

10.1007/978-3-319-99722-3_8 ◽

2018 ◽

pp. 77-82

Author(s):

Pablo Gamallo ◽

Marcos Garcia

Keyword(s):

Information Extraction ◽

Dependency Parsing ◽

Open Information Extraction ◽

Task Oriented

Taʾwīl and the Meaning of the Text in the Arabic Language: An Introduction to Understanding the Qur'an

Journal of Qur'anic Studies ◽

10.3366/jqs.2010.0119 ◽

2010 ◽

Vol 12 (1-2) ◽

pp. 337-314

Author(s):

ʿAbd Allāh Muḥammad al-Shāmī

Keyword(s):

Figurative Language ◽

Arabic Language ◽

Literary Text ◽

Arabic Text ◽

Comprehensive Understanding ◽

Original Meaning ◽

High Literature

The question of clarifying the meaning of a given Arabic text is a subtle one, especially as high literature texts can often be read in more than one way. Arabic is rich in figurative language and this can lead to variety in meaning, sometimes in ways that either adhere closely or diverge far from the ‘original’ meaning. In order to understand a fine literary text in Arabic, one must have a comprehensive understanding of the issue of taʾwīl, and the concept that multiplicity of meaning does not necessarily lead to contradiction. This article surveys the opinions of various literary critics and scholars of balāgha on this issue with a brief discussion of the concepts of tafsīr and sharḥ, which sometimes overlap with taʾwīl.

Qudiyyatu Wuqu' al-Alfaz al-A'jamiyyat fi al-Qur'an al-Karim

SUHUF ◽

10.22548/shf.v2i1.95 ◽

2015 ◽

Vol 2 (1) ◽

pp. 11-29

Author(s):

Ahmad Akrom Malibary

Keyword(s):

Arabic Language ◽

Semitic Language ◽

Muslim Scholars

Do non-Arabic words exist in the Qur'an? Muslim scholars hold two opinions on this matter: some reject the claim that there are non-Arabic words in the Qur'an, and some accept it. Each group has its own argument. Nevertheless, the strongest argument is that there are some non-Arabic words in the Qur'an, considering that some of those words originated outside the Arabic language but have been absorbed and treated as Arabic words. It is not impossible that those words come from a Semitic language and were absorbed by some of the Semitic branches, of which Arabic is one.
