open information extraction
Recently Published Documents



Author(s):  
Sally Mohamed Ali El-Morsy ◽  
Mahmoud Hussein ◽  
Hamdy M. Mousa

Arabic is a Semitic language and one of the natural languages most distinguished by the richness of its morphological inflection and derivation. This complex nature makes extracting information from Arabic text difficult and leaves constant room for improvement. Open information extraction (OIE) systems have emerged and been used for different languages, especially English. However, they have hardly been applied to Arabic. Accordingly, this paper introduces an OIE system that extracts relation tuples from Arabic web text, exploiting Arabic dependency parsing and carefully considering all possible text relations. The identities of the corresponding clause types are established based on the clause types' propositions as extractable relations and on the constituents' grammatical functions. The proposed system, named Arabic open information extraction (AOIE), can extract Arabic text relations in a highly scalable manner while remaining domain independent. The implementation handles the problem using supervised strategies, while the system relies on unsupervised extraction strategies. The system has also been applied in several domains so that extraction is not confined to a specific field. The results show that the system achieves high efficiency in extracting clauses from large amounts of text.
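As a rough illustration of clause-based extraction over a dependency parse, the sketch below emits (subject, relation, object) tuples for simple subject-verb-object clauses. It is not the AOIE system: the `Token` type, the `extract_triples` helper, the Universal Dependencies-style labels (`nsubj`, `obj`) and the hand-written English parse are all assumptions made for the example, and a real system would take its parse from an Arabic dependency parser.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    text: str    # surface form
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency label (UD-style names, assumed here)


def extract_triples(tokens):
    """Return (subject, relation, object) tuples for simple SVO clauses."""
    # Group every token under its head so each governor's children are easy to scan.
    children = defaultdict(list)
    for tok in tokens:
        children[tok.head].append(tok)

    triples = []
    for tok in tokens:
        subjects = [c for c in children[tok.idx] if c.deprel == "nsubj"]
        objects = [c for c in children[tok.idx] if c.deprel == "obj"]
        # A token that governs both a subject and an object is treated as the relation.
        for subj in subjects:
            for obj in objects:
                triples.append((subj.text, tok.text, obj.text))
    return triples


# Hand-written toy parse (hypothetical data, not parser output).
SENTENCE = [
    Token(1, "Researchers", 2, "nsubj"),
    Token(2, "built", 0, "root"),
    Token(3, "parsers", 2, "obj"),
]
TRIPLES = extract_triples(SENTENCE)
```

Only one clause type is covered here; other clause types (for example, clauses with complements or adverbials) would need their own rules.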


Data ◽  
2021 ◽  
Vol 7 (1) ◽  
pp. 3
Author(s):  
Nikolaos Panagiotou ◽  
Antonia Saravanou ◽  
Dimitrios Gunopulos

News articles generated by online media are a major source of information. In this work, we present News Monitor, a framework that automatically collects news articles from a wide variety of online news portals and performs various analysis tasks. The framework first identifies fresh news (first stories) and clusters articles about the same incidents. For every story, it first extracts all of the corresponding triples and then creates a knowledge base (KB) using open information extraction techniques. This knowledge base is then used to create a summary for the user. News Monitor lets users use it as a search engine, ask questions in natural language and receive answers produced by the state-of-the-art framework BERT. In addition, News Monitor crawls the Twitter stream using a dynamic set of “trending” keywords in order to retrieve all messages relevant to the news. The framework is distributed, online and performs analysis in real time. According to the evaluation results, the fake news detection techniques utilized by News Monitor achieve an F-measure of 82% in the rumor identification task and an accuracy of 92% in the stance detection task. The major contribution of this work can be summarized as a novel real-time and scalable architecture that combines various effective techniques under a news analysis framework.
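The abstract does not say how first stories are detected or how articles are clustered. One simple possibility, sketched below purely as an assumption, is a single-pass clustering of headlines by Jaccard token overlap, where a headline that matches no existing cluster is treated as a first story. The function names, the threshold of 0.3 and the sample headlines are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_stream(headlines, threshold=0.3):
    """Assign each headline to the most similar cluster or open a new one."""
    clusters = []       # each cluster: {"tokens": set of words, "items": list of headlines}
    first_stories = []  # headlines that opened a new cluster
    for headline in headlines:
        tokens = set(headline.lower().split())
        best = max(clusters, key=lambda c: jaccard(tokens, c["tokens"]), default=None)
        if best is not None and jaccard(tokens, best["tokens"]) >= threshold:
            best["items"].append(headline)
            best["tokens"] |= tokens  # grow the cluster vocabulary
        else:
            clusters.append({"tokens": tokens, "items": [headline]})
            first_stories.append(headline)
    return clusters, first_stories


# Hypothetical sample stream.
HEADLINES = [
    "Storm hits coastal city",
    "Coastal city storm damage grows",
    "Parliament passes budget bill",
]
CLUSTERS, FIRST = cluster_stream(HEADLINES)
```

With these inputs the second headline joins the first headline's cluster, while the third opens a new one.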


2021 ◽  
Author(s):  
Duc Thuan Vo

Information Extraction (IE) is one of the challenging tasks in natural language processing. The goal of relation extraction is to discover the relevant segments of information in large numbers of textual documents so that they can be used for structuring data. IE aims at discovering various semantic relations in natural language text and has a wide range of applications such as question answering, information retrieval and knowledge presentation, among others. This thesis proposes approaches for relation extraction with clause-based Open Information Extraction that use linguistic knowledge to capture a variety of information, including semantic concepts, words, POS tags, shallow and full syntax, and dependency parsing in rich syntactic and semantic structures.
Among the many Open Information Extraction systems that rely on syntactic and dependency parsing to detect relations, incoherent and uninformative extractions can still be found. The extracted relations can be erroneous at times and fail to have a meaningful interpretation. As such, we first propose refinements to the grammatical structure of syntactic and dependency parsing with clause structures and clause types, in an effort to generate propositions that can be deemed meaningful extractable relations. Second, considering that choosing the most efficient seeds is pivotal to the success of the bootstrapping process when extracting relations, we propose an extended clause-based pattern extraction method with self-training for unsupervised relation extraction. The proposed self-training algorithm relies on the clause-based approach to extract a small set of seed instances in order to identify and derive new patterns. Third, we employ matrix factorization and collaborative filtering for relation extraction. To avoid the need for manually predefined schemas, we employ the notion of universal schemas, formed as a collection of patterns derived from Open Information Extraction tools as well as from relation schemas of pre-existing datasets. While previous systems have trained relations only for entities, we exploit advanced features from relation characteristics, such as clause types and semantic topics, for predicting new relation instances. Finally, we present an event network representation for temporal and causal event relation extraction that uses existing Open IE systems to generate a set of triple relations, which are then used to build an event network. The event network is bootstrapped by labeling the temporal and causal disposition of events that are directly linked to each other. The event network can then be systematically traversed to identify temporal and causal relations between indirectly connected events.
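The last step described above, traversing an event network to relate indirectly connected events, can be pictured with a small graph sketch. The abstract does not specify the traversal, so the breadth-first reachability used here is an assumption, as are the `indirect_before` helper, the single "before" relation and the sample events.

```python
from collections import deque


def indirect_before(edges):
    """Infer 'before' relations between events linked only through other events.

    `edges` holds (earlier, later) pairs for directly linked events.
    """
    graph = {}
    for earlier, later in edges:
        graph.setdefault(earlier, set()).add(later)
        graph.setdefault(later, set())

    inferred = set()
    for start in graph:
        # Breadth-first search over everything reachable from `start`.
        seen = set()
        queue = deque(graph[start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(graph[node])
        # Keep only pairs that were not already directly labeled.
        for node in seen:
            if node != start and node not in graph[start]:
                inferred.add((start, node))
    return inferred


# Hypothetical directly labeled temporal links.
DIRECT = [("earthquake", "evacuation"), ("evacuation", "rebuilding")]
INFERRED = indirect_before(DIRECT)
```

Here the two direct links yield one inferred relation, that the earthquake precedes the rebuilding; causal links could be handled with a second edge type in the same way.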


2021 ◽  
Author(s):  
Vinicius dos Santos ◽  
Patrick R. Silva ◽  
Erica F. Souza ◽  
Katia. R. Felizardo ◽  
Willian M. Watanabe ◽  
...  

Author(s):  
Bowen Yu ◽  
Zhenyu Zhang ◽  
Jiawei Sheng ◽  
Tingwen Liu ◽  
Yubin Wang ◽  
...  

Author(s):  
Zhongguo Yang ◽  
Mingzhu Zhang ◽  
Zhongmei Zhang ◽  
Han Li ◽  
Chen Liu ◽  
...  

Information services remain a hot topic, especially now that the Web is accessible anywhere. In universities, lecture information is very important for students and teachers who want to take part in academic meetings, which makes lecture news extraction an important and pressing task. Many open information extraction methods have been proposed, but the task remains challenging because websites are highly heterogeneous. In this paper, we propose a method that fuses multiple features to locate lecture news on university websites. These features include the link relationship between a parent webpage and its child webpages, visual similarity, and the semantics of webpages. Additionally, this paper provides an information service based on a main content extraction algorithm for extracting the lecture information. Stable and invariant features enable the proposed method to adapt to various kinds of campus websites. Experiments conducted on 50 websites show the effectiveness and efficiency of the provided service.
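How the three feature families are fused is not stated in the abstract. A weighted sum with a decision threshold is one common choice and is sketched below as an assumption; the `PageFeatures` type, the weights, the threshold and the example URLs are hypothetical, and each score is assumed to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class PageFeatures:
    url: str
    link_score: float      # parent/child link relationship, assumed in [0, 1]
    visual_score: float    # visual similarity to known lecture pages, assumed in [0, 1]
    semantic_score: float  # semantic match of the page text, assumed in [0, 1]


# Hypothetical weights; in practice they would be tuned on labeled pages.
WEIGHTS = {"link": 0.3, "visual": 0.3, "semantic": 0.4}


def fused_score(page):
    """Combine the three feature scores into one weighted score."""
    return (
        WEIGHTS["link"] * page.link_score
        + WEIGHTS["visual"] * page.visual_score
        + WEIGHTS["semantic"] * page.semantic_score
    )


def locate_lecture_pages(pages, threshold=0.5):
    """Return the URLs whose fused score reaches the threshold."""
    return [page.url for page in pages if fused_score(page) >= threshold]


PAGES = [
    PageFeatures("https://example.edu/news/lecture-1", 0.9, 0.8, 0.9),
    PageFeatures("https://example.edu/about", 0.2, 0.3, 0.1),
]
SELECTED = locate_lecture_pages(PAGES)
```

Keeping the weights in one dictionary makes it easy to check how much each feature family contributes when the method is moved to a different campus website.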

