Constructing Personal Knowledge Base: Automatic Key-Phrase Extraction from Multiple-Domain Web Pages

Author(s):  
Yin-Fu Huang ◽  
Cin-Siang Ciou
Author(s):  
Sheng Zhang ◽  
Qi Luo ◽  
Yukun Feng ◽  
Ke Ding ◽  
Daniela Gifu ◽  
...  

Background: TextRank, a well-known key phrase extraction algorithm, is an analogue of the PageRank algorithm that relies heavily on term-frequency statistics gathered through co-occurrence analysis. Objective: This frequency-based characteristic has become a bottleneck for performance improvement, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into the key phrase extraction algorithm. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embedding and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improved the accuracy of the TextRank algorithm. Results: When our method was applied to a self-made test set and a public test set, the results suggested that the proposed unsupervised key phrase extraction algorithm outperformed the other algorithms to some extent.
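The co-occurrence ranking that the abstract attributes to baseline TextRank can be illustrated with a minimal sketch. This is not WESIA and not the authors' code: the function name `textrank`, the window size, the damping factor, and the iteration count are all assumptions chosen for illustration, and the syntactic-tree and word-embedding components described above are omitted.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words by running PageRank on an undirected co-occurrence graph.

    All parameter defaults are illustrative assumptions, not values
    taken from the abstract.
    """
    # Link two distinct words when they appear within `window` tokens of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    # Iterate the PageRank update: each word shares its score equally
    # among its neighbours, damped toward a uniform baseline.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return scores


if __name__ == "__main__":
    text = "graph ranking uses graph structure and graph links"
    ranked = sorted(textrank(text.split()).items(), key=lambda kv: -kv[1])
    for word, score in ranked:
        print(f"{word}\t{score:.3f}")
```

Because the score depends only on how often and how widely a word co-occurs, frequent words dominate the ranking, which is the frequency-based limitation the abstract describes.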


Author(s):  
B Sathiya ◽  
T.V. Geetha

The prime textual sources used for ontology learning are a domain corpus and large, dynamic text from web pages. The first source is limited and possibly outdated, while the second is uncertain. To overcome these shortcomings, a novel ontology learning methodology is proposed that utilizes different sources of text, such as a corpus, web pages and the massive probabilistic knowledge base Probase, for effective automated construction of an ontology. Specifically, to discover taxonomical relations among the concepts of the ontology, a new web-page-based two-level semantic query formation methodology using lexical syntactic patterns (LSP) and a novel scoring measure, Fitness, built on Probase are proposed. Also, a syntactic and statistical measure called COS (Co-occurrence Strength) scoring, and Domain and Range-NTRD (Non-Taxonomical Relation Discovery) algorithms are proposed to accurately identify non-taxonomical relations (NTR) among concepts, using evidence from the corpus and web pages.
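As a rough illustration of how a lexical syntactic pattern can surface taxonomical (is-a) candidates, the sketch below matches a single "X such as A, B and C" pattern. The abstract does not list the patterns it uses, so this pattern, the function name `extract_is_a`, and the restriction to single-word concepts are all assumptions; the two-level query formation and the Probase-based Fitness score are not modelled.

```python
import re

# Hypothetical separator between listed hyponyms: ", ", ", and ", " and ", " or ".
_SEP = r"(?:, (?:and |or )?| and | or )"

# Hypothetical pattern: "<hypernym> such as <hyponym>(<sep><hyponym>)*".
_SUCH_AS = re.compile(rf"(\w+) such as (\w+(?:{_SEP}\w+)*)")


def extract_is_a(sentence):
    """Return (hyponym, hypernym) candidate pairs found by the 'such as' pattern."""
    pairs = []
    for match in _SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        for hyponym in re.split(_SEP, match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    for pair in extract_is_a("We use sources such as corpora, pages and Probase"):
        print(pair)
```

Pairs produced this way are only candidates; in the methodology described above they would still need to be scored before being accepted into the ontology.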


2018 ◽  
Vol 48 (3) ◽  
pp. 496-514 ◽  
Author(s):  
E. Laxmi Lydia ◽  
P. Krishna Kumar ◽  
K. Shankar ◽  
S. K. Lakshmanaprabu ◽  
R. M. Vidhyavathi ◽  
...  

SoftwareX ◽  
2020 ◽  
Vol 11 ◽  
pp. 100411 ◽  
Author(s):  
Aleksandr Yu. Yurin ◽  
Nikita O. Dorodnykh
