Constructing Personal Knowledge Base: Automatic Key-Phrase Extraction from Multiple-Domain Web Pages

Background: TextRank, a well-known key phrase extraction algorithm, is an analogue of the PageRank algorithm and relies heavily on term-frequency statistics gathered through co-occurrence analysis. Objective: This frequency-based characteristic has become a bottleneck for performance improvement, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into key phrase extraction and achieve better results. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embedding and proposed the Word Embedding and Syntactic Information Algorithm (WESIA), which improves the accuracy of the TextRank algorithm. Results: Applying our method to a self-made test set and a public test set showed that the proposed unsupervised key phrase extraction algorithm outperformed the other algorithms to some extent.
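
The abstract describes the TextRank baseline only as a PageRank analogue driven by co-occurrence statistics. The following is a minimal Python sketch of that baseline (not WESIA), under stated assumptions: a small hypothetical stopword list stands in for the part-of-speech filter of the original TextRank, and keywords are single words rather than merged phrases. Each word becomes a node, words co-occurring within `window` tokens are linked, and the PageRank-style update with damping factor 0.85 is iterated a fixed number of times. The function name `textrank_keywords` is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stopword list (assumption); the original TextRank instead
# keeps only nouns and adjectives via a part-of-speech filter.
STOPWORDS = {
    "a", "an", "and", "are", "as", "by", "for", "in", "is", "it",
    "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank single words with a TextRank-style co-occurrence graph."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

    # Undirected graph: link words that co-occur within `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style iteration: each node shares its score equally among
    # its neighbours; the comprehension reads the previous round's scores.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

For example, on the toy input "graph alpha graph beta graph gamma graph delta", `textrank_keywords(text, top_k=1)` returns `["graph"]`, because the hub word collects score from all four of its neighbours. The abstract says WESIA adds syntactic-tree and word-embedding information on top of this frequency-based ranking but does not specify how, so that step is not attempted here.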

Automatic Key-phrase Extraction to Support the Understanding of Infrastructure Disaster Resilience

2019. DOI: 10.22260/isarc2019/0171

Author(s): Xuan Lv, Syed Ahnaf Morshed, Lu Zhang

Keyword(s): Disaster Resilience, Phrase Extraction, Key Phrase Extraction

Automating Key Phrase Extraction from Fault Logs to Support Post-Inspection Repair of Software Requirements

14th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference), 2021. DOI: 10.1145/3452383.3452386

Author(s): Maninder Singh, Gursimran Walia

Keyword(s): Software Requirements, Phrase Extraction, Key Phrase Extraction

Automatic Ontology Learning from Multiple Knowledge Sources of Text

International Journal of Intelligent Information Technologies, Vol 14 (2), pp. 1-21, 2018. DOI: 10.4018/ijiit.2018040101. Cited by ~2.

Author(s): B Sathiya, T.V. Geetha

Keyword(s): Knowledge Base, Web Pages, Knowledge Sources, Statistical Measure, Ontology Learning, Web Page, Semantic Query, Probabilistic Knowledge, Discovery Algorithms, Different Sources

The primary textual sources used for ontology learning are a domain corpus and the large, dynamic text of web pages. The first source is limited and possibly outdated, while the second is uncertain. To overcome these shortcomings, a novel ontology learning methodology is proposed that uses different sources of text, namely a corpus, web pages, and the massive probabilistic knowledge base Probase, for effective automated ontology construction. Specifically, to discover taxonomical relations among the concepts of the ontology, a new web-page-based, two-level semantic query formation methodology using lexical syntactic patterns (LSP) and a novel scoring measure, Fitness, built on Probase, are proposed. In addition, a syntactic and statistical measure called COS (Co-occurrence Strength) scoring and the Domain and Range-NTRD (Non-Taxonomical Relation Discovery) algorithms are proposed to accurately identify non-taxonomical relations (NTR) among concepts, using evidence from the corpus and web pages.
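
The abstract mentions lexical syntactic patterns (LSP) for discovering taxonomical (is-a) relations but does not list them. As an assumption, the sketch below uses a few classic Hearst-style patterns ("X such as A, B and C", "X including A and B", "A, B and other X") over single-word terms. It is not the authors' pattern set, and it reproduces neither the two-level web query formation nor the Probase-based Fitness score. The names `extract_isa` and `split_terms` are hypothetical.

```python
import re

# A single-word term that is not the conjunction "and"/"or" (assumption:
# multi-word concepts are out of scope for this sketch).
TERM = r"(?!(?:and|or)\b)\w+"
TERM_LIST = rf"(?:{TERM}, )*{TERM}(?:,? (?:and|or) {TERM})?"

# Hearst-style patterns where the hypernym comes first.
HYPERNYM_FIRST = [
    re.compile(rf"\b(\w+),? such as ({TERM_LIST})"),
    re.compile(rf"\b(\w+),? including ({TERM_LIST})"),
]
# "A, B and other X": the hyponyms come first.
HYPONYMS_FIRST = re.compile(rf"\b((?:{TERM}, )*{TERM}),? (?:and|or) other (\w+)")


def split_terms(span):
    """Split 'a, b, and c' into ['a', 'b', 'c']."""
    return [t for t in re.split(r",?\s+(?:and|or)\s+|,\s*", span) if t]


def extract_isa(sentence):
    """Return candidate (hyponym, hypernym) pairs found in one sentence."""
    text = sentence.lower()
    pairs = []
    for pattern in HYPERNYM_FIRST:
        for m in pattern.finditer(text):
            pairs += [(term, m.group(1)) for term in split_terms(m.group(2))]
    for m in HYPONYMS_FIRST.finditer(text):
        pairs += [(term, m.group(2)) for term in split_terms(m.group(1))]
    return pairs
```

Each returned pair reads as (hyponym, hypernym); for instance, "Animals such as dogs, cats and horses." yields `("dogs", "animals")`, `("cats", "animals")` and `("horses", "animals")`. Pattern matches like these are only candidates; the abstract assigns the job of scoring such evidence to its Probase-based Fitness measure.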