Key phrase Extraction by Improving TextRank with an Integration of Word Embedding and Syntactic Information

Background: TextRank, a well-known key phrase extraction algorithm, is an analogue of the PageRank algorithm and relies heavily on term-frequency statistics gathered through co-occurrence analysis. Objective: This reliance on frequency is a bottleneck for performance, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into the key phrase extraction algorithm and achieve better results. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embedding and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improved the accuracy of the TextRank algorithm. Results: Applied to a self-made test set and a public test set, the proposed unsupervised key phrase extraction algorithm outperformed the other algorithms to some extent.
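For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of plain TextRank (PageRank over an undirected word co-occurrence graph), not the WESIA method itself: the abstract does not specify how the word embeddings and syntactic tree information enter the ranking. The function name `textrank` and the parameters `window`, `damping`, and `iterations` are assumptions chosen for illustration.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words with PageRank over an undirected co-occurrence graph.

    Illustrative baseline only; parameter names and defaults are assumptions.
    """
    # Link each word to the words that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the (unnormalized) PageRank update a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            # Each neighbor shares its score equally among its own neighbors.
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * rank
        scores = updated
    return scores


if __name__ == "__main__":
    text = "key phrase extraction ranks key phrase candidates using graph ranking"
    for word, score in sorted(textrank(text.split()).items(), key=lambda kv: -kv[1]):
        print(f"{word}: {score:.3f}")
```

Because the edges here depend only on co-occurrence, two words that never appear near each other contribute nothing to each other's score, which is the frequency-based limitation the abstract describes.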

Smart web content bookmarking with ANN based key phrase extraction algorithm

2014 14th International Conference on Advances in ICT for Emerging Regions (ICTer) ◽

10.1109/icter.2014.7083906 ◽

2014 ◽

Cited By ~ 1

Author(s):

B. M. Thosini Kumarika ◽

N. G. J. Dias

Keyword(s):

Web Content ◽

Phrase Extraction ◽

Extraction Algorithm ◽

Key Phrase Extraction

THE MECHANICS OF THE PRESENTATION MINING FRAMEWORK

Jurnal Teknologi ◽

10.11113/jt.v78.9545 ◽

2016 ◽

Vol 78 (8-2) ◽

Author(s):

Vinothini Kasinathan ◽

Masrah Azrifah Azmi Murad ◽

Rahmita Wirza Rahmat ◽

Evi Indriasari Mansor ◽

Aida Mustapha

Keyword(s):

Domain Ontology ◽

Extraction System ◽

Mining System ◽

The Core ◽

Phrase Extraction ◽

Extraction Algorithm ◽

Key Phrase Extraction ◽

Key Phrases ◽

Two Stages

This paper presents the mechanics of a presentation mining system that mines keywords and key phrases from a collection of PowerPoint slides and generates a mind map from the extracted words and phrases. The core of presentation mining lies in two stages: ranking the potential phrases, and extracting the keywords and key phrases. The keywords and key phrases form a mind map, which is then evaluated against a domain ontology. Recall and precision are also compared between KP-Miner, an existing key phrase extraction system, and the proposed presentation mining system. The key phrase extraction algorithm of the proposed system achieved higher recall and precision than KP-Miner, and hence produced a more accurate visualization of the PowerPoint slides in the form of a mind map.
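As a rough illustration of the two metrics used in this comparison, the sketch below computes precision and recall for a set of extracted phrases against a reference set. The function name `precision_recall`, the sample phrases, and the matching rule (exact match after lowercasing and trimming) are assumptions; the abstract does not state how extracted phrases were matched against the domain ontology.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for extracted phrases against a reference set.

    Matching is exact after lowercasing and trimming, which is an assumption.
    """
    extracted_set = {phrase.lower().strip() for phrase in extracted}
    reference_set = {phrase.lower().strip() for phrase in reference}
    matched = extracted_set & reference_set

    # Precision: share of extracted phrases that are in the reference set.
    precision = len(matched) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of reference phrases that were extracted.
    recall = len(matched) / len(reference_set) if reference_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical phrases, for illustration only.
    extracted = ["mind map", "ontology", "slide"]
    reference = ["mind map", "ontology", "key phrase", "ranking"]
    p, r = precision_recall(extracted, reference)
    print(f"precision={p:.2f} recall={r:.2f}")
```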

Automatic Key-phrase Extraction to Support the Understanding of Infrastructure Disaster Resilience

10.22260/isarc2019/0171 ◽

2019 ◽

Author(s):

Xuan Lv ◽

Syed Ahnaf Morshed ◽

Lu Zhang

Keyword(s):

Disaster Resilience ◽

Phrase Extraction ◽

Key Phrase Extraction

Automating Key Phrase Extraction from Fault Logs to Support Post-Inspection Repair of Software Requirements

14th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference) ◽

10.1145/3452383.3452386 ◽

2021 ◽

Author(s):

Maninder Singh ◽

Gursimran Walia

Keyword(s):

Software Requirements ◽

Phrase Extraction ◽

Key Phrase Extraction

An ontology-based approach for key phrase extraction

10.3115/1667583.1667639 ◽

2009 ◽

Cited By ~ 2

Author(s):

Chau Q. Nguyen ◽

Tuoi T. Phan

Keyword(s):

Phrase Extraction ◽

Key Phrase Extraction

Specializing Word Embeddings (for Parsing) by Information Bottleneck (Extended Abstract)

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/658 ◽

2020 ◽

Author(s):

Xiang Lisa Li ◽

Jason Eisner

Keyword(s):

Dimensionality Reduction ◽

Semantic Information ◽

State Of The Art ◽

Word Embedding ◽

Discrete Version ◽

Word Embeddings ◽

Continuous Version ◽

Continuous Vector ◽

Information Bottleneck ◽

Art Performance

Pre-trained word embeddings like ELMo and BERT contain rich syntactic and semantic information, resulting in state-of-the-art performance on various tasks. We propose a very fast variational information bottleneck (VIB) method to nonlinearly compress these embeddings, keeping only the information that helps a discriminative parser. We compress each word embedding to either a discrete tag or a continuous vector. In the discrete version, our automatically compressed tags form an alternative tag set: we show experimentally that our tags capture most of the information in traditional POS tag annotations, but our tag sequences can be parsed more accurately at the same level of tag granularity. In the continuous version, we show experimentally that moderately compressing the word embeddings by our method yields a more accurate parser in 8 of 9 languages, unlike simple dimensionality reduction.

Efeitos de animacidade do antecedente na resolução de pronomes sujeito [Animacy effects of the antecedent on subject pronoun resolution]

Revista da Associação Portuguesa de Linguística ◽

10.26334/2183-9077/rapln4ano2018a40 ◽

2018 ◽

pp. 190-205

Author(s):

Sara Morgado ◽

Paula Luegi ◽

Maria Lobo

Keyword(s):

Semantic Information ◽

Pronoun Resolution ◽

Reading Task ◽

Object Position ◽

European Portuguese ◽

Overt Pronoun ◽

Syntactic Information ◽

Subject Pronoun ◽

Multifactorial Approach

We report two experiments, a self-paced reading task and an off-line questionnaire, that tested whether the overt subject pronoun in European Portuguese is sensitive to the animacy (animate vs. inanimate) of the antecedent in object position. We found longer reading times when the overt pronoun was forced to retrieve an inanimate antecedent than when it retrieved an animate one (Experiment 1), and fewer object choices with inanimate antecedents than with animate ones (Experiment 2). Our findings show that several factors, including animacy features, are taken into account during the resolution of pronominal forms, thus favouring a multifactorial approach to pronoun retrieval (Kaiser & Trueswell, 2008). We propose that there is a hierarchy that considers both syntactic and semantic information in pronoun resolution, and that within the syntactic information the prominence of entities varies according to their animacy features. Our results are explained neither by processing theories that consider only syntactic factors (Carminati, 2005) nor by theoretical accounts that associate strong pronouns with animacy features (Cardinaletti & Starke, 1999).
