Key phrase Extraction by Improving TextRank with an Integration of Word Embedding and Syntactic Information

Author(s):  
Sheng Zhang ◽  
Qi Luo ◽  
Yukun Feng ◽  
Ke Ding ◽  
Daniela Gifu ◽  
...  

Background: TextRank, a well-known key phrase extraction algorithm, is an analogue of the PageRank algorithm and relies heavily on term-frequency statistics gathered through co-occurrence analysis. Objective: This frequency-based characteristic has become a bottleneck for performance enhancement, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into the key phrase extraction algorithm. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embedding and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improved the accuracy of the TextRank algorithm. Results: On a self-made test set and a public test set, the results suggest that the proposed unsupervised key phrase extraction algorithm outperformed the other algorithms to some extent.
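
For orientation, the baseline that the abstract describes (PageRank-style ranking over a word co-occurrence graph) can be sketched as below. This is a minimal illustration of plain TextRank only, not of WESIA: the function name `textrank_keywords`, the window size, the damping factor, and the iteration count are all assumptions chosen for the example, and the word-embedding and syntactic-tree components of the paper are not reproduced.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words with a PageRank-style walk over a co-occurrence graph.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    # Connect every pair of distinct words that appear within `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    # Start from uniform scores and apply the PageRank update repeatedly.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * incoming
        scores = updated

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


tokens = "graph ranking uses graph edges and graph nodes".split()
print(textrank_keywords(tokens, top_k=3))
```

Because the scores depend only on which words co-occur, a frequent word that neighbors many others dominates the ranking, which is the frequency-driven behaviour the abstract identifies as a limitation.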

2016 ◽  
Vol 78 (8-2) ◽  
Author(s):  
Vinothini Kasinathan ◽  
Masrah Azrifah Azmi Murad ◽  
Rahmita Wirza Rahmat ◽  
Evi Indriasari Mansor ◽  
Aida Mustapha

This paper presents the mechanics of a presentation mining system that mines keywords and key phrases from a collection of PowerPoint slides and generates a mind map using the extracted words and phrases. The core of presentation mining lies in two stages: ranking the potential phrases and extracting the keywords and key phrases. The keywords and key phrases form a mind map, which is then evaluated against a domain ontology. Recall and precision are also compared between an existing key phrase extraction system, KP-Miner, and the proposed presentation mining system. The key phrase extraction algorithm of the proposed system achieved higher recall and precision than KP-Miner, hence producing a more accurate visualization of the PowerPoint slides in the form of a mind map.
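
The recall and precision comparison mentioned above can be illustrated with a short sketch. It assumes exact, case-insensitive matching between extracted phrases and a reference list; the abstract does not say how phrases were matched against the domain ontology, so the function name `precision_recall` and the sample phrases are hypothetical.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for extracted phrases against a reference set.

    Assumption: a phrase counts as correct only on an exact,
    case-insensitive match with a reference phrase.
    """
    extracted_set = {phrase.lower() for phrase in extracted}
    reference_set = {phrase.lower() for phrase in reference}
    hits = len(extracted_set & reference_set)
    # Guard against empty inputs to avoid division by zero.
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    return precision, recall


# Made-up phrases, for illustration only.
extracted = ["mind map", "key phrase", "slides"]
reference = ["Mind Map", "key phrase", "ontology", "ranking"]
precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of extracted phrases that are correct, while recall is the share of reference phrases that were found, so a system can only claim to be more accurate overall when both measures improve.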


Author(s):  
Xiang Lisa Li ◽  
Jason Eisner

Pre-trained word embeddings like ELMo and BERT contain rich syntactic and semantic information, resulting in state-of-the-art performance on various tasks. We propose a very fast variational information bottleneck (VIB) method to nonlinearly compress these embeddings, keeping only the information that helps a discriminative parser. We compress each word embedding to either a discrete tag or a continuous vector. In the discrete version, our automatically compressed tags form an alternative tag set: we show experimentally that our tags capture most of the information in traditional POS tag annotations, but our tag sequences can be parsed more accurately at the same level of tag granularity. In the continuous version, we show experimentally that moderately compressing the word embeddings by our method yields a more accurate parser in 8 of 9 languages, unlike simple dimensionality reduction.


Author(s):  
Sara Morgado ◽  
Paula Luegi ◽  
Maria Lobo

We report two experiments, a self-paced reading task and an off-line questionnaire, that tested whether the overt subject pronoun in European Portuguese is sensitive to the animacy (animate vs. inanimate) of the antecedent in object position. We found higher reading times when the overt pronoun was forced to retrieve an inanimate antecedent compared to retrieving an animate one (Experiment 1) and fewer object choices with inanimate antecedents (compared to animate ones). Our findings show that several factors are taken into account during the resolution of pronominal forms, including animacy features, thus favouring a multifactorial approach to pronoun retrieval (Kaiser & Trueswell, 2008). We propose that there is a hierarchy that considers both syntactic and semantic information in pronoun resolution and that, within the syntactic information, the prominence of entities varies according to their animacy features. Our results are explained neither by processing theories that only consider syntactic factors (Carminati, 2005) nor by theoretical accounts that associate strong pronouns with animacy features (Cardinaletti & Starke, 1999).


2018 ◽  
Vol 48 (3) ◽  
pp. 496-514 ◽  
Author(s):  
E. Laxmi Lydia ◽  
P. Krishna Kumar ◽  
K. Shankar ◽  
S. K. Lakshmanaprabu ◽  
R. M. Vidhyavathi ◽  
...  
