Towards Indonesian Phrase Extraction: Framework and Corpus

Author(s):  
Xiaotian Lin ◽  
Nankai Lin ◽  
Lixian Xiao ◽  
Shengyi Jiang ◽  
Xinying Qiu
Author(s):  
Sheng Zhang ◽  
Qi Luo ◽  
Yukun Feng ◽  
Ke Ding ◽  
Daniela Gifu ◽  
...  

Background: TextRank, a well-known key phrase extraction algorithm, is an analogue of the PageRank algorithm and relies heavily on term-frequency statistics gathered through co-occurrence analysis. Objective: This frequency-based characteristic has become a bottleneck for performance enhancement, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into the key phrase extraction algorithm and achieve gains. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embeddings and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improved the accuracy of the TextRank algorithm. Results: Applied to a self-made test set and a public test set, the results suggest that the proposed unsupervised key phrase extraction algorithm outperformed the other algorithms to some extent.
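The co-occurrence ranking that this abstract takes as its baseline can be illustrated with a minimal sketch. This is plain TextRank on an unweighted graph, not WESIA: it uses no syntactic tree and no word embeddings. The function name `textrank_keywords` and the parameters `window`, `damping`, `iterations` and `top_k` are assumptions chosen for illustration and do not come from the paper.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank tokens with PageRank-style scores over a co-occurrence graph.

    Illustrative baseline only; all names and defaults are assumptions.
    """
    # Build an undirected co-occurrence graph: two distinct words are linked
    # when one appears within `window` positions after the other.
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # Repeat the PageRank update: each word receives an equal share of the
    # score of every neighbour, damped towards a uniform baseline.
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[word])
            for word in neighbours
        }

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

For the token list `["graph", "ranking", "graph", "keyword", "graph", "extraction"]` with `window=1`, the word "graph" co-occurs with every other word and therefore ranks first. Because the score depends only on how often words appear near each other, the sketch also shows the frequency-based limitation that the abstract describes.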


2018 ◽  
Vol 48 (3) ◽  
pp. 496-514 ◽  
Author(s):  
E. Laxmi Lydia ◽  
P. Krishna Kumar ◽  
K. Shankar ◽  
S. K. Lakshmanaprabu ◽  
R. M. Vidhyavathi ◽  
...  

2017 ◽  
Vol 17 (2) ◽  
pp. 28-43 ◽  
Author(s):  
Vivien Macketanz ◽  
Eleftherios Avramidis ◽  
Aljoscha Burchardt ◽  
Jindrich Helcl ◽  
Ankit Srivastava

Abstract: In this article we present a novel linguistically driven evaluation method and apply it to the main approaches to Machine Translation (Rule-based, Phrase-based, Neural) to gain insights into their strengths and weaknesses in much more detail than current evaluation schemes provide. Translating between two languages requires substantial modelling of knowledge about the two languages, about translation, and about the world. Using English-German IT-domain translation as a case study, we also enhance the Phrase-based system by exploiting parallel treebanks for syntax-aware phrase extraction and by interfacing with Linked Open Data (LOD) to extract named entity translations in a post-decoding framework.

