Korean Text Generation Using Word Embedding Enhanced by Domain Information

2019 ◽  
Vol 29 (2) ◽  
pp. 142-147
Author(s):  
Mei-ying Ren ◽  
Shin-jae Kang
Author(s):  
Diana Purwitasari ◽  
Ana Alimatus Zaqiyah ◽  
Chastine Fatichah

2020 ◽  
Author(s):  
Remo De Oliveira Gresta ◽  
Elder Cirilo

Identifiers are one of the most important sources of domain information in software development. Therefore, it is recognized that the proper use of names directly impacts the code's comprehensibility, maintainability, and quality. Our goal in this work is to expand the current knowledge about names by considering not only their quality but also their contextual similarity. To achieve that, we extracted names from four large-scale open-source projects written in Java. Then, we computed the semantic similarity between classes and their attributes/variables using Fasttext, a word embedding algorithm. As a result, we observed that source code, in general, preserves an acceptable level of contextual similarity; that developers avoid using names outside the default dictionary (e.g., domain terms); and that files with more changes, maintained by distinct contributors, tend to have better contextual similarity.
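The general idea of comparing class names with their attribute/variable names via embeddings can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the gensim-based fastText model, the toy corpus, and the identifier names are all assumptions.

```python
# Minimal sketch: contextual similarity between a class name and its
# attributes using fastText embeddings (via gensim). Hypothetical data.
import re
import numpy as np
from gensim.models import FastText

def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    spaced = re.sub(r'([a-z0-9])([A-Z])', r'\1 \2', name).replace('_', ' ')
    return [t.lower() for t in spaced.split()]

def embed(model, name):
    """Average the embeddings of an identifier's terms."""
    return np.mean([model.wv[t] for t in split_identifier(name)], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy corpus of identifier terms; a real study would train on project code.
corpus = [["shopping", "cart", "item", "price", "total", "discount"]]
model = FastText(sentences=corpus, vector_size=50, min_count=1, epochs=50)

class_name = "ShoppingCart"                    # hypothetical class
attributes = ["itemPrice", "totalDiscount"]    # hypothetical attributes
for attr in attributes:
    sim = cosine(embed(model, class_name), embed(model, attr))
    print(f"{class_name} ~ {attr}: {sim:.2f}")
```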


CounterText ◽  
2015 ◽  
Vol 1 (3) ◽  
pp. 348-365 ◽  
Author(s):  
Mario Aquilina

What if the post-literary also meant that which operates in a literary space (almost) devoid of language as we know it: for instance, a space in which language simply frames the literary or poetic rather than ‘containing’ it? What if the countertextual also meant the (en)countering of literary text with non-textual elements, such as mathematical concepts, or with texts that we would not normally think of as literary, such as computer code? This article addresses these issues in relation to Nick Montfort's #!, a 2014 print collection of poems that presents readers with the output of computer programs as well as the programs themselves, which are designed to operate on principles of text generation regulated by specific constraints. More specifically, it focuses on two works in the collection, ‘Round’ and ‘All the Names of God’, which are read in relation to the notions of the ‘computational sublime’ and the ‘event’.


2015 ◽  
Author(s):  
Oren Melamud ◽  
Omer Levy ◽  
Ido Dagan

Author(s):  
Sheng Zhang ◽  
Qi Luo ◽  
Yukun Feng ◽  
Ke Ding ◽  
Daniela Gifu ◽  
...  

Background: As a well-known key phrase extraction algorithm, TextRank is an analogue of the PageRank algorithm and relies heavily on term frequency statistics in the manner of co-occurrence analysis. Objective: This frequency-based characteristic is a bottleneck for performance enhancement, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into the key phrase extraction algorithm and achieve gains. Method: In this research, taking both syntactic and semantic information into consideration, we integrate a syntactic tree algorithm with word embedding and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improves the accuracy of the TextRank algorithm. Results: Applying our method to a self-made test set and a public test set, the results imply that the proposed unsupervised key phrase extraction algorithm outperforms the other algorithms to some extent.
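A rough illustration of the underlying idea (weighting the TextRank co-occurrence graph with word-embedding similarity rather than raw counts) might look like the sketch below. It is not the WESIA implementation; the toy sentences, window size, and the Word2Vec/networkx choices are assumptions.

```python
# Sketch of a TextRank variant whose edge weights come from word-embedding
# similarity instead of plain co-occurrence counts. Illustrative only.
import numpy as np
import networkx as nx
from gensim.models import Word2Vec

sentences = [["graph", "based", "ranking", "extracts", "key", "phrases"],
             ["word", "embedding", "captures", "semantic", "similarity"],
             ["semantic", "similarity", "improves", "graph", "ranking"]]

model = Word2Vec(sentences, vector_size=50, min_count=1, epochs=100)

def sim(u, v):
    a, b = model.wv[u], model.wv[v]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Connect words that co-occur within a small window, weighting each edge
# by embedding similarity (shifted to keep weights non-negative).
graph = nx.Graph()
window = 3
for sent in sentences:
    for i, u in enumerate(sent):
        for v in sent[i + 1:i + window]:
            if u != v:
                graph.add_edge(u, v, weight=1.0 + sim(u, v))

scores = nx.pagerank(graph, weight="weight")
top = sorted(scores, key=scores.get, reverse=True)[:5]
print("candidate key terms:", top)
```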

