A Novel Data-to-Text Generation Model with Transformer Planning and a Wasserstein Auto-Encoder

Author(s):  
Xiaohong Xu ◽  
Ting He ◽  
Huazhen Wang

2014 ◽
Author(s):  
Yue Zhang ◽  
Kai Song ◽  
Linfeng Song ◽  
Jingbo Zhu ◽  
Qun Liu

2021 ◽  
Vol 2 (2) ◽  
pp. 1-16
Author(s):  
Shubhra Tewari ◽  
Renos Zabounidis ◽  
Ammina Kothari ◽  
Reynold Bailey ◽  
Cecilia Ovesdotter Alm

Automated journalism technology is transforming news production and changing how audiences perceive the news. As automated text-generation models advance, it is important to understand how readers perceive human-written and machine-generated content. This study used OpenAI’s GPT-2 text-generation model (May 2019 release) and articles from news organizations across the political spectrum to study participants’ reactions to human- and machine-generated articles. As participants read the articles, we collected their facial expression and galvanic skin response (GSR) data together with self-reported perceptions of article source and content credibility. We also asked participants to identify their political affinity and assess the articles’ political tone to gain insight into the relationship between political leaning and article perception. Our results indicate that articles generated by the May 2019 release of GPT-2 were misidentified as human-written nearly half the time, while human-written articles were correctly identified as such about 70 percent of the time.


Mathematics ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1558 ◽  
Author(s):  
Lingyun Xiang ◽  
Shuanghui Yang ◽  
Yuhang Liu ◽  
Qian Li ◽  
Chengzhang Zhu

With the development of natural language processing, linguistic steganography has become a research hotspot in the field of information security. However, most existing linguistic steganographic methods may suffer from low embedding capacity. Therefore, this paper proposes a character-level linguistic steganographic method (CLLS) that embeds the secret information into characters instead of words by employing a long short-term memory (LSTM) based language model. First, the proposed method uses the LSTM model and a large-scale corpus to construct and train a character-level text generation model; the best-performing trained model is then used as the prediction model for generating stego text. Next, the secret information serves as control information to select the next character from the predictions of the trained character-level text generation model. The secret information is thus hidden in the generated text, since predicted characters with different probability ranks can encode different secret bit values. For the same secret information, the generated stego text varies with the starting string of the text generation model, so we design a selection strategy that varies the starting string and chooses the highest-quality text among a number of candidate stego texts as the final stego text. The experimental results demonstrate that, compared with other similar methods, the proposed method has the fastest running speed and the highest embedding capacity. Moreover, extensive experiments verify the effect of the number of candidate stego texts on the quality of the final stego text: quality increases with the number of candidates, but with diminishing returns.
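To make the bit-to-character selection concrete, here is a minimal Python sketch of the idea described above: the next k secret bits pick one of the top 2^k characters ranked by the language model's predicted probabilities, and extraction re-ranks the same predictions to recover the bits. The predict_next function is a toy stand-in for the trained character-level LSTM, and the 2-bits-per-character codebook size is an assumption for illustration, not the paper's exact scheme.

import heapq

BITS_PER_CHAR = 2  # assumed: embed 2 secret bits per generated character

def predict_next(prefix):
    # Stand-in for the trained character-level LSTM: returns a probability
    # distribution over the character vocabulary given the prefix.
    # A fixed toy distribution is used here for illustration only.
    return {"e": 0.30, "t": 0.25, "a": 0.20, "o": 0.15, "n": 0.10}

def embed(secret_bits, start, length):
    """Hide secret_bits by choosing among the top 2^k predicted characters."""
    text = start
    bits = list(secret_bits)
    for _ in range(length):
        probs = predict_next(text)
        # Rank candidates by probability; the top 2^k form the codebook.
        top = heapq.nlargest(2 ** BITS_PER_CHAR, probs, key=probs.get)
        chunk = bits[:BITS_PER_CHAR] or ["0"] * BITS_PER_CHAR  # pad when done
        bits = bits[BITS_PER_CHAR:]
        index = int("".join(chunk), 2)  # k secret bits -> codebook index
        text += top[index]
    return text

def extract(stego_text, start):
    """Recover the secret bits by re-ranking the model's predictions."""
    bits = []
    text = start
    for ch in stego_text[len(start):]:
        probs = predict_next(text)
        top = heapq.nlargest(2 ** BITS_PER_CHAR, probs, key=probs.get)
        bits.append(format(top.index(ch), f"0{BITS_PER_CHAR}b"))
        text += ch
    return "".join(bits)

stego = embed("0110", start="th", length=2)
print(stego, extract(stego, start="th"))  # round-trips the 4 secret bits

Because the receiver reruns the same deterministic model from the same starting string, no key material beyond the shared model and start string is needed to recover the bits.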


2021 ◽  
Vol 9 (2) ◽  
pp. 334-350
Author(s):  
Md. Raisul Kibria ◽  
Mohammad Abu Yousuf

Text generation is a rapidly evolving field of natural language processing (NLP), with ever-larger language models frequently setting a new state of the art. These models are highly effective at learning word representations and the internal coherence of a particular language. However, established context-driven, end-to-end text generation models are rare, even more so for the Bengali language. In this paper, we propose a bidirectional gated recurrent unit (GRU) based architecture that acts as a conditional language model, analogous to the decoder portion of a sequence-to-sequence (seq2seq) model, further conditioned on target context vectors. We explore several ways of combining multiple context words into a fixed-dimensional vector representation, extracted from the same GloVe embeddings used to build the embedding matrix. We use beam search to generate the sentence with the maximum cumulative log-probability score. In addition, we propose a human-scoring-based evaluation metric and use it to compare the model's performance with unidirectional LSTM and GRU networks. Empirical results show that the proposed model performs well at producing meaningful output that reflects the target context. The resulting architecture can be applied to a broad range of context-driven text generation applications and is a contribution to the NLP literature of the Bengali language.
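As a rough illustration of the decoding step, the following Python sketch implements beam search over cumulative log-probability. The step function is a toy stand-in for the context-conditioned bidirectional-GRU decoder described in the paper; the vocabulary, probability table, and beam width are invented for the example.

import math

def step(prefix):
    # Stand-in for model(prefix, context): next-token probabilities.
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"cat": 0.5, "dog": 0.3, "<eos>": 0.2},
        ("a",): {"dog": 0.7, "<eos>": 0.3},
    }
    return table.get(prefix, {"<eos>": 1.0})

def beam_search(beam_width=2, max_len=4):
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == "<eos>":
                candidates.append((seq, score))  # finished beams carry over
                continue
            for tok, p in step(seq).items():
                candidates.append((seq + (tok,), score + math.log(p)))
        # Keep the beam_width hypotheses with the highest cumulative score.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]

print(beam_search())  # (('the', 'cat', '<eos>'), best cumulative log-prob)

Because scores are summed log-probabilities, the search keeps whole-sequence likelihood rather than greedily taking the best token at each step, which is the property the abstract relies on.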


2021 ◽  
Vol 14 (1) ◽  
pp. 516-527
Author(s):  
Ana Zaqiyah ◽  
Diana Purwitasari ◽  
Chastine Fatichah ◽  
...  

Spam detection typically categorizes product reviews as spam or non-spam. Spam reviews may contain fake reviews as well as non-review statements that describe things unrelated to the product. Most publicly available spam reviews are labelled as fake reviews, while non-spam texts that are not fake reviews may still contain non-review statements. It is crucial to notice these non-review statements, since they can mislead consumers. Non-review statements are rarely found, and labelling them in large volumes of long text is manual and time-consuming. Because non-review statements are rare, there is an imbalance between non-spam as the majority class and spam containing non-review statements as the minority class. Augmenting fake reviews to add spam texts is ineffective because fake reviews share content with non-spam, such as opinion words about product features. Thus, generating non-review statements is preferable for adding spam texts. Text generation has its own issues: common neural-network-based methods require large amounts of training data, and existing pre-trained models produce texts whose context differs from non-review statements. The augmented texts should have content and context similar to non-review statements, as represented by their structure. Therefore, we propose a text generation model with content- and structure-based preprocessing to produce non-review statements, which is expected to overcome imbalanced data and improve spam detection in product reviews. Structure-based preprocessing identifies the feature structures of non-opinion words from part-of-speech tags; those features represent the context of spam reviews in unlabeled texts. Content-based preprocessing then selects topic-modeling results of non-review statements from fake reviews. Our experiments yielded an improvement of about 0.04 in the BLEU (Bilingual Evaluation Understudy) score for the correspondence between generated and training texts, indicating that the generated texts are not quite identical to the non-review statements used for training. Nevertheless, combining these additional texts with the original spam texts improved spam detection, increasing the average recall score by more than 40%.
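As a rough sketch of the structure-based preprocessing idea, the following Python snippet uses NLTK part-of-speech tags to mask opinion-bearing words and keep the non-opinion structure of a review sentence. Treating adjectives and adverbs as the opinion cues is an assumption made here for illustration; the paper's exact feature set is not reproduced.

import nltk

# One-time resource downloads for tokenization and POS tagging.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

OPINION_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}  # assumed opinion cues

def non_opinion_structure(sentence):
    """Return (token, tag) pairs with opinion-bearing words masked out."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return [(tok, tag) if tag not in OPINION_TAGS else ("<OPINION>", tag)
            for tok, tag in tagged]

# Non-review statement: structure survives intact.
print(non_opinion_structure("I bought this phone yesterday at the mall."))
# Opinionated review: the evaluative words are masked.
print(non_opinion_structure("The camera is absolutely amazing."))

The masked structures could then serve as templates or conditioning features for the generator, so that generated texts match the non-review context rather than the opinionated content of fake reviews.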


CounterText ◽  
2015 ◽  
Vol 1 (3) ◽  
pp. 348-365 ◽  
Author(s):  
Mario Aquilina

What if the post-literary also meant that which operates in a literary space (almost) devoid of language as we know it: for instance, a space in which language simply frames the literary or poetic rather than ‘containing’ it? What if the countertextual also meant the (en)countering of literary text with non-textual elements, such as mathematical concepts, or with texts that we would not normally think of as literary, such as computer code? This article addresses these issues in relation to Nick Montfort's #!, a 2014 print collection of poems that presents readers with the output of computer programs as well as the programs themselves, which are designed to operate on principles of text generation regulated by specific constraints. More specifically, it focuses on two works in the collection, ‘Round’ and ‘All the Names of God’, which are read in relation to the notions of the ‘computational sublime’ and the ‘event’.

