Table to text generation with accurate content copying

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yang Yang ◽  
Juan Cao ◽  
Yujun Wen ◽  
Pengzhou Zhang

Abstract Table-to-text generation is the task of generating fluent, coherent, and informative text from structured data. Copying words from the table is a common way to address the “out-of-vocabulary” problem, but accurate copying is difficult to achieve. To overcome this problem, we propose a transformer-based auto-regressive framework that combines a copying mechanism with language modeling to generate target texts. First, to help the model learn the semantic relevance between table and text, we apply a word transformation method that incorporates field and position information into the target text, so that the model learns the position to copy from. We then propose two auxiliary learning objectives: table-text constraint loss and copy loss. Table-text constraint loss is used to model table inputs effectively, whereas copy loss is used to copy word fragments from the table precisely. Furthermore, we improve the text search strategy to reduce the probability of generating incoherent and repetitive sentences. Experiments on two datasets show better results than the baseline model. On WIKIBIO, BLEU improves from 45.47 to 46.87 and ROUGE from 41.54 to 42.28. On ROTOWIRE, the CO metric increases by 4.29% and BLEU is 1.93 points higher.
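The idea of combining copying with language-model generation can be illustrated with a generic pointer-generator-style mixture. This is a minimal sketch under stated assumptions, not the authors' model: the function name `mix_copy_and_generate`, the scalar gate `p_gen`, and all scores and tokens are hypothetical values chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_copy_and_generate(vocab, gen_scores, table_tokens, attn_scores, p_gen):
    """Blend a generation distribution with a copy distribution.

    p_gen is the (assumed) probability of generating from the vocabulary;
    1 - p_gen is the probability of copying a token from the table.
    """
    gen_probs = softmax(gen_scores)
    copy_probs = softmax(attn_scores)
    # Start with the scaled vocabulary distribution.
    final = {tok: p_gen * p for tok, p in zip(vocab, gen_probs)}
    # Add copy mass; table tokens outside the vocabulary become reachable here.
    for tok, p in zip(table_tokens, copy_probs):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * p
    return final


# Hypothetical toy input: a small vocabulary and three table cell values.
vocab = ["was", "born", "in", "<unk>"]
table_tokens = ["Ada", "Lovelace", "1815"]
dist = mix_copy_and_generate(
    vocab, [1.0, 0.5, 0.2, 0.1], table_tokens, [0.1, 0.2, 3.0], p_gen=0.3
)
best = max(dist, key=dist.get)
print(best)
```

With a low `p_gen` and attention focused on the last cell, the most probable token is the out-of-vocabulary table value rather than `<unk>`, which is the behavior a copying mechanism is meant to provide.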

2021 ◽  
Author(s):  
Yang Yang ◽  
Juan Cao ◽  
Yujun Wen ◽  
Pengzhou Zhang

Abstract Table-to-text generation is an important task in natural language generation that aims to generate fluent, informative text from structured data. In this paper, we propose a novel transformer-based autoregressive model that combines table content copying with language-model-based generation. First, we propose a word transformation method to process the target text. By using target text that contains field and position information, we help the model learn the relationship between the target text and the table and learn the position to copy from. We then propose two auxiliary learning objectives: table-text constraint loss and copy loss. Table-text constraint loss is introduced to model table inputs effectively, whereas copy loss is used to copy word fragments from the table precisely. In addition, we change the maximization-based text search strategy to reduce the probability of problems such as sentence repetition and inconsistency. On the WIKIBIO dataset, our model improves BLEU scores from 45.47 to 46.87 and ROUGE scores from 41.54 to 42.28, outperforming state-of-the-art baseline models on automatic evaluation metrics. On the ROTOWIRE test set, compared with the best baseline model, our model scores 4.29% higher on the CO metric and 1.93 points higher on BLEU.
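The abstract does not name the search strategy that replaces maximization-based decoding. As an illustration only, the sketch below uses top-k sampling, a widely used non-maximization decoder; it is an assumption for this example, not the authors' method. The function `top_k_sample` and the toy scores are hypothetical.

```python
import math
import random


def top_k_sample(token_scores, k, rng):
    """Sample one token from the k highest-scoring candidates.

    With k=1 this reduces to greedy (maximization-based) selection.
    """
    top = sorted(token_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    m = max(score for _, score in top)
    # Softmax weights over the surviving candidates only.
    weights = [math.exp(score - m) for _, score in top]
    tokens = [tok for tok, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token scores.
scores = {"the": 2.0, "a": 1.8, "his": 1.5, "zebra": -3.0}
rng = random.Random(0)
samples = [top_k_sample(scores, k=3, rng=rng) for _ in range(20)]
greedy = top_k_sample(scores, k=1, rng=rng)
print(samples, greedy)
```

Sampling among several plausible candidates, instead of always taking the single best one, is one common way to reduce the repetitive loops that pure maximization can fall into.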


2010 ◽  
Vol 171-172 ◽  
pp. 94-97
Author(s):  
Rui Liu ◽  
Ming Hu Jiang

Image search engines are effective tools for finding pictures on the Internet. They return a list of image items in response to a user’s query and rank the items by their relevance to the query. An image item is often accompanied by a short descriptive text, a brief summary extracted from the webpage title, content, image caption, or metadata, that provides auxiliary information about the image. In this paper, we present a new and effective method for generating descriptive text that summarizes an image’s surrounding text, uses the text’s position information, and finds the image’s nearest neighbors.


2012 ◽  
Vol 65 (3) ◽  
pp. 561-570 ◽  
Author(s):  
Wantong Chen ◽  
Yanzhong Zhang

GNSS relative positioning is an important field of study in which the standard ‘GNSS Baseline Model’ is often used. The mathematical model is constructed by differencing between observation equations, since this eliminates some common errors in the GNSS signal measurements. The ‘Orthogonal Transformation’ method can also be used to construct the GNSS Baseline Model. According to some scholars, this model may avoid some drawbacks of Double Differencing (DD) while retaining all of its advantages. For comparison purposes, this model is evaluated, and the theoretical equivalence of both approaches is proved for the short baseline from two aspects: integer ambiguity resolution and the conditional least-squares baseline vector.
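How differencing removes common errors can be shown with a small numeric sketch. It assumes a simplified measurement model, observation = geometric range + receiver clock error - satellite clock error (all in meters), and ignores atmospheric delay, noise, and ambiguities. The receiver names, satellite IDs, and values are hypothetical.

```python
def simulate(geometric, rx_clock, sat_clock):
    """Build observations under the simplified model stated above."""
    return {
        (rx, sat): rho + rx_clock[rx] - sat_clock[sat]
        for (rx, sat), rho in geometric.items()
    }


def double_difference(obs, ref_sat, sat):
    """Difference between receivers A and B, then between two satellites."""
    sd_ref = obs[("B", ref_sat)] - obs[("A", ref_sat)]  # satellite clock cancels
    sd_sat = obs[("B", sat)] - obs[("A", sat)]
    return sd_sat - sd_ref  # receiver clocks cancel


# Hypothetical geometric ranges in meters.
geometric = {
    ("A", "G01"): 20000000.0,
    ("A", "G02"): 21000000.0,
    ("B", "G01"): 20000012.5,
    ("B", "G02"): 21000007.25,
}
rx_clock = {"A": 150.0, "B": -320.0}       # receiver clock errors (m)
sat_clock = {"G01": 45.0, "G02": -80.0}    # satellite clock errors (m)

obs = simulate(geometric, rx_clock, sat_clock)
dd_obs = double_difference(obs, "G01", "G02")
dd_geo = double_difference(geometric, "G01", "G02")
print(dd_obs, dd_geo)
```

Under this model the double difference of the clock-corrupted observations matches the double difference of the error-free ranges, because each clock term appears once with each sign.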


2019 ◽  
Author(s):  
Sheng Shen ◽  
Daniel Fried ◽  
Jacob Andreas ◽  
Dan Klein

2018 ◽  
Author(s):  
Longxu Dou ◽  
Guanghui Qin ◽  
Jinpeng Wang ◽  
Jin-Ge Yao ◽  
Chin-Yew Lin

2019 ◽  
Vol 1 (1) ◽  
pp. 47-66 ◽  
Author(s):  
Stefanie Sirén-Heikel ◽  
Leo Leppänen ◽  
Carl-Gustav Lindén ◽  
Asta Bäck

Abstract News automation is an emerging field within journalism, with the potential to transform newswork. Increasing access to data, combined with developing technology, will allow further inquiries into automated journalism. Producing news text using NLG (natural language generation) is currently largely undertaken in specific, predictable news domains, such as sports or finance. This interdisciplinary study investigates how elite media representatives from Finland, Europe and the US imagine the affordances of this emerging technology for their organization. Our analysis shows how the affordances of news automation are imagined as providing efficiency, increasing output and aiding in reallocating resources to pursue quality journalism. The affordances are, however, constrained by such factors as access to structured data, the quality of automation and a lack of relevant skills. In its current form, automated text generation is seen as providing only limited benefits to news organizations that are already imagining further possibilities of automation.


2021 ◽  
Author(s):  
Linyong Nan ◽  
Dragomir Radev ◽  
Rui Zhang ◽  
Amrit Rau ◽  
Abhinand Sivaprasad ◽  
...  
