Surface Realization
Recently Published Documents

Information ◽ 2021 ◽ Vol 12 (8) ◽ pp. 337
Alessandro Mazzei, Mattia Cerrato, Roberto Esposito, Valerio Basile

In natural language generation, word ordering is the task of putting the words composing the output surface form into the correct grammatical order. In this paper, we propose to apply general learning-to-rank algorithms to the task of word ordering in the broader context of surface realization. The major contributions of this paper are: (i) the design of three deep neural architectures implementing pointwise, pairwise, and listwise approaches to ranking; (ii) the testing of these neural architectures on a surface realization benchmark in five natural languages belonging to different typological families. Our experiments show promising results, in particular highlighting the performance of the pairwise approach and paving the way for more transparent surface realization from arbitrary tree- and graph-like structures.
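
The pairwise idea can be pictured with a minimal sketch: learn a score per word and order the words by score, training so that a word that should precede another outscores it by a margin. The scorer and loss below are illustrative stand-ins, not the paper's neural architectures:

```python
def pairwise_hinge_loss(score_a, score_b, margin=1.0):
    """Training signal for the pairwise approach: penalize the model
    unless the word that should come first (A) outscores B by a margin."""
    return max(0.0, margin - (score_a - score_b))

def order_words(words, score_fn):
    """Realize a surface order by sorting words by their learned score;
    a higher score means an earlier position in the sentence."""
    return sorted(words, key=score_fn, reverse=True)

# Toy hand-set scores standing in for a trained scorer.
toy_scores = {"the": 3.0, "red": 2.0, "ball": 1.0}
print(order_words(["ball", "the", "red"], toy_scores.get))
# -> ['the', 'red', 'ball']
```

At inference time the learned scores induce a total order, which is how a pairwise formulation still yields a single linearization.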

2021 ◽ Vol 52 (1) ◽ pp. 207-226
Amina Mettouchi

Abstract Prosody is often conceived of as an important but merely surface-level realization of morphosyntactic constructions that are otherwise deemed complete. This paper challenges that view of prosody as a disambiguating, highlighting, or scope-marking device, and provides evidence for the inclusion of prosody as a core formal means for the coding of cleft constructions in Kabyle, in interaction with morphosyntax. The demonstration is conducted through the recursive analysis of an annotated corpus of spontaneous data, and results in a precise formal definition of Kabyle cleft constructions, whose function is shown to be the marking of narrow focus.

2021 ◽ Vol 9 ◽ pp. 510-527
Ratish Puduppully, Mirella Lapata

Abstract Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or variants thereof. These models generate text that is fluent (but often imprecise) and perform quite poorly at selecting appropriate content and ordering it coherently. To overcome some of these issues, we propose a neural model with a macro planning stage followed by a generation stage reminiscent of traditional methods which embrace separate modules for planning and surface realization. Macro plans represent high level organization of important content such as entities, events, and their interactions; they are learned from data and given as input to the generator. Extensive experiments on two data-to-text benchmarks (RotoWire and MLB) show that our approach outperforms competitive baselines in terms of automatic and human evaluation.
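
The two-stage pipeline separates content selection and ordering from verbalization. A minimal sketch under assumed record fields (`entity`, `value`, `importance` are our illustrative names, not the paper's representation):

```python
def macro_plan(records, threshold=0.5):
    """Macro planning stage: select salient records and order them.
    A crude stand-in for the learned planner described in the paper."""
    salient = [r for r in records if r["importance"] > threshold]
    return sorted(salient, key=lambda r: r["importance"], reverse=True)

def realize(plan):
    """Surface realization stage: verbalize each planned record in order."""
    return " ".join(f"{r['entity']} scored {r['value']}." for r in plan)

records = [
    {"entity": "Player A", "value": 32, "importance": 0.9},
    {"entity": "bench minutes", "value": 12, "importance": 0.2},
]
print(realize(macro_plan(records)))  # -> Player A scored 32.
```

The design point is that errors of content selection are made (and can be inspected) in the plan, before any text is generated.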

2021 ◽ Vol 9 ◽ pp. 429-446
Anastasia Shimorina, Yannick Parmentier, Claire Gardent

Abstract The metrics standardly used to evaluate Natural Language Generation (NLG) models, such as BLEU or METEOR, fail to provide information on which linguistic factors impact performance. Focusing on Surface Realization (SR), the task of converting an unordered dependency tree into a well-formed sentence, we propose a framework for error analysis which makes it possible to identify which features of the input affect the models’ results. This framework consists of two main components: (i) correlation analyses between a wide range of syntactic metrics and standard performance metrics, and (ii) a set of techniques to automatically identify syntactic constructs that often co-occur with low performance scores. We demonstrate the advantages of our framework by performing error analysis on the results of 174 system runs submitted to the Multilingual SR shared tasks; we show that dependency edge accuracy correlates with automatic metrics, thereby providing a more interpretable basis for evaluation; and we suggest ways in which our framework could be used to improve models and data. The framework is available in the form of a toolkit which can be used both by campaign organizers to provide detailed, linguistically interpretable feedback on the state of the art in multilingual SR, and by individual researchers to improve models and datasets.
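
The first component, correlating a per-sentence syntactic metric with a per-sentence performance metric, reduces to computing a correlation coefficient. A minimal sketch with invented values (the metric names and numbers are ours, for illustration only):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between a per-sentence syntactic metric
    and a per-sentence performance metric."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Invented values: deeper dependency trees tend to get lower scores.
tree_depth = [2, 3, 5, 7, 9]
sentence_score = [0.9, 0.85, 0.6, 0.5, 0.3]
print(pearson(tree_depth, sentence_score))  # strongly negative
```

A strongly negative coefficient here would flag tree depth as a linguistic factor worth inspecting further.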

2020 ◽ Vol 4 (1) ◽ pp. p40
Longxing Wei

Unlike most previous studies of Codeswitching (CS), which focused on describing the surface configurations of switched items (i.e., where CS is structurally possible) or the switched items themselves (i.e., what items from another language can be switched), this paper explores the formulation processes of bilingual speech and the nature of the bilingual mental lexicon and its activity in CS. More specifically, it applies the Bilingual Lemma Activation Model (Wei, 2002, 2006b) to data drawn from various naturally occurring CS instances. It claims that the mental lexicon does not simply contain lexemes and their meanings, but also lemmas: abstract entries in the mental lexicon that support the surface realization of actual lexemes. Lemmas are abstract in that they contain phonological, morphological, semantic, syntactic, and pragmatic information about lexemes. It further claims that lemmas in the bilingual mental lexicon are language-specific and come into contact during CS discourse at three levels of abstract lexical structure: lexical-conceptual structure, predicate-argument structure, and morphological realization patterns. The CS instances described and analyzed in this paper provide evidence that the bilingual speaker’s two linguistic systems are unequally activated in CS, and that CS is an outcome of bilingual lemmas in contact.
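
The claim about lemma structure is essentially a claim about a data structure. A hedged sketch; the field names below are our illustrative labels for the three levels of abstract lexical structure, not the model's formalism:

```python
from dataclasses import dataclass, field

@dataclass
class Lemma:
    """Illustrative lemma entry: language-specific, and richer than a bare lexeme."""
    lexeme: str
    language: str  # lemmas are claimed to be language-specific
    lexical_conceptual: dict = field(default_factory=dict)
    predicate_argument: dict = field(default_factory=dict)
    morphological_realization: dict = field(default_factory=dict)

give_en = Lemma(
    lexeme="give",
    language="en",
    lexical_conceptual={"event": "transfer"},
    predicate_argument={"args": ["agent", "theme", "recipient"]},
    morphological_realization={"past": "gave"},
)
print(give_en.morphological_realization["past"])  # -> gave
```

On this picture, CS arises when language-tagged entries like these are activated and come into contact at one of the three levels.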

2020
Henry Elder, Robert Burke, Alexander O’Connor, Jennifer Foster

2019 ◽ Vol 7 ◽ pp. 327-342
Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner

We quantify the linguistic complexity of different languages’ morphological systems. We verify that there is a statistically significant empirical trade-off between paradigm size and irregularity: a language’s inflectional paradigms may be either large in size or highly irregular, but never both. We define a new measure of paradigm irregularity based on the conditional entropy of the surface realization of a paradigm: how hard it is to jointly predict all the word forms in a paradigm from the lemma. We estimate irregularity by training a predictive model. Our measurements are taken on large morphological paradigms from 36 typologically diverse languages.
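
The irregularity measure is the conditional entropy of surface forms given the lemma. A plug-in, count-based estimate illustrates the quantity (the paper instead estimates it with a trained predictive model; the toy data is ours):

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """H(form | lemma), estimated directly from (lemma, form) count data."""
    joint = Counter(pairs)
    lemma_counts = Counter(lemma for lemma, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (lemma, form), c in joint.items():
        p_joint = c / n                 # p(lemma, form)
        p_cond = c / lemma_counts[lemma]  # p(form | lemma)
        h -= p_joint * math.log2(p_cond)
    return h

# Toy data: a perfectly regular lemma vs. one with two competing forms.
regular = [("walk", "walked")] * 4
irregular = [("go", "went")] * 2 + [("go", "goed")] * 2
print(conditional_entropy(regular))    # -> 0.0
print(conditional_entropy(irregular))  # -> 1.0
```

Zero bits means the forms are fully predictable from the lemma; higher values mean a more irregular paradigm.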
