Highly-Inflected Language Generation Using Factored Language Models

Stochastic Language Generation in Dialogue using Factored Language Models

Computational Linguistics ◽

10.1162/coli_a_00199 ◽

2014 ◽

Vol 40 (4) ◽

pp. 763-799 ◽

Cited By ~ 19

Author(s):

François Mairesse ◽

Steve Young

Keyword(s):

Language Models ◽

Data Driven ◽

Generation Task ◽

Language Generation ◽

Domain Experts ◽

Automated Evaluation ◽

Dialogue Acts ◽

Semantic Concepts ◽

Stochastic Language ◽

Using Data

Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of pre-generated utterances, or (b) using statistics to determine the generation decisions of an existing generator. Both approaches rely on the existence of a handcrafted generation component, which is likely to limit their scalability to new domains. The first contribution of this article is to present Bagel, a fully data-driven generation method that treats the language generation task as a search for the most likely sequence of semantic concepts and realization phrases, according to Factored Language Models (FLMs). As domain utterances are not readily available for most natural language generation tasks, a large creative effort is required to produce the data necessary to represent human linguistic variation for nontrivial domains. This article is based on the assumption that learning to produce paraphrases can be facilitated by collecting data from a large sample of untrained annotators using crowdsourcing (rather than a few domain experts) by relying on a coarse meaning representation. A second contribution of this article is to use crowdsourced data to show how dialogue naturalness can be improved by learning to vary the output utterances generated for a given semantic input. Two data-driven methods for generating paraphrases in dialogue are presented: (a) by sampling from the n-best list of realizations produced by Bagel's FLM reranker; and (b) by learning a structured perceptron predicting whether candidate realizations are valid paraphrases. We train Bagel on a set of 1,956 utterances produced by 137 annotators, which covers 10 types of dialogue acts and 128 semantic concepts in a tourist information system for Cambridge. An automated evaluation shows that Bagel outperforms utterance class LM baselines on this domain. A human evaluation of 600 resynthesized dialogue extracts shows that Bagel's FLM output produces utterances comparable to a handcrafted baseline, whereas the perceptron classifier performs worse. Interestingly, human judges find the system sampling from the n-best list to be more natural than a system always returning the first-best utterance. The judges are also more willing to interact with the n-best system in the future. These results suggest that capturing the large variation found in human language using data-driven methods is beneficial for dialogue interaction.

Truth, Lies, and Automation: How Language Models Could Change Disinformation

10.51593/2021ca003 ◽

2021 ◽

Author(s):

Ben Buchanan ◽

Andrew Lohn ◽

Micah Musser ◽

Katerina Sedova

Keyword(s):

Natural Language ◽

Natural Language Generation ◽

Cutting Edge ◽

Language Models ◽

Language Generation ◽

High Performing

Growing popular and industry interest in high-performing natural language generation models has led to concerns that such models could be used to generate automated disinformation at scale. This report examines the capabilities of GPT-3, a cutting-edge AI system that writes text, to analyze its potential misuse for disinformation. A model like GPT-3 may be able to help disinformation actors substantially reduce the work necessary to write disinformation while expanding its reach and potentially also its effectiveness.

Computing Accurate Grammatical Feedback in a Virtual Writing Conference for German-Speaking Elementary-School Children: An Approach Based on Natural Language Generation

CALICO Journal ◽

10.1558/cj.v26i3.626-643 ◽

2013 ◽

Vol 26 (3) ◽

pp. 626-643 ◽

Cited By ~ 1

Author(s):

Karin Harbusch ◽

Gergana Itsova ◽

Ulrich Koch ◽

Christine Kühner

Keyword(s):

Elementary School ◽

Natural Language ◽

School Children ◽

Elementary School Children ◽

Natural Language Generation ◽

Language Generation ◽

Writing Conference ◽

German Speaking

Statistical Language Models for Information Retrieval: A Critical Review

10.1561/9781601981875 ◽

2007 ◽

Cited By ~ 4

Author(s):

ChengXiang Zhai

Keyword(s):

Information Retrieval ◽

Critical Review ◽

Language Models ◽

Statistical Language Models

Problems in Poem Writing in Korean by Dual Language Generation

The Korean Poetics Studies ◽

10.15705/kopoet..18.200704.005 ◽

2007 ◽

Issue 18 ◽

pp. 121-148

Author(s):

Kim Yong Hee

Keyword(s):

Dual Language ◽

Language Generation

Why Business Intelligence Needs Artificial Intelligence (AI) and Advanced Natural Language Generation (NLG)

Journal of Environmental Science Computer Science and Engineering & Technology ◽

10.24214/jecet.b.6.4.266274 ◽

2017 ◽

Vol 6 (4) ◽

Keyword(s):

Artificial Intelligence ◽

Natural Language ◽

Business Intelligence ◽

Natural Language Generation ◽

Language Generation

Proceedings of the 8th European Workshop on Natural Language Generation - EWNLG '01

10.3115/1117840 ◽

2001 ◽

Keyword(s):

Natural Language ◽

Natural Language Generation ◽

Language Generation ◽

European Workshop

Proceedings of the Fifth International Natural Language Generation Conference - INLG '08

10.3115/1708322 ◽

2008 ◽

Keyword(s):

Natural Language ◽

Natural Language Generation ◽

Language Generation

The Story Of Computational Narratology

10.34048/2018.4.f3 ◽

2018 ◽

Author(s):

Sharath Srivatsa ◽

Shyam Kumar V N ◽

Srinath Srinivasa

Keyword(s):

Natural Language ◽

Computational Modeling ◽

Narrative Structure ◽

Complex Problem ◽

General Intelligence ◽

Language Understanding ◽

Language Generation ◽

Narrative Generation ◽

Growing Body ◽

Computational Narratology

In recent times, computational modeling of narratives has gained enormous interest in fields like Natural Language Understanding (NLU), Natural Language Generation (NLG), and Artificial General Intelligence (AGI). There is a growing body of literature addressing understanding of narrative structure and generation of narratives. Narrative generation is known to be a far more complex problem than narrative understanding [20].

Adolescent Language: Models, Assessment, and Links to Reading

10.35542/osf.io/pf5y8 ◽

2019 ◽

Cited By ~ 1

Author(s):

Amanda Goodwin ◽

Yaacov Petscher ◽

Jamie Tock

Keyword(s):

Reading Comprehension ◽

Bifactor Model ◽

Language Models ◽

Multiple Group ◽

Global Factor ◽

Eighth Grade Students ◽

Key Aspects ◽

Future Work ◽

The Relationship ◽

Best Fit

Various models have highlighted the complexity of language. Building on foundational ideas regarding three key aspects of language, our study contributes to the literature by 1) exploring broader conceptions of morphology, vocabulary, and syntax, 2) operationalizing this theoretical model into a gamified, standardized, computer-adaptive assessment of language for fifth- to eighth-grade students entitled Monster, PI, and 3) uncovering further evidence regarding the relationship between language and standardized reading comprehension via this assessment. Multiple-group item response theory (IRT) modeling across grades shows that morphology was best fit by a bifactor model of task-specific factors along with a global factor related to each skill. Vocabulary was best fit by a bifactor model that identifies performance overall and on specific words. Syntax, though, was best fit by a unidimensional model. Next, Monster, PI produced reliable scores, suggesting language can be assessed efficiently and precisely for students via this model. Lastly, performance on Monster, PI explained more than 50% of variance in standardized reading, suggesting that operationalizing language via Monster, PI can provide meaningful understandings of the relationship between language and reading comprehension. Specifically, considering just a subset of a construct, like identification of units of meaning, explained significantly less variance in reading comprehension. This highlights the importance of considering these broader constructs. Implications indicate that future work should consider a model of language where component areas are considered broadly and contributions to reading comprehension are explored via general performance on components as well as skill-level performance.
