Generating with Discourse Combinatory Categorial Grammar

2010, Vol 4
Author(s): Crystal Nakatsu, Michael White

This article introduces Discourse Combinatory Categorial Grammar (DCCG) and shows how it can be used to generate multi-sentence paraphrases, flexibly incorporating both intra- and intersentential discourse connectives. DCCG takes a simple, practical approach to extending Combinatory Categorial Grammar (CCG) to cover discourse-level phenomena, which furthermore makes it possible to generate clauses with multiple connectives and, in contrast to approaches based on Rhetorical Structure Theory, with rhetorical dependencies that do not form a tree. To do so, it borrows from Discourse Lexicalized Tree Adjoining Grammar (D-LTAG) the distinction between structural connectives and anaphoric discourse adverbials. Unlike D-LTAG, however, DCCG treats both sentential and discourse phenomena in the same grammar, rather than employing a separate discourse grammar. A key ingredient of this single-grammar approach is cue threading, a tightly constrained technique for extending the semantic scope of a discourse connective beyond the sentence. As DCCG requires no additions to the CCG formalism, it can be used to generate paraphrases of an entire dialogue turn using the OpenCCG realizer as-is, without the need to revise its architecture. In addition, from an interpretation perspective, a single grammar enables easier management of ambiguity across discourse and sentential levels using standard dynamic programming techniques, whereas D-LTAG has required a potentially complex interaction of sentential and discourse grammars to manage the same ambiguity. As a proof of concept, the article demonstrates how OpenCCG can be used with a DCCG to generate multi-sentence paraphrases that reproduce and extend those in the SPaRKy Restaurant Corpus.

2018, Vol 44 (3), pp. 447-482
Author(s): Marco Kuhlmann, Giorgio Satta, Peter Jonsson

We study the parsing complexity of Combinatory Categorial Grammar (CCG) in the formalism of Vijay-Shanker and Weir (1994). As our main result, we prove that any parsing algorithm for this formalism will, in the worst case, take exponential time when the size of the grammar, and not only the length of the input sentence, is included in the analysis. This sets the formalism of Vijay-Shanker and Weir (1994) apart from weakly equivalent formalisms such as Tree Adjoining Grammar, for which parsing can be performed in time polynomial in the combined size of grammar and input sentence. Our results contribute to a refined understanding of the class of mildly context-sensitive grammars, and inform the search for new, mildly context-sensitive versions of CCG.


2015, Vol 41 (2), pp. 215-247
Author(s): Marco Kuhlmann, Alexander Koller, Giorgio Satta

The weak equivalence of Combinatory Categorial Grammar (CCG) and Tree-Adjoining Grammar (TAG) is a central result of the literature on mildly context-sensitive grammar formalisms. However, the categorial formalism for which this equivalence has been established differs significantly from the versions of CCG that are in use today. In particular, it allows restriction of combinatory rules on a per-grammar basis, whereas modern CCG assumes a universal set of rules, isolating all cross-linguistic variation in the lexicon. In this article we investigate the formal significance of this difference. Our main result is that lexicalized versions of the classical CCG formalism are strictly less powerful than TAG.
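To make the contrast concrete, here is a minimal Python sketch, not taken from the article, of the modern assumption that every grammar shares one universal set of combinatory rules while only the lexicon varies. The `Atom`/`Slash` category encoding and the `combine` helper are hypothetical, and only three rules (forward and backward application, forward composition) are shown; full CCG rule sets also include backward and crossed composition and type raising.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as S or NP (hypothetical encoding)."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Slash:
    """A functor category: "/" seeks its argument to the right, "\\" to the left."""
    result: Cat
    direction: str  # "/" or "\\"
    arg: Cat

    def __str__(self) -> str:
        return f"({self.result}{self.direction}{self.arg})"


Cat = Union[Atom, Slash]
Rule = Callable[[Cat, Cat], Optional[Cat]]


def forward_application(x: Cat, y: Cat) -> Optional[Cat]:
    # X/Y  Y  =>  X
    if isinstance(x, Slash) and x.direction == "/" and x.arg == y:
        return x.result
    return None


def backward_application(y: Cat, x: Cat) -> Optional[Cat]:
    # Y  X\Y  =>  X
    if isinstance(x, Slash) and x.direction == "\\" and x.arg == y:
        return x.result
    return None


def forward_composition(x: Cat, y: Cat) -> Optional[Cat]:
    # X/Y  Y/Z  =>  X/Z
    if (isinstance(x, Slash) and x.direction == "/"
            and isinstance(y, Slash) and y.direction == "/"
            and x.arg == y.result):
        return Slash(x.result, "/", y.arg)
    return None


# One rule set shared by every grammar; grammars differ only in their lexicons.
UNIVERSAL_RULES: tuple[Rule, ...] = (forward_application, backward_application, forward_composition)


def combine(left: Cat, right: Cat) -> list[Cat]:
    """Return every category the universal rules derive from two adjacent categories."""
    results = []
    for rule in UNIVERSAL_RULES:
        derived = rule(left, right)
        if derived is not None:
            results.append(derived)
    return results


if __name__ == "__main__":
    S, NP = Atom("S"), Atom("NP")
    vp = Slash(S, "\\", NP)  # verb phrase: S\NP
    tv = Slash(vp, "/", NP)  # transitive verb: (S\NP)/NP
    print([str(c) for c in combine(tv, NP)])  # "sees" + "Mary"
    print([str(c) for c in combine(NP, vp)])  # "John" + "sees Mary"
```

The classical formalism discussed above would instead let each grammar pick its own subset (or restricted instances) of these rules; in this sketch that would amount to giving each grammar its own `UNIVERSAL_RULES` tuple.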


2021, Vol 9, pp. 707-720
Author(s): Lena Katharina Schiffer, Andreas Maletti

Tree-adjoining grammar (TAG) and combinatory categorial grammar (CCG) are two well-established mildly context-sensitive grammar formalisms that are known to have the same expressive power on strings (i.e., they generate the same class of string languages). It is demonstrated that their expressive power on trees also essentially coincides. In fact, CCG without lexicon entries for the empty string and with only first-order rules of degree at most 2 suffices to attain its full expressive power.


2021
Author(s): C Qureshi, Tane Moleta, Marc Aurel Schnabel

The paper proposes a proof of concept for a Virtual, Augmented and Mixed (VAM) environment that digitally overlays a multifaith space in order to optimize its use, essentially adapting itself to the spiritual needs of the user. To do so, a mixed reality experience was developed by investigating and interpreting both the tangible and intangible rituals of prayer. By incorporating an immersive experience, the project promotes the idea of a multifaith space that moves beyond the notion of an “empty white room” (Crompton, 2013, p. 487). Developing an immersive experience that caters to people of all religions or of no religion is beyond the scope of this project. Hence, by creating a VAM environment for users of the Muslim faith, the project may be able to support design ideologies for other faiths, furthering research in this field.


2007, Vol 33 (3), pp. 355-396
Author(s): Julia Hockenmaier, Mark Steedman

This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word-word dependencies. The resulting corpus, CCGbank, includes 99.4% of the sentences in the Penn Treebank. It is available from the Linguistic Data Consortium, and has been used to train wide-coverage statistical parsers that obtain state-of-the-art rates of dependency recovery. In order to obtain linguistically adequate CCG analyses, and to eliminate noise and inconsistencies in the original annotation, an extensive analysis of the constructions and annotations in the Penn Treebank was called for, and a substantial number of changes to the Treebank were necessary. We discuss the implications of our findings for the extraction of other linguistically expressive grammars from the Treebank, and for the design of future treebanks.
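One core step of such a translation is top-down category assignment over a binarized tree whose daughters have already been marked as heads, complements, or adjuncts. The Python sketch below illustrates only that step, under those assumptions; the `Node` class, the `ATOMIC` label mapping, and the `assign`/`lexicon` helpers are hypothetical, and it omits everything else the article deals with (head finding, binarization, coordination, traces and long-range dependencies, features). It is not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, simplified mapping from Treebank labels to atomic CCG categories.
ATOMIC = {"NP": "NP", "PP": "PP", "S": "S"}


@dataclass
class Node:
    label: str                  # Treebank label, or the word itself at a leaf
    role: str = "head"          # "head", "complement" or "adjunct", marked beforehand
    children: list[Node] = field(default_factory=list)
    category: str = ""          # filled in by assign()


def wrap(cat: str) -> str:
    """Parenthesize complex categories so they can be nested."""
    return cat if "/" not in cat and "\\" not in cat else f"({cat})"


def assign(node: Node, cat: str) -> None:
    """Assign CCG categories top-down, given the category of this node."""
    node.category = cat
    if not node.children:
        return
    if len(node.children) == 1:          # unary node: pass the category down
        assign(node.children[0], cat)
        return
    if len(node.children) != 2:
        raise ValueError("tree must be binarized first")
    left, right = node.children
    # Assumes exactly one daughter is marked as head (right is used otherwise).
    head, other = (left, right) if left.role == "head" else (right, left)
    if other.role == "complement":
        arg = ATOMIC.get(other.label, other.label)
        slash = "/" if head is left else "\\"   # functor looks toward the complement
        assign(other, arg)
        assign(head, f"{wrap(cat)}{slash}{wrap(arg)}")   # head becomes X/Y or X\Y
    else:
        # Adjunct: X/X to the left of the head, X\X to its right; head keeps X.
        adj_slash = "\\" if head is left else "/"
        assign(other, f"{wrap(cat)}{adj_slash}{wrap(cat)}")
        assign(head, cat)


def lexicon(node: Node) -> list[tuple[str, str]]:
    """Read off (word, category) pairs from the leaves, left to right."""
    if not node.children:
        return [(node.label, node.category)]
    return [pair for child in node.children for pair in lexicon(child)]


def example_tree() -> Node:
    # "John sees Mary today", already binarized and marked
    return Node("S", children=[
        Node("NP", "complement", [Node("John")]),
        Node("VP", "head", [
            Node("VP", "head", [
                Node("VBZ", "head", [Node("sees")]),
                Node("NP", "complement", [Node("Mary")]),
            ]),
            Node("ADVP", "adjunct", [Node("today")]),
        ]),
    ])


if __name__ == "__main__":
    tree = example_tree()
    assign(tree, "S")
    for word, cat in lexicon(tree):
        print(word, cat)
```

Run on the toy tree, the transitive verb receives `(S\NP)/NP` and the sentence-final adverb receives `(S\NP)\(S\NP)`, which is the kind of lexical category a CCG parser can then be trained on.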


2020, pp. 089443932090576
Author(s): James Hawdon, Matthew Costello, Colin Bernatzky, Salvatore J. Restifo

Does hateful rhetoric appeal to supporters of President Trump? Prior studies link sexism, racism, Islamophobia, anti-immigrant sentiment, and intolerant forms of Christianity to supporting or voting for President Trump. We extend this literature by examining whether individuals who approve of President Trump’s job performance are more accepting of the hateful rhetoric and imagery they encounter online. We do so using online survey data (N = 465) of youth and young adults collected in December 2017. Building on previous theoretical explanations of participating in online hate that utilize routine activity theory and social learning–social structure theory, we argue that support for President Trump is a result of the “politics of status,” and support for the President thus represents an enthymeme. Our key finding is that agreement with online hate material is indeed positively associated with support for the President. Additionally, we find that one’s differential location in the social structure, online and off-line social bonds, and attitudes toward norm violations are associated with agreement with online extremist content.


2019
Author(s): Michael D. Edge, Graham Coop

Direct-to-consumer (DTC) genetics services are increasingly popular for genetic genealogy, with tens of millions of customers as of 2019. Several DTC genealogy services allow users to upload their own genetic datasets in order to search for genetic relatives. A user and a target person in the database are identified as genetic relatives if the user’s uploaded genome shares one or more sufficiently long segments in common with that of the target person, that is, if the two genomes share one or more long regions identical by state (IBS). IBS matches reveal some information about the genotypes of the target person, particularly if the chromosomal locations of IBS matches are shared with the uploader. Here, we describe several methods by which an adversary who wants to learn the genotypes of people in the database can do so by uploading multiple datasets. Depending on the methods used for IBS matching and the information about IBS segments returned to the user, substantial information about users’ genotypes can be revealed with a few hundred uploaded datasets. For example, using a method we call IBS tiling, we estimate that an adversary who uploads approximately 900 publicly available genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 uploads of falsified datasets can reveal enough genetic information to allow accurate genome-wide imputation of every person in the database. Different DTC services use different methods for identifying and reporting IBS segments, leading to differences in vulnerability to the attacks we describe. We provide a proof-of-concept demonstration that the GEDmatch database in particular uses unphased genotypes to detect IBS and is vulnerable to genotypes being revealed by artificial datasets. We propose simple-to-implement measures that would prevent the exploits we describe, and we discuss our results in light of recent trends in genetic privacy, including the recent use of uploads to DTC genetic genealogy services by law enforcement.
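The matching criterion described above can be illustrated with a simplified sketch of IBS segment detection on unphased genotypes, assuming genotypes coded as alternate-allele counts (0, 1, 2) at a shared list of SNP sites. Two people share at least one allele at a site unless they are opposite homozygotes, so a candidate segment is a maximal run of sites without such a conflict. The `ibs_segments` function and its SNP-count threshold `min_snps` are hypothetical; real services measure segment length in centimorgans and tolerate genotyping errors, and this is not GEDmatch's actual implementation.

```python
from __future__ import annotations


def opposite_homozygotes(a: int, b: int) -> bool:
    # Genotypes are alt-allele counts (0, 1, 2). Two people share no allele at a
    # site only when one is homozygous reference (0) and the other homozygous alt (2).
    return {a, b} == {0, 2}


def ibs_segments(g1: list[int], g2: list[int], min_snps: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) index ranges of maximal runs of at least
    min_snps consecutive sites containing no opposite-homozygote conflict."""
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must cover the same sites")
    if min_snps < 1:
        raise ValueError("min_snps must be positive")
    segments = []
    start = 0
    for i, (a, b) in enumerate(zip(g1, g2)):
        if opposite_homozygotes(a, b):
            if i - start >= min_snps:   # close the run that ends just before site i
                segments.append((start, i))
            start = i + 1               # a new run can only begin after the conflict
    if len(g1) - start >= min_snps:     # run extending to the last site
        segments.append((start, len(g1)))
    return segments


if __name__ == "__main__":
    user = [0, 1, 2, 2, 0, 1, 1, 0]
    target = [0, 2, 2, 1, 2, 1, 0, 0]
    print(ibs_segments(user, target, min_snps=3))  # → [(0, 4), (5, 8)]
```

Under this simplified rule, an uploaded dataset that is heterozygous at every site can never conflict with anyone, which gives some intuition for why matching on unphased genotypes is sensitive to artificial datasets.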

