Automatic Mapping of French Discourse Connectives to PDTB Discourse Relations

Author(s): Majid Laali, Leila Kosseim
2016, Vol 16 (2), pp. 264-279
Author(s): Sandrine Zufferey

Coherence relations linking discourse segments can be communicated explicitly by the use of connectives but also implicitly through juxtaposition. Some discourse relations appear, however, to be more coherent than others when conveyed implicitly. This difference is explained in the literature by the existence of default expectations guiding discourse interpretation. In this paper, we assess the factors influencing implicitation by comparing the number of implicit and explicit translations of three polysemous French connectives in translated texts across three target languages: German, English and Spanish. Each connective can convey two discourse relations: one that can easily be conveyed implicitly in monolingual data and one that cannot. Results indicate that the relations that can easily be conveyed implicitly are also those most often left implicit in translation, in all target languages. We discuss these results in view of the cognitive factors influencing the explicit or implicit communication of discourse relations.


2012, Vol 20 (2), pp. 151-184
Author(s): Ziheng Lin, Hwee Tou Ng, Min-Yen Kan

Since the release of the large discourse-level annotation of the Penn Discourse Treebank (PDTB), research work has been carried out on certain subtasks of this annotation, such as disambiguating discourse connectives and classifying Explicit or Implicit relations. We see a need to construct a full parser on top of these subtasks and propose a way to evaluate the parser. In this work, we have designed and developed an end-to-end discourse parser to parse free texts in the PDTB style using a fully data-driven approach. The parser consists of multiple components joined in a sequential pipeline architecture, which includes a connective classifier, argument labeler, explicit classifier, non-explicit classifier, and attribution span labeler. Our trained parser first identifies all discourse and non-discourse relations, locates and labels their arguments, and then classifies the sense of the relation between each pair of arguments. For the identified relations, the parser also determines the attribution spans, if any, associated with them. We introduce novel approaches to locate and label arguments, and to identify attribution spans. We also significantly improve on the current state-of-the-art connective classifier. We propose and present a comprehensive evaluation from both component-wise and error-cascading perspectives, in which we illustrate how each component performs in isolation, as well as how the pipeline performs with errors propagated forward. The parser gives an overall system F1 score of 46.80 percent for partial matching utilizing gold standard parses, and 38.18 percent with full automation.
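
The sequential pipeline architecture described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' parser: the connective lexicon, the sense labels, the rule that Arg1 is the preceding sentence, and the default "Expansion" sense for non-explicit pairs are all hypothetical stand-ins for trained components, and the attribution span labeler is omitted. What it does show is the architecture itself: each stage consumes the previous stage's output, so an error in the connective stage propagates to every later stage.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Relation:
    kind: str                      # "Explicit" or "NonExplicit"
    arg1: str
    arg2: str
    connective: Optional[str] = None
    sense: Optional[str] = None

# Hypothetical toy lexicon (connective -> sense) standing in for trained classifiers.
CONNECTIVE_SENSES = {"because": "Contingency", "but": "Comparison", "then": "Temporal"}

def connective_classifier(sents: List[str]) -> List[Tuple[int, str]]:
    """Stage 1: flag sentences (after the first) that open with a known connective."""
    hits = []
    for i, sent in enumerate(sents):
        words = sent.split()
        if i > 0 and words:
            first = words[0].lower().strip(",")
            if first in CONNECTIVE_SENSES:
                hits.append((i, first))
    return hits

def argument_labeler(sents: List[str], hits: List[Tuple[int, str]]) -> List[Relation]:
    """Stage 2: toy rule, Arg1 = previous sentence, Arg2 = current sentence minus the connective."""
    return [Relation("Explicit", sents[i - 1], " ".join(sents[i].split()[1:]), conn)
            for i, conn in hits]

def explicit_classifier(rels: List[Relation]) -> List[Relation]:
    """Stage 3: assign a sense to each explicit relation from its connective alone."""
    for rel in rels:
        rel.sense = CONNECTIVE_SENSES[rel.connective]
    return rels

def non_explicit_classifier(sents: List[str], hits: List[Tuple[int, str]]) -> List[Relation]:
    """Stage 4: every remaining adjacent sentence pair gets a fixed default sense."""
    explicit_positions = {i for i, _ in hits}
    return [Relation("NonExplicit", sents[i - 1], sents[i], sense="Expansion")
            for i in range(1, len(sents)) if i not in explicit_positions]

def parse(sents: List[str]) -> List[Relation]:
    """Run the stages in order; a stage-1 error propagates to every later stage."""
    hits = connective_classifier(sents)
    explicit = explicit_classifier(argument_labeler(sents, hits))
    return explicit + non_explicit_classifier(sents, hits)

if __name__ == "__main__":
    for rel in parse(["It rained.", "But we went out.", "We got wet."]):
        print(rel.kind, rel.connective, rel.sense, "|", rel.arg1, "|", rel.arg2)
```

On the three-sentence example this yields one Explicit relation (but, Comparison) between the first two sentences and one NonExplicit relation (Expansion) between the last two. If stage 1 missed "But", the first pair would instead fall through to stage 4 and be labeled a non-explicit Expansion, a small instance of the error cascade that an end-to-end evaluation captures.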


2007, Vol 7 (2), pp. 143-166
Author(s): Anna Espunya

This paper reports on a study designed to assess the influence of the pragmatic Principle of Informativeness on the translatorial strategy of explicitation. It replicates a previous study on the occurrence of conjunctional augmentation of English present participle free adjuncts in a monolingual corpus (Kortmann 1991), using a database of translation instances from English into Catalan. The study aims to test the validity of Kortmann’s scale of informativeness of semantic/discourse relations (e.g. Condition, Cause, Simultaneity) as a (partial) account of explicitation by means of sentence and discourse connectives. The methodology is text-based and involves collecting a database of pairs of sequences (English source text, Catalan translation), identifying the most plausible interpretation of the relation between the free adjunct and the matrix clause, and classifying the pairs into instances that have undergone explicitation and instances that have not. The data are analysed quantitatively (by finding the percentage of explicitation per relation) as well as qualitatively (by analysing the kinds of semantic shifts that occur between source texts and translations).
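
The quantitative step, finding the percentage of explicitation per relation, amounts to a grouped proportion. A minimal sketch, assuming each translation instance has already been annotated as a (relation label, explicitated?) pair; the function name is hypothetical, and the sample annotations are made up for illustration and are not data from the study.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def explicitation_rates(instances: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Percentage of instances per relation whose translation adds an explicit connective.

    Each instance is (relation label, True if the translation was explicitated).
    """
    totals: Counter = Counter()
    explicitated: Counter = Counter()
    for relation, is_explicit in instances:
        totals[relation] += 1
        if is_explicit:
            explicitated[relation] += 1
    return {rel: 100.0 * explicitated[rel] / n for rel, n in totals.items()}

# Made-up toy annotations for illustration only; these are NOT data from the study.
sample = [
    ("Cause", True), ("Cause", True), ("Cause", False),
    ("Simultaneity", False), ("Simultaneity", False),
    ("Condition", True),
]
print(explicitation_rates(sample))
```

With these toy inputs, Cause comes out at about 66.7%, Simultaneity at 0% and Condition at 100%; ranking relations by such rates is what allows a comparison against an informativeness scale.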


2021, Vol 12 (2), pp. 1-37
Author(s): Lucie Poláková, Jiří Mírovský, Šárka Zikánová, Eva Hajičová

The present article investigates the possibilities and limits of local (shallow) analysis of discourse coherence with respect to the phenomena of global coherence and higher-level composition of texts. We study corpora annotated with local discourse relations in Czech and, partly, in English, looking for clues in the local annotation that indicate a higher discourse structure. First, we classify patterns of subsequent or overlapping pairs of local relations, and hierarchies formed by nested local relations. Special attention is then given to relations crossing paragraph boundaries and their semantic types, and to paragraph-initial discourse connectives. In the third part, we examine situations in which annotators tend to mark a large argument (larger than one sentence) of a discourse relation even though a minimality principle is in place as an annotation rule. Our analyses bring (i) new linguistic insights regarding coherence signals in local and higher contexts, e.g. the detection and description of hierarchies of local discourse relations up to 5 levels deep in Czech and English, a description of differences in the distribution of semantic types in cross-paragraph and other settings, the identification of Czech connectives typical only of higher structures, and the detection of a prevalence of large left-sided arguments in locally annotated data; and (ii), as another type of contribution, some new reflections on the methodologies of the approaches under scrutiny.
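
One way to detect hierarchies of nested local relations is to treat each relation as a pair of argument spans and look for relations that fall entirely inside a single argument of another. A minimal sketch, assuming arguments are contiguous spans of sentence indices (real annotations, such as those in the Prague treebanks, are richer and arguments may be discontinuous); the span encoding and all function names are hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple

Span = Tuple[int, int]                 # inclusive (first, last) sentence index
Relation = Tuple[Span, Span]           # (Arg1 span, Arg2 span)

def covers(outer: Span, inner: Span) -> bool:
    """True if span `inner` lies entirely within span `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def extent(rel: Relation) -> Span:
    """Smallest span covering both arguments of a relation."""
    arg1, arg2 = rel
    return (min(arg1[0], arg2[0]), max(arg1[1], arg2[1]))

def max_nesting_depth(rels: List[Relation]) -> int:
    """Number of levels in the deepest chain of relations in which each relation lies
    inside a single argument of the one above it (0 for no relations)."""
    def nested_in(child: int, parent: int) -> bool:
        inner, outer = extent(rels[child]), extent(rels[parent])
        # Requiring a strictly smaller extent rules out cycles between relations.
        return inner != outer and any(covers(arg, inner) for arg in rels[parent])

    @lru_cache(maxsize=None)
    def depth(i: int) -> int:
        children = [j for j in range(len(rels)) if nested_in(j, i)]
        return 1 + max((depth(j) for j in children), default=0)

    return max((depth(i) for i in range(len(rels))), default=0)

# Toy example: a relation nested in Arg1 of another, which is nested in Arg1 of a third.
example = [((0, 2), (3, 3)), ((0, 1), (2, 2)), ((0, 0), (1, 1))]
print(max_nesting_depth(example))
```

For the toy example the function returns 3: the relation over sentences 0 to 1 sits inside Arg1 of the relation over sentences 0 to 2, which in turn sits inside Arg1 of the relation over sentences 0 to 3.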


2021, Vol 12
Author(s): Ludivine Crible, Mathis Wetzel, Sandrine Zufferey

Discourse connectives are lexical items like “but” and “so” that are well known to influence the online processing of the discourse relations they convey. Yet discourse relations like causality or contrast can also be signaled by means other than connectives, such as syntactic structures. So far, the influence of these alternative signals on discourse processing has been comparatively under-researched. In particular, their processing in a second language remains entirely unexplored. In a series of three self-paced reading experiments, we compare the reading patterns of contrastive relations by native French speakers and by non-native speakers of French with English as a first language. We focus on the effect of syntactic parallelism and how it interacts with different types of connectives. We test whether native and non-native readers equally recruit parallelism to process contrast with or without a connective (Experiment 1), with a frequent vs. infrequent connective (Experiment 2), and with an ambiguous vs. unambiguous connective (Experiment 3), thus varying the explicitness and ease of retrieval of the contrast relation. Our results indicate that parallelism plays an important role for both groups of readers, but that it is a more prominent cue for non-native speakers, while its effect is modulated by task difficulty for native participants.


2017, Vol 109 (1), pp. 61-91
Author(s): Jiří Mírovský, Pavlína Synková, Magdaléna Rysová, Lucie Poláková

CzeDLex is a new electronic lexicon of Czech discourse connectives, planned for publication by the end of this year. Its data format and structure are based on a study of similar existing resources, and adjusted to comply with the Czech syntactic tradition and specifics and with the Prague approach to the annotation of semantic discourse relations in text. In the article, we first put the lexicon in the context of related resources and discuss theoretical aspects of building the lexicon: we present arguments for our choice of the data structure and for selecting features of the lexicon entries, while special attention is paid to a consistent and (as far as possible) uniform encoding of both primary connectives (in English, e.g. because, therefore) and secondary connectives (e.g. for this reason, this is the reason why). The main principle adopted for nesting entries in the lexicon is, apart from the lexical form of the connective, the discourse-semantic type (sense) expressed by the given connective; this enables us to deal with a broad formal variability of connectives and is convenient for interlinking CzeDLex with lexicons in other languages. Second, we introduce the chosen technical solution based on the Prague Markup Language, which allows for an efficient incorporation of the lexicon into the family of Prague treebanks: it can be directly opened and edited in the tree editor TrEd, processed from the command line in btred, interlinked with its source corpus and queried in the PML Tree Query engine. Third, we describe the process of obtaining data for the lexicon by exploiting a large corpus manually annotated with discourse relations, the Prague Discourse Treebank 2.0: we elaborate on the automatic extraction part, post-extraction checks and the manual addition of supplementary linguistic information.
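
The nesting principle described above (one entry per lexical form, with sub-entries per discourse-semantic type) can be sketched as a two-level mapping built from annotated corpus occurrences, roughly what an automatic extraction step produces. This is a hypothetical in-memory illustration, not CzeDLex's actual PML format or schema; the class and field names and the sense labels ("reason", "result") are assumptions, and the English glosses from the abstract stand in for Czech connectives.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

@dataclass
class SenseEntry:
    sense: str                               # discourse-semantic type (hypothetical label)
    examples: List[str] = field(default_factory=list)

@dataclass
class ConnectiveEntry:
    lemma: str                               # lexical form of the connective
    connective_type: str                     # "primary" or "secondary"
    senses: Dict[str, SenseEntry] = field(default_factory=dict)

def build_lexicon(occurrences: Iterable[Tuple[str, str, str, str]]) -> Dict[str, ConnectiveEntry]:
    """Group annotated occurrences (lemma, type, sense, example) into one entry per
    lemma, each with one nested sub-entry per sense. The first type seen for a lemma wins."""
    lexicon: Dict[str, ConnectiveEntry] = {}
    for lemma, ctype, sense, example in occurrences:
        entry = lexicon.setdefault(lemma, ConnectiveEntry(lemma, ctype))
        entry.senses.setdefault(sense, SenseEntry(sense)).examples.append(example)
    return lexicon

# Toy occurrences using the abstract's English glosses; sense labels are assumptions.
occurrences = [
    ("because", "primary", "reason", "We stayed in because it rained."),
    ("therefore", "primary", "result", "It rained; therefore we stayed in."),
    ("for this reason", "secondary", "reason", "It rained. For this reason, we stayed in."),
    ("because", "primary", "reason", "She left because she was tired."),
]
lexicon = build_lexicon(occurrences)
for lemma, entry in lexicon.items():
    print(lemma, entry.connective_type, {s: len(e.examples) for s, e in entry.senses.items()})
```

Keying sub-entries by sense rather than by surface variant is what makes a multi-word secondary connective such as "for this reason" fit the same structure as a single-word primary one.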


2018, Vol 9 (1), pp. 26-51
Author(s): Augustin Speyer, Anita Fetzer

This paper compares the linguistic realization of coordinating and subordinating discourse relations in English and German short personal narratives, paying particular attention to the context-dependence of (1) their overt marking with discourse connectives and (2) their adjacent and non-adjacent positioning. The analysis is based on 20 written texts collected from university students. Discourse connectives are used more frequently in the English data, with both adjacently and non-adjacently positioned discourse relations. Taking the sentence as the unit of investigation, the coordinating relations Contrast and Result and the subordinating relation Explanation are marked overtly throughout the English data, while the coordinating relations Narration and Background and the subordinating relations Elaboration and Comment are marked overtly less frequently. The picture is roughly similar with clauses as the units of investigation. In the German data, too, the use of discourse connectives is more frequent, irrespective of whether the discourse relations are positioned adjacently or non-adjacently.


2014, Vol 40 (4), pp. 921-950
Author(s): Rashmi Prasad, Bonnie Webber, Aravind Joshi

The Penn Discourse Treebank (PDTB) was released to the public in 2008. It remains the largest manually annotated corpus of discourse relations to date. Its focus on discourse relations that are either lexically grounded in explicit discourse connectives or associated with sentential adjacency has not only facilitated its use in language technology and psycholinguistics but has also spawned the annotation of comparable corpora in other languages and genres. Given this situation, this paper has four aims: (1) to provide a comprehensive introduction to the PDTB for those who are unfamiliar with it; (2) to correct some wrong (or perhaps inadvertent) assumptions about the PDTB and its annotation that may have weakened previous results or the performance of decision procedures induced from the data; (3) to explain variations seen in the annotation of comparable resources in other languages and genres, which should allow developers of future comparable resources to recognize whether the variations are relevant to them; and (4) to enumerate and explain relationships between PDTB annotation and complementary annotation of other linguistic phenomena. The paper draws on work done by ourselves and others since the corpus was released.


Author(s): S. Toldova, T. Davydova, M. Kobozeva, D. Pisarevskaya, ...

The paper presents a corpus study of discourse features in blogs. It is based on the data of the Ru-RSTreebank, annotated within the framework of Rhetorical Structure Theory [Mann, Thompson 1988]. The Ru-RSTreebank covers news and popular science texts, scientific papers, and blog texts. The blog subcorpus covers topics such as travelling, cosmetics, sports and health, psychology, IT and tech, among others. Blog texts constitute a specific genre, as they combine properties of written and spoken discourse. The purpose of the paper is to investigate the discourse features of blogs in comparison with other genres. We analyze the variation in the distribution of rhetorical relations across genres and single out differences in the usage of discourse connectives. Furthermore, we check the distribution, in the Ru-RSTreebank blog subcorpus, of other discourse features reported in studies of spoken discourse and social media. The general frequency analysis and experiments applying a Random Forest classifier to genre recognition show that the rhetorical relations most specific to blogs are Evaluation and Contrast, and that blogs tend to use shorter discourse units and not to express discourse relations overtly via subordinating conjunctions.
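
Comparing the distribution of rhetorical relations across genres starts from per-document frequency profiles, which can then serve as features for a classifier such as a Random Forest. A minimal sketch of the profiling step only, assuming a fixed, hypothetical inventory of relation labels and made-up toy documents; the function names are hypothetical and training the classifier itself is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical relation inventory; a real study would use the treebank's full label set.
INVENTORY = ["Elaboration", "Evaluation", "Contrast", "Cause", "Joint"]

def relation_profile(relations: Sequence[str], inventory: Sequence[str] = INVENTORY) -> List[float]:
    """Relative frequency of each inventory label in one document (other labels are ignored)."""
    counts = Counter(r for r in relations if r in inventory)
    total = sum(counts.values())
    return [counts[label] / total if total else 0.0 for label in inventory]

def genre_profiles(docs_by_genre: Dict[str, List[Sequence[str]]]) -> Dict[str, List[float]]:
    """Average the per-document profiles within each genre (each genre needs >= 1 document)."""
    averaged = {}
    for genre, docs in docs_by_genre.items():
        profiles = [relation_profile(doc) for doc in docs]
        averaged[genre] = [sum(column) / len(profiles) for column in zip(*profiles)]
    return averaged

# Made-up toy documents (lists of relation labels), for illustration only.
toy_docs = {
    "blog": [["Evaluation", "Contrast", "Joint", "Evaluation"], ["Evaluation", "Elaboration"]],
    "news": [["Elaboration", "Elaboration", "Cause", "Joint"]],
}
for genre, profile in genre_profiles(toy_docs).items():
    print(genre, dict(zip(INVENTORY, profile)))
```

Per-document vectors from `relation_profile`, paired with genre labels, are the kind of input a Random Forest (for instance scikit-learn's `RandomForestClassifier`) could be trained on for genre recognition; the averaged genre profiles are only for eyeballing differences such as the share of Evaluation.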

