A computational model for measuring discourse complexity

2019 ◽  
Vol 21 (6) ◽  
pp. 690-712 ◽  
Author(s):  
Kun Sun ◽  
Wenxin Xiong

In past studies, the few quantitative approaches to discourse structure were mostly confined to reporting the frequency of discourse relations. However, quantitative approaches should take into account both the hierarchical and the relational layers of discourse structure. This study considers these factors and addresses the issue of how discourse relations and discourse units are related. It draws upon an available corpus of discourse structure, the Rhetorical Structure Theory Discourse Treebank (RST-DT), from a new perspective. Since an RST tree can be converted into a dependency tree analogous to a syntactic dependency tree, the data extracted from the RST-DT can be used to calculate discourse distance in much the same way as syntactic dependency distance is calculated. Discourse distance is also applicable to measuring the depth of human discourse processing. Furthermore, the data derived from the RST-DT are easily converted into network data. This study finds that discourse structure has its own discourse distance minimum and that each type of RST relation has its own range of discourse distance. The frequency distribution of discourse data basically follows a power law on several levels, while a network approach reveals how discourse units are arranged spatially in regular patterns. The two methods complement each other in revealing the interaction between discourse relations and discourse units, as well as in revealing how people process and comprehend discourse dynamically. Accordingly, we propose merging the two methods to yield a computational model for assessing discourse complexity and comprehension.
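The analogy with syntactic dependency distance can be illustrated with a minimal sketch. It assumes that discourse units are numbered in text order and that each unit points to the position of its head, with the distance of a dependency taken as the absolute difference between the two positions. The function names, the `heads` encoding, and the toy data are hypothetical; the abstract does not specify how the authors encode the converted trees.

```python
from statistics import mean


def discourse_distances(heads):
    """Return |dependent - head| for every non-root discourse unit.

    heads[i] is the 1-based position of the head of unit i + 1;
    0 marks the root, which has no incoming dependency.
    (This encoding is an assumption made for illustration.)
    """
    return [abs(dep - head)
            for dep, head in enumerate(heads, start=1)
            if head != 0]


def mean_discourse_distance(heads):
    """Average the absolute distances over all dependencies in one text."""
    distances = discourse_distances(heads)
    return mean(distances) if distances else 0.0


# Hypothetical five-unit text in which unit 2 is the root.
toy_heads = [2, 0, 2, 3, 2]
print(discourse_distances(toy_heads))      # [1, 1, 1, 3]
print(mean_discourse_distance(toy_heads))  # 1.5
```

Averaging per text, as here, is only one option; the same distances could also be grouped by relation type to inspect the range each relation covers.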

2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Kun Sun ◽  
Rong Wang ◽  
Wenxin Xiong

The notion of genre has been widely explored using quantitative methods from both lexical and syntactical perspectives. However, discourse structure has rarely been used to examine genre. Mostly concerned with the interrelation of discourse units, discourse structure can play a crucial role in genre analysis. Nevertheless, few quantitative studies have explored genre distinctions from a discourse structure perspective. Here, we use two English discourse corpora (RST-DT and GUM) to investigate discourse structure from a novel viewpoint. The RST-DT is divided into four small subcorpora distinguished according to genre, and another corpus (GUM) containing seven genres is used for cross-verification. An RST (rhetorical structure theory) tree is converted into dependency representations by taking information from RST annotations to calculate the discourse distance through a process similar to that used to calculate syntactic dependency distance. Moreover, the data on dependency representations derived from the two corpora are readily convertible into network data. Afterwards, we examine different genres in the two corpora by combining discourse distance and discourse network. The two methods are mutually complementary in comprehensively revealing the distinctiveness of various genres. Accordingly, we propose an effective quantitative method for assessing genre differences using discourse distance and discourse network. This quantitative study can help us better understand the nature of genre.
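One way to picture the conversion of dependency representations into network data is sketched below. It assumes that nodes are discourse units and that each dependency contributes one undirected edge labelled with its relation. The abstract does not say which network encoding the authors use, so the edge format, the function names, and the toy relations are all hypothetical.

```python
from collections import Counter


def degree_counts(edges):
    """Count how many dependencies touch each discourse unit.

    Each edge is a (head, dependent, relation) triple; the relation
    label is carried along but not used for the degree itself.
    """
    degree = Counter()
    for head, dependent, _relation in edges:
        degree[head] += 1
        degree[dependent] += 1
    return degree


def degree_distribution(edges):
    """Map each degree value to the number of units that have it."""
    return Counter(degree_counts(edges).values())


# Hypothetical edges for a five-unit text rooted at unit 2.
toy_edges = [
    (2, 1, "attribution"),
    (2, 3, "elaboration"),
    (3, 4, "elaboration"),
    (2, 5, "contrast"),
]
print(dict(degree_counts(toy_edges)))        # {2: 3, 1: 1, 3: 2, 4: 1, 5: 1}
print(dict(degree_distribution(toy_edges)))  # {3: 1, 1: 3, 2: 1}
```

A degree distribution of this kind is one simple network statistic that could be compared across genres; other measures would need a proper graph library.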


2021 ◽  
Author(s):  
Kun Sun

The notion of genre has been widely explored using quantitative methods from both lexical and syntactical perspectives. However, discourse structure has rarely been used to examine genre. Mostly concerned with the interrelation of discourse units, discourse structure can play a crucial role in genre analysis because genre is closely related to discourse (text). Nevertheless, few quantitative studies have explored genre distinctions from a discourse structure perspective. Here, we use two English discourse corpora (RST-DT and GUM) to investigate the hierarchical and relational dimensions of discourse structure from a novel viewpoint. The RST-DT is divided into four small subcorpora distinguished according to genre, and another corpus (GUM) containing seven genres is used for cross-verification. An RST (rhetorical structure theory) tree is converted into dependency representations by taking information from RST annotations to calculate the discourse distance through a process similar to that used to calculate syntactic dependency distance. Moreover, the data on dependency representations stemming from the two corpora is readily convertible into network data. Afterwards, we examine different genres in the two corpora by combining discourse distance and discourse network. The two methods are mutually complementary in comprehensively revealing the distinctiveness of various genres. Accordingly, we propose an effective quantitative method for assessing genre differences using discourse distance and discourse network. This quantitative study can help us better understand the nature of genre and develop effective strategies for genre-based writing.


2007 ◽  
Vol 01 (03) ◽  
pp. 319-334 ◽  
Author(s):  
HELMUT PRENDINGER ◽  
PAUL PIWEK ◽  
MITSURU ISHIZUKA

In this article, we propose a novel method for generating engaging multi-modal content automatically from text. Rhetorical Structure Theory (RST) is used to decompose text into discourse units and to identify rhetorical discourse relations between them. Rhetorical relations are then mapped to question–answer pairs in an information preserving way, i.e., the original text and the resulting dialogue convey essentially the same meaning. Finally, the dialogue is "acted out" by two virtual agents. The network of dialogue structures automatically built up during this process, called DialogueNet, can be reused for other purposes, such as personalization or question–answering.


2016 ◽  
Vol 7 (1) ◽  
pp. 1-49 ◽  
Author(s):  
Farah Benamara ◽  
Nicholas Asher ◽  
Yvette Yannick Mathieu ◽  
Vladimir Popescu ◽  
Baptiste Chardon

This paper describes the CASOAR corpus, the first manually annotated corpus that explores the impact of discourse structure on sentiment analysis with a study of movie reviews in French and in English as well as letters to the editor in French. While annotating opinions at the expression, the sentence or the document level is a well-established task and relatively straightforward, discourse annotation remains difficult, especially for non-experts. Therefore, combining both annotations poses several methodological problems that we address here. We propose a multi-layered annotation scheme that includes: the complete discourse structure according to the Segmented Discourse Representation Theory, the opinion orientation of elementary discourse units and opinion expressions, and their associated features. We detail each layer, explore the interactions between them and discuss our results. In particular, we examine the correlation between discourse and semantic category of opinion expressions, the impact of discourse relations on both subjectivity and polarity analysis and the impact of discourse on the determination of the overall opinion of a document. Our results demonstrate that discourse is an important cue for sentiment analysis, at least for the corpus genres we have studied.


2018 ◽  
Vol 232 ◽  
pp. 02020
Author(s):  
Guimin Huang ◽  
Min Tan ◽  
Zhenglin Sun ◽  
Ya Zhou

To address problems that word-level local coherence analysis models cannot solve, we propose a new discourse coherence quality analysis model (abbreviated RST-DCQA) that analyzes the full hierarchical discourse structure of English essays. Under the framework of rhetorical structure theory (RST), we first design an RST-style discourse relation parser to capture the deep hierarchical discourse structure of essays; second, we transform the discourse relation information into a discourse relation matrix; finally, we design an algorithm to analyze the discourse coherence quality of students’ English essays. The experimental results show that the average error between our model’s scores and teachers’ scores is only 2.63, and the Pearson correlation coefficient is 0.71. Compared with other models, our RST-DCQA model has higher accuracy and better practicality in the assessment of students’ essays.
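The two evaluation figures reported above can be computed as in the following sketch. It assumes that "average error" means the mean absolute difference between model and teacher scores, which the abstract does not state explicitly; the score lists are invented for illustration and are not the paper's data.

```python
from math import sqrt


def mean_absolute_error(predicted, gold):
    """Average of |predicted - gold| over paired scores."""
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(predicted)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical scores for three essays.
model_scores = [70, 80, 90]
teacher_scores = [72, 79, 93]
print(mean_absolute_error(model_scores, teacher_scores))  # 2.0
print(pearson(model_scores, teacher_scores))
```

The two measures answer different questions: the error shows how far scores are from the teacher's on average, while the correlation shows whether essays are ranked in a similar order.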


2001 ◽  
Vol 21 ◽  
pp. 59-66
Author(s):  
Michael Grabski

Elaboration and Narration, as so-called discourse relations (or rhetorical relations), are modelled in Segmented Discourse Representation Theory (SDRT) as relations between discourse constituents (or constituents for short). Constituents are either propositions that arise from the interpretation of sentences occurring in a text, in which case they have the status of DRSes, or compounds of such DRSes, constructed from DRSes (or compounds of them) by discourse relations. In that sense, Elaboration and Narration do not refer to text types; rather, they provide links between constituents that allow them to combine so that, for a recipient, the resulting text is coherent and has (some) elaborative or narrative properties.


Author(s):  
S. Toldova ◽  
T. Davydova ◽  
M. Kobozeva ◽  
D. Pisarevskaya ◽  
...  

The paper presents a corpus study of discourse features in a corpus of blogs. It is based on data from the Ru-RSTreebank, annotated within the framework of Rhetorical Structure Theory [Mann, Thompson 1988]. The Ru-RSTreebank represents the genres of news, popular science, scientific papers, and blog texts. The blog subcorpus covers topics such as travelling, cosmetics, sports and health, psychology, and IT and tech, among others. Blog texts constitute a specific genre, as they combine properties of written and spoken discourse. The purpose of the paper is to investigate the discourse features of blogs in comparison with other genres. We analyze the variation in the distribution of rhetorical relations among genres and single out the differences in the usage of discourse connectives. Furthermore, we check the distribution in the Ru-RSTreebank blog subcorpus of other discourse features reported in different studies for spoken discourse and for social media. The general frequency analysis and the experiments applying a RandomForest classifier to genre recognition show that the most important rhetorical relations specific to blogs are Evaluation and Contrast, and that blogs tend to use shorter discourse units and not to express discourse relations overtly via subordinating conjunctions.


2014 ◽  
Vol 31 ◽  
pp. 13-25
Author(s):  
Enrico Boone

This paper is concerned with the correct characterization of the licensing condition on clausal ellipsis and how it relates to the distribution of ellipsis. I argue, essentially following López (2000), that ellipsis is licensed when the ellipsis clause bears a relation to an antecedent in the discourse component. A relation between two discourse units can be established in two ways: (1) Either there holds a direct relation between the two discourse units or (2) there holds an anaphoric relation mediated by a discourse anaphor. In this paper, I show how this two-way distinction in setting up discourse relations accounts for the two-way split we find in the distribution of ellipsis.


Author(s):  
Jan Wira Gotama Putra ◽  
Kana Matsumura ◽  
Simone Teufel ◽  
Takenobu Tokunaga

Discourse structure annotation aims at analysing how discourse units (e.g. sentences or clauses) relate to each other and what roles they play in the overall discourse. Several annotation tools for discourse structure have been developed. However, they often support only specific annotation schemes, which limits their usability for new schemes. This article presents TIARA 2.0, an annotation tool for discourse structure and text improvement. Starting from our specific needs, we extend an existing tool to accommodate four levels of annotation: discourse structure, argumentative structure, sentence rearrangement and content alteration. The latter two are not commonly found in existing tools. TIARA is implemented on standard web technologies and can be easily customised. It deals with visual complexity during the annotation process by systematically simplifying the layout and by offering interactive visualisation, including clutter-reducing features and a dual-view display. TIARA’s text-view allows annotators to focus on the analysis of logical sequencing between sentences. The tree-view allows them to review their analysis in terms of the overall discourse structure. Apart from being an annotation tool, it is also designed to be useful for educational purposes in the teaching of argumentation; this gives it an edge over other existing tools.


2021 ◽  
Vol 38 ◽  
pp. 21-39
Author(s):  
Sofia Bimpikou ◽  
Emar Maier ◽  
Petra Hendriks

We investigate the discourse structure of Free Indirect Discourse passages in narratives. We argue that Free Indirect Discourse reports consist of two separate propositional discourse units: an (explicit or implicit) frame segment and a reported content. These segments are connected at the level of discourse structure by a non-veridical, subordinating discourse relation of Attribution, familiar from recent SDRT analyses of indirect discourse constructions in natural conversation (Hunter, 2016). We conducted an experiment to detect the covert presence of a subordinating frame segment based on its effects on pronoun resolution. We compared (unframed) Free Indirect Discourse with overtly framed Indirect Discourse and a non-reportative segment. We found that the first two indeed pattern alike in terms of pronoun resolution, which we take as evidence against the pragmatic context split approach of Schlenker (2004) and Eckardt (2014), and in favor of our discourse structural Attribution analysis.

