TIARA 2.0: an interactive tool for annotating discourse structure and text improvement

Author(s):  
Jan Wira Gotama Putra ◽  
Kana Matsumura ◽  
Simone Teufel ◽  
Takenobu Tokunaga

Abstract Discourse structure annotation aims at analysing how discourse units (e.g. sentences or clauses) relate to each other and what roles they play in the overall discourse. Several annotation tools for discourse structure have been developed. However, they often support only specific annotation schemes, which limits their usefulness for new schemes. This article presents TIARA 2.0, an annotation tool for discourse structure and text improvement. Departing from our specific needs, we extend an existing tool to accommodate four levels of annotation: discourse structure, argumentative structure, sentence rearrangement and content alteration. The latter two are particularly unique compared to existing tools. TIARA is implemented on standard web technologies and can be easily customised. It deals with the visual complexity during the annotation process by systematically simplifying the layout and by offering interactive visualisation, including clutter-reducing features and dual-view display. TIARA’s text-view allows annotators to focus on the analysis of logical sequencing between sentences. The tree-view allows them to review their analysis in terms of the overall discourse structure. Apart from being an annotation tool, it is also designed to be useful for educational purposes in the teaching of argumentation; this gives it an edge over other existing tools.

2021 ◽  
pp. 1-27
Author(s):  
Jan Wira Gotama Putra ◽  
Simone Teufel ◽  
Takenobu Tokunaga

Abstract Argument mining (AM) aims to explain how individual argumentative discourse units (e.g. sentences or clauses) relate to each other and what roles they play in the overall argumentation. The automatic recognition of argumentative structure is attractive as it benefits various downstream tasks, such as text assessment, text generation, text improvement, and summarization. Existing studies focused on analyzing well-written texts provided by proficient authors. However, most English speakers in the world are non-native, and their texts are often poorly structured, particularly if they are still in the learning phase. Yet, there is no specific prior study on argumentative structure in non-native texts. In this article, we present the first corpus containing argumentative structure annotation for English-as-a-foreign-language (EFL) essays, together with a specially designed annotation scheme. The annotated corpus resulting from this work is called “ICNALE-AS” and contains 434 essays written by EFL learners from various Asian countries. The corpus presented here is particularly useful for the education domain. On the basis of the analysis of argumentation-related problems in EFL essays, educators can formulate ways to improve them so that they more closely resemble native-level productions. Our argument annotation scheme is demonstrably stable, achieving good inter-annotator agreement and near-perfect intra-annotator agreement. We also propose a set of novel document-level agreement metrics that are able to quantify structural agreement from various argumentation aspects, thus providing a more holistic analysis of the quality of the argumentative structure annotation. The metrics are evaluated in a crowd-sourced meta-evaluation experiment, achieving moderate to good correlation with human judgments.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Kun Sun ◽  
Rong Wang ◽  
Wenxin Xiong

Abstract The notion of genre has been widely explored using quantitative methods from both lexical and syntactical perspectives. However, discourse structure has rarely been used to examine genre. Mostly concerned with the interrelation of discourse units, discourse structure can play a crucial role in genre analysis. Nevertheless, few quantitative studies have explored genre distinctions from a discourse structure perspective. Here, we use two English discourse corpora (RST-DT and GUM) to investigate discourse structure from a novel viewpoint. The RST-DT is divided into four small subcorpora distinguished according to genre, and another corpus (GUM) containing seven genres is used for cross-verification. An RST (rhetorical structure theory) tree is converted into dependency representations by taking information from RST annotations to calculate the discourse distance through a process similar to that used to calculate syntactic dependency distance. Moreover, the data on dependency representations deriving from the two corpora are readily convertible into network data. Afterwards, we examine different genres in the two corpora by combining discourse distance and discourse network. The two methods are mutually complementary in comprehensively revealing the distinctiveness of various genres. Accordingly, we propose an effective quantitative method for assessing genre differences using discourse distance and discourse network. This quantitative study can help us better understand the nature of genre.
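
To illustrate the conversion of dependency representations into network data mentioned in this abstract, here is a minimal Python sketch that counts how many dependency links each discourse unit takes part in. The edge format (pairs of unit indices in text order) and the function name `degree_counts` are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def degree_counts(edges):
    """Count the links attached to each discourse unit.

    `edges` is an iterable of (head, dependent) pairs, where each value is
    the position of a discourse unit in the text. This input format is a
    hypothetical one chosen for illustration.
    """
    degree = Counter()
    for head, dependent in edges:
        # Treat the dependency link as an undirected network edge.
        degree[head] += 1
        degree[dependent] += 1
    return dict(degree)


# Toy dependency structure: unit 1 governs units 2 and 3; unit 3 governs unit 4.
toy_edges = [(1, 2), (1, 3), (3, 4)]
print(degree_counts(toy_edges))
```

Treating the links as undirected keeps the sketch short; a directed variant would keep separate in-degree and out-degree counters.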


2021 ◽  
Vol 38 ◽  
pp. 21-39
Author(s):  
Sofia Bimpikou ◽  
Emar Maier ◽  
Petra Hendriks

Abstract We investigate the discourse structure of Free Indirect Discourse passages in narratives. We argue that Free Indirect Discourse reports consist of two separate propositional discourse units: an (explicit or implicit) frame segment and a reported content. These segments are connected at the level of discourse structure by a non-veridical, subordinating discourse relation of Attribution, familiar from recent SDRT analyses of indirect discourse constructions in natural conversation (Hunter, 2016). We conducted an experiment to detect the covert presence of a subordinating frame segment based on its effects on pronoun resolution. We compared (unframed) Free Indirect Discourse with overtly framed Indirect Discourse and a non-reportative segment. We found that the first two indeed pattern alike in terms of pronoun resolution, which we take as evidence against the pragmatic context split approach of Schlenker (2004) and Eckardt (2014), and in favor of our discourse structural Attribution analysis.


2019 ◽  
Vol 21 (6) ◽  
pp. 690-712 ◽  
Author(s):  
Kun Sun ◽  
Wenxin Xiong

In past studies, the few quantitative approaches to discourse structure were mostly confined to the presentation of the frequency of discourse relations. However, quantitative approaches should take into account both hierarchical and relational layers in the discourse structure. This study considers these factors and addresses the issue of how discourse relations and discourse units are related. It draws upon the available corpora of discourse structure (rhetorical structure theory-discourse treebank (RST-DT)) from a new perspective. Since an RST tree can be converted into a syntactic dependency tree, the data extracted from the RST-DT can be useful for calculating the discourse distance in much the same way as syntactic dependency distance is calculated. Discourse distance is also applicable to measuring the depth of the human processing of discourse. Furthermore, the data derived from the RST-DT are also easily converted into network data. This study finds that discourse structure has its discourse distance minimum and each type of RST relation has its range of discourse distance. The frequency distribution of discourse data basically follows the power law on several levels, while a network approach reveals how discourse units are arranged spatially in regular patterns. The two methods are mutually complementary in revealing the interaction between discourse relations and discourse units in a comprehensive manner, as well as in revealing how people process and comprehend discourse dynamically. Accordingly, we propose merging the two methods so as to yield a computational model for assessing discourse complexity and comprehension.
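
The abstract says that discourse distance is calculated in much the same way as syntactic dependency distance. A minimal Python sketch of that idea follows. It assumes the common definition of dependency distance as the absolute difference between the text positions of a head and its dependent, averaged over all links; whether the study uses exactly this formula is an assumption, and the function name `mean_discourse_distance` and the edge format are hypothetical.

```python
from statistics import mean


def mean_discourse_distance(edges):
    """Average absolute positional distance between linked discourse units.

    `edges` is a non-empty iterable of (head, dependent) pairs giving the
    positions of the two units in text order (an assumed input format).
    """
    distances = [abs(head - dependent) for head, dependent in edges]
    if not distances:
        raise ValueError("at least one dependency link is required")
    return mean(distances)


# Toy example: links 1-2, 1-3 and 3-4 have distances 1, 2 and 1.
toy_edges = [(1, 2), (1, 3), (3, 4)]
print(mean_discourse_distance(toy_edges))
```

Under this definition, adjacent units contribute a distance of 1, so a text whose links mostly join neighbouring units has a mean close to 1.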


1995 ◽  
Vol 31 (1) ◽  
pp. 109-147 ◽  
Author(s):  
Jan Van Kuppevelt

In this paper we present an alternative approach to discourse structure according to which topicality is the general organizing principle in discourse. This approach accounts for the fact that the segmentation structure of discourse is in correspondence with the hierarchy of topics defined for the discourse units. Fundamental to the proposed analysis is the relation it assumes between the notion of topic and that of explicit and implicit questioning in discourse. This relation implies that (1) the topic associated with a discourse unit is provided by the explicit or implicit question it answers and (2) the relation between discourse units is determined by the relation between these topic-providing questions.


1996 ◽  
Vol 32 (2) ◽  
pp. 403-438 ◽  
Author(s):  
Christoph Unger

The main aim of this paper is to discuss the claim that discourse connectives are best treated as indicators of coherence relations between hierarchically organized discourse units. It will be argued that coherence relations cannot be seen as cognitively real entities. Furthermore, there is no evidence for hierarchical organization in discourse. The intuitions underlying the notion of hierarchical discourse structure are instead explained in terms of consequences of processing a text in the search for optimal relevance. This account draws attention to a hitherto not widely discussed set of data.


2016 ◽  
Vol 7 (1) ◽  
pp. 1-49 ◽  
Author(s):  
Farah Benamara ◽  
Nicholas Asher ◽  
Yvette Yannick Mathieu ◽  
Vladimir Popescu ◽  
Baptiste Chardon

This paper describes the CASOAR corpus, the first manually annotated corpus that explores the impact of discourse structure on sentiment analysis with a study of movie reviews in French and in English as well as letters to the editor in French. While annotating opinions at the expression, the sentence or the document level is a well-established task and relatively straightforward, discourse annotation remains difficult, especially for non-experts. Therefore, combining both annotations poses several methodological problems that we address here. We propose a multi-layered annotation scheme that includes: the complete discourse structure according to the Segmented Discourse Representation Theory, the opinion orientation of elementary discourse units and opinion expressions, and their associated features. We detail each layer, explore the interactions between them and discuss our results. In particular, we examine the correlation between discourse and semantic category of opinion expressions, the impact of discourse relations on both subjectivity and polarity analysis and the impact of discourse on the determination of the overall opinion of a document. Our results demonstrate that discourse is an important cue for sentiment analysis, at least for the corpus genres we have studied.


1992 ◽  
Vol 15 (2) ◽  
pp. 120-136 ◽  
Author(s):  
Helen Tebble

It has been estimated by those who work in the computing industry that sixty per cent of their time is taken up in communication and only forty per cent is spent on technical work. There is then a clear need to develop the communicative abilities of those in the computer industry. Well-designed communication courses for people in computing would benefit from linguistic descriptions of the discourses of this industry. A linguistic description of the structure and genre of the systems analyst’s interview should provide the basis for some of these courses. This paper discusses the genre of the two major types of interviews used by systems analysts and identifies the genre element as the unit of discourse structure that links the lower-level and higher-level units of discourse structure within systemic linguistics. It draws upon data collected from the depth phase of a national systems analysis project. It is argued that for a full linguistic description of the structure of lengthy speech events within a systemic linguistics framework it is necessary to take both a top-down (generic) and bottom-up (discourse units) approach.


2018 ◽  
Vol 23 (23) ◽  
pp. 33 ◽  
Author(s):  
Francis Cornish

By “anadeixis” (a term first coined by Ehlich, 1982) is meant, prototypically, the indexical functioning of certain context-bound expressions to target discourse entities which are either not yet topical, or whose erstwhile topical status has faded. It is the discourse-structuring function of anadeictic indexicals that will be the particular focus of this study. The basis for the discussion will be two short whole texts, in two languages (French and English). This will make it possible to show how certain ‘strict’-anadeictic and discourse-deictic references may signal the macro- (content structures) and super-structures (discourse-functional structures) that characterize them. Such references may serve either to foreshadow a transition between major discourse units within a given text, or to actually introduce one.


2021 ◽  
Author(s):  
Kun Sun

The notion of genre has been widely explored using quantitative methods from both lexical and syntactical perspectives. However, discourse structure has rarely been used to examine genre. Mostly concerned with the interrelation of discourse units, discourse structure can play a crucial role in genre analysis because genre is closely related to discourse (text). Nevertheless, few quantitative studies have explored genre distinctions from a discourse structure perspective. Here, we use two English discourse corpora (RST-DT and GUM) to investigate the hierarchical and relational dimensions of discourse structure from a novel viewpoint. The RST-DT is divided into four small subcorpora distinguished according to genre, and another corpus (GUM) containing seven genres is used for cross-verification. An RST (rhetorical structure theory) tree is converted into dependency representations by taking information from RST annotations to calculate the discourse distance through a process similar to that used to calculate syntactic dependency distance. Moreover, the data on dependency representations stemming from the two corpora is readily convertible into network data. Afterwards, we examine different genres in the two corpora by combining discourse distance and discourse network. The two methods are mutually complementary in comprehensively revealing the distinctiveness of various genres. Accordingly, we propose an effective quantitative method for assessing genre differences using discourse distance and discourse network. This quantitative study can help us better understand the nature of genre and develop effective strategies for genre-based writing.

