discourse units
Recently Published Documents


TOTAL DOCUMENTS

96
(FIVE YEARS 30)

H-INDEX

12
(FIVE YEARS 1)

2021 ◽  
Vol 25 (2) ◽  
pp. 478-506
Author(s):  
Salvador Pons Bordería ◽  
Elena Pascual Aliaga

As databases make Corpus Linguistics a common tool for most linguists, corpus annotation becomes an increasingly important process. Corpus users need not only raw data but also annotated data, submitted to tagging or parsing processes through annotation protocols. One problem with corpus annotation lies in its reliability, that is, in the probability that its results can be replicated by independent researchers. Inter-annotator agreement (IAA) is the process which evaluates the probability that, applying the same protocol, different annotators reach similar results. To measure agreement, different statistical metrics are used. This study applies IAA for the first time to the Valencia Español Coloquial (Val.Es.Co.) discourse segmentation model, designed for segmenting and labelling spoken language into discourse units. Whereas most IAA studies merely label a set of predefined units, this study applies IAA to the Val.Es.Co. protocol, which involves a more complex two-fold process: first, the speech continuum needs to be divided into units; second, the units have to be labelled. Krippendorff's u-family statistical metrics (Krippendorff et al. 2016) allow measuring IAA in both segmentation and labelling tasks. Three expert annotators segmented a spontaneous conversation into subacts, the minimal discursive unit of the Val.Es.Co. model, and labelled the resulting units according to a set of 10 subact categories. Krippendorff's u coefficients were applied in several rounds to elucidate whether the inclusion of a larger number of categories and their distinction had an impact on the agreement results. The conclusions show high levels of IAA, especially in the annotation of procedural subact categories, where results reach coefficients over 0.8. This study validates the Val.Es.Co. model as an optimal method to fully analyze a conversation into pragmatically-based discourse units.
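
To make the idea of an agreement coefficient concrete, here is a minimal sketch of Krippendorff's alpha for nominal labels. It is a deliberately simplified stand-in, not the u-family coefficients used in the study: it assumes the units are already segmented and aligned across annotators, so it measures labelling agreement only. The function name `nominal_alpha` and the label values are hypothetical.

```python
from collections import Counter
from itertools import permutations


def nominal_alpha(units):
    """Krippendorff's alpha for nominal labels (simplified illustration).

    units: list of lists; each inner list holds the labels that the
    annotators assigned to one unit (missing labels are simply omitted).
    """
    # Coincidence matrix: every ordered pair of labels within a unit
    # contributes 1 / (m - 1), where m is the number of labels in that unit.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit with a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per category and overall number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        # Only one category occurs (or no pairable data); alpha is formally
        # undefined here, and this sketch returns 1.0 by assumption.
        return 1.0
    return 1 - (n - 1) * observed / expected


# Hypothetical labels from three annotators for four aligned units.
example_units = [["x", "x", "x"], ["x", "y", "x"], ["y", "y", "y"], ["y", "y"]]
example_alpha = nominal_alpha(example_units)
```

A value of 1 indicates perfect agreement and values near 0 indicate agreement at chance level. Measuring agreement on where unit boundaries fall, as the study does, requires the unitizing coefficients rather than this nominal variant.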


Author(s):  
Jan Wira Gotama Putra ◽  
Kana Matsumura ◽  
Simone Teufel ◽  
Takenobu Tokunaga

Abstract Discourse structure annotation aims at analysing how discourse units (e.g. sentences or clauses) relate to each other and what roles they play in the overall discourse. Several annotation tools for discourse structure have been developed. However, they often only support specific annotation schemes, which limits their usefulness for new schemes. This article presents TIARA 2.0, an annotation tool for discourse structure and text improvement. Starting from our specific needs, we extend an existing tool to accommodate four levels of annotation: discourse structure, argumentative structure, sentence rearrangement and content alteration. The latter two are particularly unique compared to existing tools. TIARA is implemented on standard web technologies and can be easily customised. It deals with the visual complexity during the annotation process by systematically simplifying the layout and by offering interactive visualisation, including clutter-reducing features and dual-view display. TIARA’s text-view allows annotators to focus on the analysis of logical sequencing between sentences. The tree-view allows them to review their analysis in terms of the overall discourse structure. Apart from being an annotation tool, it is also designed to be useful for educational purposes in the teaching of argumentation; this gives it an edge over other existing tools.


2021 ◽  
Vol 38 ◽  
pp. 21-39
Author(s):  
Sofia Bimpikou ◽  
Emar Maier ◽  
Petra Hendriks

Abstract We investigate the discourse structure of Free Indirect Discourse passages in narratives. We argue that Free Indirect Discourse reports consist of two separate propositional discourse units: an (explicit or implicit) frame segment and a reported content. These segments are connected at the level of discourse structure by a non-veridical, subordinating discourse relation of Attribution, familiar from recent SDRT analyses of indirect discourse constructions in natural conversation (Hunter, 2016). We conducted an experiment to detect the covert presence of a subordinating frame segment based on its effects on pronoun resolution. We compared (unframed) Free Indirect Discourse with overtly framed Indirect Discourse and a non-reportative segment. We found that the first two indeed pattern alike in terms of pronoun resolution, which we take as evidence against the pragmatic context split approach of Schlenker (2004) and Eckardt (2014), and in favor of our discourse structural Attribution analysis.


2021 ◽  
pp. 1-27
Author(s):  
Jan Wira Gotama Putra ◽  
Simone Teufel ◽  
Takenobu Tokunaga

Abstract Argument mining (AM) aims to explain how individual argumentative discourse units (e.g. sentences or clauses) relate to each other and what roles they play in the overall argumentation. The automatic recognition of argumentative structure is attractive as it benefits various downstream tasks, such as text assessment, text generation, text improvement, and summarization. Existing studies focused on analyzing well-written texts provided by proficient authors. However, most English speakers in the world are non-native, and their texts are often poorly structured, particularly if they are still in the learning phase. Yet, there is no specific prior study on argumentative structure in non-native texts. In this article, we present the first corpus containing argumentative structure annotation for English-as-a-foreign-language (EFL) essays, together with a specially designed annotation scheme. The annotated corpus resulting from this work is called “ICNALE-AS” and contains 434 essays written by EFL learners from various Asian countries. The corpus presented here is particularly useful for the education domain. On the basis of the analysis of argumentation-related problems in EFL essays, educators can formulate ways to improve them so that they more closely resemble native-level productions. Our argument annotation scheme is demonstrably stable, achieving good inter-annotator agreement and near-perfect intra-annotator agreement. We also propose a set of novel document-level agreement metrics that are able to quantify structural agreement from various argumentation aspects, thus providing a more holistic analysis of the quality of the argumentative structure annotation. The metrics are evaluated in a crowd-sourced meta-evaluation experiment, achieving moderate to good correlation with human judgments.


Author(s):  
Jesse Egbert ◽  
Stacey Wizner ◽  
Daniel Keller ◽  
Douglas Biber ◽  
Tony McEnery ◽  
...  

Abstract On the surface, it appears that conversational language is produced in a stream of spoken utterances. In reality, conversation is composed of contiguous units that are characterized by coherent communicative purposes. A large number of important research questions about the nature of conversational discourse could be addressed if researchers could investigate linguistic variation across functional discourse units. To date, however, no corpus of conversational language has been annotated according to functional units, and there are no existing methods for carrying out this type of annotation. We introduce a new method for segmenting transcribed conversation files into discourse units and characterizing those units based on their communicative purposes. In this paper, the development and piloting of this method are described in detail and the final framework is presented. We conclude with a discussion of an ongoing project where we are applying this coding framework to the British National Corpus Spoken 2014.


Author(s):  
Elena A. Semukhina ◽  
Aleksandr A. Zaraiskiy

The article analyzes the functioning of precedent phenomena in French religion-focused publicistic discourse. The paper investigates the role of precedent phenomena in discursive context, identifies their most frequent types and establishes the specifics of their cultural markedness. The authors used discourse units from current religious publications on French Internet websites. The main methods of analysis are continuous sampling, description, historical-cultural and contextual analysis, as well as certain statistical methods. All types of precedent phenomena, such as allusive names, utterances and texts, as well as situations and events, are used in French religion-focused publicistic discourse. The most frequently used phenomena are precedent texts, due to the nature of Christian culture based on scriptures and constant reference to sacred texts. Precedent events are another notably frequent type; this may be owing to the specific character of publicistic discourse, the main function of which is to inform about events. Precedent names, utterances and situations are less frequent. Precedent phenomena add meanings to discourse, thus making an utterance deeper and more expressive. They can help to convey an opinion or add sentiment to a phrase. Precedent phenomena in religion-focused publicistic discourse can express opinions implicitly, through allusions, thus influencing the recipient’s perception of the whole statement. To sum up, precedent phenomena in this type of discourse serve the most important function of publicistic discourse – the function of impact. The vast majority of precedent phenomena used in French religion-focused publicistic discourse are not culturally marked. The authors take this as evidence that religious French people consider themselves to be a part of the global Catholic community.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Kun Sun ◽  
Rong Wang ◽  
Wenxin Xiong

Abstract The notion of genre has been widely explored using quantitative methods from both lexical and syntactical perspectives. However, discourse structure has rarely been used to examine genre. Mostly concerned with the interrelation of discourse units, discourse structure can play a crucial role in genre analysis. Nevertheless, few quantitative studies have explored genre distinctions from a discourse structure perspective. Here, we use two English discourse corpora (RST-DT and GUM) to investigate discourse structure from a novel viewpoint. The RST-DT is divided into four small subcorpora distinguished according to genre, and another corpus (GUM) containing seven genres is used for cross-verification. An RST (rhetorical structure theory) tree is converted into dependency representations by taking information from RST annotations to calculate the discourse distance through a process similar to that used to calculate syntactic dependency distance. Moreover, the data on dependency representations deriving from the two corpora are readily convertible into network data. Afterwards, we examine different genres in the two corpora by combining discourse distance and discourse network. The two methods are mutually complementary in comprehensively revealing the distinctiveness of various genres. Accordingly, we propose an effective quantitative method for assessing genre differences using discourse distance and discourse network. This quantitative study can help us better understand the nature of genre.
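
As a rough illustration of the two measures named in this abstract, the sketch below computes a mean discourse distance in the same way mean syntactic dependency distance is usually computed (the average absolute difference between the linear positions of a head and its dependent) and turns the same edges into simple network data. It assumes the RST tree has already been converted into head-dependent pairs of discourse unit positions; the abstract does not describe that conversion, and the function names and edge list here are hypothetical.

```python
from collections import Counter


def mean_discourse_distance(edges):
    """Average absolute positional distance between head and dependent.

    edges: iterable of (head, dependent) pairs, where each value is the
    1-based position of a discourse unit in text order (an assumption of
    this sketch).
    """
    distances = [abs(head - dependent) for head, dependent in edges]
    if not distances:
        raise ValueError("at least one dependency edge is required")
    return sum(distances) / len(distances)


def degree_by_unit(edges):
    """Treat the dependency edges as an undirected network and count,
    for every discourse unit, how many edges touch it."""
    degree = Counter()
    for head, dependent in edges:
        degree[head] += 1
        degree[dependent] += 1
    return dict(degree)


# Hypothetical dependency edges for a four-unit text.
example_edges = [(1, 2), (1, 3), (3, 4)]
example_distance = mean_discourse_distance(example_edges)
example_degrees = degree_by_unit(example_edges)
```

Comparing such per-text averages and degree distributions across genre subcorpora is one plausible way to combine the distance and network views; the measures actually reported in the article may differ.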


2021 ◽  
Author(s):  
Kun Sun

The notion of genre has been widely explored using quantitative methods from both lexical and syntactical perspectives. However, discourse structure has rarely been used to examine genre. Mostly concerned with the interrelation of discourse units, discourse structure can play a crucial role in genre analysis because genre is closely related to discourse (text). Nevertheless, few quantitative studies have explored genre distinctions from a discourse structure perspective. Here, we use two English discourse corpora (RST-DT and GUM) to investigate the hierarchical and relational dimensions of discourse structure from a novel viewpoint. The RST-DT is divided into four small subcorpora distinguished according to genre, and another corpus (GUM) containing seven genres is used for cross-verification. An RST (rhetorical structure theory) tree is converted into dependency representations by taking information from RST annotations to calculate the discourse distance through a process similar to that used to calculate syntactic dependency distance. Moreover, the data on dependency representations stemming from the two corpora is readily convertible into network data. Afterwards, we examine different genres in the two corpora by combining discourse distance and discourse network. The two methods are mutually complementary in comprehensively revealing the distinctiveness of various genres. Accordingly, we propose an effective quantitative method for assessing genre differences using discourse distance and discourse network. This quantitative study can help us better understand the nature of genre and develop effective strategies for genre-based writing.


Author(s):  
O.K. Iriskhanova ◽  
O.N. Prokofyeva ◽  

Despite numerous studies, the difference between objects and events remains one of the most debatable issues, and scholars look for arguments relying on ontology, epistemology, and language. The authors of the paper hypothesize that differences in the construal of objects and events can be observed not only in linguistic expressions referring to these entities, but also in the gestures that accompany them. To verify the hypothesis, an empirical study was carried out, with 20 Russian participants spontaneously describing four paintings belonging to different artistic styles. The authors analyze the co-occurrence of units of speech (Elementary Discourse Units, or EDUs) denoting either objects or events with gestures classified into mimetic modes and mimetic categories (Molding, Acting, Drawing, and Representing categories). The results show a significant correlation between object-construal EDUs and Molding gestures, on the one hand, and between event-construal EDUs and Acting gestures, on the other. In addition, the study reveals that some speech-gesture patterns relate to such qualities of the paintings as content, style, genre, and technique.

