Constructing the Corpus of Chinese Textual ‘Run-on’ Sentences (CCTRS)

2021
Author(s):
Kun Sun

Chinese is a discourse-oriented language. “Run-on” sentences (liushui ju) are a typical and prevalent form of discourse in Chinese. These sentences show the capacity of the Chinese language for organizing loose structures into an effective and coherent discourse. Despite their widespread use in Chinese, previous studies have explored “run-on” sentences only through small-scale examples. A quantitative investigation of “run-on” sentences requires a corpus. The present study selects 500 “run-on” sentences and annotates them on the levels of discourse, syntax and semantics. We mainly adopt the PDTB (Penn Discourse Treebank) style in the discourse annotations, but we also borrow some features from RST (Rhetorical Structure Theory). We find that the frequency distribution of discourse relations in the data extracted from this corpus follows a power law. The preliminary results reveal that semantic leaps in “run-on” sentences are closely related to the use of the topic chain and to the animacy and the span of discourse relations. This corpus can thus aid in carrying out further computational and cognitive studies of Chinese discourse.
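The power-law claim in the abstract above can be illustrated with a rough check: rank the relation frequencies and fit a straight line in log-log space. This is only a minimal sketch. The abstract does not say how the fit was performed, the function name `fit_power_law` is hypothetical, and the counts are synthetic rather than taken from CCTRS. Log-log least squares is a quick diagnostic, not a rigorous statistical test.

```python
import math

def fit_power_law(frequencies):
    """Least-squares fit of log(frequency) = log(c) - alpha * log(rank).

    Returns (alpha, c). Needs at least two positive frequencies.
    """
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic counts (an assumption for illustration, NOT corpus data):
# an exact power law with exponent 1 and scale 1000.
counts = [1000 / rank for rank in range(1, 11)]
alpha, c = fit_power_law(counts)
```

On real annotation counts the fitted exponent would be an estimate only, and a goodness-of-fit comparison against alternative distributions would still be needed before claiming power-law behaviour.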

2007
Vol 01 (03)
pp. 319-334
Author(s):
HELMUT PRENDINGER
PAUL PIWEK
MITSURU ISHIZUKA

In this article, we propose a novel method for generating engaging multi-modal content automatically from text. Rhetorical Structure Theory (RST) is used to decompose text into discourse units and to identify rhetorical discourse relations between them. Rhetorical relations are then mapped to question–answer pairs in an information preserving way, i.e., the original text and the resulting dialogue convey essentially the same meaning. Finally, the dialogue is "acted out" by two virtual agents. The network of dialogue structures automatically built up during this process, called DialogueNet, can be reused for other purposes, such as personalization or question–answering.
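The mapping from rhetorical relations to question-answer pairs described above might look roughly like the following sketch. Everything here is an assumption for illustration: the `Relation` class, the question templates, and the speaker labels are hypothetical and are not taken from the article. The idea shown is that the nucleus is folded into the question and the satellite becomes the answer, so both units of the original text survive in the dialogue.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    name: str       # relation label, e.g. "Cause"
    nucleus: str    # central discourse unit
    satellite: str  # supporting discourse unit

# Hypothetical templates keyed by lower-cased relation name.
QUESTION_TEMPLATES = {
    "cause": "Why is it that {nucleus}?",
    "condition": "Under what condition is it that {nucleus}?",
}
DEFAULT_TEMPLATE = "What more can be said, given that {nucleus}?"

def to_dialogue(relation):
    """Turn one relation into a two-turn dialogue between two speakers."""
    template = QUESTION_TEMPLATES.get(relation.name.lower(), DEFAULT_TEMPLATE)
    question = template.format(nucleus=relation.nucleus)
    return [("Asker", question), ("Expert", relation.satellite)]

dialogue = to_dialogue(
    Relation("Cause", "the match was cancelled", "The pitch was flooded.")
)
```

A full system would also need an RST parser to produce the relations and a way to order the resulting turns, neither of which is sketched here.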


Corpora
2016
Vol 11 (2)
pp. 169-190
Author(s):
Radoslava Trnavac
Debopam Das
Maite Taboada

In this paper, we examine the role of discourse relations (relations between propositions) in the interpretation of evaluative or opinion words. Through a combination of Rhetorical Structure Theory (or RST; Mann and Thompson, 1988) and Appraisal Theory (Martin and White, 2005), we analyse how different discourse relations modify the evaluative content of opinion words, and what impact the nucleus–satellite structure in RST has on the evaluation. We conduct a corpus study, examining and annotating over 3,000 evaluative words in fifty movie reviews in the SFU Review Corpus (Taboada, 2008) with respect to five parameters: word category (noun, verb, adjective or adverb), prior polarity (positive, negative or neutral), RST structure (both nucleus–satellite status and relation type) and change of polarity as a result of being part of a discourse relation (Intensify, Downtone, Reversal or No Change). Results show that relations such as Concession, Elaboration, Evaluation, Evidence and Restatement most frequently intensify the polarity of opinion words, although the majority of evaluative words do not undergo changes in their polarity related to the type of relation that they are a part of. We also find that most opinion words (about 70 percent) are positioned in the nucleus, confirming a hypothesis based on the literature that nuclei are the most important units when extracting opinion automatically.


Linguistica
2012
Vol 52 (1)
pp. 323-336
Author(s):
Juliano Desiderato Antonio
Fernanda Trombini Rahmen Cassim

According to Rhetorical Structure Theory, implicit propositions emerge from the combination of pieces of text which hang together. Implicit propositions have received various labels, such as coherence relations, discourse relations, rhetorical relations or relational propositions. When two portions of a text hold a relation, the addressee of the text may recognize the connection even without the presence of a formal sign such as a conjunction or a discourse marker. In this paper we claim that some intrinsic spoken discourse phenomena, such as paraphrasing, repetition, correction and parenthetical insertion, hold coherence relations with other portions of discourse and, thus, may be considered strategies for the construction of coherence. The analysis, based on academic spoken discourse (five university lectures in Brazilian Portuguese), shows that these phenomena are recurring and relevant for the study of spoken discourse.


2019
Vol 10 (3)
pp. 44-60
Author(s):
Nuttapong Sanglerdsinlapachai
Anon Plangprasopchok
Tu Bao Ho
Ekawit Nantajeewarawat

The segments of a document that are relevant to a given aspect can be identified by using discourse relations from Rhetorical Structure Theory (RST). Different segments may contribute to the overall sentiment differently, and the sentiment of one segment may affect the contribution of another segment. This work exploits the RST structures of relevant segments to infer the sentiment of a given aspect. An input document is first parsed into an RST tree. For each aspect, relevant segments and their relations in the resulting tree are localized and transformed into a set of features. A set of classification rules is subsequently induced and evaluated on data. The proposed framework performs well in several experimental settings, achieving accuracy values ranging from 74.0% to 77.1%. With proper strategies for removing conflicting rules and tuning the confidence threshold, f-measure values for the negative polarity class can be improved.


Author(s):  
Asem Ayed Al-Khawaldeh

The study examines the functions of the discourse marker kama in Arabic journalistic discourse in the light of Rhetorical Structure Theory (RST), proposed by Mann and Thompson (1987). To this end, the study compiled a small-scale corpus of journalistic discourse taken from two prominent Arabic news websites: Aljazeera.net and Alarabia.net. The corpus covers three distinct sub-genres of journalistic discourse: opinion articles, news reports, and sports reports. Journalistic discourse was chosen because it is considered the best representative of contemporary written Arabic and has a wide readership in Arabic-speaking countries. The motivation for the study is that, although kama is frequently used in written Arabic (particularly in the language of Arabic media), it has been largely neglected, and very little has been said about it in the existing literature on Arabic discourse markers. The findings show that kama occurs 290 times in the corpus under investigation. This indicates that kama is commonly used in Arabic journalistic discourse, which calls for attention to its usage in this type of discourse. Within the RST framework, kama was found to serve four common functions: elaboration (around 50%), similarity (around 19%), evidence (16%), and exemplification (13%). Two functions of kama (similarity and exemplification) are listed in RST while the other two are incorporated.


2021
Vol 10 (3)
pp. 157
Author(s):
Paul-Mark DiFrancesco
David A. Bonneau
D. Jean Hutchinson

Key to the quantification of rockfall hazard is an understanding of its magnitude-frequency behaviour. Remote sensing has allowed for the accurate observation of rockfall activity, with methods being developed for digitally assembling the monitored occurrences into a rockfall database. A prevalent challenge is the quantification of rockfall volume while fully considering the 3D information stored in each of the extracted rockfall point clouds. Surface reconstruction is used to construct a 3D digital surface representation, allowing for an estimation of the volume of space that a point cloud occupies. Given various point cloud imperfections, it is difficult for methods to generate digital surface representations of rockfall with detailed geometry and correct topology. In this study, we tested four different computational geometry-based surface reconstruction methods on a database of 3668 rockfalls. The database was derived from a 5-year LiDAR monitoring campaign of an active rock slope in interior British Columbia, Canada. Each method resulted in a different magnitude-frequency distribution of rockfall. The implications of 3D volume estimation were demonstrated using surface mesh visualization, cumulative magnitude-frequency plots, power-law fitting, and projected annual frequencies of rockfall occurrence. The 3D volume estimation methods caused a notable shift in the magnitude-frequency relations, while the power-law scaling parameters remained relatively similar. We determined that the optimal 3D volume calculation approach is a hybrid methodology combining the Power Crust reconstruction and the Alpha Solid reconstruction. The Alpha Solid approach is to be used on small-scale point clouds, characterized by high curvatures relative to their sampling density, which challenge the Power Crust sampling assumptions.
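The cumulative magnitude-frequency relation mentioned above can be sketched as follows: for each distinct volume, count how many recorded events are at least that large and divide by the length of the monitoring period. This is a minimal illustration under stated assumptions. The function name `cumulative_frequency` is hypothetical, the volumes are invented rather than drawn from the study's database, and the 5-year period is borrowed from the abstract only as a convenient example value.

```python
def cumulative_frequency(volumes, years):
    """Return (volume, events per year with volume >= that value) pairs.

    Pairs are ordered by increasing volume; tied volumes yield one pair.
    """
    ordered = sorted(volumes)
    total = len(ordered)
    pairs = []
    for index, volume in enumerate(ordered):
        # Emit a pair only at the first occurrence of each distinct volume,
        # where (total - index) events are at least this large.
        if index == 0 or volume != ordered[index - 1]:
            pairs.append((volume, (total - index) / years))
    return pairs

# Invented volumes in cubic metres (NOT data from the study).
example_volumes = [0.1, 0.5, 0.5, 2.0]
curve = cumulative_frequency(example_volumes, years=5)
```

Plotting such pairs on log-log axes gives the kind of curve to which a power law is usually fitted. Because the volumes themselves depend on the surface reconstruction method, the same events can produce a shifted curve under a different method.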


Author(s):  
Andrew Potter

Rhetorical structure theory (RST) and relational propositions have been shown to be useful in analyzing texts as expressions in propositional logic. Because these expressions are systematically derived, they may be expected to model discursive reasoning as articulated in the text. If this is the case, it would follow that logical operations performed on the expressions would be reflected in the texts. In this paper the logic of relational propositions is used to demonstrate the applicability of transitive inference to discourse. Starting with a selection of RST analyses from the research literature, analyses of the logic of relational propositions are performed to identify their corresponding logical expressions and within each expression to identify the inference path implicit within the text. By eliminating intermediary relational propositions, transitivity is then used to progressively compress the expression. The resulting compressions are applied to the corresponding texts and their compressed RST analyses. The application of transitive inference to logical expressions results in abridged texts that are intuitively coherent and logically compatible with their originals. This indicates an underlying isomorphism between the inferential structure of logical expressions and discursive coherence, and it confirms that these expressions function as logical models of the text. Potential areas for application include knowledge representation, logic and argumentation, and RST validation.
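The compression step described above can be illustrated with a deliberately simplified sketch. It treats an inference path as a linear chain of implications and removes the intermediaries by transitivity (from p → q and q → r, infer p → r). The actual logic of relational propositions is richer than plain implication, so this is an assumption-laden toy and not the paper's procedure. The function name `compress_chain` is hypothetical.

```python
def compress_chain(implications):
    """Collapse a linear chain [(a, b), (b, c), ...] into (a, last).

    Each pair is (antecedent, consequent). Raises ValueError if the chain
    is empty or if one link does not start where the previous one ended.
    """
    if not implications:
        raise ValueError("need at least one implication")
    start, current = implications[0]
    for antecedent, consequent in implications[1:]:
        if antecedent != current:
            raise ValueError(f"chain broken at {antecedent!r}")
        current = consequent
    return (start, current)

# Placeholder proposition labels, standing in for text spans.
compressed = compress_chain([("p", "q"), ("q", "r"), ("r", "s")])
```

Applied to a text, each label would stand for a discourse unit, and dropping the intermediate units would yield the abridged version whose coherence the paper examines.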

