Constructing the Corpus of Chinese Textual ‘Run-on’ Sentences (CCTRS)
Chinese is a discourse-oriented language. “Run-on” sentences (liushui ju) are a typical and prevalent form of discourse in Chinese. These sentences show the capacity of the Chinese language to organize loose structures into an effective and coherent discourse. Despite their widespread use, previous studies have explored “run-on” sentences only through small sets of examples. A quantitative investigation of “run-on” sentences requires a corpus. The present study selects 500 “run-on” sentences and annotates them at the levels of discourse, syntax and semantics. The discourse annotation mainly follows the PDTB (Penn Discourse Treebank) style but also borrows some features from RST (Rhetorical Structure Theory). We find that the frequency distribution of discourse relations in the data extracted from this corpus follows a power law. The preliminary results reveal that semantic leaps in “run-on” sentences are closely related to the use of topic chains, to animacy, and to the span of discourse relations. This corpus can thus support further computational and cognitive studies of Chinese discourse.
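
One rough way to check a power-law claim about relation frequencies is to fit a straight line to log frequency against log rank. The sketch below is illustrative only and is not the authors' procedure. The function name `fit_power_law` is hypothetical, and the counts in the usage lines are synthetic values generated from an exact power law, not data from the corpus.

```python
import math


def fit_power_law(frequencies):
    """Fit freq ~ c * rank**(-alpha) by least squares in log-log space.

    Returns (alpha, c, r_squared). This is a crude diagnostic, assumed
    here for illustration; it is not a rigorous test of a power law.
    """
    # Rank the positive frequencies from most to least frequent.
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two positive frequencies")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, R^2 is the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return -slope, math.exp(intercept), r_squared


# Synthetic counts (NOT corpus data): an exact power law with alpha = 2.
synthetic_counts = [1000.0 * rank ** -2 for rank in range(1, 9)]
alpha, c, r_squared = fit_power_law(synthetic_counts)
print(f"alpha={alpha:.3f}, c={c:.1f}, R^2={r_squared:.3f}")
```

A high R^2 on a log-log plot is only suggestive, since other heavy-tailed distributions can also look close to linear there. A careful analysis would compare alternative distributions with a likelihood-based method.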