Classification of Sentences by Word-length Distribution of Verbs with Proportions of Japanese Words and those of Compound Words

Abstract: This paper proposes a robust text classification and correspondence analysis approach to the identification of similar languages. In particular, we propose to use the readily available information of clauses and word length distribution to model similar languages. The modeling and classification are based on the hypothesis that languages are self-adaptive complex systems and hence can be classified by dynamic features describing the system, especially in terms of distributional relations among the constituents of a system. For similar languages, whose grammatical differences are often subtle, classification based on dynamic system features should be more effective. To test this hypothesis, we considered both regional and genre varieties of Mandarin Chinese for classification. The data are extracted from two comparable balanced corpora to minimize possible confounding factors. The two corpora are the Sinica Corpus from Taiwan and the Lancaster Corpus of Mandarin Chinese from Mainland China, and the two genres are reportage and review. Our text classification and correspondence analysis results show that the linguistically felicitous two-level constituency model, combining power functions between words and clauses, effectively classifies the two varieties of Chinese for both genres. In addition, we found that genres do have a compounding effect on the classification of regional varieties. In particular, reportage in the two varieties is more likely to be classified than review, corroborating the complex system view of language variations. That is, language variations and changes typically do not take place evenly across the complete language system. This further supports our hypothesis that dynamic complex system features, such as the power functions captured by the Menzerath–Altmann law, provide effective models for the classification of similar languages.
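As a rough illustration of the kind of power function the abstract refers to, the sketch below fits the truncated form of the Menzerath–Altmann law, y = a * x^b, by least squares in log-log space. It is not the authors' pipeline: the function name `fit_power_law`, the variable names, and the data are all assumptions made up for this example, and the numbers are synthetic rather than drawn from the Sinica Corpus or the Lancaster Corpus of Mandarin Chinese.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y).

    Hypothetical helper for illustration; xs and ys must be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    # Intercept in log-log space is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data (assumption): clause length in words against mean word
# length, generated exactly from y = 2.0 * x**-0.2 so the fit is checkable.
clause_lengths = [1, 2, 3, 4, 5, 6]
mean_word_lengths = [2.0 * x ** -0.2 for x in clause_lengths]

a, b = fit_power_law(clause_lengths, mean_word_lengths)
print(f"a = {a:.3f}, b = {b:.3f}")
```

In a classification setting along the lines the abstract describes, fitted parameters such as a and b for each text could serve as features; how the paper actually constructs its features is not stated in the abstract.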

From relational schemas to subject-specific semantic relations

Annual Review of Cognitive Linguistics ◽ 10.1075/arcl.2.08ost ◽ 2004 ◽ Vol 2 ◽ pp. 235-259

Author(s): Ulrike Oster

Keyword(s): Word Formation ◽ Semantic Relations ◽ Compound Words ◽ Complex Issue ◽ Relational Schemas ◽ Subject Specific

Compounding is a major word-formation procedure in many languages, and even more so in specialised terminology. The classification of these compound words is a very complex issue due to the large number of semantic relations that can hold between the constituents of the compound. Typologies for different special languages differ considerably from each other and usually combine rather general with highly subject-specific relations. This paper presents a proposal for a two-step classification of these intraterm relations. First, a set of basic relational schemas is worked out, whose purpose is to serve as a tool for the interpretation of semantic relations. These schemas, which are potentially applicable to any domain, are then used to classify the actual compound terms that appear in a corpus of texts from a specific technical field.

FAST ALGORITHMIC NIELSEN–THURSTON CLASSIFICATION OF FOUR-STRAND BRAIDS

Journal of Knot Theory and Its Ramifications ◽ 10.1142/s0218216511009959 ◽ 2012 ◽ Vol 21 (05) ◽ pp. 1250043 ◽ Cited By ~ 5

Author(s): MATTHIEU CALVEZ ◽ BERT WIEST

Keyword(s): Conjugacy Class ◽ Polynomial Time ◽ Word Length ◽ Conjugacy Problem ◽ Time Solution

We give an algorithm which decides the Nielsen–Thurston type of a given four-strand braid. The complexity of our algorithm is quadratic with respect to word length. The proof of its validity is based on a result which states that for a reducible 4-braid which is as short as possible within its conjugacy class (short in the sense of Garside), reducing curves surrounding three punctures must be round or almost round. As an application, we give a polynomial time solution to the conjugacy problem for non-pseudo-Anosov four-strand braids.

Approaching word length distribution via level spectra

Physica A Statistical Mechanics and its Applications ◽ 10.1016/j.physa.2017.04.045 ◽ 2017 ◽ Vol 481 ◽ pp. 167-175

Author(s): Weibing Deng ◽ Mauricio Porto Pato

Keyword(s): Word Length ◽ Length Distribution

Word-Length Distribution in Present-Day Lower Sorbian Newspaper Texts

Contributions to the Science of Text and Language - Text, Speech and Language Technology ◽ 10.1007/1-4020-4068-7_16 ◽ 2006 ◽ pp. 319-327

Author(s): Andrew Wilson

Keyword(s): Word Length ◽ Length Distribution

Towards a theory of word length distribution*

Journal of Quantitative Linguistics ◽ 10.1080/09296179408590003 ◽ 1994 ◽ Vol 1 (1) ◽ pp. 98-106 ◽ Cited By ~ 47

Author(s): Gejza Wimmer ◽ Reinhard Köhler ◽ Rüdiger Grotjahn ◽ Gabriel Altmann

Keyword(s): Word Length ◽ Length Distribution

An effective algebraic detection of the Nielsen–Thurston classification of mapping classes

Journal of Topology and Analysis ◽ 10.1142/s1793525315500016 ◽ 2014 ◽ Vol 07 (01) ◽ pp. 1-21 ◽ Cited By ~ 1

Author(s): Thomas Koberda ◽ Johanna Mangahas

Keyword(s): Mapping Class Group ◽ Word Length ◽ Class Group ◽ Mapping Class ◽ Finite Cover ◽ Generating Set ◽ Reduction System ◽ Conjugacy Separability ◽ Mapping Classes

In this paper, we propose two algorithms for determining the Nielsen–Thurston classification of a mapping class ψ on a surface S. We start with a finite generating set X for the mapping class group and a word ψ in 〈X〉. We show that if ψ represents a reducible mapping class in Mod (S), then ψ admits a canonical reduction system whose total length is exponential in the word length of ψ. We use this fact to find the canonical reduction system of ψ. We also prove an effective conjugacy separability result for π1(S) which allows us to lift the action of ψ to a finite cover [Formula: see text] of S whose degree depends computably on the word length of ψ, and to use the homology action of ψ on [Formula: see text] to determine the Nielsen–Thurston classification of ψ.
