English Loanwords in Mongolian Usage

Glottometrics ◽  
2021 ◽  
pp. 27-41
Author(s):  
Minna Bao ◽  
Brintag Saheya ◽  
Dabhurbayar Huang

Many authors have used statistical methods to examine the influence of loanwords on languages. However, English loanwords in Mongolian have rarely been studied in quantitative linguistics. The present study shows that English loanwords in Mongolian share a universal feature of the other languages tested: their frequency distribution follows Zipf’s law. In addition, we define and test nine English loanword models based on borrowing method and part of speech, and find that the results can be described by a power function.
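Zipf’s law is usually stated in rank-frequency form, f(r) ≈ C·r^(-a), where r is a word’s frequency rank. The abstract does not say how the power function was fitted, so the sketch below assumes the simplest approach, an ordinary least-squares line on log-log axes (maximum-likelihood estimators are often preferred in practice). The token list in the usage example is hypothetical.

```python
import math
from collections import Counter


def zipf_fit(tokens):
    """Fit log f(r) = log C - a * log r by ordinary least squares.

    Returns (C, a) for the power function f(r) = C * r**(-a).
    Needs at least two distinct word types.
    """
    # Frequencies sorted from most to least frequent; rank 1 = most frequent.
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # Intercept of the log-log line gives log C; the slope is -a.
    return math.exp(my - slope * mx), -slope


# Hypothetical loanword tokens whose counts follow 60 / r exactly.
tokens = ["w1"] * 60 + ["w2"] * 30 + ["w3"] * 20 + ["w4"] * 15
C, a = zipf_fit(tokens)
```

The same fitting function can be applied separately to each subset of loanwords (for example, grouped by part of speech) to compare exponents across models.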

2016 ◽  
Vol 36 (331) ◽  
pp. 10-13
Author(s):  
Karolina Piaseckiene

The goal of this research is to explore sentence structures expressed by parts of speech. Because the amount of data was small, a sparse-data problem arose; it was solved by recoding the annotated sentences and considering a “framework” of a sentence made up of a verb and a noun, provisionally called a code. The code of a sentence is created by replacing each word of the sentence with a symbol (a letter or number) that encodes one property or another of that word as a constituent of the sentence. Zipf’s law describes the distribution of sentences encoded in this way rather well. If we learn to identify and analyze (annotate, translate, etc.) sentences of the simplest structure well, we can automatically process a large share of the sentences in a text. At least 17% of sentences can be identified as having the simplest structure.
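The coding step described above can be sketched as follows, assuming the sentences are already POS-tagged and that the only properties kept are “verb” and “noun”. The tag names, the symbol table `POS_SYMBOLS`, and the toy English corpus are hypothetical illustrations, not the author’s annotation scheme.

```python
from collections import Counter

# Hypothetical symbol table: each kept part of speech maps to one letter.
POS_SYMBOLS = {"VERB": "V", "NOUN": "N"}


def sentence_code(tagged_sentence):
    """Reduce a list of (word, tag) pairs to its verb-noun 'framework' code.

    Words whose tag is not in POS_SYMBOLS are dropped from the code.
    """
    return "".join(POS_SYMBOLS[tag] for _, tag in tagged_sentence if tag in POS_SYMBOLS)


def code_rank_frequency(tagged_sentences):
    """Count sentence codes and return (code, frequency) pairs, most frequent first."""
    return Counter(sentence_code(s) for s in tagged_sentences).most_common()


# Hypothetical pre-tagged toy corpus.
corpus = [
    [("Dogs", "NOUN"), ("bark", "VERB")],
    [("Birds", "NOUN"), ("sing", "VERB")],
    [("Read", "VERB"), ("books", "NOUN")],
    [("The", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
ranking = code_rank_frequency(corpus)
```

Because many distinct sentences collapse onto the same short code, the code frequencies are far less sparse than raw sentence frequencies, which is what makes a rank-frequency (Zipf) analysis feasible on a small data set.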


Entropy ◽  
2020 ◽  
Vol 22 (2) ◽  
pp. 179 ◽  
Author(s):  
Álvaro Corral ◽  
Montserrat García del Muro

The word-frequency distribution provides the fundamental building blocks that generate discourse in natural language. It is well known from empirical evidence that the word-frequency distribution of almost any text is described, at least approximately, by Zipf’s law. Following Stephens and Bialek (2010), we interpret the frequency of any word as arising from the interaction potentials between its constituent letters. Indeed, Jaynes’ maximum-entropy principle, with the constraints given by every empirical two-letter marginal distribution, leads to a Boltzmann distribution for word probabilities, with an energy-like function given by the sum of all-to-all pairwise (two-letter) potentials. The so-called improved iterative-scaling algorithm allows us to find the potentials from the empirical two-letter marginals. We considerably extend Stephens and Bialek’s results, applying this formalism to words of up to six letters from the English subset of the recently created Standardized Project Gutenberg Corpus. We find that the model is able to reproduce Zipf’s law, but with some limitations: the general Zipf power-law regime is obtained, but the probabilities of individual words show considerable scatter. In this way, a purely statistical-physics framework is used to describe the probabilities of words. As a by-product, we find that both the empirical two-letter marginal distributions and the interaction-potential distributions follow well-defined statistical laws.
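The model described above can be written out as follows for a word w = c_1 c_2 ... c_L of fixed length L. The notation (V_{ij}, \hat{P}_{ij}, Z) and the sign convention P ∝ exp(-E) are assumptions chosen for illustration; the abstract only states that the energy is a sum of all-to-all pairwise letter potentials and that the constraints are the empirical two-letter marginals.

```latex
P(w) = \frac{1}{Z}\exp\bigl[-E(w)\bigr], \qquad
E(w) = \sum_{1 \le i < j \le L} V_{ij}(c_i, c_j), \qquad
Z = \sum_{w'} \exp\bigl[-E(w')\bigr],

\text{subject to} \quad
\sum_{w:\; c_i = a,\; c_j = b} P(w) = \hat{P}_{ij}(a, b)
\quad \text{for all } i < j \text{ and all letters } a, b.
```

Here \hat{P}_{ij}(a, b) is the empirical probability of finding letter a at position i and letter b at position j. Maximizing entropy under exactly these pairwise constraints yields the exponential form above, with the potentials V_{ij} playing the role of the Lagrange multipliers; iterative scaling adjusts them until the model marginals match the empirical ones.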


2021 ◽  
Vol 290 ◽  
pp. 02016
Author(s):  
Yan Xu ◽  
Qianqian Tang ◽  
Ying Yuan

As an intangible cultural heritage, “Huar” from Northwest China is a folk song tradition created and shared by many ethnic groups in Gansu, Qinghai and Ningxia provinces. It is a precious calling card of Chinese national culture. However, current research on “Huar” relies mainly on qualitative methods. This paper uses statistical methods to study the lyrics of “Huar”, which comprises two main branches, Hezhou Huar and Taomin Huar. First, the word frequencies of the “Huar” lyrics are analysed statistically. Then, the lyrics of Hezhou Huar and Taomin Huar are compared from the perspective of quantitative linguistics. A comparison of three quantitative indicators shows that the lexical richness of Taomin Huar is higher than that of Hezhou Huar. The frequency distribution of parts of speech reveals similarities and differences in how Hezhou Huar and Taomin Huar use parts of speech. This statistical analysis of “Huar” has both research value and social value.
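The abstract does not name its three lexical-richness indicators. As an illustration only, the sketch below computes three widely used ones: the type-token ratio V/N, Guiraud’s R = V/√N, and the proportion of types that occur exactly once (hapax legomena). The function name and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def lexical_richness(tokens):
    """Return three common lexical-richness indicators for a list of tokens.

    The choice of indicators is illustrative, not necessarily the paper's.
    """
    n = len(tokens)                   # number of tokens N
    counts = Counter(tokens)
    v = len(counts)                   # number of types V
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "ttr": v / n,                     # type-token ratio V / N
        "guiraud_r": v / math.sqrt(n),    # Guiraud's R = V / sqrt(N)
        "hapax_share": hapaxes / v,       # share of types occurring once
    }


scores = lexical_richness("a b a c d".split())
```

The plain type-token ratio falls as a text grows longer, so comparing two sub-corpora of different sizes (such as two song traditions) is safer with length-corrected measures like Guiraud’s R or with equal-sized samples.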


Glottotheory ◽  
2019 ◽  
Vol 9 (2) ◽  
pp. 113-129
Author(s):  
Victor Davis

Heaps’ law [1] states that, in a large enough text corpus, the number of types $N$ grows with the number of tokens $M$ as $N = K M^{\beta}$ for some free parameters $K$ and $\beta$. Much has been written [2–6] about how this result and various generalizations can be derived from Zipf’s law [7]. Here we derive from first principles a novel expression for the type-token curve and demonstrate its superior accuracy on real text. This expression naturally generalizes to equally accurate estimates of the numbers of hapaxes and higher n-legomena.

References
[1] Heaps, H. S. 1978. Information Retrieval: Computational and Theoretical Aspects. Academic Press. https://dl.acm.org/citation.cfm?id=539986
[2] Font-Clos, F. 2013. A scaling law beyond Zipf’s law and its relation to Heaps’ law. New Journal of Physics 15, 093033. http://iopscience.iop.org/article/10.1088/1367-2630/15/9/093033
[3] Bernhardsson, S., da Rocha, L. E. C., and Minnhagen, P. 2009. The meta book and size-dependent properties of written language. New Journal of Physics 11, 123015. http://iopscience.iop.org/article/10.1088/1367-2630/11/12/123015
[4] Bernhardsson, S., Baek, S. K., and Minnhagen, P. 2011. A paradoxical property of the monkey book. Journal of Statistical Mechanics: Theory and Experiment 2011, P07013. http://iopscience.iop.org/article/10.1088/1742-5468/2011/07/P07013
[5] Milička, J. 2009. Type-token & hapax-token relation: A combinatorial model. Glottotheory. International Journal of Theoretical Linguistics 2(1), 99–110. http://milicka.cz/kestazeni/type-token_relation.pdf
[6] Petersen, A. 2012. Languages cool as they expand: Allometric scaling and the decreasing need for new words. Scientific Reports 2, 943. https://www.nature.com/articles/srep00943
[7] Zipf, G. 1949. Human Behavior and the Principle of Least Effort. Reading: Addison-Wesley. http://dx.doi.org/10.1037/h0052442
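The classic Heaps’ relation N = K·M^β can be checked on any token stream by recording the number of distinct types N after each of the first M tokens and fitting a straight line on log-log axes. This is only the baseline power law, not the novel expression the abstract refers to (which is not given there); the least-squares fit and the function names are assumptions for illustration.

```python
import math


def type_token_curve(tokens):
    """Return (M, N) pairs: the number of types N seen after the first M tokens."""
    seen = set()
    curve = []
    for m, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((m, len(seen)))
    return curve


def fit_heaps(curve):
    """Least-squares fit of log N = log K + beta * log M; returns (K, beta)."""
    xs = [math.log(m) for m, _ in curve]
    ys = [math.log(n) for _, n in curve]
    count = len(xs)
    mx, my = sum(xs) / count, sum(ys) / count
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return math.exp(my - beta * mx), beta


curve = type_token_curve("a b a c".split())
# Synthetic points lying exactly on N = 2 * M**0.5.
K, beta = fit_heaps([(m, 2 * m ** 0.5) for m in (1, 4, 9, 16, 25)])
```

On real text the points near small M often deviate from the power law, which is one motivation for derivations that go beyond the simple two-parameter form.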


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Giordano De Marzo ◽  
Andrea Gabrielli ◽  
Andrea Zaccaria ◽  
Luciano Pietronero

2021 ◽  
Vol 7 (s3) ◽  
Author(s):  
Matthew Stave ◽  
Ludger Paschen ◽  
François Pellegrino ◽  
Frank Seifart

Zipf’s Law of Abbreviation and Menzerath’s Law both make predictions about the length of linguistic units, based on corpus frequency and on the length of the carrier unit, respectively. Each contributes to the efficiency of languages: for Zipf, units are more likely to be reduced when they are highly predictable because they are frequent; for Menzerath, units are more likely to be reduced when there are more sub-units contributing to the structural information of the carrier unit. However, it remains unclear how the two laws work together in determining unit length at a given level of linguistic structure. We examine this question for morpheme length in spoken corpora of nine typologically diverse languages drawn from the DoReCo corpus, and show that Zipf’s Law is the stronger predictor but that the two laws interact. We also explore how this interaction is affected by specific typological characteristics, such as morphological complexity.
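A first, model-free check of each law is a rank correlation: under the Law of Abbreviation, morpheme frequency and morpheme length should be negatively correlated; under Menzerath’s Law, the number of morphemes in a word and the mean length of those morphemes should be negatively correlated. The sketch below implements Spearman’s rho from scratch. The abstract does not describe the statistical model actually used, and this simple test cannot capture the interaction between the two laws; all data values are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical morphemes: corpus frequency vs. length in segments (Zipf).
rho_zipf = spearman([500, 120, 40, 8, 2], [1, 2, 3, 4, 6])
# Hypothetical words: morphemes per word vs. mean morpheme length (Menzerath).
rho_menzerath = spearman([1, 2, 3, 4], [4.0, 3.5, 3.0, 2.5])
```

Both toy data sets are perfectly monotone, so each rho is -1; real corpora give weaker negative values, and comparing the strength of the two effects while accounting for their interaction requires a joint regression model rather than two separate correlations.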

