agglutinative languages
Recently Published Documents


TOTAL DOCUMENTS

80
(FIVE YEARS 29)

H-INDEX

7
(FIVE YEARS 1)

Author(s):  
Ahrii Kim ◽  
Jinhyun Kim

SacreBLEU, by incorporating a text normalizing step in the pipeline, has been well-received as an automatic evaluation metric in recent years. With agglutinative languages such as Korean, however, the metric cannot provide a conceivable result without the help of customized pre-tokenization. In this regard, this paper examines the influence of diversified pre-tokenization schemes (word, morpheme, character, and subword) on the aforementioned metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment fluctuates consistently by token type. The reliability of the metric even deteriorates under some tokenization schemes, and MeCab is no exception. Offering guidance on the proper use of a tokenizer for each metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
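To make the effect of token granularity concrete, the following minimal sketch tokenizes a Korean sentence pair at the word, character, and Jamo levels and computes a clipped unigram precision, the basic building block of BLEU. It is not SacreBLEU itself and not the authors' evaluation setup: the function names `tokenize` and `ngram_precision` and the example sentences are assumptions made for illustration, and morpheme and subword tokenization are left out because they need an external analyzer.

```python
import unicodedata
from collections import Counter


def tokenize(text, level):
    """Split text into tokens at the requested granularity (illustrative helper)."""
    if level == "word":
        # Whitespace-delimited units (eojeol in Korean).
        return text.split()
    if level == "char":
        # NFC keeps each Hangul syllable as one precomposed character.
        composed = unicodedata.normalize("NFC", text)
        return [ch for ch in composed if not ch.isspace()]
    if level == "jamo":
        # NFD decomposes each Hangul syllable into its conjoining Jamo.
        decomposed = unicodedata.normalize("NFD", text)
        return [ch for ch in decomposed if not ch.isspace()]
    raise ValueError(f"unknown level: {level}")


def ngram_precision(hyp_tokens, ref_tokens, n=1):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


# Hypothetical sentence pair: the hypothesis differs from the reference in one particle.
ref = "나는 학교에 갔다"
hyp = "나는 학교로 갔다"

for level in ("word", "char", "jamo"):
    score = ngram_precision(tokenize(hyp, level), tokenize(ref, level))
    print(level, round(score, 3))
```

The same single-particle difference costs a third of the unigram matches at the word level but only a small fraction at the character or Jamo level, which is the kind of sensitivity to token type the abstract describes.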


Open Mind ◽  
2022 ◽  
pp. 1-25
Author(s):  
Michael Hahn ◽  
Rebecca Mathew ◽  
Judith Degen

Abstract The ordering of morphemes in a word displays well-documented regularities across languages. Previous work has explained these in terms of notions such as semantic scope, relevance, and productivity. Here, we test a recently formulated processing theory of the ordering of linguistic units, the efficient tradeoff hypothesis (Hahn et al., 2021). The claim of the theory is that morpheme ordering can partly be explained by the optimization of a tradeoff between memory and surprisal. This claim has received initial empirical support from two languages. In this work, we test this idea more extensively using data from four additional agglutinative languages with significant amounts of morphology, and by considering nouns in addition to verbs. We find that the efficient tradeoff hypothesis predicts ordering in most cases with high accuracy, and accounts for cross-linguistic regularities in noun and verb inflection. Our work adds to a growing body of work suggesting that many ordering properties of language arise from a pressure for efficient language processing.


2021 ◽  
Vol 12 (5-2021) ◽  
pp. 57-66
Author(s):  
Dzavdet Sh. Suleimanov ◽  
Alexander Ya. Fridman ◽  
Rinat A. Gilmullin ◽  
Boris A. Kulik ◽  
...  

System analysis of the problem of modeling a natural language (NL) made it possible to formulate the root cause of the low efficiency of modern means for accumulating and processing knowledge in such languages. This is the complexity of intellectualization for such tools, which are created on the basis of primitive artificial programming languages that practically represent a subset of flectional analytical languages or artificial constructions based on them. To reduce the severity of the identified problem, it is proposed to build NL modeling systems on the basis of technological tools for verbalization and recognition of sense. These tools consist of semiotic models of NL lexical and grammatical means. This approach seems to be especially promising for agglutinative languages; it is supposed to be implemented on the example of the Tatar language.


Languages ◽  
2021 ◽  
Vol 6 (4) ◽  
pp. 207
Author(s):  
Laura Colantoni ◽  
Liliana Sánchez

The mapping of information structure onto morphology or intonation varies greatly crosslinguistically. Agglutinative languages, like Inuktitut or Quechua, have a rich morphological layer onto which discourse-level features are mapped but make limited use of intonation. By contrast, English and Spanish lack grammaticalized morphemes that convey discourse-level information but use intonation to a relatively large extent. We propose that the difference found in these two pairs of languages follows from a division of labor across language modules, such that two extreme values of the continuum of possible interactions across modules are available as well as combinations of morphological and intonational markers. At one extreme, in languages such as Inuktitut and Quechua, a rich set of morphemes with scope over constituents conveys sentence-level and discourse-level distinctions, making the alignment of intonational patterns and information structure apparently redundant. At the other extreme, as in English and to some extent Spanish, a series of consistent alignments of PF and syntactic structure is required to distinguish sentence types and to determine the information value of a constituent. This results in a complementary distribution of morphology and intonation in these languages. In contact situations, overlap between patterns of module interaction is attested. Evidence from Quechua–Spanish and Inuktitut–English bilinguals supports a bidirectionality of crosslinguistic influence: intonational patterns emerge in non-intonational languages to distinguish sentence types, whereas morphemes or discourse particles emerge in intonational languages to mark discourse-level features.


2021 ◽  
pp. e021040
Author(s):  
Venera Nafikovna Khisamova ◽  
Alina Airatovna Khaliullina ◽  
Rafik Rashitovich Magdeev

Comparing two languages can yield results that deepen the understanding of both languages under analysis and show each language from a new side. Results obtained by the comparative method, which reveal the features the compared languages share, make studying a foreign language easier. Since the Tatar-speaking audience does not commonly study Japanese, studying it through the mother tongue may make learning and acquisition less difficult. As agglutinative languages, Tatar and Japanese have common features that should be studied by comparative linguistics. This article deals with constructions in Tatar and Japanese that consist of a verbal adverb and a modal verb meaning 'to give' or 'to receive'. These constructions are analyzed and compared in terms of their grammatical form and semantics and are considered through the lens of cultural linguistics.


2021 ◽  
Author(s):  
Ruben van de Vijver ◽  
Emmanuel Uwambayinema ◽  
Yu-Ying Chuang

How do speakers comprehend and produce complex words? In the theory of the Discriminative Lexicon, this is hypothesized to be the result of mapping the phonology of whole word forms onto their semantics and vice versa, without recourse to morphemes. This raises the question of whether this hypothesis also holds true in highly agglutinative languages, which are often seen to exemplify the compositional nature of morphology. On the one hand, one could expect that the hypothesis for agglutinative languages is correct, since it remains unclear whether speakers are able to isolate the morphemes they need to achieve this. On the other hand, agglutinative languages have so many different words that it is not obvious how speakers can use their knowledge of words to comprehend and produce them. In this paper, we investigate comprehension and production of verbs in Kinyarwanda, an agglutinative Bantu language, by means of computational modeling within the Discriminative Lexicon, a theory of the mental lexicon that is grounded in word and paradigm morphology, distributional semantics, and error-driven learning, uses insights of psycholinguistic theories, and is implemented mathematically and computationally as a shallow, two-layered network. In order to do this, we compiled a data set of 11528 verb forms and annotated for each verb form its meaning and grammatical functions, and, additionally, we used our data set to extract 573 verbs that are present in our full data set and for which meanings of verbs are based on word embeddings. In order to assess comprehension and production of Kinyarwanda verbs, we fed both data sets into the Linear Discriminative Learning algorithm, a two-layered, fully connected network. One layer represents the phonological form and the other layer represents meaning. Comprehension is modeled as a mapping from phonology to meaning and production is modeled as a mapping from meaning to phonology. Both comprehension and production are learned with high accuracy in all data and in held-out data, both for the full data set, with manually annotated semantic features, and for the data set with meanings derived from word embeddings. Our findings provide support for the various hypotheses of the Discriminative Lexicon: words are stored as wholes, meanings are a result of the distribution of words in utterances, comprehension and production can be successfully modeled from mappings from form to meaning and vice versa, which can be modeled in a shallow two-layered network, and these mappings are learned by minimizing errors.
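The idea of a two-layered, error-driven mapping from form to meaning can be illustrated with a small sketch. The code below is not the authors' Linear Discriminative Learning implementation: it swaps in an incremental delta-rule (Widrow-Hoff) update, uses letter trigrams as form cues, and trains on four invented toy forms rather than real Kinyarwanda data. The names `trigrams`, `train`, and `comprehend`, the learning rate, and the threshold are all assumptions chosen for illustration. Only the comprehension direction (form to meaning) is shown, without a held-out evaluation.

```python
def trigrams(form):
    """Letter trigrams of a word form, with '#' marking the word boundaries."""
    padded = f"#{form}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(lexicon, epochs=200, rate=0.1):
    """Learn cue-to-feature weights with the delta rule (error-driven learning).

    lexicon maps each word form to its set of semantic features.
    Returns the weight table and the sorted list of all features.
    """
    features = sorted({feat for feats in lexicon.values() for feat in feats})
    weights = {}  # (cue, feature) -> association strength, implicitly 0.0
    for _ in range(epochs):
        for form, feats in lexicon.items():
            cues = set(trigrams(form))
            for feat in features:
                predicted = sum(weights.get((cue, feat), 0.0) for cue in cues)
                target = 1.0 if feat in feats else 0.0
                error = target - predicted
                # Every active cue shares the correction for this feature.
                for cue in cues:
                    weights[(cue, feat)] = weights.get((cue, feat), 0.0) + rate * error
    return weights, features


def comprehend(form, weights, features, threshold=0.5):
    """Map a form to the features whose summed activation exceeds the threshold."""
    cues = set(trigrams(form))
    return {
        feat for feat in features
        if sum(weights.get((cue, feat), 0.0) for cue in cues) > threshold
    }


# Invented toy forms (NOT real Kinyarwanda): a person prefix followed by a stem.
lexicon = {
    "takora": {"WORK", "1SG"},
    "rakora": {"WORK", "2SG"},
    "takina": {"PLAY", "1SG"},
    "rakina": {"PLAY", "2SG"},
}

weights, features = train(lexicon)
for form in lexicon:
    print(form, sorted(comprehend(form, weights, features)))
```

No morpheme boundaries are given to the model; the weights on overlapping trigrams are enough to recover the features of each trained form, which is the whole-word, morpheme-free view the abstract describes.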


Author(s):  
Umsalimat Bagautdinova Abdullabekova

The article examines the functioning of the polypredicative construction in the Kumyk language. The notion of a "polypredicative sentence" was introduced by the Novosibirsk syntactic school. Turkic languages are not characterized by properly complex sentences with two formally independent finite parts connected by an analytical form. Case affixes and postpositions form non-finite rather than finite verb forms. Such constructions are the most frequent in agglutinative languages.

